
Tuesday, April 01, 2025

Playing with LLMs to code some math (2025-03)


LLM-assisted coding help dialog

Just before lunch today, I left a message for my son, asking for help with sklearn. Then I thought: Gemini should be able to help me. This is the resulting dialog (summarized):

  1. Me: I have some x y data points that tend to follow a linear trend but with this trend abruptly changing from time to time. I am looking to use sklearn where I provide my x y data, and at the end have the data optimally partitioned so that each partition's data can be fit with a line while minimizing the fitting error. Can you help me?
    1. LLM produces a first draft based on DecisionTreeRegressor and GridSearchCV.
    2. I run it on a small dataset.
  2. Me: I note that this has a hard time getting the transitions "right", meaning the estimated segment breakpoints have a hard time converging. How about a second pass that keeps the position of the segments, but optimizes their ends?
    1. LLM proposes code, but forgets that it should be a second pass.
  3. Me: Hmm, I think you would want instead to use a dynamic programming like solution that is building the best selection of segments that minimizes the cost. Can you do that?
    1. LLM proposes code, but not very optimal, and it still forgets this should be a second pass.
  4. Me: This is not a second pass. What we want is first to use the decision tree, and then to optimize where the segments found by the tree should end. Can you try again with that in mind?
    1. LLM proposes code, which I now try on my real dataset. 
  5. Me: Why did you include the datetime above at some point? There is no date and time in this problem! And you are only making things more complicated!
    1. LLM: ...
      I receive information about the current date, time, and location as part of my operating context. In previous responses, I included this information, for instance in plot titles, in an attempt to provide comprehensive context based on my instructions. However, I clearly misjudged its relevance here.
      ...
  6. Me: ok, so in fact GridSearchCV is pretty slow when arrays get big. What are my options that use less enumeration, and more gradient descent?
    1. LLM: Proposes to try with RandomizedSearchCV or HalvingRandomSearchCV
  7. Me: Given the linear partitioning, and given the incremental nature of part of the cost function, I am surprised that you cannot come up with better. Is it because we are using scikit-learn? Could we do better in a lower-level framework, such as JAX?
    1. LLM: Yes, your intuition is correct. Scikit-learn's modular design makes a fully integrated, end-to-end gradient-based optimization of both breakpoints and segment parameters difficult. ...
  8. Me: Can we try to start from zero but in JAX? Given what we know, what would you propose?
    1. LLM produces a JAX version, but with a fixed number of segments.
  9. Me: Nice. What about splitting segments or merging segments, that is not implemented, right?
    1. LLM explains the challenges but does not propose a concrete solution.
  10. Me: What about a refine and merge strategy?
    1. LLM agrees with me.
  11. Me: Can you sketch that in JAX?
    1. LLM proposes code but mixes code with text.
  12. Me: Can you clean that up to be just code, with the correct tabs?
    1. LLM proposes what first seems reasonable JAX.
    2. But then I note that the code no longer supports jumps in the dataset.
  13. Me: Also, just to make sure, is that spline model continuous? Or does it allow jumps? The initial formulation allows jumps.
    1. LLM agrees that there is a problem. Explains that the initial requirement did not disallow the proposed solution.
  14. Me: Hmm, it is correct that when I said the trends jump, I did not imply constraints on continuity. Good point. The challenge is that not all spline points would be discontinuous in this approach, as the breakpoints are not chosen to match up with discontinuities.
    1. LLM: you are correct: If your "abrupt changes" include actual discontinuities, the knots b_k found by optimizing the continuous linear spline model are unlikely to perfectly align with the locations of those discontinuities
  15. Me: Yes, dynamic programming is what I have used in the past. Is there a jax DP library?
    1. LLM: There isn't a go-to "JAX DP library".
  16. Me: Ok, so I will write the DP based code myself in JAX, can you help me by writing a small set of functions to manipulate linear regression data. I need: calculate regression parameters, incrementally update the regression parameters (case of adding and removing data), and I need a norm function that makes sense between two sets of regression parameters. All of this in JAX.
    1. LLM produces JAX code (a minimal sketch of such helpers appears after the dialog).
  17. Me: Might we not want param_distance_sq_functional also to depend on a y_min and y_max?
    1. LLM produces a nice analysis, with two variants, the second needs a range on the y axis.
  18. Me: I need the second, the normalized version, can you generate it?
    1. LLM produces two distance functions.
  19. Me: Can you build a test that takes a random set of points, and partitions them so that we can test the correctness of these two distance functions?
    1. LLM proposes test code that checks a segment's distance against itself and against other segments.
  20. Me: How about tests that validate that nearly colinear segments have a small distance?
    1. LLM proposes test code that also includes a collinearity test.
    2. Which I run.
  21. Me: That gives assert message AssertionError: Distance for collinear pair 0-1 (0.001175) seems too large compared to fit-true error (0.0001574). How about we use statistics to determine better limits for these tests?
    1. LLM fixes the code, but the code uses sampling to determine the limits.
  22. Me: How about using autodiff instead to extrapolate a reasonable limit?
    1. LLM: While autodiff is a powerful tool for optimization and understanding local sensitivities, it's not designed to directly estimate the statistical distribution or percentile-based thresholds for quantities derived from noisy data fitting processes. Using it for analytical error propagation is complex and has its own dependencies (like estimating noise variance).
  23. Me: Would not a second order autodiff (hessian) give us a hook for a simplified model from which we can extract limits? I seem to remember doing things that way.
    1. LLM: Yes, you're absolutely right to bring that up! Using second-order automatic differentiation (i.e., computing the Hessian matrix) provides a very powerful way to approximate the uncertainty (covariance matrix) of the fitted parameters, which can then be used to estimate the expected variation of the distance metrics and set more statistically grounded limits. 
  24. Me: Can you generate that JAX code as well? I'd love to see you do it!
    1. LLM generates code, but the code does not run (a minimal sketch of the covariance idea closes this post).
  25. Me: I am in Python 3.12, maybe that is why this code generates: TypeError: jnp.linalg.det requires ndarray or scalar arguments, got ...
    1. LLM generates updated code.
  26. Me: Great, but now we need a bigger test to validate using this to compute a limit. Can you produce that?
    1. LLM generates test that uses covariance to compute test limits.
  27. Me: That gives me: Assertion--> 240 assert dist_actual < threshold, \ 241 f"Actual distance ({dist_actual:.6g}) exceeds statistically derived threshold ({threshold:.6g} ...
    1. LLM: The Problem: The first-order error propagation approximation (Var(f(x)) ≈ (df/dx)^2 * Var(x)) breaks down and gives zero variance 
    2. LLM: The Chi-Squared test on the parameter difference is a statistically sound approach that correctly utilizes the covariance matrices derived from the Hessian. 
  28. Me: How about we add a "fixed" absolute error limit in addition to the relative, covariance-matrix-based one. Can we then combine the two for these degenerate cases?
    1. LLM produces code.
  29. Me: Ok, for your info, I needed to set threshold_abs = 1e-2 for that to work! Good job! (at least until I test it a bit more!)
    1. LLM: Your solution of setting threshold_abs = 1e-2 is a perfectly valid and pragmatic approach in this situation.
  30. Me: Just wondering, does the distance increase when the fits are tighter?
    1. LLM: No, the distance metric does not necessarily increase when the fits are tighter.
  31. Me: Is the distance scale invariant? That is without units?
    1. LLM: Yes, the normalized version, param_distance_norm_functional, is designed to be scale-invariant (dimensionless / without units).
  32. Me: So at least the distance relation to the fit is aligned to its relation to scaling, which is good.
    1. LLM: Yes, that's a very good way to connect those ideas!
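
To make step 16 concrete, here is a minimal sketch (my own function names, not Gemini's exact output) of incremental linear regression helpers in JAX, based on the sufficient statistics of a line fit; the distance functions of steps 17-18 are omitted:

import jax.numpy as jnp

def regression_stats(xs, ys):
    # Sufficient statistics of a 1-d line fit: [n, sum x, sum y, sum x^2, sum x*y].
    return jnp.array([xs.shape[0], xs.sum(), ys.sum(),
                      (xs * xs).sum(), (xs * ys).sum()])

def add_point(stats, x, y):
    # Incrementally add one (x, y) point.
    return stats + jnp.array([1.0, x, y, x * x, x * y])

def remove_point(stats, x, y):
    # Incrementally remove one (x, y) point.
    return stats - jnp.array([1.0, x, y, x * x, x * y])

def regression_params(stats):
    # Solve the 2x2 normal equations for (slope, intercept).
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

For example, starting from regression_stats(jnp.arange(4.0), 2.0 * jnp.arange(4.0) + 1.0) and then applying add_point with the point (4.0, 9.0), regression_params returns a (slope, intercept) close to (2.0, 1.0).
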
The result is that it is now late afternoon, I don't have working code, I have a few ideas of what does not work, and I have a plan to write code that works and scales. That will be tomorrow's effort!
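
To record the step 23 idea I expect to reuse, here is a minimal sketch (my own notation, not Gemini's exact code) of using a second-order autodiff Hessian in JAX to approximate the covariance of fitted line parameters:

import jax
import jax.numpy as jnp

def sse(params, xs, ys):
    # Sum of squared residuals of a line fit; params = [slope, intercept].
    r = ys - (params[0] * xs + params[1])
    return jnp.sum(r * r)

def param_covariance(params, xs, ys):
    # Laplace-style approximation: Cov ~ 2 * sigma^2 * H^-1, where H is the
    # Hessian of the SSE and sigma^2 is estimated from the residuals.
    H = jax.hessian(sse)(params, xs, ys)
    sigma2 = sse(params, xs, ys) / (xs.shape[0] - 2)
    return 2.0 * sigma2 * jnp.linalg.inv(H)

A distance between two fits can then be tested against a chi-squared style threshold built from the two covariance matrices, which is the statistically grounded limit discussed in step 27.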

All original content copyright James Litsios, 2025.

Saturday, August 01, 2020

Finding constrained boundary of z-ordered / Morton code regions (pre-2000 coding)

2025 Update: 

  • Wikipedia now states: The BIGMIN problem has first been stated and its solution shown in Tropf and Herzog.[10] For the history after the publication see [11]. 
  • What is below seems to be a variant of BIGMIN, with the advantage of running in time logarithmic in the bit length; the original BIGMIN runs in time proportional to the bit length.


Many years ago, while doing my thesis, I played a bit with z-order / Morton coding (see https://en.wikipedia.org/wiki/Z-order_curve). Therefore I share here a little snippet of code from ~1995 that I happened upon while cleaning up an old laptop. I have not seen the equivalent logic shared elsewhere, so I took an hour to write it up in Python (Jupyter notebook style, uploaded here). I would be curious to know if it can be found elsewhere.

First a bit of historical context. My first job (before my PhD) had me writing a Design Rule Checker (DRC) for the world's first analog CAD tool, developed at the CSEM https://www.csem.ch/Home (see ILAC https://ieeexplore.ieee.org/document/18603). Among many things, a DRC does boolean operations on layers of integrated circuit masks, and when these operations produce unexpected results, you know that your CAD tool has made a mistake. Focusing on this blog, a DRC must efficiently "join" geometric datasets, and the question is then how we should structure the data of these geometries for optimal performance. At the CSEM I was using K-d trees (https://en.wikipedia.org/wiki/K-d_tree), yet one of the data structures that came out during this period was the skip list (https://en.wikipedia.org/wiki/Skip_list), and because the skip list is only a one-dimensional structure, I thought it would be fun to use z-ordering to join structures of two or more dimensions with skip lists. 

The difficulty is that keys in the skip list space have only one dimension, and simply comparing these keys is not good enough for high-performance geometry operations. We need a way to take these one-dimensional keys and optimally project them into the different dimensions of the geometry space. This is what the function "next_above" does below. You give it a boundary of z-space defined by its maximum z-order value, as well as a direction and a reference value, and it will tell you the next z-order value at the edge of the z-space that is aligned with the reference value along the given direction.

This is a 2d z-order example, but the concept works in any dimension.
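
As a small worked example of the interleaving convention used below (x in the even bits, y in the odd bits): (x, y) = (3, 5) maps to z = 39 = binary 100111, since x = 3 contributes bits 0 and 2, and y = 5 contributes bits 1 and 5.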

We need some basic reference values:


log_bit_count = 5
bit_count = (1<<log_bit_count)
mask =  0xFFFFFFFFFFFFFFFF
mask0 = 0x5555555555555555  # the even (x) bits of the interleaving
mask1 = mask0>>1  # not really the odd-bit mask (the top bit is missing), but avoids python issues


This function finds the next constrained z-order value past a z-order region. Copyright 2020 James Litsios.

In [3]:

def next_above(x, m, z):
    a = (x&~m)|(z&m)  # x with the masked (reference) bits replaced by z's
    b = x^a           # different bits
    v = b
    for kp in range(0, log_bit_count+1):
        k = 1<<kp
        v |= (v>>k)   # smear the bits down
    w = ((v+1)>>1)&x  # keep top bit if it is in x
    if v==0:
        w = 1         # case where already on border
    nx = (((x|m)+w)&~m)&~v
    y = nx | (z&m)
    return y


From the web, some utility code to demonstrate things:

In [4]:

def to_morton2(x, y):
    # taken from https://stackoverflow.com/questions/30539347/2d-morton-code-encode-decode-64bits
    x = (x | (x << 16)) & 0x0000FFFF0000FFFF
    x = (x | (x << 8)) & 0x00FF00FF00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0F
    x = (x | (x << 2)) & 0x3333333333333333
    x = (x | (x << 1)) & 0x5555555555555555
    y = (y | (y << 16)) & 0x0000FFFF0000FFFF
    y = (y | (y << 8)) & 0x00FF00FF00FF00FF
    y = (y | (y << 4)) & 0x0F0F0F0F0F0F0F0F
    y = (y | (y << 2)) & 0x3333333333333333
    y = (y | (y << 1)) & 0x5555555555555555
    return x | (y << 1)


def from_morton1(x):
    x = x & 0x5555555555555555
    x = (x | (x >> 1)) & 0x3333333333333333
    x = (x | (x >> 2)) & 0x0F0F0F0F0F0F0F0F
    x = (x | (x >> 4)) & 0x00FF00FF00FF00FF
    x = (x | (x >> 8)) & 0x0000FFFF0000FFFF
    x = (x | (x >> 16)) & 0x00000000FFFFFFFF
    return x

def from_morton2(d):
    x = from_morton1(d)
    y = from_morton1(d >> 1)
    return x, y
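
As a quick sanity check of next_above (worked out by hand, not part of the original notebook): the boundary value 409 decodes to (21, 10), and asking for the next z-order value past it whose x coordinate is 10 gives:

z1 = 409                       # from_morton2(409) == (21, 10)
z = to_morton2(10, 0)          # 68, below the boundary
za = next_above(z1, mask0, z)  # 580
assert from_morton2(za) == (10, 16)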

In [5]:

import numpy as np


Choose a size for the example:

In [6]:

size = 64


Let's color a grid with two regions of lower ranges of z-ordered values:

In [7]:

z0, z1, z2 = (0, size*size//10, 2*size*size//3)
grid1 = np.ones((size+1,size+1,3))
for z in range(z0, z1):
    x, y = from_morton2(z)
    grid1[x,y,:] = (0,0.5,0)
for z in range(z1, z2):
    x, y = from_morton2(z)
    grid1[x,y,:] = (0.2,0.7,0.2)
x, y = from_morton2(z1)
grid1[x,y,:] = (1.0,0,0)
x, y = from_morton2(z2)
grid1[x,y,:] = (1.0,0,0)


What we have done is to mark in green the x,y values corresponding to all z-order values up to z2, which corresponds to the coordinate (0, 63). For clarity, there are two zones, and the last value (0, 63) is shown in red, as is the coordinate (21, 10), which corresponds to z1.

In [8]:

from_morton2(z1)

Out[8]:

(21, 10)

In [9]:

from_morton2(z2)

Out[9]:

(0, 63)

In [10]:

import matplotlib.pyplot as plt
plt.figure(figsize = (5,5))
plt.imshow(grid1)
plt.show()


(plot: the z-ordered grid, with the two green regions and the red boundary cells (21, 10) and (0, 63))


Now we choose twenty "lookup keys" to demonstrate what "next_above" does:

In [11]:

keys = np.random.randint(0,size,(20,))

In [12]:

keys

Out[12]:

array([34, 49, 10, 12, 45, 61, 10, 59, 16, 31, 37, 23, 12, 36, 27, 28, 26,
       ...])


First we find the next z-order value outside the two z-order regions that aligns "horizontally" with the given keys:

In [13]:

grid2 = grid1.copy()
for key in keys:
    grid2[key, :, :] *= 0.9
    grid2[key, 0, :] = (0.7,0.7,0.7)
    zabove1 = next_above(z1, mask0, to_morton2(key, 0))
    xa1, ya1 = from_morton2(zabove1)
    grid2[xa1,ya1,:] = (1,0,1)
    zabove2 = next_above(z2, mask0, to_morton2(key, 0))
    xa2, ya2 = from_morton2(zabove2)
    grid2[xa2,ya2,:] = (1,0,1)

In [14]:

plt.figure(figsize = (5,5))
plt.imshow(grid2)
plt.show()


(plot: for each horizontal key line, the next_above results marked in magenta)


Then, for demonstration purposes, we find the next z-order value outside the two z-order regions that aligns "vertically" with the given keys:

In [15]:

grid3 = grid1.copy()
for key in keys:
    grid3[:, key, :] *= 0.9
    grid3[0, key, :] = (0.7,0.7,0.7)
    zabove1 = next_above(z1, mask1, to_morton2(0, key))
    xa1, ya1 = from_morton2(zabove1)
    grid3[xa1,ya1,:] = (1,0,1)
    zabove2 = next_above(z2, mask1, to_morton2(0, key))
    xa2, ya2 = from_morton2(zabove2)
    grid3[xa2,ya2,:] = (1,0,1)

In [16]:

plt.figure(figsize = (5,5))
plt.imshow(grid3)
plt.show()



The original ~1995 Prolog version of the algorithm:

xor X Y = (X or Y) and not (X and Y);

shiftDown V = shiftDown2 1 V;
shiftDown2 32 V = V;
shiftDown2 K V = shiftDown2 (2*K) (V or (shr V K));

nextAbove X M Z = Y
where
    A = (X and not M) or (Z and M),
    B = X xor A,
    V = shiftDown B,
    Y = nextAbove2 X M Z V (ifelse (V=0) 1 ((shr (V+1) 1) and X));

nextAbove2 X M Z V W = Y
where
    NX = (((X or M) + W) and not M) and not V,
    Y = NX or (Z and M);

C++ version from ~2003:

int
NextAbove(
        int x,
        int m,
        int z){
        int a = (x&~m)|(z&m);
        int b = x^a; // different bits
        int v = b;
        for(int k=1; k<32; k<<=1){ // smeared bits down
                v |= (v>>k);
        }
        int w = ((v+1)>>1)&x; // keep top bit if it is in x
        if(v==0){
                w = 1; // case where already on "border"
        }
        int nx = (((x|m)+w)&~m)&~v;
        int y = nx | (z&m);
        return y;
}