Conversation

@davidnwobi commented Aug 29, 2024

Updated the models that require 1D integration to use the new integration system. The new models have "_high_res" appended to their names.

The updated models are:

  • Core-Shell Bicelle
  • Core-Shell Cylinder
  • Core-Shell Ellipsoid
  • Cylinder
  • Ellipsoid
  • Flexible Cylinder with Elliptical Cross-section
  • Hollow Cylinder

Details of the method can be found in this repo.

@pkienzle (Contributor) commented Sep 6, 2024

(1) If you use Δq = 2 π / d_max as the period spacing and select, say, k=5 sample points per period, then you get a ballpark estimate for the number of integration points by looking at the arc length at radius |q|. That is, Δq = |q| k Δθ, so the step size in theta would be Δθ = Δq/(k |q|), and the number of steps would be n = (π/2) / Δθ = k |q| d_max / 4.

How does this value compare to the n estimated by Gauss-Kronrod rule? If it is good enough, then we can apply it more easily across all the shape models without a bespoke spline fit for each model type.
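
For a concrete feel for this estimate, here is a minimal sketch (my own illustration, not part of the PR; the d_max and q values are arbitrary):

import numpy as np

# n = (pi/2)/dtheta with dtheta = dq/(k|q|) and dq = 2*pi/d_max,
# which reduces to n = k*|q|*d_max/4
def n_estimate(q, d_max, k=5):
    return np.ceil(k*np.abs(q)*d_max/4).astype(int)

q = np.logspace(-3, 0, 5)             # 1/Ang
print(n_estimate(q, d_max=2000))      # e.g. d_max ~ 2000 Ang for a long rod
# roughly 3 points at q=1e-3 rising to 2500 points at q=1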

(2) Rather than generic Gauss-Legendre, we could look at the structure of the function and select more points where it is larger (importance sampling). The problematic shapes have high eccentricity, which will show up near θ=0° or θ=90°.
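
One cheap way to concentrate points near the ends of the interval is a change of variables whose Jacobian vanishes at θ=0 and θ=π/2, so that ordinary Gauss-Legendre points cluster there. A minimal sketch of the idea (my own, not necessarily what the final implementation would use):

import numpy as np

def clustered_average(f, n):
    # integrate f(theta) over [0, pi/2] with theta(u) = (pi/4)*(1 - cos(pi*u));
    # the Jacobian (pi^2/4)*sin(pi*u) vanishes at both ends, clustering points there
    z, w = np.polynomial.legendre.leggauss(n)
    u = 0.5*(z + 1.0)                     # map [-1, 1] onto [0, 1]
    theta = (np.pi/4)*(1.0 - np.cos(np.pi*u))
    jac = (np.pi**2/4)*np.sin(np.pi*u)
    return 0.5*np.sum(w*f(theta)*jac)     # extra 0.5 from the [-1,1] -> [0,1] map

print(clustered_average(np.sin, 64))      # sanity check: integral of sin over [0, pi/2] is 1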

(3) There are analytic approximations for large disks and long rods (#109). Can these be used for large q? Can they be extended to other models?

(4) The current code cannot run in OpenCL since the arrays are too large to put in constant memory. Not sure how to allocate them in main memory and make them accessible to the kernels.

(5) We could unroll the loops, computing the different (q, θ) values in parallel, then summing afterward. This would speed up the kernels a lot on high end graphics cards.

(6) Ideally we would update all models to use dynamic integration rather than having normal and high-resolution versions. It would be nice to have a mechanism to set the precision.

(7) What is the target precision on your models? Do you really need 32000 angles for them? You may need to be more careful about summing the terms to avoid loss in precision when adding many small numbers to a large total.
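
On the summation point, compensated (Kahan) summation is the usual trick for adding many small terms to a large total; a minimal sketch of the idea, independent of the sasmodels kernels:

import math

def kahan_sum(values):
    # carry the rounding error of each addition forward into the next one
    total, c = 0.0, 0.0
    for v in values:
        y = v - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

values = [1e8] + [1e-8]*1_000_000     # a million tiny terms after one large one
print(sum(values) - 1e8)              # plain sum drifts (roughly 0.0149 instead of 0.01)
print(kahan_sum(values) - 1e8)        # compensated sum stays near 0.01
print(math.fsum(values) - 1e8)        # exactly rounded, for comparison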

@pkienzle (Contributor) commented Sep 6, 2024

Note that you can adjust the number of points in the loop for an existing model. You can see this with

python -m sasmodels.compare -ngauss=512 ellipsoid

This is roughly equivalent to the following [untested]:

from sasmodels.core import load_model_info
from sasmodels.generate import set_integration_size
model_info = load_model_info('ellipsoid')
set_integration_size(model_info, 4096)

SasView could do this now, giving the number of theta points as a control parameter and sending that to sasmodels before evaluating. Not as good as having the model set the number of theta points as done in this PR, but it immediately applies to all models.
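
In the same untested spirit, the resized model could then be built and evaluated through the usual sasmodels direct-model calls (the parameter values below are placeholders):

import numpy as np
from sasmodels.core import load_model_info, build_model
from sasmodels.generate import set_integration_size
from sasmodels.direct_model import call_kernel

model_info = load_model_info('ellipsoid')
set_integration_size(model_info, 4096)   # bump the number of theta points
model = build_model(model_info)          # recompile with the larger table
q = np.logspace(-3, 0, 200)
kernel = model.make_kernel([q])
Iq = call_kernel(kernel, dict(radius_polar=100, radius_equatorial=2000))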

@davidnwobi (Author) commented Sep 6, 2024

Thank you for your feedback; there are several aspects here that I should have considered earlier.

  1. Comparison of integration points: I'll take a closer look and compare your method to the Gauss-Kronrod rule. While I'm no longer working with STFC, I do have some free time, so I'll run the comparisons and see how they stack up.

  2. Importance sampling: I completely agree. For the cylinder model, peaks do occur near the ends of the interval, so targeting those areas would definitely improve efficiency. While it might involve more work, the payoff could be significant.

  3. Analytical approximations: They can be used for large values of q, but it also depends on the values of the other parameters. For example, for the cylinder model there are good approximations when either $q R$ or $\frac{q L}{2}$ is small. For other models, we might be able to find regions where certain parameters are small and see if the remaining part of the function can be integrated analytically.

  4. OpenCL compatibility: I've had similar issues with OpenCL. I haven't been able to get either the updated or default models running with OpenCL, whether in sasmodels or SasView. This might be a bug. Interestingly, CUDA works fine, and AMD's parallel processing works but runs out of memory for the updated models (integrated graphics, though). OpenCL definitely needs further investigation.

  5. Loop unrolling: I could check whether this is at least being done for theta at the moment.

  6. Having precision control would be ideal, especially if the tolerance can vary based on the model or region. This looks doable, but it would require some more work.

  7. What is the target precision on your models?

The target is a relative tolerance of $1 \times 10^{-3}$.
As for needing 32,000 angles: no, not for the entire parameter space. Gauss quadrature struggles with highly oscillatory functions; its convergence becomes very slow as the integrand becomes more oscillatory. A quick example for a decaying oscillatory function (a short sketch reproducing these numbers follows the error lists below):
$e^{- \sqrt{k} x} \sin(k x)$

With $k = 4000$:

  • $n = 512$; Rel error = $3.547640 \times 10^{-1}$
  • $n = 600$; Rel error = $4.000811 \times 10^{-2}$
  • $n = 700$; Rel error = $4.470038 \times 10^{-4}$
  • $n = 800$; Rel error = $1.042314 \times 10^{-4}$
  • $n = 900$; Rel error = $5.589838 \times 10^{-7}$
  • $n = 1024$; Rel error = $8.715261 \times 10^{-12}$

With $k = 400000$:

  • $n = 16384$; Rel error = $2.155721 \times 10^{0}$
  • $n = 20768$; Rel error = $1.492304 \times 10^{-1}$
  • $n = 22768$; Rel error = $5.929450 \times 10^{-2}$
  • $n = 24768$; Rel error = $7.977614 \times 10^{-3}$
  • $n = 26768$; Rel error = $2.309563 \times 10^{-4}$
  • $n = 28768$; Rel error = $4.160178 \times 10^{-4}$
  • $n = 30768$; Rel error = $3.771680 \times 10^{-5}$
  • $n = 32768$; Rel error = $5.919487 \times 10^{-6}$
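
For reference, here is a minimal sketch of this kind of convergence check (the integration interval $[0, 1]$ is my assumption, so the exact error values may differ from the lists above):

import numpy as np

def gauss_rel_error(k, n, upper=1.0):
    # relative error of n-point Gauss-Legendre for exp(-sqrt(k)*x)*sin(k*x) on [0, upper]
    a = np.sqrt(k)
    # analytic value from the antiderivative of exp(-a x) sin(k x)
    exact = (k - np.exp(-a*upper)*(a*np.sin(k*upper) + k*np.cos(k*upper)))/(a*a + k*k)
    z, w = np.polynomial.legendre.leggauss(n)
    x = 0.5*upper*(z + 1.0)              # map [-1, 1] onto [0, upper]
    approx = 0.5*upper*np.sum(w*np.exp(-a*x)*np.sin(k*x))
    return abs(approx - exact)/abs(exact)

for n in (512, 600, 700, 800, 900, 1024):
    print(n, gauss_rel_error(4000, n))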

Due to memory/space limitations, the number of points is selected from a discrete set (powers of 2, $2^1$ through $2^{15}$), so that only those point sets need to be stored and loaded into memory. As a result, if it selects 32,768 points, that only means 16,384 wouldn't have been enough; it doesn't check any of the possibilities in between. It could have gotten away with fewer points, but it can't figure that out. A bit crude, but it helps. Do you have any suggestions on a better way to handle this?

I'll look into some of these issues:

  • Comparing with the Gauss-Kronrod rule
  • A dynamic way of setting the tolerance

@wpotrzebowski commented

This PR is pointing to SasView:master. Shouldn't it be main?

@wpotrzebowski commented

This PR is pointing to SasView:master. Shouldn't it be main?

Sorry, my bad, it is sasmodels...

@pkienzle (Contributor) commented Feb 1, 2025

By hacking the cylinder high res model so that it returns $n = 2^w$, where $w$ is the value returned by the poly_eval function, I can display the number of quadrature points $n$ as a function of Q.

Here is a plot of $n$ for radius R=1000, length L=20. The polynomial uses QR and QL, so this is showing the behaviour for radius up to 1000000 when $Q<1$. It is not linear in Q as I was predicting.

[plot: number of quadrature points $n$ as a function of Q for R=1000, L=20]

Changing to R=1000, L=2000 so that the aspect ratio is larger changes the shape, but generally is within a factor of two:
[plot: $n$ as a function of Q for R=1000, L=2000]

Core shell bicelle $n$ scales similarly with the outer radius and cylinder length.

Hollow cylinder gives strange results, cutting off at $n=300$. This happens even for radius=0, which means it should have the same behaviour as cylinder. I didn't look into the polynomial generator to see if it has artificial limits applied. [Note: I used the Integral-Fitting-Tools repo for my tests.]

Flexible cylinder elliptical $n$ is driven primarily by radius. Curiously, within existing sasmodels the flexible cylinder elliptical with one segment (i.e., kuhn length = length) is more accurate than the cylinder model when given a very long cylinder length. Maybe something special in Sk_WR?

Before we consider merging, we should spot check some of these results using sympy and mpmath. This can be added to explore/symint.py. For now symint is only checking whether increasing $n$ changes the answer, not whether the answer it gives is correct.
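
For example, a spot check along these lines could go into explore/symint.py (a sketch only: the kernel below is the bare cylinder orientation average without volume or contrast factors, and the parameter values are arbitrary):

from mpmath import mp, besselj, sin, cos, pi, quad
import numpy as np

mp.dps = 30
q, R, L = 0.05, 20.0, 400.0            # arbitrary test point

def kernel(theta):
    # cylinder amplitude 2 J1(q R sin t)/(q R sin t) * sinc(q L cos t / 2), squared, times sin t
    a, b = q*R*sin(theta), q*L*cos(theta)/2
    j1c = 2*besselj(1, a)/a if a != 0 else mp.mpf(1)
    sinc = sin(b)/b if b != 0 else mp.mpf(1)
    return (j1c*sinc)**2*sin(theta)

exact = quad(kernel, [0, pi/2])        # adaptive, high-precision reference
z, w = np.polynomial.legendre.leggauss(256)
theta = (np.pi/4)*(z + 1.0)            # map [-1, 1] onto [0, pi/2]
approx = (np.pi/4)*sum(wi*float(kernel(ti)) for wi, ti in zip(w, theta))
print(float(exact), approx, abs(approx - float(exact))/abs(float(exact)))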

@pkienzle (Contributor) commented

I implemented a proof-of-concept adaptive integration for cylinder using a qr heuristic:

https://github.com/SasView/sasmodels/blob/c0eb28eebd9387a5182859e568019d65b5a81cbf/sasmodels/models/lib/adaptive.c#L11203C1-L11213C2

Here's the updated cylinderp model (p for precision):

double qr = q*(radius > length ? radius : length);
double *w, *z;
int n = gauss_weights(qr, &w, &z);
for (int i=0; i<n; i++) {
    const double theta = z[i]*zm + zb;
    double sin_theta, cos_theta; // slots to hold sincos function output
    // theta (theta,phi) is the projection of the cylinder on the detector plane
    SINCOS(theta, sin_theta, cos_theta);
    const double form = _fq(q*sin_theta, q*cos_theta, radius, length);
    total_F1 += w[i] * form * sin_theta;
    total_F2 += w[i] * form * form * sin_theta;
}

These are available on the ticket-535-adaptive-integration branch.

I called the adaptive model cylinderp and left cylinder in place so that it is easy to compare accuracy and performance.

It has similar run speed for well behaved models:

$ python -m sasmodels.compare -exq cylinder,cylinderp -ngauss=0,0 radius=10 length=20 -exq -nq=1000                                              
DLL[64] t=3.65 ms, intensity=2151.   <==== cost using 76 gauss points everywhere
DLL[64] t=4.40 ms, intensity=2151.   <==== cost for adaptive model
|DLL[64]-DLL[64]|            max:3.494e-11  median:0.000e+00  98%:1.552e-11  rms:3.981e-12  zero-offset:+8.329e-13
|(DLL[64]-DLL[64])/DLL[64]|  max:3.490e-08  median:0.000e+00  98%:1.551e-08  rms:3.979e-09  zero-offset:+8.324e-10

It is a lot slower for pathological models:

$ python -m sasmodels.compare -exq cylinder,cylinderp -ngauss=10000,0 radius=10 length=20000 -exq -nq=1000
DLL[64] t=227.83 ms, intensity=11740.    <==== cost using 10000 gauss points everywhere
DLL[64] t=97.11 ms, intensity=11740        <==== cost for adaptive model
|DLL[64]-DLL[64]|            max:3.803e-08  median:7.744e-10  98%:1.406e-08  rms:4.119e-09  zero-offset:+1.507e-09
|(DLL[64]-DLL[64])/DLL[64]|  max:1.018e-05  median:5.512e-10  98%:5.591e-06  rms:1.395e-06  zero-offset:+4.587e-07

but much more accurate:

$ python -m sasmodels.compare -exq cylinder,cylinder -ngauss=10000,0 radius=10 length=20000 -exq -nq=1000 
DLL[64] t=221.55 ms, intensity=11740   <==== cost using 10000 gauss points everywhere
DLL[64] t=4.54 ms, intensity=11739      <==== cost using 76 gauss points everywhere
|DLL[64]-DLL[64]|            max:7.673e-01  median:8.364e-03  98%:6.608e-01  rms:2.538e-01  zero-offset:+1.487e-01
|(DLL[64]-DLL[64])/DLL[64]|  *** max:3.545e+00 *** median:2.664e-02  98%:2.524e+00  rms:5.524e-01  zero-offset:+1.938e-01

It should be pretty easy to go through all our shapes and update them to adaptive.

For shapes where we are integrating over θ and φ we should consider using specialized spherical point schemes, such as Lebedev quadrature (Wikipedia).

@pkienzle (Contributor) commented Aug 1, 2025

With the repo at https://github.com/davidnwobi/Integral-Fitting-Tool and the sasmodels.compare tool we can compare model values and execution time.

Here is an example showing a 22% deviation between a 10000 point gaussian integration scheme and ellipsoid_high_res:

$ python -m sasmodels.compare -double! background=0 ellipsoid,ellipsoid_high_res.py -ngauss=10000,0 -random=656333 -pars -nq=1000
Randomize using -random=656333
==== ellipsoid =====
scale: 0.00173375
background: 0
sld: 11.8516
sld_solvent: 6.02864
radius_polar: 659.823
radius_equatorial: 33071.7
==== ellipsoid_high_res =====
scale: 0.00173375
background: 0
sld: 11.8516
sld_solvent: 6.02864
radius_polar: 659.823
radius_equatorial: 33071.7
DLL[64] t=84.29 ms, intensity=1160661912
DLL[64] t=4.17 ms, intensity=1160661682
|DLL[64]-DLL[64]|            max:1.220e+01  median:1.630e-06  98%:3.440e+00  rms:9.596e-01  zero-offset:+2.304e-01
|(DLL[64]-DLL[64])/DLL[64]|  max:2.223e-01  median:4.301e-10  98%:8.113e-03  rms:1.311e-02  zero-offset:+1.434e-03

Speed is generally comparable to the 76 point non-adaptive integration, being faster for small shapes but slower for large shapes. For example:

$ python -m sasmodels.compare -double! background=0 ellipsoid,ellipsoid_high_res.py -ngauss=76,0 -random=65012 -pars -neval=10
Randomize using -random=65012
==== ellipsoid =====
scale: 0.027392
background: 0
sld: 6.25461
sld_solvent: 5.21939
radius_polar: 96.5405
radius_equatorial: 83119.2
...
DLL[64] t=0.15 ms, intensity=10960877
DLL[64] t=1.16 ms, intensity=10968737
...

The simple adaptive integration branch, which uses qr as a heuristic to decide the number of integration points, is more accurate but generally slower than the high res branch. When running the simple adaptive branch on the GPU with polydispersity (Apple M2), the speed comparison is mixed: sometimes 2x faster and sometimes 2x slower than the high res branch on CPU.

Here's a shape with a large difference between the high res integration and the 10000 point integration (85x):

$ python -m sasmodels.compare background=0 core_shell_cylinder,core_shell_cylinder_high_res.py -double! -ngauss=10000,0 -random=83174 -nq=1000 -pars
Randomize using -random=83174
...
==== core_shell_cylinder_high_res =====
scale: 0.291548
background: 0
sld_core: 8.92736
sld_shell: 11.2834
sld_solvent: 10.7719
radius: 291.224
thickness: 33528.1
length: 1.53983
DLL[64] t=41.60 ms, intensity=36122229857
DLL[64] t=172.93 ms, intensity=28338819779
|DLL[64]-DLL[64]|            max:1.717e+08  median:7.214e+01  98%:1.290e+08  rms:2.878e+07  zero-offset:+7.783e+06
|(DLL[64]-DLL[64])/DLL[64]|  max:8.495e+01  median:3.732e-01  98%:2.028e+01  rms:6.405e+00  zero-offset:+2.910e+00

The simple adaptive scheme is also out of spec for this shape (3.5% difference).
