Better integration #608
base: master
Conversation
A more accurate cylinder model has been added to the sasmodels library, along with library functions to support it. The new models have "_high_res" appended to their names to indicate that they use the new integration system.
(1) Using …, how does this value compare to …?

(2) Rather than generic Gauss-Legendre, we could look at the structure of the function and select more points where it is larger (importance sampling). The problematic shapes have high eccentricity, which will show up near θ=0° or θ=90°.

(3) There are analytic approximations for large disks and long rods (#109). Can these be used for large q? Can they be extended to other models?

(4) The current code cannot run in OpenCL since the arrays are too large to put in constant memory. I'm not sure how to allocate them in main memory and make them accessible to the kernels.

(5) We could unroll the loops, computing the different (q, θ) values in parallel, then summing afterward. This would speed up the kernels a lot on high-end graphics cards.

(6) Ideally we would update all models to use dynamic integration rather than having normal and high-resolution versions. It would be nice to have a mechanism to set the precision.

(7) What is the target precision on your models? Do you really need 32000 angles for them? You may need to be more careful about summing the terms to avoid loss of precision when adding many small numbers to a large total (see the sketch below).
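On the summation concern in (7), here is a generic sketch of compensated (Kahan) summation, which tracks the rounding error of each addition and feeds it back in. This is an illustration of the technique, not code from the PR:

```python
# Compensated (Kahan) summation: carry the rounding error of each addition
# forward so many small terms are not swamped by a large running total.
def kahan_sum(values):
    total = 0.0
    compensation = 0.0                  # error left over from prior additions
    for v in values:
        y = v - compensation            # re-inject the previously lost bits
        t = total + y                   # big + small: low-order bits of y drop
        compensation = (t - total) - y  # recover exactly what was dropped
        total = t
    return total
```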
Note that you can adjust the number of points in the loop for an existing model; you can see this with the -ngauss option to sasmodels.compare. SasView could do this now, giving the number of theta points as a control parameter and sending that to sasmodels before evaluating. Not as good as having the model set the number of theta points as done in this PR, but it immediately applies to all models. This is roughly equivalent to the following [untested]:
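A minimal sketch of what that [untested] snippet might look like, assuming the set_integration_size helper in sasmodels.compare (the routine behind the -ngauss flag) keeps its current behavior:

```python
# Sketch: rebuild an existing model with a custom number of Gauss points.
# Assumes sasmodels.compare.set_integration_size rewrites the model's
# quadrature constants, as it does for the -ngauss command-line option.
import numpy as np
from sasmodels.core import load_model_info, build_model
from sasmodels.compare import set_integration_size
from sasmodels.data import empty_data1D
from sasmodels.direct_model import DirectModel

model_info = load_model_info("cylinder")
set_integration_size(model_info, 1000)    # replace the default 76-point rule
model = build_model(model_info, dtype="double", platform="dll")

q = np.logspace(-3, 0, 200)
calculator = DirectModel(empty_data1D(q), model)
Iq = calculator(radius=10, length=20000)  # evaluate with the denser rule
```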
Thank you for your feedback; there are several aspects here that I should have considered earlier.

The target tolerance is a relative tolerance of 1…

Due to memory/space limitations, the number of points is selected from a discrete set (powers of 2, with exponents from 1 to 15), so that only those rules need to be stored and loaded into memory. As a result, if it selects 32,768 points, that only means 16,384 wouldn't have been enough; it doesn't check any of the possibilities in between. It could have gotten away with fewer points, but it can't figure that out. A bit crude, but it helps. Do you have any suggestions for a better way to handle this? I'll look into some of these issues.
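For illustration, a sketch of that selection scheme with hypothetical names (not the PR's code): compute Gauss-Legendre rules at the power-of-two sizes and keep doubling until two successive estimates agree to the target relative tolerance:

```python
import numpy as np
from functools import lru_cache

@lru_cache(maxsize=None)
def gauss_rule(k):
    """Nodes and weights for a 2**k point Gauss-Legendre rule.
    (The PR stores these tables up front; here they are built lazily.)"""
    return np.polynomial.legendre.leggauss(2**k)

def integrate_to_tolerance(f, a, b, rtol=1e-5, max_k=15):
    """Integrate vectorized f over [a, b], doubling the point count until
    two successive power-of-two rules agree to the relative tolerance."""
    previous = None
    for k in range(1, max_k + 1):
        x, w = gauss_rule(k)
        t = 0.5 * (b - a) * x + 0.5 * (a + b)   # map nodes to [a, b]
        estimate = 0.5 * (b - a) * np.dot(w, f(t))
        if previous is not None and abs(estimate - previous) <= rtol * abs(estimate):
            return estimate
        previous = estimate
    return estimate  # hit the 2**max_k cap without converging
```

Note this scheme never learns that an intermediate count (say 20,000) would have sufficed; it only ever compares adjacent powers of two.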
This PR is pointing to SasView:master. Shouldn't it be main?

Sorry, my bad, it is sasmodels...
I implemented a proof-of-concept adaptive integration for cylinder using a qr heuristic. Here's the updated cylinderp model (p for precision): sasmodels/sasmodels/models/cylinderp.c, lines 71 to 82 at c0eb28e.
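As a rough illustration of what a qr heuristic can look like (illustrative constants and names, not the code in cylinderp.c): scale the number of orientation points with q times the largest dimension, since that product controls how quickly the kernel oscillates in angle:

```python
# Illustrative qr heuristic: use more orientation points when q times the
# largest dimension is big. Constants are made up for this sketch; the real
# cylinderp.c kernel chooses its own thresholds.
def n_theta(q, radius, length, base=76, cap=2**15):
    qr = q * max(radius, 0.5 * length)  # largest relevant length scale
    n = max(base, int(4 * qr))          # roughly a few points per oscillation
    return min(n, cap)
```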
These are available on the ticket-535-adaptive-integration branch. I called the adaptive model cylinderp and left cylinder in place so that it is easy to compare accuracy and performance. It has similar run speed for well-behaved models:

```
$ python -m sasmodels.compare -exq cylinder,cylinderp -ngauss=0,0 radius=10 length=20 -exq -nq=1000
DLL[64] t=3.65 ms, intensity=2151. <==== cost using 76 gauss points everywhere
DLL[64] t=4.40 ms, intensity=2151. <==== cost for adaptive model
|DLL[64]-DLL[64]| max:3.494e-11 median:0.000e+00 98%:1.552e-11 rms:3.981e-12 zero-offset:+8.329e-13
|(DLL[64]-DLL[64])/DLL[64]| max:3.490e-08 median:0.000e+00 98%:1.551e-08 rms:3.979e-09 zero-offset:+8.324e-10
```

It is a lot slower for pathological models:

```
$ python -m sasmodels.compare -exq cylinder,cylinderp -ngauss=10000,0 radius=10 length=20000 -exq -nq=1000
DLL[64] t=227.83 ms, intensity=11740. <==== cost using 10000 gauss points everywhere
DLL[64] t=97.11 ms, intensity=11740 <==== cost for adaptive model
|DLL[64]-DLL[64]| max:3.803e-08 median:7.744e-10 98%:1.406e-08 rms:4.119e-09 zero-offset:+1.507e-09
|(DLL[64]-DLL[64])/DLL[64]| max:1.018e-05 median:5.512e-10 98%:5.591e-06 rms:1.395e-06 zero-offset:+4.587e-07
```

but much more accurate:

```
$ python -m sasmodels.compare -exq cylinder,cylinder -ngauss=10000,0 radius=10 length=20000 -exq -nq=1000
DLL[64] t=221.55 ms, intensity=11740 <==== cost using 10000 gauss points everywhere
DLL[64] t=4.54 ms, intensity=11739 <==== cost using 76 gauss points everywhere
|DLL[64]-DLL[64]| max:7.673e-01 median:8.364e-03 98%:6.608e-01 rms:2.538e-01 zero-offset:+1.487e-01
|(DLL[64]-DLL[64])/DLL[64]| *** max:3.545e+00 *** median:2.664e-02 98%:2.524e+00 rms:5.524e-01 zero-offset:+1.938e-01
```

It should be pretty easy to go through all our shapes and update them to adaptive. For shapes where we are integrating over θ and φ we should consider using specialized spherical point schemes, such as Lebedev quadrature (wikipedia); a sketch of the contrast follows below.
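To make that contrast concrete, here is a generic sketch of an orientation average done with a Gauss-Legendre product rule over (θ, φ); a Lebedev rule would replace the product grid with far fewer, spherically distributed nodes. Illustrative code, not from the branch:

```python
# Orientation average over the sphere with a product rule: Gauss-Legendre in
# cos(theta) times a uniform midpoint rule in phi. A Lebedev quadrature would
# achieve similar accuracy with many fewer nodes.
import numpy as np

def sphere_average_product_rule(f, n_theta=76, n_phi=76):
    """Average f(theta, phi) over the sphere; f must be vectorized in phi."""
    x, wx = np.polynomial.legendre.leggauss(n_theta)      # cos(theta) nodes
    theta = np.arccos(x)
    phi = 2 * np.pi * (np.arange(n_phi) + 0.5) / n_phi    # midpoints in phi
    total = 0.0
    for t, w in zip(theta, wx):
        total += w * np.mean(f(t, phi))
    return total / 2.0   # Gauss weights sum to 2; divide for the average
```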
With the repo at https://github.com/davidnwobi/Integral-Fitting-Tool and the sasmodels.compare tool we can compare model values and execution time. Here is an example showing a 22% deviation between a 10000-point Gaussian integration scheme and ellipsoid_high_res:

```
$ python -m sasmodels.compare -double! background=0 ellipsoid,ellipsoid_high_res.py -ngauss=10000,0 -random=656333 -pars -nq=1000
Randomize using -random=656333
==== ellipsoid =====
scale: 0.00173375
background: 0
sld: 11.8516
sld_solvent: 6.02864
radius_polar: 659.823
radius_equatorial: 33071.7
==== ellipsoid_high_res =====
scale: 0.00173375
background: 0
sld: 11.8516
sld_solvent: 6.02864
radius_polar: 659.823
radius_equatorial: 33071.7
DLL[64] t=84.29 ms, intensity=1160661912
DLL[64] t=4.17 ms, intensity=1160661682
|DLL[64]-DLL[64]| max:1.220e+01 median:1.630e-06 98%:3.440e+00 rms:9.596e-01 zero-offset:+2.304e-01
|(DLL[64]-DLL[64])/DLL[64]| max:2.223e-01 median:4.301e-10 98%:8.113e-03 rms:1.311e-02 zero-offset:+1.434e-03
```

Speed is generally comparable to the 76-point non-adaptive integration, being faster for small shapes but slower for large shapes. For example:

```
$ python -m sasmodels.compare -double! background=0 ellipsoid,ellipsoid_high_res.py -ngauss=76,0 -random=65012 -pars -neval=10
Randomize using -random=65012
==== ellipsoid =====
scale: 0.027392
background: 0
sld: 6.25461
sld_solvent: 5.21939
radius_polar: 96.5405
radius_equatorial: 83119.2
...
DLL[64] t=0.15 ms, intensity=10960877
DLL[64] t=1.16 ms, intensity=10968737
...
```

The simple adaptive integration branch, which uses qr as a heuristic to decide the number of integration points, is more accurate but generally slower than the high-res branch. When running the simple integration branch on the GPU with polydispersity (Apple M2), the speed comparison is mixed: sometimes 2x faster and sometimes 2x slower than the high-precision branch on CPU. Here's a shape with a large difference between the high-res integration and the 10000-point integration (85x):

```
$ python -m sasmodels.compare background=0 core_shell_cylinder,core_shell_cylinder_high_res.py -double! -ngauss=10000,0 -random=83174 -nq=1000 -pars
Randomize using -random=83174
...
==== core_shell_cylinder_high_res =====
scale: 0.291548
background: 0
sld_core: 8.92736
sld_shell: 11.2834
sld_solvent: 10.7719
radius: 291.224
thickness: 33528.1
length: 1.53983
DLL[64] t=41.60 ms, intensity=36122229857
DLL[64] t=172.93 ms, intensity=28338819779
|DLL[64]-DLL[64]| max:1.717e+08 median:7.214e+01 98%:1.290e+08 rms:2.878e+07 zero-offset:+7.783e+06
|(DLL[64]-DLL[64])/DLL[64]| max:8.495e+01 median:3.732e-01 98%:2.028e+01 rms:6.405e+00 zero-offset:+2.910e+00
```

The simple adaptive scheme is also out of spec for this shape (3.5% difference).


Updated the models that require 1D integration to use the new integration system. The new models have "_high_res" appended to their names.

The updated models are:

Details of the method can be found at this repo.