-
-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use randomized sobol sampling #167
base: master
Are you sure you want to change the base?
Conversation
Co-authored-by: David Widmann <[email protected]>
Co-authored-by: David Widmann <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #167 +/- ##
==========================================
+ Coverage 92.78% 92.83% +0.04%
==========================================
Files 11 11
Lines 832 837 +5
==========================================
+ Hits 772 777 +5
Misses 60 60 ☔ View full report in Codecov by Sentry. |
e12a571
to
b49608a
Compare
b49608a
to
8fa44eb
Compare
Co-authored-by: David Widmann <[email protected]>
test/sobol_method.jl
Outdated
@@ -22,7 +22,7 @@ function linear(X) | |||
A * X[1] + B * X[2] | |||
end | |||
|
|||
n = 600000 | |||
n = 524288 | |||
lb = -ones(4) * π | |||
ub = ones(4) * π | |||
sampler = SobolSample() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be changed as well, shouldn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the comment from the PR that added RQMC that you had put in the issue. The OP mentions
To nuance my words, for the use case M=2 of sensitivity analysis this is probably not as bad as I show for small enough cases.
So it makes sense to enforce the randomization only for bootstrap cases, what do you think? Changing this is significantly affecting the results of these trivial tests and the indices values for this are pretty commonly accepted, though to be honest I guess the new values could be justified a bit better I am just worried about users from R trying to match the values and then having to defend this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of what this PR ends up doing: I think the tests, examples and recommendations should consistently do the same thing as the gsa
implementations with samples
kwarg (is this function tested at all if the changes do not affect the tests?).
My main concern is that generally doing non-standard things with low discrepancy sequences is a bad idea. E.g., even just omitting the initial point of zeros (as done in Sobol.jl) is a bad idea, as discussed in the issue in Sobol.jl and shown in https://arxiv.org/pdf/2008.08051. What's currently done in GlobalSensitivity - and what IMO rightfully shows a warning to users - is that the samples * num_mats
samples are generated by taking a single consecutive subset of samples * num_mats
and partitioning it: https://github.com/SciML/QuasiMonteCarlo.jl/blob/5c5483565d5b6a083256861bcf7e00cf7075c5f4/src/RandomizedQuasiMonteCarlo/iterators.jl#L268-L271 In general this won't preserve any of the distribution properties of the original sequence.
Switching to a proper RQMC approach would be much more sound. However, as far as I know, generally it shouldn't lead to (much) different/worse estimates of integrals, so something seems off or some other part of the algorithm seems to interact badly with the randomized sequences. A quick search reveals quite a few papers though regarding global sensitivity and RQMC, so I think it should work and be an improvement in general.
Regardless of the mathematical details, IMO just sticking with the current implementation is not a good idea for another reason: Users might get the same results as with other (R) packages - but every time they run gsa
without pre-computed design matrices they will see a warning.
So in case it's too unclear yet how to fix the RQMC issues, I suggest as a temporary workaround to just do a single sample
call (as in the implementation of generate_design_matrices
that's currently used) in tests, examples, and gsa
that at least makes it apparent how the samples are generated and fixes the warning (and should even lead to a more efficient gsa
implementation since some of the partitioning and hcat
ing can be avoided, only a single split is needed). Probably it would be good to still add a comment about this choice and links to RQMC in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was pretty helpful, I need to get to it @devmotion though sorry for the delay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something blocking or making your work harder?
52c3ddf
to
3b7cf99
Compare
Checklist
contributor guidelines, in particular the SciML Style Guide and
COLPRAC.
Additional context
Add any other context about the problem here.
Fixes #166