Using same random values on all MPI tasks in test_statistics.py
#2090
Conversation
Thank you for the PR!
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##            main   #2090   +/- ##
=======================================
  Coverage   91.68%  91.68%
=======================================
  Files          89      89
  Lines       13945   13945
=======================================
  Hits        12786   12786
  Misses       1159    1159

View full report in Codecov by Sentry.
ClaudiaComito left a comment:
Thanks for this, @brownbaerchen.
batchparallel was introduced last year (@mrfh92 🙏) because Threefry was/is quite slow and in most cases not necessary. But I agree that the documentation, and potentially the API, need to make the difference more obvious.
Regarding the tests, one also needs to be aware of line 22 in

The rationale behind "batchparallel" was: threefry is very cool, but it has two main limitations:

The batchparallel option is also closer to Heat's rationale of taking process-local operations from Torch whenever possible. The only disadvantage: the actual random numbers depend on the number of processes, not only on the seed.
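To make that disadvantage concrete, here is a minimal conceptual sketch (not Heat's actual implementation) of what batch-parallel seeding amounts to: each rank seeds a process-local Torch generator with seed + rank, so the local streams differ across ranks and the concatenated global sample changes when the number of participating processes changes.

```python
# Conceptual sketch of batch-parallel seeding; illustrative only,
# not Heat's actual code.
from mpi4py import MPI
import torch

comm = MPI.COMM_WORLD
seed = 12345  # hypothetical global seed

# Same global seed everywhere, offset by the rank: local streams differ.
gen = torch.Generator()
gen.manual_seed(seed + comm.rank)

# Each rank draws its local chunk of the global array. The concatenation
# of these chunks, i.e. the "global" random array, therefore depends on
# how many ranks participate, not only on `seed`.
local_chunk = torch.rand(4, generator=gen)
```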
Indeed, setting a seed behaves differently for the two options:

In the current implementation, full reproducibility should be given for threefry, and partial reproducibility (under the same seed and the same number of processes/larray shapes) for batchparallel. In my opinion, the larger problem with randomized tests is that sometimes we set seeds and sometimes we do not; additionally, due to the recent change mentioned above, the test class now sets a seed as well.
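One way to take that ordering dependence out of the tests is to re-seed unconditionally in the test fixture, so every test starts from the same state regardless of what ran before. A minimal sketch, using only ht.random.seed with an arbitrary fixed seed (the class and test names here are made up):

```python
import unittest
import heat as ht

class TestStatistics(unittest.TestCase):
    def setUp(self):
        # Re-seed before every test, so results do not depend on which
        # earlier tests (if any) consumed random numbers.
        ht.random.seed(42)  # arbitrary fixed seed

    def test_mean_of_uniform(self):
        x = ht.random.rand(1000)
        # With a fixed seed, the drawn sample is deterministic for a
        # given generator mode and process count.
        self.assertAlmostEqual(float(ht.mean(x)), 0.5, places=1)
```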
I am not disagreeing with having both Threefry and batchparallel. In fact, I don't know anything about generating random numbers or these methods in particular.
I think the team has settled on the formulation "for historical reasons" 😝 These must be some of the first tests we implemented, even before the ht.random module existed. A nice refactoring was long overdue. Thanks @brownbaerchen!
Using same random values on all MPI tasks in test_statistics.py (#2090)

* Using same random values on all MPI tasks in test_statistics.py
* Add more random seeds to statistics test
* Seeding the heat random generator properly in statistics tests

Co-authored-by: Claudia Comito <[email protected]>
(cherry picked from commit 9e1eaea)
Successfully created backport PR for |
Due Diligence
Description
Running the file test_statistics.py from any directory had two issues: the paths to the datasets were incorrect, and there was some funny business with the random seed, where, depending on how you run the tests, the random seeds are set up differently. I finally figured out why.
heat.random has a global variable __rng which determines the behavior of the random generator on different MPI tasks. Specifically, when it is set to Threefry, heat.random.seed gives the same random seed on all ranks, whereas with batchparallel you get the seed plus the rank. So if you want to be sure the code is doing what you want, irrespective of what has been run before, you need to set the generator state explicitly rather than just calling heat.random.seed.
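As a hedged sketch of what the explicit setup can look like: heat.random's get_state/set_state mirror NumPy's legacy RandomState interface, and assuming a state tuple of the form ("Threefry", seed, offset), the generator type and seed can be pinned in one call (the seed value below is a placeholder):

```python
import heat as ht

# Pin the generator explicitly (the state-tuple layout is an assumption):
# this fixes both the RNG mode and the seed, independent of whatever
# previously-run code selected.
ht.random.set_state(("Threefry", 12345, 0))

# ...rather than only calling seed, which keeps the current __rng mode:
ht.random.seed(12345)
```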
I have to say I am not super happy with this, because it can clearly lead to confusion. The names of the allowed states of the __rng variable are not helpful in understanding the parallel behavior. I understand that both modes are needed, but maybe the API should be overhauled. If whoever reviews this agrees, feel free to open an issue or ping me to do so.

Issue/s resolved: #2069
Changes proposed:
Type of change