Conversation

@brownbaerchen
Collaborator

@brownbaerchen brownbaerchen commented Jan 7, 2026

Due Diligence

  • General:
  • Implementation:
    • benchmarks: performance improved or maintained
    • documentation updated where needed

Description

Running the file test_statistics.py from an arbitrary directory exposed two issues: the paths to the datasets were incorrect, and there was some funny business with the random seed, where the seeds were set up differently depending on how you ran the tests.

I finally figured out why. heat.random has a global variable __rng that determines the behavior of the random generator across MPI tasks. Specifically, when it is set to Threefry, heat.random.seed gives the same random seed on all ranks, whereas with batchparallel you get the seed plus the rank.
So if you want to be sure the code does what you want, irrespective of what has been run before, you need to use

ht.random.set_state((<'Threefry' or 'batchparallel'>, seed, local_seed)) 

rather than just

ht.random.seed(seed)

I have to say I am not super happy with this, because it can clearly lead to confusion. The names of the allowed states of the __rng variable are not helpful in understanding the parallel behavior. I understand that both modes are needed, but maybe the API should be overhauled. If whoever reviews this agrees, feel free to open an issue or ping me to do so.
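The two behaviors can be sketched in plain Python (illustrative only; this mimics the seed-plus-rank rule described above, not heat's actual implementation):

```python
import random

def seed_for_rank(mode, seed, rank):
    """Effective per-rank seed under each mode, as described above:
    'Threefry' uses the same seed on every MPI rank, while
    'batchparallel' offsets the seed by the rank."""
    if mode == "Threefry":
        return seed
    elif mode == "batchparallel":
        return seed + rank
    raise ValueError(f"unknown mode: {mode}")

# With Threefry-style seeding every rank draws the same stream ...
streams = [random.Random(seed_for_rank("Threefry", 42, r)).random() for r in range(4)]
assert len(set(streams)) == 1

# ... with batchparallel-style seeding each rank gets its own stream.
streams = [random.Random(seed_for_rank("batchparallel", 42, r)).random() for r in range(4)]
assert len(set(streams)) == 4
```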

Issue/s resolved: #2069

Changes proposed:

  • Set random seed at the beginning of each test
  • Made paths to datasets relative to heat installation rather than current working directory
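The path fix follows a common pattern: resolve data files relative to the installed package rather than to os.getcwd(). A minimal sketch (using the stdlib json package as a stand-in for heat; the "datasets" directory and file name here are illustrative, not the actual ones in the PR):

```python
import os
import json  # stand-in for the installed package; the real tests would use heat

# Anchor the data directory at the package's installation location,
# not at the current working directory, so tests pass from any directory.
package_root = os.path.dirname(os.path.abspath(json.__file__))
dataset_path = os.path.join(package_root, "datasets", "iris.csv")  # illustrative name

# The anchor exists no matter where the test process was started from.
assert os.path.isdir(package_root)
assert dataset_path.startswith(package_root)
```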

Type of change

  • Bug fix

@github-actions
Contributor

github-actions bot commented Jan 7, 2026

Thank you for the PR!

@codecov

codecov bot commented Jan 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 91.68%. Comparing base (88adfc3) to head (b3c3825).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2090   +/-   ##
=======================================
  Coverage   91.68%   91.68%           
=======================================
  Files          89       89           
  Lines       13945    13945           
=======================================
  Hits        12786    12786           
  Misses       1159     1159           
Flag Coverage Δ
unit 91.68% <ø> (ø)




Contributor

@ClaudiaComito ClaudiaComito left a comment


Thanks for this @brownbaerchen .

batchparallel was introduced last year (@mrfh92 🙏) because Threefry was/is quite slow and in most cases not necessary. But I agree that the documentation, and potentially the API, need to make the difference more obvious.

@github-project-automation github-project-automation bot moved this from Todo to Merge queue in Roadmap Jan 14, 2026
@mrfh92
Collaborator

mrfh92 commented Jan 14, 2026

Regarding the tests, one also needs to be aware of line 22 in heat/core/tests/test_suites/basic_test.py, which sets a seed whenever a new test starts.

The rationale behind "batchparallel" was: Threefry is very cool but has two main limitations:

  • the limit of possible generations is reached fast
  • it is quite slow

The batchparallel option is also closer to Heat's rationale of taking process-local operations from Torch whenever possible. The only disadvantage: the actual random numbers depend on the number of processes, not only on the seed.

@mrfh92
Collaborator

mrfh92 commented Jan 14, 2026

Indeed, setting a seed works differently for the two options:

  • one (global) seed for Threefry
  • one (global) seed that produces local seeds for Torch in the batchparallel option

In the current implementation, full reproducibility should be given for Threefry, and partial reproducibility (under the same seed and the same number of processes/larray shapes) should be given for batchparallel.
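The partial-reproducibility point can be illustrated with a small sketch (plain Python; the `global_seed + rank` derivation is hypothetical, heat may derive local seeds differently):

```python
import random

def batchparallel_draw(global_seed, n_procs, n_per_proc):
    """Sketch of batchparallel-style generation: each process draws from
    its own generator, seeded here by a hypothetical global_seed + rank rule,
    and the local results are concatenated into the global sequence."""
    out = []
    for rank in range(n_procs):
        rng = random.Random(global_seed + rank)
        out.extend(rng.random() for _ in range(n_per_proc))
    return out

# Full reproducibility under the same seed AND the same process count ...
assert batchparallel_draw(0, 4, 2) == batchparallel_draw(0, 4, 2)

# ... but the global sequence changes with the number of processes,
# even though the total number of samples (8) is the same.
assert batchparallel_draw(0, 4, 2) != batchparallel_draw(0, 2, 4)
```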

In my opinion, the larger problem with randomized tests is that we sometimes set seeds and sometimes do not, and additionally, due to the recent change mentioned above, the test class sets a seed as well.

@brownbaerchen
Collaborator Author

I am not disagreeing with having both Threefry and batchparallel. In fact, I don't know anything about generating random numbers or about these methods in particular.
Actually, after giving it some thought, I think the current API seems fine and it was just used a bit sillily in the test. The setUp method @mrfh92 mentions sets a seed for heat, and then the tests generate some random numbers with torch. heat.random.seed sets a seed for torch, but depending on the __rng global variable, that seed is either the same on all ranks or not.
So the "correct" way of seeding is: seed torch if you want to use torch directly, or seed heat if you want to use heat.
Having different torch seeds on different ranks is perfectly reasonable. I remember running a simulation with random perturbations and thinking I had a bug because the solution repeated in space. But it was just the same seed on all tasks...
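That anecdote can be reproduced in miniature (plain Python, illustrative only; the seeds and sizes are made up):

```python
import random

def perturbation(seed_for_rank, n_ranks, n_local):
    """Concatenate each rank's local random perturbation into a 'global' field."""
    return [x for rank in range(n_ranks)
              for x in (random.Random(seed_for_rank(rank)).random()
                        for _ in range(n_local))]

# Same seed on every rank: the global field repeats in space ...
field = perturbation(lambda rank: 1234, 4, 3)
assert field[0:3] == field[3:6] == field[6:9]

# ... rank-dependent seeds avoid the repetition.
field = perturbation(lambda rank: 1234 + rank, 4, 3)
assert field[0:3] != field[3:6]
```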


@ClaudiaComito
Contributor

I think the current API seems fine and it was just used a bit sillily in the test.

I think the team has settled on the formulation "for historical reasons" 😝

These must be some of the first tests we implemented, even before the ht.random module existed. A nice refactoring was long overdue. Thanks @brownbaerchen !

@ClaudiaComito ClaudiaComito added this to the 1.7.1 milestone Jan 14, 2026

@ClaudiaComito ClaudiaComito merged commit 9e1eaea into main Jan 15, 2026
10 checks passed
@github-project-automation github-project-automation bot moved this from Merge queue to Done in Roadmap Jan 15, 2026
@ClaudiaComito ClaudiaComito deleted the bugs/2069-_Bug_errors_in_test_statistics_py branch January 15, 2026 05:05
github-actions bot pushed a commit that referenced this pull request Jan 15, 2026

* Using same random values on all MPI tasks in test_statistics.py

* Add more random seeds to statistics test

* Seeding the heat random generator properly in statistics tests

---------

Co-authored-by: Claudia Comito <[email protected]>
(cherry picked from commit 9e1eaea)
@github-actions
Contributor

Successfully created backport PR for stable:


Labels

backport stable · bug (Something isn't working) · core

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Bug]: errors in test_statistics.py

4 participants