Skip to content

Redistribute intervals fix #855

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 20, 2025
Merged

Redistribute intervals fix #855

merged 5 commits into from
Aug 20, 2025

Conversation

keskitalo
Copy link
Member

Currently, integration tests run on just two processes. When they are launched with four processes, the problem size and data distribution changes and new unit test failures come to light.

This PR fixes an error in the interval redistribution, making necessary adjustments to the spt3g import/export facilities.

It also fixes a corner case issue in automatic WCS bounds finding.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@keskitalo keskitalo changed the base branch from master to toast3 August 19, 2025 06:06
@keskitalo
Copy link
Member Author

@tskisner I cannot complete the unit tests on my system because op_sim_tod_atm fails with Process 1: MPIShared_5029865228910440920 failed to attach buffer of 288 bytes (36 elements of 8 bytes each): [Errno 24] Too many open files: '/MPIShared_5029865228910440920' Any ideas?

@tskisner
Copy link
Member

@keskitalo Some systems have a small number of allowed open file handles, and shared memory segments look like a "file" to the operating system. In the unit tests, we are repeatedly creating and destroying these. It may be that there is a leak somewhere that does not cleanly shut those down. My hunch is that if you look at the test which hit that error and try to run it separately (with toast.tests.run("name_of_test_file")), then it would complete.

@tskisner
Copy link
Member

I just pushed another change based on our side-channel communication. With this final fix all tests run locally for me on 4 processes. I'll look into implement MPI oversubscription on github CI so that we can improve our test coverage.

@keskitalo keskitalo requested a review from tskisner August 20, 2025 04:37
Copy link
Member

@tskisner tskisner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad we got these fixes in place!

@keskitalo keskitalo merged commit 342d528 into toast3 Aug 20, 2025
7 checks passed
@keskitalo keskitalo deleted the redistribute_intervals_fix branch August 20, 2025 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants