
@bernstei
Collaborator

Fix an issue where configs on non-head MPI tasks had wrong (old) step sizes. The initial fix copies the current step size into the buffer used to write configs during a snapshot. The reason for the hang is subtle, since the code appears to depend only on values that are the same on all MPI tasks.
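A minimal, hypothetical sketch of the stale-value problem and the fix described above. It assumes each config is a dict with a `step_size` entry and that the snapshot writer works from a buffer of copies made earlier; the names `make_snapshot_buffer` and `refresh_step_sizes` are illustrative, not the project's code. There is no MPI here: the stale buffer simply stands in for a non-head task's out-of-date copy, and the hang itself is not reproduced.

```python
import copy

def make_snapshot_buffer(configs):
    """Hypothetical: copy configs into a buffer that is later used to write a snapshot."""
    return [copy.deepcopy(c) for c in configs]

def refresh_step_sizes(buffer, current_step_size):
    """The fix as described in the PR: copy the *current* step size into every
    buffered config just before the snapshot is written."""
    for c in buffer:
        c["step_size"] = current_step_size
    return buffer

configs = [{"id": i, "step_size": 0.1} for i in range(3)]
buffer = make_snapshot_buffer(configs)

# The step size is adapted later in the run; the buffered copies keep the old value.
current_step_size = 0.05
for c in configs:
    c["step_size"] = current_step_size

stale = [c["step_size"] for c in buffer]   # old values, as in the bug
refresh_step_sizes(buffer, current_step_size)
fixed = [c["step_size"] for c in buffer]   # all equal to the current step size
```

Refreshing at write time, rather than trying to keep every buffered copy in sync during the run, keeps the fix local to the snapshot code path.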

Additional steps that might be worth taking:

  • clone the step size from a single specific config when reading a snapshot
  • check that all step sizes are the same during writing
  • add a test for this failure
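The first two follow-up steps could look roughly like this. This is a sketch under assumptions: configs are dicts with a `step_size` entry, both helper names are hypothetical, and index 0 is an arbitrary choice of the "specific single config" to clone from.

```python
def check_consistent_step_sizes(configs):
    """Raise ValueError if the configs do not all share one step size.
    Hypothetical helper; exact comparison is used because the fix copies
    one identical value into every config."""
    step_sizes = {c["step_size"] for c in configs}
    if len(step_sizes) > 1:
        raise ValueError(f"inconsistent step sizes: {sorted(step_sizes)}")
    return step_sizes.pop() if step_sizes else None

def clone_step_size(configs, source_index=0):
    """When reading a snapshot, overwrite every config's step size with the
    value from one chosen config."""
    ref = configs[source_index]["step_size"]
    for c in configs:
        c["step_size"] = ref
    return configs

# Usage: a snapshot with mixed step sizes, as the bug would produce.
snapshot = [{"step_size": 0.05}, {"step_size": 0.1}]
clone_step_size(snapshot)
assert check_consistent_step_sizes(snapshot) == 0.05
```

Cloning on read repairs snapshots already written with the bug, while the check on write catches any regression early instead of letting MPI tasks silently diverge.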

closes #20

Make sure all configs written in a snapshot have the same, latest, step size.
Also add a check for consistent step sizes when distributing initial configurations.
Add a pytest that times out when the original bug is present.
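A test that "times out when the original bug is present" needs some way to turn a hang into a failure. The PR does not say how its test does this; one standard-library option, sketched here, is to run the workload in a child process with `subprocess.run(..., timeout=...)`, which kills the child and raises `subprocess.TimeoutExpired` if it does not finish in time (the pytest-timeout plugin's `@pytest.mark.timeout` is another option). The command below is a placeholder, and `finishes_within` is a hypothetical helper.

```python
import subprocess
import sys

def finishes_within(cmd, timeout_s):
    """Return True if cmd exits successfully within timeout_s seconds,
    False if it is still running (e.g. hung) when the timeout expires.
    A non-zero exit raises CalledProcessError because of check=True."""
    try:
        subprocess.run(cmd, timeout=timeout_s, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return False
    return True

def test_restart_from_snapshot_does_not_hang():
    # Placeholder command: a real test would launch the MPI run that restarts
    # from a snapshot (hypothetical here, not taken from the PR).
    cmd = [sys.executable, "-c", "print('restart ok')"]
    assert finishes_within(cmd, timeout_s=60)
```

Running the workload out of process matters for MPI hangs: a deadlocked collective usually cannot be interrupted from inside the same process, but the parent can still kill the child and report a failure.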
bernstei force-pushed the snapshot_wrong_step_sizes branch from c8e1ddb to da5bc91 on June 16, 2025 17:27
bernstei merged commit 951c3b0 into main on June 16, 2025
1 check passed


Linked issue: not all configs saved in snapshot have correct step size info
