Skip to content

Add Python support for seeded random sqsgenerator runs#93

Open
fkiwit wants to merge 1 commit intodgehringer:mainfrom
fkiwit:python-seed-support
Open

Add Python support for seeded random sqsgenerator runs#93
fkiwit wants to merge 1 commit intodgehringer:mainfrom
fkiwit:python-seed-support

Conversation

@fkiwit
Copy link
Copy Markdown

@fkiwit fkiwit commented Apr 7, 2026

Thanks for maintaining this amazing package, I have been using it recently and it is very helpful! 😊

For reproducibility during debugging I have forked the repo and exposed the random seed to the Python API. Before I used a workaround by hashing the generated structures afterward to check whether repeated runs were effectively deterministic. Exposing the seed directly through the Python binding is more elegant and I am wondering if this could be a useful feature for the project.

Here a summary of the changes:

Changes

  • add seed: Optional[uint64] to the core configuration
  • parse seed from dict/JSON config input
  • include seed in JSON serialization
  • expose seed in the Python bindings
  • pass the configured seed into the internal shuffler
  • document seed in the parameter docs
  • add a Python test for reproducibility

Notes

  • this affects iteration_mode="random"
  • reproducibility should hold for the same configuration, seed, and thread setup
  • the new test uses thread_config=1 to keep the reproducibility guarantee explicit

Verification

  • a config with and without seed parses successfully
  • repeated runs with the same seed produce the same objective value and the same species assignment

@dgehringer
Copy link
Copy Markdown
Owner

Hi @fkiwit!

Many thanks for your contribution. Furthermore, thanks a lot for the careful implementation of your "seed" parameter. I do agree that this might be helpful for debugging purposes.

However, there are two things which should be addressed before:

The central point is this line here

auto shuffler{this->transpose_setting([](auto&& c) { return c.shuffler; })};

  1. Usability: In the end there is one seed per sub lattice, as the "shuffler" object is copied for each sub lattice. Consequently, although it might not be your use-case other might be interested in fixing the seed for each sublattice. So the data type reflecting the whole complexity would be optional<vector<optional<uint64_t>> or list[int | None] | None. Because now you distribute the same seed on each sub lattice, I regard, this is unexpected behaviour to

  2. Implementation: You have correctly passed the configured seed into the internal shuffler. However, in practice, in the way how sqsgen in architecture this is sadly not enough to fulfil your expectations:

You said:

  • reproducibility should hold for the same configuration, seed, and thread setup
  • the new test uses thread_config=1 to keep the reproducibility guarantee explicit

Sadly this is not true. The shuffler implementation is (by intention) not thread-safe. So it will be in any case only deterministic in case you use one-thread.

The problem is that a "seed" is global state or at least global state per thread. Hence to make it deterministic you we would have to add a synchronisation (a lock) around the routine that computes a new permutation (not around the random number computation). That sacrifices performance, since you add a lock in a "hot-loop". Furthermore, the shuffler object would have to be global in the scope of an optimization (so no seed per sub lattice). Currently each object shuffler object is shared amongst all threads. So in the end you have to make sure that pull out random numbers out of your random generator independent of the number of threads. The advantage is you would have just a single seed for all sub lattice as you have already implemented it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants