Load random states from checkpoint #238

Open
gritukan wants to merge 1 commit into huggingface:main from thenno:fix_random_state_load
Conversation

@gritukan gritukan commented Nov 2, 2024

The model's random states are saved into the checkpoint but are not loaded during recovery, which causes non-deterministic behavior after a resume.
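To illustrate the problem described above, here is a minimal sketch of why restoring the generator state on resume matters. It uses Python's standard-library `random` and `pickle` as stand-ins and is not this repository's checkpoint code: `save_checkpoint`, `load_checkpoint`, `draw`, and the `restore_rng` flag are hypothetical names made up for the example. In PyTorch the analogous calls would be `torch.get_rng_state()` and `torch.set_rng_state()`.

```python
import pickle
import random


def save_checkpoint(step, rng):
    """Serialize the training step together with the generator's state."""
    return pickle.dumps({"step": step, "rng_state": rng.getstate()})


def load_checkpoint(blob, rng, restore_rng=True):
    """Restore the step and, optionally, the generator's state (hypothetical flag)."""
    ckpt = pickle.loads(blob)
    if restore_rng:
        rng.setstate(ckpt["rng_state"])
    return ckpt["step"]


def draw(rng, n=3):
    """Stand-in for random operations such as sampling dropout masks."""
    return [rng.random() for _ in range(n)]


# Uninterrupted run: do some work, checkpoint, then keep going.
rng = random.Random(1234)
draw(rng)
blob = save_checkpoint(step=1, rng=rng)
expected = draw(rng)

# Resume in a fresh process and restore the saved state.
resumed = random.Random(0)
load_checkpoint(blob, resumed)
with_restore = draw(resumed)

# Resume but skip restoring, which is the behavior this PR reports.
skipped = random.Random(0)
load_checkpoint(blob, skipped, restore_rng=False)
without_restore = draw(skipped)

print(with_restore == expected)     # restored run continues the same stream
print(without_restore == expected)  # unrestored run diverges
```

Only the run that restores the saved state continues the original random stream. The run that skips the restore draws different values, so anything that depends on them no longer matches the uninterrupted run.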

@NouamaneTazi (Member) commented

We're actually thinking about removing random states, as they're not very PyTorch-native.
Instead, we'd recommend using REDUCE_SCATTER (which is sequence parallelism), which ensures determinism.
