Open
Description
Describe the bug
When I initialize the Memoizer with a checkpoint_mode but no checkpoint_files, Parsl looks for checkpoint files in run_dir. By that point, however, Parsl has already appended the run-specific sub-folder (002, for example) to run_dir, and that sub-folder is incremented on every run. The search therefore happens inside the new, empty run folder and never finds any checkpoints from earlier runs.
Relevant lines:
parsl/parsl/dataflow/memoization.py, line 214 in 2571116:
    checkpoint_files = get_all_checkpoints(self.run_dir)
- A simple workaround is to find the checkpoints externally, by calling get_checkpoint_files on the parent run_dir, and to pass the result explicitly as checkpoint_files.
- (Note that I'm on benc-checkpoint-plugins, but the author @benclifford mentioned the behavior is copied from master, so I'm reporting here.)
- Side note: run_dir sometimes refers to the runinfo folder and sometimes to its sub-folders, e.g. runinfo/000. More consistent naming could help mitigate this issue.
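
The workaround above can be sketched with the standard library alone. This is a minimal sketch, not Parsl's implementation: `parent_of_run_dir` and `find_previous_checkpoints` are hypothetical helper names, and the layout `runinfo/<run id>/checkpoint` is an assumption based on the paths mentioned in this report.

```python
import os
import tempfile


def parent_of_run_dir(run_dir):
    """Map a run-specific folder such as runinfo/001 back to runinfo."""
    return os.path.dirname(os.path.normpath(run_dir))


def find_previous_checkpoints(runinfo_dir):
    """Return the checkpoint folders of all runs under runinfo_dir, oldest first.

    Assumes (hypothetically) that each run stores its checkpoints in
    <runinfo_dir>/<run id>/checkpoint.
    """
    if not os.path.isdir(runinfo_dir):
        return []
    found = []
    for run_id in sorted(os.listdir(runinfo_dir)):
        candidate = os.path.join(runinfo_dir, run_id, "checkpoint")
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as runinfo:
        # Simulate a finished run (000) and the freshly created current run (001).
        os.makedirs(os.path.join(runinfo, "000", "checkpoint"))
        os.makedirs(os.path.join(runinfo, "001"))
        current_run_dir = os.path.join(runinfo, "001")
        # Search the parent folder instead of the run-specific one.
        files = find_previous_checkpoints(parent_of_run_dir(current_run_dir))
        print(files)
```

The resulting list is what would be passed as checkpoint_files, so that the lookup no longer depends on which folder run_dir happens to point to.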
To Reproduce
Steps to reproduce the behavior:
- Set a checkpoint_mode but do not pass checkpoint_files to the Memoizer
- Run the same script twice
Expected behavior
I would expect the memoizer to load checkpoints from the previous run, i.e. runinfo/000/checkpoint. However, on the second run it only searches in runinfo/001.
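
The mismatch between expected and observed behavior can be illustrated without Parsl. The following is a rough simulation under stated assumptions: `next_run_dir`, `list_checkpoint_dirs`, and `simulate_two_runs` are hypothetical stand-ins written for this report, and the lookup is modeled as "scan one level below the given folder for a checkpoint sub-folder", which may differ from what Parsl actually does.

```python
import os
import tempfile


def next_run_dir(runinfo_dir):
    """Create and return the next numbered run folder (000, 001, ...)."""
    os.makedirs(runinfo_dir, exist_ok=True)
    existing = [d for d in os.listdir(runinfo_dir) if d.isdigit()]
    run_dir = os.path.join(runinfo_dir, f"{len(existing):03d}")
    os.makedirs(run_dir)
    return run_dir


def list_checkpoint_dirs(search_root):
    """Stand-in lookup: checkpoint folders exactly one level below search_root."""
    return sorted(
        os.path.join(search_root, d, "checkpoint")
        for d in os.listdir(search_root)
        if os.path.isdir(os.path.join(search_root, d, "checkpoint"))
    )


def simulate_two_runs(runinfo_dir):
    """Return (what the second run finds, what it was expected to find)."""
    first = next_run_dir(runinfo_dir)
    os.makedirs(os.path.join(first, "checkpoint"))  # first run writes a checkpoint
    second = next_run_dir(runinfo_dir)
    observed = list_checkpoint_dirs(second)       # search rooted at runinfo/001
    expected = list_checkpoint_dirs(runinfo_dir)  # search rooted at runinfo
    return observed, expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        observed, expected = simulate_two_runs(os.path.join(tmp, "runinfo"))
        print("observed:", observed)
        print("expected:", expected)
```

In this model the search rooted at the second run's folder comes back empty, while the same search rooted at runinfo returns the first run's checkpoint folder.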
Environment
- OS: Ubuntu
- Python version: 3.10
- Parsl version: master
Distributed Environment
- Where are you running the Parsl script from? Slurm cluster
- Where do you need the workers to run? Slurm cluster