Memoizer searches in run-specific sub-folder for checkpoints and ignores previous runs #4040

@PfeifferMicha

Describe the bug
When I initialize the Memoizer with a checkpoint_mode but no checkpoint_files, parsl looks for checkpoint files in the run_dir. Before doing so, however, parsl appends the run-specific sub-folder (002, for example), which is incremented on each run, so it never finds any checkpoints from earlier runs there.
Relevant lines:

checkpoint_files = get_all_checkpoints(self.run_dir)

  • A simple workaround is to locate the checkpoint files externally, using get_checkpoint_files on the parent run_dir, and pass them in as checkpoint_files.
  • (Note that I'm on benc-checkpoint-plugins, but the author @benclifford mentioned the behavior is copied from master, so I'm reporting it here.)
  • Side note: run_dir sometimes refers to the runinfo folder and sometimes to its sub-folders, e.g. runinfo/000. More consistent naming could help mitigate this issue.
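
The workaround in the first bullet could look roughly like the sketch below. It uses only the standard library; `find_previous_checkpoint_dirs` is a hypothetical helper (not a parsl function), and the sub-folder name `checkpoint` is an assumption about how each run stores its checkpoints.

```python
import os
from typing import List


def find_previous_checkpoint_dirs(runinfo_dir: str, subdir: str = "checkpoint") -> List[str]:
    """Hypothetical helper: collect <runinfo_dir>/<run>/<subdir> for every run.

    `subdir` is an assumed name for the per-run checkpoint folder.
    """
    if not os.path.isdir(runinfo_dir):
        return []
    found = []
    # Run folders are numbered (000, 001, ...), so sorted order is run order.
    for run_name in sorted(os.listdir(runinfo_dir)):
        candidate = os.path.join(runinfo_dir, run_name, subdir)
        if os.path.isdir(candidate):
            found.append(candidate)
    return found
```

The returned list would then be passed as checkpoint_files, so the lookup no longer depends on the run-specific sub-folder.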

To Reproduce
Steps to reproduce the behavior:

  1. Initialize the Memoizer with a checkpoint_mode but without checkpoint_files
  2. Run the script twice

Expected behavior
I would expect the Memoizer to load checkpoints from the previous run, i.e. runinfo/000/checkpoints. However, it only searches in runinfo/001.
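
To illustrate the mismatch without parsl, here is a small standard-library simulation. It is only a sketch of the behavior described in this report, not parsl's actual code: `make_next_run_dir` and `checkpoints_in` are hypothetical helpers, and the folder name `checkpoint` and file name `tasks.pkl` are assumptions.

```python
import os
import tempfile


def make_next_run_dir(runinfo: str) -> str:
    """Hypothetical: create the next numbered run folder (000, 001, ...)."""
    os.makedirs(runinfo, exist_ok=True)
    existing = [d for d in os.listdir(runinfo) if d.isdigit()]
    next_id = max((int(d) for d in existing), default=-1) + 1
    run_dir = os.path.join(runinfo, f"{next_id:03d}")
    os.makedirs(run_dir)
    return run_dir


def checkpoints_in(run_dir: str, subdir: str = "checkpoint") -> list:
    """Hypothetical: list checkpoint files inside one run folder only."""
    path = os.path.join(run_dir, subdir)
    return sorted(os.listdir(path)) if os.path.isdir(path) else []


def simulate_two_runs():
    with tempfile.TemporaryDirectory() as runinfo:
        # First run writes a checkpoint into its own sub-folder.
        first = make_next_run_dir(runinfo)
        os.makedirs(os.path.join(first, "checkpoint"))
        open(os.path.join(first, "checkpoint", "tasks.pkl"), "w").close()
        # Second run gets a fresh sub-folder and searches only there.
        second = make_next_run_dir(runinfo)
        return (os.path.basename(first), os.path.basename(second),
                checkpoints_in(first), checkpoints_in(second))
```

In this simulation the second run's folder is empty when it is searched, which mirrors why the checkpoint from the first run is never picked up.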

Environment

  • OS: [ubuntu]
  • Python version: 3.10
  • Parsl version: master

Distributed Environment

  • Where are you running the Parsl script from ? Slurm cluster
  • Where do you need the workers to run ? Slurm cluster
