Software Versions
$ snakemake --version
snakemake 9.19.0
$ conda list | grep "snakemake-executor-plugin-slurm"
snakemake-executor-plugin-slurm 2.6.1 pyhdfd78af_0 bioconda
snakemake-executor-plugin-slurm-jobstep 0.4.0 pyhdfd78af_0 bioconda
$ sinfo --version
slurm 23.11.10-BullSequana.1.2.1
Describe the bug
snakemake --profile profiles/leonardo.yaml --dry-run crashes with:
KeyError: collect_loop_result
when the workflow contains a checkpoint-driven dynamic branch whose rules belong to a job group (the group settings are defined in profiles/leonardo.yaml).
The same workflow executes correctly in a real run. There, the dynamic loop branch runs inside a single grouped SLURM job. This is faster than the non-grouped version, presumably because the non-grouped version pays SLURM scheduling overhead at every loop iteration.
Logs
[agentil1@login01 snakemake-tutorial]$ make dry-run
make[1]: Entering directory '/leonardo_work/PHD_gentili/snakemake-tutorial'
snakemake --profile profiles/leonardo.yaml --dry-run
Using profile profiles/leonardo.yaml for setting default command line arguments.
host: login01.leonardo.local
Building DAG of jobs...
Job stats:
job count
------------------- -------
bwa_map 3
evaluate_node_n 1
collect_loop_result 1
samtools_sort 3
samtools_index 3
bcftools_call 1
plot_quals 1
all 1
total 14
...
... ### REST OF THE WORKFLOW ###
...
[Mon May 25 15:31:26 2026]
rule samtools_sort:
input: mapped_reads/A.bam
output: sorted_reads/A.bam
jobid: 3
reason: Missing output files: sorted_reads/A.bam; Input files updated by another job: mapped_reads/A.bam
wildcards: sample=A
threads: 3
resources: tmpdir=/scratch_local, disk_mb=<TBD>, disk=<TBD>, disk_mib=<TBD>, mem_mb=8000, mem=8 GB, mem_mib=7630, slurm_account=phd_gentili, slurm_partition=boost_usr_prod, runtime=24, ntasks_per_node=8, cpus_per_task=4
Traceback (most recent call last):
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/cli.py", line 2308, in args_to_api
dag_api.execute_workflow(
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/api.py", line 646, in execute_workflow
workflow.execute(
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/workflow.py", line 1461, in execute
raise e
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/workflow.py", line 1457, in execute
success = self.scheduler.schedule()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/scheduling/job_scheduler.py", line 410, in schedule
raise e
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/scheduling/job_scheduler.py", line 233, in schedule
self._finish_jobs()
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/scheduling/job_scheduler.py", line 495, in _finish_jobs
self.workflow.async_run(postprocess())
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/workflow.py", line 268, in async_run
return runner.run(coro)
^^^^^^^^^^^^^^^^
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/scheduling/job_scheduler.py", line 490, in postprocess
await self.workflow.dag.finish(
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/dag.py", line 2325, in finish
potential_new_ready_jobs = self.update_ready(depending)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/leonardo/home/userinternal/agentil1/miniconda3/envs/snakemake-tutorial/lib/python3.12/site-packages/snakemake/dag.py", line 1906, in update_ready
group = self._group[job]
~~~~~~~~~~~^^^^^
KeyError: collect_loop_result
make[1]: *** [makefile:26: dry-run] Error 1
make[1]: Leaving directory '/leonardo_work/PHD_gentili/snakemake-tutorial'
[agentil1@login01 snakemake-tutorial]$
Minimal example
Consider a Snakefile with this checkpoint-driven loop and SLURM grouping:
# --- CHECKPOINT LOOP LOGIC ---
# Snakemake's DAG evaluates backward. To create a forward "while" loop, we use
# a checkpoint to manually generate the next file, and an input function to recursively
# evaluate whether the loop is done or needs the next iteration.
checkpoint evaluate_node_n:
input:
"loop_files/{n}.txt"
output:
"loop_files_check/{n}_status.txt"
resources:
mem_mb=1000
group: "loop" # Separate group for the loop logic
script:
"scripts/increment_loop.py"
def evaluate_loop_dynamically(wildcards):
"""
A recursive Python function that acts as a forward-processing while-loop using Snakemake Checkpoints.
It triggers checkpoint n=0, waits for it, checks the output, and either returns the final file
or asks for n=1, and so on.
"""
def check_node(n):
        # 1. Request the checkpoint output for this specific 'n'.
        # If evaluate_node_n has not run yet for {n}, .get() raises an IncompleteCheckpointException;
        # Snakemake then schedules the checkpoint and re-evaluates evaluate_loop_dynamically after it finishes.
with checkpoints.evaluate_node_n.get(n=n).output[0].open() as f:
status = f.read().strip()
if status == "done":
# 2a. Loop condition is met. Internal checkpoint created nothing_special.txt!
return "loop_files/nothing_special.txt"
else:
# 2b. Checkpoint generated {n+1}.txt manually on disk.
# Recursively evaluate the NEXT node dynamically!
return check_node(n + 1)
# Start the forward-chain from node 0
return check_node(0)
# Final aggregating rule that triggers the dynamic input function
rule collect_loop_result:
input:
evaluate_loop_dynamically
output:
"loop_files/loop_finished.txt"
resources:
mem_mb=1000
group: "loop" # Separate group for the loop logic
shell:
"cp {input} {output}"
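The checkpoint calls scripts/increment_loop.py, which is not included in this report. For anyone trying to reproduce the crash, below is a minimal hypothetical stand-in based only on the behaviour described in the Snakefile comments: it writes "done" or "continue" to the status file and creates either loop_files/{n+1}.txt or loop_files/nothing_special.txt. The function name evaluate_node and the stopping rule MAX_ITERATIONS are assumptions, not the original script. It also assumes loop_files/0.txt already exists or is produced by a rule not shown here.

```python
import pathlib

# HYPOTHETICAL stopping rule: the real criterion used by increment_loop.py
# is not part of this report. Here the loop stops at node MAX_ITERATIONS - 1.
MAX_ITERATIONS = 3


def evaluate_node(input_path, status_path, n, max_iterations=MAX_ITERATIONS):
    """Hypothetical stand-in for scripts/increment_loop.py.

    Writes "done" or "continue" to status_path. On "continue" it creates
    loop_files/{n+1}.txt; on "done" it creates loop_files/nothing_special.txt.
    """
    loop_dir = pathlib.Path(input_path).parent
    status_file = pathlib.Path(status_path)
    status_file.parent.mkdir(parents=True, exist_ok=True)

    if n + 1 >= max_iterations:
        # Loop finished: create the file that collect_loop_result copies.
        (loop_dir / "nothing_special.txt").write_text(f"finished at node {n}\n")
        status = "done"
    else:
        # Loop continues: create the input of the next checkpoint instance.
        # As in the report, this file is not a declared output of the rule.
        (loop_dir / f"{n + 1}.txt").write_text(f"node {n + 1}\n")
        status = "continue"

    status_file.write_text(status + "\n")
    return status


# Under Snakemake's `script:` directive a global `snakemake` object exposes
# the rule's input, output and wildcards; the guard keeps the module importable.
if "snakemake" in globals():
    evaluate_node(
        snakemake.input[0],
        snakemake.output[0],
        int(snakemake.wildcards.n),
    )
```

With MAX_ITERATIONS = 3, nodes 0 and 1 report "continue" and node 2 reports "done", so the input function would resolve to loop_files/nothing_special.txt after three checkpoint instances.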
Additional context
The same configuration runs as expected when --dry-run is omitted:
[agentil1@login01 snakemake-tutorial]$ make run-login
make[1]: Entering directory '/leonardo_work/PHD_gentili/snakemake-tutorial'
snakemake --profile profiles/leonardo.yaml
Using profile profiles/leonardo.yaml for setting default command line arguments.
host: login01.leonardo.local
Building DAG of jobs...
You are running snakemake in a SLURM job context. This is not recommended, as it may lead to unexpected behavior. If possible, please run Snakemake directly on the login node.
SLURM run ID: workflow_node_2c8cc88e-46ed-47a9-b6ee-423dce74b686
MinJobAge 30s (>= 30s). 'squeue' should work reliably for status queries.
Using shell: /usr/bin/bash
Provided remote nodes: 3
Conda environments: ignored
Job stats:
job count
------------------- -------
bwa_map 3
evaluate_node_n 1
collect_loop_result 1
samtools_sort 3
samtools_index 3
bcftools_call 1
plot_quals 1
all 1
total 14
Select jobs to execute...
Execute 3 jobs...
Select jobs to execute...
Job 595ad3d4-a096-53e6-9bff-b8eeb6dd7ec7 has been submitted with SLURM jobid 42512254 (log: /leonardo_work/PHD_gentili/snakemake-tutorial/logs/slurm/group_pre_processing_bwa_map_samtools_index_samtools_sort/42512254.log).
Job 74b63bb2-ef49-51b8-9df2-ecd163c74f57 has been submitted with SLURM jobid 42512255 (log: /leonardo_work/PHD_gentili/snakemake-tutorial/logs/slurm/group_pre_processing_bwa_map_samtools_index_samtools_sort/42512255.log).
Job de2f1f4f-f96c-5d06-b5bc-8f130f0ca6e9 has been submitted with SLURM jobid 42512270 (log: /leonardo_work/PHD_gentili/snakemake-tutorial/logs/slurm/group_pre_processing_bwa_map_samtools_index_samtools_sort/42512270.log).
Write-protecting output file sorted_reads/A.bam.
[Mon May 25 15:35:44 2026]
Finished jobid: 4 (Rule: bwa_map)
[Mon May 25 15:35:44 2026]
Finished jobid: 3 (Rule: samtools_sort)
[Mon May 25 15:35:44 2026]
Finished jobid: 9 (Rule: samtools_index)
3 of 14 steps (21%) done
Write-protecting output file sorted_reads/B.bam.
[Mon May 25 15:35:44 2026]
Finished jobid: 6 (Rule: bwa_map)
[Mon May 25 15:35:44 2026]
Finished jobid: 5 (Rule: samtools_sort)
[Mon May 25 15:35:44 2026]
Finished jobid: 10 (Rule: samtools_index)
6 of 14 steps (43%) done
Write-protecting output file sorted_reads/C.bam.
[Mon May 25 15:35:45 2026]
Finished jobid: 8 (Rule: bwa_map)
[Mon May 25 15:35:45 2026]
Finished jobid: 7 (Rule: samtools_sort)
[Mon May 25 15:35:45 2026]
Finished jobid: 11 (Rule: samtools_index)
9 of 14 steps (64%) done
Execute 2 jobs...
Job c126a408-0728-5756-8954-806a37075583 has been submitted with SLURM jobid 42512285 (log: /leonardo_work/PHD_gentili/snakemake-tutorial/logs/slurm/group_loop_collect_loop_result_evaluate_node_n/42512285.log).
Job 162f9ed7-6b03-5146-8516-a06e591c52d5 has been submitted with SLURM jobid 42512286 (log: /leonardo_work/PHD_gentili/snakemake-tutorial/logs/slurm/group_core_analysis_bcftools_call/42512286.log).
[Mon May 25 15:36:17 2026]
Finished jobid: 13 (Rule: evaluate_node_n)
[Mon May 25 15:36:17 2026]
Finished jobid: 12 (Rule: collect_loop_result)
12 of 14 steps (86%) done
Updating checkpoint dependencies.
Removing temporary output mapped_reads/A.bam.
Removing temporary output mapped_reads/A.bam.
Removing temporary output mapped_reads/B.bam.
Removing temporary output mapped_reads/B.bam.
Removing temporary output mapped_reads/C.bam.
Removing temporary output mapped_reads/C.bam.
[Mon May 25 15:36:17 2026]
Finished jobid: 2 (Rule: bcftools_call)
13 of 14 steps (93%) done
Select jobs to execute...
Execute 1 jobs...
[Mon May 25 15:36:17 2026]
localrule plot_quals:
input: calls/all.vcf
output: plots/quals.svg
log: logs/plots.log
jobid: 1
reason: Missing output files: plots/quals.svg; Input files updated by another job: calls/all.vcf
resources: tmpdir=/scratch_local, disk_mb=1000, disk=1 GB, disk_mib=954, mem_mb=32000, mem=32 GB, mem_mib=30518, slurm_account=phd_gentili, slurm_partition=boost_usr_prod, runtime=24, ntasks_per_node=8, cpus_per_task=4
[Mon May 25 15:36:18 2026]
Finished jobid: 1 (Rule: plot_quals)
14 of 14 steps (100%) done
Select jobs to execute...
Execute 1 jobs...
[Mon May 25 15:36:18 2026]
localrule all:
input: plots/quals.svg, loop_files/loop_finished.txt
jobid: 0
reason: Input files updated by another job: loop_files/loop_finished.txt, plots/quals.svg
resources: tmpdir=/scratch_local, disk_mb=1000, disk=1 GB, disk_mib=954, mem_mb=32000, mem=32 GB, mem_mib=30518, slurm_account=phd_gentili, slurm_partition=boost_usr_prod, runtime=24, ntasks_per_node=8, cpus_per_task=4
[Mon May 25 15:36:18 2026]
Finished jobid: 0 (Rule: all)
15 of 14 steps (107%) done
Efficiency report for workflow workflow_node_2c8cc88e-46ed-47a9-b6ee-423dce74b686 saved to /leonardo_work/PHD_gentili/snakemake-tutorial/efficiency_report_workflow_node_2c8cc88e-46ed-47a9-b6ee-423dce74b686.csv.
Complete log(s): /leonardo_work/PHD_gentili/snakemake-tutorial/.snakemake/log/2026-05-25T153500.582461.snakemake.log
make[1]: Leaving directory '/leonardo_work/PHD_gentili/snakemake-tutorial'
[agentil1@login01 snakemake-tutorial]$