Bot is not compatible with newer Slurm versions #368

@bedroge

On the HCA RISC-V cluster, Slurm was upgraded to 25.11.3, and the job manager now fails with errors like:

  File "/home/eessibot/eessi-bot-software-layer/eessi_bot_job_manager.py", line 287, in parse_scontrol_show_job_output
    key, value = pair.split('=', 1)

The problem is that the scontrol output now also includes a SubmitLine field, whose value contains spaces.

$ scontrol --oneliner show jobid 100013
JobId=100013 JobName=prod UserId=eessibot(11065) GroupId=users(100) MCS_label=N/A Priority=0 Nice=0 Account=(null) QOS=(null) JobState=PENDING Reason=JobHeldUser Dependency=(null) Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0 RunTime=00:00:00 TimeLimit=10-00:00:00 TimeMin=N/A SubmitTime=2026-02-24T11:53:08 EligibleTime=Unknown AccrueTime=Unknown StartTime=Unknown EndTime=Unknown Deadline=N/A SuspendTime=None SecsPreSuspend=0 LastSchedEval=2026-02-24T11:53:08 Scheduler=Main Partition=premier AllocNode:Sid=ns.mont.blanc.168.192.in-addr.arpa:2380704 ReqNodeList=(null) ExcNodeList=(null) NodeList= NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:* ReqTRES=cpu=1,mem=32062M,node=1,billing=1 AllocTRES=(null) Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=* MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0 Features=(null) DelayBoot=00:00:00 OverSubscribe=NO Contiguous=0 Licenses=(null) LicensesAlloc=(null) Network=(null) Command=/home/eessibot/bot-build-dev.eessi.io.slurm SubmitLine=/opt/slurm/25.11.3/bin/sbatch --hold --time=10-0:0:0 --nodes=1 --exclusive --cpus-per-task=1 --hold --partition=premier --job-name=prod /home/eessibot/bot-build-dev.eessi.io.slurm WorkDir=/home/eessibot/shared/jobs/2026.02/pr_55/event_fe75be00-116e-11f1-9ae8-8f834c5757f8/run_000/riscv64/generic/dev.eessi.io-riscv-2025.06-001 StdErr= StdIn=/dev/null StdOut=/home/eessibot/shared/jobs/2026.02/pr_55/event_fe75be00-116e-11f1-9ae8-8f834c5757f8/run_000/riscv64/generic/dev.eessi.io-riscv-2025.06-001/slurm-100013.out TresPerTask=cpu=1

The job manager splits this whole line on spaces first, so the value SubmitLine=/opt/slurm/25.11.3/bin/sbatch --hold --time=10-0:0:0 --nodes=1 --exclusive --cpus-per-task=1 --hold gets broken up into separate fields. A token such as --hold contains no '=', so it can no longer be split into a key-value pair here: https://github.com/EESSI/eessi-bot-software-layer/blob/develop/eessi_bot_job_manager.py#L285.
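One possible way to make the parsing tolerate values with spaces is sketched below. This is not the bot's actual code or an agreed fix: the function name `parse_scontrol_oneliner` and the regular expression are assumptions made for illustration. The heuristic is that a token only starts a new field if it looks like `Key=...` with the key starting with a letter; any other token (for example `--hold`, `--time=10-0:0:0`, or a bare path) is appended to the value of the previous field.

```python
import re

# Assumption: scontrol field names start with a letter (JobId, CPUs/Task,
# AllocNode:Sid, ...), whereas sbatch options in SubmitLine start with '-'.
_FIELD_START = re.compile(r'^[A-Za-z][^\s=]*=')


def parse_scontrol_oneliner(line):
    """Parse 'scontrol --oneliner show job' output into a dict.

    Hypothetical helper, not part of eessi_bot_job_manager.py.
    """
    fields = {}
    current_key = None
    for token in line.split():
        if _FIELD_START.match(token):
            # Token opens a new field; split only on the first '=' so that
            # values such as 'cpu=1,mem=32062M' stay intact.
            current_key, value = token.split('=', 1)
            fields[current_key] = value
        elif current_key is not None:
            # Token has no field-like prefix: treat it as a continuation
            # of the previous value (e.g. the rest of SubmitLine).
            fields[current_key] += ' ' + token
        # Tokens before the first field are ignored.
    return fields
```

With this approach, `SubmitLine` keeps its full value up to the next field (`WorkDir=` in the output above). The heuristic is not watertight: a positional script argument of the form `FOO=bar` in the submit line would still be read as a new field, so a stricter variant could match against a list of known scontrol field names instead.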
