
Conversation

@andrewssobral

Add NVIDIA GPU Support to Docker + Slurm Cluster

✅ Verified on Docker Engine 28.5.1 + Slurm 25.05 with NVIDIA RTX 2070 GPUs

Summary

This PR adds native GPU support to slurm-docker-cluster, enabling CUDA workloads within Slurm jobs via NVIDIA Container Toolkit integration.

It introduces a GPU-enabled worker (g1), GRES GPU configuration, and an example job to verify isolation and visibility of GPU resources through Slurm.

Motivation

Until now, slurm-docker-cluster supported only CPU-based nodes.
This update introduces GPU scheduling capabilities to enable testing and development of CUDA workloads within a fully containerized Slurm environment — useful for AI/ML pipelines, HPC prototyping, and CI validation of GPU-enabled jobs.


Key Features

🧩 Docker / Image Updates

  • Added Dockerfile.gpu, a CUDA 12.6-based image built with:
    • NVIDIA Container Toolkit
    • Slurm configuration copied dynamically based on the Slurm version (e.g. 25.05)
    • Optional installation of cgroup.conf and gres.conf
  • Sets environment variables to restrict and identify the visible GPUs (see the sketch after this list):
    • NVIDIA_VISIBLE_DEVICES=0
    • CUDA_HOME=/usr/local/cuda
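
A minimal sketch of the GPU-related pieces of Dockerfile.gpu (the config path and default version below are illustrative assumptions, not the PR's exact layout):

ARG SLURM_VERSION=25.05.3
# Hypothetical: config directory keyed by Slurm release
ARG SLURM_CONFIG_DIR=25.05

# Restrict and identify the GPUs visible inside the container
ENV NVIDIA_VISIBLE_DEVICES=0 \
    CUDA_HOME=/usr/local/cuda

# Copy the Slurm configuration matching the requested version
COPY config/${SLURM_CONFIG_DIR}/slurm.conf /etc/slurm/slurm.conf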

🐳 Docker Compose

  • Added a new GPU node service, g1 (sketched below), which:
    • Uses the CUDA image (slurm-docker-cluster-gpu)
    • Declares NVIDIA_VISIBLE_DEVICES: 0 and a device reservation for GPU 0
    • Mounts /sys/fs/cgroup for Slurm’s cgroup access
    • Joins the same slurm-network for full cluster interoperability
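
A sketch of what the g1 service could look like in docker-compose.yml (values mirror the bullets above; the exact fields in the PR may differ):

  g1:
    image: slurm-docker-cluster-gpu:${SLURM_VERSION:-25.05.3}
    hostname: g1
    environment:
      NVIDIA_VISIBLE_DEVICES: "0"
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:rw
    networks:
      - slurm-network
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0"]
              capabilities: [gpu]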

⚙️ Slurm Configuration

  • Updated slurm.conf:

    GresTypes=gpu
    NodeName=g1 CPUs=2 RealMemory=2000 Gres=gpu:1
    PartitionName=cpu Nodes=c[1-2] Default=YES MaxTime=INFINITE State=UP
    PartitionName=gpu Nodes=g1 Default=NO MaxTime=INFINITE State=UP

  • Added a minimal config/common/gres.conf:

    NodeName=g1 Name=gpu File=/dev/nvidia0
    # To enable two (or more) GPUs:
    # NodeName=g1 Name=gpu File=/dev/nvidia[0-1]

  • Made cgroup.conf and gres.conf optional: they are gracefully skipped if not present (see the sketch after this list).
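
The optional copy could be as simple as the loop below, run after the config directory has been copied into the image (a sketch; the PR's actual mechanism and paths may differ):

# Copy cgroup.conf / gres.conf into place only if they were provided
for f in cgroup.conf gres.conf; do
    if [ -f "config/common/$f" ]; then
        cp "config/common/$f" "/etc/slurm/$f"
    else
        echo "Skipping optional $f (not present)"
    fi
done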

🔁 Helper Script

  • update_slurmfiles.sh now syncs gres.conf into the running containers and restarts services cleanly (sketched below).
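
A rough sketch of that sync flow (container names follow the cluster defaults; the real script may differ):

# Push the updated config into the running containers, then reload Slurm
for node in slurmctld c1 c2 g1; do
    docker cp config/common/gres.conf "$node":/etc/slurm/gres.conf
done
docker exec slurmctld scontrol reconfigure
# GRES changes may need a full daemon restart instead:
# docker compose restart slurmctld c1 c2 g1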

Verification

Quick Start

# 1. Start the cluster
make up

# 2. Check node and partition info
docker exec slurmctld sinfo

# 3. Run a GPU job
docker exec -it slurmctld bash -lc 'srun -p gpu --gres=gpu:1 nvidia-smi -L'

Compatibility

  • Works with Slurm versions 24.11 and 25.05 (auto-detected at build time)
  • Compatible with both docker compose v2.40+ and Docker Engine 28+
  • Requires NVIDIA drivers and nvidia-container-toolkit installed on the host
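
To sanity-check those host prerequisites before bringing the cluster up, something along these lines should work (the CUDA image tag is only an example):

# Driver visible on the host?
nvidia-smi -L

# NVIDIA runtime registered with Docker (shows up under "Runtimes" when configured)?
docker info | grep -i runtime

# End-to-end toolkit check
docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi -L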

Cluster Status

After make up:

docker exec slurmctld sinfo

Output:

PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu*         up   infinite      2   idle c[1-2]
gpu          up   infinite      1   idle g1

GPU Information Summary

Source         GPUs Visible             Notes
Host           2 GPUs (GPU 0, GPU 1)    Both RTX 2070s detected
Container g1   1 GPU (GPU 0)            Isolated via NVIDIA_VISIBLE_DEVICES=0
Slurm (srun)   1 GPU (GPU 0)            Matches container visibility (verified via srun)

GPU Test Command

Run a quick validation:

docker exec -it slurmctld bash -lc 'srun -p gpu --gres=gpu:1 nvidia-smi -L'

Expected output:

GPU 0: NVIDIA GeForce RTX 2070 (UUID: GPU-b7862af4-0f16-56bf-0d89-a36ead3f3f2f)

This confirms that Slurm correctly schedules on the GPU node and inherits container GPU visibility.
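
The PR's example job is not reproduced here, but a minimal batch script along these lines exercises the same path (file name and contents are illustrative):

#!/bin/bash
#SBATCH --job-name=gpu-check
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --output=gpu-check-%j.out

# Slurm-launched shells don't inherit the container ENV, so set CUDA_HOME explicitly
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

nvidia-smi -L

Submit it from the shared job directory (mounted at /data in the base cluster setup):

docker exec slurmctld bash -lc 'cd /data && sbatch gpu-check.sbatch'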


Example: GPU Isolation and Scalability

The host system has 2 physical GPUs, but only one (GPU 0) is mapped into the g1 container.
This demonstrates that:

  • GPU resources can be isolated per compute node (e.g., g1 → GPU 0, g2 → GPU 1)
  • Or multiple GPUs can be attached to a single node (see the sketch below) by changing:
    • NVIDIA_VISIBLE_DEVICES=0 to NVIDIA_VISIBLE_DEVICES=all
    • gres.conf to File=/dev/nvidia[0-1]
    • device_ids: ['0'] to gpus: "all" in docker-compose.yml

This flexibility allows both multi-GPU single-node and multi-node GPU setups.
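
For the single-node, two-GPU variant, changes along these lines would be enough (a sketch assuming the two-GPU host shown in the runtime evidence):

# gres.conf
NodeName=g1 Name=gpu File=/dev/nvidia[0-1]

# slurm.conf
NodeName=g1 CPUs=2 RealMemory=2000 Gres=gpu:2

# docker-compose.yml (g1 service)
environment:
  NVIDIA_VISIBLE_DEVICES: "all"
gpus: "all"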


Runtime Evidence

===== GPU Device Info on host =====
GPU 0: NVIDIA GeForce RTX 2070 (UUID: GPU-b7862af4-0f16-56bf-0d89-a36ead3f3f2f)
GPU 1: NVIDIA GeForce RTX 2070 (UUID: GPU-22dfd02e-a668-a6a6-a90a-39d6efe475ee)

===== GPU Device Info inside g1 =====
NVDEV=0
/dev/nvidia0
/dev/nvidiactl
/dev/nvidia-uvm
/dev/nvidia-uvm-tools
GPU 0: NVIDIA GeForce RTX 2070 (UUID: GPU-b7862af4-0f16-56bf-0d89-a36ead3f3f2f)

===== GPU check via srun in Slurm =====
GPU 0: NVIDIA GeForce RTX 2070 (UUID: GPU-b7862af4-0f16-56bf-0d89-a36ead3f3f2f)

The output validates the intended GPU mapping and Slurm GRES behavior.


Why This Matters

  • Provides a reproducible GPU-enabled Slurm environment for local testing, CI, or hybrid setups.
  • Demonstrates resource isolation between host, container, and scheduler.
  • Serves as a base for future multi-GPU or heterogeneous cluster support.

Known Limitations

  • Currently supports a single GPU node (g1); multi-node GPU setups require adding g2, g3, etc.
  • GPU accounting is basic; Slurm GRES tracking is active but cgroup GPU enforcement is disabled for simplicity.
  • nvcc is available in the container, but CUDA samples are not included.
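
Since the CUDA samples are not bundled, a quick way to confirm the toolchain inside g1 is to call nvcc through a job (the path assumes the CUDA 12.6 layout in the image):

docker exec -it slurmctld bash -lc 'srun -p gpu --gres=gpu:1 /usr/local/cuda-12.6/bin/nvcc --version'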

Implementation Summary

Checklist

  • Added CUDA image (Dockerfile.gpu)
  • Added GPU node (g1) to docker-compose
  • Updated Slurm configs for GRES and gpu partition
  • Optionalized config copy for cgroup.conf / gres.conf
  • Added live update support in update_slurmfiles.sh
  • Verified isolation via collect_docker_slurm_info.sh
  • Confirmed single-GPU scheduling via Slurm

Follow-ups

  • Update test scripts to detect partition dynamically (cpu/gpu)
  • Extend README with “Using GPUs” section
  • Add multi-GPU examples (g1+g2 or single node with 2 GPUs)
  • Automate nvidia-smi validation in test flow
  • Add multi-node GPU deployment support
    • Implement Docker Swarm mode to distribute GPU nodes (g1, g2, etc.) across multiple physical hosts:
      • Initialize Swarm (docker swarm init) on the manager
      • Join additional GPU machines using docker swarm join
      • Label each node with docker node update --label-add gpu=<id>
      • Use an overlay network for inter-host communication
      • Extend slurm.conf and gres.conf to include g2 and additional GPU nodes
    • Optionally, explore overlay networks without Swarm using tools like Weave Net or Flannel
    • For persistent storage, support NFS-mounted volumes or a distributed volume driver
    • Validate with:
      docker service ps slurm-cluster_g1
      docker service ps slurm-cluster_g2
      docker exec slurmctld sinfo
    • Goal: seamless Slurm cluster scaling across multiple GPU-equipped hosts
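
A rough sketch of the Swarm bootstrap described above (host names, labels, and addresses are placeholders):

# On the manager host
docker swarm init
docker network create --driver overlay --attachable slurm-network

# On each additional GPU host (token and address come from `docker swarm init` output)
docker swarm join --token <worker-token> <manager-ip>:2377

# Back on the manager: label GPU nodes so services can be pinned to them
docker node update --label-add gpu=0 gpu-host-1
docker node update --label-add gpu=1 gpu-host-2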

@giovtorres (Owner) left a comment

Thank you so much for this feature! I took a first pass and it looks good. I do have some questions and some requested changes.


PartitionName=normal Nodes=c1,c2 Default=YES MaxTime=INFINITE State=UP
PartitionName=cpu Nodes=c1,c2 Default=YES MaxTime=INFINITE State=UP
PartitionName=gpu Nodes=g1 Default=No MaxTime=INFINITE State=UP

Suggested change
PartitionName=gpu Nodes=g1 Default=No MaxTime=INFINITE State=UP
PartitionName=gpu Nodes=g1 Default=NO MaxTime=INFINITE State=UP


PartitionName=normal Nodes=c1,c2 Default=YES MaxTime=INFINITE State=UP
PartitionName=cpu Nodes=c1,c2 Default=YES MaxTime=INFINITE State=UP
PartitionName=gpu Nodes=g1 Default=No MaxTime=INFINITE State=UP

Suggested change
PartitionName=gpu Nodes=g1 Default=No MaxTime=INFINITE State=UP
PartitionName=gpu Nodes=g1 Default=NO MaxTime=INFINITE State=UP

# #SBATCH -w g1

# Make CUDA visible to Slurm-launched shells (they don't inherit container ENV)
export CUDA_HOME=/usr/local/cuda-12.6

I think we should allow flexibility here:

Suggested change
export CUDA_HOME=/usr/local/cuda-12.6
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.6}"

Something like this. Thoughts?

dockerfile: Dockerfile.gpu
args:
SLURM_VERSION: ${SLURM_VERSION:-25.05.3}
# runtime: nvidia

Can this comment be removed?

Comment on lines +163 to +166
# privileged: true
# gpus: "all"
# gpus:
# - device: "0" # Expose only GPU 0

Are these comments needed?

Comment on lines +1 to +123
# Multi-stage Dockerfile for Slurm runtime with NVIDIA GPU support
# Stage 1: Build RPMs using the builder image
# Stage 2: Install RPMs in a clean runtime image with NVIDIA support

ARG SLURM_VERSION

# ============================================================================
# Stage 1: Build RPMs
# ============================================================================
FROM rockylinux/rockylinux:9 AS builder

ARG SLURM_VERSION

# Enable CRB and EPEL repositories for development packages
RUN set -ex \
&& dnf makecache \
&& dnf -y update \
&& dnf -y install dnf-plugins-core epel-release \
&& dnf config-manager --set-enabled crb \
&& dnf makecache

# Install RPM build tools and dependencies
RUN set -ex \
&& dnf -y install \
autoconf \
automake \
bzip2 \
freeipmi-devel \
dbus-devel \
gcc \
gcc-c++ \
git \
gtk2-devel \
hdf5-devel \
http-parser-devel \
hwloc-devel \
json-c-devel \
libcurl-devel \
libyaml-devel \
lua-devel \
lz4-devel \
make \
man2html \
mariadb-devel \
munge \
munge-devel \
ncurses-devel \
numactl-devel \
openssl-devel \
pam-devel \
perl \
python3 \
python3-devel \
readline-devel \
rpm-build \
rpmdevtools \
rrdtool-devel \
wget \
&& dnf clean all \
&& rm -rf /var/cache/dnf

# Setup RPM build environment
RUN rpmdev-setuptree

# Copy RPM macros
COPY rpmbuild/slurm.rpmmacros /root/.rpmmacros

# Download official Slurm release tarball and build RPMs with slurmrestd enabled
RUN set -ex \
&& wget -O /root/rpmbuild/SOURCES/slurm-${SLURM_VERSION}.tar.bz2 \
https://download.schedmd.com/slurm/slurm-${SLURM_VERSION}.tar.bz2 \
&& cd /root/rpmbuild/SOURCES \
&& rpmbuild -ta slurm-${SLURM_VERSION}.tar.bz2 \
&& ls -lh /root/rpmbuild/RPMS/x86_64/

# ============================================================================
# Stage 2: Runtime image with NVIDIA GPU support
# ============================================================================
FROM rockylinux/rockylinux:9

LABEL org.opencontainers.image.source="https://github.com/giovtorres/slurm-docker-cluster" \
org.opencontainers.image.title="slurm-docker-cluster-gpu" \
org.opencontainers.image.description="Slurm Docker cluster on Rocky Linux 9 with NVIDIA GPU support" \
maintainer="Giovanni Torres"

ARG SLURM_VERSION

# Enable CRB and EPEL repositories for runtime dependencies
RUN set -ex \
&& dnf makecache \
&& dnf -y update \
&& dnf -y install dnf-plugins-core epel-release \
&& dnf config-manager --set-enabled crb \
&& dnf makecache

# Install runtime dependencies only
RUN set -ex \
&& dnf -y install \
bash-completion \
bzip2 \
gettext \
hdf5 \
http-parser \
hwloc \
json-c \
jq \
libaec \
libyaml \
lua \
lz4 \
mariadb \
munge \
numactl \
perl \
procps-ng \
psmisc \
python3 \
readline \
vim-enhanced \
wget \
&& dnf clean all \
&& rm -rf /var/cache/dnf


Could we avoid some duplication by extending the base image? For example,

FROM slurm-docker-cluster:${SLURM_VERSION}
RUN dnf config-manager --add-repo ...
[... rest of GPU only changes ...]

RUN set -ex \
&& dnf -y install \
nvidia-container-toolkit \
cuda-toolkit-12-6 \

This should be a Docker ARG, e.g. ARG CUDA_VERSION=12.6 and then

RUN dnf install -y cuda-toolkit-${CUDA_VERSION//./-}
ENV CUDA_HOME=/usr/local/cuda-${CUDA_VERSION}

&& dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \
| tee /etc/yum.repos.d/nvidia-container-toolkit.repo \
&& sed -i -e 's/^gpgcheck=1/gpgcheck=0/' -e 's/^repo_gpgcheck=1/repo_gpgcheck=0/' /etc/yum.repos.d/nvidia-container-toolkit.repo \

Why disable the gpgcheck? There should be an RPM GPG key that can be imported, no?
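
For example, the container-toolkit repo references a key that could be imported instead of disabling the check (an untested sketch; verify the URL against the repo file):

# Import the repo key referenced by nvidia-container-toolkit.repo
rpm --import https://nvidia.github.io/libnvidia-container/gpgkey
# The CUDA rhel9 repo defines its own gpgkey entry, so dnf can import it
# automatically on first install when gpgcheck stays enabled.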

NodeName=g1 CPUs=4 RealMemory=1000 Gres=gpu:1 State=UNKNOWN

PartitionName=normal Nodes=c1,c2 Default=YES MaxTime=INFINITE State=UP
PartitionName=cpu Nodes=c1,c2 Default=YES MaxTime=INFINITE State=UP

Changing the partition name is a breaking change. Was this intentional?

@andrewssobral (Author)

Thank you @giovtorres for your suggestions. I will work on it and adapt this PR.
