Adds support for configuring MIG #656
```diff
@@ -48,6 +48,20 @@
         name: cuda
         tasks_from: "{{ 'runtime.yml' if appliances_mode == 'configure' else 'install.yml' }}"
 
+- name: Setup vGPU
+  hosts: vgpu
+  become: yes
+  gather_facts: yes
+  tags: vgpu
+  tasks:
+    - include_role:
+        name: stackhpc.linux.vgpu
+        tasks_from: "{{ 'configure.yml' if appliances_mode == 'configure' else 'install.yml' }}"
+  handlers:
+    - name: reboot
+      fail:
+        msg: Reboot handler for stackhpc.linux.vgpu role fired unexpectedly. This was supposed to be unreachable.
+
 - name: Persist hostkeys across rebuilds
   # Must be after filesystems.yml (for storage)
   # and before portal.yml (where OOD login node hostkeys are scanned)
```

> **Review comment (on `Setup vGPU`):** This runs before slurm when run from site.yml, is that OK?
>
> **Reply:** Yep, at this point we are just creating the MIG devices.
```diff
@@ -250,6 +250,16 @@
         name: cloudalchemy.grafana
         tasks_from: install.yml
 
+- name: Add support for NVIDIA GPU auto detection to Slurm
+  hosts: cuda
+  become: yes
+  tasks:
+    - name: Recompile slurm
+      import_role:
+        name: slurm_recompile
+      vars:
+        slurm_recompile_nvml: "{{ groups.cuda | length > 0 }}"
+
 - name: Run post.yml hook
   vars:
     appliances_environment_root: "{{ lookup('env', 'APPLIANCES_ENVIRONMENT_ROOT') }}"
```

> **Review comment (on the play name):** I don't like having these tasks outside a role - we've always regretted that. It can't be run with …
>
> **Reply:** Also - we should be really clear about idempotency/when it's safe to run this. If it's in the cuda role it's obvious where to state that!
>
> **Reply:** Sure, sounds reasonable. I did wonder if we'd want to recompile slurm for other reasons so could live in a slurm-recompile role?
>
> **Reply:** Possibly - for this specifically either way there's a cuda/slurm dependency so I'd go with sticking it in cuda for the moment, probably.
>
> **Reply:** I stuck it in slurm_recompile, but will move if you prefer.
```diff
@@ -0,0 +1,5 @@
+---
+
+- name: Set cuda_facts_version_short
+  set_fact:
+    cuda_facts_version_short: "{{ cuda_version_short }}"
```
```diff
@@ -0,0 +1,3 @@
+---
+slurm_recompile_nvml: false
+
```
```diff
@@ -0,0 +1,41 @@
+---
+- name: Get facts about CUDA installation
+  import_role:
+    name: cuda
+    tasks_from: facts.yml
+
+- name: Gather the package facts
+  ansible.builtin.package_facts:
+    manager: auto
+
+- name: Set fact containing slurm package facts
+  set_fact:
+    slurm_package: "{{ ansible_facts.packages['slurm-slurmd-ohpc'].0 }}"
+
+- name: Recompile and install slurm packages
+  shell: |
+    #!/bin/bash
+    source /etc/profile
+    set -eux
+    dnf download -y --source slurm-slurmd-ohpc-{{ slurm_package.version }}-{{ slurm_package.release }}
+    rpm -i slurm-ohpc-*.src.rpm
+    cd /root/rpmbuild/SPECS
+    dnf builddep -y slurm.spec
+    rpmbuild -bb{% if slurm_recompile_nvml | bool %} -D "_with_nvml --with-nvml=/usr/local/cuda-{{ cuda_facts_version_short }}/targets/x86_64-linux/"{% endif %} slurm.spec
+    dnf reinstall -y /root/rpmbuild/RPMS/x86_64/*.rpm
+  become: true
+
+- name: Workaround missing symlink
+  # Workaround path issue: https://groups.google.com/g/slurm-users/c/cvGb4JnK8BY
+  command: ln -s /lib64/libnvidia-ml.so.1 /lib64/libnvidia-ml.so
+  args:
+    creates: /lib64/libnvidia-ml.so
+  when: slurm_recompile_nvml | bool
+
+- name: Cleanup Dependencies
+  shell: |
+    #!/bin/bash
+    set -eux
+    set -o pipefail
+    dnf history list | grep Install | grep 'builddep -y slurm.spec' | head -n 1 | awk '{print $1}' | xargs dnf history -y undo
+  become: true
```
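After the rebuild, it is worth confirming that the reinstalled `slurmd` binary really is linked against NVML. A minimal sketch of that check, operating on captured `ldd` output (the sample text and the `/usr/sbin/slurmd` path are illustrative assumptions, not taken from a real host):

```python
# Sketch: check captured `ldd` output for an NVML linkage.
# On a real node you would capture something like:
#     ldd /usr/sbin/slurmd      # path assumed for the OHPC package
# SAMPLE_LDD below is illustrative output, not from a real host.
SAMPLE_LDD = """\
linux-vdso.so.1 (0x00007ffd7f5fe000)
libnvidia-ml.so.1 => /lib64/libnvidia-ml.so.1 (0x00007f0c4a000000)
libc.so.6 => /lib64/libc.so.6 (0x00007f0c49c00000)
"""

def links_nvml(ldd_output: str) -> bool:
    """Return True if any linked library is libnvidia-ml."""
    return any(
        line.strip().startswith("libnvidia-ml.so")
        for line in ldd_output.splitlines()
    )

print(links_nvml(SAMPLE_LDD))
```

If this reports no NVML linkage, the `_with_nvml` branch of the `rpmbuild` invocation above did not take effect (for example because `slurm_recompile_nvml` was left at its default of `false`).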
```diff
@@ -0,0 +1,209 @@
```

# vGPU/MIG configuration

This page details how to configure Multi-Instance GPU (MIG) in Slurm.

## Pre-requisites

- An image built with CUDA support. This should automatically recompile Slurm against NVML.

## Inventory

Add the relevant hosts to the ``vgpu`` group, for example in ``environments/$ENV/inventory/groups``:

```
[vgpu:children]
cuda
```

## Configuration

Use variables from the [stackhpc.linux.vgpu](https://github.com/stackhpc/ansible-collection-linux/tree/main/roles/vgpu) role.

For example, in ``environments/<environment>/inventory/group_vars/all/vgpu``:

```
---
vgpu_definitions:
  - pci_address: "0000:17:00.0"
    mig_devices:
      "1g.10gb": 4
      "4g.40gb": 1
  - pci_address: "0000:81:00.0"
    mig_devices:
      "1g.10gb": 4
      "4g.40gb": 1
```
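To get an intuition for what such a definition amounts to on the node, the role's work can be sketched as a series of `nvidia-smi mig` calls. This is purely an illustration: the profile-name-to-ID mapping (`1g.10gb` → 19, `4g.40gb` → 5) is read off the `nvidia-smi mig -lgip` output shown further down for an H100, and the exact commands the `stackhpc.linux.vgpu` role actually issues may differ.

```python
# Illustrative sketch only: roughly the nvidia-smi invocations implied by
# one vgpu_definitions entry. The profile-name -> GPU-instance-profile-ID
# mapping is taken from `nvidia-smi mig -lgip` output for an H100; the
# real role may construct its commands differently.
PROFILE_IDS = {"1g.10gb": 19, "4g.40gb": 5}

def mig_commands(definition: dict) -> list[str]:
    pci = definition["pci_address"]
    cmds = [f"nvidia-smi -i {pci} -mig 1"]  # enable MIG mode on this GPU
    for profile, count in definition["mig_devices"].items():
        ids = ",".join([str(PROFILE_IDS[profile])] * count)
        # -cgi creates GPU instances; -C also creates the compute instances
        cmds.append(f"nvidia-smi mig -i {pci} -cgi {ids} -C")
    return cmds

example = {
    "pci_address": "0000:17:00.0",
    "mig_devices": {"1g.10gb": 4, "4g.40gb": 1},
}
for cmd in mig_commands(example):
    print(cmd)
```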

The appliance will use the driver installed via the ``cuda`` role.

Use ``lspci`` to determine the PCI addresses of the GPUs, e.g.:

```
[root@io-io-gpu-02 ~]# lspci -nn | grep -i nvidia
06:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H100 SXM5 80GB] [10de:2330] (rev a1)
0c:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H100 SXM5 80GB] [10de:2330] (rev a1)
46:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H100 SXM5 80GB] [10de:2330] (rev a1)
4c:00.0 3D controller [0302]: NVIDIA Corporation GH100 [H100 SXM5 80GB] [10de:2330] (rev a1)
```

The supported profiles can be discovered by consulting the [NVIDIA documentation](https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#supported-mig-profiles)
or interactively by running the following on one of the compute nodes with GPU resources:

```
[rocky@io-io-gpu-05 ~]$ sudo nvidia-smi -i 0 -mig 1
Enabled MIG Mode for GPU 00000000:06:00.0
All done.
[rocky@io-io-gpu-05 ~]$ sudo nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles: |
| GPU Name ID Instances Memory P2P SM DEC ENC |
| Free/Total GiB CE JPEG OFA |
|=============================================================================|
| 0 MIG 1g.10gb 19 7/7 9.75 No 16 1 0 |
| 1 1 0 |
+-----------------------------------------------------------------------------+
| 0 MIG 1g.10gb+me 20 1/1 9.75 No 16 1 0 |
| 1 1 1 |
+-----------------------------------------------------------------------------+
| 0 MIG 1g.20gb 15 4/4 19.62 No 26 1 0 |
| 1 1 0 |
+-----------------------------------------------------------------------------+
| 0 MIG 2g.20gb 14 3/3 19.62 No 32 2 0 |
| 2 2 0 |
+-----------------------------------------------------------------------------+
| 0 MIG 3g.40gb 9 2/2 39.50 No 60 3 0 |
| 3 3 0 |
+-----------------------------------------------------------------------------+
| 0 MIG 4g.40gb 5 1/1 39.50 No 64 4 0 |
| 4 4 0 |
+-----------------------------------------------------------------------------+
| 0 MIG 7g.80gb 0 1/1 79.25 No 132 7 0 |
| 8 7 1 |
+-----------------------------------------------------------------------------+
| 1 MIG 1g.10gb 19 7/7 9.75 No 16 1 0 |
| 1 1 0 |
+-----------------------------------------------------------------------------+
| 1 MIG 1g.10gb+me 20 1/1 9.75 No 16 1 0 |
| 1 1 1 |
+-----------------------------------------------------------------------------+
| 1 MIG 1g.20gb 15 4/4 19.62 No 26 1 0 |
| 1 1 0 |
+-----------------------------------------------------------------------------+
| 1 MIG 2g.20gb 14 3/3 19.62 No 32 2 0 |
| 2 2 0 |
+-----------------------------------------------------------------------------+
| 1 MIG 3g.40gb 9 2/2 39.50 No 60 3 0 |
| 3 3 0 |
+-----------------------------------------------------------------------------+
| 1 MIG 4g.40gb 5 1/1 39.50 No 64 4 0 |
| 4 4 0 |
+-----------------------------------------------------------------------------+
| 1 MIG 7g.80gb 0 1/1 79.25 No 132 7 0 |
| 8 7 1 |
+-----------------------------------------------------------------------------+
| 2 MIG 1g.10gb 19 7/7 9.75 No 16 1 0 |
| 1 1 0 |
+-----------------------------------------------------------------------------+
| 2 MIG 1g.10gb+me 20 1/1 9.75 No 16 1 0 |
| 1 1 1 |
+-----------------------------------------------------------------------------+
| 2 MIG 1g.20gb 15 4/4 19.62 No 26 1 0 |
| 1 1 0 |
+-----------------------------------------------------------------------------+
| 2 MIG 2g.20gb 14 3/3 19.62 No 32 2 0 |
| 2 2 0 |
+-----------------------------------------------------------------------------+
| 2 MIG 3g.40gb 9 2/2 39.50 No 60 3 0 |
| 3 3 0 |
+-----------------------------------------------------------------------------+
| 2 MIG 4g.40gb 5 1/1 39.50 No 64 4 0 |
| 4 4 0 |
+-----------------------------------------------------------------------------+
| 2 MIG 7g.80gb 0 1/1 79.25 No 132 7 0 |
| 8 7 1 |
+-----------------------------------------------------------------------------+
| 3 MIG 1g.10gb 19 7/7 9.75 No 16 1 0 |
| 1 1 0 |
+-----------------------------------------------------------------------------+
| 3 MIG 1g.10gb+me 20 1/1 9.75 No 16 1 0 |
| 1 1 1 |
+-----------------------------------------------------------------------------+
| 3 MIG 1g.20gb 15 4/4 19.62 No 26 1 0 |
| 1 1 0 |
+-----------------------------------------------------------------------------+
| 3 MIG 2g.20gb 14 3/3 19.62 No 32 2 0 |
| 2 2 0 |
+-----------------------------------------------------------------------------+
| 3 MIG 3g.40gb 9 2/2 39.50 No 60 3 0 |
| 3 3 0 |
+-----------------------------------------------------------------------------+
| 3 MIG 4g.40gb 5 1/1 39.50 No 64 4 0 |
| 4 4 0 |
+-----------------------------------------------------------------------------+
| 3 MIG 7g.80gb 0 1/1 79.25 No 132 7 0 |
| 8 7 1 |
+-----------------------------------------------------------------------------+
```

## compute_init

Use the ``vgpu`` metadata option to enable creation of MIG devices on rebuild.

## GRES configuration

You should stop terraform templating out partitions.yml and specify `openhpc_nodegroups` manually. To do this,
set the `autogenerated_partitions_enabled` terraform variable to `false`. For example (`environments/production/tofu/main.tf`):

> **Review comment:** Requires: #665

```
module "cluster" {
  source = "../../site/tofu/"
  ...
  # We manually populate this to add GRES. See environments/site/inventory/group_vars/all/partitions-manual.yml.
  autogenerated_partitions_enabled = false
}
```

GPU types can be determined by deploying Slurm without any GRES configuration and then running
`sudo slurmd -G` on a compute node where GPU resources exist. An example is shown below:

```
[rocky@io-io-gpu-02 ~]$ sudo slurmd -G
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3 Count=1 Index=0 ID=7696487 File=/dev/nvidia0 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3 Count=1 Index=1 ID=7696487 File=/dev/nvidia1 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3_4g.40gb Count=1 Index=291 ID=7696487 File=/dev/nvidia-caps/nvidia-cap291 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3_4g.40gb Count=1 Index=417 ID=7696487 File=/dev/nvidia-caps/nvidia-cap417 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=336 ID=7696487 File=/dev/nvidia-caps/nvidia-cap336 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=345 ID=7696487 File=/dev/nvidia-caps/nvidia-cap345 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=354 ID=7696487 File=/dev/nvidia-caps/nvidia-cap354 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=507 ID=7696487 File=/dev/nvidia-caps/nvidia-cap507 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=516 ID=7696487 File=/dev/nvidia-caps/nvidia-cap516 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
slurmd-io-io-gpu-02: Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=525 ID=7696487 File=/dev/nvidia-caps/nvidia-cap525 Links=(null) Flags=HAS_FILE,HAS_TYPE,ENV_NVML,ENV_RSMI,ENV_ONEAPI,ENV_OPENCL,ENV_DEFAULT
```
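Each `Gres` line reports one device, so counting lines per `Type` gives the `gpu:<type>:<count>` strings used for manual GRES configuration. That bookkeeping can be sketched as follows (the sample lines are abbreviated from `slurmd -G` output like the above):

```python
from collections import Counter

# Sketch: derive GRES conf strings ("gpu:<type>:<count>") from
# `sudo slurmd -G` output by summing Count per Type. The sample lines
# are abbreviated from real slurmd -G output.
SAMPLE = """\
Gres Name=gpu Type=nvidia_h100_80gb_hbm3 Count=1 Index=0
Gres Name=gpu Type=nvidia_h100_80gb_hbm3 Count=1 Index=1
Gres Name=gpu Type=nvidia_h100_80gb_hbm3_4g.40gb Count=1 Index=291
Gres Name=gpu Type=nvidia_h100_80gb_hbm3_4g.40gb Count=1 Index=417
Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=336
Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=345
Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=354
Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=507
Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=516
Gres Name=gpu Type=nvidia_h100_80gb_hbm3_1g.10gb Count=1 Index=525
"""

def gres_conf(slurmd_g_output: str) -> list[str]:
    counts = Counter()
    for line in slurmd_g_output.splitlines():
        # Parse key=value fields from each whitespace-separated token
        fields = dict(f.split("=", 1) for f in line.split() if "=" in f)
        if fields.get("Name") == "gpu":
            counts[fields["Type"]] += int(fields["Count"])
    return [f"gpu:{gpu_type}:{n}" for gpu_type, n in counts.items()]

for conf in gres_conf(SAMPLE):
    print(conf)
```

For the sample above this yields counts of 2 whole GPUs, 2 `4g.40gb` and 6 `1g.10gb` MIG devices, matching the `gres` entries in the example configuration below.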

GRES resources can then be configured manually. An example is shown below
(`environments/<environment>/inventory/group_vars/all/partitions-manual.yml`):

```
openhpc_partitions:
  - name: cpu
  - name: gpu

openhpc_nodegroups:
  - name: cpu
  - name: gpu
    gres_autodetect: nvml
    gres:
      - conf: "gpu:nvidia_h100_80gb_hbm3:2"
      - conf: "gpu:nvidia_h100_80gb_hbm3_4g.40gb:2"
      - conf: "gpu:nvidia_h100_80gb_hbm3_1g.10gb:6"
```

> **Review comment (on the `gres` entries):** Add hint on how to work out what the autodetection-created gres name is?
>
> **Reply:** maybe slurmd -C or slurmd -G?
>
> **Reply:** Tried:
>
> **Reply:** journalctl -u slurmd does print this information:
>
> **Reply:** Turns out I needed sudo when doing slurmd -G:
>
> **Reply:** Added this to the docs.
```diff
@@ -0,0 +1,4 @@
+---
+
+# Nvidia driver is provided by cuda role.
+vgpu_nvidia_driver_install_enabled: false
```
> **Review comment:** There are a few other things which need fixing given bumping stackhpc.openhpc:
```diff
@@ -4,7 +4,7 @@ roles:
     version: v25.3.2
     name: stackhpc.nfs
   - src: https://github.com/stackhpc/ansible-role-openhpc.git
-    version: v0.28.0
+    version: feature/gres-autodetect
     name: stackhpc.openhpc
   - src: https://github.com/stackhpc/ansible-node-exporter.git
     version: stackhpc
```

> **Review comment (on `feature/gres-autodetect`):** Needs bumping to a release.

```diff
@@ -55,4 +55,7 @@ collections:
     version: 0.0.15
   - name: stackhpc.pulp
     version: 0.5.5
+  - name: https://github.com/stackhpc/ansible-collection-linux
+    type: git
+    version: feature/mig-only
 ...
```

> **Review comment:** Looking at the role docs, do we need idracadm7 changes to support SR-IOV and/or the iommu role?
>
> **Reply:** So they are BIOS settings. I'm actually unsure if we need those when not using vGPU.