2 changes: 1 addition & 1 deletion files/default.slurmd → files/default.slurmd.computenode
@@ -8,4 +8,4 @@
#SLURMCTLD_OPTIONS=""

# Additional options that are passed to the slurmd daemon
SLURMD_OPTIONS="-Z --conf-server cicus03.douglas.rtss.qc.ca --conf 'Gres=gpu:geforce Feature=workstation'"
SLURMD_OPTIONS="-Z --conf-server dnpus01.douglas.rtss.qc.ca --conf 'Feature=computenode Weight=1'"
11 changes: 11 additions & 0 deletions files/default.slurmd.workstation
@@ -0,0 +1,11 @@
# Defaults for slurmd initscript
# sourced by /etc/init.d/slurmd
# installed at /etc/default/slurmd by the maintainer scripts
#
# This is a POSIX shell fragment
#
# Additional options that are passed to the slurmctld daemon
#SLURMCTLD_OPTIONS=""

# Additional options that are passed to the slurmd daemon
SLURMD_OPTIONS="-Z --conf-server dnpus01.douglas.rtss.qc.ca --conf 'Gres=gpu:geforce Feature=workstation Weight=50'"
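To make the difference between the two defaults files concrete, here is a minimal Python sketch (not part of the PR) that splits each `SLURMD_OPTIONS` string with a quote-aware parser and extracts the key/value pairs passed through `--conf`. The helper name `conf_pairs` is hypothetical; the option strings are copied from the diff above. Slurm prefers nodes with a lower `Weight` when all else is equal, so these values would presumably steer jobs to compute nodes before workstations.

```python
import shlex

# Option strings copied verbatim from the two defaults files in this diff.
COMPUTENODE = "-Z --conf-server dnpus01.douglas.rtss.qc.ca --conf 'Feature=computenode Weight=1'"
WORKSTATION = "-Z --conf-server dnpus01.douglas.rtss.qc.ca --conf 'Gres=gpu:geforce Feature=workstation Weight=50'"


def conf_pairs(options):
    """Return the key=value pairs passed via --conf (hypothetical helper)."""
    # shlex.split honours the single quotes, so the --conf value stays one argument.
    args = shlex.split(options)
    conf_value = args[args.index("--conf") + 1]
    return dict(pair.split("=", 1) for pair in conf_value.split())


compute = conf_pairs(COMPUTENODE)
workstation = conf_pairs(WORKSTATION)
print(compute)
print(workstation)
```

Only the workstation file advertises a `Gres` entry; the compute node file sets just `Feature` and `Weight`.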
Binary file modified files/munge.key
Contributor
why did the (encrypted) munge key get removed?

Collaborator Author
It wasn't removed, but it's different from the one that's in the playbook. I ran a diff on the two munge.key files while refactoring this PR (I had to remove a bunch of stuff) and saw that they were different.

Binary file not shown.
13 changes: 12 additions & 1 deletion inventory
@@ -19,6 +19,7 @@ cicws16
cicws17
cicws18
cicws19
cicws23
cicws24
cicws25
cicws26
@@ -56,7 +57,6 @@ dnpws13
dnpws14
dnpws15
dnpws16
cichm01
dnpws17
dnpws18
dnpws19
@@ -66,12 +66,23 @@ dnpws22
dnpws23
dnpws24
dnpws25
dnpws26
cichm01

[computenodes]
ciccs01
ciccs02
ciccs03
ciccs04
ciccs05
ciccs06
ciccs07
ciccs08
ciccs09
ciccs10

[userservers]
cicus03 nis_role=master
dnpus01 nis_role=slave

[servers]
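The new `[computenodes]` group matters because the slurm tasks later in this PR pick a defaults file from `group_names`. A minimal sketch of that selection, assuming a simplified INI inventory: the `[workstations]` header and the helper names are assumptions (the diff does not show which group header the `cicws`/`dnpws` hosts sit under), and only a few hosts are included for brevity.

```python
def parse_inventory(text):
    """Map each [group] header to the list of hosts listed under it."""
    groups, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups.setdefault(current, [])
        elif current is not None:
            # Keep only the host name; drop host vars such as nis_role=master.
            groups[current].append(line.split()[0])
    return groups


def group_names(groups, host):
    """Rough equivalent of Ansible's group_names for one host."""
    return sorted(name for name, hosts in groups.items() if host in hosts)


# Abbreviated inventory; the [workstations] header is an assumption.
INVENTORY = """\
[workstations]
cicws23
dnpws26

[computenodes]
ciccs01
ciccs10

[userservers]
cicus03 nis_role=master
dnpus01 nis_role=slave
"""

DEFAULTS_BY_GROUP = {
    "workstations": "files/default.slurmd.workstation",
    "computenodes": "files/default.slurmd.computenode",
}


def defaults_file(groups, host):
    """Return the defaults file a host would receive, or None if no task applies."""
    for name in group_names(groups, host):
        if name in DEFAULTS_BY_GROUP:
            return DEFAULTS_BY_GROUP[name]
    return None


groups = parse_inventory(INVENTORY)
for host in ("cicws23", "ciccs01", "dnpus01"):
    print(host, defaults_file(groups, host))
```

Under these assumptions, a host that is in neither group (for example a user server) would get no `/etc/default/slurmd` from these tasks, which may or may not be intended.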
2 changes: 1 addition & 1 deletion roles/common/tasks/main.yml
@@ -15,7 +15,7 @@
- {import_tasks: qbatch.yml, tags: [software,qbatch] }
- {import_tasks: manual-deb.yml, tags: [software,manualdeb]}
- {import_tasks: file-config.yml, tags: files}
#- {import_tasks: slurm.yml, tags: slurm}
- {import_tasks: slurm.yml, tags: slurm}
- {import_tasks: system-config.yml, tags: system}
- {import_tasks: motd.yml, tags: motd}

27 changes: 25 additions & 2 deletions roles/common/tasks/slurm.yml
@@ -20,6 +20,7 @@
- liblz4-dev
- libmunge2
- libmunge-dev
- libncurses6
- libncurses-dev
- libpam0g-dev
- libperl-dev
@@ -36,6 +37,7 @@


- name: Check if current version is installed
become: yes
stat:
path: "{{ slurm_prefix_path}}/sbin/slurmd"
register: slurmd_installed
@@ -102,11 +104,32 @@
- name: restart munge
systemd: state=restarted enabled=true name=munge daemon_reload=yes

- name: install slurm default config
copy: src=files/default.slurmd dest=/etc/default/slurmd
- name: copy slurm default configuration based on node type
block:
- name: Install slurm default config on workstations
copy:
src: files/default.slurmd.workstation
dest: /etc/default/slurmd
when: "'workstations' in group_names"

- name: Install slurm default config on compute nodes
copy:
src: files/default.slurmd.computenode
dest: /etc/default/slurmd
when: "'computenodes' in group_names"

- name: install slurm path
copy: src=files/99slurm_path.sh dest=/etc/profile.d/

- name: update slurmd ExecStart to use the highest nice value
lineinfile:
path: /etc/systemd/system/slurmd.service
regexp: '^ExecStart='
line: "ExecStart=/opt/slurm/current/sbin/slurmd --systemd $SLURMD_OPTIONS -n 19"
tags:
- execstart

- name: enable slurmd
systemd: state=restarted enabled=true name=slurmd daemon_reload=yes
tags:
- execstart
Contributor
Would it be better to tag this with slurm?
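For readers less familiar with `lineinfile`, the ExecStart task in roles/common/tasks/slurm.yml behaves roughly like the Python sketch below: with a `regexp`, the last matching line is replaced, and if nothing matches the line is appended at the end of the file. The function name `ensure_line` and the sample unit contents are assumptions for illustration; the real slurmd.service on these hosts may differ.

```python
import re


def ensure_line(lines, regexp, line):
    """Approximate lineinfile (state=present): replace the last match or append."""
    pattern = re.compile(regexp)
    matches = [i for i, existing in enumerate(lines) if pattern.search(existing)]
    result = list(lines)
    if matches:
        result[matches[-1]] = line
    else:
        result.append(line)
    return result


# Hypothetical unit file contents, not taken from the PR.
UNIT = [
    "[Service]",
    "EnvironmentFile=-/etc/default/slurmd",
    "ExecStart=/opt/slurm/current/sbin/slurmd --systemd $SLURMD_OPTIONS",
]

# The replacement line is copied from the task in the diff.
NEW_LINE = "ExecStart=/opt/slurm/current/sbin/slurmd --systemd $SLURMD_OPTIONS -n 19"

updated = ensure_line(UNIT, r"^ExecStart=", NEW_LINE)
print("\n".join(updated))
```

Running the helper a second time on `updated` leaves it unchanged, which mirrors why the task is safe to re-run.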

2 changes: 1 addition & 1 deletion roles/common/templates/hosts.j2
@@ -27,7 +27,7 @@ ff02::2 ip6-allrouters
172.16.67.215 cicus01.douglas.rtss.qc.ca cicus01
172.16.67.216 cicus02.douglas.rtss.qc.ca cicus02
172.16.67.230 cicus03.douglas.rtss.qc.ca cicus03
172.16.67.231 cicus04.douglas.rtss.qc.ca cicus04
172.16.67.231 dnpus01.douglas.rtss.qc.ca dnpus01

#Login/User Servers IPMI
172.16.69.50 ipmi-us01 ipmi-cicus01
2 changes: 1 addition & 1 deletion vars/slurm.yml
@@ -1,4 +1,4 @@
slurm_version: 23.02.2
slurm_version: 24.05.1
slurm_tarball_url: "https://download.schedmd.com/slurm/slurm-{{ slurm_version }}.tar.bz2"
slurm_src_dir: "/tmp/slurm-{{ slurm_version }}"
slurm_sbin_path: "/opt/slurm/current/sbin/slurmd"
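As a quick check of what the version bump changes, the sketch below expands the `{{ slurm_version }}` placeholders in the variables above. The `render` function is a simplified, hypothetical stand-in for Jinja2 templating (it handles only plain `{{ name }}` placeholders), not the mechanism Ansible actually uses.

```python
import re

# Values copied from vars/slurm.yml after this change.
VARS = {
    "slurm_version": "24.05.1",
    "slurm_tarball_url": "https://download.schedmd.com/slurm/slurm-{{ slurm_version }}.tar.bz2",
    "slurm_src_dir": "/tmp/slurm-{{ slurm_version }}",
}


def render(template, variables):
    """Substitute {{ name }} placeholders; a simplified stand-in for Jinja2."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(variables[match.group(1)]),
        template,
    )


tarball_url = render(VARS["slurm_tarball_url"], VARS)
src_dir = render(VARS["slurm_src_dir"], VARS)
print(tarball_url)
print(src_dir)
```

Note that `slurm_sbin_path` does not depend on the version, so only the download URL and source directory change with the bump.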