POV-Ray on Slurm Workload Manager demo scripts
Each compute node needs the following components set up:
- make, povray, ffmpeg
- munged
- slurmd
- nfs client
sudo apt -y install make povray povray-examples ffmpeg
sudo apt -y install munge
The munge.service will fail to start; you have to make a small change to the unit file:
sudo systemctl edit --system --full munge
Change the ExecStart line to read:
ExecStart=/usr/sbin/munged --force
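To double-check that the change took effect, the following should print the new line (systemctl cat is available on current systemd versions):
systemctl cat munge | grep ExecStart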
Copy /etc/munge/munge.key from another compute node in the cluster and check the owner, group, and permission flags:
# ls -al /etc/munge/munge.key
-r-------- 1 munge munge 1024 May 24 16:55 /etc/munge/munge.key
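For example, a sketch assuming you have root SSH access to an existing node (written as <clusternode> here); the chown/chmod lines restore the ownership and mode shown above in case the copy changed them:
sudo scp root@<clusternode>:/etc/munge/munge.key /etc/munge/munge.key
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key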
Restart the munge.service:
sudo systemctl restart munge
Test that munge works:
munge -n | ssh <clusternode> unmunge
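To test against several nodes in one go, a small loop does the trick (a sketch; the node names x200, x220 and dell are just the ones from the example cluster below, adjust to yours):
for node in x200 x220 dell; do
  munge -n | ssh "$node" unmunge > /dev/null && echo "$node: munge OK"
done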
sudo apt -y install slurmd
The slurmd.service will also fail to start. To fix this, edit the slurmd systemd unit file:
sudo systemctl edit --system --full slurmd
Change the [Service] section from:
[Service]
Type=forking
EnvironmentFile=/etc/default/slurmd
ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS
PIDFile=/var/run/slurm-llnl/slurmd.pid
to:
[Service]
Type=simple
EnvironmentFile=/etc/default/slurmd
ExecStart=/usr/sbin/slurmd $SLURMD_OPTIONS -cD
PIDFile=/var/run/slurm-llnl/slurmd.pid
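systemctl edit normally triggers a daemon-reload when you save, but if slurmd still refuses to start, an explicit reload and a look at the journal usually reveal the cause:
sudo systemctl daemon-reload
sudo journalctl -u slurmd -n 50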
Next, copy /etc/slurm-llnl/slurm.conf from another node in your cluster.
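For example (again assuming root SSH access to an existing node, written as <clusternode>):
sudo scp root@<clusternode>:/etc/slurm-llnl/slurm.conf /etc/slurm-llnl/slurm.conf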
Restart, check and enable the slurmd.service:
sudo systemctl restart slurmd.service
sudo systemctl status slurmd.service
sudo systemctl enable slurmd.service
Restart the slurmctld.service on your slurmctld node:
sudo systemctl restart slurmctld.service
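From the new compute node you can verify that it reaches the controller:
scontrol ping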
Check the node's slurmd status with sview:
sview
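If you prefer the command line over sview, sinfo gives the same node overview:
sinfo -N -l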
sudo apt -y install nfs-common
sudo mkdir -p /nfs/data
Edit /etc/fstab and add the NFS mount (adjust the server address and export path to your NFS server):
192.168.0.113:/data /nfs/data nfs auto,noatime,nolock,bg,nfsvers=4,intr,tcp,actimeo=1800,rsize=8192,wsize=8192 0 0
sudo mount /nfs/data
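A quick check that the share is really mounted:
df -h /nfs/data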
Render your first POV-Ray animation:
koppi@x200:~/data/demo-povay-slurm$ ./run.sh -s sphere
submitting job sphere.pov with 300 frames
executing: sbatch --hint=compute_bound -n 1 -J povray -p debug -t 8:00:00 -O -J sphere -a 0-300 povray.sbatch sphere 300 '+A0.01 -J +W1280 +H720'
* created povray job 33237 in /home/koppi/data/demo-povay-slurm/sphere-33237
executing: sbatch --hint=compute_bound -n 1 -J povray -p debug -t 8:00:00 --job-name=ffmpeg --depend=afterok:33237 -D sphere-33237 sphere-33237/ffmpeg.sbatch
* created ffmpeg job 33238 for /home/koppi/data/demo-povay-slurm/sphere-33237
done
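Besides squeue, sacct shows the per-task state of the array job (assuming job accounting is enabled in your slurm.conf; the job id is the one printed by run.sh above):
sacct -j 33237 --format=JobID,State,Elapsed,NodeList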
Watch the job queue:
$ watch squeue
JOBID           PARTITION  NAME    USER   ST  TIME  NODES  NODELIST(REASON)
33237_[44-300]  debug      sphere  koppi  PD  0:00  1      (Resources)
33238           debug      ffmpeg  koppi  PD  0:00  1      (Dependency)
33237_43        debug      sphere  koppi  R   0:03  1      dell
33237_42        debug      sphere  koppi  R   0:04  1      x220
33237_41        debug      sphere  koppi  R   0:05  1      x200
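When the array job 33237 has finished, the dependent ffmpeg job 33238 starts and encodes the rendered frames inside the job directory. The exact output filename depends on ffmpeg.sbatch, so simply list the directory to find the resulting video:
ls -lh sphere-33237/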