| 1 | +# Repository of useful Slurm tools |
| 2 | + |
| 3 | +## Setup and Installation |
| 4 | + |
| 5 | +```bash |
| 6 | +git clone .... |
| 7 | +cd slurm-scripts |
| 8 | +``` |
| 9 | + |
| 10 | + |
| 11 | +## `gnodes` |
| 12 | + |
| 13 | +The `gnodes` app is a shell script that reports Slurm node information. |
| 14 | + |
| 15 | +- It provides options to filter the output based on node names, partitions, and states. |
| 16 | +- The script retrieves node information from Slurm and displays it in a formatted table, including node name, state, CPUs, GPUs, memory, and partition. |
| 17 | +``` |
| 18 | +Symbol | Meaning |
| 19 | + . | Available core |
| 20 | + _ | Allocated core |
| 21 | + O | Loaded core |
| 22 | + ! | Load is significantly higher than allocated core count |
| 23 | + ? | Load is unknown |
| 24 | + |
| 25 | +Symbol | Meaning |
| 26 | + * | Unallocated GPU |
| 27 | + G | Allocated GPU |
| 28 | +``` |
| 29 | + |
| 30 | +```bash |
| 31 | +gnodes -p himem |
| 32 | + |
| 33 | ++- himem - 38 cores & 220GB & max time 7-00:00:00 ------+-------------------------------------------------------+-------------------------------------------------------+ |
| 34 | +| node100 11G ......!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node51 6G ............_______OOOOOOOOOOOOOOOOOOO | node70 6G .......______OOOOOOOOOOOOOOOOOOOOOOOOO | |
| 35 | +| node101 0G _______OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node52 12G ..............OOOOOOOOOOOOOOOOOOOOOOOO | node71 34G .............OOOOOOOOOOOOOOOOOOOOOOOOO | |
| 36 | +| node102 6G .___OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node53 7G ......._____OOOOOOOOOOOOOOOOOOOOOOOOOO | node72 10G ..........________OOOOOOOOOOOOOOOOOOOO | |
| 37 | +| node103 11G .........!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node54 35G ....________________________________OO | node73 4G ............______OOOOOOOOOOOOOOOOOOOO | |
| 38 | +| node104 4G .........._______OOOOOOOOOOOOOOOOOOOOO | node55 4G ......!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node74 4G ..!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | |
| 39 | +| node105 11G .OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node56 5G ........OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node75 70G ................OOOOOOOOOOOOOOOOOOOOOO | |
| 40 | +| node106 12G ........___OOOOOOOOOOOOOOOOOOOOOOOOOOO | node57 13G .......________OOOOOOOOOOOOOOOOOOOOOOO | node76 12G .......OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | |
| 41 | +| node107 15G ....!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node58 14G ........OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node77 12G ........___OOOOOOOOOOOOOOOOOOOOOOOOOOO | |
| 42 | +| node108 7G ...OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node59 10G ....____________OOOOOOOOOOOOOOOOOOOOOO | node79 10G .....________OOOOOOOOOOOOOOOOOOOOOOOOO | |
| 43 | +| node109 13G ..........OOOOOOOOOOOOOOOOOOOOOOOOOOOO | node60 6G .________OOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node80 53G .............___________OOOOOOOOOOOOOO | |
| 44 | +| node110 7G ..........!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node61 6G ..OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node81 7G .......__OOOOOOOOOOOOOOOOOOOOOOOOOOOOO | |
| 45 | +| node111 12G ........._________OOOOOOOOOOOOOOOOOOOO | node62 6G ...._______________________________OOO | node82 12G ..!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | |
| 46 | +| node112 9G .......OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node63 13G ................___OOOOOOOOOOOOOOOOOOO | node83 11G ..................OOOOOOOOOOOOOOOOOOOO | |
| 47 | +| node113 0G ___________________OOOOOOOOOOOOOOOOOOO | node64 11G ..____________OOOOOOOOOOOOOOOOOOOOOOOO | node84 13G ......___OOOOOOOOOOOOOOOOOOOOOOOOOOOOO | |
| 48 | +| node141 6G .............!!!!!!!!!!!!!!!!!!!!!!!!! | node65 12G .____________OOOOOOOOOOOOOOOOOOOOOOOOO | node92 62G .............._____OOOOOOOOOOOOOOOOOOO | |
| 49 | +| node47 0G _______________OOOOOOOOOOOOOOOOOOOOOOO | node66 8G ...........!!!!!!!!!!!!!!!!!!!!!!!!!!! | node93 6G ....!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | |
| 50 | +| node48 0G !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node67 15G ..........._________OOOOOOOOOOOOOOOOOO | node94 13G ......OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | |
| 51 | +| node49 7G ....._____OOOOOOOOOOOOOOOOOOOOOOOOOOOO | node68 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node95 6G ......OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | |
| 52 | +| node50 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node69 11G .OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | | |
| 53 | ++-------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+ |
| 54 | + |
| 55 | ++- himem - 38 cores & 354GB & max time 7-00:00:00 ------+-------------------------------------------------------+-------------------------------------------------------+ |
| 56 | +| node118 175G ........................____OOOOOOOOOO | node123 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node128 266G ...............................______O | |
| 57 | +| node119 0G ______________OOOOOOOOOOOOOOOOOOOOOOOO | node124 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node129 0G _______OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | |
| 58 | +| node120 6G ._____________________________OOOOOOOO | node125 342G ..............................___OOOOO | node131 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | |
| 59 | +| node121 0G _____________OOOOOOOOOOOOOOOOOOOOOOOOO | node126 354G ...................................... | node140 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | |
| 60 | ++-------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+ |
| 61 | + |
| 62 | ++- himem - 94 cores & 1476GB & max time 7-00:00:00 -------------------------------------------------------------+ |
| 63 | +| node130 860G .............._______________________________________________________________________________O | |
| 64 | ++---------------------------------------------------------------------------------------------------------------+ |
| 65 | +``` |
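
The legend can be read as a small rendering rule. The sketch below is not taken from `gnodes`; it is a minimal illustration, assuming that `!` replaces every allocated core once the load exceeds twice the allocated count, and that `?` fills the allocated cores when the load is unknown. The function names `repeat` and `render_bar` and the threshold are hypothetical.

```bash
#!/usr/bin/env bash
# Illustrative only: not the gnodes source.

# repeat CHAR N: print CHAR N times (prints nothing when N is 0).
repeat() {
  local char=$1 n=$2 i
  for ((i = 0; i < n; i++)); do printf '%s' "$char"; done
}

# render_bar TOTAL ALLOCATED LOAD
# LOAD is the node load rounded to an integer, or "?" when unknown.
render_bar() {
  local total=$1 alloc=$2 load=$3
  repeat . "$((total - alloc))"          # available cores
  if [[ $load == "?" ]]; then
    repeat '?' "$alloc"                  # load unknown (assumed layout)
  elif ((load > 2 * alloc)); then        # assumed "significantly higher" threshold
    repeat '!' "$alloc"
  else
    local busy=$((load < alloc ? load : alloc))
    repeat _ "$((alloc - busy))"         # allocated but idle
    repeat O "$busy"                     # allocated and loaded
  fi
  printf '\n'
}

render_bar 38 26 19
```

With 8 cores, 5 allocated, and a load of 3, `render_bar 8 5 3` prints `...__OOO`.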
| 66 | + |
| 67 | + |
| 68 | +## `acctusage` |
| 69 | + |
| 70 | +`acctusage` is a shell script that reports Slurm account usage by user or account. |
| 71 | +- It provides options to specify users, project accounts, a start date/time, and an end date/time. |
| 72 | +- The script retrieves account usage information from Slurm and displays it in a formatted table with user, account, cluster, resource, and usage columns. |
| 73 | +- CPU and GPU usage is reported in minutes. |
| 74 | +- The script is run from the command line and has a help option that displays usage information. |
| 75 | + |
| 76 | +```bash |
| 77 | +❯ acctusage |
| 78 | +-------------------------------------------------------------------- |
| 79 | +Slurm account usage from 2024-05-06 to 2024-05-13 |
| 80 | +-------------------------------------------------------------------- |
| 81 | + User Account Cluster Resource Usage |
| 82 | +--------- --------------- --------- -------------- ----------------- |
| 83 | +t119797u+ bhklab h4huhn cpu 155 |
| 84 | +t119797u+ bhklab h4huhn gres/gpu 0 |
| 85 | +t119797u+ bhklab h4huhn mem 39744 |
| 86 | +t119797u+ bhklab h4huhn billing 155 |
| 87 | +-------------------------------------------------------------------- |
| 88 | +*CPU/GPU usage in minutes |
| 89 | +``` |
| 90 | + |
| 91 | +### Usage |
| 92 | + |
| 93 | +To use the `acctusage` script, you can run it from the command line with the following syntax: |
| 94 | + |
| 95 | +``` |
| 96 | +acctusage [-h] [-u <users>] [-a <accounts>] [-s <startdate>] [-e <enddate>] |
| 97 | +``` |
| 98 | + |
| 99 | +Here are the options and their descriptions: |
| 100 | + |
| 101 | +- `-h`: Display help |
| 102 | +- `-u`: Specify users |
| 103 | + - When only `-u` is used, the default is the current user |
| 104 | + - When only `-a` is used, the default is all users in the given accounts |
| 105 | + - Separate multiple users with commas |
| 106 | +- `-a`: Specify project accounts |
| 107 | + - When only `-u` is used, the default is all accounts |
| 108 | + - When only `-a` is used, the default is all users in the given accounts |
| 109 | + - Separate multiple accounts with commas |
| 110 | +- `-s`: Specify start date/time |
| 111 | + - Default value is 7 days ago |
| 112 | +- `-e`: Specify end date/time |
| 113 | + - Default value is current date/time |
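
The options above map naturally onto a Slurm accounting query. The README does not say which Slurm command `acctusage` calls, so the following is only a sketch, assuming `sreport cluster AccountUtilizationByUser` as the data source and GNU `date` for the 7-day default. The function name `build_usage_cmd` and the user name `alice` are hypothetical; the function prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Sketch only: assumes sreport is the data source, which the README does not state.

# build_usage_cmd USERS ACCOUNTS [START] [END]
# Empty USERS or ACCOUNTS means "no filter on that field".
build_usage_cmd() {
  local users=$1 accounts=$2
  local start=${3:-$(date -d '7 days ago' +%F)}   # default: 7 days ago (GNU date)
  local end=${4:-$(date +%FT%T)}                  # default: current date/time
  local cmd=(sreport -t minutes -T cpu,gres/gpu,mem,billing
             cluster AccountUtilizationByUser
             "start=$start" "end=$end")
  if [[ -n $users ]]; then cmd+=("users=$users"); fi
  if [[ -n $accounts ]]; then cmd+=("accounts=$accounts"); fi
  echo "${cmd[*]}"
}

# Show the command for one account over the default window.
build_usage_cmd "" bhklab
```

Printing the command keeps the sketch usable on machines without Slurm; on a cluster, the same array could be executed directly.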
| 114 | + |
| 115 | + |
| 116 | +## `cqueue` |
| 117 | + |
| 118 | +The `cqueue` script is a utility that displays job queue information. |
| 119 | + |
| 120 | +- It provides options to filter the output based on partitions and users. |
| 121 | +- The script uses the `squeue` command to retrieve the job queue information. |
| 122 | +- The output includes job ID, user, job name, partition, state, elapsed time, and node list (or the pending reason). |
| 123 | + |
| 124 | +```bash |
| 125 | +❯ cqueue |
| 126 | +----------------------------------------------------------------------------------------- |
| 127 | + Job ID User Job Name Partition State Elapsed Nodelist(Reason) |
| 128 | +------------ ---------- ------------ ---------- -------- ----------- -------------------- |
| 129 | +11716027 t119797uhn /cluster/pro all PENDING 0:00 (BeginTime) |
| 130 | +11716028 t119797uhn /cluster/hom all PENDING 0:00 (BeginTime) |
| 131 | +``` |
| 132 | + |
| 133 | +### Usage |
| 134 | + |
| 135 | +To use the `cqueue` script, you can run it from the command line with the following syntax: |
| 136 | + |
| 137 | +``` |
| 138 | +cqueue [-h] [-p <partitions>] [-u <users>] |
| 139 | +``` |
| 140 | + |
| 141 | +Here are the options and their descriptions: |
| 142 | + |
| 143 | +- `-h`: Display help |
| 144 | +- `-p`: Specify partitions |
| 145 | + - When only `-p` is used, the default is all partitions |
| 146 | + - When only `-u` is used, the default is all users in partitions |
| 147 | + - Separate multiple partitions with commas |
| 148 | +- `-u`: Specify users |
| 149 | + - When only `-u` is used, the default is the current user |
| 150 | + - When only `-p` is used, the default is all partitions |
| 151 | + - Separate multiple users with commas |
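
Since `cqueue` is described as a wrapper around `squeue`, the filters can be pictured as `squeue` arguments. This is a sketch under assumptions, not the script's actual code: the format string is a guess that approximates the columns shown above, the fallback to the current user when no filter is given is assumed, and `queue_args` and the user names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: the real cqueue may call squeue differently.

# queue_args PARTITIONS USERS
# Prints one squeue argument per line; empty strings mean "not given".
queue_args() {
  local partitions=$1 users=$2
  # Assumption: with no filter at all, fall back to the current user.
  if [[ -z $partitions && -z $users ]]; then
    users=$(id -un)
  fi
  # Job ID, user, job name, partition, state, elapsed time, nodelist(reason).
  local args=(-o '%.12i %.10u %.12j %.10P %.8T %.11M %R')
  if [[ -n $partitions ]]; then args+=(-p "$partitions"); fi
  if [[ -n $users ]]; then args+=(-u "$users"); fi
  printf '%s\n' "${args[@]}"
}

# Run squeue only where it is installed; elsewhere just show the arguments.
mapfile -t args < <(queue_args all "")
if command -v squeue >/dev/null 2>&1; then
  squeue "${args[@]}"
else
  printf 'squeue'; printf ' %q' "${args[@]}"; printf '\n'
fi
```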
| 152 | + |
| 153 | + |
| 154 | +## `jobhist` |
| 155 | + |
| 156 | +The `jobhist` script is a utility that displays job history information. |
| 157 | + |
| 158 | +- It provides options to filter the output based on partitions, users, and time range. |
| 159 | +- The script uses the `sacct` command to retrieve the job history information. |
| 160 | +- The output includes start date, job ID, job name, partition, state, elapsed time, nodes, CPUs, memory, and GPUs. |
| 161 | + |
| 162 | +```bash |
| 163 | +❯ jobhist |
| 164 | +--------------------------------------------------------------------------------------------------------- |
| 165 | + Startdate Job ID Job Name Partition State Elapsed Nodes CPUs Memory GPUs |
| 166 | +------------------- ------------- ------------ ---------- ---------- ----------- ----- ---- -------- ---- |
| 167 | +2024-05-12T17:01:59 11716028 /cluster/ho+ all REQUEUED 00:04:27 1 1 256M 0 |
| 168 | +2024-05-13T09:16:55 11716027 /cluster/pr+ all REQUEUED 00:00:02 1 1 256M 0 |
| 169 | +``` |
| 170 | + |
| 171 | +### Usage |
| 172 | + |
| 173 | +To use the `jobhist` script, you can run it from the command line with the following syntax: |
| 174 | + |
| 175 | +``` |
| 176 | +jobhist [-h] [-p <partitions>] [-u <users>] [-s <startdate>] [-e <enddate>] |
| 177 | +``` |
| 178 | + |
| 179 | +Here are the options and their descriptions: |
| 180 | + |
| 181 | +- `-h`: Display help |
| 182 | +- `-p`: Specify partitions |
| 183 | + - When only `-p` is used, the default is all partitions |
| 184 | + - When only `-u` is used, the default is all users in partitions |
| 185 | + - Separate multiple partitions with commas |
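
The syntax line also lists `-u`, `-s`, and `-e`. As a rough picture of how all four filters could be passed to `sacct`, here is a sketch that only prints the command. It is not the `jobhist` source: the field list approximates the columns shown above (GPU counts are left out because `sacct` has no dedicated GPU field), no defaults are assumed, and `hist_cmd` and `alice` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: the real jobhist may call sacct differently.

# hist_cmd PARTITIONS USERS START END
# Empty strings mean "not given"; sacct's own defaults then apply.
hist_cmd() {
  local partitions=$1 users=$2 start=$3 end=$4
  local cmd=(sacct -X
             --format=Start,JobID,JobName,Partition,State,Elapsed,NNodes,AllocCPUS,ReqMem)
  if [[ -n $partitions ]]; then cmd+=(-r "$partitions"); fi   # partition filter
  if [[ -n $users ]]; then cmd+=(-u "$users"); fi             # user filter
  if [[ -n $start ]]; then cmd+=(-S "$start"); fi             # start of time range
  if [[ -n $end ]]; then cmd+=(-E "$end"); fi                 # end of time range
  echo "${cmd[*]}"
}

hist_cmd all "" 2024-05-12 2024-05-13
```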
| 186 | + |
| 187 | + |
| 188 | +## `jobinfo` |
| 189 | + |
| 190 | +The `jobinfo` script is a utility that displays detailed job information. |
| 191 | + |
| 192 | +- It takes a job ID as its only argument. |
| 193 | +- The script uses the `scontrol` command to retrieve the job information. |
| 194 | +- The output covers job identity (ID, name, user, account), placement (cluster, partition, nodes), state, timing, and CPU, memory, and disk usage. |
| 195 | + |
| 196 | +``` |
| 197 | +Job ID : 11716028 |
| 198 | +Job name : /cluster/home/t119797uhn/.local/bin/ncdu |
| 199 | +User : t119797uhn |
| 200 | +Account : bhklab |
| 201 | +Working directory : /cluster/home/t119797uhn |
| 202 | +Cluster : h4huhn |
| 203 | +Partition : all |
| 204 | +Nodes : 1 |
| 205 | +Nodelist : None assigned |
| 206 | +Tasks : -- |
| 207 | +CPUs : 0 |
| 208 | +GPUs : -- |
| 209 | +State : PENDING (BeginTime) |
| 210 | +Exit code : -- |
| 211 | +Submit time : 2024-05-12T17:06:26 |
| 212 | +Start time : -- |
| 213 | +End time : -- |
| 214 | +Wait time : -- |
| 215 | +Reserved walltime : 1-00:00:00 |
| 216 | +Used walltime : -- |
| 217 | +Used CPU walltime : -- |
| 218 | +Used CPU time : -- |
| 219 | +% User (computation) : -- |
| 220 | +% System (I/O) : -- |
| 221 | +CPU efficiency : -- |
| 222 | +Reserved memory : 256M |
| 223 | +Max memory used : -- |
| 224 | +Memory efficiency : -- |
| 225 | +Max disk write : -- |
| 226 | +Max disk read : -- |
| 227 | +``` |
| 228 | + |
| 229 | +### Usage |
| 230 | + |
| 231 | +To use the `jobinfo` script, you can run it from the command line with the following syntax: |
| 232 | + |
| 233 | +``` |
| 234 | +jobinfo <job_id> |
| 235 | +``` |
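
For scripting, a single field is often enough. The sketch below shows one way to pull a value out of `scontrol show job -o <job_id>` output, which prints `Key=Value` pairs on one line. It is a hypothetical helper (`job_field`), not part of `jobinfo`, and it assumes the value contains no spaces; the sample line is abbreviated and uses values from the example above.

```bash
#!/usr/bin/env bash
# Sketch only: a hypothetical helper, not part of jobinfo.

# job_field KEY
# Reads one-line `scontrol show job -o` output on stdin and prints KEY's value.
# Limitation: values that contain spaces are cut at the first space.
job_field() {
  local key=$1
  tr ' ' '\n' |
    awk -F= -v k="$key" '$1 == k { print substr($0, length(k) + 2); exit }'
}

# Abbreviated sample in the shape of scontrol's one-line output.
sample='JobId=11716028 Partition=all JobState=PENDING Reason=BeginTime'
printf '%s\n' "$sample" | job_field JobState

# On a cluster: scontrol show job -o "$job_id" | job_field JobState
```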
| 236 | + |
| 237 | + |