Commit f23eb8c

update README and documentation
1 parent 998a1fe commit f23eb8c

6 files changed: +383 -188 lines changed
README.md

+237
@@ -0,0 +1,237 @@
# Repository of useful Slurm tools

## Setup and Installation

```bash
git clone ....
cd slurm-scripts
```

## `gnodes`

`gnodes` is a shell script that reports Slurm node information.

- It provides options to filter the output by node name, partition, and state.
- It retrieves node information from Slurm and displays it in a formatted table that includes node name, state, CPUs, GPUs, memory, and partition.

The output uses the following symbols for cores and GPUs:

```
Symbol | Meaning
.      | Available core
_      | Allocated core
O      | Loaded core
!      | Load is significantly higher than allocated core count
?      | Load is unknown

Symbol | Meaning
*      | Unallocated GPU
G      | Allocated GPU
```

```bash
gnodes -p himem

+- himem - 38 cores & 220GB & max time 7-00:00:00 ------+-------------------------------------------------------+-------------------------------------------------------+
| node100 11G ......!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node51 6G ............_______OOOOOOOOOOOOOOOOOOO | node70 6G .......______OOOOOOOOOOOOOOOOOOOOOOOOO |
| node101 0G _______OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node52 12G ..............OOOOOOOOOOOOOOOOOOOOOOOO | node71 34G .............OOOOOOOOOOOOOOOOOOOOOOOOO |
| node102 6G .___OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node53 7G ......._____OOOOOOOOOOOOOOOOOOOOOOOOOO | node72 10G ..........________OOOOOOOOOOOOOOOOOOOO |
| node103 11G .........!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node54 35G ....________________________________OO | node73 4G ............______OOOOOOOOOOOOOOOOOOOO |
| node104 4G .........._______OOOOOOOOOOOOOOOOOOOOO | node55 4G ......!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node74 4G ..!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| node105 11G .OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node56 5G ........OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node75 70G ................OOOOOOOOOOOOOOOOOOOOOO |
| node106 12G ........___OOOOOOOOOOOOOOOOOOOOOOOOOOO | node57 13G .......________OOOOOOOOOOOOOOOOOOOOOOO | node76 12G .......OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO |
| node107 15G ....!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node58 14G ........OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node77 12G ........___OOOOOOOOOOOOOOOOOOOOOOOOOOO |
| node108 7G ...OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node59 10G ....____________OOOOOOOOOOOOOOOOOOOOOO | node79 10G .....________OOOOOOOOOOOOOOOOOOOOOOOOO |
| node109 13G ..........OOOOOOOOOOOOOOOOOOOOOOOOOOOO | node60 6G .________OOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node80 53G .............___________OOOOOOOOOOOOOO |
| node110 7G ..........!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node61 6G ..OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node81 7G .......__OOOOOOOOOOOOOOOOOOOOOOOOOOOOO |
| node111 12G ........._________OOOOOOOOOOOOOOOOOOOO | node62 6G ...._______________________________OOO | node82 12G ..!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| node112 9G .......OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node63 13G ................___OOOOOOOOOOOOOOOOOOO | node83 11G ..................OOOOOOOOOOOOOOOOOOOO |
| node113 0G ___________________OOOOOOOOOOOOOOOOOOO | node64 11G ..____________OOOOOOOOOOOOOOOOOOOOOOOO | node84 13G ......___OOOOOOOOOOOOOOOOOOOOOOOOOOOOO |
| node141 6G .............!!!!!!!!!!!!!!!!!!!!!!!!! | node65 12G .____________OOOOOOOOOOOOOOOOOOOOOOOOO | node92 62G .............._____OOOOOOOOOOOOOOOOOOO |
| node47 0G _______________OOOOOOOOOOOOOOOOOOOOOOO | node66 8G ...........!!!!!!!!!!!!!!!!!!!!!!!!!!! | node93 6G ....!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! |
| node48 0G !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! | node67 15G ..........._________OOOOOOOOOOOOOOOOOO | node94 13G ......OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO |
| node49 7G ....._____OOOOOOOOOOOOOOOOOOOOOOOOOOOO | node68 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node95 6G ......OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO |
| node50 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node69 11G .OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | |
+-------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+

+- himem - 38 cores & 354GB & max time 7-00:00:00 ------+-------------------------------------------------------+-------------------------------------------------------+
| node118 175G ........................____OOOOOOOOOO | node123 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node128 266G ...............................______O |
| node119 0G ______________OOOOOOOOOOOOOOOOOOOOOOOO | node124 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO | node129 0G _______OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO |
| node120 6G ._____________________________OOOOOOOO | node125 342G ..............................___OOOOO | node131 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO |
| node121 0G _____________OOOOOOOOOOOOOOOOOOOOOOOOO | node126 354G ...................................... | node140 0G OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO |
+-------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+

+- himem - 94 cores & 1476GB & max time 7-00:00:00 -------------------------------------------------------------+
| node130 860G .............._______________________________________________________________________________O |
+---------------------------------------------------------------------------------------------------------------+
```
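
The symbol legend above can be read as a small rendering rule. The following is a minimal sketch of that rule, not the code `gnodes` actually uses: the function names `repeat_char` and `render_cores`, the integer inputs, and the `slack` threshold that decides when load is "significantly higher" are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; names and thresholds are assumptions, not gnodes internals.

# repeat_char CHAR COUNT: print CHAR COUNT times (prints nothing when COUNT <= 0).
repeat_char() {
    local i
    for ((i = 0; i < $2; i++)); do
        printf '%s' "$1"
    done
}

# render_cores TOTAL ALLOCATED [LOAD]: draw one symbol per core.
# LOAD is the node load rounded to an integer; omit it when the load is unknown.
render_cores() {
    local total=$1 alloc=$2 load=${3-}
    local slack=2   # assumed tolerance before load counts as "significantly higher"

    repeat_char '.' $((total - alloc))          # available cores
    if [[ -z $load ]]; then
        repeat_char '?' "$alloc"                # load is unknown
    elif ((load > alloc + slack)); then
        repeat_char '!' "$alloc"                # load well above the allocation
    else
        local busy=$((load < alloc ? load : alloc))
        repeat_char '_' $((alloc - busy))       # allocated but idle
        repeat_char 'O' "$busy"                 # allocated and loaded
    fi
    printf '\n'
}

render_cores 38 26 19   # 12 available, 7 allocated-but-idle, 19 loaded
```

With these assumed inputs the last call draws the same pattern as the `node51` row in the sample output. How the real script derives load and which threshold it applies is not stated in this README.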

## `acctusage`

`acctusage` is a shell script that reports Slurm account usage by user or account.

- It provides options to specify users, project accounts, and a start and end date/time.
- It retrieves account usage information from Slurm and displays it in a formatted table with login, account, cluster, resource, and usage columns.
- CPU and GPU usage is reported in minutes.
- It is run from the command line and has a help option that displays usage information.

```bash
❯ acctusage
--------------------------------------------------------------------
Slurm account usage from 2024-05-06 to 2024-05-13
--------------------------------------------------------------------
User      Account         Cluster   Resource       Usage
--------- --------------- --------- -------------- -----------------
t119797u+ bhklab          h4huhn    cpu            155
t119797u+ bhklab          h4huhn    gres/gpu       0
t119797u+ bhklab          h4huhn    mem            39744
t119797u+ bhklab          h4huhn    billing        155
--------------------------------------------------------------------
*CPU/GPU usage in minutes
```

### Usage

Run the `acctusage` script from the command line with the following syntax:

```
acctusage [-h] [-u <users>] [-a <accounts>] [-s <startdate>] [-e <enddate>]
```

The options are:

- `-h`: Display help
- `-u`: Specify users
  - When only the users option is used, the default value is the current user
  - When only the accounts option is used, the default value is all users in the accounts
  - Multiple users can be specified, separated by commas
- `-a`: Specify project accounts
  - When only the users option is used, the default value is all accounts
  - When only the accounts option is used, the default value is all users in the accounts
  - Multiple accounts can be specified, separated by commas
- `-s`: Specify start date/time
  - The default value is 7 days ago
- `-e`: Specify end date/time
  - The default value is the current date/time
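
The option syntax above maps naturally onto Bash's `getopts` builtin. The sketch below shows one way such parsing and defaulting could look; it is not the actual `acctusage` source. The function name `parse_acct_opts`, the `ALL` placeholder, the user names `alice` and `bob`, and the reading of the defaults (current user when neither `-u` nor `-a` is given) are assumptions, and `date -d` requires GNU `date`.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the documented acctusage options, not the script's real code.

parse_acct_opts() {
    local OPTIND=1 opt users="" accounts="" start="" end=""
    while getopts ':hu:a:s:e:' opt; do
        case $opt in
            h) echo 'usage: acctusage [-h] [-u <users>] [-a <accounts>] [-s <startdate>] [-e <enddate>]'
               return 0 ;;
            u) users=$OPTARG ;;
            a) accounts=$OPTARG ;;
            s) start=$OPTARG ;;
            e) end=$OPTARG ;;
            *) echo "unknown or incomplete option: -$OPTARG" >&2
               return 1 ;;
        esac
    done

    # Assumed reading of the defaults: with neither -u nor -a, report the current user.
    if [[ -z $users && -z $accounts ]]; then
        users=$(id -un)
    fi
    : "${start:=$(date -d '7 days ago' +%F)}"   # default: 7 days ago (GNU date syntax)
    : "${end:=$(date +%FT%T)}"                  # default: current date/time

    printf 'users=%s accounts=%s start=%s end=%s\n' \
        "${users:-ALL}" "${accounts:-ALL}" "$start" "$end"
}

# Hypothetical users; the account name is taken from the sample output.
parse_acct_opts -u alice,bob -a bhklab -s 2024-05-06 -e 2024-05-13
```

Declaring `OPTIND` as a local variable lets the function be called more than once in the same shell without carrying parser state between calls.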


## `cqueue`

The `cqueue` script is a utility that displays job queue information.

- It provides options to filter the output by partition and user.
- It uses the `squeue` command to retrieve the job queue information.
- The output includes job ID, user, job name, partition, state, elapsed time, and node list (or the reason a job is pending).

```bash
❯ cqueue
-----------------------------------------------------------------------------------------
Job ID       User       Job Name     Partition  State    Elapsed     Nodelist(Reason)
------------ ---------- ------------ ---------- -------- ----------- --------------------
11716027     t119797uhn /cluster/pro all        PENDING  0:00        (BeginTime)
11716028     t119797uhn /cluster/hom all        PENDING  0:00        (BeginTime)
```

### Usage

Run the `cqueue` script from the command line with the following syntax:

```
cqueue [-h] [-p <partitions>] [-u <users>]
```

The options are:

- `-h`: Display help
- `-p`: Specify partitions
  - When only the partitions option is used, the default value is all partitions
  - When only the users option is used, the default value is all users in the partitions
  - Multiple partitions can be specified, separated by commas
- `-u`: Specify users
  - When only the users option is used, the default value is the current user
  - When only the partitions option is used, the default value is all partitions
  - Multiple users can be specified, separated by commas
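
Since `cqueue` is described as a wrapper around `squeue`, its filters could be forwarded roughly as follows. This is a sketch under assumptions, not the script's real code: `build_squeue_args` is a hypothetical helper, `alice` and `bob` are made-up users, and the column widths are guesses chosen to resemble the sample output. The format fields themselves are standard `squeue` ones (`%i` job ID, `%u` user, `%j` job name, `%P` partition, `%T` state, `%M` elapsed time, `%R` node list or reason).

```bash
#!/usr/bin/env bash
# Sketch only: build_squeue_args is a hypothetical helper, not part of cqueue.

# build_squeue_args PARTITIONS USERS: print one squeue argument per line.
# Either value may be empty, in which case that filter is left out.
build_squeue_args() {
    local partitions=$1 users=$2
    # Column widths are assumptions chosen to resemble the sample output.
    local -a args=(--format='%12i %10u %12j %10P %8T %11M %R')
    if [[ -n $partitions ]]; then
        args+=(--partition="$partitions")   # comma-separated list
    fi
    if [[ -n $users ]]; then
        args+=(--user="$users")             # comma-separated list
    fi
    printf '%s\n' "${args[@]}"
}

# On a cluster, the arguments could then be handed to squeue, for example:
#   mapfile -t args < <(build_squeue_args all "$USER")
#   squeue "${args[@]}"
build_squeue_args all alice,bob
```

Keeping the arguments in an array avoids word-splitting problems with the format string, which contains spaces.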


## `jobhist`

The `jobhist` script is a utility that displays job history information.

- It provides options to filter the output by partition, user, and time range.
- It uses the `sacct` command to retrieve the job history information.
- The output includes start date, job ID, job name, partition, state, elapsed time, and the number of nodes, CPUs, memory, and GPUs.

```bash
❯ jobhist
---------------------------------------------------------------------------------------------------------
Startdate           Job ID        Job Name     Partition  State      Elapsed     Nodes CPUs Memory   GPUs
------------------- ------------- ------------ ---------- ---------- ----------- ----- ---- -------- ----
2024-05-12T17:01:59 11716028      /cluster/ho+ all        REQUEUED   00:04:27    1     1    256M     0
2024-05-13T09:16:55 11716027      /cluster/pr+ all        REQUEUED   00:00:02    1     1    256M     0
```

### Usage

Run the `jobhist` script from the command line with the following syntax:

```
jobhist [-h] [-p <partitions>] [-u <users>] [-s <startdate>] [-e <enddate>]
```

The options are:

- `-h`: Display help
- `-p`: Specify partitions
  - When only the partitions option is used, the default value is all partitions
  - When only the users option is used, the default value is all users in the partitions
  - Multiple partitions can be specified, separated by commas
- `-u`: Specify users
- `-s`: Specify start date/time
- `-e`: Specify end date/time
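
One way to obtain a GPU count like the one in the sample output's GPUs column is to parse a TRES string such as the one `sacct` reports in its `AllocTRES` field. The helper below is a hypothetical sketch of that step, not the code `jobhist` uses; the name `gpus_from_tres` and the sample strings are made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: gpus_from_tres is a hypothetical helper, not part of jobhist.

# gpus_from_tres TRES: print the generic GPU count from a comma-separated
# TRES string such as "billing=1,cpu=1,gres/gpu=2,mem=256M,node=1".
# Prints 0 when the string has no gres/gpu entry.
gpus_from_tres() {
    local tres=$1 item
    local -a items
    IFS=',' read -ra items <<< "$tres"
    for item in "${items[@]}"; do
        # Match only the generic entry, not typed ones such as gres/gpu:v100=2.
        if [[ $item == gres/gpu=* ]]; then
            printf '%s\n' "${item#gres/gpu=}"
            return 0
        fi
    done
    printf '0\n'
}

gpus_from_tres 'billing=1,cpu=1,gres/gpu=2,mem=256M,node=1'
gpus_from_tres 'billing=1,cpu=1,mem=256M,node=1'
```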


## `jobinfo`

The `jobinfo` script is a utility that displays detailed job information.

- It takes the ID of the job to report on as its argument.
- It uses the `scontrol` command to retrieve the job information.
- The output includes the job's ID, name, user, and account; its cluster, partition, and nodes; its state and exit code; its submit, start, and end times; and its CPU, memory, and disk usage.

```bash
Job ID : 11716028
Job name : /cluster/home/t119797uhn/.local/bin/ncdu
User : t119797uhn
Account : bhklab
Working directory : /cluster/home/t119797uhn
Cluster : h4huhn
Partition : all
Nodes : 1
Nodelist : None assigned
Tasks : --
CPUs : 0
GPUs : --
State : PENDING (BeginTime)
Exit code : --
Submit time : 2024-05-12T17:06:26
Start time : --
End time : --
Wait time : --
Reserved walltime : 1-00:00:00
Used walltime : --
Used CPU walltime : --
Used CPU time : --
% User (computation) : --
% System (I/O) : --
CPU efficiency : --
Reserved memory : 256M
Max memory used : --
Memory efficiency : --
Max disk write : --
Max disk read : --
```

### Usage

Run the `jobinfo` script from the command line with the following syntax:

```
jobinfo <job_id>
```
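
Several fields in the sample output, such as `Reserved walltime : 1-00:00:00`, use Slurm's `[days-]hours:minutes:seconds` notation. Working with these values numerically usually means converting them to seconds first. The helper below is a sketch of that conversion, not the `jobinfo` implementation: `slurm_time_to_seconds` is a hypothetical name, and only the `D-HH:MM:SS`, `HH:MM:SS`, and `MM:SS` forms are handled (no fractional seconds).

```bash
#!/usr/bin/env bash
# Sketch only: slurm_time_to_seconds is a hypothetical helper, not part of jobinfo.

# slurm_time_to_seconds TIME: convert D-HH:MM:SS, HH:MM:SS, or MM:SS to seconds.
slurm_time_to_seconds() {
    local t=$1 days=0 hours=0 minutes seconds
    local -a parts

    # Split off the optional "days-" prefix.
    if [[ $t == *-* ]]; then
        days=${t%%-*}
        t=${t#*-}
    fi

    IFS=':' read -ra parts <<< "$t"
    case ${#parts[@]} in
        3) hours=${parts[0]} minutes=${parts[1]} seconds=${parts[2]} ;;
        2) minutes=${parts[0]} seconds=${parts[1]} ;;
        *) echo "unsupported time format: $1" >&2
           return 1 ;;
    esac

    # The 10# prefix forces base 10 so that "08" and "09" are not parsed as octal.
    echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$minutes * 60 + 10#$seconds ))
}

slurm_time_to_seconds 1-00:00:00   # reserved walltime from the sample output
```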