-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathJobRunner_Primer.html
732 lines (626 loc) · 31.5 KB
/
JobRunner_Primer.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="Content-Type">
<title>JobRunner Primer</title>
<style type="text/css">
body {counter-reset:section;}
h1 {counter-reset:subsection;}
h1:before {counter-increment:section; content: counter(section) ". ";}
h2 {counter-reset:subsubsection;}
h2:before {counter-increment:subsection;content:counter(section) "." counter(subsection) ". ";
}
h3:before{counter-increment:subsubsection;content:counter(section) "." counter(subsection) "." counter(subsubsection) ". ";
}
a {text-decoration:none;}
pre { border-style: solid; border-width: 1px; margin-left: 5px; padding: 2px 2px 2px 2px; background-color: #DDDDDD;
/* from: http://forums.techguy.org/web-design-development/249849-forcing-long-text-lines-wrap.html */
white-space: -moz-pre-wrap; /* Mozilla, supported since 1999 */
white-space: -pre-wrap; /* Opera 4 - 6 */
white-space: -o-pre-wrap; /* Opera 7 */
white-space: pre-wrap; /* CSS3 - Text module (Candidate Recommendation) http://www.w3.org/TR/css3-text/#white-space */
word-wrap: break-word; /* IE 5.5+ */
}
</style>
</head>
<body>
<div style="text-align: center;"><big><big><big>JobRunner Primer</big></big></big><br>
</div>
<div style="text-align: center;"><small>Last modified: March 7th, 2013</small><br>
</div>
<br>
<!---------------------------------------->
<h1>Description</h1>
<a href="#JobRunner"><b>JobRunner</b></a> is a tool designed to <i>run</i> a <i>job</i> (ie to
execute a command line or script) that does not require user
interaction, and stores the
program's output (standard error and standard out) in an easy to read
log file. The tool is able to save a <i>configuration file</i> of the
<i>job</i> to be run, that can be used by <b>JobRunner_Caller</b> as a
light-weight queueing system.<br>
In addition to describing the tool's <a
href="#JR_procorder">Processing order</a>, a few <a
href="#JR_examples">examples of use</a> are available, as well a <a
href="#JR_nfs">special note related to use over NFS shared directories</a>.
<br>
<a href="#JobRunner_Caller"><b>JobRunner_Caller</b></a> is a tool designed to <i>call</i>
<b>JobRunner</b>'s <i>configuration files</i>, acting as a <i>queue
processor</i>.<br>
In addition to describing the tool's <a
href="#JRC_procorder">Processing order</a>, a few <a
href="#JRC_examples">examples of use</a> are available
<br>
The tools were designed to work using directory lock mechanism, instead
of relying on a central networked server to dispatch jobs, aggregate
logs and collect job status information.
<!---------------------------------------->
<a name="JobRunner">
<h1>JobRunner</h1>
</a>
<a href="JobRunner-usage.html"><b>JobRunner</b>'s command line options</a> can be obtained running:
<pre>JobRunner --help</pre>
<!---------------------------------------->
<h2>Simple Job</h2>
In its simplest form, the tool is run either as:
<pre>
JobRunner --lockdir ldir --name jobid -- commandline_to_run
</pre>
or
<pre>
JobRunner --lockdir ldir --name jobid --executable script
</pre>
The first form uses the information on the commandline as the <i>job</i>
<i>commandline_to_run</i>.
The second form uses the <i>executable</i> <code>script</code> as the <i>job</i> <i>command</i>.
<!---------------------------------------->
<a name="JR_procorder">
<h2>Job processing</h2>
</a>
In order to understand how the tool works, one must understand its use
of <i>Lock directories</i>.
Those allow for <i>jobs</i> to only be run once and allow
the user to know the exit status of one's <i>jobs</i> by simply looking at
the directory name.
<br>
They are placed in the
<code>--lockdir</code> directory and are composed of the
<code>--name</code>'s reformatted <code>jobid</code> (any character
that is not <code>a</code> to <code>z</code>, <code>0</code> to <code>9</code>, <code>-</code> or <code>_</code> is replaced by <code>_</code>) followed by five <code>_</code> and
terminated by a <code><STATUS></code> information:
<pre>
ldir/jobid_____<STATUS>
</pre>
, where <code><STATUS></code> is one of:
<ul>
<li> <code>inprogress</code>; the <i>job</i> is currently <i>in
progress</i> but has not yet completed,
<li> <code>done</code>; the <i>job</i> has completed and the program exited
with an <i>ok</i> status (i.e. exited with status code 0),
<li> <code>bad</code>; the <i>job</i> is complete but its program exited
with a <i>non ok</i> statys (i.e. exited with a status code
different from 0),
<li> <code>skip</code>; a special mode (user created dir) in which the <i>job</i> will be
bypassed entirely.
</ul>
When a <b>JobRunner</b> command line is executed:
<ol>
<li>If used, confirm that the <code>mutexTool</code> and <code>MutexLockDir</code> are
proper, and obtain the mutex if able. If not, it means that somebody
else is doing this job, and a skip is forced.
<li>If any, do <i>job</i> pre-tasks. In order:
<ol>
<li>create the <code>preCreatedir</code> directory
<li>change directory to <code>dirChange</code>
<li>confirming that the <code>executable</code> and
<code>RunIfTrue</code> scripts are available
<li>check all requested <code>checkfile</code>s existence and
bypass the <i>job</i> if any does not exist
</ol>
<li>Insure that the <i>job</i> is available to be run by checking its
<i>lock directory</i>:
<ol>
<li>Check that a given <i>job</i> does not have multiple <i>lock directories</i>, and if
that is the case exit in error
<li>bypass <i>jobs</i> to be <code>skip</code>-ped
<li>bypass completed but <code>bad</code> <i>jobs</i>, unless requested
to rerun them (<code>badErase</code>) in which case delete the
<code>bad</code> <i>lock directory</i>
<li>checking if a <code>done</code>'s <i>lock directory</i> log
file (default location is in directory, can be overriden using
<code>LogFile</code>) is present:
<ol>
<li>if not, consider as a new run, and delete the <code>done</code>
<i>lock directory</i>
<li>if it is present and none of the requested
<code>checkfile</code>s are more recent that the logfile, bypass
this <i>job</i> (and return with <code>SuccessReturnCode</code>
if used), otherwise delete the <code>done</code>
<i>lock directory</i>
</ol>
<li>bypass if already <code>inprogress</code>
</ol>
<li>check that all <code>RunIfTrue</code> scripts exit with the
<i>ok</i> status before accepting the <i>job</i> as runnable
<li>if <code>Onlycheck</code> was requested, exit with requested return value
<li>create the <code>inprogress</code> <i>run lock directory</i>
<li>if <code>goRunInLock</code> was selected, change directory to <i>run lock directory</i>
<li>if any, do the pre-<i>job</i> running task: creating the
<code>CreateDir</code> directories.
<li> if <code>GoRunInDir</code> is selected, change to the specified
directory.
<li><b>run the <i>job</i></b>
<li>if any <code>PostRunChangeDir</code> was requested go to the
requested directory
<li>depending on the <i>job</i> exit status:
<ul>
<li>for <i>ok</i> exit status (and no <i>SIGNALs</i> --unless ignored)
<ul>
<li>if requested, run the <code>DonePostRun</code> script (will
only print a warning message if it did not complete with an
<i>ok</i> status)
<li>rename the <i>lock directory</i> from
<code>inprogress</code> to <code>done</code>
<li>exit with the <i>ok</i> status unless
<code>SuccessReturnCode</code> was requested, in which case
return with that status.
</ul>
<li>for <i>non ok</i> exit status (or any <i>SIGNAL</i> --if not ignored)
<ul>
<li>if requested, run the <code>ErrorPostRun</code> script (will
only print a warning message if it did not complete with an
<i>ok</i> status)
<li>rename the <i>lock directory</i> from
<code>inprogress</code> to <code>bad</code>
<li>exit with the <i>non ok</i> status
</ul>
</ul>
</ul>
</ol>
<!---------------------------------------->
<a name="JR_examples">
<h2>A Series of Examples</h2>
</a>
<!---------------------------------------->
<h3>"Hello World !" (version 1)</h3>
After creating our <code>lockdir</code> (<code>JR_lockd</code>):
<pre>
% JobRunner --lockdir JR_lockd --name test -- echo \"hello world \!\"
[JobRunner] test -- Job succesfully completed
</pre>
Each line containing <code>[JobRunner]</code> is an output from the tool (default value can be changed using <code>toPrint</code>), allowing us to easily distinguish between code output and <i>JobRunner</i>'s output.
<b>JobRunner</b> informs us that the <i>job</i> named
<code>test</code> is <i>succesfully completed</i>.
This is easily confirmed by confirming the expected <i>lock
directory</i>'s existence (<code>JR_lockd/test_____done</code>) that
only contains the <code>logfile</code> whose content is as follow:
<pre>
[[COMMANDLINE]] echo "Hello World !"
[[RETURN CODE]] 0
[[SIGNAL]] 0
[[STDOUT]]
Hello World !
[[STDERR]]
</pre>
In it we can see that:
<ul>
<li>the <code>COMMANDLINE</code> ran was <code>echo "Hello World !"</code>
<li>the <code>RETURN CODE</code> of the command line execution was
<code>0</code>, the <i>ok</i> status
<li>no <code>SIGNAL</code> (<code>0</code>) was used on the command line execution
<li>the expected result of the command was printed on
<code>STDOUT</code>
<li>nothing was print on <code>STDERR</code>
</ul>
Note that since this was provided on the command line, it was
preferable to escape the <code>"</code> and <code>!</code> to be
certain the command line ran was as expected.
<br>
If we re-run the command again we are told the the <code>test</code>
<i>job</i> was <code>Previously succesfully completed</code>.
<pre>
% JobRunner --lockdir JR_lockd --name test -- echo \"hello world \!\"
[JobRunner] test : Previously successfully completed
</pre>
If we change the <code>jobid</code> to <code>test2</code>:
<pre>
% JobRunner --lockdir JR_lockd --name test2 -- echo \"Hello World \!\"
[JobRunner] test2 -- Job succesfully completed
</pre>
we are able to run the same program again, since it is a new
<code>jobid</code>, and we will now see in the <code>lockdir</code>
both the <code>test_____done</code> and <code>test2_____done</code> directories.
<!---------------------------------------->
<h3>"Hello World !" (version 2)</h3>
After trying to run a command line through <b>JobRunner</b>, let us try
the <code>executable</code> script version, for that we create a
<code>hello_world.bash</code> file containing:
<pre>
#!/bin/bash
echo "Hello World !"
touch hello_world_file
</pre>
and make it executable (<code>chmod +x hello_world.bash</code>).
The main difference between the first version and this version, in
addition to having what was once a command line an executable, is that
the script will in addition create a <code>hello_world_file</code>
file in the <i>current directory</i> where the executable is run.
<br>
Then we can run the program from the <i>current directory</i> (.) :
<pre>
% JobRunner --lockdir JR_lockd --name test3 --executable ./hello_world.bash
[JobRunner] test3 : Job successfully completed
</pre>
As per previous runs, we can see that the
<code>JR_lockd/test3_____done/logfile</code> contains the expected
<code>Hello World !</code> printed in the <code>[[STDOUT]]</code>
section. And we find a <code>hello_world_file</code> in the <i>current
directory</i> (ie in <code>.</code>).
<br>
It is possible to request the script to run in a specific directory,
like <code>/tmp</code>, such as:
<pre>
% JobRunner --lockdir JR_lockd --name test4 --executable ./hello_world.bash --dirChange /tmp
[Warning] [JobRunner] test4 : dirChange -- Current directory changed to '/tmp'
[ERROR] [JobRunner] test4 : Problem with 'executable' (./hello_world.bash): file does not exist
[ERROR]
</pre>
but as we can see the <code>./hello_world.bash</code> script was not
relative to <code>/tmp</code> and therefore is not available to be
run. To remedy this situation we have to unrelativise <code>.</code>
by using <code>pwd</code> for the <code>executable</code>
(<code>lockdir</code> is the only entries that the tool will attempt
to relativise from the starting directory) so that:
<pre>
% JobRunner --lockdir JR_lockd --name test4 --executable `pwd`/hello_world.bash --dirChange /tmp
[Warning] [JobRunner] test4 : dirChange -- Current directory changed to '/tmp'
[JobRunner] test4 : Job successfully completed
</pre>
This time we find a <code>/tmp/hello_world_file</code> file.
<br>
It is also possible to request the script to run in the <i>run lock
directory</i> using the <code>goRunInLock</code> command line option, which allows for the user to not have to worry about
creating directories to store one's created files.
This method will create any new files in the
location of <i>run lock directory</i> where we expect to find the <code>logfile</code>
alone, and use it a <i>run directory</i> where data can be stored
in to the <i>current directory</i>.
<!---------------------------------------->
<h3>"Hello World !" (version 3)</h3>
In this example, we will run the <code>hello_world.bash</code> program
but only if the required file <code>required_file</code> is
present.
<br>
First, let us see what happens when the file is not present:
<pre>
% rm -f required_file
% JobRunner --lockdir JR_lockd --name test5 --executable `pwd`/hello_world.bash --dirChange /tmp --checkfile `pwd`/required_file
[Warning] [JobRunner] test5 : dirChange -- Current directory changed to '/tmp'
[ERROR] [JobRunner] test5 : Problem with 'checkfile' [<i>pwd_path</i>/required_file] : file does not exist
[ERROR]
</pre>
Now, let's create the file and run the tool:
<pre>
% touch required_file
% JobRunner --lockdir JR_lockd --name test5 --executable `pwd`/hello_world.bash --dirChange /tmp --checkfile `pwd`/required_file
[Warning] [JobRunner] test5 : dirChange -- Current directory changed to '/tmp'
[JobRunner] test5 : Job successfully completed
</pre>
as we can see, because the <code>required_file</code> file does exist,
the <code>executable</code> can be run.<br>
<br>
If we try to rerun the <b>JobRunner</b> tool:
<pre>
% JobRunner --lockdir JR_lockd --name test5 --executable `pwd`/hello_world.bash --dirChange /tmp --checkfile `pwd`/required_file
[Warning] [JobRunner] test5 : dirChange -- Current directory changed to '/tmp'
[JobRunner] test5 : Previously successfully completed, and no files listed in 'checkfile' is newer than the logfile, not re-runing
</pre>
, a previous <i>lock directory</i> is present and no
<code>checkfile</code> files is newer than the logfile, there is no
need to run or re-run the <i>job</i>.<br>
<br>
If we <code>touch</code> the
<code>required_file</code>, ie change its date as being more recent
than the last <b>JobRunner</b> run's <code>logfile</code> (and asking
the tool to be a little more <code>Verbose</code>):
<pre>
% touch required_file
% JobRunner --lockdir JR_lockd --name test5 --executable `pwd`/hello_world.bash --dirChange /tmp --checkfile `pwd`/required_file --Verbose
[Warning] [JobRunner] test5 : dirChange -- Current directory changed to '/tmp'
!! [JobRunner] test5 : Previously successfully completed, but at least one file listed in 'checkfile' is newer than the logfile => re-runing
[JobRunner] test5 : !! Deleting previous run lockdir [<i>pwd_path</i>/JR_lockd/test5_____done]
[JobRunner] test5 : %% In progress:
[JobRunner] test5 : ++ Creating "In Progress" lock dir
[JobRunner] test5 : Job successfully completed
</pre>
, here despite the fact that a <i>lock directory</i> was already
present, the <code>required_file</code> <code>checkfile</code> being
newer than the <code><i>pwd_path</i>/JR_lockd/test5_____done/logfile</code>,
<b>JobRunner</b> did a forced <i>re-run</i> of the <i>job</i> by
deleting the entire <i>lock directory</i>.
<br>
This behavior make it possible to have the tool check for updated
source files before accepting to run (or re-run) a job. It is also
possible to run a job only if a certain precondition is met. This is
done using the <code>--RunIfTrue</code> command line option by
checking the <i>return status code</i> of the executable run, so that
the user specific condition is met if the return code is <b>0</b> (the
unix ok exit status). For example, to only run the
<code>hello_world.bash</code> script when the five minute load is
under 1.0, we created a small script we named
<code>is5minloadok.sh</code> that relies on the <code>uptime</code>
command. The content of that script is as follow:
<pre>
#!/bin/bash
RIL=1.0
L5E=`uptime | perl -ne 'print $1 if (m%load\saverage.?\:\s+([\d\.]+)%);'| perl -ne 'if ($_ < '$RIL'){print"0"}else{print"1"}'`
exit $L5E
</pre>
in which the <code>RIL</code> variable set the expected load value and
<code>L5E</code> set the <code>exit</code> by calling
<code>uptime</code>, extracting the <i>5 minutes load average</i>
information, comparing it to the <code>RIL</code> value (or
<i>1.0</i>) so that if the load is less than the value, return
<code>0</code> (ie the unix <i>ok status</i>) otherwise return
<code>1</code>.<br>
After making this <code>is5minloadok.sh</code> program executable, we
can try to run the <i>test5</i> <i>job</i> again as such:
<pre>
% touch required_file
% JobRunner --lockdir JR_lockd --name test5 --executable `pwd`/hello_world.bash --dirChange /tmp --checkfile `pwd`/required_file --Verbose --RunIfTrue `pwd`/is5minloadok.sh
[Warning] [JobRunner] test5 : dirChange -- Current directory changed to '/tmp'
!! [JobRunner] test5 : Previously successfully completed, but at least one file listed in 'checkfile' is newer than the logfile => re-runing
[JobRunner] test5 : !! Deleting previous run lockdir [<i>pwd_path</i>/JR_lockd/test5_____done]
[JobRunner] test5 : == 'RunIfTrue' check OK (<i>pwd_path</i>/is5minloadok.sh)
[JobRunner] test5 : %% In progress:
[JobRunner] test5 : ++ Creating "In Progress" lock dir
[JobRunner] test5 : Job successfully completed
</pre>
In this case, the load on my system was low enough that the <i>5 minutes
load</i> was less than 1.0, therefore the <code>RunIfTrue</code>
returned <i>ok</i> and therefore the job could be run.
<!---------------------------------------->
<h3>"Hello World !" (version 4)</h3>
In this example, we are also making use of the
<code>DonePostRun</code> and <code>ErrorPostRun</code> options to
create status files that can be checked by outside programs to trigger
other jobs that could only be run once a previous job is completed.
<pre>
% touch required_file
% JobRunner --lockdir JR_lockd --name test5 --executable `pwd`/hello_world.bash --dirChange /tmp --checkfile `pwd`/required_file --Verbose --RunIfTrue `pwd`/is5minloadok.sh --DonePostRun 'touch /tmp/test5_doneok' --ErrorPostRun 'touch /tmp/test5_error'
[Warning] [JobRunner] test5 : dirChange -- Current directory changed to '/tmp'
!! [JobRunner] test5 : Previously successfully completed, but at least one file listed in 'checkfile' is newer than the logfile => re-runing
[JobRunner] test5 : !! Deleting previous run lockdir [<i>pwd_path</i>/JR_lockd/test5_____done]
[JobRunner] test5 : == 'RunIfTrue' check OK (<i>pwd_path</i>/is5minloadok.sh)
[JobRunner] test5 : %% In progress:
[JobRunner] test5 : ++ Creating "In Progress" lock dir
[JobRunner] test5 : %% OK Post Running: 'touch /tmp/test5_doneok'
[JobRunner] test5 : Job successfully completed
</pre>
Other examples of uses of those options:
<ul>
<li>email user with a specific message upon job unsuccesful or
succesful completion
<li>delete a dataset upon completion of work on it to save disk
space
<li>Append a step result to a global database
</ul>
<!---------------------------------------->
<h3>"Hello World !" (version 5)</h3>
In this example, we will run the same example as in version 4, as <i>jobid</i> <i>test6</i>, but will create a configuration file for running it at a later time:
<pre>
% touch required_file
% JobRunner --lockdir JR_lockd --name test6 --executable `pwd`/hello_world.bash --dirChange /tmp --checkfile `pwd`/required_file --Verbose --RunIfTrue `pwd`/is5minloadok.sh --DonePostRun 'touch /tmp/test6_doneok' --ErrorPostRun 'touch /tmp/test6_error' --saveConfig `pwd`/test6.jobrunner
Wrote 'saveConfig' file (<i>pwd_path</i>/test6.jobrunner)
</pre>
As can be seen, no processing was done by the tool except for the generation of the configuration file, whose content is a simple <i>array dump</i> of the command line arguments in the orders they were added on the command line:
<pre>
# Job Runner Configuration file
$VAR1 = [
'--lockdir',
'JR_lockd',
'--name',
'test6',
'--executable',
'<i>pwd_path</i>/hello_world.bash',
'--dirChange',
'/tmp',
'--checkfile',
'<i>pwd_path</i>/required_file',
'--Verbose',
'--RunIfTrue',
'<i>pwd_path</i>/is5minloadok.sh',
'--DonePostRun',
'touch /tmp/test6_doneok',
'--ErrorPostRun',
'touch /tmp/test6_error'
];
</pre>
To run this configuration file, simply use the <code>useConfig</code> option:
<pre>
% JobRunner --useConfig test6.jobrunner
[Warning] [JobRunner] test6 : dirChange -- Current directory changed to '/tmp'
[JobRunner] test6 : == 'RunIfTrue' check OK (<i>pwd_path</i>/is5minloadok.sh)
[JobRunner] test6 : %% In progress:
[JobRunner] test6 : ++ Creating "In Progress" lock dir
[JobRunner] test6 : %% OK Post Running: 'touch /tmp/test6_doneok'
[JobRunner] test6 : Job successfully completed
</pre>
An interesting use of the configuration file is the creation of a default configuration file and overridding its content using new command line overrides:
<pre>
% JobRunner --useConfig test6.jobrunner --name test7
[Warning] [JobRunner] test7 : dirChange -- Current directory changed to '/tmp'
[JobRunner] test7 : == 'RunIfTrue' check OK (<i>pwd_path</i>/is5minloadok.sh)
[JobRunner] test7 : %% In progress:
[JobRunner] test7 : ++ Creating "In Progress" lock dir
[JobRunner] test7 : %% OK Post Running: 'touch /tmp/test6_doneok'
[JobRunner] test7 : Job successfully completed
</pre>
Here, I just requested for the <i>name</i> to be one that was not yet run and it authorized the rerun.
Using the same idea, we can create an updated configuration file for a <i>jobid</i> <i>test8</i>:
<pre>
% JobRunner --useConfig test6.jobrunner --name test8 --saveConfig `pwd`/test8.jobrunner
Wrote 'saveConfig' file (<i>pwd_path</i>/test8.jobrunner)
</pre>
Using this method it is possible to create many similar jobs quickly, simply modifying the <code>jobid</code> and <code>excecutable</code> <i>script</i>, but using the same other configuration settings.
For example:
<pre>
% JobRunner --useConfig test6.jobrunner --name test9 --executable /bin/pwd --saveConfig `pwd`/test9.jobrunner
Wrote 'saveConfig' file (<i>pwd_path</i>/test9.jobrunner)
</pre>
contains the same content as the <code>test6.jobrunner</code> except for the two changes done on the command line, the <code>name</code> and the <code>executable</code>:
<pre>
# Job Runner Configuration file
$VAR1 = [
'--lockdir',
'JR_lockd',
<b> '--name',
'test9',
'--executable',
'/bin/pwd',</b>
'--dirChange',
'/tmp',
'--checkfile',
'<i>pwd_path</i>/required_file',
'--Verbose',
'--RunIfTrue',
'<i>pwd_path</i>/is5minloadok.sh',
'--DonePostRun',
'touch /tmp/test6_doneok',
'--ErrorPostRun',
'touch /tmp/test6_error'
];
</pre>
<!---------------------------------------->
<a name="JobRunner_Caller">
<h1>JobRunner_Caller</h1>
</a>
<code>JobRunner_Caller</code>'s purpose is to execute <i>JobRunner</i> on its <i>configuration files</i>. The entire decision on wether a job is to be run is done by <i>JobRunner</i> itself.<br>
<a href="JobRunner_Caller-usage.html"><b>JobRunner_Caller</b>'s command line options</a> can be obtained running:
<pre>JobRunner_Caller --help</pre>
<!---------------------------------------->
<a name="JRC_procorder">
<h2>Processing order</h2>
</a>
<ul>
<li>Create the <code>quitfile</code>
<li>Set <i>KeepDoingIt</i> to <i>true</i> if <code>watchdir</code> is used, <i>false</i> otherwise
<li>Run a <i>Set</i>:
<ul>
<li>Generate a <i>ToBeDone</i> list by adding to it:
<ol>
<li>Entries found in the <code>DirOnceBefore</code> directory (only first <i>set</i>)
<li>Entries found in the <code>watchdir</code> directory
<li>Entries found in the <code>dironce</code> directory (only first <i>set</i>)
<li>Command line provided <i>JobRunner configuration files</i>
</ol>
<li>If one of <code>RandomOrder</code>, <code>timeSort</code> or <code>customSortTool</code> is used, apply it to the <i>ToBeDone</i> list
<li>For each <i>Job</i> in <i>ToBeDone</i>:
<ul>
<li>Check if the <code>QuitFile</code> still exists; if it does not, set <i>KeepDoingIt</i> to <i>false</i> and exit loop
<li><b>Run <i>Job</i> using <i>JobRunner</i></b>
</ul>
<li>If used, decrease <code>maxSet</code>. If <i>0</i>, turn <i>KeepDoingIt</i> to <i>false</i>
</ul>
<li>If <i>KeepDoingIt</i> is set to <i>true</i>, run the next <i>Set</i>
</ul>
<!---------------------------------------->
<a name="JRC_examples">
<h2>Examples of use</h2>
</a>
<!---------------------------------------->
<h3>Running a set of jobs</h3>
In its simplest use, the tool is designed to run <i>JobRunner</i> configuration files. For example, the <code>test6.jobrunner</code>, <code>test8.jobrunner</code> and <code>test9.jobrunner</code> jobs created earlier:
<pre>
% JobRunner_Caller test6.jobrunner test8.jobrunner test9.jobrunner
Reminder: to quit properly after a Job/during a Set, delete the 'QuitFile': <i>/tmpdir</i>/<i>tmpfile</i>
[**] Job Runner Config: '/Users/martial/Works/JobRunner/test6.jobrunner'
@@ Can be skipped
(stdout)[Warning] [JobRunner] test6 : dirChange -- Current directory changed to '/tmp'
[JobRunner] test6 : Previously successfully completed, and no file listed in 'checkfile' is newer than the logfile, not re-runing
Reminder: to quit properly after a Job/during a Set, delete the 'QuitFile': <i>/tmpdir</i>/<i>tmpfile</i>
[**] Job Runner Config: './test8.jobrunner'
++ Job completed
(stdout)[Warning] [JobRunner] test8 : dirChange -- Current directory changed to '/tmp'
[JobRunner] test8 : == 'RunIfTrue' check OK (/Users/martial/Works/JobRunner/is5minloadok.sh)
[JobRunner] test8 : %% In progress:
[JobRunner] test8 : ++ Creating "In Progress" lock dir
[JobRunner] test8 : %% OK Post Running: 'touch /tmp/test6_doneok'
[JobRunner] test8 : Job successfully completed
Reminder: to quit properly after a Job/during a Set, delete the 'QuitFile': <i>/tmpdir</i>/<i>tmpfile</i>
[**] Job Runner Config: './test9.jobrunner'
++ Job completed
(stdout)[Warning] [JobRunner] test9 : dirChange -- Current directory changed to '/tmp'
[JobRunner] test9 : == 'RunIfTrue' check OK (/Users/martial/Works/JobRunner/is5minloadok.sh)
[JobRunner] test9 : %% In progress:
[JobRunner] test9 : ++ Creating "In Progress" lock dir
[JobRunner] test9 : %% OK Post Running: 'touch /tmp/test6_doneok'
[JobRunner] test9 : Job successfully completed
Reminder: to quit properly after a Job/during a Set, delete the 'QuitFile': <i>/tmpdir</i>/<i>tmpfile</i>
%%%%%%%%%% Set 1 run results: 3 / 3 %%%%%%%%%%
Done (3 / 3)
</pre>
As can be seen, <code>test6.jobrunner</code> was already run in the past, but <code>test8.jobrunner</code> and <code>test9.jobrunner</code> were not yet, so <i>JobRunner_Caller</i> just insured that those were called in the order they were entered on the command line.
<!---------------------------------------->
<h3>Queue modes</h3>
When using a <code>watchdir</code> the tool will never exit, unless deletetion of the <code>quitfile</code>; this mode is used to run a <i>queue processor</i>:
<pre>
% JobRunner_Caller --watchdir watchdir_dir
</pre>
One <code>JobRunner_Caller</code> equals to one <i>queue processor</i>. To have multiple of those run as many <code>JobRunner_Caller</code> as you want <i>queue processor</i> per <i>system</i>. For example to run three <i>queue processors</i> on a system called <i>linux1</i>, in multiple shells:
<pre>
linux1:shell1 % JobRunner_Caller --watchdir watchdir_dir
linux1:shell2 % JobRunner_Caller --watchdir watchdir_dir
linux1:shell3 % JobRunner_Caller --watchdir watchdir_dir
</pre>
It is therefore possible to run a set number of <i>queue processors</i> on a limitless number of systems as long as they share the <code>watchdir</code> and <i>JobRunner</i>'s <code>lockdir</code> directories. For example, to run one <i>queue processor</i> on the <i>linux1</i> and two on the <i>linux2</i> system:
<pre>
linux1:shell1 % JobRunner_Caller --watchdir watchdir_dir
linux2:shell1 % JobRunner_Caller --watchdir watchdir_dir
linux2:shell2 % JobRunner_Caller --watchdir watchdir_dir
</pre>
When doing so it is often important to keep in mind that the directory sharing often does not insure mutual exclusion of the lock directory creation, and it might be important to use the <code>ExtraLockingTool</code> and <code>LockingToolLockDir</code> command line options.
Running the tool on a non-homogenous set of system is also possible; given that the <i>job selection</i> process is done using the <i>jobid</i>, it is possible to create a set of <i>JobRunner configuration files</i> that are specific to a given operating system, in order to optimize processing time. For example to run one queue on a linux system, and one queue on a Mac OSX system:
<pre>
linux1:shell1 % JobRunner_Caller --watchdir anyOSjobs_dir --watchdir linuxOnly_dir
osx1:shell1 % JobRunner_Caller --watchdir anyOSjobs_dir --watchdir OSXOnly_dir
</pre>
The <i>linux1</i> system will try to process jobs that are both in <code>anyOSjobs_dir</code> and <code>linuxOnly_dir</code>, while the <i>osx1</i> system will try to process jobs that are both in <code>anyOSjobs_dir</code> and <code>OSXOnly_dir</code>. So if a <i>jobid</i> <code>complexJob99.jobrunner</code> is present both in <code>linuxOnly_dir</code> and <code>OSXOnly_dir</code> in its <i>linux</i> and <i>OSX</i> executable equivalents, the first system to be available will complete that <i>job</i> and the other system will see that the <i>jobid</i> is used in <i>JobRunner</i>'s <code>lockdir</code> already, and will skip it from being reprocessed.
In this example, a <code>anyOSjobs_dir</code> is also used to contain any <i>job</i> that can be run on either operating system.
<!-- check in /slp/SLRE/jobrunner/mx5/mic-interview/ there are a lot -->
<!-- of examples in there (and in /slp/SLRE2/jobrunner) -->
<!---------------------------------------->
<a name="JR_nfs">
<h1><code>JobRunner</code>'s <code>mutexTool</code> and
<code>JobRunner_Caller</code>'s <code>ExtraLockingTool</code></h1>
</a>
<h2>Method 1: Login access to NFS server</h2>
For use over NFS, If you have login access to the NFS server computer,
one way of insuring an atomic lock creation is to use the unix
<code>lockfile</code> command.
The following example script also rely on having an <code>ssh-agent</code>
running to log into the <i>nfs_server_host</i> computer:
<pre>
#!/bin/csh
set lfile=$1
shift
# Only try once to obtain lock (expected 8 seconds)
# the idea is that lock creation is atomic on the server which host the file
ssh <i>nsf_server_host</i> lockfile -r 1 $lfile
</pre>
<h2>Method 2: Other cases</h2>
For our use over NFS, if you do not have login access to the NFS
server computer, we have relied on the <code>rlock</code> command
(part of the ruby library called <a
href="https://github.com/ahoward/lockfile">lockfile</a>) using the
following wrapper (named <code>sp_lock.csh</code>):
<pre>
#!/bin/csh
set lfile=$1
shift
# timeout after 60 seconds if not able to obtain the lockfile
# consider any lock older than 10 days "stale" (in seconds: 3600*24*10=864000) [nil for infinite]
# when stealing a lock, only wait 10 seconds before confirming
rlock --timeout=60 --max_age 864000 --suspend 10 $lfile
</pre>
</body>
</html>