-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathescenic_jstat_
executable file
·554 lines (454 loc) · 18.9 KB
/
escenic_jstat_
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
#!/bin/bash
: <<=cut
=head1 NAME
escenic_jstat_ - Munin plugin that interacts with the jstat tool, to show
(among other things), the heap usage of instances of Escenic Content Engine
=head1 APPLICABLE SYSTEMS
escenic_jstat_ works with all installations of Escenic Content Engine 5.1
and onwards. It uses the pidfiles located in /var/run/escenic/ to locate
different instances of Content Engine and graphs their heaps individually.
=head1 CONFIGURATION
escenic_jstat_ needs to know which instance it will run on. Since 5.1,
ECE out of the box allows easy configuration of several instances. These
end up putting their pidfiles in /var/run/escenic/engine(-xxx).pid. This munin
plug-in suggests a set of graphs for each of the currently running instances.
The plugin uses the jstat utility from the JVM which it assumes is available
in the path. jstat itself must execute as the same user that owns the
process being monitored, which means that the plugin itself must be run as
that user.
[escenic_jstat_default_*]
user escenic
[escenic_jstat_test_*]
user test
=head1 INTERPRETATION
The system suggests four graphs.
=head2 MEMORY USAGE GRAPH
The "heap" graph tries to visualize the entire heap in three distinct areas:
The bottom area (green) indicates the size of the young generation, which
consists of "eden", "survivor0" and "survivor1". Stacked on top of that
is the unused portion of that heap (in a light green tint).
The middle area (blue) indicates the size of the old generation, which
consists of objects that survived being in the young generation for a while.
Stacked on top of that is again the unused portion of the old generation
(in a light blue tint).
The top area (orange) indicates the size of the permanent area, which
consists of classes and other things. Stacked on top is the unused portion
of the permanent area (light orange).
The -Xmx parameter controls the size of the young and the old generation, so
the green and blue portions of the graph should correspond exactly
with the value of -Xmx. The capacity of the permanent generation is usually
controlled via the proprietary -XX:MaxPermSize JVM option.
=head2 GARBAGE COLLECTION COUNTER GRAPH
The second graph is a garbage collection graph which shows how often the
garbage collector kicks in. More specifically the number of garbage
collections per minute.
If this number is large, it indicates that the JVM is busy allocating many
objects and then collecting them again. This graph usually co-varies with
the gcoverhead graph.
=head2 CPU SPENT IN GARBAGE COLLECTION GRAPH
The third graph shows the amount of time used reclaiming objects from the
heap.
The graph shows the percentage of the CPU time being used in the garbage
collector. In an idle or healthy system, this number is quite low, well
below 1. Values in the range 1 milli are common. 1 milli-percent actually
means one thousandth of a percentile.
If this number is large, this typically indicates that the heap is filled up
to the limit with objects that can't be reclaimed. This is usually due to
cache sizes not being in line with the size of the JVM's heap, but it may
also be an indicator of a memory leak. Large usually means greater than 1-2
percent.
=head2 UPTIME
This is a simple uptime graph which shows the uptime of the monitored JVM
process, as reported by jstat. This graph will show the actual Java process
uptime, and not the uptime of any wrapper that may be re-spawning it.
=head1 MAGIC MARKERS
#%# family=auto contrib
#%# capabilities=autoconf suggest
=head1 CONFIGURATION
There are a few options that can be specified to control how a particular
instance of this works, if you run a non-standard setup, e.g. pidfiles in
other locations, etc.
jstatbin - path to jstat executable. Defaults to
looking for jstat using which.
ece_instance - name of instance. Defaults to output of 'suggest'
The instance names is used to group and title graphs,
and to locate the pidfile.
pidfile - Override name of pidfile to use. Defaults to
'/var/run/escenic/${ece_instance}.pid' if it exists;
if it doesn't exist, it defaults to
'/var/run/escenic/*-${ece_instance}.pid'. If
configured manually (i.e. not suggest) this can be
used to use this plug-in on arbitrary JVMs. YMMV.
graphtitle - if unspecified, the graph title is preended with the
name of the instance.
=cut
# suggest suggests escenic_jstat_<instancename>_xxx
graph_update=yes
jstatbin=${jstatbin:-$(which jstat)}
ece_instance=$(echo $(basename $0) | sed -e s,escenic_jstat_,, -e 's,_.*,,' )
graph=$(echo $(basename $0) | sed s,escenic_jstat_${ece_instance}_,,)
MUNIN_VERSION=${MUNIN_VERSION:-1.2.5}
MAJOR_VERSION=${MUNIN_VERSION/.*/}
MINOR_VERSION=${MUNIN_VERSION/${MAJOR_VERSION}./}
MINOR_VERSION=${MINOR_VERSION/.*/}
MAINT_VERSION=${MUNIN_VERSION/${MAJOR_VERSION}.${MINOR_VERSION}./}
borrowed_labels=0
if [ "${MAJOR_VERSION}" -gt 1 ] ; then
borrowed_labels=1
elif [ "${MAJOR_VERSION}" -eq 1 ] ; then
if [ "${MINOR_VERSION}" -gt 3 ] ; then
borrowed_labels=1
fi
fi
# Look for a pidfile, prefering engine's pidfiles.
for a in "" engine- ece- search- rmi-hub- analysis- changelogd- ; do
tmp_pidfile="/var/run/escenic/${a}${ece_instance}.pid"
if [ -r $tmp_pidfile ] ; then
break;
fi
done
# use found pidfile unless specified by config parameters.
pidfile=${pidfile:-${tmp_pidfile}}
if [ ! -r "$jstatbin" ] ; then
graph_info="$graph_info NOTE: jstat ($jstatbin) is not executable, so I can't output a graph."
graph_update=no
fi
if [ ! -r "$pidfile" ] ; then
graph_info="$graph_info NOTE: pidfile $pidfile is not readable, so I can't output a graph."
graph_update=no
fi
if [ -r "${pidfile}" ] ; then
pid=$(cat ${pidfile})
echo "# pid is ${pid}"
exe="$(ps -o comm= ${pid})"
echo "# exe is <${exe}>"
if [ "${exe}" == "" ] ; then
echo "# pid ${pid} is not running..."
graph_info="$graph_info Note: dangling pidfile $pidfile isn't running"
unset pid
graph_update=no
elif [ "$(basename ${exe})" != "java" ] ; then
for p in $( ps -o pid= --ppid ${pid} ) ; do
echo "# checking child of pid ${pid}, ${p}"
if [ "$(basename $(ps -o comm= ${p}) )" == "java" ] ; then
echo "# found child java process of ${pid}: ${p}"
pid=${p}
break;
fi
done
fi
if [ "${pid}" != "" ] ; then
owner_uid="$[$(ps -o uid= ${pid})]"
my_id="$[$(id -u)]"
if [ "$my_id" == "0" -o "$owner_uid" != "$my_id" ] ; then
graph_info="$graph_info NOTE: pid is owned by a different user ${owner_uid} than me ${my_id}. So I can't run jstat to talk to it."
graph_update=no
fi
fi
fi
if [ -z "${graphtitle}" ]; then
graphtitle="${ece_instance}"
fi
config_uptime()
{
echo "graph_title $graphtitle uptime"
echo "graph_args --base 1000 -l 0"
echo "graph_scale no" # In case > 1000 days :-) you never know!
echo "graph_vlabel uptime in days"
echo "graph_info Uptime as reported by jstat. " ${graph_info}
echo "graph_category Escenic"
echo "update no" # this graph only loans from the heap graph...
echo "graph_order uptime=escenic_jstat_${ece_instance}_heap.Uptime"
if [ "${borrowed_labels}" == "1" ] ; then
echo "uptime.label Uptime"
fi
echo "uptime.draw AREA"
echo "uptime.info Uptime of the java process in days"
}
config_gc()
{
echo "graph_title $graphtitle memory: Garbage collection frequency"
echo "graph_args -l 0"
echo "graph_scale no" # Use 0.001 or 1200.
echo "graph_period minute"
echo "graph_vlabel collections / \${graph_period}"
echo "graph_info The frequency of garbage collections." ${graph_info}
echo "graph_category Escenic"
echo "update no" # this graph only loans from the heap graph...
echo "graph_order young=escenic_jstat_${ece_instance}_heap.Young_Count full=escenic_jstat_${ece_instance}_heap.Full_Count"
# Labels on borrowed things are bad for munin < 1.3
if [ "${borrowed_labels}" == "1" ] ; then
echo "young.label Young collections"
fi
echo "young.draw LINE2"
echo "young.type DERIVE"
echo "young.min 0"
echo "young.cdef young,0,GT,young,UNKN,IF"
echo "young.info The rate of young collections. Young collections collect the eden and usually execute extremely quickly. Young collections happen asynchronously and don't adversely affect the performance of the JVM."
if [ "${borrowed_labels}" == "1" ] ; then
echo "full.label Full collections"
fi
echo "full.draw LINE2"
echo "full.type DERIVE"
echo "full.min 0"
echo "full.cdef full,0,GT,full,UNKN,IF"
echo "full.info The rate of full collections. Full collections pause all the threads of the JVM while collection is going on, which may lead to spikes in latency. The JVM tries to avoid this, since a full collection usually takes a lot longer than young collections."
}
config_gcoverhead()
{
echo "graph_title $graphtitle memory: CPU spent by garbage collector"
echo "graph_args --base 1000 --lower-limit 0 --upper-limit 1" # --upper-limit 1000000"
echo "graph_scale no" # Use 0.001 or 1200.
echo "graph_printf %6.2lf" # show more decimals.
echo "graph_vlabel CPU time (%)"
echo "graph_info The amount of time spent in garbage collection. The measurement is in percent, A value of 1 means that one percent of the time is spent garbage collecting, which in turn means that for every minute that goes by, 600ms is spent in garbage collection. A value of 0.001 (1 milli) indicates that less than one millisecond is being spent on GC per minute. The value includes both young and old heap collections. Low is good." ${graph_info}
echo "graph_category Escenic"
echo "update no" # this graph only loans from the heap graph...
echo "graph_order young=escenic_jstat_${ece_instance}_heap.Young_Time_millis full=escenic_jstat_${ece_instance}_heap.Full_Time_millis"
# I decided to merge the "gc time" graph data sets, since it clutters the display quite a lot.
# echo "young.label Young_Time_millis"
# echo "young.draw AREA"
echo "young.type DERIVE"
echo "young.graph no"
echo "young.min 0"
# echo "young.cdef young,300000,/,100,*"
# echo "young.info time spent (in parts per million) doing young generation garbage collection. These collections are done incrementally and asynchronously. The only overhead is raw CPU usage."
if [ "${borrowed_labels}" == "1" ] ; then
echo "full.label % time spent in GC"
fi
echo "full.draw LINE2"
echo "full.type DERIVE"
echo "full.min 0"
# if (x) then (y) else (z) --> x,y,z,IF
# x = young+full gt 0
# y = young+ful
# z = 0
echo "full.cdef young,full,+,0,GT,young,full,+,0,IF,30,/,1,*" # 1000 (millis) * 60 (minute) * 5 (munin graph period) = 300000 ms per 5minute period.
# echo "full.cdef misses,hits,+,0,GT,100,hits,*,${a}_hits,${a}_misses,+,/,UNKN,IF"
# echo "full.cdef young,full,+,300000,/,100,*" # 1000 (millis) * 60 (minute) * 5 (munin graph period) = 300000 ms per 5minute period.
# Explanation of full.cdef
# var result = 0;
# if (young+full > 0) then result = young+full;
# result = result / 300000 (number of milliseconds per 5 minute)
# result = result * 100 (percent graph)
# so if we've spent 60 seconds in (young+full) over the last 5 minute
# interval (that's 20%), that's 60.000 milliseconds, we get
# 60k = 60k / 300000 * 100, which equals 20.
echo "full.info Percent of time a CPU spent doing incremental or full garbage collection."
}
config_heap()
{
echo "graph_title $graphtitle memory: Memory usage by generation"
echo "graph_args --base 1024 -l 0"
echo "graph_vlabel bytes (young, old, permanent)"
echo "graph_period minute" # this propagates to where the values are used (I hope)
echo "graph_info Heap Usage split by different blocks of heap." ${graph_info}
echo "graph_category Escenic"
echo "update ${graph_update}"
echo "graph_order Eden_Used Survivor0_Used Survivor1_Used Young_Limit Young_Used Young_Free Old_Used Old_Limit Old_Free Permanent_Used Permanent_Limit Permanent_Free"
echo "Eden_Used.label Eden"
echo "Eden_Used.graph no"
# echo "Eden_Used.draw AREA"
# echo "Eden_Used.info New objects are almost always allocated here. It is a large part of the heap (typically one third) and when it is full, it is garbage collected and survivors are moved up to the Survivor portion of the young heap."
# echo "Eden_Used.colour 00cd00" # green3
echo "Survivor0_Used.label Survivor0_Used"
echo "Survivor0_Used.graph no"
echo "Survivor1_Used.label Survivor1_Used"
echo "Survivor1_Used.graph no"
echo "Young_Limit.label Young_Limit"
echo "Young_Limit.graph no" # Don't show the "limit" as-is, since it's not useful.
echo "Young_Used.label Young used"
echo "Young_Used.draw AREA" # Should equate Young_limit always.
echo "Young_Used.colour 00CD00"
echo "Young_Used.info Objects are allocated in the young part of the heap. The young part consists of Eden and Survivor0 and Survivor1, all of which get a certain portion of the young heap. Objects live in the young part for a short time after which they are promoted to the old part of the heap."
echo "Young_Used.cdef Eden_Used,Survivor0_Used,Survivor1_Used,+,+"
echo "Young_Free.label Young unused"
echo "Young_Free.draw STACK" # Should equate Young_limit always.
echo "Young_Free.colour AEE8AE" # a light green
echo "Young_Free.cdef Young_Limit,Eden_Used,Survivor0_Used,Survivor1_Used,+,+,-"
echo "Old_Used.label Old used"
echo "Old_Used.draw STACK"
echo "Old_Used.colour 3B09ED" # brown1
echo "Old_Used.info When Survivor is full, Survivor is garbage collected and survivors are placed in the Old area"
echo "Old_Limit.label Old_Limit"
echo "Old_Limit.graph no"
echo "Old_Free.label Old unused"
echo "Old_Free.draw STACK"
echo "Old_Free.colour C5B5FF" # AliceBlue
echo "Old_Free.cdef Old_Limit,Old_Used,-"
echo "Permanent_Limit.label Permanent_Limit"
echo "Permanent_Limit.graph no"
echo "Permanent_Used.label Permanent used"
echo "Permanent_Used.draw STACK"
echo "Permanent_Used.colour FF8000" # A relatively dark blue tint
echo "Permanent_Used.info Permanent space is only used for permanent type objects, like class loaders, classes and so on."
echo "Permanent_Free.label Permanent unused"
echo "Permanent_Free.draw STACK"
echo "Permanent_Free.colour F2C79B"
echo "Permanent_Free.cdef Permanent_Limit,Permanent_Used,-"
# unused graphs, used for loaning data to lower munin processing time by avoiding
# several calls to jstat.
echo "Young_Count.label Young_Count"
echo "Young_Count.type DERIVE"
echo "Young_Count.draw AREA"
echo "Young_Count.graph no"
echo "Young_Count.min 0"
echo "Full_Count.label Full_Count"
echo "Full_Count.type DERIVE"
echo "Full_Count.draw STACK"
echo "Full_Count.graph no"
echo "Full_Count.min 0"
echo "Young_Time_millis.label Young_Time_millis"
echo "Young_Time_millis.type DERIVE"
echo "Young_Time_millis.graph no"
echo "Young_Time_millis.min 0"
echo "Full_Time_millis.label Full_Time_millis"
echo "Full_Time_millis.type DERIVE"
echo "Full_Time_millis.graph no"
echo "Full_Time_millis.min 0"
echo "Uptime.label Uptime"
echo "Uptime.info The current uptime (in days) as reported by jstat -t"
echo "Uptime.draw AREA"
echo "Uptime.graph no"
}
jstat()
{
echo "# ${jstatbin} -gc ${pid} | tail -n 1 | awk"
${jstatbin} -gc -t ${pid} | tail -n 1 | awk \
'{\
// Timestamp S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
// 123456.0 512.0 512.0 0.0 160.0 5888.0 1163.4 88704.0 48433.9 80896.0 79993.8 2181 9.993 36 12.682 22.674
UPTIME = $1; \
S0C = $2; \
S1C = $3; \
S0U = $4; \
S1U = $5; \
EC = $6; \
EU = $7; \
OC = $8; \
OU = $9;
PC = $10; \
PU = $11; \
YGC = $12; \
YGCT = $13; \
FGC = $14; \
FGCT = $15; \
\
S0F = S0C - S0U; \
S1F = S1C - S1U; \
EF = EC - EU; \
OF = OC - OU; \
PF = PC - PU; \
\
printf "Eden_Used.value %d\n",EU * 1024; \
printf "Survivor0_Used.value %d\n",S0U * 1024; \
printf "Survivor1_Used.value %d\n", S1U * 1024; \
printf "Old_Used.value %d\n", OU * 1024; \
printf "Permanent_Used.value %d\n", PU * 1024; \
printf "Young_Count.value %d\n", YGC; \
printf "Full_Count.value %d\n", FGC; \
printf "Young_Time_millis.value %d\n", YGCT * 1000; \
printf "Full_Time_millis.value %d\n", FGCT * 1000; \
printf "Uptime.value %.4f\n", UPTIME/86400; \
}'
${jstatbin} -gccapacity ${pid} | tail -n 1 | awk \
'{\
//NGCMN NGCMX NGC S0C S1C EC OGCMN OGCMX OGC OC PGCMN PGCMX PGC PC YGC FGC
//21056.0 337216.0 41792.0 7552.0 8000.0 25792.0 42176.0 674496.0 73216.0 73216.0 21248.0 86016.0 75392.0 75392.0 72 3
NGCMX = $2; \
OGCMX = $8; \
PGCMX = $12; \
printf "Young_Limit.value %d\n", NGCMX * 1024; \
printf "Old_Limit.value %d\n", OGCMX * 1024; \
printf "Permanent_Limit.value %d\n", PGCMX * 1024; \
}'
}
#
# autoconf
#
if [ "$1" = "autoconf" ]; then
if [ ! -x "${jstatbin}" ]; then
echo "no (No jstat found)"
exit 0
fi
# Check for the presence of both ece or engine pid files
if [ "$(ls 2>/dev/null /var/run/escenic/*.pid | wc -l)" == "0" ]; then
echo "no (No escenic pidfiles in /var/run/escenic/*.pid)"
exit 0
fi
echo "yes"
exit 0
fi
#
# suggest
#
if [ "$1" = "suggest" ]; then
for pid_base in ece- engine- analysis- changelogd- search- rmi-hub- "" ; do
for a in /var/run/escenic/${pid_base}*.pid ; do
if [ ! -r "$a" ] ; then continue; fi
ece_instance=$(basename "$a" .pid | sed "s/^${pid_base}//" )
if [[ "$instances" =~ "<$ece_instance>" ]] ; then
echo "# Ignoring pidfile $a because it would conflict with other instances."
echo "# This pidfile needs manual setup."
break;
fi
instances="$instances <$ece_instance>"
echo ${ece_instance}_heap
echo ${ece_instance}_gc
echo ${ece_instance}_gcoverhead
echo ${ece_instance}_uptime
done
done
exit 0
fi
#
# debug
#
if [ "$1" == "debug" ] ; then
echo "# Debug information"
env
exit 0
fi
#
# config
#
if [ "$1" == "config" ] ; then
config_${graph}
exit 0
fi
#
# Main
#
if [ "${graph}" == "heap" ] ; then
jstat
fi
# 1.4.5 environment
# MUNIN_BASH=/bin/bash
# MUNIN_PLUGSTATE=/var/lib/munin/plugin-state
# MUNIN_PERL=/usr/bin/perl
# MUNIN_STATEDIR=/var/run/munin
# MUNIN_PYTHON=/usr/bin/env python
# MUNIN_PLUGINUSER=nobody
# MUNIN_LOGDIR=/var/log/munin
# MUNIN_PERLLIB=/usr/share/perl5
# MUNIN_HASSETR=1
# MUNIN_LIBDIR=/usr/share/munin
# MUNIN_HTMLDIR=/var/cache/munin/www
# MUNIN_CAP_MULTIGRAPH=1
# MUNIN_OSTYPE=linux
# MUNIN_BINDIR=/usr/bin
# MUNIN_GOODSH=/bin/sh
# MUNIN_SBINDIR=/usr/sbin
# MUNIN_HOSTNAME=localhost.localdomain
# MUNIN_VERSION=1.4.5
# MUNIN_MASTER_IP=
# MUNIN_DOCDIR=/usr/doc
# MUNIN_RUBY=/usr/bin/env ruby
# MUNIN_CGIDIR=/usr/lib/cgi-bin
# MUNIN_MANDIR=/usr/share/man
# MUNIN_DBDIR=/var/lib/munin
# MUNIN_CONFDIR=/etc/munin
# MUNIN_STATEFILE=/var/lib/munin/plugin-state/something
# MUNIN_USER=munin
# MUNIN_GROUP=munin
# MUNIN_PREFIX=/usr