Java agent to diagnose JDK-8180450
See the issue for full details...
But briefly, the problem is that the OpenJDK JVM caches the supertype of a class/interface on the JRE's Klass
structure when evaluating things like instanceof
or checking type casts and while that usually results in a performance win (because a slower traversal of the whole hierarchy can be avoided), in certain circumstances involving multiple inheritance of interfaces the cache is worse than useless.
CPU will be expended updating the cache, but the cached value won't help avoiding the slow path.
It's a problem because the surprising and significant difference in performance between the ideal and worst case behaviours means programmers can easily write poorly performing code without realising the costs (usually instanceof
is very cheap, for example).
It's a Java agent which can be used to identify code which may be suffering from this problem.
It attempts to count the number of times an a particular concrete class is used in instanceof
(and similar) expressions with a different test type.
For example at one point in the code you might have foo instanceof I1
and somewhere else bar instanceof I2
(where foo
and bar
have the same concrete type).
The agent will estimate the number of times the cached supertype (I1
and I2
) is changing, i.e. how often you're hitting the slower path.
$ # From the root dir
$ mvn package
$ # From the root dir
$ java -javaagent:agent/target/type-pollution-agent-0.1-SNAPSHOT.jar -jar example/target/type-pollution-example-0.1-SNAPSHOT.jar
The output, with a default agent configuration, is:
--------------------------
Type Pollution Statistics:
--------------------------
1: io.type.pollution.example.B
Count: 6477753
Types:
io.type.pollution.example.I2
io.type.pollution.example.I3
io.type.pollution.example.I1
Traces:
io.type.pollution.example.Main.goo(Main.java:74)
class: io.type.pollution.example.I3
count: 1826427
class: io.type.pollution.example.I2
count: 1430748
io.type.pollution.example.Main.foo(Main.java:69)
class: io.type.pollution.example.I1
count: 1803371
io.type.pollution.example.Main$$Lambda$ByteBuddy$4.accept(Unknown Source)
class: io.type.pollution.example.I2
count: 1417207
--------------------------
2: io.type.pollution.example.C
Count: 4804011
Types:
io.type.pollution.example.I2
io.type.pollution.example.I1
Traces:
io.type.pollution.example.Main.castToI1(Main.java:85)
class: io.type.pollution.example.I1
count: 2400934
io.type.pollution.example.Main.castToI2(Main.java:89)
class: io.type.pollution.example.I2
count: 1848168
io.type.pollution.example.Main$$Lambda$ByteBuddy$6.accept(Unknown Source)
class: io.type.pollution.example.I2
count: 554909
--------------------------
- Count: is the number of observed successful
checkcast
/instanceof
/Class::isAssignableFrom
/Class::cast
/Class::isInstance
against a different (interface) type from the last seen - Types: is the list of different (interface) types observed
- Traces: is a list of top method stack traces along with the interface class seen and related count (ie observed secondary super cache invalidations).
note Traces
format is compatible with
Idea IntelliJ Stack Trace Viewer
The report is an ordered list which types are ordered by increasing Count
ie
io.type.pollution.example.B
is the top type based on it and more likely the one with the highest chance
to cause scalability issues.
Due to the complexities of JIT mechanism (Inlining heuristic, method and bytecode Type Profiling, etc etc) the agent cannot detect if a potential treat translates into a real (scalability) performance issue: the smart and wise user should create system/JMH benchmarks to analyze the results of changes and narrow the scope of the agent to ignore the innocent traces.
To narrow the scope of the tracing, users can add a comma separated list of package prefixes (not full qualified class names!!!) to the agent configuration and the agent will place tracing around the usual suspects byte-code instructions just on these:
eg
$ java -javaagent:agent/target/type-pollution-agent-0.1-SNAPSHOT.jar=io.other -jar example/target/type-pollution-example-0.1-SNAPSHOT.jar
# IT WOULD PRINT...NOTHING! No packages starting with io.other exists :P
Said that, cleaning up every instanceof/checkcast
misuses is the best way to get rid
of any potential scalability issue, without relying on how JIT optimizes specific tests.
Yes! To enable full-stack trace sampling for each type check execution, use:
WARNING: this can have a large performance impact!!
-Dio.type.pollution.full.traces=true
And, the output will become:
--------------------------
Type Pollution Statistics:
--------------------------
1: io.type.pollution.example.B
Count: 6477753
Types:
io.type.pollution.example.I2
io.type.pollution.example.I3
io.type.pollution.example.I1
Traces:
io.type.pollution.example.Main.goo(Main.java:74)
class: io.type.pollution.example.I3
count: 1826427
class: io.type.pollution.example.I2
count: 1430748
io.type.pollution.example.Main.foo(Main.java:69)
class: io.type.pollution.example.I1
count: 1803371
io.type.pollution.example.Main$$Lambda$ByteBuddy$4.accept(Unknown Source)
class: io.type.pollution.example.I2
count: 1417207
Full Traces:
--------------------------
io.type.pollution.example.Main.goo(Main.java:74)
io.type.pollution.example.Main.lambda$main$0(Main.java:57)
io.type.pollution.example.Main$$Lambda$ByteBuddy$1/0x00000008001bf840.run(Unknown Source)
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:829)
--------------------------
io.type.pollution.example.Main$$Lambda$ByteBuddy$4/0x00000008001be440.accept(Unknown Source)
io.type.pollution.example.Main.consumeAsI2(Main.java:81)
io.type.pollution.example.Main.lambda$main$0(Main.java:58)
io.type.pollution.example.Main$$Lambda$ByteBuddy$1/0x00000008001bf840.run(Unknown Source)
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:829)
--------------------------
io.type.pollution.example.Main.foo(Main.java:69)
io.type.pollution.example.Main.lambda$main$0(Main.java:56)
io.type.pollution.example.Main$$Lambda$ByteBuddy$1/0x00000008001bf840.run(Unknown Source)
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
java.base/java.lang.Thread.run(Thread.java:829)
--------------------------
# Rest omitted for brevity....
To reduce the performance impact due to collecting full stack traces for each execution, consider adding too
-Dio.type.pollution.full.traces.ms=10
that's going to perform stack sampling with a global tick time of 10 milliseconds: tune it to match your perf requirement and your type of load.