Improve CHABuilder precision via resolving callees using the type of the receiver variable #197
According to lecture 7, in Class Hierarchy Analysis (CHA for short), the resolution of a virtual invocation such as `base.foo()` should traverse down the class hierarchy starting from the type of the receiver variable `base`. However, Tai-e currently uses the declaring class of the referenced method, as in the following snippet:
Tai-e/src/main/java/pascal/taie/analysis/graph/callgraph/CHABuilder.java
Line 154 in 523aec2
This might introduce imprecision when the type of the base variable differs from the declaring class of the referenced method. Below is a simplified example from the `antlr` benchmark program in java-benchmarks; the corresponding source code can be found here:

Tir dumped by Tai-e:
Note how the types of variable `source` and `$r36` differ between the source code and the IR. The variable `$r36` in the generated Tir (`source` in the source code) has type `BufferedReader` (`Reader` in the source code). Since we know that `$r36` can only point to objects whose types are subtypes of `BufferedReader`, we can eliminate callees from the other subclasses of `Reader`.

The difference in variable types comes from the fact that bytecode frontends (Soot, at the moment) usually perform a precise local type inference algorithm to recover tight type information from the bytecode. Although variable `source` has type `Reader` in the original source code, it is only assigned objects of type `BufferedReader` throughout its method. This guarantees that for an invoke statement `recv.foo()`, the type of the receiver is always a subtype of the declaring class of `foo`, and using the type of the receiver variable is always as precise as (sometimes more precise than) using the declaring class of the method reference.
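To make the precision difference concrete, here is a minimal, self-contained sketch of CHA-style resolution over a toy class hierarchy. It is not Tai-e code: `ChaSketch`, `Klass`, `dispatch`, and `resolve` are hypothetical names, methods are identified by name only, and the hierarchy is a hand-written approximation of a few `java.io` readers with a hypothetical call to `read`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not Tai-e's API: all names here are assumptions for illustration.
class ChaSketch {
    static final class Klass {
        final String name;
        final Klass superClass;
        final Set<String> declared = new HashSet<>();   // methods declared in this class
        final List<Klass> subclasses = new ArrayList<>(); // direct subclasses

        Klass(String name, Klass superClass, String... methods) {
            this.name = name;
            this.superClass = superClass;
            this.declared.addAll(List.of(methods));
            if (superClass != null) {
                superClass.subclasses.add(this);
            }
        }
    }

    // Walk up from c to the first class that declares m; null if none does.
    static String dispatch(Klass c, String m) {
        for (Klass k = c; k != null; k = k.superClass) {
            if (k.declared.contains(m)) {
                return k.name + "." + m;
            }
        }
        return null;
    }

    // CHA for a virtual call: dispatch on `start` and on every transitive subclass.
    static Set<String> resolve(Klass start, String m) {
        Set<String> targets = new TreeSet<>();
        Deque<Klass> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            Klass k = work.pop();
            String target = dispatch(k, m);
            if (target != null) {
                targets.add(target);
            }
            k.subclasses.forEach(work::push);
        }
        return targets;
    }

    public static void main(String[] args) {
        Klass reader = new Klass("Reader", null, "read");
        Klass buffered = new Klass("BufferedReader", reader, "read");
        new Klass("LineNumberReader", buffered, "read");
        new Klass("StringReader", reader, "read");
        new Klass("InputStreamReader", reader, "read");

        // Starting from the declaring class of the method reference:
        System.out.println(resolve(reader, "read"));
        // Starting from the (tighter) type of the receiver variable:
        System.out.println(resolve(buffered, "read"));
    }
}
```

In this toy hierarchy, starting from `Reader` yields five possible callees, while starting from `BufferedReader` yields only `BufferedReader.read` and `LineNumberReader.read`. Because the receiver type is a subtype of the declaring class, the second set is always a subset of the first.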