Skip to content

Fix Mahjong's FieldPointsToGraph by handling non-functional MockObj in PTA Result #162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 23, 2025

Conversation

jjppp
Copy link
Member

@jjppp jjppp commented Jun 12, 2025

Inconsistency

The current implementation of Mahjong is inconsistent with its paper. According to [1], if the field f of an object oi may point to another object oj, then there is an edge from oi to oj with label f added to the FieldPointsToGraph. However, currently Mahjong builds edges only for those fields that are loaded in the analyzed program.

MockObjs

ConcurrentModificationException may be thrown when building the FieldPointsToGraph.
In #140 an optimization targeting zero length arrays is introduced by allocating non-functional MockObjs for 0 sized arrays, whose array indices should never point to any objects. Currently FieldPointsToGraph does not check for this situation.

[1] T. Tan, Y. Li, and J. Xue, “Efficient and precise points-to analysis: modeling the heap by merging equivalent automata,” in Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, in PLDI 2017. New York, NY, USA: Association for Computing Machinery, 2017, pp. 278–291. doi: 10.1145/3062341.3062360.

@codecov-commenter
Copy link

codecov-commenter commented Jun 12, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.

Project coverage is 75.18%. Comparing base (21118b5) to head (2a9f8e0).

Files with missing lines Patch % Lines
...l/taie/analysis/pta/PointerAnalysisResultImpl.java 75.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #162      +/-   ##
============================================
+ Coverage     72.85%   75.18%   +2.32%     
- Complexity     4435     4580     +145     
============================================
  Files           480      480              
  Lines         15927    15927              
  Branches       2185     2179       -6     
============================================
+ Hits          11603    11974     +371     
+ Misses         3467     3080     -387     
- Partials        857      873      +16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jjppp jjppp requested a review from silverbullettt July 2, 2025 04:51
@jjppp jjppp self-assigned this Jul 2, 2025
@silverbullettt
Copy link
Contributor

silverbullettt commented Jul 10, 2025

The perceived 'inconsistency' is actually a minor optimization. If a field has never been loaded, there's no need to consider it during object merging; hence, we omitted it from the FPG. Regarding zero-length arrays, as they are non-functional, their indices should not reference any objects. Is it necessary for Mahjong to perform a check on this?

@jjppp
Copy link
Member Author

jjppp commented Jul 16, 2025

Just force-updated a test in 5d2fa24 in this PR to reproduce the bug regarding zero-length arrays, a corresponding stacktrace for this bug can be found here.
Note that this bug is non-deterministic, reproducing it might need multiple runs (in the given test we set the number of runs to 100, and caught 1 failure).

Yes, zero-length arrays are non-functional objects and their indices do not point to any objects, so the DefaultSolver will not create pointers for them via CSManager.

if (array.getObject().isFunctional()) {
ArrayIndex arrayIndex = csManager.getArrayIndex(array);
addPFGEdge(arrayIndex, to, FlowKind.ARRAY_LOAD);
}

When FieldPointsToGraph iterates through pointers concurrently, without checking whether the base object is functional, PointerAnalysisResultImpl will invoke csManager.getArrayIndex() to create pointers for zero-length arrays:
return arrayPointsTo.computeIfAbsent(base, b ->
removeContexts(csManager.getCSVarsOf(b)
.stream()
.flatMap(Pointer::objects)
.map(csManager::getArrayIndex)
.flatMap(ArrayIndex::objects)));

This is explains the ConcurrentModificationException thrown by mahjong: the ptrManager of csManager is updated concurrently for using computeIfAbsent to create new array indices for zero-length arrays, which is a concurrent modification operation.

@jjppp
Copy link
Member Author

jjppp commented Jul 16, 2025

Incorrect PFG edge

For an array load statement x = y[*], current implementation adds PFG edges from all $$o_i \in pts(y)$$ to all $$o_j \in pts(y[*])$$:

for (LoadField load : var.getLoadFields()) {
if (isConcerned(load.getRValue())) {
for (Obj baseObj : pta.getPointsToSet(var)) {
Field field = factory.get(load.getFieldRef().resolve());
Set<Obj> pts = pta.getPointsToSet(load.getRValue());
addFieldPointsTo(baseObj, field, pts);
}
}
}

This is in fact incorrect, since $pts(y[*])=\{ o_j | o_i \in pts(y) \bigwedge o_j \in pts(o_i[*]) \}$, there is no guarantee that all the objects in $pts(y[*])$ will be pointed to by all the objects in $pts(y)$.

This will actually add more edges than needed to the PFG. Consider the following example:

Object[] x = new Object[1]; // o1
Object[] y = new Object[2]; // o2
x[0] = new Object[]; // o3
y[0] = new Object[]; // o4
Object[] z; z = x; z = y;
return z[0]; // ??

For array access z[0], current implementation adds edges from all the objects in pts(z) to all the objects in pts(z[0]). This results in 4 edges (o1 -> o3, o1 -> o4, o2 -> o3, o2 -> o4).
However, o1[*] never points to o4, and no access paths starting from o1 can reach o4. The correct implementation should contain only 2 edges (o1 -> o3, o2 -> o4) in the PFG.

Optimization

As for the optimization, a field (or array index) that is never loaded can still affect the final analysis result through reflection or native APIs (consider the handling of argument passing for Method.invoke(Object[]), where the arguments for method call is implicitly loaded from the argument array, not via explicit array load statements).

Here I compare 2-object sensitive pointer analysis guided by the current mahjong (denoted as Mahjong) and my fix (denoted as Mahjong-Fix) using some of the benchmark programs from https://github.com/pascal-lab/java-benchmarks, and the four precision metrics from mahjong's paper.
The result is summarized using the following table.
Mahjong-fix is as precise as Mahjong, and has better precision on most of the tested benchmarks for most metrics, while Mahjong seems to have no significant performance advantage over Mahjong-fix (Note that Mahjong's pre-analysis is even slower than Mahjong-fix).

Program Analysis Pre-Time (s) Time (s) fail cast poly call reach method call edge
fop Mahjong 1 2 322 824 8088 36221
Mahjong-fix 1 3 322 824 8088 36214
luindex Mahjong 1 3 331 931 7509 35514
Mahjong-fix 1 3 331 931 7509 35507
lusearch Mahjong 1 2 344 1127 8148 38287
Mahjong-fix 1 3 344 1127 8148 38280
antlr Mahjong 1 3 446 1641 8285 53097
Mahjong-fix 1 3 446 1641 8285 53090
pmd Mahjong 3 9 1380 2545 12809 64403
Mahjong-fix 2 10 1374 2543 12784 64234
bloat Mahjong 2 149 1221 1572 9496 58433
Mahjong-fix 1 140 1221 1572 9496 58426
chart Mahjong 9 35 1194 2005 15672 74164
Mahjong-fix 5 34 1179 2001 15649 74032
xalan Mahjong 4 150 955 2840 12506 63866
Mahjong-fix 2 149 941 2835 12470 63633
findbugs Mahjong 6 29 1969 3580 16809 89090
Mahjong-fix 3 30 1964 3578 16803 88936

@silverbullettt
Copy link
Contributor

silverbullettt commented Jul 16, 2025

Great and detailed explanation! Your fix on PFG edge building is well-executed. Regarding the ConcurrentModificationException, I suggest modifying PointerAnalysisResultImpl to reduce the need for clients to be aware of functional/non-functional objects. I've highlighted the methods below that should be adjusted to return empty points-to sets for non-functional base objects:"

public Set<Obj> getPointsToSet(Var base, JField field) {
if (!propTypes.isAllowed(field.getType())) {
return Set.of();
}
if (field.isStatic()) {
logger.warn("{} is not an instance field", field);
return Set.of();
}
// TODO - properly handle non-exist base.field
return ifieldPointsTo.computeIfAbsent(new Pair<>(base, field), p ->
removeContexts(csManager.getCSVarsOf(base)
.stream()
.flatMap(Pointer::objects)
.map(o -> csManager.getInstanceField(o, field))
.flatMap(InstanceField::objects)));
}

public Set<Obj> getPointsToSet(Obj base, JField field) {
if (!propTypes.isAllowed(field.getType())) {
return Set.of();
}
if (field.isStatic()) {
logger.warn("{} is not an instance field", field);
return Set.of();
}
// TODO - properly handle non-exist base.field
return removeContexts(csManager.getCSObjsOf(base)
.stream()
.map(o -> csManager.getInstanceField(o, field))
.flatMap(InstanceField::objects));
}

public Set<Obj> getPointsToSet(Var base, Var index) {
if (base.getType() instanceof ArrayType baseType) {
if (!propTypes.isAllowed(baseType.elementType())) {
return Set.of();
}
} else {
logger.warn("{} is not an array", base);
return Set.of();
}
return arrayPointsTo.computeIfAbsent(base, b ->
removeContexts(csManager.getCSVarsOf(b)
.stream()
.flatMap(Pointer::objects)
.map(csManager::getArrayIndex)
.flatMap(ArrayIndex::objects)));
}

public Set<Obj> getPointsToSet(Obj array) {
if (array.getType() instanceof ArrayType baseType) {
if (!propTypes.isAllowed(baseType.elementType())) {
return Set.of();
}
} else {
logger.warn("{} is not an array", array);
return Set.of();
}
return removeContexts(csManager.getCSObjsOf(array)
.stream()
.map(csManager::getArrayIndex)
.flatMap(ArrayIndex::objects));
}

…and object fields in `PointerAnalysisResultImpl`
@jjppp
Copy link
Member Author

jjppp commented Jul 23, 2025

I have fixed PointerAnalysisResultImpl according to your instructions, getPointsToSet() should now ignore non functional objects.
Could you review my changes, by any chance? @silverbullettt

Copy link
Contributor

@silverbullettt silverbullettt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@jjppp jjppp changed the title Fix for Mahjong's FieldPointsToGraph Fix Mahjong's FieldPointsToGraph by handling non-functional MockObjs in PTA Result Jul 23, 2025
@jjppp jjppp changed the title Fix Mahjong's FieldPointsToGraph by handling non-functional MockObjs in PTA Result Fix Mahjong's FieldPointsToGraph by handling non-functional MockObj in PTA Result Jul 23, 2025
@jjppp jjppp merged commit 359eb63 into pascal-lab:master Jul 23, 2025
4 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Jul 23, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants