Skip to content

Java: Diff-informed queries: phase 3 (non-trivial locations) #20077

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

d10c
Copy link
Contributor

@d10c d10c commented Jul 17, 2025

This PR enables diff-informed mode on queries that select a location other than dataflow source or sink. This entails adding a non-trivial location override that returns the locations that are actually selected.

Prior work includes PRs like #19663, #19759, and #19817. This PR uses the same patch script as those PRs to find candidate queries to convert to diff-enabled. This is the final step in mass-enabling diff-informed queries on all the languages.

Commit-by-commit reviewing is recommended.

  • I have split the commits that add/modify tests from the ones that enable/disable diff-informed queries.
  • If the commit modifies a .qll file, in the commit message I've included links to the queries that depend on that .qll for easier reviewing.
  • Feel free to delegate parts of the review to others who may be more specialized in certain languages.

Potentially tricky cases:

@github-actions github-actions bot added the Java label Jul 17, 2025
@d10c d10c force-pushed the d10c/diff-informed-phase-3-java branch from b5c521d to 2e426db Compare July 17, 2025 12:34
@d10c d10c added the no-change-note-required This PR does not need a change note label Jul 17, 2025
@d10c d10c marked this pull request as ready for review July 17, 2025 14:03
@Copilot Copilot AI review requested due to automatic review settings July 17, 2025 14:03
@d10c d10c requested a review from a team as a code owner July 17, 2025 14:03
@d10c d10c requested a review from michaelnebel July 17, 2025 14:03
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enables diff-informed mode on Java security queries by adding non-trivial location overrides for queries that select locations other than dataflow source or sink. This is part of the final phase of mass-enabling diff-informed queries across all languages.

Key changes:

  • Adds observeDiffInformedIncrementalMode() and location override predicates to various security query library files
  • Converts test files from inline flow tests to reference actual queries with expectation annotations
  • Updates test data with new Source/Sink/Alert annotations replacing hasTaintFlow annotations

Reviewed Changes

Copilot reviewed 50 out of 54 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
java/ql/lib/semmle/code/java/security/*.qll Added diff-informed mode predicates and location overrides to security query configurations
java/ql/test/query-tests/security//Test.java Updated test annotations from hasTaintFlow to Alert/Source/Sink format
java/ql/test/query-tests/security//Test.ql Removed inline flow test implementations
java/ql/test/query-tests/security//Test.qlref Added query reference files pointing to actual security queries
java/ql/test/query-tests/security//Test.expected Added expected test output files

@@ -19,6 +19,10 @@ module LogInjectionConfig implements DataFlow::ConfigSig {
}

predicate isBarrierIn(DataFlow::Node node) { isSource(node) }

predicate observeDiffInformedIncrementalMode() {
none() // straightforward case; but the large test source is causing OOMs under `--check-diff-informed`.
Copy link
Preview

Copilot AI Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment mentions OOMs under --check-diff-informed but doesn't provide sufficient context about the issue or potential solutions. Consider adding more details about the specific test case causing the problem and any planned follow-up actions.

Suggested change
none() // straightforward case; but the large test source is causing OOMs under `--check-diff-informed`.
none() // The large test source used in this query causes Out-Of-Memory (OOM) issues under `--check-diff-informed` mode.
// This predicate is intentionally disabled to prevent OOMs. Future work may involve optimizing the test source
// or refining the query to handle large datasets more efficiently.

Copilot uses AI. Check for mistakes.


predicate observeDiffInformedIncrementalMode() { any() }

Location getASelectedSourceLocation(DataFlow::Node source) { none() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should exclude sources here, as the path is fully reported.

Copy link
Contributor Author

@d10c d10c Jul 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean path-problems always report the location of source and sink? That seems to contradict the example of WebviewDebuggingEnabled.ql/WebviewDebuggingEnabledQuery.qll mentioned in the incremental codeql docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem to contradict that example, yes. I'm not completely sure here, but both ends of the path are a part of the result tuple. From the QL point of view, it's a bit arbitrary whether the second end-point is included in the message or not - a lot of queries do this, but from the query writers perspective it's perhaps a bit of a coin-toss, since the path includes both end-points regardless. Now if this coin-toss decision affects whether or not a result is included in a PR, then of course it shouldn't be made arbitrarily, and if that's the case then perhaps we should institute a rule (e.g. in ql-for-ql) to always include the second end-point in the message.

Now, as mentioned, I'm unsure about the filtering semantics of the downstream consumption in PRs, but considering the two possible cases then either these kind of results aren't filtered in which case we shouldn't exclude sources here, or they are filtered in which case I think we also shouldn't filter here, since then I'd argue we'd want to modify the query to include the source in the message to prevent the filtering.

We should move this conversation to slack and figure it out.

Copy link
Contributor

@michaelnebel michaelnebel Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the documentation we only want to report alerts on locations that are pertaining to either the primary location (location of the first element in the select) or an alert pertaining to a related location - locations for elements used in placeholders @.
For the ExecTaintedEvironment the "source" is the second column (so not a primary or related location), so it should be fine to set the getASelectedSourceLocation to none(). Note that this doesn't mean that we will disregard "all" sources as a part of the flow path computation as a source is still relevant if there is a relevant sink.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, you are questioning the design (saw the thread in slack).


predicate observeDiffInformedIncrementalMode() { any() }

Location getASelectedSinkLocation(DataFlow::Node sink) { none() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same. The query reports the full path, so we shouldn't exclude sinks like this.

d10c added 14 commits July 17, 2025 18:58
d10c added 11 commits July 17, 2025 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Java no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants