C++: Instantiate model generation library #19295

MathiasVP · 2025-04-11T18:53:04Z

Now that both #19273 and #19274 are merged, we can finally get to the fun part: adding the actual implementation of model generation to C++ 🎉.

I couldn't think of a good way to structure the commits in this PR. Apologies!

The first commit adds the entire library.
The second commit adds tests that will succeed at the end of the PR.
The third commit instantiates the inline expectation test framework to test model generation.
The fourth commit adds all the files that I could copy/paste from C#/Java/Rust. These are essentially all the query and test files required for model generation.

I think that, after this PR is merged, the final step is to add the required DCA suite for model generation. However, as that suite doesn't exist yet, I don't think it makes sense to run any DCA for this right now.

I've done some testing of this already: I've run the model generation on sqlite and the models look sensible for the very small subset that I've checked (if you're curious: https://gist.github.com/MathiasVP/6942f022c7a8f4e515c80ccd442ab59f).

cc @michaelnebel would you mind taking a brief look at this PR? I don't expect you to review the C++ specific parts, obviously 🙈

Copilot

Pull Request Overview

This PR instantiates the model generation library for C++ by adding the complete library, tests, and associated extension configuration files to enable model generation via query summaries.

Adds annotated model definitions and summaries in the library tests.
Introduces tests (including instantiation of the inline expectation test framework) for verifying model generation.
Provides extension YAML files to integrate summary models into the CodeQL pack.

Reviewed Changes

Copilot reviewed 4 out of 18 changed files in this pull request and generated no comments.

File	Description
cpp/ql/test/library-tests/dataflow/modelgenerator/dataflow/summaries.cpp	Adds model definitions with summary annotations for C++ dataflow.
cpp/ql/test/library-tests/dataflow/modelgenerator/dataflow/CaptureSummaryModels.ext.yml	Configures summary capture extension with manual models.
cpp/ql/test/library-tests/dataflow/modelgenerator/dataflow/CaptureContentSummaryModels.ext.yml	Configures content-based summary capture extension.
cpp/ql/src/utils/modelgenerator/GenerateFlowModel.py	Introduces a utility script for generating C++ flow models.

Files not reviewed (14)

cpp/ql/lib/utils/test/InlineMadTest.qll: Language not supported
cpp/ql/src/utils/modelgenerator/CaptureContentSummaryModels.ql: Language not supported
cpp/ql/src/utils/modelgenerator/CaptureMixedNeutralModels.ql: Language not supported
cpp/ql/src/utils/modelgenerator/CaptureMixedSummaryModels.ql: Language not supported
cpp/ql/src/utils/modelgenerator/CaptureNeutralModels.ql: Language not supported
cpp/ql/src/utils/modelgenerator/CaptureSinkModels.ql: Language not supported
cpp/ql/src/utils/modelgenerator/CaptureSourceModels.ql: Language not supported
cpp/ql/src/utils/modelgenerator/CaptureSummaryModels.ql: Language not supported
cpp/ql/src/utils/modelgenerator/internal/CaptureModels.qll: Language not supported
cpp/ql/src/utils/modelgenerator/internal/CaptureModelsPrinting.qll: Language not supported
cpp/ql/test/library-tests/dataflow/modelgenerator/dataflow/CaptureContentSummaryModels.expected: Language not supported
cpp/ql/test/library-tests/dataflow/modelgenerator/dataflow/CaptureContentSummaryModels.ql: Language not supported
cpp/ql/test/library-tests/dataflow/modelgenerator/dataflow/CaptureSummaryModels.expected: Language not supported
cpp/ql/test/library-tests/dataflow/modelgenerator/dataflow/CaptureSummaryModels.ql: Language not supported

Comments suppressed due to low confidence (2)

cpp/ql/src/utils/modelgenerator/GenerateFlowModel.py:9

[nitpick] Consider renaming 'madpath' to a more descriptive name like 'modelsDataPath' for improved clarity.

madpath = os.path.join(gitroot, "misc/scripts/models-as-data/")

cpp/ql/test/library-tests/dataflow/modelgenerator/dataflow/CaptureContentSummaryModels.ext.yml:6

[nitpick] Ensure consistent naming for model identifiers; consider using 'Models' instead of 'models' to align with other summary annotations.

- [ "models", "ManuallyModelled", False, "hasSummary", "(void *)", "", "Argument[0]", "ReturnValue", "value", "manual"]
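
For context, a row like the one above describes a member function whose first argument flows to its return value. Below is a minimal sketch of a C++ declaration that such a row could describe, assuming the columns follow the usual models-as-data summary layout (namespace, type, subtypes, name, signature, ext, input, output, kind, provenance). The function body is an assumption made for illustration and is not taken from summaries.cpp.

```cpp
namespace models {
  struct ManuallyModelled {
    // Assumed body: returns its argument unchanged, which is the flow the
    // row expresses as Argument[0] -> ReturnValue with kind "value".
    void* hasSummary(void* p) { return p; }
  };
}
```

The "manual" provenance in the last column marks the row as hand-written rather than produced by the generator.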

michaelnebel

This is very, very nice! Well done, @MathiasVP!

michaelnebel · 2025-04-14T08:21:47Z

cpp/ql/src/utils/modelgenerator/GenerateFlowModel.py

+import generate_flow_model as model
+
+language = "cpp"
+model.Generator.make(language).run()


I intend to change the Python script so that --with-summaries uses the mixed query instead (and likewise for the neutral query), but for testing purposes it is nice to keep both the content-based and heuristic-based queries around.

michaelnebel · 2025-04-14T08:58:41Z

cpp/ql/src/utils/modelgenerator/internal/CaptureModels.qll

+    f.isStatic()
+  }
+
+  predicate isUninterestingForDataFlowModels(Callable api) {


Maybe consider moving the content of this predicate into the relevant predicate instead.
This will make things easier if you intend to introduce "lifting" logic for the produced models.
For C# and Java, we consider implementations of method "prototypes" (implementations of interface or abstract class members) to abide by the "contract" of the interface or abstract member, at least if the implementation is in the same codebase as the declaration of that interface or abstract member.
That is, for something like

public interface I { object M(object o); }
public class C : I { public object M(object o) { return o; } }

we would like to "lift" the model identified for C.M to I.M and use this for all implementations of I.M.
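
A rough C++ analogue of the C# example, as a sketch only: an abstract class with a pure virtual member function plays the role of the interface. The type names mirror the C# snippet, and `callThroughInterface` is a hypothetical helper. Whether and how lifting would be implemented for C++ is not decided in this thread.

```cpp
// Hypothetical C++ counterpart of the C# interface I.
struct I {
  virtual ~I() = default;
  virtual void* M(void* o) = 0;  // the "prototype" a lifted model would attach to
};

// C::M returns its argument unchanged, so the flow found here
// (argument 0 to return value) is the model that lifting would move to I::M.
struct C : I {
  void* M(void* o) override { return o; }
};

// Callers that only see the abstract type would benefit from a model on I::M.
inline void* callThroughInterface(I& i, void* o) { return i.M(o); }
```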

michaelnebel · 2025-04-14T09:05:19Z

cpp/ql/src/utils/modelgenerator/internal/CaptureModels.qll

+   * of `c`.
+   */
+  private string isExtensible(Callable c) {
+    if c instanceof MemberFunction then result = "true" else result = "false"


Maybe return false for member functions that are final or declared in a final class/struct.
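
To illustrate the two cases, here is a small sketch in which all type names are hypothetical. A member function of a `final` class cannot be overridden, and neither can a member function that is itself marked `final`, so reporting either as extensible would be inaccurate.

```cpp
#include <type_traits>

struct Base {
  virtual ~Base() = default;
  virtual int get(int x) { return x; }  // overridable: subtypes may change behaviour
};

// Final class: no type can derive from it, so none of its members can be overridden.
struct Sealed final : Base {
  int get(int x) override { return x + 1; }
};

// Non-final class with a final member: further derivation is allowed,
// but `get` itself can no longer be overridden.
struct Partly : Base {
  int get(int x) final { return x + 2; }
};

static_assert(std::is_final<Sealed>::value, "Sealed is a final class");
static_assert(!std::is_final<Partly>::value, "Partly can still be derived from");
```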

michaelnebel · 2025-04-14T09:07:47Z

cpp/ql/src/utils/modelgenerator/internal/CaptureModels.qll

+            i
+        )
+    |
+      if params = "" then result = "()" else result = "(" + params + ")"


Suggested change

if params = "" then result = "()" else result = "(" + params + ")"

"(" + params + ")"
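
The suggestion works because concatenating an empty `params` already yields "()", so the branch is redundant. A quick check of that equivalence, written in C++ rather than QL and using hypothetical function names:

```cpp
#include <string>

// Mirrors the original line: special-case an empty parameter list.
std::string withBranch(const std::string& params) {
  if (params.empty()) return "()";
  return "(" + params + ")";
}

// Mirrors the suggested change: plain concatenation.
std::string withoutBranch(const std::string& params) {
  return "(" + params + ")";
}
```

Both functions agree on every input, including the empty string.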

michaelnebel · 2025-04-14T09:16:56Z

cpp/ql/src/utils/modelgenerator/internal/CaptureModels.qll

+    // uninteresting (which is good!)
+    not api.(Function).hasDefinition()
+    or
+    isPrivate(api)


What about protected members?
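
The question matters because a protected member function is still reachable from code outside the library through a derived class. A minimal sketch with hypothetical names follows. Whether the generator should emit a model for `passthrough` is exactly the open question.

```cpp
class Library {
public:
  virtual ~Library() = default;

protected:
  // Not callable from arbitrary code, but callable from any subclass.
  int* passthrough(int* p) { return p; }
};

// Client code outside the library can derive from it and expose the protected member.
class Client : public Library {
public:
  int* expose(int* p) { return passthrough(p); }
};
```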

michaelnebel · 2025-04-14T09:22:15Z

cpp/ql/src/utils/modelgenerator/GenerateFlowModel.py

+import generate_flow_model as model
+
+language = "cpp"
+model.Generator.make(language).run()


The Python script puts the generated models in

lib/ext/generated

If you intend to use this location for generated models, consider extending qlpack.yml to also include data extensions from this location.

michaelnebel · 2025-04-14T09:25:11Z

cpp/ql/test/library-tests/dataflow/modelgenerator/dataflow/summaries.cpp

+        int* tainted;
+
+        //No model as destructors are excluded from model generation.
+        ~BasicFlow() = default;


Maybe consider adding more test cases for members for which we expect no models. It is a good idea (as you have already done) to restrict the produced models to "effectively public" endpoints, as loading data extensions can be somewhat expensive (when there are tens of thousands of models).
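
A sketch of the kind of negative test cases this could mean, using only the exclusions mentioned in this review (destructors, functions without a definition, private members). The class and member names are hypothetical, and the comments are plain prose rather than the inline expectation syntax used by the test framework.

```cpp
class NegativeCases {
public:
  // No model expected: destructors are excluded from model generation.
  ~NegativeCases() = default;

  // No model expected: declared but never defined.
  int* declaredOnly(int* p);

  // A model would presumably be expected here: public, defined,
  // and the argument flows to the return value.
  int* visible(int* p) { return hidden(p); }

private:
  // No model expected: private members are not effectively public.
  int* hidden(int* p) { return p; }
};
```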

michaelnebel · 2025-04-14T10:32:35Z

Inspiration for adding model generation summaries to DCA: https://github.com/github/codeql-dca/pull/847/
Not sure why the experiments failed last week. We might need help from the DX team on this (but maybe this is something for after Easter).

MathiasVP added 4 commits April 11, 2025 19:33

C++: Instantiate model generation library.

e0f62b7

C++: Add tests that will soon succeed.

e701092

C++: Instantiate inline expectation test framework to test model generation.

37e91fd

C++: Add copy-pasted files from C#.

8283594

Copilot bot review requested due to automatic review settings April 11, 2025 18:53

MathiasVP requested a review from a team as a code owner April 11, 2025 18:53

github-actions bot added the C++ label Apr 11, 2025

Copilot AI reviewed Apr 11, 2025

MathiasVP added the no-change-note-required This PR does not need a change note label Apr 11, 2025

github-advanced-security bot found potential problems Apr 11, 2025

cpp/ql/lib/utils/test/InlineMadTest.qll: Fixed
cpp/ql/src/utils/modelgenerator/internal/CaptureModels.qll: Fixed

C++: Fix ql-for-ql findings.

9b5643d

michaelnebel reviewed Apr 14, 2025
