C++: Refactor SSA usage in data flow. #18942

Merged
aschackmull merged 14 commits into github:main from cpp/refactor-ssa
Mar 14, 2025

aschackmull
Contributor

Commit-by-commit review encouraged.

There are a bunch of minor commits, and then there's "C++: Use SSA data flow integration module", which is the main change that switches C++ from direct phi-read references to use the data flow integration module to construct use-use flow.

I've done a lot of local black-box testing of these changes and they generally look very reasonable. There are minor changes on the order of 0.1% of the SSA-based data flow edges, but at this point I think further evaluation is best done via DCA.
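
For readers unfamiliar with the term, the difference between def-use flow and the use-use flow mentioned above can be sketched on a single basic block. This is a toy model, not the code in this PR: the `Access` enum and the `defUseEdges` and `useUseEdges` helpers are hypothetical names introduced for illustration, and phi nodes and phi reads across blocks are deliberately left out.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Kind of access to one variable at a position inside a basic block.
enum class Access { Def, Use };

// An edge between two access positions (indices into the block).
using Edge = std::pair<std::size_t, std::size_t>;

// Def-use flow: every use is linked directly from the most recent definition.
std::vector<Edge> defUseEdges(const std::vector<Access>& block) {
    std::vector<Edge> edges;
    bool haveDef = false;
    std::size_t lastDef = 0;
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (block[i] == Access::Def) {
            haveDef = true;
            lastDef = i;
        } else if (haveDef) {
            edges.emplace_back(lastDef, i);
        }
    }
    return edges;
}

// Use-use flow: an access is linked only to the next use, so uses form a chain.
// A new definition starts a fresh chain and receives no incoming edge.
std::vector<Edge> useUseEdges(const std::vector<Access>& block) {
    std::vector<Edge> edges;
    bool havePrev = false;
    std::size_t prev = 0;
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (block[i] == Access::Use && havePrev) {
            edges.emplace_back(prev, i);
        }
        havePrev = true;
        prev = i;
    }
    return edges;
}
```

For the access sequence def, use, use, def, use, the first helper yields edges 0→1, 0→2, 3→4, while the second yields 0→1, 1→2, 3→4. With chained uses, blocking flow at one use also cuts off the uses that follow it, which is the property barriers depend on.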

@aschackmull aschackmull added the no-change-note-required This PR does not need a change note label Mar 6, 2025
@Copilot Copilot bot review requested due to automatic review settings March 6, 2025 14:19
@aschackmull aschackmull requested a review from a team as a code owner March 6, 2025 14:19
Contributor

@Copilot Copilot AI left a comment

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

@github-actions github-actions bot added the C++ label Mar 6, 2025
Contributor

@MathiasVP MathiasVP left a comment

This looks great! If tests and DCA come back happy, I'm perfectly happy merging this and leaving the small TODOs.

Update: I see CI is already finding some test discrepancies. Happy to take a look at those.

@MathiasVP
Contributor

The result changes should be fixed by #18955

@aschackmull
Contributor Author

Rebased and added two new commits. The first commit "C++: Fix spurious ExprNode fanout in DataFlowIntegration" should improve performance by using proper control-flow nodes instead of UseImpl as the instantiation of Expr in DataFlowIntegration. A side-effect of that is that barrier-guards can no longer properly identify guarded uses, so that part of barrier-guards is moved to the C++ side.
The second commit "C++: Remove superfluous disjunct" is a very minor cleanup. The second disjunct is completely contained in the first and thus superfluous.
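
As a reminder of what a "guarded use" is in the first commit's description, here is a minimal C++ sketch of the kind of analyzed code a barrier guard reasons about. The function `writeGuarded` and the constant `kBufSize` are hypothetical and do not come from the PR or its tests.

```cpp
#include <cstddef>

constexpr std::size_t kBufSize = 8;

// Writes `value` at `index` only when the bounds check passes.
// Returns true if the write happened.
bool writeGuarded(char (&buf)[kBufSize], std::size_t index, char value) {
    if (index < kBufSize) {   // guard: the condition a barrier guard would match
        buf[index] = value;   // guarded use of `index`: reached only when the guard holds
        return true;
    }
    // A use of `index` here would not be protected by the guard above.
    return false;
}
```

An analysis that treats `index < kBufSize` as a barrier guard needs to decide which uses of `index` are controlled by the true branch. That is the step that, per the comment above, now happens on the C++ side.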

@MathiasVP
Contributor

And finally: The lost test result in the internal directory will be fixed by #19001

@MathiasVP MathiasVP force-pushed the cpp/refactor-ssa branch 2 times, most recently from 61f1f07 to 7600a2d Compare March 13, 2025 17:19
@aschackmull aschackmull added the depends on internal PR This PR should only be merged in sync with an internal Semmle PR label Mar 14, 2025
aschackmull and others added 14 commits March 14, 2025 10:51
Before:

[2025-03-12 10:27:53] Evaluated non-recursive predicate SsaInternals::UseImpl.hasIndexInBlock/2#dispred#1e34a5af@e87543ui in 935ms (size: 8905695).
Evaluated relational algebra for predicate SsaInternals::UseImpl.hasIndexInBlock/2#dispred#1e34a5af@e87543ui with tuple counts:
                          {3} r1 = SsaInternals::DirectUseImpl#a58aae88 AND NOT `_ArithmeticOperation::PostfixCrementOperation#17623ada_Expr::UnaryOperation.getOperand/0#dispred#990__#antijoin_rhs`(FIRST 3)
         8579337   ~4%    {2}    | SCAN OUTPUT In.1, In.0
         8579337   ~0%    {2}    | JOIN WITH `Operand::Operand.getUse/0#dispred#427b49d0` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
         8579337   ~0%    {3}    | JOIN WITH `IRBlock::Cached::getInstruction/2#627f9c61_201#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Rhs.2

           48215   ~2%    {2} r2 = SCAN SsaInternals::GlobalUse#9cd323b4 OUTPUT In.2, In.0
        35467318   ~3%    {2}    | JOIN WITH `SSAConstruction::getInstructionEnclosingIRFunction/1#5443f355_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1

           48189   ~0%    {2} r3 = JOIN r2 WITH Instruction::ReturnInstruction#28bfb7eb ON FIRST 1 OUTPUT Lhs.0, Lhs.1

           12332   ~0%    {2} r4 = JOIN r2 WITH Instruction::UnreachedInstruction#774c7a34 ON FIRST 1 OUTPUT Lhs.0, Lhs.1

           60521   ~0%    {2} r5 = r3 UNION r4
           60521   ~2%    {3}    | JOIN WITH `IRBlock::Cached::getInstruction/2#627f9c61_201#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Rhs.2

           39316   ~0%    {2} r6 = JOIN SsaInternals::FinalParameterUse#c1f84700_10#join_rhs WITH `Parameter::Parameter.getFunction/0#dispred#803faca2` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
        43821265   ~0%    {2}    | JOIN WITH `Instruction::Instruction.getEnclosingFunction/0#dispred#cb8ccc56_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1

           39194   ~0%    {2} r7 = JOIN r6 WITH Instruction::ReturnInstruction#28bfb7eb ON FIRST 1 OUTPUT Lhs.0, Lhs.1

           21255   ~2%    {2} r8 = JOIN r6 WITH Instruction::UnreachedInstruction#774c7a34 ON FIRST 1 OUTPUT Lhs.0, Lhs.1

           60449   ~0%    {2} r9 = r7 UNION r8
           60449   ~3%    {3}    | JOIN WITH `IRBlock::Cached::getInstruction/2#627f9c61_201#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Rhs.2

         8784725   ~1%    {5} r10 = JOIN `_SsaInternals::DirectUseImpl#a58aae88_SsaInternals::DirectUseImpl.getBase/0#dispred#4b8c43d0_SsaInte__#shared` WITH `SsaInternals::DirectUseImpl.getBase/0#dispred#4b8c43d0` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1, Lhs.2, Lhs.3
         8784725   ~0%    {5}    | JOIN WITH `cached_SSAConstruction::getInstructionAst/1#d0d95b50` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
          210435   ~4%    {5}    | JOIN WITH `Expr::UnaryOperation.getOperand/0#dispred#990de484#bf_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
          205388   ~0%    {4}    | JOIN WITH ArithmeticOperation::PostfixCrementOperation#17623ada ON FIRST 1 OUTPUT Lhs.4, Lhs.3, Lhs.2, Lhs.1
          205388   ~4%    {3}    | JOIN WITH `__IRBlock::Cached::getInstruction/2#627f9c61_201#join_rhs__ArithmeticOperation::PostfixCrementOperat__#join_rhs` ON FIRST 3 OUTPUT Rhs.4, Lhs.3, Rhs.3
          205388   ~0%    {3}    | JOIN WITH `Operand::Operand.getUse/0#dispred#427b49d0` ON FIRST 1 OUTPUT Lhs.2, Rhs.1, Lhs.1
          205388   ~1%    {3}    | JOIN WITH `IRBlock::Cached::getInstruction/2#627f9c61_021#join_rhs` ON FIRST 2 OUTPUT Lhs.2, Lhs.0, Rhs.2

         8905695   ~0%    {3} r11 = r1 UNION r5 UNION r9 UNION r10
                          return r11

After:

[2025-03-12 11:12:48] Evaluated non-recursive predicate SsaInternals::hasReturnPosition/3#02f7eab8@bc405c4l in 3ms (size: 49368).
Evaluated relational algebra for predicate SsaInternals::hasReturnPosition/3#02f7eab8@bc405c4l with tuple counts:
        49368  ~3%    {1} r1 = Instruction::ReturnInstruction#28bfb7eb UNION Instruction::UnreachedInstruction#774c7a34
        49368  ~0%    {2}    | JOIN WITH `cached_SSAConstruction::getInstructionEnclosingIRFunction/1#5443f355` ON FIRST 1 OUTPUT Lhs.0, Rhs.1
        49368  ~2%    {3}    | JOIN WITH `IRBlock::Cached::getInstruction/2#627f9c61_201#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Rhs.2
                      return r1

[2025-03-12 11:12:54] Evaluated non-recursive predicate SsaInternals::UseImpl.hasIndexInBlock/2#dispred#1e34a5af@6e30cduo in 549ms (size: 8905695).
Evaluated relational algebra for predicate SsaInternals::UseImpl.hasIndexInBlock/2#dispred#1e34a5af@6e30cduo with tuple counts:
          48215   ~2%    {2} r1 = SCAN SsaInternals::GlobalUse#9cd323b4 OUTPUT In.2, In.0
          60521   ~2%    {3}    | JOIN WITH `SsaInternals::hasReturnPosition/3#02f7eab8` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Rhs.2

          50725   ~0%    {2} r2 = JOIN `IRFunctionBase::IRFunctionBase.getFunction/0#dispred#b024672e_10#join_rhs` WITH `Parameter::Parameter.getFunction/0#dispred#803faca2_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
          39231   ~2%    {2}    | JOIN WITH SsaInternals::FinalParameterUse#c1f84700_10#join_rhs ON FIRST 1 OUTPUT Lhs.1, Rhs.1
          60449   ~3%    {3}    | JOIN WITH `SsaInternals::hasReturnPosition/3#02f7eab8` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Rhs.2

                         {3} r3 = SsaInternals::DirectUseImpl#a58aae88 AND NOT `_ArithmeticOperation::PostfixCrementOperation#17623ada_Expr::UnaryOperation.getOperand/0#dispred#990__#antijoin_rhs`(FIRST 3)
        8579337   ~1%    {2}    | SCAN OUTPUT In.1, In.0
        8579337   ~0%    {2}    | JOIN WITH `Operand::Operand.getUse/0#dispred#427b49d0` ON FIRST 1 OUTPUT Rhs.1, Lhs.1
        8579337   ~1%    {3}    | JOIN WITH `IRBlock::Cached::getInstruction/2#627f9c61_201#join_rhs` ON FIRST 1 OUTPUT Lhs.1, Rhs.1, Rhs.2

        8784725   ~0%    {5} r4 = JOIN `_SsaInternals::DirectUseImpl#a58aae88_SsaInternals::DirectUseImpl.getBase/0#dispred#4b8c43d0_SsaInte__#shared` WITH `SsaInternals::DirectUseImpl.getBase/0#dispred#4b8c43d0` ON FIRST 1 OUTPUT Rhs.1, Lhs.0, Lhs.1, Lhs.2, Lhs.3
        8784725   ~0%    {5}    | JOIN WITH `cached_SSAConstruction::getInstructionAst/1#d0d95b50` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
         210435   ~0%    {5}    | JOIN WITH `Expr::UnaryOperation.getOperand/0#dispred#990de484#bf_10#join_rhs` ON FIRST 1 OUTPUT Rhs.1, Lhs.1, Lhs.2, Lhs.3, Lhs.4
         205388   ~2%    {4}    | JOIN WITH ArithmeticOperation::PostfixCrementOperation#17623ada ON FIRST 1 OUTPUT Lhs.4, Lhs.3, Lhs.2, Lhs.1
         205388   ~0%    {3}    | JOIN WITH `__IRBlock::Cached::getInstruction/2#627f9c61_201#join_rhs__ArithmeticOperation::PostfixCrementOperat__#join_rhs` ON FIRST 3 OUTPUT Rhs.4, Lhs.3, Rhs.3
         205388   ~0%    {3}    | JOIN WITH `Operand::Operand.getUse/0#dispred#427b49d0` ON FIRST 1 OUTPUT Lhs.2, Rhs.1, Lhs.1
         205388   ~0%    {3}    | JOIN WITH `IRBlock::Cached::getInstruction/2#627f9c61_021#join_rhs` ON FIRST 2 OUTPUT Lhs.2, Lhs.0, Rhs.2

        8905695   ~0%    {3} r5 = r1 UNION r2 UNION r3 UNION r4
                         return r5
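
The effect visible in the two logs can be sketched with a toy model. In the "Before" plan, 48215 GlobalUse tuples are joined against every instruction in the enclosing function (35467318 intermediate tuples) and only then restricted to return and unreached instructions. In the "After" plan, the small helper relation hasReturnPosition (49368 tuples) is computed first and joined directly. The code below is only an illustration of that join ordering, not the CodeQL evaluator; `Instr`, `JoinStats`, `joinThenFilter`, and `filterThenJoin` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// A toy instruction: the function it belongs to, and whether it is
// return-like (standing in for ReturnInstruction or UnreachedInstruction).
struct Instr { int func; bool returnLike; };

// Number of tuples materialized before filtering, and the final result size.
struct JoinStats { std::size_t intermediate; std::size_t result; };

// "Before" shape: join each use with all instructions of its function,
// then keep only the return-like ones.
JoinStats joinThenFilter(const std::vector<int>& useFuncs,
                         const std::vector<Instr>& instrs) {
    JoinStats s{0, 0};
    for (int f : useFuncs) {
        for (const Instr& i : instrs) {
            if (i.func != f) continue;
            ++s.intermediate;
            if (i.returnLike) ++s.result;
        }
    }
    return s;
}

// "After" shape: build the small per-function relation of return-like
// instructions once, then join the uses against it.
JoinStats filterThenJoin(const std::vector<int>& useFuncs,
                         const std::vector<Instr>& instrs) {
    std::map<int, std::size_t> returnLikePerFunc;
    for (const Instr& i : instrs) {
        if (i.returnLike) ++returnLikePerFunc[i.func];
    }
    JoinStats s{0, 0};
    for (int f : useFuncs) {
        auto it = returnLikePerFunc.find(f);
        if (it == returnLikePerFunc.end()) continue;
        s.intermediate += it->second;
        s.result += it->second;
    }
    return s;
}
```

Both functions produce the same result size, but the second never materializes the tuples that the filter would discard. This matches the logs, where the final relation is 8905695 tuples in both runs while the reported evaluation time drops from 935ms to 549ms (plus 3ms for the helper).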
@MathiasVP
Contributor

MathiasVP commented Mar 14, 2025

Analysis of result changes

neovim__neovim

  • Lost: cpp/invalid-pointer-deref: The two lost results are true positives (TPs). The loss is caused by the change to the barriers in the query. It's probably fine, but we should think about whether we can do something else in the future.
  • Lost: cpp/unbounded-write: A barrier is now being applied "correctly" (i.e., as the query intended to do, at least)
  • Gained: cpp/overrun-write: I think the query has an off-by-one, but otherwise the flow looks genuine

vim__vim and openjdk-jdk

  • Lost: 12 x cpp/unbounded-write: These are for the same reason as on neovim__neovim.

@MathiasVP
Contributor

Overall, the changes in code and results LGTM. It's a bit of a shame that we lose those two cpp/invalid-pointer-deref, and I don't know how to regain them without exposing phi (reads) 😭. But I think it's a price I'm willing to pay for more code sharing.

Since this is a fairly major change (and I'm not actually on the team that owns this code anymore!), I would appreciate it if someone from the C team would take a second look. I'll ping them on Slack.

Contributor

@jketema jketema left a comment

Approve, per the internal discussion.

@aschackmull aschackmull merged commit 474b8a5 into github:main Mar 14, 2025
35 of 36 checks passed
@aschackmull aschackmull deleted the cpp/refactor-ssa branch March 14, 2025 12:31