zyw-bot (Collaborator) commented Oct 31, 2025

Link: llvm/llvm-project#165877
Requested by: @dtcxzyw

github-actions bot mentioned this pull request Oct 31, 2025

zyw-bot (Collaborator, Author) commented Oct 31, 2025

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@cc8ff73
patch: llvm/llvm-project#165877
sha256: 950fb945127d55c3168412fd752684323e3cd1395a608a570f4f10d2087df953
commit: 383bb45

1304 files changed, 1286865 insertions(+), 1285622 deletions(-)

Improvements:
  early-cse.NumCSECVP 97438 -> 101729 +4.40%
  jump-threading.NumDupes 129641 -> 131506 +1.44%
  correlated-value-propagation.NumAnd 44197 -> 44369 +0.39%
  instcombine.NumDeadInst 40894830 -> 40940331 +0.11%
  gvn.IsValueFullyAvailableInBlockNumSpeculationsMax 601823 -> 602422 +0.10%
  jump-threading.NumFolds 2546809 -> 2548242 +0.06%
  correlated-value-propagation.NumAShrsConverted 3484 -> 3485 +0.03%
  early-cse.NumCSECall 28035 -> 28041 +0.02%
  early-cse.NumSimplify 28522953 -> 28528546 +0.02%
  gvn.NumGVNEqProp 426700 -> 426774 +0.02%
Regressions:
  aggressive-instcombine.NumInstrsReduced 66486 -> 45011 -32.30%
  aggressive-instcombine.NumExprsReduced 19989 -> 15570 -22.11%
  instcombine.NumConstProp 156985 -> 155413 -1.00%
  correlated-value-propagation.NumReturns 145 -> 144 -0.69%
  correlated-value-propagation.NumPhis 1239305 -> 1236922 -0.19%
  correlated-value-propagation.NumUDivURemsNarrowed 12798 -> 12775 -0.18%
  jump-threading.NumThreads 2696700 -> 2693005 -0.14%
  instcombine.NumSel 32492 -> 32460 -0.10%
  bdce.NumSExt2ZExt 4076 -> 4073 -0.07%
  licm.NumBOAssociationsHoisted 3583 -> 3581 -0.06%

+18 libquic/string_number_conversions.ll
+10 postgres/relcache.ll
+10 sdl/SDL_blit.ll
+9 cvc5/theory_arith_private.ll
+8 wireshark/packet-iwarp-mpa.ll
+7 rustfmt-rs/x2cb3fifm47d4t5.ll
+6 sdl/SDL_bmp.ll
+6 zed-rs/c4c7jl64zv8zhv2ne6xdvhty4.ll
+5 clamav/volume.ll
+5 uv-rs/dv79qfcpy73s7ozlb66podgd3.ll
+4 llvm/LoopStrengthReduce.ll
+4 php/element.ll
+4 pola-rs/b4ioxe3ookh320kj8ajxke72f.ll
+4 raylib/rmodels.ll
+4 wireshark/wlan_statistics_dialog.ll
+3 draco/parser_utils.ll
+3 hdf5/H5Groot.ll
+3 openjdk/systemDictionary.ll
+3 rust-analyzer-rs/3r60zyztvepuy9ka.ll
+3 sdl/SDL_wave.ll
+3 sqlite/sqlite3.ll
+3 z3/spacer_antiunify.ll
+3 zed-rs/83f7cv59nhkcel85ism08ubeo.ll
+3 zed-rs/aoil3dh3wwwg6dihc4l59fpms.ll
+2 ffmpeg/mpeg12enc.ll
+2 velox/URLFunctions.ll
+2 wireshark/packet-ntlmssp.ll
+2 wireshark/packet-sctp.ll
+1 abc/ioReadPla.ll
+1 cvc5/infer_proof_cons.ll
+1 glslang/linkValidate.ll
+1 hdf5/H5B2int.ll
+1 hdf5/H5I.ll
+1 hdf5/H5Rdeprec.ll
+1 hermes/JSParserImpl.ll
+1 llvm/ValueTracking.ll
+1 openssl/cipher_des_hw.ll
+1 sdl/SDL_render_vulkan.ll
+1 slurm/reservation.ll
+1 velox/TimestampConversion.ll
+0 actix-rs/5k5ycrtlwwxldg7.ll
+0 boost/config_file.ll
+0 clap-rs/46qpaucouebcxfrx.ll
+0 coreutils-rs/11hiuykak1azonq6.ll
+0 coreutils-rs/2145dndjkhee8wnm.ll
+0 foundations-rs/f1iknzskasm8x3xyu95gzvwuf.ll
+0 freetype/bdf.ll
+0 graphviz/shapes.ll
+0 jemalloc/extent.ll
+0 libzmq/stream_engine_base.ll
+0 logos-rs/hwk26id9epou4ag.ll
+0 mini-lsm-rs/3l74wehtlfae5jz1.ll
+0 rayon-rs/1kw8d85q77j78ldq.ll
+0 tokio-rs/3nmgzybx6iv04snk.ll
-1 abc/bmcMaj.ll
-1 abc/luckyRead.ll
-1 abc/wlcReadVer.ll
-1 cpython/_zoneinfo.ll
-1 curl/altsvc.ll
-1 delta-rs/11f8x98axanecwnw.ll
-1 duckdb/miniz.ll
-1 ffmpeg/cabac.ll
-1 llvm/LegalizeTypes.ll
-1 php/logical_filters.ll
-1 pingora-rs/86gtuzsa8hmfthtp7wbav90h5.ll
-1 qemu/optimize.ll
-1 recastnavigation/DetourCrowd.ll
-1 ruby/bignum.ll
-1 rust-analyzer-rs/1r5fg81ha4dpx7ns.ll
-1 rust-analyzer-rs/577813mpo9tvqnpt.ll
-1 softposit-rs/1e6z9tsqxvhrpdzq.ll
-1 softposit-rs/kf9u47qfx5x7qom.ll
-2 hdf5/H5Fsuper.ll
-2 llvm/CGPointerAuth.ll
-2 openjdk/compilerOracle.ll
-2 php/ZendAccelerator.ll
-2 pola-rs/d8q9hkuy9m3r0tdsdk3s5e5sl.ll
-2 postgres/copy.ll
-2 postgres/nodeWindowAgg.ll
-2 uv-rs/01kc013hwbqzr83fvgj8tm5o0.ll
-2 zed-rs/0oeh7hwbxnw4zu37xj5psd1f6.ll
-3 clamav/strfn.ll
-3 git/log.ll
-3 git/setup.ll
-3 hdf5/H5Rint.ll
-3 llvm/ToolChain.ll
-3 luau/Simplify.ll
-3 mold/input-sections.cc.X86_64.ll
-3 nori/tabwidget.ll
-4 openjdk/ciReplay.ll
-4 tev/Common.ll
-4 velox/Sequence.ll
-4 z3/monomial_bounds.ll
-5 abseil-cpp/node_hash_map_test.ll
-5 icu/numparse_compositions.ll
-6 abseil-cpp/charconv_parse.ll
-6 mold/input-files.cc.X86_64.ll
-6 quantlib/capflooredinflationcoupon.ll
-7 ffmpeg/ituh263enc.ll

github-actions (Contributor) commented

The provided patch contains numerous changes across multiple LLVM IR files, primarily focused on improving type safety, reducing unnecessary truncations, and optimizing control flow. Here are the major changes:

  1. Type Promotion and Truncation Removal: Several instances of i8 or i32 values being truncated before use have been replaced with direct use of wider types like i64. For example, in bmcMaj.ll, sext i8 %42 to i32 was changed to sext i8 %42 to i64, eliminating a subsequent sext i32 %.0.i to i64.

  2. Control Flow Simplification: Some conditional branches were simplified by reordering comparisons or using more efficient branching patterns. In volume.ll, branch targets for null pointer checks were updated to streamline error handling paths.

  3. Phi Node Optimization: Multiple phi nodes that aggregated boolean flags (i1) previously stored as integers (i8) now directly use i1, removing redundant trunc and zext operations. This is seen in node_hash_map_test.ll where phi i8 became phi i1.

  4. Switch Statement Refactoring: In mold/input-files.cc.X86_64.ll, unreachable default cases in switch statements were replaced with valid exit blocks (.loopexit), improving code clarity and potentially enabling better optimization.

  5. Redundant Instruction Elimination: Instructions such as zext i1 followed immediately by logical operations were removed when they could be folded into simpler forms. For instance, or i1 %.not, true replaced zext i1 %.not to i8; or i8 ..., 1.

These changes collectively improve performance by reducing instruction count, enhancing type precision, and simplifying control flow, which can lead to better optimization opportunities in later compilation stages.

model: qwen-plus-latest
CompletionUsage(completion_tokens=396, prompt_tokens=109430, total_tokens=109826, completion_tokens_details=None, prompt_tokens_details=None)

%91 = add nuw nsw i16 %90, %87
%92 = lshr exact i16 %86, 1
%93 = add nuw nsw i16 %91, %92
%94 = add nsw i16 %93, -132

Owner

Regression.

%mul = mul nsw i64 %spec.select.i, 3600
%add25 = add nsw i64 %minuteOffset.0, %mul
%sext = shl i64 %add25, 32
%conv27 = ashr exact i64 %sext, 32

Owner

Regression

Patch author

I also don't see why this would be beneficial, but removing the extra sext and computing in a wider type is actually a goal of this transformation. My patch just applies it in more cases. If this is a regression, we should remove that transformation altogether.
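
As a rough illustration of the widening discussed here, the following Python sketch models wrapping i32 and i64 arithmetic with plain integers. The helper names (`sext`, `add_i32_then_sext`, `sext_then_add_i64`) are hypothetical and are not LLVM APIs. It only aims to show that evaluating an add in i64 on sign-extended operands matches the narrow form when the i32 add does not overflow, which is the assumption that `nsw` encodes.

```python
def sext(value: int, from_bits: int) -> int:
    """Interpret the low `from_bits` bits of `value` as a two's-complement integer."""
    value &= (1 << from_bits) - 1
    sign = 1 << (from_bits - 1)
    return (value ^ sign) - sign


def add_i32_then_sext(a: int, b: int) -> int:
    """Narrow form: wrapping add in i32, then sign-extend the result to i64."""
    return sext(a + b, 32)


def sext_then_add_i64(a: int, b: int) -> int:
    """Wide form: sign-extend both i32 operands to i64, then wrapping add in i64."""
    return sext(sext(a, 32) + sext(b, 32), 64)
```

For operands such as `18000` and `30` the two forms agree. For `2**31 - 1` and `1` the i32 add wraps, so the narrow form yields `-2**31` while the wide form yields `2**31`; this is why the rewrite is only sound when the narrow operation is known not to overflow.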

Owner

I mean the sext_inreg pattern is not eliminated after evaluating the expression in a wider type.

Patch author

I believe it's added because we can't prove that the high bits of %add25 are known to be zero. It's still part of the normal working of this transformation.

Would you mind expanding a bit on what the expectations are for the PR, and on how to deal with marked regressions like this one? Thanks!

Owner

> I believe it's added because we can't prove that higher known bits of the %add25 are 0s.

It doesn't require the high 32 bits to be zero. We can eliminate the sext_inreg if %add25 has at least 33 sign bits (see ComputeNumSignBits). Obviously the condition holds, because we can compute the expression in i32 without signed overflow. This is the last chance to remove the shl+ashr pair, since we lose the sign-bit information after the transformation (binop nsw i32 -> binop nsw i64).

I don't mean to block this PR, as the net effect is positive. It's just a missed optimization exposed by this patch :)
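
The 33-sign-bit condition can be checked on concrete values with a small Python model. This is only a sketch: `num_sign_bits` is a hypothetical brute-force count on one concrete 64-bit value, not LLVM's `ComputeNumSignBits` (which is a conservative static analysis), and `shl_ashr` is a hypothetical model of the `shl i64 %x, 32` / `ashr i64 %y, 32` pair.

```python
def to_signed(value: int, bits: int = 64) -> int:
    """Wrap `value` to `bits` bits and reinterpret it as two's complement."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def num_sign_bits(value: int, bits: int = 64) -> int:
    """Count the leading bits that equal the sign bit, including the sign bit."""
    u = value & ((1 << bits) - 1)
    sign = (u >> (bits - 1)) & 1
    count = 0
    for i in range(bits - 1, -1, -1):
        if (u >> i) & 1 != sign:
            break
        count += 1
    return count


def shl_ashr(value: int, shift: int = 32, bits: int = 64) -> int:
    """Model `shl` by `shift` (wrapping) followed by `ashr` by `shift`."""
    shifted = to_signed(value << shift, bits)
    # Python's >> on a negative int is an arithmetic shift.
    return shifted >> shift
```

In this model, any 64-bit value with at least 33 sign bits (for example `2**31 - 1` or `-2**31`) passes through `shl_ashr` unchanged, so the pair is redundant. A value with only 32 sign bits, such as `2**31`, does not survive the round trip.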

Patch author

Ah, makes perfect sense. Thanks for the clear explanation!
