8385643: Shenandoah: Rework mark loop inlining#31634

Open
shipilev wants to merge 7 commits into openjdk:master from shipilev:JDK-8385643-shenandoah-rework-mark-inline

@shipilev shipilev commented Jun 23, 2026

Member

While following up on concurrent marking performance, I noticed that we stopped inlining, or fail to inline, some of the hot methods in the marking loop. We need to rework this.

This PR replaces the build-time, GCC-specific "bump" for inlining heuristics with explicit inlining hints across the hot path. I have eyeballed the profiles on typical workloads, and the inlining makes sense now.
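For readers unfamiliar with the difference between tuning compiler heuristics and hinting explicitly, here is a minimal sketch of what an explicit inlining hint can look like. The macro `FORCE_INLINE` and the functions `mark_one` and `mark_loop` are hypothetical names made up for this illustration and are not taken from this PR; only the underlying compiler facilities (`__attribute__((always_inline))` on GCC/Clang, `__forceinline` on MSVC) are real.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical macro for this sketch: request forced inlining where the
// compiler offers it, and fall back to a plain `inline` otherwise.
#if defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#else
#define FORCE_INLINE inline
#endif

// Hypothetical hot-path helper: sets the mark bit for index `idx`.
// Returns true if this call newly set the bit.
static FORCE_INLINE bool mark_one(uint64_t* bitmap, size_t idx) {
  uint64_t mask = uint64_t(1) << (idx % 64);
  uint64_t& word = bitmap[idx / 64];
  bool was_set = (word & mask) != 0;
  word |= mask;
  return !was_set;
}

// Hypothetical loop: the hint on mark_one asks the compiler to inline it
// here regardless of its size-based heuristics.
static size_t mark_loop(uint64_t* bitmap, const size_t* indices, size_t count) {
  size_t newly_marked = 0;
  for (size_t i = 0; i < count; i++) {
    if (mark_one(bitmap, indices[i])) {
      newly_marked++;
    }
  }
  return newly_marked;
}
```

The point of the per-function hint is that the decision travels with the source and does not depend on a toolchain-specific build flag. Which methods actually deserve the hint is something only profiles can tell.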

Additional testing:

  • Linux x86_64 server fastdebug, hotspot_gc_shenandoah
  • Ad-hoc marking performance tests
  • Regular testing pipelines


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8385643: Shenandoah: Rework mark loop inlining (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31634/head:pull/31634
$ git checkout pull/31634

Update a local copy of the PR:
$ git checkout pull/31634
$ git pull https://git.openjdk.org/jdk.git pull/31634/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 31634

View PR using the GUI difftool:
$ git pr show -t 31634

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31634.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper Bot commented Jun 23, 2026

👋 Welcome back shade! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk Bot commented Jun 23, 2026

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk Bot added build build-dev@openjdk.org hotspot-gc hotspot-gc-dev@openjdk.org shenandoah shenandoah-dev@openjdk.org labels Jun 23, 2026
@openjdk

openjdk Bot commented Jun 23, 2026

@shipilev The following labels will be automatically applied to this pull request:

  • build
  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk

openjdk Bot commented Jun 23, 2026

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot-gc. This can be overridden with the /reviewers command.

@shipilev

shipilev commented Jun 23, 2026

Member Author

With a stress test for marking, I got about +23% (!) faster mark times.

# Baseline
[30.738s][info][gc,stats] Concurrent Marking             =   24.380 s (a =   369393 us) (n =    66) (lvls, us =    42969,   375000,   404297,   408203,   428060)
[30.738s][info][gc,stats]   CM: Work                     =  194.764 s (a =  2950963 us) (n =    66) (lvls, us =   333984,  3007812,  3222656,  3261719,  3420661)
[30.738s][info][gc,stats]   Flush SATB                   =    0.005 s (a =       70 us) (n =    66) (lvls, us =       40,       65,       68,       71,      163)

# Patched
[30.767s][info][gc,stats] Concurrent Marking             =   23.036 s (a =   299170 us) (n =    77) (lvls, us =    27148,   312500,   316406,   320312,   368054)
[30.767s][info][gc,stats]   CM: Work                     =  183.922 s (a =  2388592 us) (n =    77) (lvls, us =   214844,  2480469,  2539062,  2558594,  2937770)
[30.767s][info][gc,stats]   Flush SATB                   =    0.006 s (a =       73 us) (n =    77) (lvls, us =       38,       67,       71,       78,       94)
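One way to read the 23% figure, assuming it is derived from the average per-cycle times (the `a =` column), which the comment does not state explicitly: the run totals are not directly comparable because the cycle counts differ (n = 66 vs. n = 77). The helper names below are hypothetical and exist only for this arithmetic sketch.

```cpp
#include <cassert>
#include <cmath>

// Speedup in percent: how much longer the baseline takes relative to the
// patched run. A value of 23 means the baseline needs 1.23x the time.
static double speedup_percent(double baseline_us, double patched_us) {
  return (baseline_us / patched_us - 1.0) * 100.0;
}

// The same change expressed as a reduction of the baseline time, in percent.
static double time_reduction_percent(double baseline_us, double patched_us) {
  return (1.0 - patched_us / baseline_us) * 100.0;
}
```

With the Concurrent Marking averages above, `speedup_percent(369393, 299170)` comes out at roughly 23.5, and `time_reduction_percent(369393, 299170)` at roughly 19.0. The two numbers describe the same improvement from different baselines.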

@shipilev shipilev marked this pull request as ready for review June 23, 2026 13:36
@openjdk openjdk Bot added the rfr Pull request is ready for review label Jun 23, 2026
@mlbridge

mlbridge Bot commented Jun 23, 2026

Webrevs

@shipilev

Member Author

On a SPECjbb preset-IR run, I am seeing +20..30% faster concurrent marks as well:

# Baseline
[170.333s][info][gc,stats] Pause Init Mark (G)            =    0.031 s (a =      201 us) (n =   157) (lvls, us =       64,      102,      117,      154,     2280)
[170.333s][info][gc,stats] Pause Init Mark (N)            =    0.004 s (a =       26 us) (n =   157) (lvls, us =       17,       22,       24,       28,       55)
[170.333s][info][gc,stats]   Update Region States         =    0.002 s (a =       11 us) (n =   157) (lvls, us =        6,        8,       10,       12,       32)
[170.333s][info][gc,stats]   Propagate GC State           =    0.000 s (a =        2 us) (n =   157) (lvls, us =        1,        2,        2,        2,       10)
[170.333s][info][gc,stats] Concurrent Mark Roots          =    0.052 s (a =      333 us) (n =   157) (lvls, us =      176,      262,      289,      318,     2094)
[170.333s][info][gc,stats]   CMR: Threads                 =    0.231 s (a =     1472 us) (n =   157) (lvls, us =      779,     1250,     1328,     1465,     6545)
[170.333s][info][gc,stats]   CMR: VM Strongs              =    0.009 s (a =       55 us) (n =   157) (lvls, us =       31,       39,       43,       47,     1031)
[170.333s][info][gc,stats]   CMR: Classes                 =    0.016 s (a =      100 us) (n =   157) (lvls, us =       67,       81,       90,      105,      775)
[170.333s][info][gc,stats] Concurrent Marking             =   35.671 s (a =   227203 us) (n =   157) (lvls, us =     6016,   226562,   232422,   238281,   254850)
[170.333s][info][gc,stats]   CM: Work                     =  283.904 s (a =  1808304 us) (n =   157) (lvls, us =    46875,  1796875,  1855469,  1894531,  2033774)
[170.333s][info][gc,stats]   Flush SATB                   =    0.142 s (a =      902 us) (n =   157) (lvls, us =      105,      549,      623,      783,     5526)
[170.333s][info][gc,stats] Pause Final Mark (G)           =    0.048 s (a =      304 us) (n =   157) (lvls, us =      123,      221,      238,      258,     3370)
[170.333s][info][gc,stats] Pause Final Mark (N)           =    0.030 s (a =      193 us) (n =   157) (lvls, us =       96,      176,      191,      205,      306)
[170.333s][info][gc,stats]   Flush SATB and Roots         =    0.003 s (a =       22 us) (n =   157) (lvls, us =        7,       14,       15,       17,      108)
[170.333s][info][gc,stats]   Propagate GC State           =    0.000 s (a =        2 us) (n =   157) (lvls, us =        1,        2,        2,        2,        3)
[170.333s][info][gc,stats]   Update Region States         =    0.005 s (a =       31 us) (n =   157) (lvls, us =       22,       30,       31,       32,       43)
[170.333s][info][gc,stats]   Choose Collection Set        =    0.017 s (a =      111 us) (n =   157) (lvls, us =       35,       98,      113,      121,      167)
[170.333s][info][gc,stats]   Rebuild Free Set             =    0.003 s (a =       17 us) (n =   157) (lvls, us =       13,       16,       16,       17,       41)

# Patched
[171.311s][info][gc,stats] Pause Init Mark (G)            =    0.039 s (a =      270 us) (n =   143) (lvls, us =       77,      104,      113,      162,     2290)
[171.311s][info][gc,stats] Pause Init Mark (N)            =    0.004 s (a =       25 us) (n =   143) (lvls, us =       12,       22,       23,       27,       55)
[171.311s][info][gc,stats]   Update Region States         =    0.002 s (a =       11 us) (n =   143) (lvls, us =        6,        8,        9,       12,       29)
[171.311s][info][gc,stats]   Propagate GC State           =    0.000 s (a =        2 us) (n =   143) (lvls, us =        1,        2,        2,        2,        3)
[171.311s][info][gc,stats] Concurrent Mark Roots          =    0.051 s (a =      355 us) (n =   143) (lvls, us =      213,      271,      287,      311,     6407)
[171.311s][info][gc,stats]   CMR: Threads                 =    0.211 s (a =     1475 us) (n =   143) (lvls, us =      801,     1289,     1367,     1465,     6937)
[171.311s][info][gc,stats]   CMR: VM Strongs              =    0.008 s (a =       54 us) (n =   143) (lvls, us =       31,       38,       44,       49,      638)
[171.311s][info][gc,stats]   CMR: Classes                 =    0.014 s (a =       97 us) (n =   143) (lvls, us =       61,       74,       88,      102,      678)
[171.311s][info][gc,stats] Concurrent Marking             =   26.880 s (a =   187970 us) (n =   143) (lvls, us =     4082,   187500,   191406,   197266,   210714)
[171.311s][info][gc,stats]   CM: Work                     =  213.791 s (a =  1495043 us) (n =   143) (lvls, us =    31641,  1484375,  1523438,  1562500,  1665250)
[171.311s][info][gc,stats]   Flush SATB                   =    0.119 s (a =      836 us) (n =   143) (lvls, us =       89,      607,      660,      779,     2852)
[171.311s][info][gc,stats] Pause Final Mark (G)           =    0.041 s (a =      283 us) (n =   143) (lvls, us =      115,      232,      246,      266,     1350)
[171.311s][info][gc,stats] Pause Final Mark (N)           =    0.029 s (a =      203 us) (n =   143) (lvls, us =       78,      188,      199,      215,      300)
[171.311s][info][gc,stats]   Flush SATB and Roots         =    0.003 s (a =       21 us) (n =   143) (lvls, us =        5,       13,       14,       16,      105)
[171.311s][info][gc,stats]   Propagate GC State           =    0.000 s (a =        2 us) (n =   143) (lvls, us =        1,        2,        2,        2,        3)
[171.311s][info][gc,stats]   Update Region States         =    0.005 s (a =       32 us) (n =   143) (lvls, us =       11,       30,       31,       33,       50)
[171.311s][info][gc,stats]   Choose Collection Set        =    0.017 s (a =      120 us) (n =   143) (lvls, us =       34,      109,      123,      131,      180)
[171.311s][info][gc,stats]   Rebuild Free Set             =    0.002 s (a =       17 us) (n =   143) (lvls, us =       14,       16,       17,       18,       30)
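To compare such runs without eyeballing, the per-cycle average and the cycle count can be pulled out of each line. The sketch below assumes the `gc,stats` line layout is exactly as printed above (`(a = N us)` and `(n = N)`); the function name `field_after` is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Extracts the integer that follows `key` (for example "(a =" or "(n =")
// in a gc,stats line. Returns -1 if the key does not occur in the line.
static long field_after(const std::string& line, const std::string& key) {
  std::string::size_type pos = line.find(key);
  if (pos == std::string::npos) {
    return -1;
  }
  // strtol skips the padding spaces and stops at the first non-digit.
  return std::strtol(line.c_str() + pos + key.size(), nullptr, 10);
}
```

Applied to the two Concurrent Marking lines above, this gives averages of 227203 us (n = 157) and 187970 us (n = 143), a baseline-to-patched ratio of about 1.21.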
