8340490: Shenandoah: Optimize ShenandoahPacer #21099

pengxiaolong · 2024-09-19T23:32:14Z

In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed significantly to long-tail latency (> 10ms): when multiple mutator threads fail to claim budget on the fast path here, all of them forcefully claim the budget and then wait for up to 10ms (code link).

The change in this PR greatly reduces ShenandoahPacer's impact on long-tail latency. Instead of forcefully claiming the budget and then waiting, a thread now attempts to claim after waiting for 1ms, and keeps doing this until either: 1/ it has spent 10ms waiting in total; or 2/ it has successfully claimed the budget.
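
For illustration only, the retry loop described above can be sketched in Java. The actual change lives in HotSpot's C++ `shenandoahPacer.cpp`; the class `SlicedPacerSketch`, its method names, and the use of `LockSupport.parkNanos` as the wait primitive are assumptions made for this sketch, not code from the PR. The forced claim after the deadline reflects the point raised in the review discussion that an unsuccessful pacing attempt still claims the budget.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical model of a pacing budget; not the HotSpot ShenandoahPacer code. */
final class SlicedPacerSketch {
    static final long MAX_DELAY_NANOS = 10_000_000L; // 10ms total wait, as in the description
    static final long SLICE_NANOS = 1_000_000L;      // 1ms per wait slice

    private final AtomicLong budget;

    SlicedPacerSketch(long initialBudget) {
        budget = new AtomicLong(initialBudget);
    }

    /** Returns budget to the pool, e.g. as GC makes progress. */
    void replenish(long amount) {
        budget.addAndGet(amount);
    }

    long budget() {
        return budget.get();
    }

    /** Claims only if enough budget is available; never drives the budget negative. */
    boolean tryClaim(long amount) {
        long current;
        do {
            current = budget.get();
            if (current < amount) {
                return false;
            }
        } while (!budget.compareAndSet(current, current - amount));
        return true;
    }

    /** Returns true if the claim succeeded without forcing, false if it was forced at the deadline. */
    boolean paceForAlloc(long amount) {
        if (tryClaim(amount)) {
            return true; // fast path
        }
        final long deadline = System.nanoTime() + MAX_DELAY_NANOS;
        while (deadline - System.nanoTime() > 0) {
            LockSupport.parkNanos(SLICE_NANOS); // wait one slice, then retry the claim
            if (tryClaim(amount)) {
                return true;
            }
        }
        budget.addAndGet(-amount); // deadline passed: claim anyway, budget may go negative
        return false;
    }
}
```

With a budget of 100, `paceForAlloc(100)` returns immediately on the fast path. With an empty budget and no replenishment, it returns `false` after roughly 10ms and leaves the budget at -100.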

Here is the latency comparison for the optimization: with it, long-tail latency in the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.LongAdder;
    import org.HdrHistogram.Histogram; // from the HdrHistogram library

    // The class name is not part of the original snippet; it is added so the code compiles.
    public class AllocationLatencyTest {
        static final int threadCount = Runtime.getRuntime().availableProcessors();
        static final LongAdder totalCount = new LongAdder();
        // Publishing to a static volatile field makes each array escape,
        // so the JIT cannot eliminate the allocation.
        static volatile byte[] sink;

        public static void main(String[] args) {
            runAllocationTest(100000);
        }

        // Records the time taken by a single allocation of dataSize bytes, in nanoseconds.
        static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
            long startTime = System.nanoTime();
            sink = new byte[dataSize];
            long endTime = System.nanoTime();
            histogram.recordValue(endTime - startTime);
        }

        // Runs one allocating thread per available processor for 30 seconds.
        static void runAllocationTest(final int dataSize) {
            final long endTime = System.currentTimeMillis() + 30_000;
            final CountDownLatch startSignal = new CountDownLatch(1);
            final CountDownLatch finished = new CountDownLatch(threadCount);
            final Thread[] threads = new Thread[threadCount];
            final Histogram[] histograms = new Histogram[threadCount];
            final Histogram totalHistogram = new Histogram(3600000000000L, 3);
            for (int i = 0; i < threadCount; i++) {
                // One histogram per thread, merged after the run.
                final var histogram = new Histogram(3600000000000L, 3);
                histograms[i] = histogram;
                threads[i] = new Thread(() -> {
                    wait(startSignal);
                    do {
                        recordTimeToAllocate(dataSize, histogram);
                    } while (System.currentTimeMillis() < endTime);
                    finished.countDown();
                });
                threads[i].start();
            }

            startSignal.countDown(); // Start the test
            wait(finished);

            for (Histogram histogram : histograms) {
                totalHistogram.add(histogram);
            }

            // Scaling ratio 1000.0 prints the recorded nanoseconds as microseconds.
            totalHistogram.outputPercentileDistribution(System.out, 1000.0);
        }

        public static void wait(final CountDownLatch latch) {
            try {
                latch.await();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }
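
For readers without HdrHistogram on the classpath, the tail percentiles that the test prints can be approximated with plain Java. This is a minimal sketch using the nearest-rank method, not what HdrHistogram does internally; the class `TailLatencySketch` and its method names are hypothetical. Dividing by 1000 mirrors the `1000.0` scaling ratio passed to `outputPercentileDistribution`, which converts the recorded nanoseconds to microseconds.

```java
import java.util.Arrays;

/** Hypothetical helper for reading tail latency from raw nanosecond samples. */
final class TailLatencySketch {
    /** Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it. */
    static long percentileNanos(long[] samples, double pct) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Converts nanoseconds to microseconds, like a scaling ratio of 1000.0. */
    static double toMicros(long nanos) {
        return nanos / 1000.0;
    }
}
```

A single 20ms outlier among otherwise microsecond-scale allocations dominates the high percentiles, which is why the benchmark reports the full percentile distribution instead of an average.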

Additional test

MacOS AArch64 server fastdebug, hotspot_gc_shenandoah

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8340490: Shenandoah: Optimize ShenandoahPacer (Enhancement - P4)

Reviewers

Aleksey Shipilev (@shipilev - Reviewer)
Kelvin Nilsen (@kdnilsen - Author)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21099/head:pull/21099
$ git checkout pull/21099

Update a local copy of the PR:
$ git checkout pull/21099
$ git pull https://git.openjdk.org/jdk.git pull/21099/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21099

View PR using the GUI difftool:
$ git pr show -t 21099

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21099.diff

Webrev

Link to Webrev Comment

bridgekeeper · 2024-09-19T23:38:39Z

👋 Welcome back xpeng! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2024-09-19T23:39:48Z

@pengxiaolong This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8340490: Shenandoah: Optimize ShenandoahPacer

Reviewed-by: shade, kdnilsen

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 117 new commits pushed to the master branch:

12de4fb: 8340826: Should not send unload notification for scratch classes
25e8929: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops
6587909: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage
9003e2c: 8341027: Crash in java/runtime/Unsafe/InternalErrorTest when running with -XX:-UseCompressedClassPointers
2a2ecc9: 8339475: Clean up return code handling for pthread calls in library coding
85dba47: 8325090: javadoc fails when -subpackages option is used with non-modular -source
1bc13a1: 8340552: Harden TzdbZoneRulesCompiler against missing zone names
e6373b5: 8340679: Misc tests fail assert(!set || SafepointSynchronize::is_at_safepoint()) failed: set once or at safepoint
2349bb7: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled
5d062e2: 8340576: Some JVMCI flags are inconsistent
... and 107 more: https://git.openjdk.org/jdk/compare/75d5e117770590d2432fcfe8d89734c7038d4e55...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk · 2024-09-19T23:40:52Z

@pengxiaolong The following labels will be automatically applied to this pull request:

hotspot-gc
shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

shipilev

I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover. I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent for that fix was to make sure that unsuccessful pacing claim the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in.

Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, it is silly to wait until the deadline before attempting to claim the pacing budget.

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp

shipilev

I am good with this, assuming performance runs show good results.

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp

mlbridge · 2024-09-20T18:31:55Z

Webrevs

01: Full - Incremental (58196a4f)
00: Full (1de70211)

pengxiaolong · 2024-09-20T18:42:28Z

> I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover. I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent for that fix was to make sure that unsuccessful pacing claim the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in.
>
> Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, it is silly to wait until the deadline before attempting to claim the pacing budget.

It is primarily from the algorithm change with 1ms slices.

The behavior changes in the new algorithm with 1ms slices. For example, suppose 10 threads see insufficient budget at the same time and each of them claims 100 budget. In the old algorithm, all 10 threads forcefully claim the budget, resulting in a budget of -1000; then other mutators need to release at least 1000, or the threads have to wait for up to 10ms, even though they may be woken up by the ShenandoahPeriodicPacerNotifyTask. In the new algorithm, each thread tries to claim 100 budget every 1ms and does not need to wait for other mutators to release at least 1000; as soon as enough budget (>100) is returned, some thread(s) will win the race against the others and proceed.
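
The 10-thread example above can be worked through with a small discrete-time model in Java. This is a simplified sketch under stated assumptions, not the HotSpot implementation: time advances in 1ms ticks, the budget starts at 0, waiting threads are woken every tick, and `replenishPerMs` is a hypothetical rate at which budget is returned. The class and method names are made up for the illustration.

```java
import java.util.Arrays;

/** Hypothetical tick-based model comparing the two waiting strategies. */
final class PacerModelComparison {
    static final int MAX_DELAY_MS = 10;

    /** Old model: all threads force-claim up front, then wait until budget >= 0 or the deadline. */
    static int[] oldWaitsMs(int threads, long claim, long replenishPerMs) {
        long budget = -(long) threads * claim; // e.g. 10 * 100 drives the budget to -1000
        int waited = 0;
        while (waited < MAX_DELAY_MS && budget < 0) {
            waited++;
            budget += replenishPerMs;
        }
        int[] waits = new int[threads];
        Arrays.fill(waits, waited); // every thread waits for the whole deficit to recover
        return waits;
    }

    /** New model: each tick, waiting threads retry; any thread whose claim fits proceeds. */
    static int[] newWaitsMs(int threads, long claim, long replenishPerMs) {
        long budget = 0;
        int[] waits = new int[threads];
        int next = 0; // index of the next thread still waiting
        for (int ms = 1; ms <= MAX_DELAY_MS && next < threads; ms++) {
            budget += replenishPerMs;
            while (next < threads && budget >= claim) {
                budget -= claim;
                waits[next++] = ms;
            }
        }
        while (next < threads) {
            waits[next++] = MAX_DELAY_MS; // deadline reached: these threads force-claim
        }
        return waits;
    }

    public static void main(String[] args) {
        System.out.println("old: " + Arrays.toString(oldWaitsMs(10, 100, 100)));
        System.out.println("new: " + Arrays.toString(newWaitsMs(10, 100, 100)));
    }
}
```

Under these assumptions, with 100 budget returned per millisecond, every thread in the old model waits the full 10ms, while in the new model the threads proceed one per millisecond, after 1ms through 10ms. Only the last thread pays the full delay.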

pengxiaolong · 2024-09-20T18:48:45Z

> I am good with this, assuming performance runs show good results.

Latency-wise, it is better than the old implementation most of the time.

In my specific test with an 8G heap on macOS, throughput is very close to that of the test with ShenandoahPacing disabled, and about 25%~30% better than with the old implementation.

shipilev · 2024-09-21T05:52:10Z

> I am good with this, assuming performance runs show good results.

> Latency-wise, it is better than the old implementation most of the time.

It is great that it improves targeted tests, and it makes sense from first principles. Run our usual performance pipeline to sanity-check whether this affects any other benchmarks in any meaningful way.

pengxiaolong · 2024-09-26T17:40:24Z

> I am good with this, assuming performance runs show good results.

> Latency-wise, it is better than the old implementation most of the time.

> It is great that it improves targeted tests, and it makes sense from first principles. Run our usual performance pipeline to sanity-check whether this affects any other benchmarks in any meaningful way.

The performance pipeline showed improvements in most DaCapo benchmarks. We did find a very small regression in DaCapo Spring max latency (<1%?). I tried to reproduce it on a bare-metal instance but could not reproduce the regression reliably: sometimes the result was better and sometimes worse, so it could be just noise.

pengxiaolong · 2024-09-26T18:55:28Z

@shipilev I need you to review it again, since I pushed a minor refactor and formatting changes as per your comments.

pengxiaolong · 2024-09-27T15:04:00Z

Thanks all for the reviews!

/integrate

openjdk · 2024-09-27T15:05:27Z

@pengxiaolong
Your change (at version 58196a4) is now ready to be sponsored by a Committer.

shipilev · 2024-09-27T17:04:28Z

/sponsor

openjdk · 2024-09-27T17:06:20Z

Going to push as commit 65200a9.
Since your change was applied there have been 120 commits pushed to the master branch:

824a297: 8341057: Add 2 SSL.com TLS roots
5aae3d4: 8341096: ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode
68c4f36: 8340024: In ClassReader, extract a constant for the superclass supertype_index
12de4fb: 8340826: Should not send unload notification for scratch classes
25e8929: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops
6587909: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage
9003e2c: 8341027: Crash in java/runtime/Unsafe/InternalErrorTest when running with -XX:-UseCompressedClassPointers
2a2ecc9: 8339475: Clean up return code handling for pthread calls in library coding
85dba47: 8325090: javadoc fails when -subpackages option is used with non-modular -source
1bc13a1: 8340552: Harden TzdbZoneRulesCompiler against missing zone names
... and 110 more: https://git.openjdk.org/jdk/compare/75d5e117770590d2432fcfe8d89734c7038d4e55...master

Your commit was automatically rebased without conflicts.

openjdk · 2024-09-27T17:06:26Z

@shipilev @pengxiaolong Pushed as commit 65200a9.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

pengxiaolong added 2 commits September 19, 2024 16:30

8340490: Shenandoah: Optimize ShenandoahPacer

f8bd82b

clean up

37df57b

openjdk bot added the hotspot-gc and shenandoah labels Sep 19, 2024

Xiaolong Peng and others added 3 commits September 20, 2024 00:51

try claim_for_alloc before calculating total_delay

efb2020

try claim_for_alloc before calculating total_delay

106e932

Clean code

8794d71

shipilev reviewed Sep 20, 2024

pengxiaolong added 2 commits September 20, 2024 11:10

refactor

cee7cd2

use const

1de7021

shipilev approved these changes Sep 20, 2024

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp

src/hotspot/share/gc/shenandoah/shenandoahPacer.cpp (outdated)

pengxiaolong marked this pull request as ready for review September 20, 2024 18:27

openjdk bot added the ready and rfr labels Sep 20, 2024

clean up

58196a4

openjdk bot removed the ready label Sep 20, 2024

kdnilsen approved these changes Sep 26, 2024

pengxiaolong requested a review from shipilev September 26, 2024 19:05

shipilev approved these changes Sep 27, 2024

openjdk bot added the ready label Sep 27, 2024

openjdk bot added the sponsor label Sep 27, 2024

openjdk bot added the integrated label Sep 27, 2024

openjdk bot closed this Sep 27, 2024

openjdk bot removed the ready, rfr, and sponsor labels Sep 27, 2024

pengxiaolong deleted the JDK-8340490 branch December 2, 2024 21:43
