8340490: Shenandoah: Optimize ShenandoahPacer #21099

Closed
wants to merge 8 commits into from

@pengxiaolong pengxiaolong commented Sep 19, 2024

In a simple latency benchmark for memory allocation, I found that ShenandoahPacer contributed significantly to long-tail latency above 10ms. When multiple mutator threads fail to claim budget on the fast path here, all of them forcefully claim the budget and then wait for up to 10ms (code link).

The change in this PR greatly reduces ShenandoahPacer's impact on long-tail latency. Instead of forcefully claiming the budget and then waiting, a thread waits for 1ms and then attempts to claim again, repeating until either 1/ it has spent 10ms waiting in total, or 2/ it has successfully claimed the budget.
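The retry loop described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the HotSpot C++ code from this PR: the class `SlicedPacerSketch`, its methods, and the use of `LockSupport.parkNanos` for the 1ms wait are all hypothetical names and choices made for the example. The final forced claim after the 10ms deadline reflects the review discussion that an unsuccessful pacing attempt still claims the budget.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch of "wait 1ms, retry the claim, give up after 10ms".
class SlicedPacerSketch {
    static final long MAX_WAIT_NANOS = 10_000_000L; // 10ms total waiting budget
    static final long SLICE_NANOS = 1_000_000L;     // 1ms per wait slice

    private final AtomicLong budget;

    SlicedPacerSketch(long initialBudget) {
        budget = new AtomicLong(initialBudget);
    }

    // Claims 'amount' if enough budget is available; with force=true it
    // claims regardless, which may drive the budget negative.
    boolean claim(long amount, boolean force) {
        while (true) {
            long cur = budget.get();
            if (cur < amount && !force) {
                return false;
            }
            if (budget.compareAndSet(cur, cur - amount)) {
                return true;
            }
        }
    }

    // Returns budget, e.g. as GC makes progress.
    void release(long amount) {
        budget.addAndGet(amount);
    }

    long budget() {
        return budget.get();
    }

    // Returns true if the claim succeeded without forcing.
    boolean paceForAlloc(long amount) {
        if (claim(amount, false)) {
            return true; // fast path
        }
        long start = System.nanoTime();
        while (System.nanoTime() - start < MAX_WAIT_NANOS) {
            LockSupport.parkNanos(SLICE_NANOS); // wait one slice
            if (claim(amount, false)) {
                return true; // enough budget came back during the wait
            }
        }
        claim(amount, true); // deadline passed: claim anyway
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        SlicedPacerSketch pacer = new SlicedPacerSketch(0);
        Thread releaser = new Thread(() -> {
            LockSupport.parkNanos(3 * SLICE_NANOS); // budget returns after ~3ms
            pacer.release(100);
        });
        releaser.start();
        boolean claimed = pacer.paceForAlloc(100);
        releaser.join();
        System.out.println("claimed without forcing: " + claimed
                + ", budget now: " + pacer.budget());
    }
}
```

In this sketch a waiting thread can proceed within about one slice of the budget becoming available, rather than sleeping until the full deadline.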

Here is the latency comparison for the optimization:
hdr-histogram-optimize-pacer

With the optimization, the long-tail latency of the test code below improved from over 20ms to ~10ms on macOS with an M3 chip:

    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.LongAdder;
    import org.HdrHistogram.Histogram; // assumed: Histogram from the HdrHistogram library

    // The class name is illustrative; the original snippet omitted the enclosing class.
    public class AllocationLatencyTest {
        static final int threadCount = Runtime.getRuntime().availableProcessors();
        static final LongAdder totalCount = new LongAdder(); // unused in this excerpt
        static volatile byte[] sink; // volatile sink keeps the allocation from being optimized away

        public static void main(String[] args) {
            runAllocationTest(100000);
        }

        // Records the time (in nanoseconds) taken by a single allocation.
        static void recordTimeToAllocate(final int dataSize, final Histogram histogram) {
            long startTime = System.nanoTime();
            sink = new byte[dataSize];
            long endTime = System.nanoTime();
            histogram.recordValue(endTime - startTime);
        }

        static void runAllocationTest(final int dataSize) {
            final long endTime = System.currentTimeMillis() + 30_000; // run for 30 seconds
            final CountDownLatch startSignal = new CountDownLatch(1);
            final CountDownLatch finished = new CountDownLatch(threadCount);
            final Thread[] threads = new Thread[threadCount];
            final Histogram[] histograms = new Histogram[threadCount];
            // Track values up to one hour in nanoseconds, with 3 significant digits.
            final Histogram totalHistogram = new Histogram(3600000000000L, 3);
            for (int i = 0; i < threadCount; i++) {
                // One histogram per thread, so recording needs no synchronization.
                final var histogram = new Histogram(3600000000000L, 3);
                histograms[i] = histogram;
                threads[i] = new Thread(() -> {
                    wait(startSignal);
                    do {
                        recordTimeToAllocate(dataSize, histogram);
                    } while (System.currentTimeMillis() < endTime);
                    finished.countDown();
                });
                threads[i].start();
            }

            startSignal.countDown(); // Release all threads at once to start the test
            wait(finished);

            for (Histogram histogram : histograms) {
                totalHistogram.add(histogram);
            }

            // Scale by 1000 so the nanosecond values are printed in microseconds.
            totalHistogram.outputPercentileDistribution(System.out, 1000.0);
        }

        public static void wait(final CountDownLatch latch) {
            try {
                latch.await();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
    }

Additional test

  • MacOS AArch64 server fastdebug, hotspot_gc_shenandoah

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8340490: Shenandoah: Optimize ShenandoahPacer (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/21099/head:pull/21099
$ git checkout pull/21099

Update a local copy of the PR:
$ git checkout pull/21099
$ git pull https://git.openjdk.org/jdk.git pull/21099/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 21099

View PR using the GUI difftool:
$ git pr show -t 21099

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/21099.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Sep 19, 2024

👋 Welcome back xpeng! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Sep 19, 2024

@pengxiaolong This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8340490: Shenandoah: Optimize ShenandoahPacer

Reviewed-by: shade, kdnilsen

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 117 new commits pushed to the master branch:

  • 12de4fb: 8340826: Should not send unload notification for scratch classes
  • 25e8929: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops
  • 6587909: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage
  • 9003e2c: 8341027: Crash in java/runtime/Unsafe/InternalErrorTest when running with -XX:-UseCompressedClassPointers
  • 2a2ecc9: 8339475: Clean up return code handling for pthread calls in library coding
  • 85dba47: 8325090: javadoc fails when -subpackages option is used with non-modular -source
  • 1bc13a1: 8340552: Harden TzdbZoneRulesCompiler against missing zone names
  • e6373b5: 8340679: Misc tests fail assert(!set || SafepointSynchronize::is_at_safepoint()) failed: set once or at safepoint
  • 2349bb7: 8340974: Ambiguous name of jtreg property vm.libgraal.enabled
  • 5d062e2: 8340576: Some JVMCI flags are inconsistent
  • ... and 107 more: https://git.openjdk.org/jdk/compare/75d5e117770590d2432fcfe8d89734c7038d4e55...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk

openjdk bot commented Sep 19, 2024

@pengxiaolong The following labels will be automatically applied to this pull request:

  • hotspot-gc
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

Member

@shipilev shipilev left a comment

I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover. I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent for that fix was to make sure that unsuccessful pacing claim the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in.

Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, it is silly to wait until the deadline before attempting to claim the pacing budget.

Member

@shipilev shipilev left a comment

I am good with this, assuming performance runs show good results.

@pengxiaolong pengxiaolong marked this pull request as ready for review September 20, 2024 18:27
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 20, 2024
@mlbridge

mlbridge bot commented Sep 20, 2024

Webrevs

@pengxiaolong
Author

I am trying to remember why we even bothered to go into negative budget on this path, and then waited for it to recover. I think it is from here: https://mail.openjdk.org/pipermail/shenandoah-dev/2018-April/005559.html. AFAICS, the intent for that fix was to make sure that unsuccessful pacing claim the budget, which this patch also does. And given it apparently improves performance, I don't mind it going in.

Comprehension question: the actual improvement comes from waiting in 1ms slices, not from anything else? In retrospect, it is silly to wait until the deadline before attempting to claim the pacing budget.

It is primarily from the algorithm change with 1ms slices.

The behavior changes in the new algorithm with 1ms slices. For example, suppose 10 threads see insufficient budget at the same time and each of them claims 100 units of budget. In the old algorithm, all 10 threads forcefully claim the budget, which drives it to -1000; then either other mutators must release at least 1000, or the threads have to wait for up to 10ms, even though they may be woken up by the ShenandoahPeriodicPacerNotifyTask. In the new algorithm, each thread tries to claim 100 budget every 1ms and does not need to wait for other mutators to release at least 1000; as soon as enough budget (>100) is returned, some thread(s) will win the race against the others and proceed.
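The arithmetic in this example can be worked through with a small deterministic model. It is a simplified sketch, not a measurement and not the pacer's actual code: the class `PacerBudgetExample`, the starting budget of 0, the constant refill rate, and the in-order claiming of threads are all assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical model: how long each thread waits (in ms) under each scheme,
// assuming the budget starts at 0 and is refilled at a constant rate.
class PacerBudgetExample {
    static final int MAX_WAIT_MS = 10;

    // Old scheme: every thread force-claims up front, then all of them wait
    // until the shared budget is non-negative again or the deadline passes.
    static int[] oldWaitMs(int threads, long claim, long refillPerMs) {
        long budget = -threads * claim; // e.g. 10 * 100 drives it to -1000
        int ms = 0;
        while (budget < 0 && ms < MAX_WAIT_MS) {
            budget += refillPerMs;
            ms++;
        }
        int[] waits = new int[threads];
        Arrays.fill(waits, ms);
        return waits;
    }

    // New scheme: each thread retries a non-forced claim after every 1ms
    // slice and proceeds as soon as its own claim fits into the budget.
    static int[] newWaitMs(int threads, long claim, long refillPerMs) {
        long budget = 0;
        int[] waits = new int[threads];
        Arrays.fill(waits, -1); // -1 marks a thread that is still waiting
        int pending = threads;
        for (int ms = 1; ms <= MAX_WAIT_MS && pending > 0; ms++) {
            budget += refillPerMs;
            for (int t = 0; t < threads && budget >= claim; t++) {
                if (waits[t] < 0) {
                    waits[t] = ms;
                    budget -= claim;
                    pending--;
                }
            }
        }
        for (int t = 0; t < threads; t++) {
            if (waits[t] < 0) {
                waits[t] = MAX_WAIT_MS; // deadline reached: forced claim
            }
        }
        return waits;
    }

    public static void main(String[] args) {
        System.out.println("old: " + Arrays.toString(oldWaitMs(10, 100, 100)));
        System.out.println("new: " + Arrays.toString(newWaitMs(10, 100, 100)));
    }
}
```

With 10 threads, a claim of 100 each, and an assumed refill of 100 per ms, the model prints `old: [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]` and `new: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]`: in the old scheme every thread waits for the whole deficit to clear, while in the new scheme threads are released one by one as budget returns.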

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 20, 2024
@pengxiaolong
Author

I am good with this, assuming performance runs show good results.

Latency-wise, it is better than the old implementation most of the time.

In my specific test with an 8G heap on macOS, throughput is very close to that of the test with ShenandoahPacing disabled, and about 25%~30% better than the old implementation.

@shipilev
Member

I am good with this, assuming performance runs show good results.

Latency wise, in most time it is better than old impl.

It is great that it improves targeted tests, and it makes sense from first principles. Run our usual performance pipeline to sanity check whether this affects any other benchmarks in any meaningful way.

@pengxiaolong
Author

I am good with this, assuming performance runs show good results.

Latency wise, in most time it is better than old impl.

It is great it improves targeted tests, and it makes sense from the first principles. Run our usual performance pipeline to sanity check if this affects any other benchmarks in any meaningful way.

The performance pipeline showed improvements in most DaCapo benchmarks. We did find a very small regression in DaCapo Spring max latency (<1%?). I tried to reproduce it on a bare-metal instance but could not reproduce the regression reliably: sometimes the result was better and sometimes worse, so it could just be noise.

@pengxiaolong
Author

@shipilev I need you to review it again, since I pushed a minor refactoring and formatting change as per your comments.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 27, 2024
@pengxiaolong
Author

Thanks all for the reviews!

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 27, 2024
@openjdk

openjdk bot commented Sep 27, 2024

@pengxiaolong
Your change (at version 58196a4) is now ready to be sponsored by a Committer.

@shipilev
Member

/sponsor

@openjdk

openjdk bot commented Sep 27, 2024

Going to push as commit 65200a9.
Since your change was applied there have been 120 commits pushed to the master branch:

  • 824a297: 8341057: Add 2 SSL.com TLS roots
  • 5aae3d4: 8341096: ProblemList compiler/cha/TypeProfileFinalMethod.java in Xcomp mode
  • 68c4f36: 8340024: In ClassReader, extract a constant for the superclass supertype_index
  • 12de4fb: 8340826: Should not send unload notification for scratch classes
  • 25e8929: 8340620: Fix -Wzero-as-null-pointer-constant warnings for CompressedOops
  • 6587909: 8341015: OopStorage location decoder crashes accessing non-initalized OopStorage
  • 9003e2c: 8341027: Crash in java/runtime/Unsafe/InternalErrorTest when running with -XX:-UseCompressedClassPointers
  • 2a2ecc9: 8339475: Clean up return code handling for pthread calls in library coding
  • 85dba47: 8325090: javadoc fails when -subpackages option is used with non-modular -source
  • 1bc13a1: 8340552: Harden TzdbZoneRulesCompiler against missing zone names
  • ... and 110 more: https://git.openjdk.org/jdk/compare/75d5e117770590d2432fcfe8d89734c7038d4e55...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 27, 2024
@openjdk openjdk bot closed this Sep 27, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Sep 27, 2024
@openjdk

openjdk bot commented Sep 27, 2024

@shipilev @pengxiaolong Pushed as commit 65200a9.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

3 participants