[#2380] Improvement: Eagerly cancel rpc request #2381

Open
wants to merge 3 commits into
base: master

Conversation

summaryzb
Contributor

What changes were proposed in this pull request?

Make needCancel take effect during RPC retries, so that a cancelled send stops retrying.

Why are the changes needed?

This helps when the current task is killed because a speculative task attempt has succeeded, but the RPC that sends its data keeps retrying.

Fix: #2380
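
To illustrate the idea, here is a minimal sketch of a retry loop that consults a cancel signal before every attempt. The names CancellableRetrySketch, retry, and CancelledException are hypothetical and are not Uniffle's actual retry utility; only the Supplier<Boolean> needCancel shape is taken from this PR.

```java
import java.util.concurrent.Callable;
import java.util.function.Supplier;

/** Hypothetical sketch of an eagerly cancellable retry loop (not Uniffle's real API). */
final class CancellableRetrySketch {

  /** Thrown when the caller signals that further attempts are pointless. */
  static final class CancelledException extends RuntimeException {
    CancelledException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /**
   * Runs {@code call} up to {@code retryMax} times, sleeping {@code retryIntervalMs}
   * between failed attempts, and aborts as soon as {@code needCancel} returns true.
   */
  static <T> T retry(
      Callable<T> call, Supplier<Boolean> needCancel, int retryMax, long retryIntervalMs)
      throws Exception {
    if (retryMax < 1) {
      throw new IllegalArgumentException("retryMax must be at least 1");
    }
    Exception last = null;
    for (int attempt = 1; attempt <= retryMax; attempt++) {
      // Check before every attempt so a killed task stops sending right away.
      if (Boolean.TRUE.equals(needCancel.get())) {
        throw new CancelledException("cancelled before attempt " + attempt, last);
      }
      try {
        return call.call();
      } catch (Exception e) {
        last = e; // remember the failure and fall through to the next attempt
      }
      if (attempt < retryMax) {
        Thread.sleep(retryIntervalMs);
      }
    }
    throw last; // every attempt failed without a cancel request
  }
}
```

In this sketch the cancel check runs once per attempt, so the worst-case delay before the sender stops is one in-flight RPC plus one retry interval.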

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests (UT).

@codecov-commenter

codecov-commenter commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 0% with 6 lines in your changes missing coverage. Please review.

Project coverage is 51.21%. Comparing base (8ad0f8d) to head (7b0f563).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...ffle/client/request/RssSendShuffleDataRequest.java 0.00% 4 Missing ⚠️
...ffle/client/impl/grpc/ShuffleServerGrpcClient.java 0.00% 1 Missing ⚠️
...client/impl/grpc/ShuffleServerGrpcNettyClient.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2381      +/-   ##
============================================
- Coverage     51.34%   51.21%   -0.14%     
+ Complexity     3615     3016     -599     
============================================
  Files           571      481      -90     
  Lines         32892    23193    -9699     
  Branches       2833     2140     -693     
============================================
- Hits          16890    11878    -5012     
+ Misses        14932    10569    -4363     
+ Partials       1070      746     -324     



github-actions bot commented Mar 6, 2025

Test Results

 2 996 files  ± 0   2 996 suites  ±0   6h 33m 24s ⏱️ + 2m 1s
 1 106 tests + 1   1 104 ✅ + 1   2 💤 ±0  0 ❌ ±0 
13 864 runs  +15  13 834 ✅ +15  30 💤 ±0  0 ❌ ±0 

Results for commit 56fa5ad. ± Comparison against base commit 01f6e25.

♻️ This comment has been updated with latest results.

@summaryzb
Contributor Author

@jerqi @LuciferYang PTAL

@LuciferYang
Contributor

also cc @advancedxy

@LuciferYang
Contributor

Seems we should add a new test to cover this

@summaryzb
Contributor Author

gentle ping @LuciferYang @advancedxy

@@ -29,26 +30,29 @@ public class RssSendShuffleDataRequest {
private int retryMax;
private long retryIntervalMax;
private Map<Integer, Map<Integer, List<ShuffleBlockInfo>>> shuffleIdToBlocks;
private Supplier<Boolean> needCancel;
Contributor


I think this kind of leaks details, or at least makes RssSendShuffleDataRequest hold a reference to the sending class (for Spark, that is DataPusher). I'm not sure this is an elegant way to do it.

Would it be possible for
boolean result = ClientUtils.waitUntilDoneOrFail(futures, allowFastFail); in ShuffleWriteClientImpl to be aware of interruption/Spark cancellation and cancel all the sending futures?

Contributor Author


Actually, the current DataPusher already leaks details to the sending class. This PR does not make that worse, but it does achieve eager cancellation at the RPC retry level.
Being aware of interruption/Spark cancellation is a good idea; I'll follow that approach.
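
For reference, the interruption-aware alternative discussed in this thread could look roughly like the sketch below. It assumes the sends are represented as Future<Boolean> values; InterruptAwareWaitSketch and waitUntilDoneOrCancelled are hypothetical names standing in for the wait step, not the actual ClientUtils.waitUntilDoneOrFail implementation.

```java
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical sketch: wait for sends, cancelling all of them if the waiter is interrupted. */
final class InterruptAwareWaitSketch {

  /** Returns true only if every future completed with TRUE and nothing was interrupted. */
  static boolean waitUntilDoneOrCancelled(List<? extends Future<Boolean>> futures) {
    try {
      boolean allSucceeded = true;
      for (Future<Boolean> future : futures) {
        allSucceeded &= Boolean.TRUE.equals(future.get());
      }
      return allSucceeded;
    } catch (InterruptedException e) {
      // The waiting task was interrupted (for example, killed): stop all pending sends.
      for (Future<Boolean> future : futures) {
        future.cancel(true);
      }
      // Restore the flag so callers further up the stack can still observe the interrupt.
      Thread.currentThread().interrupt();
      return false;
    } catch (ExecutionException | CancellationException e) {
      // A send failed or was cancelled elsewhere; report failure to the caller.
      return false;
    }
  }
}
```

With this shape the request object does not need a reference back to the sender; cancellation is driven by the thread that waits on the futures.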

Development

Successfully merging this pull request may close these issues.

[Improvement] Eagerly cancel rpc request
4 participants