
[#2086] feat(spark): Support cut partition to slices and served by multiply server #2093

Merged
3 commits merged into apache:master from the partitionSlice branch on Nov 7, 2024

Conversation

@maobaolong maobaolong (Member) commented Sep 3, 2024

What changes were proposed in this pull request?

Support splitting a partition into slices so that its data can be stored on and served by multiple shuffle servers.

Limitation:

  • Only the Netty mode has been tested so far.

Why are the changes needed?

Fix: #2086

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Start multiple shuffle servers and a coordinator locally
  • Start a Spark standalone environment locally
  • Start spark-shell and run test.scala with the command below
bin/spark-shell \
  --master spark://localhost:7077 \
  --deploy-mode client \
  --conf spark.rss.client.reassign.blockRetryMaxTimes=3 \
  --conf spark.rss.writer.buffer.spill.size=30 \
  --conf spark.rss.client.reassign.enabled=true \
  --conf spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager \
  --conf spark.rss.coordinator.quorum=localhost:19999 \
  --conf spark.rss.storage.type=LOCALFILE \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.rss.test.mode.enable=true \
  --conf spark.rss.client.type=GRPC_NETTY \
  --conf spark.sql.shuffle.partitions=1 \
  -i test.scala
  • test.scala
val data = sc.parallelize(Seq(("A", 1), ("B", 2), ("C", 3), ("A", 4), ("B", 5), ("A", 6), ("A", 7),("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7), ("A", 7)));
val result = data.reduceByKey(_ + _);
result.collect().foreach(println);
System.exit(0);
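For reference, reduceByKey should print one summed row per key: (B,7), (C,3), and a single row for A (102 assuming thirteen ("A", 7) entries as written), in some order, before the script exits.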
(screenshot of the spark-shell output omitted)

@maobaolong maobaolong marked this pull request as draft September 3, 2024 02:46
@maobaolong maobaolong force-pushed the partitionSlice branch 2 times, most recently from fb19661 to 2d5d3ed Compare September 3, 2024 02:53

github-actions bot commented Sep 3, 2024

Test Results

 2 926 files  ± 0   2 926 suites  ±0   6h 11m 57s ⏱️ - 13m 10s
 1 090 tests + 2   1 088 ✅ + 2   2 💤 ±0  0 ❌ ±0 
13 655 runs  +25  13 625 ✅ +25  30 💤 ±0  0 ❌ ±0 

Results for commit 949d57f. ± Comparison against base commit 4206541.

♻️ This comment has been updated with latest results.

@maobaolong maobaolong force-pushed the partitionSlice branch 5 times, most recently from abf947f to 059cf59 Compare September 3, 2024 10:58
@maobaolong maobaolong closed this Sep 3, 2024
@maobaolong maobaolong reopened this Sep 3, 2024
@maobaolong maobaolong marked this pull request as ready for review September 4, 2024 08:36
@zuston zuston (Member) left a comment

Great work. Thanks for your effort. Left some comments.

@@ -52,7 +52,7 @@ public class MutableShuffleHandleInfo extends ShuffleHandleInfoBase {
    */
   private Map<Integer, Map<Integer, List<ShuffleServerInfo>>> partitionReplicaAssignedServers;
 
-  private Map<String, Set<ShuffleServerInfo>> excludedServerToReplacements;
+  private Map<String, List<ShuffleServerInfo>> excludedServerToReplacements;
zuston (Member) commented:

If a list is used instead of a set, is it possible to store duplicate replacement servers?
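A minimal, self-contained sketch of that concern (hypothetical names; the real element type is ShuffleServerInfo): a List happily keeps the same replacement server twice, while a Set deduplicates it.

  import java.util.ArrayList;
  import java.util.HashSet;
  import java.util.List;
  import java.util.Set;

  public class ReplacementDemo {
    public static void main(String[] args) {
      List<String> asList = new ArrayList<>();
      Set<String> asSet = new HashSet<>();
      // "server-1" is offered twice as a replacement.
      for (String server : new String[] {"server-1", "server-2", "server-1"}) {
        asList.add(server); // keeps the duplicate
        asSet.add(server);  // silently drops the duplicate
      }
      System.out.println(asList); // [server-1, server-2, server-1]
      System.out.println(asSet);  // two elements, order unspecified
    }
  }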

maobaolong (Member, Author) commented:

Before this change, which servers did the excludedServerToReplacements set actually store?

maobaolong (Member, Author) commented:

Revert the set -> list change.

@@ -62,6 +62,8 @@ message RequireBufferResponse {
   int64 requireBufferId = 1;
   StatusCode status = 2;
   string retMsg = 3;
+  // need split partitions
+  repeated int32 partitionIds = 4;
zuston (Member) commented:

Why not name partitionIds needSplitPartitionIds?

maobaolong (Member, Author) commented:

Done.
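For context, protobuf codegen turns a repeated int32 field into a list accessor in the generated Java (getNeedSplitPartitionIdsList() after the rename). Below is a self-contained sketch of the client-side bookkeeping such a field enables; the class and method names are illustrative assumptions, not the PR's actual code.

  import java.util.HashSet;
  import java.util.List;
  import java.util.Set;

  // Sketch only: in the real client the ids would come from the generated
  // RequireBufferResponse; a plain List<Integer> stands in for it here.
  class SplitPartitionTracker {
    private final Set<Integer> partitionsToSplit = new HashSet<>();

    void onRequireBufferResponse(List<Integer> needSplitPartitionIds) {
      // Remember every partition the server reported as needing a split,
      // so later blocks for it can be routed to an additional server.
      partitionsToSplit.addAll(needSplitPartitionIds);
    }

    boolean shouldSplit(int partitionId) {
      return partitionsToSplit.contains(partitionId);
    }
  }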

@@ -148,4 +148,9 @@ public static boolean limitHugePartition(
   }
   return false;
 }
 
+public static boolean hasExceedPartitionSplitLimit(
zuston (Member) commented:

Why not extract the general logic for limiting partition data? I think the partition split limit and the huge partition limit could be covered by the same abstract limit policy.

maobaolong (Member, Author) commented:

I will address this after the other comments are addressed and tested.
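A rough sketch of the shared limit policy the reviewer suggests: both checks boil down to comparing a partition's accumulated bytes against a configured threshold. Names and signatures below are illustrative assumptions, not the merged code.

  // Hypothetical shared helper: the huge-partition limit and the partition-split
  // limit differ only in which threshold they compare against.
  public final class PartitionLimitPolicy {
    private PartitionLimitPolicy() {}

    private static boolean exceedsLimit(long partitionDataSize, long limitBytes) {
      // A non-positive limit means the corresponding check is disabled.
      return limitBytes > 0 && partitionDataSize > limitBytes;
    }

    public static boolean hasExceedHugePartitionLimit(long partitionDataSize, long hugePartitionLimit) {
      return exceedsLimit(partitionDataSize, hugePartitionLimit);
    }

    public static boolean hasExceedPartitionSplitLimit(long partitionDataSize, long partitionSplitLimit) {
      return exceedsLimit(partitionDataSize, partitionSplitLimit);
    }
  }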

@maobaolong maobaolong force-pushed the partitionSlice branch 2 times, most recently from 704f57d to 9437c2e Compare September 8, 2024 01:30
     if (replacements == null || StringUtils.isEmpty(receivingFailureServerId)) {
-      return Collections.emptySet();
+      return Collections.emptyList();
     }
     excludedServerToReplacements.put(receivingFailureServerId, replacements);
maobaolong (Member, Author) commented:

@zuston I found that there is something wrong with excludedServerToReplacements in the partition split case: the server is not a faulty server, it just cannot store more data for one specific partition, so it probably should not be added to the excluded servers (see the sketch after the snippet below).

Reference to

  public Set<String> listExcludedServers() {
    return excludedServerToReplacements.keySet();
  }
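One way to express that separation, keeping split-triggered servers out of the excluded set: the sketch below uses assumed field and method names, with String ids standing in for ShuffleServerInfo.

  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.Set;

  // Sketch: replacements caused by receive failures are tracked separately from
  // replacements caused by partition splits, so only truly faulty servers are
  // reported as excluded.
  class HandleInfoSketch {
    private final Map<String, List<String>> excludedServerToReplacements = new HashMap<>();
    private final Map<String, List<String>> splitServerToReplacements = new HashMap<>();

    Set<String> listExcludedServers() {
      // Split-triggered servers are healthy; they merely refuse more data for a
      // single partition, so they are intentionally absent here.
      return excludedServerToReplacements.keySet();
    }

    void recordReceivingFailure(String serverId, List<String> replacements) {
      excludedServerToReplacements.put(serverId, replacements);
    }

    void recordPartitionSplit(String serverId, List<String> replacements) {
      splitServerToReplacements.put(serverId, replacements);
    }
  }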

@zuston zuston (Member) commented Sep 10, 2024

Overall lgtm. I hope some test cases could be added to cover this case.

@zuston zuston (Member) commented Sep 18, 2024

> Overall lgtm. I hope some test cases could be added to cover this case.

Any update on this? @maobaolong If you have applied it in your internal cluster, please let me know. Thanks.

@maobaolong maobaolong (Member, Author) commented

@zuston Yeah, I tested this on our pressure-test cluster and verified it with our pressure test case. That cluster has 10 nodes, and we ran a Spark application with 17 TiB of shuffle data across 3 shuffle stages, each with 2000 partitions. I verified the partition split result via #2133 and #2156.

BTW, PR #2137 made it easy to update the configuration dynamically, which made testing this feature much more pleasant.

@maobaolong maobaolong force-pushed the partitionSlice branch 2 times, most recently from a3b64dc to 831ee0f Compare October 12, 2024 01:46
@maobaolong maobaolong force-pushed the partitionSlice branch 2 times, most recently from 505c625 to 993c00b Compare November 6, 2024 04:16
@maobaolong maobaolong (Member, Author) commented

@zuston Unit tests have been added; would you like to take another look?

@zuston zuston (Member) left a comment

LGTM.

@zuston zuston changed the title [#2086] feat(client/server): Support cut partition to slices and served by multiply server [#2086] feat(spark): Support cut partition to slices and served by multiply server Nov 6, 2024
@maobaolong maobaolong (Member, Author) commented Nov 7, 2024

@zuston Thank you so much for the discussion and for reviewing this feature; I could not have finished this work without your help!

Merging it now!

@maobaolong maobaolong merged commit 051a247 into apache:master Nov 7, 2024
43 checks passed
Successfully merging this pull request may close these issues:

  • [FEATURE] Rss partition sliced store (#2086)

2 participants