[SPARK-49741][DOCS] Add spark.shuffle.accurateBlockSkewedFactor to config docs page
#48189
+1 on documenting this, as this is a useful configuration. Looks like the docs generally match
spark/core/src/main/scala/org/apache/spark/internal/config/package.scala
Lines 1387 to 1397 in f3785fa
private[spark] val SHUFFLE_ACCURATE_BLOCK_SKEWED_FACTOR =
  ConfigBuilder("spark.shuffle.accurateBlockSkewedFactor")
    .internal()
    .doc("A shuffle block is considered as skewed and will be accurately recorded in " +
      "HighlyCompressedMapStatus if its size is larger than this factor multiplying " +
      "the median shuffle block size or SHUFFLE_ACCURATE_BLOCK_THRESHOLD. It is " +
      "recommended to set this parameter to be the same as SKEW_JOIN_SKEWED_PARTITION_FACTOR." +
      "Set to -1.0 to disable this feature by default.")
    .version("3.3.0")
    .doubleConf
    .createWithDefault(-1.0)
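To make the doc string above concrete, here is a minimal sketch of one reading of the rule it describes: a block is recorded accurately when its size exceeds either the factor times the median block size or the accurate-block threshold, and a non-positive factor disables the skew check. The object and method names (`SkewedBlockSketch`, `accurateThreshold`, `accuratelyRecorded`) are hypothetical, and this is not the code in `HighlyCompressedMapStatus`, which may apply additional limits that are omitted here.

```scala
// Hypothetical illustration of the rule in the doc string; not Spark's implementation.
object SkewedBlockSketch {
  // Median of the block sizes; assumes a non-empty input.
  def median(sizes: Seq[Long]): Long = {
    val sorted = sizes.sorted
    val n = sorted.length
    if (n % 2 == 1) sorted(n / 2) else (sorted(n / 2 - 1) + sorted(n / 2)) / 2
  }

  // Size above which a block would be recorded exactly instead of averaged.
  // A non-positive factor (such as the default -1.0) leaves only the plain threshold.
  def accurateThreshold(sizes: Seq[Long], skewedFactor: Double, blockThreshold: Long): Long =
    if (skewedFactor > 0) math.min(blockThreshold, (median(sizes) * skewedFactor).toLong)
    else blockThreshold

  // Blocks whose sizes would be kept accurately under this reading.
  def accuratelyRecorded(sizes: Seq[Long], skewedFactor: Double, blockThreshold: Long): Seq[Long] = {
    val threshold = accurateThreshold(sizes, skewedFactor, blockThreshold)
    sizes.filter(_ > threshold)
  }
}
```

With block sizes `Seq(10L, 10L, 10L, 10L, 1000L)`, a factor of 5.0, and a 100 MiB threshold, only the 1000-byte block exceeds 5 times the median, so only that block would be recorded accurately. With the factor left at -1.0, none would be.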
docs/configuration.md
Outdated
@@ -1010,6 +1010,19 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>2.2.1</td>
</tr>
<tr>
<td><code>spark.shuffle.accurateBlockSkewedFactor</code></td>
I see that the spark.shuffle.accurateBlockThreshold configuration is already documented in this table. It looks like we're already inconsistent about alphabetizing this list.
What do you think about moving this new configuration a bit further down so it's next to spark.shuffle.accurateBlockThreshold?
@@ -1222,6 +1222,19 @@ Apart from these, the following properties are also available, and may be useful
</td>
<td>2.2.1</td>
</tr>
<tr>
<td><code>spark.shuffle.accurateBlockSkewedFactor</code></td>
To @timlee0119 and @JoshRosen, shall we remove .internal() from the configuration definition together?
spark/core/src/main/scala/org/apache/spark/internal/config/package.scala
Lines 1387 to 1389 in bdea091
private[spark] val SHUFFLE_ACCURATE_BLOCK_SKEWED_FACTOR =
  ConfigBuilder("spark.shuffle.accurateBlockSkewedFactor")
    .internal()
In addition, the PR title looks wrong to me because we are touching spark.shuffle.accurateBlockSkewedFactor instead of spark.sql.adaptive....
Yeah, I would match the description
Sorry for the typo. I've fixed the PR title and removed .internal().
+1, LGTM. Thank you.
Merged to master.
…config docs page ### What changes were proposed in this pull request? `spark.shuffle.accurateBlockSkewedFactor` was added in Spark 3.3.0 in https://issues.apache.org/jira/browse/SPARK-36967 and is a useful shuffle configuration to prevent issues where `HighlyCompressedMapStatus` wrongly estimates the shuffle block sizes when the block size distribution is skewed, which can cause the shuffle reducer to fetch too much data and OOM. This PR adds this config to the Spark config docs page to make it discoverable. ### Why are the changes needed? To make this useful config discoverable by users and make them able to resolve shuffle fetch OOM issues themselves. ### Does this PR introduce _any_ user-facing change? Yes, this is a documentation fix. Before this PR there's no `spark.sql.adaptive.skewJoin.skewedPartitionFactor` in the `Shuffle Behavior` section on [the Configurations page](https://spark.apache.org/docs/latest/configuration.html) and now there is. ### How was this patch tested? On the IDE: <img width="1633" alt="image" src="https://github.com/user-attachments/assets/616a94b9-2408-491c-a17b-c6dbdff14465"> Updated: <img width="1274" alt="image" src="https://github.com/user-attachments/assets/ba170e9a-eba2-4fdf-85eb-a3aebefc055e"> ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#48189 from timlee0119/add-accurate-block-skewed-factor-to-doc. Authored-by: Tim Lee <[email protected]> Signed-off-by: Hyukjin Kwon <[email protected]>
What changes were proposed in this pull request?
spark.shuffle.accurateBlockSkewedFactor was added in Spark 3.3.0 in https://issues.apache.org/jira/browse/SPARK-36967 and is a useful shuffle configuration to prevent issues where HighlyCompressedMapStatus wrongly estimates the shuffle block sizes when the block size distribution is skewed, which can cause the shuffle reducer to fetch too much data and OOM. This PR adds this config to the Spark config docs page to make it discoverable.
Why are the changes needed?
To make this useful config discoverable by users and make them able to resolve shuffle fetch OOM issues themselves.
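As a rough illustration of how a user might apply the config once it is discoverable, the fragment below sets it through `SparkConf`, following the doc string's recommendation to keep it equal to `spark.sql.adaptive.skewJoin.skewedPartitionFactor`. The value 5.0 and the application name are assumptions chosen for the example, not recommendations from this PR, and the fragment needs Spark on the classpath.

```scala
import org.apache.spark.SparkConf

// Illustrative values only: 5.0 is an assumed setting, not a tuned one.
val conf = new SparkConf()
  .setAppName("skew-aware-shuffle") // hypothetical application name
  .set("spark.sql.adaptive.skewJoin.skewedPartitionFactor", "5.0")
  // Keep the shuffle-side factor equal to the skew-join factor, as the doc string suggests.
  .set("spark.shuffle.accurateBlockSkewedFactor", "5.0")
```

Leaving `spark.shuffle.accurateBlockSkewedFactor` at its default of -1.0 keeps the feature disabled.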
Does this PR introduce any user-facing change?
Yes, this is a documentation fix. Before this PR there's no spark.shuffle.accurateBlockSkewedFactor in the Shuffle Behavior section on the Configurations page and now there is.
How was this patch tested?
On the IDE:
Updated:
Was this patch authored or co-authored using generative AI tooling?
No