[SPARK-50858][PYTHON] Add configuration to hide Python UDF stack trace #49535
Changes in `object SQLConf`:

```diff
@@ -3475,6 +3475,15 @@ object SQLConf {
       .checkValues(Set("legacy", "row", "dict"))
       .createWithDefaultString("legacy")

+  val PYSPARK_HIDE_TRACEBACK =
+    buildConf("spark.sql.execution.pyspark.udf.hideTraceback.enabled")
+      .doc(
+        "When true, only show the message of the exception from Python UDFs, " +
+          "hiding the stack trace. If this is enabled, simplifiedTraceback has no effect.")
+      .version("4.0.0")
+      .booleanConf
+      .createWithDefault(false)
+
```
Review comment: Another way is to create this conf as an int and show the max depth of the stack trace, but I don't feel strongly.

Reply: Is there a use case where we only want to show the last k frames of the stack? I'm under the impression that we want to show the full stack trace for most exceptions, and completely hide the stack trace for specific library exceptions where the message is sufficient to identify the reason.
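If a "last k frames" depth limit were ever wanted, the standard library already supports that rendering; a minimal sketch, not part of this PR:

```python
import traceback

def last_k_frames(exc, k=3):
    # A negative limit keeps only the last abs(limit) stack entries (Python 3.5+).
    return "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__, limit=-k))

try:
    1 / 0
except ZeroDivisionError as e:
    print(last_k_frames(e, k=1))
```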
```diff
   val PYSPARK_SIMPLIFIED_TRACEBACK =
     buildConf("spark.sql.execution.pyspark.udf.simplifiedTraceback.enabled")
       .doc(
```
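For context, a minimal PySpark usage sketch of the new flag (the exact error output is illustrative, not verbatim):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

# Session conf: surface only the exception message for Python UDF failures.
spark.conf.set("spark.sql.execution.pyspark.udf.hideTraceback.enabled", "true")

@udf("int")
def fail(x):
    raise ValueError(f"bad value: {x}")

# With the flag enabled, the error reaching the driver should carry just
# "ValueError: bad value: 0" instead of the full Python worker traceback.
spark.range(1).select(fail("id")).collect()
```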
```diff
@@ -6286,6 +6295,8 @@ class SQLConf extends Serializable with Logging with SqlApiConf {

   def pandasStructHandlingMode: String = getConf(PANDAS_STRUCT_HANDLING_MODE)

+  def pysparkHideTraceback: Boolean = getConf(PYSPARK_HIDE_TRACEBACK)
+
   def pysparkSimplifiedTraceback: Boolean = getConf(PYSPARK_SIMPLIFIED_TRACEBACK)

   def pandasGroupedMapAssignColumnsByName: Boolean =
```
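To make the precedence described in the doc string concrete, here is an illustrative sketch (a hypothetical helper, not PySpark's actual worker code):

```python
import traceback

def format_udf_error(exc, hide_traceback, simplified_traceback):
    # hideTraceback wins: only the exception message, no frames at all.
    if hide_traceback:
        return f"{type(exc).__name__}: {exc}"
    frames = traceback.format_exception(type(exc), exc, exc.__traceback__)
    if simplified_traceback:
        # The real simplifiedTraceback logic filters PySpark-internal frames;
        # dropping the header line here is only a stand-in for that filtering.
        frames = frames[1:]
    return "".join(frames)
```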
Review comment: Can we use `conf.get(PYSPARK_HIDE_TRACEBACK)` here so that we don't need to override every subclass?

Reply: The config is defined in `org.apache.spark.sql.internal.SQLConf`, which seems to be inaccessible from here. For reference, `PYSPARK_SIMPLIFIED_TRACEBACK` is also defined in `SQLConf`, so `BasePythonRunner` subclasses have to override it. Is there an advantage to putting it in `SQLConf` rather than, e.g., `org.apache.spark.internal.config.Python`?

Reply: A conf in `SQLConf` is a session-based conf that can also be set at runtime, while any conf in the `core` module or `StaticSQLConf` is cluster-wide and can't be changed while the cluster is running.
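A small sketch of that distinction from the Python side (the static conf shown is a generic example, not one touched by this PR):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Session-scoped SQL conf: can be flipped at runtime, per session.
spark.conf.set("spark.sql.execution.pyspark.udf.hideTraceback.enabled", "true")

# Static SQL confs and core confs are fixed at startup, e.g. via
#   spark-submit --conf spark.executor.memory=4g app.py
# Attempting to modify one at runtime is rejected (typically an
# AnalysisException: "Cannot modify the value of a static config"):
spark.conf.set("spark.sql.warehouse.dir", "/tmp/other")
```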