[SPARK-50858][PYTHON] Add configuration to hide Python UDF stack trace #49535
What changes were proposed in this pull request?
Add a new configuration, spark.sql.execution.pyspark.udf.hideTraceback.enabled. If set, only the exception class and message are included when handling an exception from a Python UDF; the stack trace is omitted. The configuration is turned off by default.
Suggested review order:
- python/pyspark/util.py: logic changes
- python/pyspark/tests/test_util.py: unit tests
Why are the changes needed?
This allows library-provided UDFs to surface only the relevant error message, without an unnecessary stack trace.
Does this PR introduce any user-facing change?
If the configuration is turned off (the default), there is no user-facing change.
Otherwise, the stack trace is not included in the error message when an exception from a Python UDF is handled.
Example that illustrates the difference:
With the configuration turned off, the last line gives:
With the configuration turned on, the last line gives:
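Since the original output snippets were not captured above, here is a minimal, self-contained sketch of the two behaviors. The format_exception helper name and signature are illustrative only, not the exact helper in python/pyspark/util.py; the sketch uses the standard traceback module to show the difference between a full traceback and the class-and-message-only form that this configuration selects.

```python
import traceback

def format_exception(e, hide_traceback=False):
    # Illustrative helper (not the exact one in python/pyspark/util.py):
    # with hide_traceback=True, keep only the exception class and message;
    # otherwise, render the full Python stack trace.
    if hide_traceback:
        return "".join(traceback.format_exception_only(type(e), e))
    return "".join(traceback.format_exception(type(e), e, e.__traceback__))

try:
    # Stand-in for an exception raised inside a Python UDF.
    raise ValueError("bad value")
except ValueError as err:
    short = format_exception(err, hide_traceback=True)
    full = format_exception(err, hide_traceback=False)

print(short.strip())        # ValueError: bad value
print("Traceback" in full)  # True
```

With hideTraceback enabled, users of a library-provided UDF would see only the short form; with it disabled, they would see the full form including the traceback frames.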
How was this patch tested?
Added unit tests in python/pyspark/tests/test_util.py, covering two cases: with the configuration turned on and turned off, respectively.
Was this patch authored or co-authored using generative AI tooling?
No