[SPARK-51443] Fix singleVariantColumn in DSv2 and readStream. #50217
What changes were proposed in this pull request?
The current JSON `singleVariantColumn` mode doesn't work in DSv2 or with `spark.readStream`. This PR fixes both cases:
- DSv1 calls `JsonFileFormat.inferSchema`, which calls `JsonDataSource.inferSchema`; DSv2 calls `JsonDataSource.inferSchema` directly. The previous `singleVariantColumn` code was in `JsonFileFormat.inferSchema` and is now moved into `JsonDataSource.inferSchema`, so that both cases are covered.
- `spark.readStream` requires a user-specified schema. `singleVariantColumn` plays the same role as a user-specified schema, but the check did not account for it and failed.
It also includes a small refactor that moves the definition of the option name `singleVariantColumn` from `JSONOptions` to `DataSourceOptions`. It will become a common option name shared by multiple data sources (e.g., CSV) once those implementations are added.
Why are the changes needed?
It is a bug fix that improves the usability of the variant type.
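The two failures can be modeled in a few lines. The sketch below is a hypothetical, dependency-free Python model of the call structure described above, not Spark code: `shared_infer_schema`, `dsv1_infer_schema`, `dsv2_infer_schema` and `stream_schema` are illustrative names only, the toy inference (every column typed as `"string"`) is an assumption, and Python is used instead of Scala purely to avoid a Spark dependency.

```python
# A hypothetical, dependency-free model of the schema resolution this PR touches.
# None of these functions are Spark APIs; they only mirror the call structure.
from typing import Dict, List, Optional, Tuple

Schema = List[Tuple[str, str]]  # (column name, type name)
SINGLE_VARIANT_COLUMN = "singleVariantColumn"


def shared_infer_schema(options: Dict[str, str], rows: List[dict]) -> Schema:
    """Shared inference step that both the DSv1 and DSv2 entry points reach."""
    # The fix places the singleVariantColumn shortcut here, so both callers see it.
    column = options.get(SINGLE_VARIANT_COLUMN)
    if column is not None:
        return [(column, "variant")]
    # Toy inference (an assumption): sorted top-level keys, all typed as "string".
    keys = sorted({key for row in rows for key in row})
    return [(key, "string") for key in keys]


def dsv1_infer_schema(options: Dict[str, str], rows: List[dict]) -> Schema:
    """Stands in for JsonFileFormat.inferSchema: delegates to the shared step."""
    return shared_infer_schema(options, rows)


def dsv2_infer_schema(options: Dict[str, str], rows: List[dict]) -> Schema:
    """Stands in for the DSv2 path: calls the shared step directly."""
    return shared_infer_schema(options, rows)


def stream_schema(user_schema: Optional[Schema], options: Dict[str, str]) -> Schema:
    """Streaming reads need a schema up front; singleVariantColumn counts as one."""
    if user_schema is not None:
        return user_schema
    column = options.get(SINGLE_VARIANT_COLUMN)
    if column is not None:
        return [(column, "variant")]
    raise ValueError("model error: a streaming read needs a schema")


if __name__ == "__main__":
    rows = [{"a": 1}, {"b": "x"}]
    opts = {SINGLE_VARIANT_COLUMN: "v"}
    print(dsv1_infer_schema(opts, rows))  # [('v', 'variant')]
    print(dsv2_infer_schema(opts, rows))  # [('v', 'variant')]
    print(dsv2_infer_schema({}, rows))    # [('a', 'string'), ('b', 'string')]
    print(stream_schema(None, opts))      # [('v', 'variant')]
```

Before the fix, the shortcut lived only in the DSv1 wrapper, so in this model `dsv2_infer_schema` would have fallen through to ordinary inference, and `stream_schema` would have raised whenever only `singleVariantColumn` was given.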
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Unit test. A test previously in `VariantSuite` is moved to `JsonSuite`, so that the read behavior is tested in both `JsonV1Suite` and `JsonV2Suite`. The test is also extended to include `spark.readStream`.
Was this patch authored or co-authored using generative AI tooling?
No.