[SPARK-51443] Fix singleVariantColumn in DSv2 and readStream. #50217

Open: wants to merge 1 commit into base: master


chenhao-db
Contributor

What changes were proposed in this pull request?

The current JSON singleVariantColumn mode doesn't work in DSv2 and spark.readStream. This PR fixes the two cases:

  • DSv1 calls JsonFileFormat.inferSchema, which calls JsonDataSource.inferSchema; DSv2 calls JsonDataSource.inferSchema directly. The singleVariantColumn handling previously lived in JsonFileFormat.inferSchema and is now moved into JsonDataSource.inferSchema, so that both paths are covered.
  • spark.readStream requires a user-specified schema. singleVariantColumn plays the same role as a user-specified schema, but the check did not recognize it and would fail.
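
For reviewers who want to reproduce the two cases by hand, here is a minimal sketch of the reads this PR is meant to make work. It is not taken from the patch or its tests: the object name, temp directory, file contents, and column name `var` are made up for illustration, and it assumes a Spark build that includes this fix. Setting spark.sql.sources.useV1SourceList to an empty string is how the built-in file sources are routed through DSv2.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical reproduction script; names and data are illustrative only.
object SingleVariantColumnSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("singleVariantColumn-sketch")
      // An empty V1 source list makes batch reads of "json" use the DSv2 path.
      .config("spark.sql.sources.useV1SourceList", "")
      .getOrCreate()

    // Assumed input: one small JSON-lines file in a temp directory.
    val dir = Files.createTempDirectory("variant-json")
    Files.write(
      dir.resolve("part-0.json"),
      "{\"a\": 1}\n{\"b\": [2, 3]}\n".getBytes(StandardCharsets.UTF_8))

    // Case 1: batch read through DSv2. Each JSON record should land in a
    // single variant column named by the option value.
    val batch = spark.read.format("json")
      .option("singleVariantColumn", "var")
      .load(dir.toString)
    batch.printSchema()
    assert(batch.schema.map(_.name) == Seq("var"))

    // Case 2: streaming read with no .schema(...) call. The option is
    // expected to stand in for the user-specified schema.
    val stream = spark.readStream.format("json")
      .option("singleVariantColumn", "var")
      .load(dir.toString)
    stream.printSchema()
    assert(stream.isStreaming)
    assert(stream.schema.map(_.name) == Seq("var"))

    spark.stop()
  }
}
```

Without the fix, the expectation from the description above is that case 1 falls back to ordinary schema inference under DSv2 and case 2 fails the user-specified-schema check.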

It also includes a small refactor that moves the option name definition singleVariantColumn from JSONOptions to DataSourceOptions. The name is meant to be shared by multiple data sources (e.g., CSV) once their implementations are added in the future.

Why are the changes needed?

It is a bug fix that improves the usability of variant.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests. A test previously in VariantSuite is moved to JsonSuite, so that the read behavior is tested in both JsonV1Suite and JsonV2Suite. The test is also extended to cover spark.readStream.

Was this patch authored or co-authored using generative AI tooling?

No.

github-actions bot added the SQL label Mar 8, 2025