feat: make parquet native scan schema case insensitive #1575
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1460,6 +1460,25 @@ class ParquetReadV1Suite extends ParquetReadSuite with AdaptiveSparkPlanHelper { | |
v1 = Some("parquet")) | ||
} | ||
} | ||
|
||
test("test V1 parquet native scan -- case insensitive") { | ||
withTempPath { path => | ||
spark.range(10).toDF("a").write.parquet(path.toString) | ||
Seq(CometConf.SCAN_NATIVE_DATAFUSION, CometConf.SCAN_NATIVE_ICEBERG_COMPAT).foreach( | ||
scanMode => { | ||
withSQLConf(CometConf.COMET_NATIVE_SCAN_IMPL.key -> scanMode) { | ||
withTable("test") { | ||
sql("create table test (A long) using parquet options (path '" + path + "')") | ||
val df = sql("select A from test") | ||
checkSparkAnswer(df) | ||
// TODO: pushed-down filters do not use the schema adapter in datafusion, which causes an empty result | |
Review comment: This may need improvement in datafusion; possibly relevant code: https://github.com/apache/datafusion/blob/18feb8b2702b96a8a77ec4bc52fb67571e857d4d/datafusion/datasource-parquet/src/opener.rs#L86 Reply: I don't think we should fix this in the parquet reader (parquet by itself does not specify whether the field names are case sensitive/insensitive). |
||
// val df = sql("select * from test where A > 5") | ||
// checkSparkAnswer(df) | ||
} | ||
} | ||
}) | ||
} | ||
} | ||
} | ||
|
||
class ParquetReadV2Suite extends ParquetReadSuite with AdaptiveSparkPlanHelper { | ||
|
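
The TODO in the test above says that a pushed-down filter returns an empty result when the table schema and the file schema differ only in case. The following is a toy model of that symptom in plain Scala, not Comet or DataFusion code. The object `PushdownCaseMismatch` and its helpers are hypothetical, and treating a missing column like NULL in the filter path is an assumption made for illustration.

```scala
// Toy model only: none of these names exist in Comet or DataFusion.
object PushdownCaseMismatch {
  // Rows as stored in the file, keyed by the physical column name "a".
  val fileRows: Seq[Map[String, Long]] = (0L until 10L).map(i => Map("a" -> i))

  // Projection path: resolve the requested name against the file columns, ignoring case.
  def project(rows: Seq[Map[String, Long]], requested: String): Seq[Option[Long]] =
    rows.map(row => row.collectFirst { case (name, v) if name.equalsIgnoreCase(requested) => v })

  // Filter path (assumed behaviour): exact-name lookup. A column that is not found
  // behaves like NULL, so the predicate never holds for any row.
  def filterGreaterThan(
      rows: Seq[Map[String, Long]],
      column: String,
      bound: Long): Seq[Map[String, Long]] =
    rows.filter(row => row.get(column).exists(_ > bound))

  def main(args: Array[String]): Unit = {
    println(project(fileRows, "A").flatten.size)       // all 10 rows resolve
    println(filterGreaterThan(fileRows, "A", 5L).size) // 0: exact lookup misses "a"
    println(filterGreaterThan(fileRows, "a", 5L).size) // 4: rows 6, 7, 8, 9
  }
}
```

In this model the projection and the filter disagree only because they resolve names differently, which is one way to read the commented-out `select * from test where A > 5` check.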
Not sure if it makes sense to take the value from spark.sql.caseSensitive, although this is an internal config and false by default.

I noticed there is a TODO comment above; maybe we can make them configurable in the future.
datafusion-comet/native/core/src/parquet/parquet_exec.rs
Line 107 in c58f261
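
A minimal sketch of what a configurable lookup could look like, assuming the flag is threaded through as a plain Boolean (for example, read from spark.sql.caseSensitive). `FieldResolver.resolve` is a hypothetical helper rather than part of Comet, and reporting ambiguous matches as an error is also an assumption.

```scala
// Hypothetical helper for illustration; not part of Comet or DataFusion.
object FieldResolver {
  // Returns the file column that matches the requested name, or an error message.
  def resolve(
      fileColumns: Seq[String],
      requested: String,
      caseSensitive: Boolean): Either[String, String] = {
    val matches =
      if (caseSensitive) fileColumns.filter(_ == requested)
      else fileColumns.filter(_.equalsIgnoreCase(requested))
    matches match {
      case Seq(single) => Right(single)
      case Seq()       => Left(s"no column matches '$requested'")
      // Assumption: more than one candidate is reported rather than picked arbitrarily.
      case many        => Left(s"ambiguous: '$requested' matches ${many.mkString(", ")}")
    }
  }
}
```

With `caseSensitive = false`, a table column `A` resolves to the file column `a`, as in the test in this PR. With `caseSensitive = true`, the same lookup reports no match.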