-
Notifications
You must be signed in to change notification settings - Fork 1.9k
WIP: using arrow-avro. remove own implementation #17861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
❤️ amazing! Thank you @getChan |
|
Hi @getChan -- I am preparing to make an arrow release -- have you hit any blockers while integrating the new arrow-avro crate into DataFusion? |
No, not yet. Thanks for release. |
|
Thanks for jumping on this @getChan; let me know if I can help! |
# Conflicts: # Cargo.lock # Cargo.toml # datafusion/common/Cargo.toml
|
FYI I merged the arrow 57 upgrade to DataFusion -- so if you rebase this PR against main you'll have access to the new arrow-avro crate |
# Conflicts: # Cargo.lock
| ---- | ||
| logical_plan TableScan: avro_table projection=[f1, f2, f3] | ||
| physical_plan DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/avro/simple_enum.avro]]}, projection=[f1, f2, f3], file_type=avro | ||
| physical_plan DataSourceExec: file_groups={4 groups: [[WORKSPACE_ROOT/testing/data/avro/simple_enum.avro:0..103], [WORKSPACE_ROOT/testing/data/avro/simple_enum.avro:103..206], [WORKSPACE_ROOT/testing/data/avro/simple_enum.avro:206..309], [WORKSPACE_ROOT/testing/data/avro/simple_enum.avro:309..411]]}, projection=[f1, f2, f3], file_type=avro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty neat
# Conflicts: # Cargo.lock # Cargo.toml # datafusion/datasource-avro/src/avro_to_arrow/arrow_array_reader.rs # datafusion/datasource-avro/src/avro_to_arrow/schema.rs # datafusion/datasource-avro/src/source.rs
# Conflicts: # Cargo.lock
|
There is a compatibility issue with projection. I'm waiting for a release of arrow-avro that includes the necessary projection features.
|
|
Thanks for the update @getChan If the fix already included in arrow-avro (and you are waiting on a release), you could rebase this PR against this branch #19355 to get access to the pre-release code We would have to wait for the arrow release to actually merge it but it could potentially help unblock your work I actually would love to get some validation that we can cut over to the new arrow-avro reader before we make the next arrow release (so we can fix any issue that might be found) |
@alamb I'm going to start working on apache/arrow-rs#8923 early next week and should have a PR up before Jan 1st. |
# Conflicts: # Cargo.lock # Cargo.toml # datafusion/core/src/datasource/file_format/avro.rs # datafusion/datasource-avro/src/avro_to_arrow/schema.rs # datafusion/datasource-avro/src/source.rs
This reverts commit d62b9d3.
|
Hello. @alamb. |
This is great news @getChan -- thank you! Very exciting. What API change are you referring to? If you mean API changes for any users of DataFusion, then I think the best thing to do is the same as for any breaking public API change. It should documented in the upgrading.md guide: |
Which issue does this PR close?
arrow-avrofor performance and improved type support #14097.todo list
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?
AvroErrortimestamp-*(UTC) types that used to be read with None timezone are now read as UTC. The None timezone applies to thelocal-timestamp-*types.