Add support for Float16 type in substrait #16793
Thank you @jatin510 @gabotechs or @LiaCastaneda would you have time to review this PR?
We should probably take a different approach, based on UDTs, to ship support for F16s.
I asked in substrait-io/substrait#822 whether it was fine to use a type variation reference for F16s, and this was the response:
"No. Type variations are different encodings for a type (e.g. dictionary, string view) and they must be able to map 1:1 with the base type."
I don't think there's any precedent for using a UDT to represent an Arrow type in Substrait, but maybe this is the first use case.
Made some changes, @gabotechs.
This makes sense to me, unless Gabriel thinks otherwise, since he’s leading the epic.
DataType::Float16 => Ok(substrait::proto::Type {
    kind: Some(r#type::Kind::UserDefined(r#type::UserDefined {
        type_reference: FLOAT16_TYPE_REF,
        type_variation_reference: 0,
Should we use DEFAULT_TYPE_VARIATION_REF?
This is starting to look good! Unfortunately, I think we might be missing some important details about working with UDTs in Substrait.
When working with User Defined Types and User Defined Functions in Substrait, their references need to appear at the top-level node that represents the plan:
https://github.com/substrait-io/substrait/blob/main/proto/substrait/plan.proto#L32-L35
Luckily, DataFusion already provides tooling for registering User Defined Types in the Substrait plan:
pub fn register_type(&mut self, type_name: String) -> u32 {
As there is no precedent for generating UDTs out of DataFusion plans, several necessary pieces are still missing, and the pending work might not be trivial. For example, the SubstraitProducer trait has a register_function method but no register_type method:
fn register_function(&mut self, signature: String) -> u32;
SubstraitProducer will need to be threaded to every place that can potentially produce a new UDT. The actual number identifying the UDT should probably come from this new register_type method rather than being hardcoded in a constant, the same as for UDFs.
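To make the suggestion above concrete, here is a minimal, self-contained sketch of what obtaining the type reference from a producer could look like. It does not use the real DataFusion or Substrait types: `Extensions`, `Producer`, `DefaultProducer`, `UserDefinedType`, `float16_type`, and the type name "fp16" are all hypothetical stand-ins, and the value of `DEFAULT_TYPE_VARIATION_REF` is assumed to be 0, mirroring the literal in the diff. Only the `register_type(&mut self, type_name: String) -> u32` signature is taken from the discussion.

```rust
use std::collections::HashMap;

// Assumed value, mirroring the literal `0` used in the diff under review.
const DEFAULT_TYPE_VARIATION_REF: u32 = 0;

/// Hypothetical stand-in for the extension bookkeeping owned by a producer.
#[derive(Default)]
struct Extensions {
    /// Maps a registered type name to the anchor assigned to it.
    types: HashMap<String, u32>,
}

impl Extensions {
    /// Returns the anchor already assigned to `type_name`,
    /// or assigns the next free anchor on first registration.
    fn register_type(&mut self, type_name: String) -> u32 {
        let next = self.types.len() as u32;
        *self.types.entry(type_name).or_insert(next)
    }
}

/// Hypothetical, trimmed-down producer trait exposing only `register_type`.
trait Producer {
    fn register_type(&mut self, type_name: String) -> u32;
}

struct DefaultProducer {
    extensions: Extensions,
}

impl Producer for DefaultProducer {
    fn register_type(&mut self, type_name: String) -> u32 {
        self.extensions.register_type(type_name)
    }
}

/// Simplified model of a user-defined type reference inside a plan.
#[derive(Debug, PartialEq)]
struct UserDefinedType {
    type_reference: u32,
    type_variation_reference: u32,
}

/// Builds the Float16 type by asking the producer for an anchor
/// instead of reading it from a hardcoded constant.
fn float16_type(producer: &mut impl Producer) -> UserDefinedType {
    UserDefinedType {
        // "fp16" is a placeholder name chosen for this sketch.
        type_reference: producer.register_type("fp16".to_string()),
        type_variation_reference: DEFAULT_TYPE_VARIATION_REF,
    }
}

fn main() {
    let mut producer = DefaultProducer { extensions: Extensions::default() };
    let first = float16_type(&mut producer);
    let second = float16_type(&mut producer);
    // Registering the same name twice yields the same anchor.
    assert_eq!(first, second);
    println!("{:?}", first);
}
```

Because the anchor comes from the registry, the same bookkeeping could later be used to emit the type declaration at the top level of the plan, which a hardcoded constant would not record.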
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
This commit adds support for the Arrow Float16 type in Substrait plans.
Are these changes tested?
Yes
Are there any user-facing changes?
Add support for Arrow Float16 type in Substrait plans