
Add support for Float16 type in substrait #16793


Open
wants to merge 5 commits into
base: main
from

Conversation

jatin510
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

This commit adds support for the Arrow Float16 type in Substrait plans.

Are these changes tested?

Yes

Are there any user-facing changes?

Add support for Arrow Float16 type in Substrait plans

github-actions bot added the substrait (Changes to the substrait crate) label on Jul 15, 2025
@alamb
Contributor

alamb commented Jul 15, 2025

Thank you @jatin510

@gabotechs or @LiaCastaneda would you have time to review this PR?

Contributor

@gabotechs gabotechs left a comment


We should probably take a different approach, based on user-defined types (UDTs), to ship Float16 support.

I asked whether it was fine to use a type variation reference for Float16 in substrait-io/substrait#822, and this was the response:

No. Type variations are different encodings for a type (e.g. dictionary, string view) and they must be able to map 1:1 with the base type.

I don't think there's any precedent for using a UDT to represent an Arrow type in Substrait, but maybe this is the first use case.
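One way to read the quoted rule, as a rough sketch: a type variation only changes the encoding, so every value of the base type has to survive the mapping. Assuming the base type would have been fp32, a 16-bit float cannot satisfy that, because the largest finite binary16 value is 65504. The `SketchType` enum and both helper functions below are hypothetical stand-ins written for illustration; they are not the generated `substrait::proto` types.

```rust
/// Simplified, hypothetical model of a Substrait type (illustration only).
#[derive(Debug, PartialEq)]
enum SketchType {
    /// A built-in type; a variation may change its encoding, not its values.
    Fp32 { type_variation_reference: u32 },
    /// A user-defined type, identified by an anchor declared on the plan.
    UserDefined { type_reference: u32, type_variation_reference: u32 },
}

/// Largest finite value of an IEEE 754 binary16 float.
const F16_MAX: f32 = 65504.0;

/// True when `v` lies inside the finite range of a 16-bit float.
/// Being in range does not imply the value is exactly representable.
fn within_f16_range(v: f32) -> bool {
    v.abs() <= F16_MAX
}

/// Models Float16 as a user-defined type with the default variation (0).
fn float16_as_udt(type_reference: u32) -> SketchType {
    SketchType::UserDefined {
        type_reference,
        type_variation_reference: 0,
    }
}
```

Since values such as `70000.0_f32` fall outside that range, the mapping is not 1:1, which is consistent with the suggestion to model Float16 as a UDT instead.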

@jatin510
Contributor Author

made some changes @gabotechs

Contributor

@LiaCastaneda LiaCastaneda left a comment


This makes sense to me, unless Gabriel thinks otherwise, since he’s leading the epic.

DataType::Float16 => Ok(substrait::proto::Type {
    kind: Some(r#type::Kind::UserDefined(r#type::UserDefined {
        type_reference: FLOAT16_TYPE_REF,
        type_variation_reference: 0,
Contributor


Should we use DEFAULT_TYPE_VARIATION_REF here instead of the literal 0?

Contributor

@gabotechs gabotechs left a comment


This is starting to look good! Unfortunately, I think we might be missing some important details about working with UDTs in Substrait.

When working with User Defined Types and User Defined Functions in Substrait, their references need to appear in the top-level node that represents the plan:

https://github.com/substrait-io/substrait/blob/main/proto/substrait/plan.proto#L32-L35

Luckily, DataFusion already provides tooling for registering User Defined Types in the Substrait plan:

pub fn register_type(&mut self, type_name: String) -> u32 {

As there is no precedent for generating UDTs from DataFusion plans, several necessary pieces are still missing, and the pending work might not be trivial. For example, I see that the SubstraitProducer trait has a register_function method, but it does not have a register_type method:

fn register_function(&mut self, signature: String) -> u32;

It probably needs to be added, and a mutable reference to the SubstraitProducer will need to be threaded to every place that can potentially produce a new UDT. The actual number identifying the UDT should probably come from this new register_type method rather than being hardcoded in a constant, the same as for UDFs.
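To make the suggestion concrete, here is a minimal, self-contained sketch of a producer that hands out type anchors through a `register_type` method rather than a hardcoded constant. Everything in it is an assumption made for illustration: `SketchProducer`, `SketchExtensions`, `SketchArrowType`, `SketchSubstraitType`, the `intern` helper, and the `"fp16"` type name are hypothetical stand-ins, not the actual DataFusion or Substrait definitions, and the real trait has many more methods.

```rust
use std::collections::HashMap;

/// Returns the anchor already assigned to `name`, or assigns the next one.
fn intern(map: &mut HashMap<u32, String>, name: String) -> u32 {
    if let Some((anchor, _)) = map.iter().find(|(_, n)| **n == name) {
        return *anchor;
    }
    let anchor = map.len() as u32;
    map.insert(anchor, name);
    anchor
}

/// Hypothetical registry of everything the plan must declare at its top level.
#[derive(Default)]
struct SketchExtensions {
    functions: HashMap<u32, String>,
    types: HashMap<u32, String>,
}

/// Hypothetical, heavily reduced producer trait.
trait SketchProducer {
    fn register_function(&mut self, signature: String) -> u32;
    /// The addition discussed in the review (assumed signature).
    fn register_type(&mut self, type_name: String) -> u32;
}

#[derive(Default)]
struct DefaultSketchProducer {
    extensions: SketchExtensions,
}

impl SketchProducer for DefaultSketchProducer {
    fn register_function(&mut self, signature: String) -> u32 {
        intern(&mut self.extensions.functions, signature)
    }
    fn register_type(&mut self, type_name: String) -> u32 {
        intern(&mut self.extensions.types, type_name)
    }
}

/// Stand-in for the Arrow data types relevant to this example.
enum SketchArrowType {
    Float16,
    Float32,
}

/// Stand-in for the Substrait type produced by the conversion.
#[derive(Debug, PartialEq)]
enum SketchSubstraitType {
    Fp32,
    UserDefined { type_reference: u32 },
}

/// The producer is threaded in mutably so that a UDT can be registered
/// at the point where it is first emitted.
fn to_substrait_type(
    producer: &mut impl SketchProducer,
    dt: &SketchArrowType,
) -> SketchSubstraitType {
    match dt {
        SketchArrowType::Float32 => SketchSubstraitType::Fp32,
        SketchArrowType::Float16 => SketchSubstraitType::UserDefined {
            // "fp16" is a placeholder name chosen for this sketch.
            type_reference: producer.register_type("fp16".to_string()),
        },
    }
}
```

With this shape, converting Float16 twice yields the same anchor, and the registry ends up holding exactly one type entry that could later be written into the plan's extension declarations.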

Labels
substrait Changes to the substrait crate
Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] Unsupported cast type: Float16
4 participants