-
Notifications
You must be signed in to change notification settings - Fork 93
handle struct-based UDT literals in core #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
133b704 to
e4f36ea
Compare
e4f36ea to
cef9798
Compare
cef9798 to
f3379c1
Compare
|
FYI, this PR makes no attempt to actually validate if the struct representation provided matches the definition in the yaml file. I think that this is the right thing to do, but it turned out to be slightly more complicated than I thought, as it involves threading an |
Support both opaque (google.protobuf.Any) and structured (Literal.Struct) encodings for user-defined type literals per Substrait spec. - Split UserDefinedLiteral into UserDefinedAny and UserDefinedStruct - Move type parameters to interface level for parameterized types - Comprehensive test coverage including roundtrip tests - Throw exception on unhandled struct-based representation in isthmus
1eb4fb4 to
f5b6341
Compare
| extensionCollector.getTypeReference(SimpleExtension.TypeAnchor.of(expr.urn(), expr.name())); | ||
| return lit( | ||
| bldr -> { | ||
| try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception doesn't happen anymore because we don't parse the Any here. Instead, we have a reference to the pre-parsed proto directly.
| public ParameterizedType userDefined( | ||
| int ref, java.util.List<io.substrait.type.Type.Parameter> typeParameters) { | ||
| throw new UnsupportedOperationException( | ||
| "User defined types are not supported in Parameterized Types for now"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is consistent with the above, where we don't yet support ParamerizedType conversion.
| public DerivationExpression userDefined( | ||
| int ref, java.util.List<io.substrait.type.Type.Parameter> typeParameters) { | ||
| throw new UnsupportedOperationException( | ||
| "User defined types are not supported in Derivation Expressions for now"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also consistent with the above.
core/src/main/java/io/substrait/type/proto/TypeProtoConverter.java
Outdated
Show resolved
Hide resolved
| public RexNode visit(Expression.UserDefinedStructLiteral expr, Context context) | ||
| throws RuntimeException { | ||
| throw new UnsupportedOperationException( | ||
| "UserDefinedStructLiteral representation is not yet supported in Isthmus"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is #612
78488b4 to
e8cb862
Compare
vbarua
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Started taking a look at this and left some comments. Mostly looks reasonable. Want to come back and look at your tests with fresh eyes, and also think about parameterized types with a fresh 🧠.
| return Type.withNullability(nullable()).userDefined(urn(), name()); | ||
| public abstract List<io.substrait.type.Type.Parameter> typeParameters(); | ||
|
|
||
| public abstract List<Literal> fields(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use a List<Literal> here instead of just a Expression.StructLiteral, which would map directly to the protobuf?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sharing this comment because I believe it is relevant.
Basically it is the same issue, which is that the POJO class called StructLiteral is representing a special case of the Literal class.
message Literal { // Both StructLiteral and UserDefinedStructLiteral are representing this _whole_ message
oneof literal_type {
...
Struct struct = 25;
...
UserDefined user_defined = 33;
}
...
message Struct {
// A possibly heterogeneously typed list of literals
repeated Literal fields = 1;
}
message UserDefined {
oneof type_anchor_type {
// points to a type_anchor defined in this plan
uint32 type_reference = 1;
// points to a type_alias_anchor defined in this plan.
uint32 type_alias_reference = 5;
}
// The parameters to be bound to the type class, if the type class is
// parameterizable.
repeated Type.Parameter type_parameters = 3;
// a user-defined literal can be encoded in one of two ways
oneof val {
// the value of the literal, serialized using some type-specific protobuf message
google.protobuf.Any value = 2;
// the value of the literal, serialized using the structure definition in its declaration
Literal.Struct struct = 4;
}
}
}Back to your comment, switching this member variable to be Expression.StructLiteral would amount to embedding one literal proto inside of another. For example, the proto Struct doesn't actually have a nullable field. But the POJO Expression.StructLiteral does have a nullable field because it inherits from Literal.
This doesn't mean that we couldn't replace the member variable as you suggest, but if we did do that, we would be carrying around extra meaningless data, which I think is more confusing ultimately.
| * User-defined literal with value encoded as {@code Literal.Struct}. | ||
| * | ||
| * <p>This encoding uses a structured list of fields to represent the literal value. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring feels a little inconsistent. You have
literal with value encoded as {@code Literal.Struct}
but in the class the values are encoded as a List<Literal>, and the second part of the docstring is consistent with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that this could be made less confusing. The intention was to mean the proto Literal.Struct, no some java class. What do you think about just altering the message to say
User-defined literal with value encoded via the proto message {@code Literal.Struct}.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I altered it to instead link to the actual proto. Let me know what you think! I also did the same for the any case.
| @Override | ||
| public abstract List<io.substrait.type.Type.Parameter> typeParameters(); | ||
|
|
||
| public abstract com.google.protobuf.Any value(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Release Notes
Capturing the value as an Any instead of a ByteString does feel nicer ✨
| * @see UserDefinedAnyLiteral | ||
| * @see UserDefinedStructLiteral | ||
| */ | ||
| interface UserDefinedLiteral extends Literal { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Release Notes
We should call out that we don't construct UserDefinedLiterals anymore.
| * parameters (like the {@code 10} in {@code VARCHAR<10>}). This interface provides a type-safe | ||
| * representation of all possible parameter kinds. | ||
| */ | ||
| interface Parameter {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the function of https://github.com/substrait-io/substrait-java/blob/main/core/src/main/java/io/substrait/function/ParameterizedType.java applicable here? It feels vaguely related.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 yes that is an interesting point. Looking into it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as I understand it, ParameterizedType.java is used for representing abstract types with parameters in yaml files. Where as the Parameter above being introduced is actually a concrete argument passed into the type.
So for example, List<any1> could be a ParameterizedType, whereas List<int32> is a type with parameters [int32].
| @Value.Immutable | ||
| abstract class ParameterBooleanValue implements Parameter { | ||
| public abstract boolean value(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a usecase for having boolean literal type parameters? I can't imagine a usecase where something like MySpecialType<false> is something you would need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 just wanted to be consistent with the spec
| @Value.Immutable | ||
| abstract class ParameterEnumValue implements Parameter { | ||
| public abstract String value(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would this type be used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, these were inspired by this portion of the simple extension schema:
type_param_defs: # an array of compound type parameter definitions
type: array
items:
...
properties:
...
type: # expected metatype for the parameter
type: string
enum:
- dataType
- boolean
- integer
- enumeration
- stringSo while I don't understand the usage of it, I thought it was best to include all of them for consistency with the spec.
| @Value.Immutable | ||
| abstract class ParameterStringValue implements Parameter { | ||
| public abstract String value(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would this type be used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fea5921 to
77f0bd7
Compare
3814053 to
4b4f5ee
Compare
b2742da to
3732cf9
Compare
|
FYI, I have a WIP PR for implementing this in Isthmus but I split it into two PRs because the code there is more complicated. This keeps the PRs a bit smaller! |
Support both opaque (google.protobuf.Any) and
structured (Literal.Struct) encodings for user-defined type literals per Substrait spec.
Closes #611