-
Notifications
You must be signed in to change notification settings - Fork 63
allow opt-in for new unset field semantics on csp.Struct (disc #536) #574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
annotate Struct fields as optional and Structs as strict add validation logic add preliminary testing Signed-off-by: Simon Beyzerov <[email protected]>
537ebe6
to
195165f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! I only have some minor comments, this is very impressive for a first CSP PR!
cpp/csp/engine/Struct.cpp
Outdated
CSP_THROW( ValueError, "Struct '" << s -> meta() -> name() << "' has non-strict inheritance of strict base '" << cur -> name() << "'" ); | ||
|
||
// rule 2: all local fields are set | ||
std::string missing_fields; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need to account for optional struct fields as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the feedback! Let me know if you spot any other style bugs :)
For this comment, do you mean recursively validating nested strict structs, or checking than an optional field is valid? or am I misunderstanding?
The latter should be implicitly checked by the "Rule 2" loop: to be valid, an optional field should either be given a default or set directly. In either case isSet
is true
(and false
otherwise). All optional fields co-exist in m_fields
with the non-optional fields.
The former is a point I wanted to clarify: earlier, I removed recursive validation mostly for performance reasons, assuming any field of strict struct type that is set must've been previously validated (after its creation via meta -> create()
).
If created via the Python interface, this should always be the case via tp_init
.
From the C++ side, it's enforced manually - as I have it now, all instances of -> create()
are followed by a validate()
, including in adapters.
If there's future/existing functionality that automatically creates these nested structs w/o immediate validation, recursively validating may be a good move? Maybe there's some other common scenario I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"to be valid, an optional field should either be given a default or set directly"
I guess this is the point I am confused on. If I specify a field as Optional[T]
with no default, what happens if I leave it unset? Based on Dennis' comment, it will be defaulted to None
and then I agree with you, it will be "set" in C++. Then there's no difference between Optional[T]
and Optional[T] = None
, which I think is fine...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see based on your tests that using Optional[T]
with no default is equivalent to making the field required. This is confusing to me - what if we just don't allow annotating a field as optional with no default? I don't see a good reason why someone would intentionally do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see what you're referring to. Requiring non-defaulted optional fields was my interpretation of the following comment from Rob/the internal discussion notes:
"When opting in, all fields that are not defaulted will be required to be passed into the struct init method. This includes Optional fields that arent defaulted to None."
Was something else meant by this? I thought this was a point of deviation from Dennis' original comment.
In terms of semantics, maybe Dennis' "default optional non-defaulted fields to None
" approach is probably more appropriate then? rather than disallowing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically, both interpretations of a struct field annotated Optional[T]
with no default seem weird to me:
- If we default the value to
None
, that differs from how Python itself handles an Optional annotation. The user should just provide the default value in that case. - If we don't default the value and make it a "strict" field, then it is confusing that they annotated it as
Optional[T]
and notT
. This seems to me that they made a mistake by forgetting the= None
part.
So my vote would be to not allow annotating an Optional struct field without a default - @robambalu @p72dennisxu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there is a third interpretation here which makes more sense for Optional[T]
, which is that it's a field that "can be" None but will not be set as None by default. That makes sense to me and aligns with how Python uses the annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I think on the Python side we can probably just treat it the same as the strict fields, so the user may either provide a default value or set it every time they create a new instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Lmk if the this captures these semantics as expected. If we stick with this route, seems like the per-field isOptional bit is not even necessary then, which is a plus. Strictness would enforce that all fields are set after initialization is complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more comment, it looks like we are missing updates to autogen to also do validation when autogen-ed structs are created.
Another option is not to have validate calls in every adapter at all and instead specialize the pushTick call on struct types and have it call validate
@@ -113,6 +113,8 @@ void KafkaInputAdapter::processMessage( RdKafka::Message* message, bool live, cs | |||
if( m_tickTimestampField ) | |||
msgTime = m_tickTimestampField->value<DateTime>(tick.get()); | |||
|
|||
tick.get() -> validate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to be careful on all of the realtime adapters that are now validating. Keep in mind this is happening on a separate thread and an uncaught exception will crash the process.
Just need to make sure all of these validation calls are already under existing try/catch blocks ( some of them may already be )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some new changes that hopefully deal with this, though a sanity check is definitely needed. Left some corresponding comments as well.
6f7864e
to
1c9aa67
Compare
Signed-off-by: Simon Beyzerov <[email protected]>
1c9aa67
to
2af2edf
Compare
Signed-off-by: Simon Beyzerov <[email protected]>
Signed-off-by: Simon Beyzerov <[email protected]>
Signed-off-by: Simon Beyzerov <[email protected]>
Signed-off-by: Simon Beyzerov <[email protected]>
@@ -113,6 +113,10 @@ void KafkaInputAdapter::processMessage( RdKafka::Message* message, bool live, cs | |||
if( m_tickTimestampField ) | |||
msgTime = m_tickTimestampField->value<DateTime>(tick.get()); | |||
|
|||
if (!tick.get() -> validate()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
processMessage
is called in KafkaSubscriber.cpp
and in line 37 should catch this exception, I think
@@ -520,6 +520,9 @@ void ParquetStructAdapter::dispatchValue( const utils::Symbol *symbol, bool isNu | |||
{ | |||
fieldSetter( s ); | |||
} | |||
|
|||
CSP_TRUE_OR_THROW_RUNTIME( s -> validate(), "Struct validation failed for Parquet message, some fields are missing" ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be double-checked to make sure things are being caught properly. From the call graph of dispatchValue
it seems like its callers are StructColumnAdapter::readCurValue()
SingleTableParquetReader::readNextRow()
SingleTableParquetReader::start()
or SingleTableParquetReader::skipRow()
or MultipleFileParquetReader::dispatchRow
, which eventually seem to end up in calls that have some runtime throw of this same type (e.g., ParquetInputAdapterManager::processNextSimTimeSlice
). So seems like a runtime error would be caught OK?
All the way at AdapterManager::start
level there's also a ValueErr
catch.
@@ -145,6 +145,9 @@ StructPtr JSONMessageStructConverter::convertJSON( const char * fieldname, const | |||
} ); | |||
} | |||
|
|||
if( !struct_ -> validate() ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This propagates to the processMessage
in this file via asStruct
, and then to the websocket ClientAdapterManager
, which we add a new try/catch block.
@@ -52,8 +52,15 @@ void ClientAdapterManager::start( DateTime starttime, DateTime endtime ) | |||
if( m_inputAdapter ) { | |||
m_endpoint -> setOnMessage( | |||
[ this ]( void* c, size_t t ) { | |||
PushBatch batch( m_engine -> rootEngine() ); | |||
m_inputAdapter -> processMessage( c, t, &batch ); | |||
try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not super familiar with the most natural way of handling this, so this should be double-checked.
see discussion #536
This PR adds an
allow_unset
flag tocsp.Struct
classes, optionally enforcing strict field validation (raise when any fields are unset).When
allow_unset=False
, the struct is said to be "strict."Summary of changes and points for discussion:
StructField
stores an additionalm_isOptional
bit accessed byisOptional()
StructMeta
stores an additionalm_isStrict
bit accessed byisStrict()
validate()
method__optional_fields__
and__strict_enabled__
metadatatp_init
, checking that all fields are set_unvalidated__call__
allows the creation of an unvalidated struct instance_obj_from_python
) wherevalidate()
is called manuallysetattr
csp.Struct
(at wiring time)fromts
andcollectts
validate()
once a struct is fully created and fields set. differs per-adaptercsp.Struct
, validate nested structshasattr
is true for all strict structs' fieldsfrom_dict
creates a strictcsp.Struct
, it is validated