Skip to content

Conversation

sim15
Copy link

@sim15 sim15 commented Aug 11, 2025

see discussion #536

This PR adds an allow_unset flag to csp.Struct classes, optionally enforcing strict field validation (raise when any fields are unset).

When allow_unset=False, the struct is said to be "strict."

Summary of changes and points for discussion:

  • On the C++ side,
    • StructField stores an additional m_isOptional bit accessed by isOptional()
    • StructMeta stores an additional m_isStrict bit accessed by isStrict()
    • a new instance validate() method
      • Validation enforces:
        1. all fields local to this instance are set
        2. full chain of inheritance is validated
        3. a non-strict struct may not inherit (directly or indirectly) from a strict base
  • On the Python side,
    • struct instances now have __optional_fields__ and __strict_enabled__ metadata
    • by default, each strict instance is validated upon tp_init, checking that all fields are set
    • _unvalidated__call__ allows the creation of an unvalidated struct instance
      • to allow create-then-set of strict structs (e.g., see _obj_from_python) where validate() is called manually
  • Additional semantics and points of validation (may need to be discussed):
    • raise when a strict struct's field is deleted (directly or indirectly) via setattr
    • raise upon accessing any optional field on a strict ts csp.Struct (at wiring time)
    • raise if all fields aren't set in a strict struct created in fromts and collectts
    • at the adapter level, manually call validate() once a struct is fully created and fields set. differs per-adapter
    • if attempting to convert a JSON to a csp.Struct, validate nested structs
      • We do not validate a top-level parsed struct as the adapter may set further fields manually. Such validation is handled by the adapter directly
    • implicitly, hasattr is true for all strict structs' fields
      • since validation esnures all fields are set
    • if from_dict creates a strict csp.Struct, it is validated

annotate Struct fields as optional and Structs as strict

add validation logic

add preliminary testing

Signed-off-by: Simon Beyzerov <[email protected]>
@sim15 sim15 force-pushed the feature/536-unsetsemantics branch from 537ebe6 to 195165f Compare August 11, 2025 19:09
@sim15 sim15 marked this pull request as ready for review August 11, 2025 20:07
Copy link
Collaborator

@AdamGlustein AdamGlustein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! I only have some minor comments, this is very impressive for a first CSP PR!

CSP_THROW( ValueError, "Struct '" << s -> meta() -> name() << "' has non-strict inheritance of strict base '" << cur -> name() << "'" );

// rule 2: all local fields are set
std::string missing_fields;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need to account for optional struct fields as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the feedback! Let me know if you spot any other style bugs :)

For this comment, do you mean recursively validating nested strict structs, or checking than an optional field is valid? or am I misunderstanding?

The latter should be implicitly checked by the "Rule 2" loop: to be valid, an optional field should either be given a default or set directly. In either case isSet is true (and false otherwise). All optional fields co-exist in m_fields with the non-optional fields.

The former is a point I wanted to clarify: earlier, I removed recursive validation mostly for performance reasons, assuming any field of strict struct type that is set must've been previously validated (after its creation via meta -> create()).

If created via the Python interface, this should always be the case via tp_init.
From the C++ side, it's enforced manually - as I have it now, all instances of -> create() are followed by a validate(), including in adapters.

If there's future/existing functionality that automatically creates these nested structs w/o immediate validation, recursively validating may be a good move? Maybe there's some other common scenario I'm missing?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to be valid, an optional field should either be given a default or set directly"

I guess this is the point I am confused on. If I specify a field as Optional[T] with no default, what happens if I leave it unset? Based on Dennis' comment, it will be defaulted to None and then I agree with you, it will be "set" in C++. Then there's no difference between Optional[T] and Optional[T] = None, which I think is fine...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see based on your tests that using Optional[T] with no default is equivalent to making the field required. This is confusing to me - what if we just don't allow annotating a field as optional with no default? I don't see a good reason why someone would intentionally do this.

Copy link
Author

@sim15 sim15 Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you're referring to. Requiring non-defaulted optional fields was my interpretation of the following comment from Rob/the internal discussion notes:

"When opting in, all fields that are not defaulted will be required to be passed into the struct init method. This includes Optional fields that arent defaulted to None."

Was something else meant by this? I thought this was a point of deviation from Dennis' original comment.

In terms of semantics, maybe Dennis' "default optional non-defaulted fields to None" approach is probably more appropriate then? rather than disallowing it

Copy link
Collaborator

@AdamGlustein AdamGlustein Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semantically, both interpretations of a struct field annotated Optional[T] with no default seem weird to me:

  1. If we default the value to None, that differs from how Python itself handles an Optional annotation. The user should just provide the default value in that case.
  2. If we don't default the value and make it a "strict" field, then it is confusing that they annotated it as Optional[T] and not T. This seems to me that they made a mistake by forgetting the = None part.

So my vote would be to not allow annotating an Optional struct field without a default - @robambalu @p72dennisxu

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there is a third interpretation here which makes more sense for Optional[T], which is that it's a field that "can be" None but will not be set as None by default. That makes sense to me and aligns with how Python uses the annotation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I think on the Python side we can probably just treat it the same as the strict fields, so the user may either provide a default value or set it every time they create a new instance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Lmk if the this captures these semantics as expected. If we stick with this route, seems like the per-field isOptional bit is not even necessary then, which is a plus. Strictness would enforce that all fields are set after initialization is complete.

Copy link
Collaborator

@robambalu robambalu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment, it looks like we are missing updates to autogen to also do validation when autogen-ed structs are created.
Another option is not to have validate calls in every adapter at all and instead specialize the pushTick call on struct types and have it call validate

@@ -113,6 +113,8 @@ void KafkaInputAdapter::processMessage( RdKafka::Message* message, bool live, cs
if( m_tickTimestampField )
msgTime = m_tickTimestampField->value<DateTime>(tick.get());

tick.get() -> validate();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to be careful on all of the realtime adapters that are now validating. Keep in mind this is happening on a separate thread and an uncaught exception will crash the process.
Just need to make sure all of these validation calls are already under existing try/catch blocks ( some of them may already be )

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some new changes that hopefully deal with this, though a sanity check is definitely needed. Left some corresponding comments as well.

@sim15 sim15 force-pushed the feature/536-unsetsemantics branch from 6f7864e to 1c9aa67 Compare August 13, 2025 19:12
@sim15 sim15 force-pushed the feature/536-unsetsemantics branch from 1c9aa67 to 2af2edf Compare August 13, 2025 19:18
@timkpaine timkpaine added the type: feature Issues and PRs related to new features label Aug 14, 2025
Simon Beyzerov added 4 commits August 14, 2025 16:44
@@ -113,6 +113,10 @@ void KafkaInputAdapter::processMessage( RdKafka::Message* message, bool live, cs
if( m_tickTimestampField )
msgTime = m_tickTimestampField->value<DateTime>(tick.get());

if (!tick.get() -> validate())
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

processMessage is called in KafkaSubscriber.cpp and in line 37 should catch this exception, I think

@@ -520,6 +520,9 @@ void ParquetStructAdapter::dispatchValue( const utils::Symbol *symbol, bool isNu
{
fieldSetter( s );
}

CSP_TRUE_OR_THROW_RUNTIME( s -> validate(), "Struct validation failed for Parquet message, some fields are missing" );
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be double-checked to make sure things are being caught properly. From the call graph of dispatchValue it seems like its callers are StructColumnAdapter::readCurValue() $\rightarrow$ SingleTableParquetReader::readNextRow() $\rightarrow$ SingleTableParquetReader::start() or SingleTableParquetReader::skipRow() or MultipleFileParquetReader::dispatchRow, which eventually seem to end up in calls that have some runtime throw of this same type (e.g., ParquetInputAdapterManager::processNextSimTimeSlice). So seems like a runtime error would be caught OK?

All the way at AdapterManager::start level there's also a ValueErr catch.

@@ -145,6 +145,9 @@ StructPtr JSONMessageStructConverter::convertJSON( const char * fieldname, const
} );
}

if( !struct_ -> validate() )
Copy link
Author

@sim15 sim15 Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This propagates to the processMessage in this file via asStruct, and then to the websocket ClientAdapterManager, which we add a new try/catch block.

@@ -52,8 +52,15 @@ void ClientAdapterManager::start( DateTime starttime, DateTime endtime )
if( m_inputAdapter ) {
m_endpoint -> setOnMessage(
[ this ]( void* c, size_t t ) {
PushBatch batch( m_engine -> rootEngine() );
m_inputAdapter -> processMessage( c, t, &batch );
try
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super familiar with the most natural way of handling this, so this should be double-checked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature Issues and PRs related to new features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants