feat: schema validation #663
This change is large. To help with review, it is split into two halves:
- The half that, given a model or a model instance, can validate it
- The half that, given a DataFusion record batch of events, calls the validation code from the first half
See pipeline/src/aggregator/validation for the first half and src/pipeline/aggregator/mod.rs for the second half.
            .collect()
            .await?;
        Ok(concat_batches(&schemas::event_states(), &ordered_events)?)
    }
This method is the heart of the change. The aggregator flow now includes four explicit steps:
- Join events with previous
- Apply patch/updates based on stream type specific rules
- Validate newly updated events
- Store events with their validation status
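The four steps above can be sketched in plain Rust. This is a minimal illustration only, not the PR's implementation: the real code operates on DataFusion record batches, whereas here `Event`, `EventState`, `validate`, `process`, and the in-memory `HashMap` store are all hypothetical stand-ins, and the "patch" is reduced to an integer delta.

```rust
use std::collections::HashMap;

/// Hypothetical incoming event: a toy "patch" (an integer delta) for a stream.
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub stream_id: String,
    pub patch: i64,
}

/// Hypothetical stored state, kept together with its validation status.
#[derive(Debug, Clone, PartialEq)]
pub struct EventState {
    pub stream_id: String,
    pub state: i64,
    pub validation_error: Option<String>,
}

/// Toy validation rule (an assumption for this sketch): state must be non-negative.
fn validate(state: i64) -> Result<(), String> {
    if state >= 0 {
        Ok(())
    } else {
        Err(format!("state {state} is negative"))
    }
}

/// Runs the four steps for each event, in order.
pub fn process(events: &[Event], store: &mut HashMap<String, EventState>) -> Vec<EventState> {
    let mut out = Vec::with_capacity(events.len());
    for event in events {
        // 1. Join the event with the previous state of its stream (0 if none).
        let previous = store.get(&event.stream_id).map(|s| s.state).unwrap_or(0);
        // 2. Apply the patch/update (here: a simple addition).
        let state = previous + event.patch;
        // 3. Validate the newly updated state.
        let validation_error = validate(state).err();
        // 4. Store the state together with its validation status.
        let new_state = EventState {
            stream_id: event.stream_id.clone(),
            state,
            validation_error,
        };
        store.insert(event.stream_id.clone(), new_state.clone());
        out.push(new_state);
    }
    out
}
```

In this sketch an invalid state is still stored, with its error attached, rather than dropped; how the real aggregator treats later events that build on an invalid state is not covered here.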
/// Helpers to validate interface details.
impl InterfaceUtil {
    /// TODO: this is basically unimplemented, but https://github.com/getsentry/json-schema-diff looks super promising!
This file has a few TODOs, and its logic is currently unused. Validating that models implement interfaces will be a follow-up change, as this one is large enough already. This scaffold code simply shows where it will fit in.
With this change, models and model instance documents are validated. The validation rules are largely the same as in ComposeDB, with a few differences:

* Models can now be updated (more on this below).
* Relations are only partially validated: we ensure the field is a valid stream id of the correct type and nothing more. Applications building on Ceramic can extend this validation to load the stream.

Model streams now allow data events that update the model. These are the rules for updating a model:

* Only the name, description, implements and schema fields of a model may be modified. All other fields are immutable.
* The schema must not make a breaking change.
* An interface may not be updated.

A model instance document stream now supports a new header value `modelVersion` that defines which version of a model to use when validating the document. The `modelVersion` must be set to the CID (not the stream id) of the event within the model stream that corresponds to the version of the model used to validate the instance. If not set, the `modelVersion` is defined to be the CID of the init event of the model, i.e. the CID part of the model stream id. Note that the model version is mutable, so a stream that initially validated against one version of a model may change to validate against a newer version. This change is explicit and required: an instance moves to a newer version only when its header is updated.

The result of these rules is that models can publish new backwards-compatible versions. Model instances can explicitly update to those new versions by updating both their document and the model version header, which ensures that the new document structure is correctly validated. Existing applications that take no action will see both old and new model instance documents; however, because the schema changes are backwards compatible, those applications will see no ill effects. Applications that update to handle the new schema can do so while still handling the many model instances that use the old schema.

Additionally, because the controller of a model instance must opt in to a new version of a model, a model publisher cannot invalidate existing model instances by making changes to the model.
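The update rules and the `modelVersion` default described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code in this PR: the `Model` struct, its `is_interface` and `account_relation` fields, and both function names are hypothetical, and the breaking-change check is passed in as a closure because the text does not say how it is computed.

```rust
/// Hypothetical, simplified model definition.
/// `account_relation` stands in for every other (immutable) field.
#[derive(Debug, Clone, PartialEq)]
pub struct Model {
    pub name: String,
    pub description: Option<String>,
    pub implements: Vec<String>,
    pub schema: String,
    pub is_interface: bool,
    pub account_relation: String,
}

/// Checks a proposed model update against the three rules.
/// `is_breaking(old_schema, new_schema)` is supplied by the caller (an assumption).
pub fn validate_model_update(
    old: &Model,
    new: &Model,
    is_breaking: impl Fn(&str, &str) -> bool,
) -> Result<(), String> {
    // Rule: an interface may not be updated.
    if old.is_interface {
        return Err("an interface may not be updated".to_string());
    }
    // Rule: only name, description, implements and schema may change.
    if old.is_interface != new.is_interface || old.account_relation != new.account_relation {
        return Err("only name, description, implements and schema may be modified".to_string());
    }
    // Rule: the schema must not make a breaking change.
    if is_breaking(&old.schema, &new.schema) {
        return Err("schema update is a breaking change".to_string());
    }
    Ok(())
}

/// Picks the model version CID used to validate an instance:
/// the `modelVersion` header if present, otherwise the CID of the model's init event.
pub fn resolve_model_version<'a>(
    header_model_version: Option<&'a str>,
    model_init_cid: &'a str,
) -> &'a str {
    header_model_version.unwrap_or(model_init_cid)
}
```

Keeping the fallback in one small function makes the opt-in property easy to see: unless an instance sets the header, it keeps validating against the init version of the model.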
Prior to this change, it was assumed that the index column was a global ordering value for all tables in the pipeline. However, that cannot be true, because event_states have stricter ordering constraints than conclusion events. This change therefore breaks that pattern and introduces explicit order columns, each with a defined meaning. This is prework for correctly buffering and reordering event states according to cross-stream dependencies.