Workflow operators for Batch scenarios #925
Draft
Summary
This PR introduces a custom Airflow operator (ChrononSparkSubmitOperator) and a Dagster asset (chronon_spark_job) to model batch workflow execution for Apache Chronon. These implementations allow validation, serialization, and submission of Chronon Thrift objects as Spark jobs.
Why / Goal
This PR lets Chronon transformations (e.g., GroupBy, Join, Staging) run as batch jobs submitted from an orchestrator. The goal is to:
✅ Standardize job submission across orchestration platforms (Airflow & Dagster).
✅ Ensure validation of Chronon configs before execution.
✅ Automate serialization of Thrift objects into JSON.
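The three goals above (validate, serialize to JSON, submit to Spark) can be sketched in plain Python, independent of either orchestrator. This is an illustrative sketch only, not the code in this PR: `GroupByConf` is a hypothetical dataclass standing in for a Chronon Thrift object, the validation rules are invented for the example, and the main class, jar path, and `--conf-path` application argument are placeholder assumptions. Only `spark-submit --class <main-class> <jar> [app args]` reflects the standard Spark CLI shape.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class GroupByConf:
    """Hypothetical stand-in for a Chronon Thrift object (not the real class)."""
    name: str
    sources: List[str] = field(default_factory=list)
    keys: List[str] = field(default_factory=list)


def validate(conf: GroupByConf) -> None:
    """Illustrative checks only; the PR's actual validation rules may differ."""
    if not conf.name:
        raise ValueError("conf.name must be non-empty")
    if not conf.sources:
        raise ValueError("conf.sources must list at least one source")


def to_json(conf: GroupByConf) -> str:
    """Serialize the config to JSON with stable key order."""
    return json.dumps(asdict(conf), sort_keys=True)


def build_spark_submit(conf: GroupByConf, conf_path: str,
                       main_class: str, jar: str) -> List[str]:
    """Validate first, then build the argv list for spark-submit.

    '--conf-path' is an assumed application argument, not a Spark flag.
    """
    validate(conf)
    return ["spark-submit", "--class", main_class, jar,
            "--conf-path", conf_path]


if __name__ == "__main__":
    conf = GroupByConf(name="purchases_v1",
                       sources=["db.purchases"], keys=["user_id"])
    print(to_json(conf))
    # Placeholder class and jar names, chosen for illustration.
    print(build_spark_submit(conf, "/tmp/purchases_v1.json",
                             "com.example.BatchDriver", "app.jar"))
```

An Airflow operator or Dagster asset could wrap the same three steps, validating before anything is written or submitted so that an invalid config fails fast inside the orchestrator rather than inside the Spark job.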
Impact
Test Plan
Unit Tests: Added tests for validation and serialization of Thrift objects.
CI Coverage: Ensured that existing tests pass with the new operators.
Integration Testing:
Checklist
Reviewers