Workflow operators for Batch scenarios #925

Draft: wants to merge 1 commit into main

krisnaru
Contributor

Summary

This PR introduces a custom Airflow operator (ChrononSparkSubmitOperator) and a Dagster asset (chronon_spark_job) to model batch workflow execution for Apache Chronon. These implementations allow validation, serialization, and submission of Chronon Thrift objects as Spark jobs.

Airflow: Extends SparkSubmitOperator to take a Thrift object as input, validate it, serialize it to JSON, and pass it to Spark.
Dagster: Implements a Dagster asset that performs the same workflow using SparkSubmitTaskDefinition.
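
The validate, serialize, and submit flow described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code in this PR: `GroupByConf` is a hypothetical dataclass standing in for a Chronon Thrift object, the validation rules are invented for the example, and the `--mode` / `--conf-json` flag names are assumptions rather than real Chronon CLI flags.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class GroupByConf:
    """Hypothetical stand-in for a Chronon Thrift object (not the real class)."""
    name: str
    keys: List[str] = field(default_factory=list)
    sources: List[Dict[str, Any]] = field(default_factory=list)


def validate_conf(conf: GroupByConf) -> None:
    """Raise ValueError if the config lacks fields this sketch requires."""
    if not conf.name:
        raise ValueError("conf.name must be non-empty")
    if not conf.keys:
        raise ValueError("conf.keys must list at least one key column")
    if not conf.sources:
        raise ValueError("conf.sources must list at least one source")


def serialize_conf(conf: GroupByConf) -> str:
    """Serialize the config to JSON with sorted keys for stable output."""
    return json.dumps(asdict(conf), sort_keys=True)


def build_application_args(conf: GroupByConf, mode: str = "backfill") -> List[str]:
    """Validate first, then return the argument list to hand to the Spark job.

    The flag names are assumptions made for this sketch.
    """
    validate_conf(conf)
    return ["--mode", mode, "--conf-json", serialize_conf(conf)]
```

In an Airflow operator built on SparkSubmitOperator, a list like this could plausibly be supplied through the operator's `application_args` parameter, so that an invalid config fails before any Spark job is launched. How the PR's operator actually wires this up is not shown in the description.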

Why / Goal

This PR unlocks seamless batch processing for Chronon transformations (e.g., GroupBy, Join, Staging). The goal is to:
✅ Standardize job submission across orchestration platforms (Airflow & Dagster).
✅ Ensure validation of Chronon configs before execution.
✅ Automate serialization of Thrift objects into JSON.

Impact

Enables scalable & reproducible data workflows for batch feature generation.
Reduces manual handling of Thrift objects when integrating Chronon with Spark.
Provides a reusable operator for teams working with Apache Chronon.

Test Plan

Unit Tests: Added tests for validation and serialization of Thrift objects.
CI Coverage: Ensured that existing tests pass with the new operators.
Integration Testing:

  • Airflow DAG tested with GroupBy and Join Thrift objects.
  • Dagster pipeline executed successfully with SparkSubmitTaskDefinition.

Checklist

  • Documentation update
