Greetings @jedcunningham, any chance you have an update for this answer since #40856 (reply in thread)? :)
-
I'm looking to implement a data pipeline using two DAGs and Airflow's Datasets feature. I'd appreciate guidance on the best way to structure this using Datasets or Dataset aliases.
Current setup:
- `ingest_DAG`:
  - Ingests data into `ingest_table`.
  - Uses a `run_id` column in `ingest_table` to uniquely identify each batch of ingested data.
- `transform_DAG`:
  - Is triggered via the Dataset updated by `ingest_DAG`.
  - Needs the `run_id` to identify the specific batch of data to process.
  - Reads from `ingest_table` using the received `run_id`.
  - Writes the transformed results to `transform_table`.
Current solution:
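Today, `ingest_DAG` attaches the `run_id` to the Dataset event's `extra` field and `transform_DAG` reads it back from the triggering event. A minimal sketch of that approach, assuming Airflow 2.10+ (where `outlet_events` and `triggering_dataset_events` are available in the task context); the dataset URI, schedule, and table logic are placeholders:

```python
import pendulum

from airflow.datasets import Dataset
from airflow.decorators import dag, task

# Illustrative URI; any string that identifies the table works.
ingest_dataset = Dataset("postgres://warehouse/public/ingest_table")


@dag(schedule="@hourly", start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def ingest_DAG():
    @task(outlets=[ingest_dataset])
    def ingest(*, run_id, outlet_events):
        # ... load the batch into ingest_table, stamping each row with run_id ...
        # Attach the run_id to the dataset event so the downstream DAG can read it.
        outlet_events[ingest_dataset].extra = {"run_id": run_id}

    ingest()


@dag(schedule=[ingest_dataset], start_date=pendulum.datetime(2024, 1, 1), catchup=False)
def transform_DAG():
    @task
    def transform(*, triggering_dataset_events):
        # One entry per dataset that triggered this run; each holds its events.
        for _uri, events in triggering_dataset_events.items():
            for event in events:
                upstream_run_id = event.extra["run_id"]
                # ... process only ingest_table rows with this run_id ...
                print(f"transforming batch {upstream_run_id}")

    transform()


ingest_DAG()
transform_DAG()
```

(The two DAGs actually live in separate files; they are shown together here only to make the handoff visible.)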
Questions:
1. What is the best way to pass the `run_id` from `ingest_DAG` to `transform_DAG` using Datasets? Currently I pass the `run_id` in the Dataset's `extra` field. Is this the recommended way to pass metadata between DAGs, or are there better alternatives?
2. In `transform_DAG`, how can I ensure that only the data corresponding to the received `run_id` is processed?
3. How should I handle reruns of `transform_DAG`, particularly if it needs to be rerun for the same `run_id`? (A sketch of the pattern I'm considering follows this list.)
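For (2) and (3) together, the pattern I'm leaning towards is to scope every statement to the received `run_id` and make the write a delete-then-insert, so a rerun for the same `run_id` simply rebuilds that batch. A sketch, assuming a Postgres warehouse; the connection id and column names are made up:

```python
from airflow.providers.postgres.hooks.postgres import PostgresHook


def transform_batch(upstream_run_id: str) -> None:
    """Idempotent, run_id-scoped transform: rerunning for the same run_id is safe."""
    hook = PostgresHook(postgres_conn_id="warehouse")  # illustrative connection id
    # With autocommit left off, both statements run on one connection and are
    # committed together at the end, so a failed rerun never half-applies.
    hook.run(
        sql=[
            # Drop any rows left by a previous (possibly partial) run of this batch.
            "DELETE FROM transform_table WHERE run_id = %(run_id)s",
            # Rebuild the batch from the ingested rows carrying the same run_id.
            """
            INSERT INTO transform_table (run_id, customer_id, amount_usd)
            SELECT run_id, customer_id, amount_cents / 100.0
            FROM ingest_table
            WHERE run_id = %(run_id)s
            """,
        ],
        parameters={"run_id": upstream_run_id},
    )
```

Keying the delete-then-insert on `run_id` avoids having to track transform state anywhere else, but I'd welcome alternatives.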
Any insights, examples, or recommendations based on my current solution would be greatly appreciated. Thank you!