From my experience, I would love everything to be consistent with some norms we choose. However, "dropping" migration in particular is very hard for existing event consumers, especially data observability tool providers that consume events generated by their clients' Airflow instances, for example Atlan and Astronomer. That said, we'll have a good opportunity in the next 6-12 months to make more breaking changes with Airflow 3, which would necessarily require fairly large changes to the structure of the provider anyway. It would probably necessitate a "major" release of the provider. I would recommend:
The OpenLineage v1.9.0 integration for Airflow produces the following facets:
DAG run:
Task run:
There are a few drawbacks with this approach:
`dag_id`, `task_id`, `description`, `owner` are present only in the run facet, not the job facet, although this information is not related to a particular DAG/Task run and describes the DAG/Task itself.
While implementing #40854, I also ran into issues with the `"airflow"` key of all these facets. For example, adding a DAG run facet leads to sending events like:
So the OpenLineage events sent by Airflow for a DAG run and a Task run both have a facet with the same key, `"airflow"`, but with quite different content. This is not very convenient to handle on the OpenLineage backend side.
My proposal is:
- Rename `AirflowRunFacet` -> `AirflowTaskRunFacet`
- Rename `AirflowJobFacet` -> `AirflowDagInfoJobFacet`
- Rename `AirflowStateRunFacet` -> `AirflowDagStateRunFacet`
- [...] `"airflow"`, which may lead to conflicts.
- Move the `dag` and `task` fields from the run facet to the job facet.
- Remove the `taskUuid` field, as it just copies `runId` and has no meaning for Airflow.
This is how the result could look:
Option 1 - nested facets
DAG run:
Task run:
Option 2 - flat facets
DAG run:
Task run:
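To make the shared `"airflow"` key problem described earlier more concrete, here is a minimal Python sketch of what a backend consumer has to do in each case. It is an illustration under assumptions, not the provider's actual output: the event payloads are heavily trimmed and their field contents are invented, and the distinct key names `airflowDagRun` and `airflowTaskRun` are hypothetical placeholders chosen only to show the dispatch pattern.

```python
from typing import Any

Event = dict[str, Any]

# Hypothetical, heavily trimmed events. Field contents are illustrative only
# and are not the provider's actual output.
dag_run_event: Event = {
    "run": {
        "runId": "run-1",
        "facets": {"airflow": {"dag": {"dag_id": "etl"}, "dagRun": {"run_type": "manual"}}},
    },
}
task_run_event: Event = {
    "run": {
        "runId": "run-2",
        "facets": {"airflow": {"dag": {"dag_id": "etl"}, "task": {"task_id": "load"}}},
    },
}


def classify_by_content(event: Event) -> str:
    """Shared "airflow" key: the consumer has to guess the kind from the body."""
    facet = event["run"]["facets"].get("airflow", {})
    if "task" in facet:
        return "task_run"
    if "dagRun" in facet:
        return "dag_run"
    return "unknown"


# Hypothetical distinct keys, used here only to show the dispatch pattern.
KIND_BY_KEY = {"airflowDagRun": "dag_run", "airflowTaskRun": "task_run"}


def classify_by_key(event: Event) -> str:
    """Distinct keys: the key alone identifies which schema the facet follows."""
    for key, kind in KIND_BY_KEY.items():
        if key in event["run"]["facets"]:
            return kind
    return "unknown"


def rename_key(event: Event, new_key: str) -> Event:
    """Return a copy of the event with the "airflow" run facet stored under new_key."""
    facets = dict(event["run"]["facets"])
    facets[new_key] = facets.pop("airflow")
    return {**event, "run": {**event["run"], "facets": facets}}
```

With a shared key, `classify_by_content` depends on which fields happen to be present in the facet body, so it can break whenever a field is added or moved. With distinct keys, `classify_by_key` needs only the key, and each key can be validated against a single schema.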
The concern is: