-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAG contrôle sanitaire de l'eau #380
Conversation
Checks whether any resource of the specified dataset has been update more recently | ||
than the specified resource | ||
""" | ||
prefix = "demo" if on_demo else "www" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it make sense to use the DATAGOUV_URL
variable set up at the top of the datagouv.py
file instead of recalculating here?
From my understanding a resource_id
is usually not the same between demo
and prod
and we would want to avoid targeting prod in dev no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that we never (so far) want to target demo in prod, but we often want to target prod from DAGs even in the dev process (to get fresh data). This syntax (not the cleanest I admit) allows to choose which data.gouv env you want to target, no matter which Airflow env you come from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! By default we target prod
here. Could we do it the other way around to be safe?
Like by default we use AIRFLOW_ENV
to check if we target demo or not. But with an option to force the use of prod even in dev?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say it's fair to target prod by default here, because we usually want to refresh data depending on the freshest platform (demo is far behind) (and tbh in dev mode I usually just add a return True
in this task to trigger the following ones, where the processing really happens)
Co-authored-by: Hadrien Bossard <[email protected]>
Also created an util function to check if a dataset has been updated more recently than a specific resource (a syntax used in this DAG and two others) and refactored accordingly