Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAG contrôle sanitaire de l'eau #380

Merged
merged 11 commits into from
Jan 15, 2025
Merged

DAG contrôle sanitaire de l'eau #380

merged 11 commits into from
Jan 15, 2025

Conversation

Pierlou
Copy link
Contributor

@Pierlou Pierlou commented Jan 7, 2025

Also created an util function to check if a dataset has been updated more recently than a specific resource (a syntax used in this DAG and two others) and refactored accordingly

@Pierlou Pierlou requested a review from hacherix January 10, 2025 09:37
utils/datagouv.py Outdated Show resolved Hide resolved
utils/datagouv.py Show resolved Hide resolved
Checks whether any resource of the specified dataset has been update more recently
than the specified resource
"""
prefix = "demo" if on_demo else "www"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could it make sense to use the DATAGOUV_URL variable set up at the top of the datagouv.py file instead of recalculating here?

From my understanding a resource_id is usually not the same between demo and prod and we would want to avoid targeting prod in dev no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that we never (so far) want to target demo in prod, but we often want to target prod from DAGs even in the dev process (to get fresh data). This syntax (not the cleanest I admit) allows to choose which data.gouv env you want to target, no matter which Airflow env you come from.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! By default we target prod here. Could we do it the other way around to be safe?
Like by default we use AIRFLOW_ENV to check if we target demo or not. But with an option to force the use of prod even in dev?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say it's fair to target prod by default here, because we usually want to refresh data depending on the freshest platform (demo is far behind) (and tbh in dev mode I usually just add a return True in this task to trigger the following ones, where the processing really happens)

Co-authored-by: Hadrien Bossard <[email protected]>
@Pierlou Pierlou merged commit 6d8fe24 into main Jan 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants