forked from Netflix/metaflow
-
Notifications
You must be signed in to change notification settings - Fork 2
Valay/airflow foreach base #66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
valayDave
wants to merge
7
commits into
master
Choose a base branch
from
valay/airflow-foeach-base
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Add support for Kubernetes tolerations * Revert get_docker_registry import * Fix KUBERNETES_TOLERATIONS default value * Update toleration example * Remove KUBERNETES_NODE_SELECTOR in kubernetes_job.py * Fix black code style * Add param doc to KubernetesDecorator * Fix typo * Serialize tolerations in runtime_step_cli * Fix KUBERNETES_TOLERATIONS config * Fix node_selector env var in the kubernetes decorator * JSON loads KUBERNETES_TOLERATIONS in kubernetes_decorator init * Parse node_selector and tolerations in the decorator * Update comment * Validate tolerations object in kubernetes_decorator.py * Use hard coded tolerations attribute_map * Fix black code style * String formatting compatible with python 3.5 * Use V1Toleration.attribute_map to validate tolerations * Fix black lint * Improve error handling * Fix black lint
Co-authored-by: Dan <[email protected]>
* Fix CVE-2007-4559 (tar.extractall) See Netflix#1177 for more details * Address comment
- Support for all metaflow construct without foreach and sensors Squashed commit of the following: commit ef8b1e3 Author: Valay Dave <[email protected]> Date: Fri Jul 29 01:06:26 2022 +0000 Removed sernsors and banned foreach's commit 8d517c4 Author: Valay Dave <[email protected]> Date: Fri Jul 29 00:59:01 2022 +0000 commiting k8s related file from master. commit a7e1ecd Author: Valay Dave <[email protected]> Date: Fri Jul 29 00:54:45 2022 +0000 Uncommented code for foreach support with k8s KubernetesPodOperator version 4.2.0 renamed `resources` to `container_resources` - Check : (apache/airflow#24673) / - (apache/airflow@45f4290) This was done because `KubernetesPodOperator` didn't play nice with dynamic task mapping and they had to deprecate the `resources` argument. Hence the below codepath checks for the version of `KubernetesPodOperator` and then sets the argument. If the version < 4.2.0 then we set the argument as `resources`. If it is > 4.2.0 then we set the argument as `container_resources` The `resources` argument of KuberentesPodOperator is going to be deprecated soon in the future. So we will only use it for `KuberentesPodOperator` version < 4.2.0 The `resources` argument will also not work for foreach's. commit 2719f5d Author: Valay Dave <[email protected]> Date: Mon Jul 18 18:31:58 2022 +0000 nit fixes : - fixing comments. - refactor some variable/function names. commit 2079293 Author: Valay Dave <[email protected]> Date: Mon Jul 18 18:14:53 2022 +0000 change `token` to `production_token` commit 14aad5f Author: Valay Dave <[email protected]> Date: Mon Jul 18 18:11:56 2022 +0000 Refactored import Airflow Sensors. commit b1472d5 Author: Valay Dave <[email protected]> Date: Mon Jul 18 18:08:41 2022 +0000 new comment on `startup_timeout_seconds` env var. commit 6d81b75 Author: Valay Dave <[email protected]> Date: Mon Jul 18 18:06:09 2022 +0000 Removing traces of `@airflow_schedule_interval` commit 0673db7 Author: Valay Dave <[email protected]> Date: Thu Jul 14 12:43:08 2022 -0700 Foreach polish (#62) * Removing unused imports * Added validation logic for airflow version numbers with foreaches * Removed `airflow_schedule_interval` decorator. * Added production/deployment token related changes - Uses s3 as a backend to store the production token - Token used for avoiding nameclashes - token stored via `FlowDatastore` * Graph type validation for airflow foreachs - Airflow foreachs only support single node fanout. - validation invalidates graphs with nested foreachs * Added configuration about startup_timeout. * Added final todo on `resources` argument of k8sOp - added a commented code block - it needs to be uncommented when airflow releasese the patch for the op - Code seems feature complete keeping aside airflow patch commit 4b2dd12 Author: Valay Dave <[email protected]> Date: Thu Jul 7 19:33:07 2022 +0000 Removed retries from user-defaults. commit 0e87a97 Author: Valay Dave <[email protected]> Date: Wed Jul 6 16:29:33 2022 +0000 updated pod startup time commit fce2bd2 Author: Valay Dave <[email protected]> Date: Wed Jun 29 18:44:11 2022 +0000 Adding default 1 retry for any airflow worker. commit 5ef6bbc Author: Valay Dave <[email protected]> Date: Mon Jun 27 01:22:42 2022 +0000 Airflow Foreach Integration - Simple one node foreach-join support as gaurenteed by airflow - Fixed env variable setting issue - introduced MetaflowKuberentesOperator - Created a new operator to allow smootness in plumbing xcom values - Some todos commit d319fa9 Author: Valay Dave <[email protected]> Date: Fri Jun 24 21:12:09 2022 +0000 simplifying run-id macro. commit 0ffc813 Author: Valay Dave <[email protected]> Date: Fri Jun 24 11:51:42 2022 -0700 Refactored parameter macro settings. (#60) commit a3a4950 Author: Valay Dave <[email protected]> Date: Fri Jun 24 02:05:57 2022 +0000 added comment on need for `start_date` commit a3147be Author: Valay Dave <[email protected]> Date: Tue Jun 21 06:03:56 2022 +0000 Refactored an `id_creator` method. commit 04d7f20 Author: Valay Dave <[email protected]> Date: Tue Jun 21 05:52:05 2022 +0000 refactor : -`RUN_ID_LEN` to `RUN_HASH_ID_LEN` - `TASK_ID_LEN` to `TASK_ID_HASH_LEN` commit cde4605 Author: Valay Dave <[email protected]> Date: Tue Jun 21 05:48:55 2022 +0000 refactored an error string commit 1145818 Author: Valay Dave <[email protected]> Date: Mon Jun 20 22:42:36 2022 -0700 addressing savins comments. (#59) - Added many adhoc changes based for some comments. - Integrated secrets and `KUBERNETES_SECRETS` - cleaned up parameter setting - cleaned up setting of scheduling interval - renamed `AIRFLOW_TASK_ID_TEMPLATE_VALUE` to `AIRFLOW_TASK_ID` - renamed `AirflowSensorDecorator.compile` to `AirflowSensorDecorator.validate` - Checking if dagfile and flow file are same. - fixing variable names. - checking out `kubernetes_decorator.py` from master (6441ed5) - bug fixing secret setting in airflow. - simplified parameter type parsing logic - refactoring airflow argument parsing code. commit 83b20a7 Author: Valay Dave <[email protected]> Date: Mon Jun 13 14:02:57 2022 -0700 Addressing Final comments. (#57) - Added dag-run timeout. - airflow related scheduling checks in decorator. - Auto naming sensors if no name is provided - Annotations to k8s operators - fix: argument serialization for `DAG` arguments (method names refactored like `to_dict` became `serialize`) - annotation bug fix - setting`workflow-timeout` for only scheduled dags commit 4931f9c Author: Valay Dave <[email protected]> Date: Mon Jun 6 04:50:49 2022 +0000 k8s bug fix commit 200ae8e Author: Valay Dave <[email protected]> Date: Mon Jun 6 04:39:50 2022 +0000 removed un-used function commit 70e285e Author: Valay Dave <[email protected]> Date: Mon Jun 6 04:38:37 2022 +0000 Removed unused `sanitize_label` function commit 84fc622 Author: Valay Dave <[email protected]> Date: Mon Jun 6 04:37:34 2022 +0000 GPU support added + container naming same as argo commit c92280d Author: Valay Dave <[email protected]> Date: Mon Jun 6 04:25:17 2022 +0000 Refactored sensors to different files + bug fix - bug caused due `util.compress_list`. - The function doesn't play nice with strings with variety of characters. - Ensured that exceptions are handled appropriately. - Made new file for each sensor under `airflow.sensors` module. commit b72a1dc Author: Valay Dave <[email protected]> Date: Sat Jun 4 01:41:49 2022 +0000 ran black. commit 558c82f Author: Valay Dave <[email protected]> Date: Fri Jun 3 18:32:48 2022 -0700 Moving information from airflow_utils to compiler (#56) - commenting todos to organize unfinished changes. - some environment variables set via`V1EnvVar` - `client.V1ObjectFieldSelector` mapped env vars were not working in json form - Moving k8s operator import into its own function. - env vars moved. commit 9bb5f63 Author: Valay Dave <[email protected]> Date: Fri Jun 3 18:06:03 2022 +0000 added mising Run-id prefixes to variables. - merged `hash` and `dash_connect` filters. commit 37b5e6a Author: Valay Dave <[email protected]> Date: Fri Jun 3 18:00:22 2022 +0000 nit fix : variable name change. commit 660756f Author: Valay Dave <[email protected]> Date: Fri Jun 3 17:58:34 2022 +0000 nit fixes to dag.py's templating variables. commit 1202f5b Author: Valay Dave <[email protected]> Date: Fri Jun 3 17:56:53 2022 +0000 Fixed defaults passing - Addressed comments for airflow.py commit b9387dd Author: Valay Dave <[email protected]> Date: Fri Jun 3 17:52:24 2022 +0000 Following Changes: - Refactors setting scheduling interval - refactor dag file creating function - refactored is_active to is_paused_upon_creation - removed catchup commit 054e3f3 Author: Valay Dave <[email protected]> Date: Fri Jun 3 17:33:25 2022 +0000 Multiple Changes based on comments: 1. refactored `create_k8s_args` into _to_job 2. Addressed comments for snake casing 3. refactored `attrs` for simplicity. 4. refactored `metaflow_parameters` to `parameters`. 5. Refactored setting of `input_paths` commit d481b2f Author: Valay Dave <[email protected]> Date: Fri Jun 3 16:42:24 2022 +0000 Removed Sensor metadata extraction. commit d8e6ec0 Author: Valay Dave <[email protected]> Date: Fri Jun 3 16:30:34 2022 +0000 porting savin's comments - next changes : addressing comments. commit 3f2353a Merge: d370ffb c1ff469 Author: Valay Dave <[email protected]> Date: Thu Jul 28 23:52:16 2022 +0000 Merge branch 'master' into airflow commit d370ffb Merge: a82f144 e4eb751 Author: Valay Dave <[email protected]> Date: Thu Jul 14 19:38:48 2022 +0000 Merge branch 'master' into airflow commit a82f144 Merge: bdb1f0d 6f097e3 Author: Valay Dave <[email protected]> Date: Wed Jul 13 00:35:49 2022 +0000 Merge branch 'master' into airflow commit bdb1f0d Merge: 8511215 f9a4968 Author: Valay Dave <[email protected]> Date: Wed Jun 29 18:44:51 2022 +0000 Merge branch 'master' into airflow commit 8511215 Author: Valay Dave <[email protected]> Date: Tue Jun 21 02:53:11 2022 +0000 Bug fix from master merge. commit 90c06f1 Merge: 0fb73af 6441ed5 Author: Valay Dave <[email protected]> Date: Mon Jun 20 21:20:20 2022 +0000 Merge branch 'master' into airflow commit 0fb73af Author: Valay Dave <[email protected]> Date: Sat Jun 4 00:53:10 2022 +0000 squashing bugs after changes from master. commit 09c6ba7 Merge: 7bdf662 ffff49b Author: Valay Dave <[email protected]> Date: Sat Jun 4 00:20:38 2022 +0000 Merge branch 'master' into af-mmr commit 7bdf662 Author: Valay Dave <[email protected]> Date: Mon May 16 17:42:38 2022 -0700 Airflow sensor api (#3) * Fixed run-id setting - Change gaurentees that multiple dags triggered at same moment have unique run-id * added allow multiple in `Decorator` class * Airflow sensor integration. >> support added for : - ExternalTaskSensor - S3KeySensor - SqlSensor >> sensors allow multiple decorators >> sensors accept those arguments which are supported by airflow * Added `@airflow_schedule_interval` decorator * Fixing bug run-id related in env variable setting. commit 2604a29 Author: Valay Dave <[email protected]> Date: Thu Apr 21 18:26:59 2022 +0000 Addressed comments. commit 584e88b Author: Valay Dave <[email protected]> Date: Wed Apr 20 03:33:55 2022 +0000 fixed printing bug commit 169ac15 Author: Valay Dave <[email protected]> Date: Wed Apr 20 03:30:59 2022 +0000 Option help bug fix. commit 6f8489b Author: Valay Dave <[email protected]> Date: Wed Apr 20 03:25:54 2022 +0000 variable renamemetaflow_specific_args commit 0c779ab Merge: d299b13 5a61508 Author: Valay Dave <[email protected]> Date: Wed Apr 20 03:23:10 2022 +0000 Merge branch 'airflow-tests' into airflow commit 5a61508 Author: Valay Dave <[email protected]> Date: Wed Apr 20 03:22:54 2022 +0000 Removing un-used code / resolved-todos. commit d030830 Author: Valay Dave <[email protected]> Date: Wed Apr 20 03:06:03 2022 +0000 ran black, commit 2d1fc06 Merge: f2cb319 7921d13 Author: Valay Dave <[email protected]> Date: Wed Apr 20 03:04:19 2022 +0000 Merge branch 'master' into airflow-tests commit d299b13 Merge: f2cb319 7921d13 Author: Valay Dave <[email protected]> Date: Wed Apr 20 03:02:37 2022 +0000 Merge branch 'master' into airflow commit f2cb319 Author: Valay Dave <[email protected]> Date: Wed Apr 20 02:54:03 2022 +0000 reverting change. commit 05b9db9 Author: Valay Dave <[email protected]> Date: Wed Apr 20 02:47:41 2022 +0000 3 changes: - Removing s3 dep - remove uesless import - added `deployed_on` in dag file template commit c6afba9 Author: Valay Dave <[email protected]> Date: Fri Apr 15 22:50:52 2022 +0000 Fixed passing secrets with kubernetes. commit c3ce7e9 Author: Valay Dave <[email protected]> Date: Fri Apr 15 22:04:22 2022 +0000 Refactored code . - removed compute/k8s.py - Moved k8s code to airflow_compiler.py - ran isort to airflow_compiler.py commit d1c343d Author: Valay Dave <[email protected]> Date: Fri Apr 15 18:02:25 2022 +0000 Added validations about: - un-supported decorators - foreach Changed where validations are done to not save the package. commit 7b19f8e Author: Valay Dave <[email protected]> Date: Fri Apr 15 03:34:26 2022 +0000 Fixing mf log related bug - No double logging on metaflow. commit 4d1f6bf Author: Valay Dave <[email protected]> Date: Thu Apr 14 03:10:51 2022 +0000 Removed usless code WRT project decorator. commit 5ad9a39 Author: Valay Dave <[email protected]> Date: Thu Apr 14 03:03:19 2022 +0000 Remove readme. commit 60cb6a7 Author: Valay Dave <[email protected]> Date: Thu Apr 14 03:02:38 2022 +0000 Made file path required arguement. commit 9f0dc1b Author: Valay Dave <[email protected]> Date: Thu Apr 14 03:01:07 2022 +0000 changed `--is-active`->`--is-paused-upon-creation` - dags are active by default. commit 5b98f93 Author: Valay Dave <[email protected]> Date: Thu Apr 14 02:55:46 2022 +0000 shortened length of run-id and task-id hashes. commit e53426e Author: Valay Dave <[email protected]> Date: Thu Apr 14 02:41:32 2022 +0000 Removing un-used args. commit 72cbbfc Author: Valay Dave <[email protected]> Date: Thu Apr 14 02:39:59 2022 +0000 Moved exceptions to airflow compiler commit b2970dd Author: Valay Dave <[email protected]> Date: Thu Apr 14 02:33:02 2022 +0000 Changes based on PR comments: - removed airflow xcom push file , moved to decorator code - removed prefix configuration - nit fixes. commit 9e622ba Author: Valay Dave <[email protected]> Date: Mon Apr 11 20:39:00 2022 +0000 Removing un-used code paths + code cleanup commit 7425f62 Author: Valay Dave <[email protected]> Date: Mon Apr 11 19:45:04 2022 +0000 Fixing bug fix in schedule. commit eb775cb Author: Valay Dave <[email protected]> Date: Sun Apr 10 02:52:59 2022 +0000 Bug fixes WRT Kubernetes secrets + k8s deployments. - Fixing some error messages. - Added some comments. commit 04c92b9 Author: Valay Dave <[email protected]> Date: Sun Apr 10 01:20:53 2022 +0000 Added secrets support. commit 4a0a85d Author: Valay Dave <[email protected]> Date: Sun Apr 10 00:11:46 2022 +0000 Bug fix. commit af91099 Author: Valay Dave <[email protected]> Date: Sun Apr 10 00:03:34 2022 +0000 bug fix. commit c17f04a Author: Valay Dave <[email protected]> Date: Sat Apr 9 23:55:41 2022 +0000 Bug fix in active defaults. commit 0d37236 Author: Valay Dave <[email protected]> Date: Sat Apr 9 23:54:02 2022 +0000 @project, @schedule, default active dag support. - Added a flag to allow setting dag as active on creation - Airflow compatible schedule interval - Project name fixes. commit 5c97b15 Author: Valay Dave <[email protected]> Date: Thu Apr 7 21:15:18 2022 +0000 Max workers and worker pool support. commit 9c973f2 Author: Valay Dave <[email protected]> Date: Thu Apr 7 19:34:33 2022 +0000 Adding exceptions for missing features. commit 2a946e2 Author: Valay Dave <[email protected]> Date: Mon Mar 28 19:34:11 2022 +0000 2 changes : - removed hacky line - added support to directly throw dags in s3. commit e0772ec Author: Valay Dave <[email protected]> Date: Wed Mar 23 22:38:20 2022 +0000 fixing bugs in service account setting commit 874b94a Author: Valay Dave <[email protected]> Date: Sun Mar 20 23:49:15 2022 +0000 Added support for Branching with Airflow - remove `next` function in `AirflowTask` - `AirflowTask`s has no knowledge of next tasks. - removed todos + added some todos - Graph construction on airflow side using graph_structure datastructure. - graph_structure comes from`FlowGraph.output_steps()[1]` commit 8e9f649 Author: Valay Dave <[email protected]> Date: Sun Mar 20 02:33:04 2022 +0000 Added hacky line commit fd5db04 Author: Valay Dave <[email protected]> Date: Sun Mar 20 02:06:38 2022 +0000 Removed hacky line. commit 5b23eb7 Author: Valay Dave <[email protected]> Date: Sun Mar 20 01:44:57 2022 +0000 Added support for Parameters. - Supporting int, str, bool, float, JSONType commit c9378e9 Author: Valay Dave <[email protected]> Date: Sun Mar 20 00:14:10 2022 +0000 Removed todos + added some validation logic. commit 7250a44 Author: Valay Dave <[email protected]> Date: Sat Mar 19 23:45:15 2022 +0000 Fixing logs related change from master. commit d125978 Merge: 8cdac53 7e210a2 Author: Valay Dave <[email protected]> Date: Sat Mar 19 23:42:48 2022 +0000 Merge branch 'master' into aft-mm commit 8cdac53 Author: Valay Dave <[email protected]> Date: Sat Mar 19 23:36:47 2022 +0000 making changes sync with master commit 5a93d9f Author: Valay Dave <[email protected]> Date: Sat Mar 19 23:29:47 2022 +0000 Fixed bug when using catch + retry commit 62bc8df Author: Valay Dave <[email protected]> Date: Sat Mar 19 22:58:37 2022 +0000 Changed retry setting. commit 563a200 Author: Valay Dave <[email protected]> Date: Sat Mar 19 22:42:57 2022 +0000 Fixed setting `task_id` : - switch task-id from airflow job is to hash to "runid/stepname" - refactor xcom setting variables - added comments commit e2a1e50 Author: Valay Dave <[email protected]> Date: Sat Mar 19 17:51:59 2022 +0000 setting retry logic. commit a697b56 Author: Valay Dave <[email protected]> Date: Thu Mar 17 01:02:11 2022 +0000 Nit fix. commit 68f13be Author: Valay Dave <[email protected]> Date: Wed Mar 16 20:46:19 2022 +0000 Added @schedule support + readme commit 57bdde5 Author: Valay Dave <[email protected]> Date: Tue Mar 15 19:47:06 2022 +0000 Fixed setting run-id / task-id to labels in k8s - Fixed setting run-id has from cli macro - added hashing macro to ensure that jinja template set the correct run-id to k8s labels - commit 3d6c319 Author: Valay Dave <[email protected]> Date: Tue Mar 15 05:39:04 2022 +0000 Got linear workflows working on airflow. - Still not feature complete as lots of args are still unfilled / lots of unknows - minor tweek in eks to ensure airflow is k8s compatible. - passing state around via xcom-push - HACK : AWS keys are passed in a shady way. : Reverse this soon. commit db074b8 Author: Valay Dave <[email protected]> Date: Fri Mar 11 12:34:33 2022 -0800 Tweeks commit a9f0468 Author: Valay Dave <[email protected]> Date: Tue Mar 1 17:14:47 2022 -0800 some changes based on savin's comments. - Added changes to task datastore for different reason : (todo) Decouple these - Added comments to SFN for reference. - Airflow DAG is no longer dependent on metaflow commit f32d089 Author: Valay Dave <[email protected]> Date: Wed Feb 23 00:54:17 2022 -0800 First version of dynamic dag compiler. - Not completely finished code - Creates generic .py file a JSON that is parsed to create Airflow DAG. - Currently only boiler plate to make a linear dag but doesn't execute anything. - Unfinished code. commit d2def66 Author: Valay Dave <[email protected]> Date: Sat Feb 19 14:01:47 2022 -0800 more tweeks. commit b176311 Author: Valay Dave <[email protected]> Date: Thu Feb 17 09:04:29 2022 -0800 commit 0 - unfinished code.
1382e4a
to
25c0928
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.