Replies: 2 comments
-
No, mainly because no one has contributed it. But since you need it and would like to use it, I would appreciate it if you and your company contributed such a feature, similarly to the nearly 3000 contributors so far, many of whom went the same route: they needed something and contributed it.

There are certain limitations when you want to change the interpreter on the fly, mainly because you need to serialize and deserialize input parameters and output values to pass them from one interpreter to another (this is what the PythonVirtualenv and ExternalPython operators both do). So not all Python code will run, because not all arguments and return values can be serialized.

But you also have plenty of other options:
- You can use `default_args` at the DAG level to specify the virtualenv each of your ExternalPythonOperators will use.
- You can build your own custom operators extending the ExternalPython operator (separately for each team).
- You can have each team use different queues, as described in the best practices (see the last point in the list: https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#handling-conflicting-complex-python-dependencies), so the teams are separated at the deployment level. You can even use cluster policies to force the queue.
- Finally, in Airflow 3 there is a multi-team feature planned (https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-67+Multi-team+deployment+of+Airflow+components) that you will be able to utilise to separate your teams' workloads.

In the meantime, if you have an idea of how specifying a different virtual environment for a whole DAG to use for all its tasks should look, you are absolutely welcome to contribute such a feature. You seem to have quite a big team, so it might be easy for you to spare some engineering power to do so.
You are absolutely welcome to contribute it. That would be a nice way for your company to give back for the software you got absolutely for free, and this is how it works for others who contributed. Can we count on you?
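The serialization limitation mentioned above can be illustrated with plain `pickle`; this is a hypothetical stand-alone helper, not Airflow code. Any argument or return value crossing the interpreter boundary must survive a serialize/deserialize round trip, which is why, for example, a lambda cannot be a task argument.

```python
# Hypothetical illustration of the serialization constraint, not Airflow code:
# values passed between interpreters must survive a pickle round trip.
import pickle


def can_cross_interpreters(value) -> bool:
    """Return True if `value` survives a pickle round trip."""
    try:
        pickle.loads(pickle.dumps(value))
        return True
    except Exception:
        return False
```

A plain dict of numbers passes this check, while a lambda or an open file handle does not; a task whose inputs fail it will fail at runtime regardless of which virtualenv it targets.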
-
What I have done in the past is to create a custom operator extending the BashOperator, using the `tags` value that we provided in the DAG definition/configuration. Depending on the value of the tag, it would source the correct virtual environment before running the original bash command. The tag value could be each team/area that needs its own virtual env, starting from `tags = context['dag'].tags` and then branching on `if tags:`.
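A minimal sketch of that tag dispatch, with hypothetical team names and virtualenv paths (in a real operator you would call something like this from `execute()`, feeding it `context['dag'].tags` and the original `bash_command`):

```python
# Hypothetical sketch of the tag-based virtualenv dispatch described above.
# Team names and virtualenv paths are assumptions, not from the original post.
TEAM_VENVS = {
    "team-data": "/opt/venvs/team_data",
    "team-ml": "/opt/venvs/team_ml",
}


def wrap_bash_command(bash_command: str, tags: list[str]) -> str:
    """Prefix the command so it runs inside the first matching team's
    virtualenv; return it unchanged when no tag matches."""
    for tag in tags:
        venv = TEAM_VENVS.get(tag)
        if venv is not None:
            return f"source {venv}/bin/activate && {bash_command}"
    return bash_command
```

For example, `wrap_bash_command("python job.py", ["team-ml"])` prefixes the command with the team-ml activation, while an untagged DAG's command passes through untouched.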
-
Separate virtual environments for each DAG
Dear community,
Is there a way to declare the virtual environment (one that is already created) that a DAG will use for all of its tasks?
I am aware of the ExternalPythonOperator; however, it has to be set on each task, which is rather tedious to write for every task when it could be done only once for the whole DAG. Do we have some way to do that?
If not, could you please explain why this isn't possible? Is it because each task is run as a separate process by the worker?
I would appreciate a detailed answer if the answer is 'no'.
This need arises when two or more teams/developers are working to create DAGs. They may use different versions of packages across a whole DAG. Thus, declaring that a DAG will use a given environment for all of its tasks would be a nice feature to have.
Thanks in advance.