I would like to run tasks in isolated environments, and I am currently experimenting with `@task.virtualenv`. I am using a standalone test instance. The example DAG is there (the code is public).
To sum up, the code I want to run inside Airflow's `@task.virtualenv` tasks can already output nicely formatted logs using the `logging` library. What would be the recommended way to make those logs live nicely alongside Airflow's logging system?
What I tried is simply using `sys.stdout` / `sys.stderr` stream handlers, so that the stream is picked up by the `@task.virtualenv` operator's logger. But doing so prefixes all the output with Airflow's own logging formatting, e.g.:
```
[2024-08-30T09:39:08.618+0000] {process_utils.py:191} INFO - **INFO** 2024-08-30T09:39:08.618124+00:00 @ synaptix.harvester: Initializing
[2024-08-30T09:39:08.653+0000] {process_utils.py:191} INFO - **INFO** 2024-08-30T09:39:08.652580+00:00 @ synaptix.harvester: FromHalXmlTeiOverHttp harvesting progress: 0 data points harvested [00:00, ? data points harvested/s]
[2024-08-30T09:39:10.490+0000] {process_utils.py:191} INFO - **INFO** 2024-08-30T09:39:10.490159+00:00 @ synaptix.harvester: FromHalXmlTeiOverHttp harvesting progress: 1737 data points harvested [00:01, 945.44 data points harvested/s]
```
It is not that bad, but the `[2024-08-30T09:39:08.618+0000] {process_utils.py:191} INFO` part that prefixes the captured output is mostly noise:

- `process_utils.py:191` refers to a file in Airflow's internals, not my own files; I would like to filter it out.
- `[2024-08-30T09:39:08.618+0000]` is nice, but my library's own logger can already display the date.
- the displayed level `INFO` is not synchronized with the level of my own logger, so were it to display `**DEBUG**`, Airflow's prefix would still say `INFO`.

And it seems to be the same in the log files written to disk.
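For reference, here is a minimal sketch of the stream-handler setup described above, using only the stdlib. The logger name `synaptix.harvester` comes from the captured logs; the format string is an assumption that merely reproduces the `**LEVEL** timestamp @ name: message` shape seen there:

```python
import logging
import sys

# Configure the library's own logger to write fully formatted records
# to stdout, so that the @task.virtualenv operator captures the stream.
logger = logging.getLogger("synaptix.harvester")
logger.setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
# Assumed formatter reproducing the "**LEVEL** timestamp @ name: message"
# shape visible in the captured output above.
handler.setFormatter(logging.Formatter(
    "**%(levelname)s** %(asctime)s @ %(name)s: %(message)s"
))
logger.addHandler(handler)

logger.info("Initializing")
# Airflow then re-wraps each captured stdout line with its own prefix:
# [2024-08-30T09:39:08.618+0000] {process_utils.py:191} INFO - **INFO** ... @ synaptix.harvester: Initializing
```

This double formatting is exactly what produces the redundant timestamps and the unsynchronized level in the prefix.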
Also, I would like to configure the logging locally, from within the DAG file, so that I don't need to apply instance-wide configs, in case I work with Airflow instances that I'm not an admin of.
It is not possible to configure Airflow's loggers from within the task, since it runs in a detached process where Airflow is not even installed (`expect_airflow=False`). I also tried to configure them beforehand in a simple task (with the regular Python operator); although it runs fine with the `airflow dags test` command, it crashes when triggered from the UI, without saying why.
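One workaround consistent with those constraints (no Airflow importable in the virtualenv process) is to configure logging entirely inside the task function body with the stdlib only, and let Airflow capture stdout as plain lines. A hedged sketch, where the helper name, format string, and logger name are all illustrative:

```python
import logging
import sys

def configure_task_logging(level=logging.INFO):
    """Stdlib-only logging setup, meant to be called from inside a
    @task.virtualenv function body (Airflow itself is not importable
    there when expect_airflow=False)."""
    root = logging.getLogger()
    root.setLevel(level)
    handler = logging.StreamHandler(sys.stdout)
    # Deliberately omit the date from our own format, since Airflow's
    # capture already prefixes every line with a timestamp.
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    root.handlers = [handler]
    return root

configure_task_logging()
logging.getLogger("synaptix.harvester").info("Initializing")
```

This does not remove Airflow's own prefix, but at least avoids duplicating the timestamp inside it.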
So I am looking for help to sort out whether my issues (which mostly boil down to: "is there a simple way to merge raw logs into Airflow's logs?") come from me being non-idiomatic, or are feature requests, or bug reports. Maybe I should just leave stdout capture to Airflow and use my own logging system independently?