Hi all, I am writing this discussion to ask the Airflow community for some help. We are currently seeing some weird behavior with the git-sync containers on the scheduler and triggerer pods, and I would like to get some insight from people on whether they think this is an Airflow issue or something else.

To start, we are running Airflow 2.5.1 on Kubernetes with the official Airflow Helm Chart 1.8.0 and an external Postgres backend database. We also use a git-sync sidecar to pull our DAGs, and to let Airflow access this repository we updated the _helpers.yaml file and the values file to put in the username and password.

What is happening is that at the exact same time every day, for some reason, the git-sync containers for the scheduler and triggerer go crazy and start restarting out of nowhere:

Looking at the logs from the git-sync containers, this is what they show:

At first I thought it might be the DAGs, so I ran all of them in each environment at once to see if that would cause it. Nothing happened, and there were no restarts. Then I changed the git-sync wait time in our dev environment from 5 seconds to 15 seconds and checked it the next morning. That reduced the restarts, but 2 still happened in the same time frame, 7:40 UTC:

So my question is: is this something wrong with the Airflow git-sync, or is there something we do not know about that accesses our repos at this time? @potiuk or anybody else, I would really appreciate some insight on this. If you have any questions or need more information, please feel free to ask. Thank you!
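To put the wait-time change in perspective, here is a rough back-of-the-envelope sketch in Python of how many sync loops the two sidecars (scheduler and triggerer) make per hour for the wait values that come up in this thread. It assumes, hypothetically, one request to the Git host per loop and ignores the time each sync itself takes, so the results are upper bounds on loops, not measured request counts.

```python
# Rough estimate of git-sync polling volume. Hypothetical assumption: each
# sync loop costs exactly one request to the Git host. A real loop may issue
# several HTTP requests, and the loop period is wait + sync duration, so
# these numbers are upper bounds on loops, not exact request counts.

SECONDS_PER_HOUR = 3600


def polls_per_hour(wait_seconds: float, containers: int = 1) -> float:
    """Maximum number of sync loops per hour across `containers` sidecars."""
    if wait_seconds <= 0:
        raise ValueError("wait_seconds must be positive")
    return containers * SECONDS_PER_HOUR / wait_seconds


if __name__ == "__main__":
    # Two git-sync sidecars in this setup: scheduler and triggerer.
    for wait in (5, 15, 30):
        print(f"wait={wait:>2}s -> {polls_per_hour(wait, containers=2):.0f} polls/hour")
```

Tripling the wait cuts the loop count by a factor of three, which lowers the request volume but does not help if something else is consuming the same rate-limit budget.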
Error 429 is "Too Many Requests" (https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429): you seem to be exceeding the rate limits. The timing of the issue indicates that something else uses the same Git credentials around that time and overwhelms GitHub with requests. It does not seem to come from Airflow, because then you would have this problem all the time; very likely you are running some massive GitHub sync or something similar around that time that exceeds the limits. The exception would be if you start many tasks at around the same time and they generate a massive number of requests as they all start up quickly. When the user is authenticated, GitHub applies rate limits per authenticated user, so the solution is to have a separate user used only by Airflow and NOT to share its credentials with your other systems (especially if they generate a lot of HTTP requests).
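The per-user point can be illustrated with a toy model. The Python sketch below is a hypothetical fixed-window limiter with made-up limits, not GitHub's real implementation; it counts authenticated requests against the user and unauthenticated ones against the source IP, mirroring how GitHub documents its REST API limits (Git operations over HTTPS may be limited differently). It shows how another system sharing Airflow's credential can use up the budget so that Airflow's next request is rejected with a 429, while a dedicated Airflow-only user keeps its own budget. The user names and IP addresses are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Optional, Tuple


class FixedWindowLimiter:
    """Toy fixed-window limiter: at most `limit` requests per key per window.

    Illustrative model only; the limit and window are made-up values,
    not GitHub's actual numbers or implementation.
    """

    def __init__(self, limit: int, window_seconds: int = 3600) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        # (key, window index) -> number of requests already accepted
        self._counts: DefaultDict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str, now: float) -> bool:
        """Return True if the request is accepted, False if it would get a 429."""
        bucket = (key, int(now // self.window_seconds))
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True


def rate_limit_key(user: Optional[str], source_ip: str) -> str:
    """Authenticated requests count against the user, anonymous ones against the IP."""
    return f"user:{user}" if user is not None else f"ip:{source_ip}"


if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=5)
    # Hypothetical: another system uses the same credential ("shared-bot")
    # and spends the whole budget for this window.
    for _ in range(5):
        limiter.allow(rate_limit_key("shared-bot", "10.0.0.7"), now=0)
    # Airflow's git-sync using the same credential is now rejected (429).
    print(limiter.allow(rate_limit_key("shared-bot", "10.0.0.8"), now=10))   # False
    # A dedicated, Airflow-only user (hypothetical "airflow-bot") has its own budget.
    print(limiter.allow(rate_limit_key("airflow-bot", "10.0.0.8"), now=10))  # True
```

In the same model, unauthenticated requests from many clients behind one shared IP all draw from a single `ip:<address>` bucket, so heavy anonymous traffic from one team can exhaust it for everyone behind that IP.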
Interesting read. After talking more with the GitHub vendor, it seems that what I said before may not be the issue: once the limit is hit, we may simply have been lumped in with everybody else, which is why it looked like we were making unauthenticated requests even though we are not. At the end of the day, the reason this is happening is that every team is lumped together on one IP, and some teams are making tons of unauthenticated requests. I raised our git-sync wait time to 30s (so it polls less often), and now we see only 1 or 2 restarts, if any, so that is what we are going with for now. I have been engaged with our GitHub team, so I am hoping they will find whoever is making all the requests or create multiple…