
Configurable timeout on prom-frontend #1347

Open
gravelg opened this issue Feb 7, 2025 · 7 comments

gravelg commented Feb 7, 2025

Hello,

We have recently switched to using the datasource-syncer instead of prom-frontend for our Grafana instance to take advantage of higher query timeouts. Is it possible to do something similar for the prom-adapter? We are running into the 30s query timeout in prom-frontend right now, and would like to take advantage of the 120s timeout of querying GMP directly. Is this something that we could do?

Thanks!

bwplotka (Collaborator) commented

> Is it possible to do something similar for the prom-adapter?

Do you mean prom-frontend?

We can consider a customizable timeout for the frontend. It does look like it's hardcoded to 30s (behind http.DefaultTransport).
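
For illustration, here's a minimal sketch of what a flag-driven timeout could look like; the flag name and the wiring are assumptions, not the actual prom-frontend code:

```go
// A minimal sketch, assuming a hypothetical "query.timeout" flag;
// not the actual prom-frontend implementation.
package main

import (
	"flag"
	"net/http"
	"time"
)

func main() {
	queryTimeout := flag.Duration("query.timeout", 30*time.Second,
		"Maximum duration for a forwarded query before it is aborted.")
	flag.Parse()

	// Give the forwarding client an explicit, flag-driven timeout
	// instead of relying on http.DefaultTransport's hardcoded values.
	client := &http.Client{
		Transport: http.DefaultTransport,
		Timeout:   *queryTimeout,
	}
	_ = client // would be wired into the reverse-proxy handler
}
```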

However, we wanted to deprecate the frontend, given that datasource-syncer exists. Could you help us understand what the use case for the frontend is if you already use datasource-syncer? (:

Thanks!

gravelg (Author) commented Feb 17, 2025

Hey @bwplotka, I totally understand wanting to deprecate prom-frontend. However, it seems that datasource-syncer does not support updating the prom-adapter datasource URL the way it does for Grafana. Because of this, even if we use datasource-syncer for Grafana, we have to keep running prom-frontend to support prom-adapter.

I'd be happy to get rid of the frontend altogether if you can provide instructions for using datasource-syncer with prom-adapter, but all the docs I could find (https://cloud.google.com/stackdriver/docs/managed-prometheus/hpa#promethueus-adapter) use the frontend.

Thank you!

bwplotka (Collaborator) commented

Got it, thanks. Note that prom-adapter also has a hardcoded timeout, AFAIK.

In this case we have three routes:

A) Add a timeout setting to both the frontend and prom-adapter (adding a flag in code).
B) Add a timeout setting to prom-adapter only, and add datasource-syncer support for it too.
C) Add a timeout setting to prom-adapter only, and add Google OAuth2 support for it too.

C might be easiest in some way 🤔 (a rough sketch follows below).

Help wanted, though we might want to open an issue on the adapter repo for it.
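
For option C, a rough sketch of what direct Google OAuth2 auth against the GMP query API could look like, assuming Application Default Credentials; how this would plug into prom-adapter is left open, and "my-project" is a placeholder:

```go
// Sketch of option C: querying the GMP API directly with Google OAuth2
// (Application Default Credentials), so prom-adapter would not need
// prom-frontend as a proxy. The oauth2 packages are real; the wiring
// into prom-adapter is an assumption.
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"net/url"

	"golang.org/x/oauth2"
	"golang.org/x/oauth2/google"
)

func main() {
	ctx := context.Background()
	creds, err := google.FindDefaultCredentials(ctx,
		"https://www.googleapis.com/auth/monitoring.read")
	if err != nil {
		log.Fatal(err)
	}
	// HTTP client that attaches OAuth2 bearer tokens to every request.
	client := oauth2.NewClient(ctx, creds.TokenSource)

	endpoint := "https://monitoring.googleapis.com/v1/projects/my-project/location/global/prometheus/api/v1/query"
	resp, err := client.PostForm(endpoint, url.Values{"query": {"up"}})
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```

With something like this inside prom-adapter, the frontend proxy hop (and its 30s limit) would disappear entirely.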

bwplotka (Collaborator) commented

To solve all of this: is there a way to simplify the queries? Waiting 30s+ for an autoscaling decision can be painful in itself (assuming you use prom-adapter for autoscaling reasons).

gravelg (Author) commented Feb 18, 2025

If your team is looking to get rid of the frontend completely, then C seems like the right approach. Happy to try to knock out a PR if you have an example of how Google OAuth2 works in another component?

In the meantime we have tried simplifying queries. The timeout issue manifests when we have a deployment that can scale up to 200 replicas, which I assume generates selector queries like pod=~pod1|pod2|...|pod200. We are trying to remove labels, but it's not clear exactly what query is getting generated, and there doesn't seem to be a setting on the adapter to lower the log level.
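
To make that suspicion concrete, here's a tiny, purely illustrative snippet of how such a matcher balloons; the query shape is a guess, since the adapter doesn't log it:

```go
// Illustrates the suspected failure mode: joining every pod name into
// one regex matcher makes the selector very long (and expensive to
// evaluate) at 200 replicas. The selector shape is an assumption,
// not taken from adapter logs.
package main

import (
	"fmt"
	"strings"
)

func main() {
	pods := make([]string, 0, 200)
	for i := 1; i <= 200; i++ {
		pods = append(pods, fmt.Sprintf("pod-%d", i))
	}
	selector := fmt.Sprintf(`{pod=~"%s"}`, strings.Join(pods, "|"))
	fmt.Printf("selector is %d bytes: %s...\n", len(selector), selector[:48])
}
```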

pintohutch assigned bwplotka and unassigned bernot-dev Feb 19, 2025

pintohutch (Collaborator) commented

Drive-by comment: it could be worth trying KEDA or the custom-metrics-stackdriver-adapter to see if that helps things.

gravelg (Author) commented Feb 19, 2025

Kinda related: I found a dead PR that tries to solve what we think is causing the timeouts: kubernetes-sigs/prometheus-adapter#670
