Host metrics #293

Draft · wants to merge 9 commits into monitoring
Conversation

mwiencek (Member)

This is based on #291.

The service definition was based on https://github.com/metabrainz/prometheus-exp/blob/main/node.sh
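
For reference, a minimal sketch of what such a service definition can look like in Compose, following the Docker example in the upstream node_exporter README (the image tag, flags, and mounts here are illustrative, not copied from node.sh):

```yaml
services:
  node-exporter:
    image: quay.io/prometheus/node-exporter:latest
    # share the host's network and PID namespaces so host metrics are visible
    network_mode: host
    pid: host
    restart: unless-stopped
    command:
      # read host metrics from the bind-mounted root filesystem below
      - '--path.rootfs=/host'
    volumes:
      - '/:/host:ro,rslave'
```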

reosarevok and others added 9 commits on January 28, 2025 at 22:10:
This is based on https://grafana.com/grafana/dashboards/14114-postgres-overview/
with an extra check for max query duration that seemed interesting,
and is mostly intended as a proof of concept for provisioning dashboards.
We can further improve the dashboard as needed.
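
As a sketch of the provisioning mechanism this is a proof of concept for: Grafana picks up dashboard JSON files through a file provider config like the following (the provider name and paths are illustrative):

```yaml
# e.g. shipped to /etc/grafana/provisioning/dashboards/ (path illustrative)
apiVersion: 1
providers:
  - name: 'default'
    type: file
    options:
      # directory Grafana watches for dashboard JSON files
      path: /var/lib/grafana/dashboards
```
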
As a start, monitor the number of rows in sir-indexed tables.
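
A minimal sketch of how such a collector could be defined, assuming an sql_exporter-style collector file and PostgreSQL's n_live_tup statistic (an estimate, much cheaper than an exact count(*)); the collector and metric names are hypothetical:

```yaml
collector_name: sir_table_rows        # hypothetical name
metrics:
  - metric_name: sir_table_row_count  # hypothetical name
    type: gauge
    help: 'Estimated number of rows in sir-indexed tables.'
    key_labels: [relname]             # one series per table
    values: [n_live_tup]
    query: |
      SELECT relname, n_live_tup
      FROM pg_stat_user_tables
      -- (restricting to sir-indexed tables is omitted in this sketch)
```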

Includes a dashboard with gauges for every table. I don't see why line charts would be useful here: we shouldn't expect sudden jumps, and the point is simply to see at a glance which tables are bigger.

There seems to be no good reason to keep hitting the DB every 30 seconds for the counts; 5 minutes seems more than enough.

My understanding is that if I set min_interval here to 300s (5m), it will just keep the value for that long and keep responding with it, however often Prometheus asks.
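
Assuming this is sql_exporter's per-collector min_interval (which caches results and re-runs the query at most once per interval), the setting would sit in the collector file from the earlier sketch:

```yaml
collector_name: sir_table_rows  # hypothetical, as above
min_interval: 5m                # run the query at most once per 5 minutes;
                                # scrapes in between are served the cached value
```
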
As I understand it, this will make the container come up when Grafana does.
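
Assuming Compose's depends_on is the mechanism at work here (service names are illustrative), that would look like:

```yaml
services:
  grafana:
    image: grafana/grafana-oss
    depends_on:
      # bringing up grafana now also brings up this container
      - sql-exporter   # illustrative name for the exporter service
```
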
mwiencek changed the base branch from master to monitoring on January 29, 2025 at 06:06.

mwiencek (Member, Author)

The only issue I have with this is that the node-exporter log is being spammed with the following:

```
node-exporter  | ts=2025-01-29T06:07:03.163Z caller=stdlib.go:105 level=error msg="error gathering metrics: 21 error(s) occurred:
* [from Gatherer #2] collected metric "node_filesystem_device_error" { label:{name:"device" value:"devpts"} label:{name:"device_error" value:""} label:{name:"fstype" value:"devpts"} label:{name:"mountpoint" value:"/dev/pts"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_readonly" { label:{name:"device" value:"devpts"} label:{name:"device_error" value:""} label:{name:"fstype" value:"devpts"} label:{name:"mountpoint" value:"/dev/pts"} gauge:{value:1}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_size_bytes" { label:{name:"device" value:"devpts"} label:{name:"device_error" value:""} label:{name:"fstype" value:"devpts"} label:{name:"mountpoint" value:"/dev/pts"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_free_bytes" { label:{name:"device" value:"devpts"} label:{name:"device_error" value:""} label:{name:"fstype" value:"devpts"} label:{name:"mountpoint" value:"/dev/pts"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_avail_bytes" { label:{name:"device" value:"devpts"} label:{name:"device_error" value:""} label:{name:"fstype" value:"devpts"} label:{name:"mountpoint" value:"/dev/pts"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_files" { label:{name:"device" value:"devpts"} label:{name:"device_error" value:""} label:{name:"fstype" value:"devpts"} label:{name:"mountpoint" value:"/dev/pts"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_files_free" { label:{name:"device" value:"devpts"} label:{name:"device_error" value:""} label:{name:"fstype" value:"devpts"} label:{name:"mountpoint" value:"/dev/pts"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_device_error" { label:{name:"device" value:"mqueue"} label:{name:"device_error" value:""} label:{name:"fstype" value:"mqueue"} label:{name:"mountpoint" value:"/dev/mqueue"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_readonly" { label:{name:"device" value:"mqueue"} label:{name:"device_error" value:""} label:{name:"fstype" value:"mqueue"} label:{name:"mountpoint" value:"/dev/mqueue"} gauge:{value:1}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_size_bytes" { label:{name:"device" value:"mqueue"} label:{name:"device_error" value:""} label:{name:"fstype" value:"mqueue"} label:{name:"mountpoint" value:"/dev/mqueue"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_free_bytes" { label:{name:"device" value:"mqueue"} label:{name:"device_error" value:""} label:{name:"fstype" value:"mqueue"} label:{name:"mountpoint" value:"/dev/mqueue"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_avail_bytes" { label:{name:"device" value:"mqueue"} label:{name:"device_error" value:""} label:{name:"fstype" value:"mqueue"} label:{name:"mountpoint" value:"/dev/mqueue"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_files" { label:{name:"device" value:"mqueue"} label:{name:"device_error" value:""} label:{name:"fstype" value:"mqueue"} label:{name:"mountpoint" value:"/dev/mqueue"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_files_free" { label:{name:"device" value:"mqueue"} label:{name:"device_error" value:""} label:{name:"fstype" value:"mqueue"} label:{name:"mountpoint" value:"/dev/mqueue"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_device_error" { label:{name:"device" value:"proc"} label:{name:"device_error" value:""} label:{name:"fstype" value:"proc"} label:{name:"mountpoint" value:"/proc"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_readonly" { label:{name:"device" value:"proc"} label:{name:"device_error" value:""} label:{name:"fstype" value:"proc"} label:{name:"mountpoint" value:"/proc"} gauge:{value:1}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_size_bytes" { label:{name:"device" value:"proc"} label:{name:"device_error" value:""} label:{name:"fstype" value:"proc"} label:{name:"mountpoint" value:"/proc"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_free_bytes" { label:{name:"device" value:"proc"} label:{name:"device_error" value:""} label:{name:"fstype" value:"proc"} label:{name:"mountpoint" value:"/proc"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_avail_bytes" { label:{name:"device" value:"proc"} label:{name:"device_error" value:""} label:{name:"fstype" value:"proc"} label:{name:"mountpoint" value:"/proc"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_files" { label:{name:"device" value:"proc"} label:{name:"device_error" value:""} label:{name:"fstype" value:"proc"} label:{name:"mountpoint" value:"/proc"} gauge:{value:0}} was collected before with the same name and label values
* [from Gatherer #2] collected metric "node_filesystem_files_free" { label:{name:"device" value:"proc"} label:{name:"device_error" value:""} label:{name:"fstype" value:"proc"} label:{name:"mountpoint" value:"/proc"} gauge:{value:0}} was collected before with the same name and label values"
```

I have no idea why this is happening or how to resolve it; devpts, for example, is already listed in --collector.filesystem.ignored-fs-types, so I don't know why it's still being collected.
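
Not a verified diagnosis, but two hedged guesses worth checking: newer node_exporter releases renamed --collector.filesystem.ignored-fs-types to --collector.filesystem.fs-types-exclude, and "was collected before with the same name and label values" usually means the same mountpoint appears more than once in the mount table the collector reads. A sketch of the corresponding flags (the exclusion patterns are illustrative):

```yaml
    command:
      - '--path.rootfs=/host'
      # renamed flag on newer node_exporter versions (formerly
      # --collector.filesystem.ignored-fs-types):
      - '--collector.filesystem.fs-types-exclude=^(devpts|mqueue|proc|sysfs)$'
      # also exclude by mountpoint, in case the same mount is visible twice:
      - '--collector.filesystem.mount-points-exclude=^/(dev|proc|sys)($|/)'
```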
