
Conversation

@xyephy commented Aug 19, 2025

Summary

Reduce Prometheus Docker volume storage consumption.

Changes

  • Upgrade Prometheus to v2.47.2 with WAL compression
  • Add 7-day retention and 2GB size limits
  • Optimize scrape intervals for system metrics while preserving critical benchmarking data

Expected Results

  • Reduces Docker volume size by ~70%
  • Maintains benchmarking accuracy for critical metrics
  • Automatic storage cleanup

Fixes #52

- Upgrade to v2.47.2 with WAL compression and retention policies
- Set 7-day/2GB limits to prevent unlimited growth
- Reduce system metric intervals while preserving benchmarking accuracy

Fixes stratum-mining#52
Comment on lines +428 to +429
- "--storage.tsdb.retention.time=7d"
- "--storage.tsdb.retention.size=2GB"
Member

Does this mean we will only have benchmark data for the latest 7 days, or until we reach 2GB?

Author

No. It means the oldest data is cleaned up either after 7 days or once the stored data exceeds 2GB, whichever comes first, so the volume always stays below 2GB.
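
If it helps, here's a quick way to check how much space the TSDB blocks are actually using at any point (a sketch; it assumes Prometheus is reachable on localhost:9090, so adjust to whatever port the compose file maps):

# Current on-disk size of completed TSDB blocks, reported by Prometheus itself
curl -s 'http://localhost:9090/api/v1/query?query=prometheus_tsdb_storage_blocks_bytes'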

Member

So with this change we cannot keep data older than 7 days.

That's not what we want, since we run benchmarks for longer than that.

Author

Thank you for highlighting that. I have a few questions to guide how I address this: what's the typical maximum benchmark duration? And what approach would you recommend I look into to make the solution more robust against the concerns you've raised?

Member

There's not a typical duration, but we did run benchmarks for months in the past.

Tbh, I don't even know if the issue this PR wants to solve is still valid or not (it was opened a year ago), and I feel it lacks proper context: how much time was the system running?

Did you run some tests to see how much the Prometheus volume grows over time?

Author

Thanks for the questions! Here are the test results I got when I was working on the issue last year:

Q1: How much time was the system running?

21 days continuous (October 29 - November 19, 2024)

Q2: Did you run tests to see Prometheus volume growth over time?

Yes, here are the measured results:

Day   Date     Prometheus Volume   Daily Growth
0     Oct 29       512 MB          Baseline
7     Nov 5      5,321 MB          +687 MB/day
14    Nov 12    10,130 MB          +687 MB/day
21    Nov 19    14,939 MB          +687 MB/day

Summary:

  • Growth rate: 687 MB per day (unconstrained growth)
  • Total growth: 14.4 GB over 21 days
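
For context, rough arithmetic from these measured numbers (before any WAL-compression savings) shows what different caps imply:

2GB  ÷ 687 MB/day ≈ 3 days of history (the cap originally proposed in this PR)
10GB ÷ 687 MB/day ≈ 15 days of history
687 MB/day × 30 days ≈ 20GB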

Based on these findings, making the retention limits configurable, rather than removing them entirely, would prevent unbounded growth while still giving users flexibility. Enabling WAL compression provides additional storage savings on top of that.

Proposed implementation:

command:
  - '--config.file=/etc/prometheus/prometheus.yml'
  - '--storage.tsdb.path=/prometheus'
  - '--storage.tsdb.retention.time=${PROMETHEUS_RETENTION_TIME:-30d}'
  - '--storage.tsdb.retention.size=${PROMETHEUS_RETENTION_SIZE:-10GB}'
  - '--storage.tsdb.wal-compression'
  - '--web.enable-lifecycle'  # Allows runtime config reload
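
For example (hypothetical values), an extended benchmark could then raise the defaults through an .env file next to the compose file, without editing the compose file itself:

# .env — hypothetical overrides for a months-long benchmark run
PROMETHEUS_RETENTION_TIME=180d
PROMETHEUS_RETENTION_SIZE=50GB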

This approach would:

  • Prevent unbounded growth (default 30d/10GB limits)
  • Allow extended test periods via environment variables
  • Reduce storage overhead through WAL compression
  • Address the concern in issue #52 (Large Prometheus Docker Volume) raised by @average-gary about long-running tests

Would this configurable approach work better than removing all limits? Happy to update the PR with this implementation if you think it's a good middle ground.

Member

@GitGab19 commented Aug 25, 2025

How many miners were you running back then? Which config were you testing?

Would you run the tool for a couple of days again to see if the situation has improved now?

I feel that imposing some "too strict" hard-coded boundaries could be a bad idea if a user wants to run benchmarks for more days and has no problem with storage.

Author

Let me run it over the next 7 days and then give you my findings; I have some Bitaxes and an Apollo this time around.
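
To keep the measurements comparable, I'll record the volume size daily with something like this (a sketch; the grep pattern and placeholder volume name are assumptions, check `docker volume ls` for the real name):

# List local volumes with their on-disk sizes and pick out the Prometheus one
docker system df -v | grep -i prometheus
# Or measure the volume directly via a throwaway container (replace <volume-name>)
docker run --rm -v <volume-name>:/data alpine du -sh /data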

Comment on lines 3 to 4
scrape_interval: 30s # By default, scrape targets every 30 seconds.
evaluation_interval: 30s # By default, evaluate rules every 30 seconds.
Member

I wouldn't change this tbh

Author

I can revert to the original 15s.
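
For reference, the global block would look something like this (a sketch, assuming both intervals were 15s before this PR):

global:
  scrape_interval: 15s     # back to the original value
  evaluation_interval: 15s # evaluate rules at the same cadence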
