
Conversation

@xyephy commented Aug 19, 2025

Summary

Reduce Prometheus Docker volume storage consumption.

Changes

  • Upgrade Prometheus to v2.47.2 with WAL compression
  • Add 7-day retention and 2GB size limits
  • Optimize scrape intervals for system metrics while preserving critical benchmarking data

Expected Results

  • Reduces Docker volume size by ~70%
  • Maintains benchmarking accuracy for critical metrics
  • Automatic storage cleanup

Fixes #52

- Upgrade to v2.47.2 with WAL compression and retention policies
- Set 7-day/2GB limits to prevent unlimited growth
- Reduce system metric intervals while preserving benchmarking accuracy

Fixes stratum-mining#52
Comment on lines +428 to +429
- "--storage.tsdb.retention.time=7d"
- "--storage.tsdb.retention.size=2GB"
Member

Does this mean we will only have benchmark data for the latest 7 days, or until we reach 2GB?

Author

No. It means the oldest data is cleaned up either after 7 days or once the stored data exceeds 2GB, whichever comes first, so the volume always stays below 2GB.
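
If it helps, here's a quick way to check how much space the TSDB blocks are actually using at any point (a sketch; it assumes Prometheus is reachable on localhost:9090, so adjust to whatever port the compose file maps):

# Current on-disk size of completed TSDB blocks, reported by Prometheus itself
curl -s 'http://localhost:9090/api/v1/query?query=prometheus_tsdb_storage_blocks_bytes'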

Member

So with this change we cannot keep data older than 7 days.

That's not what we want, since we run benchmarks for longer than that.

Author

Thank you for highlighting that. I have a few questions to guide how I address this: what's the typical maximum benchmark duration? And what approach would you recommend I look into to make the solution more robust against the concerns you've raised?

Member

There's not a typical duration, but we did run benchmarks for months in the past.

Tbh, I don't even know if the issue this PR wants to solve is still valid or not (it was opened a year ago), and I feel it lacks proper context: how much time was the system running?

Did you run some tests to see how much the Prometheus volume grows over time?

Author

Thanks for the questions! Here are the test results I got when I was working on the issue last year:

Q1: How much time was the system running?

21 days continuous (October 29 - November 19, 2024)

Q2: Did you run tests to see Prometheus volume growth over time?

Yes, here are the measured results:

Day   Date     Prometheus Volume   Daily Growth
0     Oct 29       512 MB          Baseline
7     Nov 5      5,321 MB          +687 MB/day
14    Nov 12    10,130 MB          +687 MB/day
21    Nov 19    14,939 MB          +687 MB/day

Summary:

  • Growth rate: 687 MB per day (unconstrained growth)
  • Total growth: 14.4 GB over 21 days
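
For context, rough arithmetic from these measured numbers (before any WAL-compression savings) shows what different caps imply:

2GB  ÷ 687 MB/day ≈ 3 days of history (the cap originally proposed in this PR)
10GB ÷ 687 MB/day ≈ 15 days of history
687 MB/day × 30 days ≈ 20GB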

Based on these findings, making the retention limits configurable, rather than removing them entirely, would prevent unbounded growth while still giving users flexibility. Enabling WAL compression provides additional storage savings on top of that.

Proposed implementation:

command:
  - '--config.file=/etc/prometheus/prometheus.yml'
  - '--storage.tsdb.path=/prometheus'
  - '--storage.tsdb.retention.time=${PROMETHEUS_RETENTION_TIME:-30d}'
  - '--storage.tsdb.retention.size=${PROMETHEUS_RETENTION_SIZE:-10GB}'
  - '--storage.tsdb.wal-compression'
  - '--web.enable-lifecycle'  # Allows runtime config reload
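
For example (hypothetical values), an extended benchmark could then raise the defaults through an .env file next to the compose file, without editing the compose file itself:

# .env — hypothetical overrides for a months-long benchmark run
PROMETHEUS_RETENTION_TIME=180d
PROMETHEUS_RETENTION_SIZE=50GB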

This approach would:

  • Prevent unbounded growth (default 30d/10GB limits)
  • Allow extended test periods via environment variables
  • Reduce storage overhead through WAL compression
  • Address the concern in issue #52 (Large Prometheus Docker Volume) raised by @average-gary about long-running tests

Would this configurable approach work better than removing all limits? Happy to update the PR with this implementation if you think it's a good middle ground.

Member

@GitGab19 commented Aug 25, 2025

How many miners were you running back then? Which config were you testing?

Would you run the tool for a couple of days again to see if the situation has improved now?

I feel that imposing some "too strict" hard-coded boundaries could be a bad idea if a user wants to run benchmarks for more days and has no problem with storage.

Author

Let me run it over the next 7 days and then give you my findings; I have some Bitaxes and an Apollo this time around.
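
To keep the measurements comparable, I'll record the volume size daily with something like this (a sketch; the grep pattern and placeholder volume name are assumptions, check `docker volume ls` for the real name):

# List local volumes with their on-disk sizes and pick out the Prometheus one
docker system df -v | grep -i prometheus
# Or measure the volume directly via a throwaway container (replace <volume-name>)
docker run --rm -v <volume-name>:/data alpine du -sh /data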

Comment on lines 3 to 4
scrape_interval: 30s # By default, scrape targets every 30 seconds.
evaluation_interval: 30s # By default, evaluate rules every 30 seconds.
Member

I wouldn't change this tbh

Author

I can revert to the original 15s.
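
For reference, the global block would look something like this (a sketch, assuming both intervals were 15s before this PR):

global:
  scrape_interval: 15s     # back to the original value
  evaluation_interval: 15s # evaluate rules at the same cadence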
