Optimize Prometheus storage and reduce Docker volume usage #88
Conversation
- Upgrade to v2.47.2 with WAL compression and retention policies
- Set 7-day/2GB limits to prevent unlimited growth
- Reduce system metric intervals while preserving benchmarking accuracy

Fixes stratum-mining#52
| - "--storage.tsdb.retention.time=7d" | ||
| - "--storage.tsdb.retention.size=2GB" | 
Does this mean we'll only have benchmark data for the latest 7 days, or until we reach 2GB?
No, it means the oldest data is cleaned up either after 7 days or once total storage exceeds 2GB, whichever comes first; this keeps stored data below 2GB at all times.
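Worth noting which of the two limits actually binds: since 2GB ÷ 7 days ≈ 293 MB/day, any sustained growth above that rate means the size cap triggers before the time cap. A back-of-envelope sketch (the growth rate below is illustrative, not a measured value):

```shell
# Which retention limit binds first for a given growth rate?
growth_mb_per_day=300     # illustrative, not measured
size_limit_mb=2048        # --storage.tsdb.retention.size=2GB
time_limit_days=7         # --storage.tsdb.retention.time=7d
days_to_size_limit=$(( size_limit_mb / growth_mb_per_day ))
echo "size limit reached after ~${days_to_size_limit} days"
if [ "$days_to_size_limit" -lt "$time_limit_days" ]; then
  echo "the 2GB size limit triggers before the 7d time limit"
else
  echo "the 7d time limit triggers first"
fi
```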
So with this change we cannot keep data older than 7 days.
That's not what we want, since we run benchmarks for longer than that.
Thank you for highlighting that. I have a few questions to guide how I address this: what's the typical maximum benchmark duration? And what approach would you recommend I look into to make the solution more robust against the concerns you've raised?
There's no typical duration, but we have run benchmarks for months in the past.
Tbh, I don't even know if the issue this PR wants to solve is still valid (it was opened a year ago), and I feel it lacks proper context: how long was the system running?
Did you run any tests to see how much the Prometheus volume grows over time?
Thanks for the questions! Here are the test results I got when I was working on the issue last year:
Q1: How much time was the system running?
21 days continuous (October 29 - November 19, 2024)
Q2: Did you run tests to see Prometheus volume growth over time?
Yes, here are the measured results:
| Day | Date | Prometheus Volume | Daily Growth |
|---|---|---|---|
| 0 | Oct 29 | 512 MB | Baseline |
| 7 | Nov 5 | 5,321 MB | +687 MB/day |
| 14 | Nov 12 | 10,130 MB | +687 MB/day |
| 21 | Nov 19 | 14,939 MB | +687 MB/day |
Summary:
- Growth rate: 687 MB per day (unconstrained growth)
- Total growth: 14.4 GB over 21 days
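The rows are internally consistent (volume = baseline + rate × days); a quick cross-check of the table arithmetic:

```shell
# Cross-check of the measured rows: volume = baseline + rate * days.
baseline_mb=512
rate_mb_per_day=687
for days in 7 14 21; do
  echo "day ${days}: $(( baseline_mb + rate_mb_per_day * days )) MB"
done
total_growth_mb=$(( rate_mb_per_day * 21 ))
echo "growth over 21 days: ${total_growth_mb} MB (~14.4 GB)"
```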
Based on these findings, implementing configurable retention limits instead of removing them entirely would prevent unbounded growth while giving users flexibility. Enabling WAL compression provides additional storage optimization.
Proposed implementation:

    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=${PROMETHEUS_RETENTION_TIME:-30d}'
      - '--storage.tsdb.retention.size=${PROMETHEUS_RETENTION_SIZE:-10GB}'
      - '--storage.tsdb.wal-compression'
      - '--web.enable-lifecycle'  # allows runtime config reload

This approach would:
- Prevent unbounded growth (default 30d/10GB limits)
- Allow extended test periods via environment variables
- Reduce storage overhead through WAL compression
- Address the concern raised by @average-gary in issue #52 (Large Prometheus Docker Volume) about long-running tests
Would this configurable approach work better than removing all limits? Happy to update the PR with this implementation if you think it's a good middle ground.
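For what it's worth, the `${VAR:-default}` substitution in the proposal follows standard shell defaulting, and Docker Compose applies the same rule when interpolating the compose file, so a long benchmark can raise the caps at startup. A minimal sketch of the semantics (the compose invocation in the trailing comment is a hypothetical usage, not a command from this repo):

```shell
# ${VAR:-default} yields the default only when VAR is unset or empty.
unset PROMETHEUS_RETENTION_TIME
default_time="${PROMETHEUS_RETENTION_TIME:-30d}"

export PROMETHEUS_RETENTION_TIME=90d
override_time="${PROMETHEUS_RETENTION_TIME:-30d}"

echo "no override -> ${default_time}"
echo "override    -> ${override_time}"

# Hypothetical startup for a months-long benchmark:
#   PROMETHEUS_RETENTION_TIME=120d PROMETHEUS_RETENTION_SIZE=50GB docker compose up -d
```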
How many miners were you running back then? Which config were you testing?
Would you run the tool for a couple of days again to see if the situation has improved?
I feel that imposing "too strict" hard-coded boundaries could be a bad idea if a user wants to run benchmarks for more days and has no problems with storage.
Let me run it over the next 7 days and then share my findings; I have some Bitaxes and an Apollo this time around.
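A hedged sketch of a once-a-day measurement for that re-run; the volume name `prometheus-data` and the mountpoint path below are assumptions, not taken from this repo:

```shell
# Log the TSDB directory size once a day (e.g. from cron).
# With a named volume, resolve the real path first:
#   docker volume inspect -f '{{.Mountpoint}}' prometheus-data
tsdb_dir="${TSDB_DIR:-/var/lib/docker/volumes/prometheus-data/_data}"
if [ -d "$tsdb_dir" ]; then
  echo "$(date +%F) $(du -sm "$tsdb_dir" | cut -f1) MB"
else
  echo "TSDB directory not found: $tsdb_dir"
fi
```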
prometheus/prometheus.yml (Outdated)
    scrape_interval: 30s # By default, scrape targets every 30 seconds.
    evaluation_interval: 30s # By default, scrape targets every 30 seconds.
I wouldn't change this tbh
I can revert to the original 15s
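For context on the 15s vs 30s trade-off: with a fixed number of series, halving the scrape interval doubles the ingested samples per day. A back-of-envelope sketch (the series count is illustrative, not measured from this setup):

```shell
# Samples/day = series * 86400 / scrape_interval_seconds (before compression).
series=1000               # illustrative series count
seconds_per_day=86400
samples_15s=$(( series * seconds_per_day / 15 ))
samples_30s=$(( series * seconds_per_day / 30 ))
echo "15s interval: ${samples_15s} samples/day"
echo "30s interval: ${samples_30s} samples/day"
```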
Summary
Reduce Prometheus Docker volume storage consumption.
Changes
Expected Results
Fixes #52