feat(monitor): improve power monitor with dynamic collection based on interval #2016

Merged

Conversation

sthaha
Collaborator

@sthaha sthaha commented Apr 17, 2025

This refactor enhances the monitor as follows:

  • Replace mutex-based synchronization with atomic operations for thread safety
  • Implement dynamic collection scheduling with configurable intervals
  • Add option to disable automatic collection (interval=0) for on-demand only mode
  • Pass previous snapshot explicitly to calculateNodePower
  • Simplify zone name handling by removing unnecessary synchronization
  • Use a context for clean cancellation of collection goroutines
  • Ensure freshness checks always return recent data

codecov bot commented Apr 17, 2025

Codecov Report

Attention: Patch coverage is 80.00000% with 10 lines in your changes missing coverage. Please review.

Project coverage is 93.16%. Comparing base (2e8cd65) to head (1b33e5b).
Report is 8 commits behind head on reboot.

Files with missing lines Patch % Lines
internal/monitor/monitor.go 77.77% 8 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           reboot    #2016      +/-   ##
==========================================
- Coverage   93.78%   93.16%   -0.63%     
==========================================
  Files          21       21              
  Lines        1159     1200      +41     
==========================================
+ Hits         1087     1118      +31     
- Misses         57       65       +8     
- Partials       15       17       +2     

@sthaha sthaha force-pushed the feat-monitor-with-interval branch 7 times, most recently from 105ea67 to 9472e01 Compare April 20, 2025 14:15
@sthaha sthaha changed the title refactor(monitor): improve power monitor with atomic snapshots and dynamic collection feat(monitor): improve power monitor with dynamic collection based on interval Apr 20, 2025
@sthaha sthaha added the feat A new feature or enhancement label Apr 20, 2025
@sthaha sthaha force-pushed the feat-monitor-with-interval branch 2 times, most recently from daa47b3 to da3e8d2 Compare April 23, 2025 02:56
Comment on lines 177 to 178
if err := pm.collectData(); err != nil {
	pm.logger.Error("Failed to collect power data", "error", err)
}
pm.scheduleNextCollection()
Collaborator

call collectionLoop() here?

Collaborator Author

@sthaha sthaha Apr 23, 2025

I find this "easier" to reason about:

  • collectionLoop starts the collection and then schedules the next one if there is an interval; otherwise it exits.

  • scheduleNext() schedules the next collection (recursively) and aborts if canceled, i.e. it is called only if there is an interval.

@vimalk78
Collaborator

Please rebase onto reboot.

@sthaha sthaha force-pushed the feat-monitor-with-interval branch 3 times, most recently from e217121 to 7badc4a Compare April 29, 2025 06:15
default:
	pm.logger.Debug("Data channel is full, dropping signal")
Collaborator

nit: no need to say "dropping signal"; it reads as if something negative has happened.
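For context, the `default:` branch under review looks like part of a non-blocking send. A minimal sketch of that pattern, assuming a buffered notification channel of capacity 1 (the function name `signalNewData` is hypothetical):

```go
package main

import "fmt"

// signalNewData notifies a listener that new data is available without
// ever blocking the collector. It reports whether the signal was queued.
func signalNewData(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		// A notification is already pending, so the listener will still
		// see fresh data; nothing is lost by skipping this send.
		return false
	}
}

func main() {
	ch := make(chan struct{}, 1)
	fmt.Println(signalNewData(ch)) // true: buffer was empty
	fmt.Println(signalNewData(ch)) // false: a signal is already pending
	<-ch
	fmt.Println(signalNewData(ch)) // true: buffer was drained
}
```

Since a pending signal already tells the listener to re-read the latest data, skipping the send is the expected path rather than an error, which is the point of the nit.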

	}
}

func (pm *PowerMonitor) Run(ctx context.Context) error {
	pm.logger.Info("Monitor is running...")
	go pm.collectionLoop() // NOTE: runs in background
Collaborator

There is no need to run this in a separate goroutine; its lifetime is short and it is not a loop.

Collaborator Author

@sthaha sthaha Apr 29, 2025

Its lifetime will be long as soon as we add processes, containers, and pods, right? collectionLoop runs an initial collection (time consuming) and then schedules the next one.

--- second thought ---
We can avoid the go call, since all Run does after the collection is started is wait on the context.

Collaborator

This function does the first collection, schedules the timer, and returns.

timer := pm.clock.After(pm.interval)
go func() {
	select {
	case <-timer:
Collaborator

We can get the current time from this channel and avoid calling Now() when refreshing the data.

Collaborator Author

@sthaha sthaha Apr 29, 2025

Now() should refer to the time of the run and not to when the collection started, so caching or reusing the current time isn't, IMHO, the right approach. Also, let's keep the arguments to a function to a minimum.

This refactor enhances the power monitoring system with the following improvements:

- Implement dynamic collection scheduling with configurable intervals
- Add option to disable automatic collection (interval=0) for on-demand only mode
- Use a context for clean cancellation of collection goroutines

Signed-off-by: Sunil Thaha <[email protected]>
@sthaha sthaha force-pushed the feat-monitor-with-interval branch from 7badc4a to 1b33e5b Compare April 29, 2025 07:00
@vimalk78 vimalk78 merged commit 8ab13c6 into sustainable-computing-io:reboot Apr 29, 2025
8 of 10 checks passed
Labels
feat A new feature or enhancement
2 participants