-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[RDF] Improvements for the progress bar #21181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
4dc45e4 to
56f0fa2
Compare
vepadulano
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! Some minor comments for consideration
Test Results 21 files 21 suites 3d 6h 29m 19s ⏱️ For more details on these failures, see this check. Results for commit f672564. ♻️ This comment has been updated with latest results. |
|
Thanks for these updates, this is great! One comemnt that we discussed at some point, would it be possible to add an option for the users to pass the total number of events (if they know it)? We often need to open each file before we run RDF to build the file lists and the total sum of weights for normalisation. We could also trivially collect the total number of events. |
Hello @TomasDado, yes, it hasn't been forgotten. 🙂 |
When samples are split in multiple ranges, the progress bar takes very long to find out the total number of events. By adding tree->GetEntries() (and the RNTuple equivalent), the total number of events is known as soon as a file is opened.
7a04681 to
8f0378c
Compare
This removes redundant information from the line printed to the terminal and shortens the output. Otherwise, the progress bar frequently overflows the terminal. When the progress bar prints to a file, reduce the frequency to every 10 seconds, and limit its width to 60 chars to avoid cluttering the file. Furthermore, significantly improve how completion is estimated. The following two heuristics are employed: - Files that have been opened count as fractionOfFilesAlreadyOpened * eventsProcessed/totalEvents This tracks the progress of all files for which the number of events has been seen. - Files that have not been opened count as 1/totalFiles until they have been opened. This means that the progress bar e.g. can't reach 50% if half of the files haven't been opened yet. This change significantly reduces the jumps of the progress bar when a new file is opened. Finally, the code was refactored: - Handle locking logic for updates with a single RAII, update locks to C++17. - Remove a lock and mutex that didn't have any effect. - Reduce repeated function calls to functions that hold locks. - Simplify computation of average number of events. - Relax memory order of the atomics to what's necessary. - Outline as many functions as possible. Since RDFHelpers.hxx goes through JITting and pcms, it gets compiled frequently, so outlining is probably beneficial. - Make a lot of members const to avoid unintenional modifications. This makes it clear that no mutex needs to be locked to read these. - Collect helper functions in one anonymous namespace. - Remove a constructor argument that was ignored.
8f0378c to
f672564
Compare
vepadulano
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this is very cool!
I realised that we have enough information to ensure that the progress bar doesn't jump around as new files are opened:
So far, when a file was split in ranges, RDF was using the upper end of each range to update the estimated number of events, so the completion times are wrong until the last cluster is opened. Furthermore, unopened files weren't taken into account to estimate the completion.
With this PR, by getting the number of entries from the TTree or RNT datasources, and taking into account how many files haven't been opened yet, one can significantly improve this. Now, the bar doesn't jump backwards (but it might jump a bit faster/slower when the next files opened have significantly less/more events than the previous ones).
Furthermore, I did a lot of refactoring, trying to outline as much as possible into the
.cxx, and reducing the times that locks are taken etc. (see list at the end). Lastly, when the output doesn't go to a tty, the width of the line and frequency of updates is reduced to not unnecessarily clutter the files.New look
Check the times. The first two estimates are 57s and 52s, and it completes in 56s.
It also says
+ xfor the total number of events until all files have been opened.Old look
Check again the times: For a workflow of about 1 minute, the estimated times in the first 3 updates are: 11h, 10s and 6s. I understand that this is per file, but the above seems to do a better job.
And the width when written to a file seems to be somehow arbitrary. 🙂
More details of the refactoring
through JITting and pcms, it gets compiled frequently, so outlining is
probably beneficial.
might be interesting because it's an MT context.