
Conversation

elfkuzco (Collaborator) commented Nov 1, 2025

Rationale

This PR adds support for measuring resources used by the scraper. For the CPU stats, it uses an Exponentially Weighted Moving Average (EWMA) to track the percentage of CPU used, along with the maximum CPU percentage observed.
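For illustration, a minimal sketch of such an EWMA update (the smoothing factor and function names are assumptions for this example, not necessarily what the PR uses):

```python
# Sketch only: each new CPU sample nudges the running average, and the peak
# is tracked separately. alpha=0.25 is an assumed smoothing factor.
def update_cpu_stats(
    current_percent: float,
    ewma_percent: float | None,
    max_percent: float,
    alpha: float = 0.25,
) -> tuple[float, float]:
    """Return updated (ewma, max) CPU percentages after a new sample."""
    if ewma_percent is None:
        ewma_percent = current_percent  # first sample seeds the average
    else:
        ewma_percent = alpha * current_percent + (1 - alpha) * ewma_percent
    return ewma_percent, max(max_percent, current_percent)
```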

Also, max disk usage is computed by summing the sizes of the files in the scraper's mount directory.
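A minimal sketch of that mount-size computation, assuming a plain directory walk (names are illustrative, not necessarily the PR's helpers):

```python
from pathlib import Path


def mount_disk_usage(mount_dir: Path) -> int:
    """Sum the sizes (in bytes) of regular files under the mount directory."""
    return sum(p.stat().st_size for p in mount_dir.rglob("*") if p.is_file())
```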
[Screenshots: CPU and disk usage stats as shown in the UI]

Changes

  • add functions to compute CPU and Disk stats
  • show stats in UI

This closes #1423

elfkuzco (Collaborator, Author) commented Nov 1, 2025

Implementation of disk usage is yet to be added as it would rely on the approval of docker/docker-py#3370

elfkuzco self-assigned this on Nov 2, 2025
benoit74 (Collaborator) commented Nov 3, 2025

As discussed on Slack, I propose we wait a few days for the https://github.com/docker/docker-py/ maintainers to give us an answer.

If they don't reply soon enough, I propose two possible plans:

  • consider only the mounts' size in the disk usage stat for the time being, since this is in general the "core" of disk usage, at least for big ZIMs, which are the primary concern
  • reimplement our own very limited Docker SDK for only the operations we need; this could make sense because the Python Docker SDK seems to receive little attention from Docker, plus we only use a few methods and could plug directly into the REST API, just like the SDK does, for those few methods (see the sketch after this list)
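For the second option, a rough sketch of what talking to the Engine REST API directly over the local socket could look like (the /containers/{id}/stats endpoint is part of the documented Engine API; the class and function names are illustrative, and API version negotiation and error handling are omitted):

```python
import http.client
import json
import socket


class DockerSocketConnection(http.client.HTTPConnection):
    """HTTP connection over the local Docker socket instead of TCP."""

    def __init__(self, socket_path: str = "/var/run/docker.sock"):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.connect(self.socket_path)
        self.sock = sock


def get_container_stats(container_id: str) -> dict:
    """Fetch a single (non-streaming) stats sample for a container."""
    conn = DockerSocketConnection()
    conn.request("GET", f"/containers/{container_id}/stats?stream=false")
    return json.loads(conn.getresponse().read())
```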

benoit74 (Collaborator) commented

@elfkuzco looks like I was right to be concerned about not getting any feedback on your upstream PR.

Please advise which plan B (among the two I've proposed, or one you propose yourself) makes more sense to you, so that we can move on and have the CPU measure and at least a first estimate of disk used.

elfkuzco (Collaborator, Author) commented

> consider only mounts size in disk usage stat for the time being, since this is in general the "core" of disk usage, at least for big ZIMs which are primary concern

This would be simpler to implement. Plus, given that almost everything is written to the mount point, I don't know if there's going to be any real metric obtained from the writable layer; possibly .pyc or __pycache__ files, but those really shouldn't be big enough to matter, right?

benoit74 (Collaborator) commented

Let's go for this alternative: consider only the mount point for the time being + open an issue about the fact that we might want to better track disk usage. The goal would be not only to capture the writable layer (which in general is supposed to be small, but this is not the case on all scrapers, not even speaking about bugs) but also the image size itself (which then gives a slight overestimation of disk usage, since the image is shared across tasks).
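For that follow-up issue: if I recall the Engine API docs correctly, GET /containers/{id}/json with size=1 reports SizeRw (writable layer) and SizeRootFs (writable layer plus image). A rough sketch, reusing the hypothetical DockerSocketConnection from the sketch above:

```python
def get_container_sizes(container_id: str) -> tuple[int, int]:
    """Return (writable layer size, total rootfs size) in bytes."""
    conn = DockerSocketConnection()
    conn.request("GET", f"/containers/{container_id}/json?size=1")
    details = json.loads(conn.getresponse().read())
    return details.get("SizeRw", 0), details.get("SizeRootFs", 0)
```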

elfkuzco (Collaborator, Author) commented

Updated PR description.

codecov bot commented Nov 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.39%. Comparing base (921dc27) to head (b3a9a34).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1491      +/-   ##
==========================================
+ Coverage   83.38%   83.39%   +0.01%     
==========================================
  Files          91       91              
  Lines        4399     4403       +4     
  Branches      470      470              
==========================================
+ Hits         3668     3672       +4     
  Misses        606      606              
  Partials      125      125              

☔ View full report in Codecov by Sentry.

benoit74 (Collaborator) left a comment

Small remarks, plus I need more data to confirm that the average we are computing is really close to the average CPU consumption. Since we have tasks which might run for hours, I doubt an EWMA with alpha 0.25, updated every minute, will really represent something close to the average. I might be wrong; at least I need to be convinced 😄
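To illustrate the concern with a toy example (illustrative only, not the PR's code): for a task that runs at 90% CPU for its first hour and 10% for its second, a per-minute EWMA with alpha 0.25 ends up tracking the recent load rather than the overall average.

```python
samples = [90.0] * 60 + [10.0] * 60  # one CPU % sample per minute, two hours
alpha = 0.25

ewma = samples[0]
for sample in samples[1:]:
    ewma = alpha * sample + (1 - alpha) * ewma

true_average = sum(samples) / len(samples)
print(f"EWMA: {ewma:.1f}%  true average: {true_average:.1f}%")
# EWMA ends essentially at 10%, while the true average is 50%.
```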

elfkuzco requested a review from benoit74 on November 17, 2025
benoit74 (Collaborator) left a comment

LGTM

elfkuzco merged commit 191e80c into main on Nov 18, 2025
10 checks passed
elfkuzco deleted the measure-resources branch on November 18, 2025

Linked issue: Measure and report all tasks resources usage (#1423)