Skip to content

fix: prevent partial file reads during concurrent downloads#1548

Open
nico-martin wants to merge 6 commits intomainfrom
fix/concurrent-download-file-lock
Open

fix: prevent partial file reads during concurrent downloads#1548
nico-martin wants to merge 6 commits intomainfrom
fix/concurrent-download-file-lock

Conversation

@nico-martin
Copy link
Collaborator

Fixes #1544

What

Writes model files to a unique .tmp.<pid>.<random> path first, then atomically renames to the final path once the download is complete. This ensures concurrent readers (whether in other processes or in the same process) never see a partially-written file.

Changes

  • hub/files.js → split into hub/FileResponse.js and cache/FileCache.js
  • FileCache.put() now writes to a unique temp file and renames atomically on success, or deletes on error

The main fix is this commit and this small improvement. The rest ist just some clean-up.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@xenova
Copy link
Collaborator

xenova commented Mar 1, 2026

@sroussey would be great to get your take on this PR! :)

@sroussey
Copy link
Contributor

sroussey commented Mar 3, 2026

Looks good to me!

@sroussey
Copy link
Contributor

sroussey commented Mar 3, 2026

btw... does unlink() work for directories?

@sroussey
Copy link
Contributor

sroussey commented Mar 3, 2026

from the node docs:

To get a behavior similar to the rm -rf Unix command, use fsPromises.rm() with options { recursive: true, force: true }.

https://nodejs.org/api/fs.html#fspromisesrmpath-options

This is in regards to the catch()

@nico-martin
Copy link
Collaborator Author

btw... does unlink() work for directories?

No it only works for file. But in my opinion thats correct. tmpPath will always be a file.

@xenova
Copy link
Collaborator

xenova commented Mar 3, 2026

Indeed, we only control downloading on a per-file basis, so @nico-martin's approach should be fine.

@sroussey
Copy link
Contributor

sroussey commented Mar 3, 2026

Oh yes, of course. In my minds eye, I thought it was cleaning up the folder and renaming the folder. It’s the file cache so of course it’s files. :)

async put(request, response, progress_callback = undefined) {
let filePath = path.join(this.path, request);
// Include both PID and a random suffix so that concurrent put() call within the same process (e.g. multiple pipelines loading the same file in parallel) each get their own temp file and don't corrupt each other's writes.
let tmpPath = filePath + `.tmp.${process.pid}.${Math.random().toString(36).slice(2)}`;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there ever a case where process or process.pid isn't defined? Maybe we can add a guard here.

Also, should we use our random module for this? 👀 If we do, we should use a different instance (because otherwise we could have a scenario where browser and node versions don't match)

Copy link
Collaborator

@xenova xenova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 One comment, but otherwise I think it's a solid improvment.

@sroussey have you been able to test yourself too? And verify that it fixed your issues?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Loading the same model simultaneously in multiple processes fails

4 participants