fix: prevent partial file reads during concurrent downloads #1548
nico-martin wants to merge 6 commits into main
|
@sroussey would be great to get your take on this PR! :) |
|
Looks good to me! |
|
btw... does unlink() work for directories? |
|
From the Node docs: "To get a behavior similar to the rm -rf Unix command, use fsPromises.rm() with options { recursive: true, force: true }." https://nodejs.org/api/fs.html#fspromisesrmpath-options This is in regards to the |
No, it only works for files. But in my opinion that's correct.
|
Indeed, we only control downloading on a per-file basis, so @nico-martin's approach should be fine. |
|
Oh yes, of course. In my mind's eye, I thought it was cleaning up the folder and renaming the folder. It's the file cache, so of course it's files. :)
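For reference, the difference between the two calls discussed above can be sketched as follows. This is a minimal illustration rather than code from the PR: the `demoRemoval` helper and the file names are made up for the example, and the exact error code raised by `unlink()` on a directory varies by platform.

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

// Hypothetical demo helper (not part of the PR): shows how unlink() and rm()
// behave when applied to a file versus a directory.
async function demoRemoval() {
  const root = await fs.mkdtemp(path.join(os.tmpdir(), "unlink-demo-"));
  const file = path.join(root, "model.bin.tmp");
  const dir = path.join(root, "subdir");
  await fs.writeFile(file, "partial data");
  await fs.mkdir(dir);

  // unlink() removes a single file, which is all a per-file cache needs.
  await fs.unlink(file);

  // unlink() rejects when pointed at a directory
  // (the error code depends on the platform, so it is not checked here).
  let unlinkDirFailed = false;
  try {
    await fs.unlink(dir);
  } catch {
    unlinkDirFailed = true;
  }

  // rm() with { recursive: true, force: true } is the `rm -rf` equivalent.
  await fs.rm(root, { recursive: true, force: true });

  let rootGone = false;
  try {
    await fs.access(root);
  } catch {
    rootGone = true;
  }
  return { unlinkDirFailed, rootGone };
}
```

Since the cache only ever cleans up individual temp files, `unlink()` should be sufficient there.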
async put(request, response, progress_callback = undefined) {
    let filePath = path.join(this.path, request);
    // Include both the PID and a random suffix so that concurrent put() calls within the same process
    // (e.g. multiple pipelines loading the same file in parallel) each get their own temp file
    // and don't corrupt each other's writes.
    let tmpPath = filePath + `.tmp.${process.pid}.${Math.random().toString(36).slice(2)}`;
Is there ever a case where process or process.pid isn't defined? Maybe we can add a guard here.
Also, should we use our random module for this? 👀 If we do, we should use a different instance (because otherwise we could have a scenario where the browser and Node versions don't match).
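One possible shape for such a guard is sketched below. The `makeTmpSuffix` helper and the `"nopid"` placeholder are assumptions made for illustration and do not appear in the PR. The sketch keeps `Math.random()` as in the diff, because the interface of the project's own random module is not shown here.

```javascript
// Hypothetical helper (not part of the PR): builds the ".tmp.<pid>.<random>" suffix
// while tolerating runtimes where `process` or `process.pid` is not available.
function makeTmpSuffix() {
  const pid =
    typeof process !== "undefined" && typeof process.pid === "number"
      ? process.pid
      : "nopid"; // assumed placeholder for runtimes without a PID

  // Same random component as in the diff under review.
  const random = Math.random().toString(36).slice(2);

  return `.tmp.${pid}.${random}`;
}

// Usage: append the suffix to the final path to get a per-call temp path.
const tmpPath = "/tmp/cache/model.bin" + makeTmpSuffix();
```

The random component alone already separates concurrent calls within one process. The PID mainly makes it easier to tell which process left a stray temp file behind.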
Fixes #1544
What
Writes model files to a unique .tmp.<pid>.<random> path first, then atomically renames to the final path once the download is complete. This ensures concurrent readers (whether in other processes or in the same process) never see a partially-written file.
Changes
- hub/files.js → split into hub/FileResponse.js and cache/FileCache.js
- FileCache.put() now writes to a unique temp file and renames atomically on success, or deletes on error
The main fix is this commit and this small improvement. The rest is just some clean-up.
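The write-then-rename pattern described above can be sketched roughly as follows. This is a simplified standalone illustration, not the actual `FileCache.put()` implementation: `writeFileAtomically` is a hypothetical helper, and it writes an in-memory value instead of streaming a response body.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");

// Hypothetical standalone helper (not the PR's FileCache.put()): writes `data` to a
// unique temp file next to `filePath`, then renames it into place.
async function writeFileAtomically(filePath, data) {
  // Unique per call, so concurrent writers never share a temp file.
  const tmpPath = `${filePath}.tmp.${process.pid}.${Math.random().toString(36).slice(2)}`;
  await fs.mkdir(path.dirname(filePath), { recursive: true });
  try {
    await fs.writeFile(tmpPath, data);
    // Readers only ever see the final path once the content is complete.
    // Assumption: temp file and target live on the same filesystem.
    await fs.rename(tmpPath, filePath);
  } catch (error) {
    // Best-effort cleanup of the temp file; the original error is rethrown.
    await fs.unlink(tmpPath).catch(() => {});
    throw error;
  }
}
```

Keeping the temp file in the same directory as the target is a deliberate choice in this sketch, since a rename across filesystems would not be a single-step operation.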