Improve the performance of Juicefs gc #5671
Labels
kind/feature
New feature or request
Comments
PRs are very welcome.
Yes, I'd like to create a PR to address this issue.
SonglinLife
added a commit
to ctripcloud/juicefs
that referenced
this issue
Feb 19, 2025
Improve the file deletion performance by processing multiple files in parallel ref: juicedata#5671
When dealing with a large number of files pending deletion, we typically use juicefs gc to remove them. However, the juicefs gc command invokes the scanPendingFiles function, where each soft-deleted file is processed sequentially.
juicefs/pkg/meta/tkv.go
Lines 2584 to 2594 in 2dd3897
The issue is that when there are many files that are each relatively small, the deletion speed drops significantly. Although gc provides the --threads parameter, in reality it only controls the parallelism of deletion within a single file, not across files.
I think this is how we can solve it in tkv.go: process each file pending deletion in parallel. The code is provided below. Please help me review it. :)
I also tested the change and found that it improves gc performance by 10x in the scenario of deleting small files.