gc: improve the performance of the JuiceFS gc command #5683
Improve file deletion performance by processing multiple files in parallel. Ref: juicedata#5671
I have submitted a pull request for issue #5671. It aims to improve the performance of the gc command, especially when handling a large number of small files. I would appreciate it if someone could take a moment to review it.
pkg/meta/tkv.go
Outdated
startKey := m.fmtKey("D")
endKey := nextKey(startKey)
for {
keys, values, err := m.scan(startKey, endKey, batchSize, func(k, v []byte) bool {
use client.scan directly
client.scan is a full-scan method that retrieves all pending-delete files at once, so it may not be suitable for handling a large number of small files.
The scan implementation in TiKV and FDB already handles this situation by fetching in batches, but etcd doesn’t. Let’s keep it like this then.
fully understand :)
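To illustrate the batching point discussed above, here is a minimal sketch of paging through a key range with a bounded batch size instead of loading every pending-delete entry at once. The kvStore type, its scan method, and scanInBatches are hypothetical stand-ins written for this example; they are not the JuiceFS or TiKV/FDB/etcd client APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// kvStore is a hypothetical in-memory stand-in for an ordered KV engine.
type kvStore struct {
	keys []string // must be sorted
}

// scan returns at most limit keys in the half-open range [start, end).
func (s *kvStore) scan(start, end string, limit int) []string {
	i := sort.SearchStrings(s.keys, start)
	var out []string
	for ; i < len(s.keys) && s.keys[i] < end && len(out) < limit; i++ {
		out = append(out, s.keys[i])
	}
	return out
}

// scanInBatches walks [start, end) one page at a time and passes each
// page to handle, so memory use is bounded by batchSize.
func scanInBatches(s *kvStore, start, end string, batchSize int, handle func([]string)) {
	for {
		batch := s.scan(start, end, batchSize)
		if len(batch) == 0 {
			return
		}
		handle(batch)
		if len(batch) < batchSize {
			return // a short page means the range is exhausted
		}
		// Resume just after the last key seen: appending a zero byte
		// gives the smallest string greater than that key.
		start = batch[len(batch)-1] + "\x00"
	}
}

func main() {
	s := &kvStore{keys: []string{"D1", "D2", "D3", "D4", "D5", "E1"}}
	scanInBatches(s, "D", "E", 2, func(batch []string) {
		fmt.Println(batch)
	})
}
```

An engine whose scan already pages internally would not need the outer loop; the loop matters for engines that would otherwise return the whole range in one call.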
return fmt.Errorf("invalid key %x", key)
for i := 0; i < threads; i++ {
wg.Add(1)
go func() {
If an error occurs partway through, the gc command should print it. The same applies to the Redis and SQL implementations.
cmd/gc.go
Outdated
@@ -70,6 +70,11 @@ $ juicefs gc redis://localhost --delete`,
Value: 10,
Usage: "number threads to delete leaked objects",
},
&cli.IntFlag{
Name: "cleanup-threads",
I think we can reuse the threads for cleanup
OK, I'll use maxDeletes both to delete objects and to handle pending-delete files.
deleteFileChan := make(chan redis.Z, threads)
var wg sync.WaitGroup

for i := 0; i < threads; i++ {
Can we move this part into base.go to reduce the duplicated code?
43aac94 to 573111c
Improve the file deletion performance by processing multiple files in parallel
ref: #5671