Efficiently handle sparse files in StorageBackend::get_into #527

Open
asomers opened this issue Dec 16, 2024 · 0 comments

asomers commented Dec 16, 2024

StorageBackend::get_into is the main way to read a file. Currently, it simply reads the entire file from the storage backend and copies it to the Writer. That's inefficient for sparse files, because every hole gets read from the backend as a run of zeros. It would be better if get_into understood sparse files and synthesized the zeros itself. This would require:

  • Adding an lseek method to the trait
  • Implementing the lseek method for the FS backend
  • Within get_into, iterating through the file using lseek with SEEK_HOLE and SEEK_DATA
    • Copying data from the storage backend for every data region
    • Synthesizing zeros for every hole region
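
The steps above could look roughly like the following. This is only a sketch: SparseSource, seek_data, seek_hole, read_at, copy_sparse and MockSparse are hypothetical names invented for illustration, not the real StorageBackend API. It is also synchronous and uses std::io::Write in place of the actual Writer. The in-memory MockSparse stands in for an FS backend that would call lseek with SEEK_DATA and SEEK_HOLE.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the lseek-style method proposed above.
/// All names and signatures here are assumptions, not the project's API.
pub trait SparseSource {
    /// Total file length in bytes.
    fn len(&self) -> u64;
    /// Like SEEK_DATA: first data offset at or after `offset`, or None if
    /// only a hole remains before end of file.
    fn seek_data(&self, offset: u64) -> io::Result<Option<u64>>;
    /// Like SEEK_HOLE: first hole offset at or after `offset`; end of file
    /// counts as a hole.
    fn seek_hole(&self, offset: u64) -> io::Result<u64>;
    /// Read up to `buf.len()` bytes starting at `offset`.
    fn read_at(&self, offset: u64, buf: &mut [u8]) -> io::Result<usize>;
}

/// Synthesize `count` zero bytes without touching the backend.
fn write_zeros<W: Write>(w: &mut W, mut count: u64) -> io::Result<()> {
    let zeros = [0u8; 4096];
    while count > 0 {
        let n = count.min(zeros.len() as u64) as usize;
        w.write_all(&zeros[..n])?;
        count -= n as u64;
    }
    Ok(())
}

/// Copy the whole file to `w`, reading only data regions from `src`.
pub fn copy_sparse<S: SparseSource, W: Write>(src: &S, w: &mut W) -> io::Result<u64> {
    let len = src.len();
    let mut pos = 0u64;
    let mut buf = [0u8; 4096];
    while pos < len {
        // Hole region: from `pos` up to the next data byte (or end of file).
        let data_start = src.seek_data(pos)?.map_or(len, |off| off.clamp(pos, len));
        write_zeros(w, data_start - pos)?;
        pos = data_start;
        if pos >= len {
            break;
        }
        // Data region: from `pos` up to the next hole.
        let data_end = src.seek_hole(pos)?.min(len);
        if data_end <= pos {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "no progress"));
        }
        while pos < data_end {
            let want = (data_end - pos).min(buf.len() as u64) as usize;
            let n = src.read_at(pos, &mut buf[..want])?;
            if n == 0 {
                return Err(io::ErrorKind::UnexpectedEof.into());
            }
            w.write_all(&buf[..n])?;
            pos += n as u64;
        }
    }
    Ok(len)
}

/// In-memory mock: sorted, non-overlapping (start, bytes) data regions.
pub struct MockSparse {
    pub len: u64,
    pub extents: Vec<(u64, Vec<u8>)>,
}

impl SparseSource for MockSparse {
    fn len(&self) -> u64 {
        self.len
    }
    fn seek_data(&self, offset: u64) -> io::Result<Option<u64>> {
        let hit = self.extents.iter().find(|(s, d)| offset < s + d.len() as u64);
        Ok(hit.map(|(s, _)| offset.max(*s)))
    }
    fn seek_hole(&self, offset: u64) -> io::Result<u64> {
        let hit = self
            .extents
            .iter()
            .find(|(s, d)| *s <= offset && offset < s + d.len() as u64);
        Ok(hit.map_or(offset, |(s, d)| s + d.len() as u64))
    }
    fn read_at(&self, offset: u64, buf: &mut [u8]) -> io::Result<usize> {
        for (s, d) in &self.extents {
            if *s <= offset && offset < s + d.len() as u64 {
                let from = (offset - s) as usize;
                let n = buf.len().min(d.len() - from);
                buf[..n].copy_from_slice(&d[from..from + n]);
                return Ok(n);
            }
        }
        Ok(0)
    }
}
```

With a MockSparse of length 20 holding data at offsets 4..8 and 12..14, copy_sparse calls read_at only for those six bytes and fills the remaining fourteen with synthesized zeros.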

Alternatively, this change could be made entirely within the FS backend by returning a custom object that implements tokio::io::AsyncRead but is hole-aware, and synthesizes zeros as needed.
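
A rough sketch of that alternative, with a synchronous std::io::Read standing in for tokio::io::AsyncRead to keep it dependency-free. An AsyncRead version would presumably carry the same state and do the same work inside poll_read. HoleAwareReader and its fields are hypothetical, and the data extents are assumed to have been discovered beforehand (for example with SEEK_DATA and SEEK_HOLE).

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical hole-aware reader. `extents` are sorted, non-overlapping
/// [start, end) data regions, assumed to be known in advance; everything
/// else below `len` is treated as a hole.
pub struct HoleAwareReader<R> {
    inner: R,
    extents: Vec<(u64, u64)>,
    len: u64,
    pos: u64,
}

impl<R: Read + Seek> HoleAwareReader<R> {
    pub fn new(inner: R, extents: Vec<(u64, u64)>, len: u64) -> Self {
        Self { inner, extents, len, pos: 0 }
    }
}

impl<R: Read + Seek> Read for HoleAwareReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pos >= self.len || buf.is_empty() {
            return Ok(0);
        }
        // First extent that ends after the current position, if any.
        let next = self.extents.iter().copied().find(|&(_, end)| end > self.pos);
        let n = match next {
            Some((start, end)) if start <= self.pos => {
                // Inside a data region: read from the underlying file.
                let left = end.min(self.len) - self.pos;
                let want = left.min(buf.len() as u64) as usize;
                self.inner.seek(SeekFrom::Start(self.pos))?;
                self.inner.read(&mut buf[..want])?
            }
            other => {
                // Inside a hole: synthesize zeros up to the next data
                // region or end of file, without touching `inner`.
                let hole_end = other.map_or(self.len, |(start, _)| start.min(self.len));
                let n = (hole_end - self.pos).min(buf.len() as u64) as usize;
                buf[..n].fill(0);
                n
            }
        };
        self.pos += n as u64;
        Ok(n)
    }
}
```

The appeal of this design is that get_into and the backend trait stay unchanged. The cost is that only the FS backend benefits, and any other backend would need its own hole-aware reader.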
