
Can I use JuiceFS for storing sparse files? #5675

Open
liyimeng opened this issue Feb 18, 2025 · 6 comments
Labels
kind/question Further information is requested

Comments

liyimeng commented Feb 18, 2025

I am wondering if JuiceFS is a good fit for storing sparse files.
According to #2637, it seems JuiceFS has limited support for sparse-file features. If I put a sparse file into JuiceFS, will JuiceFS upload the file at its logical size to the backend storage, or only its physical size? In other words, does JuiceFS fill the storage with lots of zeros, or skip them for efficiency? If it is the latter case, what happens to the usage accounting? Can JuiceFS correctly calculate the real usage?
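For context, the logical-vs-physical distinction the question hinges on can be seen with a quick sketch on a local filesystem using Python's `os.stat` (whether a JuiceFS mount reports the same split is exactly what is being asked here):

```python
import os
import tempfile

# Create a 10 MiB sparse file: seek past a hole, write a single byte.
fd, path = tempfile.mkstemp()
os.lseek(fd, 10 * 1024 * 1024 - 1, os.SEEK_SET)
os.write(fd, b"x")
os.close(fd)

st = os.stat(path)
logical = st.st_size            # the "virtual" size: 10 MiB
physical = st.st_blocks * 512   # bytes actually allocated on disk
print(logical, physical)

os.remove(path)
```

On a hole-aware local filesystem `physical` is a few KiB while `logical` is the full 10 MiB; `du` vs `du --apparent-size` show the same two numbers.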

@liyimeng liyimeng added the kind/question Further information is requested label Feb 18, 2025
@liyimeng
Author

Sorry, I was in a bit of a rush pushing out the question. The issue I linked mentions a similar problem in GlusterFS, which deals exactly with sparse files.

So my question becomes:

if #3898 is merged, will JuiceFS be able to upload to backend storage, like S3, efficiently? I got the impression from this comment that GNU tools can be affected if we use cp on a file inside a mounted JuiceFS. But what I really care about is the path between JuiceFS and the backend storage, where bandwidth and storage efficiency matter.

Any insights or suggestions?

Thanks a lot in advance!
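For what it's worth, the cp-side behavior discussed here comes down to `lseek` with `SEEK_DATA`/`SEEK_HOLE`. A minimal sketch of how a copier enumerates data extents (Linux-specific `os.SEEK_DATA`/`os.SEEK_HOLE`; on a filesystem without hole support, the kernel's generic fallback reports the whole file as one data extent):

```python
import errno
import os
import tempfile

def data_extents(fd):
    """Yield (offset, length) for the data regions of an open file,
    skipping holes via SEEK_DATA / SEEK_HOLE."""
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no more data past this offset
                return
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield (start, end - start)
        offset = end

# Demo: a 1 MiB hole followed by 4 KiB of data.
fd, path = tempfile.mkstemp()
os.lseek(fd, 1024 * 1024, os.SEEK_SET)
os.write(fd, b"x" * 4096)
extents = list(data_extents(fd))
os.close(fd)
os.remove(path)
print(extents)
```

A hole-aware cp only reads and writes these extents, which is why a FUSE filesystem that doesn't implement `SEEK_HOLE`/`SEEK_DATA` forces it back to a full scan.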

liyimeng commented Feb 19, 2025

I also observed some differences between the local-fs and S3 backends:

  • When using the du command, the local-fs backend reports a sparse file's physical size properly, while the S3 backend always reports the virtual (logical) size.
  • The first attempt to copy a sparse file into JuiceFS always results in a corrupted file; it must be deleted and copied again. With the S3 backend, verification of the copied file always fails and it is treated as corrupted in my case.

Why do different backends make such a big difference?

@liyimeng
Author

Another finding: JuiceFS seems to handle sparse files dramatically differently depending on whether the file is in qcow2 or raw format. What could cause such a difference?

@jiefenghuang
Contributor

Sorry, I was in a bit of a rush pushing out the question. The issue I linked mentions a similar problem in GlusterFS, which deals exactly with sparse files.

So my question becomes:

if #3898 is merged, will JuiceFS be able to upload to backend storage, like S3, efficiently? I got the impression from this comment that GNU tools can be affected if we use cp on a file inside a mounted JuiceFS. But what I really care about is the path between JuiceFS and the backend storage, where bandwidth and storage efficiency matter.

Any insights or suggestions?

Thanks a lot in advance!

For now, according to #2637, JuiceFS doesn't support seek_hole/seek_data for sparse-file copies (e.g. the cp command).
Implementing a complete lseek (supporting SEEK_HOLE, SEEK_DATA) may not be suitable for general scenarios. However, optimizations could be considered during write operations. For example, even for PLAIN_SCANTYPE scan-based copies, enabling zero-block verification in specific directories could skip the copying of such data blocks, reducing unnecessary write overhead. This way, S3 would not need to store these zero blocks.
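A rough illustration of the zero-block-verification idea described above (the function name and the 4 MiB block size are assumptions for the sketch, not JuiceFS's actual write path):

```python
# Assumed block size for the sketch; JuiceFS splits chunks into blocks
# of up to 4 MiB before uploading them to object storage.
BLOCK_SIZE = 4 * 1024 * 1024

def blocks_to_upload(data: bytes):
    """Split data into fixed-size blocks and skip those that are all
    zeros. Yields (index, block) pairs for blocks the backend must
    store; all-zero blocks are left implicit, so S3 never sees them."""
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        if block.count(0) == len(block):  # entirely zero: skip the PUT
            continue
        yield (i // BLOCK_SIZE, block)

# Demo: a 12 MiB buffer where only the middle block contains data.
buf = bytearray(12 * 1024 * 1024)
buf[5 * 1024 * 1024] = 1
kept = [idx for idx, _ in blocks_to_upload(bytes(buf))]
print(kept)  # → [1]: blocks 0 and 2 are never uploaded
```

The per-block zero check is what makes this "too expensive for general scenarios": every write pays a full scan of the block even when the data is dense.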

@liyimeng
Author

@jiefenghuang So #3898 does have value: even if tools like cp don't fully benefit, depending on the backend storage, communication between FUSE and the backend storage is significantly reduced. Why don't we just get it merged?

@jiefenghuang
Contributor

@jiefenghuang So #3898 does have value: even if tools like cp don't fully benefit, depending on the backend storage, communication between FUSE and the backend storage is significantly reduced. Why don't we just get it merged?

It is too expensive for general scenarios; fyi #3924
