
0-byte file is the result of copying a file to itself with DVCFileSystem.get_file with any file larger than COPY_PBAR_MIN_SIZE #318

@adamliter

Bug report

If you use DVCFileSystem's get_file method to copy a file to itself, you'll get a 0-byte file if the file size is greater than COPY_PBAR_MIN_SIZE. However, if the file size is less than COPY_PBAR_MIN_SIZE, you'll get the original file back.

You end up with a 0-byte file because of this code here, in dvc_objects.fs.utils.copyfile.
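One plausible mechanism can be reproduced with the standard library alone. This is a minimal sketch, assuming the large-file path opens the destination in write mode before reading from the source. `naive_chunked_copy` and `demo_self_copy_size` are hypothetical names for illustration, not the dvc-objects code:

```python
import os
import tempfile


def naive_chunked_copy(src, dest, chunk_size=1024 * 1024):
    """Hypothetical stand-in for a chunked copy loop (not the dvc-objects code)."""
    # Opening dest with "wb" truncates it immediately. If dest is the same
    # path as src, the source is already empty before the first read.
    with open(src, "rb") as fsrc, open(dest, "wb") as fdest:
        while True:
            buf = fsrc.read(chunk_size)
            if not buf:
                break
            fdest.write(buf)


def demo_self_copy_size(n_bytes=4096):
    """Write n_bytes to a temp file, copy it onto itself, return the final size."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.ckpt")
        with open(path, "wb") as f:
            f.write(b"x" * n_bytes)
        naive_chunked_copy(path, path)
        return os.path.getsize(path)


print(demo_self_copy_size())
```

The file ends up empty regardless of its original size, which matches the 0-byte result reported for files above COPY_PBAR_MIN_SIZE.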

Current behavior

$ cd /tmp
$ mkdir dvc-test
$ cd dvc-test
$ pdm init --python cpython@3.12
$ git init
$ dvc init
$ git add .
$ git commit -m "initial commit"
$ truncate -s 2G model.ckpt
$ dvc add model.ckpt
$ git add .
$ git commit -m "trained model"
$ ls -lh model.ckpt
-rw-r--r--@ 1 adam.liter  wheel   2.0G Dec  9 17:21 model.ckpt

Then from Python (e.g., pdm run python):

from dvc.api import DVCFileSystem
fs = DVCFileSystem()
fs.get_file("model.ckpt", "model.ckpt")

Now go back to a shell and check the file size:

$ ls -lh model.ckpt
-rw-r--r--@ 1 adam.liter  wheel     0B Dec  9 17:25 model.ckpt

Expected behavior

The behavior of dvc_objects.fs.utils.copyfile should be the same for all files, regardless of file size. In particular, copying a file to itself when the file size is greater than COPY_PBAR_MIN_SIZE should not result in a 0-byte file.
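One way to get size-independent behavior is to check whether source and destination are the same file before the destination is ever opened for writing. The following is only a sketch of that idea, not a proposed patch: `guarded_copyfile` and `demo_guarded_self_copy` are hypothetical names, and raising `shutil.SameFileError` (rather than silently doing nothing) is an assumption about the desired behavior.

```python
import os
import shutil
import tempfile


def guarded_copyfile(src, dest, chunk_size=1024 * 1024):
    """Hypothetical helper (not the dvc-objects implementation)."""
    # Check identity BEFORE opening dest: open(dest, "wb") would truncate it.
    if os.path.exists(dest) and os.path.samefile(src, dest):
        raise shutil.SameFileError(f"{src!r} and {dest!r} are the same file")
    with open(src, "rb") as fsrc, open(dest, "wb") as fdest:
        shutil.copyfileobj(fsrc, fdest, chunk_size)


def demo_guarded_self_copy(n_bytes=4096):
    """Attempt a self-copy and return (raised, size_after)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.ckpt")
        with open(path, "wb") as f:
            f.write(b"x" * n_bytes)
        try:
            guarded_copyfile(path, path)
            raised = False
        except shutil.SameFileError:
            raised = True
        return raised, os.path.getsize(path)


print(demo_guarded_self_copy())
```

With the check in place, the self-copy is rejected and the file keeps its original contents, whatever its size. Whether a self-copy should raise or be a no-op is a design choice for the maintainers.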
