Bug Report
Description
If your repository contains a git submodule and a dvc stage executes a file from that submodule, producing an output in the project's root folder, dvc exp run does not run the experiment successfully but leaves a cluttered/broken git repository behind.
Reproduce
- Create a local git repository and initialize it with dvc.
- Create a second git repository. In my case it contains a simple script, run.py, with the content:
import os

path = "model"
filename = "output.txt"
if not os.path.exists(path):
    os.makedirs(path)
with open(os.path.join(path, filename), 'wb') as temp_file:
    temp_file.write(b"Output")
to create a "fake" output.
- Set up your dvc.yaml pipeline as:
stages:
  run-test:
    cmd: python external/run.py
    deps:
      - external/
    outs:
      - model/output.txt
- Add the second repository to the main repository as a submodule with
git submodule add <my-subrepo-url> external/
- Commit the added submodule and all changes to your local main repository
- Run
dvc exp run
Reproducing experiment 'meaty-teff'
Building workspace index
Comparing indexes
Applying changes
Stage 'run-test' didn't change, skipping
ERROR: unexpected error - invalid data in index - invalid entry
After termination, the workspace is heavily cluttered by git changes like:
new file: .git/objects/ff/b6ed23131c48887b35e4e0d9e9bb8954b547bb
new file: .git/objects/ff/b81afb96b85488c92dd2a4ce0c7ebf68f533f6
new file: .git/objects/ff/b87e48727d95d793d8bb17ed20bc847107339f
new file: .git/objects/ff/b8e5f7ed10543ab4940212978c3eea6dd6d19f
new file: .git/objects/ff/b999f452746ae926fe6592eb4fc499804072c9
new file: .git/objects/ff/ba93b2cb261ac8c0235c4608cb6d0e79087078
new file: .git/objects/ff/bc49cd878675079f05cc65d0ebfa42590d52d3
new file: .git/objects/ff/bd5f4a623f5a0dd293e555e73f2f6c17be9cb2
new file: .git/objects/ff/bdc6ad64e18a10a82e7ea828fbdb2ed1c4fab6
new file: .git/objects/ff/c0207458f1ea7d2b70e9f4ba6d81107ac32147
new file: .git/objects/ff/c1d5324ceae9a256cd0a5180ed4b05012eb26c
new file: .git/objects/ff/c1dc8332b015572fe01371fde74173a6087aaf
new file: .git/objects/ff/c629a3e3484340fc326324889e4aa705164517
new file: .git/objects/ff/c7535b45a35b7e39f18332bdbf2e722a3104a6
new file: .git/objects/ff/c79bd1e5b4f91c81a40816d090880962d4a746
As well as changes like
modified: .git/HEAD
modified: .git/index
modified: .git/logs/HEAD
deleted: .git/refs/exps/exec/EXEC_BASELINE
deleted: .git/refs/exps/exec/EXEC_MERGE
- See the full stacktrace attached at the end
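The file-creation part of the steps above could be scripted roughly as follows. This is only a sketch: the helper name write_repro_files and its two directory arguments are hypothetical, the git add and git commit lines are my assumption of how the commit step is done, and the commands are returned as strings rather than executed, so they still have to be run by hand in the main repository (with <my-subrepo-url> filled in, and the second repository initialized and committed separately).

```python
from pathlib import Path

# Content of run.py as given in the report (lives in the second repository).
RUN_PY = '''import os

path = "model"
filename = "output.txt"
if not os.path.exists(path):
    os.makedirs(path)
with open(os.path.join(path, filename), 'wb') as temp_file:
    temp_file.write(b"Output")
'''

# Content of dvc.yaml as given in the report (lives in the main repository).
DVC_YAML = """stages:
  run-test:
    cmd: python external/run.py
    deps:
      - external/
    outs:
      - model/output.txt
"""


def write_repro_files(main_repo: Path, sub_repo: Path) -> list[str]:
    """Write run.py and dvc.yaml; return the commands left to run manually."""
    main_repo.mkdir(parents=True, exist_ok=True)
    sub_repo.mkdir(parents=True, exist_ok=True)
    (sub_repo / "run.py").write_text(RUN_PY)
    (main_repo / "dvc.yaml").write_text(DVC_YAML)
    # Commands for the main repository only; nothing is executed here.
    return [
        "git init",
        "dvc init",
        "git submodule add <my-subrepo-url> external/",
        "git add -A",  # assumption: stage everything before committing
        'git commit -m "Add submodule and pipeline"',  # assumed message
        "dvc exp run",
    ]
```

Calling write_repro_files(Path("main"), Path("sub")) creates both directories if needed and leaves the placeholder URL untouched.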
Expected
This setup used to work in the past. I recently reactivated a training pipeline with a setup like this and found it broken. dvc exp run should execute the job and successfully add the experiment to the git refs for further consumption.
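To tell the broken state from the expected one, a small check on the status listing may help. The sketch below assumes status lines in the "label: path" form quoted under Reproduce; the function name git_internal_paths is hypothetical. In the expected case it should return an empty list, because nothing under .git/ should ever appear as a workspace change.

```python
def git_internal_paths(status_lines):
    """Return the paths under .git/ found in 'label: path' status lines."""
    hits = []
    for line in status_lines:
        # Split e.g. 'new file: .git/objects/ff/...' into label and path.
        _label, sep, path = line.strip().partition(":")
        path = path.strip()
        if sep and path.startswith(".git/"):
            hits.append(path)
    return hits


sample = [
    "new file: .git/objects/ff/b6ed23131c48887b35e4e0d9e9bb8954b547bb",
    "modified: .git/HEAD",
    "modified: dvc.yaml",
]
suspicious = git_internal_paths(sample)
```

Here suspicious contains the two .git/ entries and leaves out dvc.yaml.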
Environment information
Tested on Ubuntu 24.04 with Python 3.10, and on Ubuntu 24.04 under WSL2 on Windows with Python 3.12.
Output of dvc doctor:
DVC version: 3.61.0 (pip)
-------------------------
Platform: Python 3.12.3 on Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
Subprojects:
    dvc_data = 3.16.10
    dvc_objects = 5.1.1
    dvc_render = 1.0.2
    dvc_task = 0.40.2
    scmrepo = 3.4.0
Supports:
    http (aiohttp = 3.12.15, aiohttp-retry = 2.9.1),
    https (aiohttp = 3.12.15, aiohttp-retry = 2.9.1),
    s3 (s3fs = 2025.7.0, boto3 = 1.39.8)
Config:
    Global: /home/hemker/.config/dvc
    System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdd
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/sdd
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/cf372b30375dfe9c40a81de6a73671ef
Additional Information (if any):
Hint: I modified the scmrepo files to temporarily exclude the pygit2 backend. This resolves the error, as scmrepo then iterates over the remaining dulwich and gitpython backends:
File changed: lib/python3.12/site-packages/scmrepo/git/__init__.py
class GitBackends(Mapping):
    DEFAULT: ClassVar[dict[str, BackendCls]] = {
        "dulwich": DulwichBackend,
        #"pygit2": Pygit2Backend, <-------
        "gitpython": GitPythonBackend,
    }
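Why removing one entry helps can be illustrated with a simplified model of backend fallback. This is not scmrepo's actual code: the three functions are stand-ins, BackendError stands in for pygit2's GitError, and the assumption that the dulwich backend does not implement reset (so the call falls through to the next backend) is mine.

```python
class BackendError(Exception):
    """Stand-in for a backend-specific failure such as pygit2's GitError."""


def dulwich_reset():
    # Assumption: this backend does not implement the operation.
    raise NotImplementedError


def pygit2_reset():
    # Mirrors the failure message from the report.
    raise BackendError("invalid data in index - invalid entry")


def gitpython_reset():
    return "reset via gitpython"


def call_first_supported(backends):
    """Try backends in order; skip only those raising NotImplementedError."""
    for _name, func in backends.items():
        try:
            return func()
        except NotImplementedError:
            continue
    raise NotImplementedError("no backend supports this operation")


DEFAULT = {
    "dulwich": dulwich_reset,
    "pygit2": pygit2_reset,
    "gitpython": gitpython_reset,
}
# Equivalent of commenting out the pygit2 entry.
patched = {name: func for name, func in DEFAULT.items() if name != "pygit2"}
```

In this model, call_first_supported(DEFAULT) raises BackendError because a real error is not treated as "unsupported", whereas call_first_supported(patched) reaches the gitpython stand-in.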
Output of dvc exp run -vvv:
Traceback (most recent call last):
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/cli/__init__.py", line 211, in main
ret = cmd.do_run()
^^^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/cli/command.py", line 30, in do_run
return self.run()
^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/commands/experiments/run.py", line 14, in run
self.repo.experiments.run(
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/repo/experiments/__init__.py", line 354, in run
return run(self.repo, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/repo/experiments/run.py", line 77, in run
return repo.experiments.reproduce_one(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/repo/experiments/__init__.py", line 126, in reproduce_one
results = self._reproduce_queue(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/repo/experiments/utils.py", line 62, in wrapper
ret = f(exp, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/repo/experiments/__init__.py", line 249, in _reproduce_queue
exec_results = queue.reproduce(copy_paths=copy_paths, message=message)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/repo/experiments/queue/workspace.py", line 93, in reproduce
self._reproduce_entry(
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/repo/experiments/queue/workspace.py", line 137, in _reproduce_entry
executor.cleanup(infofile)
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/dvc/repo/experiments/executor/local.py", line 251, in cleanup
with self._detach_stack:
File "/usr/lib/python3.12/contextlib.py", line 610, in __exit__
raise exc_details[1]
File "/usr/lib/python3.12/contextlib.py", line 595, in __exit__
if cb(*exc_details):
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
next(self.gen)
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/scmrepo/git/__init__.py", line 468, in detach_head
self.reset()
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/scmrepo/git/__init__.py", line 308, in _backend_func
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/scmrepo/git/backend/pygit2/__init__.py", line 868, in reset
self.repo.index.read(False)
^^^^^^^^^^^^^^^
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/pygit2/repository.py", line 650, in index
check_error(err, io=True)
File "/home/hemker/dev/submodule-test/env/lib/python3.12/site-packages/pygit2/errors.py", line 66, in check_error
raise GitError(message)
_pygit2.GitError: invalid data in index - invalid entry