Commit 65f0eca

filter-repo: updates and minor fixes in option help and README
Signed-off-by: Elijah Newren <[email protected]>
1 parent 6d231c0 commit 65f0eca

2 files changed: +74 -47 lines changed

README.md (+64 -41)
@@ -3,16 +3,17 @@ git filter-repo is a versatile tool for rewriting history, which includes
 else](#design-rationale-behind-filter-repo-why-create-a-new-tool). It
 roughly falls into the same space of tool as [git
 filter-branch](https://git-scm.com/docs/git-filter-branch) but without the
-[capitulation-inducing poor
-performance](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/),
-and with a design that scales usability-wise beyond trivial rewriting
-cases.
+capitulation-inducing poor
+[performance](https://public-inbox.org/git/CABPp-BGOz8nks0+Tdw5GyGqxeYR-3FF6FT5JcgVqZDYVRQ6qog@mail.gmail.com/),
+with far more capabilities, and with a design that scales usability-wise
+beyond trivial rewriting cases.
 
 While most users will probably just use filter-repo as a simple command
 line tool (and likely only use a few of its flags), at its core filter-repo
 contains a library for creating history rewriting tools. As such, users
-with specialized needs can leverage it to quickly create entirely new
-history rewriting tools.
+with specialized needs can leverage it to quickly create [entirely new
+history rewriting
+tools](contrib/filter-repo-demos).
 
 filter-repo is a single-file python script, depending only on the python
 standard library (and execution of git commands), all of which is designed
@@ -88,10 +89,21 @@ By contrast, filter-branch comes with a pile of caveats (more on that
 below) even once you figure out the necessary invocation(s):
 
 ```shell
-git filter-branch --tree-filter 'mkdir -p my-module && git ls-files | grep -v ^src/ | xargs git rm -f -q && ls -d * | grep -v my-module | xargs -I files mv files my-module/' --tag-name-filter 'echo "my-module-$(cat)"' --prune-empty -- --all
+git filter-branch \
+    --tree-filter 'mkdir -p my-module && \
+                   git ls-files \
+                       | grep -v ^src/ \
+                       | xargs git rm -f -q && \
+                   ls -d * \
+                       | grep -v my-module \
+                       | xargs -I files mv files my-module/' \
+    --tag-name-filter 'echo "my-module-$(cat)"' \
+    --prune-empty -- --all
 git clone file://$(pwd) newcopy
 cd newcopy
-git for-each-ref --format="delete %(refname)" refs/tags/ | grep -v refs/tags/my-module- | git update-ref --stdin
+git for-each-ref --format="delete %(refname)" refs/tags/ \
+    | grep -v refs/tags/my-module- \
+    | git update-ref --stdin
 git gc --prune=now
 ```
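
Reviewer note: the tag cleanup at the end of this recipe is easy to misread, so here is a rough Python sketch of what the `for-each-ref | grep -v | update-ref --stdin` pipeline computes. The function name `tag_delete_commands` and the sample ref names are made up for illustration; they are not part of filter-branch or filter-repo.

```python
def tag_delete_commands(refnames, keep_marker="refs/tags/my-module-"):
    """Return `git update-ref --stdin` delete lines for unwanted tags.

    Mirrors `grep -v refs/tags/my-module-`: any ref whose name contains
    the marker is kept, and every other ref gets a "delete" line.
    """
    return [
        "delete " + name
        for name in refnames
        if keep_marker not in name
    ]


# Hypothetical ref names, as `git for-each-ref refs/tags/` might list them.
refs = ["refs/tags/v1.0", "refs/tags/my-module-v1.0"]
commands = tag_delete_commands(refs)
```

In this sketch only the original, unprefixed tag ends up in the delete list, which is the intent of the shell pipeline: the tags renamed by `--tag-name-filter` survive and the old names are removed.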
@@ -100,10 +112,23 @@ slow due to using --tree-filter; you could alternatively use the
 --index-filter option of filter-branch, changing the above commands to:
 
 ```shell
-git filter-branch --index-filter 'git ls-files | grep -v ^src/ | xargs git rm -q --cached; git ls-files -s | sed "s-$(printf \\t)-&my-module/-" | git update-index --index-info; git ls-files | grep -v ^my-module/ | xargs git rm -q --cached' --tag-name-filter 'echo "my-module-$(cat)"' --prune-empty -- --all
+git filter-branch \
+    --index-filter 'git ls-files \
+                        | grep -v ^src/ \
+                        | xargs git rm -q --cached;
+                    git ls-files -s \
+                        | sed "s%$(printf \\t)%&my-module/%" \
+                        | git update-index --index-info;
+                    git ls-files \
+                        | grep -v ^my-module/ \
+                        | xargs git rm -q --cached' \
+    --tag-name-filter 'echo "my-module-$(cat)"' \
+    --prune-empty -- --all
 git clone file://$(pwd) newcopy
 cd newcopy
-git for-each-ref --format="delete %(refname)" refs/tags/ | grep -v refs/tags/my-module- | git update-ref --stdin
+git for-each-ref --format="delete %(refname)" refs/tags/ \
+    | grep -v refs/tags/my-module- \
+    | git update-ref --stdin
 git gc --prune=now
 ```
@@ -135,7 +160,10 @@ new and old history before pushing somewhere. Other caveats:
 three times faster than the --tree-filter version, but both
 filter-branch commands are going to be multiple orders of magnitude
 slower than filter-repo.
-
+* Both commands assume all filenames are composed entirely of regular
+ascii characters (even special ascii characters such as tabs or
+double quotes will wreak havoc and likely result in missing files
+or misnamed files)
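
Reviewer note: one way the new caveat can bite, sketched in Python. The sketch assumes that `git ls-files -s`, when run without `-z`, prints a path containing a tab or double quote in C-style quoted form, and it models the `sed` step of the `--index-filter` recipe as "insert the prefix after the first tab". The helper name `add_prefix` and the sample index lines are hypothetical.

```python
def add_prefix(index_line: bytes, prefix: bytes = b"my-module/") -> bytes:
    """Insert prefix after the first tab, like the sed expression does."""
    return index_line.replace(b"\t", b"\t" + prefix, 1)


fake_sha = b"a" * 40  # placeholder object id, not a real blob

# An ordinary path: "<mode> <sha> <stage><TAB><path>".
plain = b"100644 " + fake_sha + b" 0\tsrc/main.c"

# A path containing a tab, shown the way git is assumed to quote it:
# wrapped in double quotes, with the tab spelled as backslash + "t".
quoted = b"100644 " + fake_sha + b' 0\t"src/odd\\tname.c"'

prefixed_plain = add_prefix(plain)
prefixed_quoted = add_prefix(quoted)
```

For the ordinary path the prefix lands where intended. For the quoted path the prefix lands in front of the opening double quote, so the entry no longer looks like a quoted path and the quote characters and backslash escape would be taken literally, which is consistent with the "misnamed files" outcome the caveat describes.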
 
 ## Design rationale behind filter-repo (why create a new tool?)
 
@@ -642,7 +670,7 @@ that filter-repo uses
 [bytestrings](https://docs.python.org/3/library/stdtypes.html#bytes)
 everywhere instead of strings.
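
Reviewer note: since callbacks see bytestrings, comparing a field against an ordinary `str` literal never matches in Python 3. A small illustration follows; `tagger_name` is a stand-in value rather than a real filter-repo object, and decoding as UTF-8 is an assumption.

```python
# Stand-in for a bytestring field handed to a callback.
tagger_name = b"Jim Williams"

matches_str = tagger_name == "Jim Williams"     # bytes compared with str
matches_bytes = tagger_name == b"Jim Williams"  # bytes compared with bytes

# Decode explicitly when text is needed; UTF-8 is assumed here.
as_text = tagger_name.decode("utf-8")
```

The first comparison is silently `False` rather than an error, which is why a callback written with plain string literals can appear to run fine while never matching anything.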
 
-There are three callbacks that allow you to operate directly on raw
+There are four callbacks that allow you to operate directly on raw
 objects that contain data that's easy to write in [fast-import(1)
 format](https://git-scm.com/docs/git-fast-import#_input_format):
 ```
@@ -758,7 +786,7 @@ An example of each:
 
 ```shell
 git filter-repo --tag-callback '
-  if tag.tagger_name == "Jim Williams":
+  if tag.tagger_name == b"Jim Williams":
     # Omit this tag
     tag.skip()
   else:
@@ -788,14 +816,13 @@ An example of each:
 
 ### Using filter-repo as a library
 
-git-filter-repo can also be imported as a library in Python, allowing
-for further flexibility. Some [simple
-examples](https://github.com/newren/git-filter-repo/tree/master/t/t9391)
-exist in the testsuite. For this to work, the symlink to
-git-filter-repo named git_filter_repo.py either needs to have been
-installed in your $PYTHONPATH, or you need to create a symlink to (or
-a copy of) git-filter-repo named git_filter_repo.py and stick it in
-your $PYTHONPATH.
+git-filter-repo can also be imported as a library in Python, allowing for
+further flexibility. [Both trivial and involved
+examples](contrib/filter-repo-demos) are provided for reference ([the
+testsuite](t/t9391) has a few more examples as well). For any of these
+examples to work, a symlink to (or copy of) git-filter-repo named
+git_filter_repo.py needs to be created, and the directory where this
+symlink (or copy) is found must be included in your $PYTHONPATH.
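
Reviewer note: the symlink requirement described in the new wording can be sketched in Python as follows. This is only an illustration under assumptions: `make_importable` is a hypothetical helper, the demo links a stand-in file instead of the real script so it runs anywhere, and adding the directory to `sys.path` stands in for setting `$PYTHONPATH` before Python starts. Creating symlinks may need extra privileges on some platforms.

```python
import importlib
import os
import sys
import tempfile


def make_importable(script_path, link_dir):
    """Link script_path as link_dir/git_filter_repo.py and expose link_dir."""
    link = os.path.join(link_dir, "git_filter_repo.py")
    if not os.path.lexists(link):
        os.symlink(os.path.abspath(script_path), link)
    if link_dir not in sys.path:
        # For the running process, comparable to listing link_dir in $PYTHONPATH.
        sys.path.insert(0, link_dir)
    importlib.invalidate_caches()
    return link


# Demo with a stand-in script so the sketch does not need filter-repo itself.
workdir = tempfile.mkdtemp()
stand_in = os.path.join(workdir, "git-filter-repo")
with open(stand_in, "w") as handle:
    handle.write("MARKER = 'stand-in for git-filter-repo'\n")

link = make_importable(stand_in, workdir)
module = importlib.import_module("git_filter_repo")
```

The point of the rename is that `git-filter-repo` is not a valid Python module name (it contains hyphens and has no `.py` suffix), so the import only works through a link or copy named `git_filter_repo.py`.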
 
 
 # Internals
@@ -816,7 +843,7 @@ sequence that more accurately reflects what filter-repo runs is:
 1. Verify we're in a fresh clone
 1. `git fetch -u . refs/remotes/origin/*:refs/heads/*`
 1. `git remote rm origin`
-1. `git fast-export --show-original-ids --fake-missing-tagger --signed-tags=strip --tag-of-filtered-object=rewrite --use-done-feature --no-data --reencode=yes --all | filter | git fast-import --force --quiet`
+1. `git fast-export --show-original-ids --reference-excluded-parents --fake-missing-tagger --signed-tags=strip --tag-of-filtered-object=rewrite --use-done-feature --no-data --reencode=yes --all | filter | git fast-import --force --quiet`
 1. `git update-ref --no-deref --stdin`, fed with a list of refs to nuke, and a list of [replace refs](https://git-scm.com/docs/git-replace) to delete, create, or update.
 1. `git reset --hard`
 1. `git reflog expire --expire=now --all`
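
Reviewer note: to make the changed pipeline line easier to compare, here is a small Python sketch that assembles the same command text from argument lists. It only builds and prints strings and does not run git. The function names and the `need_blobs` switch are illustrative assumptions; the switch reflects the README's remark that `--no-data` is passed when blobs do not need processing. `shlex.join` requires Python 3.8 or later.

```python
import shlex

FAST_IMPORT = ["git", "fast-import", "--force", "--quiet"]


def fast_export_command(need_blobs=False):
    """Build the fast-export argument list shown in the README step."""
    cmd = [
        "git", "fast-export",
        "--show-original-ids",
        "--reference-excluded-parents",  # the flag added by this commit
        "--fake-missing-tagger",
        "--signed-tags=strip",
        "--tag-of-filtered-object=rewrite",
        "--use-done-feature",
    ]
    if not need_blobs:
        cmd.append("--no-data")  # skip blob contents when they are not needed
    cmd += ["--reencode=yes", "--all"]
    return cmd


def describe_pipeline(need_blobs=False):
    """Render export | filter | import as one shell-style string."""
    stages = [
        shlex.join(fast_export_command(need_blobs)),
        "filter",  # placeholder for filter-repo's in-process filtering
        shlex.join(FAST_IMPORT),
    ]
    return " | ".join(stages)


pipeline = describe_pipeline()
print(pipeline)
```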
@@ -843,15 +870,10 @@ Some notes or exceptions on each of the above:
 be passed to fast-export. But when we don't need to work on blobs,
 passing `--no-data` speeds things up. Also, other flags may change
 the structure of the pipeline as well (e.g. `--dry-run` and `--debug`)
-1. Selection of files based on paths could cause every commit in the
-history of a branch or tag to be pruned, resulting in the branch or
-tag needing to be pruned. However, filter-repo just works by
-stripping out the 'commit' and 'tag' directives for each one that's
-not needed, meaning fast-import won't do the branch or tag deletion
-for us. So we do it in a post-processing step to ensure we avoid
-mixing old and new history. Also, we use this step to write replace
-refs for accessing the newly written commit hashes using their
-previous names.
+1. We use this step to write replace refs for accessing the newly written
+commit hashes using their previous names. Also, if refs were renamed
+by various steps, we need to delete the old refnames in order to avoid
+mixing old and new history.
 1. Users also have old versions of files in their working tree and index;
 we want those cleaned up to match the rewritten history as well. Note
 that this step is skipped in bare repos.
@@ -954,16 +976,17 @@ the user when it detects an issue:
 filter-repo.
 
 * Partial-repo filtering does not mesh well with filter-repo's "avoid
-mixing old and new history" design. filter-repo has some capability in
-this area but it is undocumented, mostly untested, and may require
-multiple non-obvious flags to be set to make sane use of it. While
-there might be valid usecases for partial-repo filtering, the only ones
-I've run into in the wild are sidestepping filter-branch's insanely
-slow execution on commits that would not be changed by the filters in
-question anyway (which is largely irrelevant since filter-repo is
-multiple orders of magnitude faster), or to do operations better suited
-to git-rebase(1) and which rebase grew special options for years ago
-(e.g. the `--signoff` option).
+mixing old and new history" design. filter-repo has some capability
+in this area but it is intentionally underdocumented and mostly left
+for use by external scripts which import filter-repo as a module (some
+examples in contrib/filter-repo-demos/ do use this). The only real
+usecases I've seen for partial repo filtering, though, are
+sidestepping filter-branch's insanely slow execution on commits that
+would not be changed by the filters in question anyway (which is
+largely irrelevant since filter-repo is multiple orders of magnitude
+faster), or to do operations better suited to git-rebase(1) and which
+rebase grew special options for years ago (e.g. the `--signoff`
+option).
 
 ### Comments on reversibility
 

git-filter-repo (+10 -6)
@@ -1653,7 +1653,7 @@ EXAMPLES
     help=_("Specify several path filtering and renaming directives, one "
            "per line. Lines with '==>' in them specify path renames, "
            "and lines can begin with 'literal:' (the default), 'glob:', "
-           "or 'regex: ' to specify different matching styles"))
+           "or 'regex:' to specify different matching styles"))
 helpers.add_argument('--subdirectory-filter', metavar='DIRECTORY',
     action=FilteringOptions.HelperFilter, type=os.fsencode,
     help=_("Only look at history that touches the given subdirectory "
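
Reviewer note: the help text fixed in this hunk describes a small line-based directive format. A rough Python sketch of how one such line could be split is below. It is not filter-repo's actual parser: `parse_directive` is a hypothetical name, it works on `str` rather than bytestrings for readability, and whitespace and comment handling are deliberately left out.

```python
def parse_directive(line):
    """Split one directive line into (match_style, pattern, replacement)."""
    replacement = None
    if "==>" in line:
        # Lines containing '==>' describe a rename: pattern==>replacement.
        line, replacement = line.split("==>", 1)
    style = "literal"  # the default named in the help text
    for candidate in ("literal", "glob", "regex"):
        if line.startswith(candidate + ":"):
            style = candidate
            line = line[len(candidate) + 1:]
            break
    return style, line, replacement


examples = [
    parse_directive("README.md"),
    parse_directive("glob:*.png"),
    parse_directive("regex:^a==>b"),
]
```

The sketch also shows why the stray space in the old help text mattered: a prefix check like this one looks for `regex:` exactly, so documenting it as `regex: ` would mislead anyone writing such a file.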
@@ -1678,7 +1678,8 @@ EXAMPLES
     help=_("Strip blobs (files) bigger than specified size (e.g. '5M', "
            "'2G', etc)"))
 contents.add_argument('--strip-blobs-with-ids', metavar='BLOB-ID-FILENAME',
-    help=_("Strip blob with the specified git object ids (hashes)"))
+    help=_("Read git object ids from each line of the given file, and "
+           "strip all of them from history"))
 
 refrename = parser.add_argument_group(title=_("Renaming of refs "
                                       "(see also --refname-callback)"))
@@ -1798,17 +1799,20 @@ EXAMPLES
 misc.add_argument('--dry-run', action='store_true',
     help=_("Do not change the repository. Run `git fast-export` and "
            "filter its output, and save both the original and the "
-           "filtered version for comparison. Some filtering of empty "
-           "commits may not occur due to inability to query the "
-           "fast-import backend."))
+           "filtered version for comparison. This also disables "
+           "rewriting commit messages due to not knowing new commit "
+           "IDs and disables filtering of some empty commits due to "
+           "inability to query the fast-import backend." ))
 misc.add_argument('--debug', action='store_true',
     help=_("Print additional information about operations being "
            "performed and commands being run. When used together "
            "with --dry-run, also show extra information about what "
            "would be run."))
 misc.add_argument('--stdin', action='store_true',
     help=_("Instead of running `git fast-export` and filtering its "
-           "output, filter the fast-export stream from stdin."))
+           "output, filter the fast-export stream from stdin. The "
+           "stdin must be in the expected input format (e.g. it needs "
+           "to include original-oid directives)."))
 misc.add_argument('--quiet', action='store_true',
     help=_("Pass --quiet to other git commands called"))
 return parser
