-
Notifications
You must be signed in to change notification settings - Fork 142
Fix REF_DELTA chain bug in 'git index-pack' #1906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix REF_DELTA chain bug in 'git index-pack' #1906
Conversation
When trying to demonstrate certain behavior in tests, it can be helpful to create packfiles that have specific delta structures. 'git pack-objects' uses various algorithms to select deltas based on their compression rates, but that does not always demonstrate all possible packfile shapes. This becomes especially important when wanting to test 'git index-pack' and its ability to parse certain pack shapes. We have prior art in t/lib-pack.sh, where certain delta structures are produced by manually writing certain opaque pack contents. However, producing these script updates is cumbersome and difficult to do as a contributor. Instead, create a new test-tool, 'test-tool pack-deltas', that reads a list of instructions for which objects to include in a packfile and how those objects should be written in delta form. At the moment, this only supports REF_DELTAs as those are the kinds of deltas needed to exercise a bug in 'git index-pack'. Signed-off-by: Derrick Stolee <[email protected]>
This new test demonstrates some behavior where a valid packfile is being rejected by the Git client due to the order in which it is resolving REF_DELTAs. The thin packfile has a REF_DELTA chain A->B->C where C is not included in the packfile. However, the client repository contains both C and B already. Thus, 'git index-pack' is able to resolve A before resolving B. When resolving B, it then attempts to resolve any other REF_DELTAs that are pointing to B as a base. This "revisits" A and complains as if there is a cycle, but it did not actually detect a cycle. A fix will arrive in the next change. Signed-off-by: Derrick Stolee <[email protected]>
As detailed in the previous changes to t5309-pack-delta-cycles.sh, the logic within 'git index-pack' to analyze an incoming thin packfile with REF_DELTAs is suspect. The algorithm is overly cautious around delta cycles, and that leads in fact to failing even when there is no cycle. This change adjusts the algorithm to no longer fail in these cases. In fact, these cycle cases will no longer fail but more importantly the valid cases will no longer fail, either. The resulting packfile from the --fix-thin operation will not have cycles either since REF_DELTAs are forbidden from the on-disk format and OFS_DELTAs are impossible to write as a cycle. The crux of the matter is how the algorithm works when the REF_DELTAs point to base objects that exist in the local repository. When reading the thin packfile, the object IDs for the delta objects are unknown so we do not have the delta chain structure automatically. Instead, we need to start somewhere by selecting a delta whose base is inside our current object database. Consider the case where the packfile has two REF_DELTA objects, A and B, and the delta chain looks like "A depends on B" and "B depends on C" for some third object C, where C is already in the current repository. The algorithm _should_ start with all objects that depend on C, finding B, and then moving on to all objects depending on B, finding A. However, if the repository also already has object B, then the delta chain can be analyzed in a different order. The deltas with base B can be analyzed first, finding A, and then the deltas with base C are analyzed, finding B. The algorithm currently continues to look for objects that depend on B, finding A again. This fails due to A's 'real_type' member already being overwritten from OBJ_REF_DELTA to the correct object type. This scenario is possible in a typical 'git fetch' where the client does not advertise B as a 'have' but requests A as a 'want' (and C is noticed as a common object based on other 'have's). The reason this isn't typically seen is that most Git servers use OFS_DELTAs to represent deltas within a packfile. However, if a server uses only REF_DELTAs, then this kind of issue can occur. There is nothing in the explicit packfile format that states this use of inter-pack REF_DELTA is incorrect, only that REF_DELTAs should not be used in the on-disk representation to avoid cycles. This die() was introduced in ab791dd (index-pack: fix race condition with duplicate bases, 2014-08-29). Several refactors have adjusted the error message and the surrounding logic, but this issue has existed for a longer time as that was only a conversion from an assert(). The tests in t5309 originated in 3b910d0 (add tests for indexing packs with delta cycles, 2013-08-23) and b2ef3d9 (test index-pack on packs with recoverable delta cycles, 2013-08-23). These changes make note that the current behavior of handling "resolvable" cycles is mostly a documentation-only test, not that this behavior is the best way for Git to handle the situation. The fix here is somewhat complicated due to the amount of state being adjusted by the loop within threaded_second_pass(). Instead of trying to resume the start of the loop while adjusting the necessary context, I chose to scan the REF_DELTAs depending on the current 'parent' and skip any that have already been processed. This necessarily leaves us in a state where 'child' and 'child_obj' could be left as NULL and that must be handled later. There is also some careful handling around skipping REF_DELTAs when there are also OFS_DELTAs depending on that parent. There may be value in extending 'test-tool pack-deltas' to allow writing OFS_DELTAs in order to exercise this logic across the delta types. Signed-off-by: Derrick Stolee <[email protected]>
/submit |
Submitted as [email protected] To fetch this version into
To fetch this version to local tag
|
@@ -821,6 +821,7 @@ TEST_BUILTINS_OBJS += test-mergesort.o | |||
TEST_BUILTINS_OBJS += test-mktemp.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <[email protected]> writes:
> From: Derrick Stolee <[email protected]>
>
> When trying to demonstrate certain behavior in tests, it can be helpful
> to create packfiles that have specific delta structures. 'git
> pack-objects' uses various algorithms to select deltas based on their
> compression rates, but that does not always demonstrate all possible
> packfile shapes. This becomes especially important when wanting to test
> 'git index-pack' and its ability to parse certain pack shapes.
>
> We have prior art in t/lib-pack.sh, where certain delta structures are
> produced by manually writing certain opaque pack contents. However,
> producing these script updates is cumbersome and difficult to do as a
> contributor.
>
> Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
> list of instructions for which objects to include in a packfile and how
> those objects should be written in delta form.
>
> At the moment, this only supports REF_DELTAs as those are the kinds of
> deltas needed to exercise a bug in 'git index-pack'.
Wonderful writing. I agree with the destination where this effort
wants to go, including the decision that starting with ref-delta
only is a good enough first step.
As to the implementation, I was a tiny little bit bummed to see
that, even though it does share the code with the real pack-objects
code paths to compute delta data by calling diff_delta(), and to
write per-object header by calling encode_in_pack_object_header(),
it has its own compression loop that does not even do an error
checking after calling into zlib deflate machinery.
Perhaps that is unavoidable due to the code structure of the
production code.
> +static const char usage_str[] = "test-tool pack-deltas <n>";
> ...
> +int cmd__pack_deltas(int argc, const char **argv)
> +{
> + int N;
> + struct hashfile *f;
> + struct strbuf line = STRBUF_INIT;
> +
> + if (argc != 2) {
> + usage(usage_str);
> + return -1;
> + }
> +
> + N = atoi(argv[1]);
It somewhat looks strange to see an uppercase N used as a variable
name. Together with the usage string, how about renaming "N" and
"n" after "number of objects", e.g.
test-tool pack-deltas <num-objects>
int num_objects;
or something?
Other than that, looking very good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 4/23/2025 3:26 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
>
>> From: Derrick Stolee <[email protected]>
>>
>> When trying to demonstrate certain behavior in tests, it can be helpful
>> to create packfiles that have specific delta structures. 'git
>> pack-objects' uses various algorithms to select deltas based on their
>> compression rates, but that does not always demonstrate all possible
>> packfile shapes. This becomes especially important when wanting to test
>> 'git index-pack' and its ability to parse certain pack shapes.
>>
>> We have prior art in t/lib-pack.sh, where certain delta structures are
>> produced by manually writing certain opaque pack contents. However,
>> producing these script updates is cumbersome and difficult to do as a
>> contributor.
>>
>> Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
>> list of instructions for which objects to include in a packfile and how
>> those objects should be written in delta form.
>>
>> At the moment, this only supports REF_DELTAs as those are the kinds of
>> deltas needed to exercise a bug in 'git index-pack'.
>
> Wonderful writing. I agree with the destination where this effort
> wants to go, including the decision that starting with ref-delta
> only is a good enough first step.
>
> As to the implementation, I was a tiny little bit bummed to see
> that, even though it does share the code with the real pack-objects
> code paths to compute delta data by calling diff_delta(), and to
> write per-object header by calling encode_in_pack_object_header(),
> it has its own compression loop that does not even do an error
> checking after calling into zlib deflate machinery.
I could strengthen these options to help folks more quickly understand
potential failures as being part of the pack write instead of them
failing during the later pack read.
> Perhaps that is unavoidable due to the code structure of the
> production code.
I briefly considered extracting some code out of builtin/pack-objects.c
but it relies heavily on globals and context that I won't have in this
helper. I'm open to suggestions for how I can safely share more code,
but my initial attempt required too much refactoring to be worth it.
I am grateful for the amount of code from pack-write.c that I _was_
able to reuse.
>> +static const char usage_str[] = "test-tool pack-deltas <n>";
>> ...
>> +int cmd__pack_deltas(int argc, const char **argv)
>> +{
>> + int N;
>> + struct hashfile *f;
>> + struct strbuf line = STRBUF_INIT;
>> +
>> + if (argc != 2) {
>> + usage(usage_str);
>> + return -1;
>> + }
>> +
>> + N = atoi(argv[1]);
>
> It somewhat looks strange to see an uppercase N used as a variable
> name. Together with the usage string, how about renaming "N" and
> "n" after "number of objects", e.g.
>
> test-tool pack-deltas <num-objects>
> int num_objects;
>
> or something?
I definitely should have used a better name here. Thanks.
-Stolee
t/t5309-pack-delta-cycles.sh
Outdated
@@ -75,4 +75,30 @@ test_expect_success 'failover to a duplicate object in the same pack' ' | |||
test_must_fail git index-pack --fix-thin --stdin <recoverable.pack |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <[email protected]> writes:
> From: Derrick Stolee <[email protected]>
>
> This new test demonstrates some behavior where a valid packfile is being
> rejected by the Git client due to the order in which it is resolving
> REF_DELTAs.
>
> The thin packfile has a REF_DELTA chain A->B->C where C is not included
> in the packfile. However, the client repository contains both C and B
> already. Thus, 'git index-pack' is able to resolve A before resolving B.
In order to reconstitute A, B is needed (which recipient has), and
in order to reconstitute B, C is needed (which recipient also has).
The index-pack sees delta based on B to recreate A; it should be
able to reconstitute A using B that already exists.
OK.
> When resolving B, it then attempts to resolve any other REF_DELTAs that
> are pointing to B as a base. This "revisits" A and complains as if there
> is a cycle, but it did not actually detect a cycle.
That's interesting.
> +test_expect_failure 'index-pack works with thin pack A->B->C with B on disk' '
> + git init server &&
> + (
> + cd server &&
> + test_commit_bulk 4
> + ) &&
> +
> + A=$(git -C server rev-parse HEAD^{tree}) &&
> + B=$(git -C server rev-parse HEAD~1^{tree}) &&
> + C=$(git -C server rev-parse HEAD~2^{tree}) &&
> + git -C server reset --hard HEAD~1 &&
> +
> + cat >in <<-EOF &&
> + REF_DELTA $A $B
> + REF_DELTA $B $C
> + EOF
> +
> + test-tool -C server pack-deltas 2 <in >thin.pack &&
This is minor, but I somehow find it easier to follow without the
temporary file, i.e.
test-tool -C server pack-deltas 2 >thin.pack <<-EOF &&
REF_DELTA $A $B
REF_DELTA $B $C
EOF
> + git clone "file://$(pwd)/server" client &&
This truly loses A from the resulting "client" repository, but the
history still can reach B and C.
> + (
> + cd client &&
> + git index-pack --fix-thin --stdin <../thin.pack
> + )
> +'
Makes sense.
This patch series was integrated into seen via git@43235db. |
@@ -821,6 +821,7 @@ TEST_BUILTINS_OBJS += test-mergesort.o | |||
TEST_BUILTINS_OBJS += test-mktemp.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
I needed this to make
$ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make
$ cd t && sh t5309-pack-delta-cycles.sh
pass.
--- >8 ------ >8 ------ >8 ---
Subject: [PATCH] fixup! test-tool: add pack-deltas helper
t/helper/test-pack-deltas.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
index db7d1c3cd1..c8e837ea06 100644
--- a/t/helper/test-pack-deltas.c
+++ b/t/helper/test-pack-deltas.c
@@ -122,6 +122,7 @@ int cmd__pack_deltas(int argc, const char **argv)
if (get_oid_hex(base_oid_str, &base_oid))
die("invalid object: %s", base_oid_str);
}
+ string_list_clear(&items, 0);
if (!strcmp(type_str, "REF_DELTA"))
write_ref_delta(f, &content_oid, &base_oid);
--
2.49.0-555-g43235db9c8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 4/24/2025 3:41 PM, Junio C Hamano wrote:
> I needed this to make
>
> $ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make
> $ cd t && sh t5309-pack-delta-cycles.sh
>
> pass.
> --- >8 ------ >8 ------ >8 ---
> Subject: [PATCH] fixup! test-tool: add pack-deltas helper
>
> t/helper/test-pack-deltas.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> index db7d1c3cd1..c8e837ea06 100644
> --- a/t/helper/test-pack-deltas.c
> +++ b/t/helper/test-pack-deltas.c
> @@ -122,6 +122,7 @@ int cmd__pack_deltas(int argc, const char **argv)
> if (get_oid_hex(base_oid_str, &base_oid))
> die("invalid object: %s", base_oid_str);
> }
> + string_list_clear(&items, 0);
Thanks. I'll make sure to apply it. My GGG PR validation was broken
top-to-bottom due to other environmental issues so I had not seen
this failure myself.
Thanks,
-Stolee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Derrick Stolee <[email protected]> writes:
> On 4/24/2025 3:41 PM, Junio C Hamano wrote:
>> I needed this to make
>>
>> $ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make
>> $ cd t && sh t5309-pack-delta-cycles.sh
>>
>> pass.
>> --- >8 ------ >8 ------ >8 ---
>> Subject: [PATCH] fixup! test-tool: add pack-deltas helper
>>
>> t/helper/test-pack-deltas.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
>> index db7d1c3cd1..c8e837ea06 100644
>> --- a/t/helper/test-pack-deltas.c
>> +++ b/t/helper/test-pack-deltas.c
>> @@ -122,6 +122,7 @@ int cmd__pack_deltas(int argc, const char **argv)
>> if (get_oid_hex(base_oid_str, &base_oid))
>> die("invalid object: %s", base_oid_str);
>> }
>> + string_list_clear(&items, 0);
>
> Thanks. I'll make sure to apply it. My GGG PR validation was broken
> top-to-bottom due to other environmental issues so I had not seen
> this failure myself.
I squashed this in so unless there are other things you need to
change, this alone does not make it necessary to reroll the series.
Thanks.
@@ -1109,8 +1109,8 @@ static void *threaded_second_pass(void *data) | |||
set_thread_data(data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Derrick Stolee via GitGitGadget" <[email protected]> writes:
> From: Derrick Stolee <[email protected]>
>
> As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
> logic within 'git index-pack' to analyze an incoming thin packfile with
> REF_DELTAs is suspect. The algorithm is overly cautious around delta
> cycles, and that leads in fact to failing even when there is no cycle.
>
> This change adjusts the algorithm to no longer fail in these cases. In
> fact, these cycle cases will no longer fail but more importantly the
> valid cases will no longer fail, either. The resulting packfile from the
> --fix-thin operation will not have cycles either since REF_DELTAs are
> forbidden from the on-disk format and OFS_DELTAs are impossible to write
> as a cycle.
Loosening cycle avoidance indeed is worrysome. How do we guarantee
"since REF_DELTAs are forbidden from the on-disk format" (it is
obvious with OFS_DELTA only you cannot form a cycle)? By code
inspection and in-code comment for the code path of "--fix-thin"?
I think the most interesting question would be how, with the
loosening of the cycle check in this patch, we would still protect
against a malicious on-the-wire packdata that has a cycle in it
already, where deltas cannot be resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Derrick Stolee wrote (reply to this):
On 4/24/2025 5:41 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
>
>> From: Derrick Stolee <[email protected]>
>>
>> As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
>> logic within 'git index-pack' to analyze an incoming thin packfile with
>> REF_DELTAs is suspect. The algorithm is overly cautious around delta
>> cycles, and that leads in fact to failing even when there is no cycle.
>>
>> This change adjusts the algorithm to no longer fail in these cases. In
>> fact, these cycle cases will no longer fail but more importantly the
>> valid cases will no longer fail, either. The resulting packfile from the
>> --fix-thin operation will not have cycles either since REF_DELTAs are
>> forbidden from the on-disk format and OFS_DELTAs are impossible to write
>> as a cycle.
>
> Loosening cycle avoidance indeed is worrysome. How do we guarantee
> "since REF_DELTAs are forbidden from the on-disk format" (it is
> obvious with OFS_DELTA only you cannot form a cycle)? By code
> inspection and in-code comment for the code path of "--fix-thin"?
>
> I think the most interesting question would be how, with the
> loosening of the cycle check in this patch, we would still protect
> against a malicious on-the-wire packdata that has a cycle in it
> already, where deltas cannot be resolved.
The good news is two-fold:
1. We have a test in t5309 (index-pack detects REF_DELTA cycles) that
checks this case and did not change behavior with this patch.
2. If we don't already have any of the objects that exist in the cycle,
then we can't _start_ expanding the objects to their full contents
as we don't have their bases. So what is left is a list of REF_DELTAs
that have not been processed but also a list of base object IDs that
we can't load. We'll fail without writing those objects to disk.
Does that satisfy your concerns in this space?
Thanks,
-Stolee
This branch is now known as |
This patch series was integrated into seen via git@089a0e4. |
This patch series was integrated into seen via git@9a5a794. |
@@ -821,6 +821,7 @@ TEST_BUILTINS_OBJS += test-mergesort.o | |||
TEST_BUILTINS_OBJS += test-mktemp.o |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> new file mode 100644
> index 00000000000..db7d1c3cd1f
> --- /dev/null
> +++ b/t/helper/test-pack-deltas.c
> @@ -0,0 +1,140 @@
[snip]
> +int cmd__pack_deltas(int argc, const char **argv)
> +{
> + int N;
> + struct hashfile *f;
> + struct strbuf line = STRBUF_INIT;
> +
> + if (argc != 2) {
> + usage(usage_str);
> + return -1;
> + }
> +
> + N = atoi(argv[1]);
Is there a reason why we don't use `parse_options()` here? It might make
this tool easier to use and extend going forward, and we wouldn't have
to care about invalid arguments. Right now, we silently accept a
non-integer argument and do the wrong thing.
Patrick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Patrick,
On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
> On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> > diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> > new file mode 100644
> > index 00000000000..db7d1c3cd1f
> > --- /dev/null
> > +++ b/t/helper/test-pack-deltas.c
> > @@ -0,0 +1,140 @@
> [snip]
> > +int cmd__pack_deltas(int argc, const char **argv)
> > +{
> > + int N;
> > + struct hashfile *f;
> > + struct strbuf line = STRBUF_INIT;
> > +
> > + if (argc != 2) {
> > + usage(usage_str);
> > + return -1;
> > + }
> > +
> > + N = atoi(argv[1]);
>
> Is there a reason why we don't use `parse_options()` here? It might make
> this tool easier to use and extend going forward, and we wouldn't have
> to care about invalid arguments. Right now, we silently accept a
> non-integer argument and do the wrong thing.
I think that `parse_options()` would be overkill here because:
- This is a _mandatory_ argument, not an optional one.
- The required data type is `uint32_t`, and `parse_options()` has no
support for that.
But you do have a good point in that we may want to validate the data type
(even if technically, this is not a user-facing program, it's a test
helper that is used under tight control by Git's own test suite).
Consequently, I would suggest this fixup instead:
-- snipsnap --
Subject: [PATCH] fixup! test-tool: add pack-deltas helper
Let's make the command-line parsing a bit more stringent. We _could_
use `parse_options()`, but that would be overkill for a single,
non-optional argument. Besides, it would not bring any benefit, as the
parsed value needs to fit in the `uint32_t` type, and `parse_options()`
has no provision to ensure that.
Signed-off-by: Johannes Schindelin <[email protected]>
---
t/helper/test-pack-deltas.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
index 4af69bdc05d3..f95d8ee16768 100644
--- a/t/helper/test-pack-deltas.c
+++ b/t/helper/test-pack-deltas.c
@@ -8,11 +8,12 @@
#include "hex.h"
#include "pack.h"
#include "pack-objects.h"
+#include "parse.h"
#include "setup.h"
#include "strbuf.h"
#include "string-list.h"
-static const char usage_str[] = "test-tool pack-deltas <n>";
+static const char usage_str[] = "test-tool pack-deltas <nr_entries>";
static unsigned long do_compress(void **pptr, unsigned long size)
{
@@ -79,7 +80,7 @@ static void write_ref_delta(struct hashfile *f,
int cmd__pack_deltas(int argc, const char **argv)
{
- int N;
+ unsigned long n;
struct hashfile *f;
struct strbuf line = STRBUF_INIT;
@@ -88,12 +89,13 @@ int cmd__pack_deltas(int argc, const char **argv)
return -1;
}
- N = atoi(argv[1]);
+ if (!git_parse_ulong(argv[1], &n) || n != (uint32_t)n)
+ die("invalid number of objects: %s", argv[1]);
setup_git_directory();
f = hashfd(1, "<stdout>");
- write_pack_header(f, N);
+ write_pack_header(f, n);
/* Read each line from stdin into 'line' */
while (strbuf_getline_lf(&line, stdin) != EOF) {
--
2.49.0.windows.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Patrick Steinhardt wrote (reply to this):
On Fri, Apr 25, 2025 at 11:34:01AM +0200, Johannes Schindelin wrote:
> Hi Patrick,
>
> On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
>
> > On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> > > diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> > > new file mode 100644
> > > index 00000000000..db7d1c3cd1f
> > > --- /dev/null
> > > +++ b/t/helper/test-pack-deltas.c
> > > @@ -0,0 +1,140 @@
> > [snip]
> > > +int cmd__pack_deltas(int argc, const char **argv)
> > > +{
> > > + int N;
> > > + struct hashfile *f;
> > > + struct strbuf line = STRBUF_INIT;
> > > +
> > > + if (argc != 2) {
> > > + usage(usage_str);
> > > + return -1;
> > > + }
> > > +
> > > + N = atoi(argv[1]);
> >
> > Is there a reason why we don't use `parse_options()` here? It might make
> > this tool easier to use and extend going forward, and we wouldn't have
> > to care about invalid arguments. Right now, we silently accept a
> > non-integer argument and do the wrong thing.
>
> I think that `parse_options()` would be overkill here because:
>
> - This is a _mandatory_ argument, not an optional one.
>
> - The required data type is `uint32_t`, and `parse_options()` has no
> support for that.
Support for that has been merged just this week via 2bc5414c411 (Merge
branch 'ps/parse-options-integers', 2025-04-24).
> But you do have a good point in that we may want to validate the data type
> (even if technically, this is not a user-facing program, it's a test
> helper that is used under tight control by Git's own test suite).
>
> Consequently, I would suggest this fixup instead:
But in any case, I'd be equally fine with your suggestion.
Patrick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Patrick,
On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
> On Fri, Apr 25, 2025 at 11:34:01AM +0200, Johannes Schindelin wrote:
> >
> > On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
> >
> > > On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> > > > diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> > > > new file mode 100644
> > > > index 00000000000..db7d1c3cd1f
> > > > --- /dev/null
> > > > +++ b/t/helper/test-pack-deltas.c
> > > > @@ -0,0 +1,140 @@
> > > [snip]
> > > > +int cmd__pack_deltas(int argc, const char **argv)
> > > > +{
> > > > + int N;
> > > > + struct hashfile *f;
> > > > + struct strbuf line = STRBUF_INIT;
> > > > +
> > > > + if (argc != 2) {
> > > > + usage(usage_str);
> > > > + return -1;
> > > > + }
> > > > +
> > > > + N = atoi(argv[1]);
> > >
> > > Is there a reason why we don't use `parse_options()` here? It might make
> > > this tool easier to use and extend going forward, and we wouldn't have
> > > to care about invalid arguments. Right now, we silently accept a
> > > non-integer argument and do the wrong thing.
> >
> > I think that `parse_options()` would be overkill here because:
> >
> > - This is a _mandatory_ argument, not an optional one.
> >
> > - The required data type is `uint32_t`, and `parse_options()` has no
> > support for that.
>
> Support for that has been merged just this week via 2bc5414c411 (Merge
> branch 'ps/parse-options-integers', 2025-04-24).
That's too new for me to be useful, as I have to work on top of v2.49.0
(see https://github.com/microsoft/git/pull/745).
> > But you do have a good point in that we may want to validate the data type
> > (even if technically, this is not a user-facing program, it's a test
> > helper that is used under tight control by Git's own test suite).
> >
> > Consequently, I would suggest this fixup instead:
>
> But in any case, I'd be equally fine with your suggestion.
Thanks!
Johannes
User |
User |
This is an early version of work already under review upstream: gitgitgadget#1906
When fetching content from a remote, 'git index-pack' processes the packfile content, storing a packfile appropriate for on-disk storage and a pack-index helping to perform random-access into that packfile. To help with compression, the packfile sent over the wire can use REF_DELTAs in addition to OFS_DELTAs to refer to objects that are already known to exist in the client's repository. REF_DELTAs can also refer to objects within the packfile, though this is not typically done.
Because this inter-pack REF_DELTA is not a typical data shape, a latent bug has been waiting that causes 'git index-pack' to die() even on legitimate packfile content that it could resolve.
This series resolves this problem while also creating a test helper for constructing packfiles with specific objects represented in specific types of deltas and in a given order. This should make it easier to create test cases like this in the future instead of updating
t/lib-pack.sh
through other means.Thanks,
-Stolee
cc: [email protected]
cc: [email protected]
cc: Patrick Steinhardt [email protected]
cc: Johannes Schindelin [email protected]