Skip to content

Fix REF_DELTA chain bug in 'git index-pack' #1906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented Apr 23, 2025

When fetching content from a remote, 'git index-pack' processes the packfile content, storing a packfile appropriate for on-disk storage and a pack-index helping to perform random-access into that packfile. To help with compression, the packfile sent over the wire can use REF_DELTAs in addition to OFS_DELTAs to refer to objects that are already known to exist in the client's repository. REF_DELTAs can also refer to objects within the packfile, though this is not typically done.

Because this inter-pack REF_DELTA is not a typical data shape, a latent bug has been waiting that causes 'git index-pack' to die() even on legitimate packfile content that it could resolve.

This series resolves this problem while also creating a test helper for constructing packfiles with specific objects represented in specific types of deltas and in a given order. This should make it easier to create test cases like this in the future instead of updating t/lib-pack.sh through other means.

Thanks,
-Stolee

cc: [email protected]
cc: [email protected]
cc: Patrick Steinhardt [email protected]
cc: Johannes Schindelin [email protected]

When trying to demonstrate certain behavior in tests, it can be helpful
to create packfiles that have specific delta structures. 'git
pack-objects' uses various algorithms to select deltas based on their
compression rates, but that does not always demonstrate all possible
packfile shapes. This becomes especially important when wanting to test
'git index-pack' and its ability to parse certain pack shapes.

We have prior art in t/lib-pack.sh, where certain delta structures are
produced by manually writing certain opaque pack contents. However,
producing these script updates is cumbersome and difficult to do as a
contributor.

Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
list of instructions for which objects to include in a packfile and how
those objects should be written in delta form.

At the moment, this only supports REF_DELTAs as those are the kinds of
deltas needed to exercise a bug in 'git index-pack'.

Signed-off-by: Derrick Stolee <[email protected]>
This new test demonstrates some behavior where a valid packfile is being
rejected by the Git client due to the order in which it is resolving
REF_DELTAs.

The thin packfile has a REF_DELTA chain A->B->C where C is not included
in the packfile. However, the client repository contains both C and B
already. Thus, 'git index-pack' is able to resolve A before resolving B.

When resolving B, it then attempts to resolve any other REF_DELTAs that
are pointing to B as a base. This "revisits" A and complains as if there
is a cycle, but it did not actually detect a cycle.

A fix will arrive in the next change.

Signed-off-by: Derrick Stolee <[email protected]>
As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
logic within 'git index-pack' to analyze an incoming thin packfile with
REF_DELTAs is suspect. The algorithm is overly cautious around delta
cycles, and that leads in fact to failing even when there is no cycle.

This change adjusts the algorithm to no longer fail in these cases. In
fact, these cycle cases will no longer fail but more importantly the
valid cases will no longer fail, either. The resulting packfile from the
--fix-thin operation will not have cycles either since REF_DELTAs are
forbidden from the on-disk format and OFS_DELTAs are impossible to write
as a cycle.

The crux of the matter is how the algorithm works when the REF_DELTAs
point to base objects that exist in the local repository. When reading
the thin packfile, the object IDs for the delta objects are unknown so
we do not have the delta chain structure automatically. Instead, we need
to start somewhere by selecting a delta whose base is inside our current
object database.

Consider the case where the packfile has two REF_DELTA objects, A and B,
and the delta chain looks like "A depends on B" and "B depends on C" for
some third object C, where C is already in the current repository. The
algorithm _should_ start with all objects that depend on C, finding B,
and then moving on to all objects depending on B, finding A.

However, if the repository also already has object B, then the delta
chain can be analyzed in a different order. The deltas with base B can
be analyzed first, finding A, and then the deltas with base C are
analyzed, finding B. The algorithm currently continues to look for
objects that depend on B, finding A again. This fails due to A's
'real_type' member already being overwritten from OBJ_REF_DELTA to the
correct object type.

This scenario is possible in a typical 'git fetch' where the client does
not advertise B as a 'have' but requests A as a 'want' (and C is noticed
as a common object based on other 'have's). The reason this isn't
typically seen is that most Git servers use OFS_DELTAs to represent
deltas within a packfile. However, if a server uses only REF_DELTAs,
then this kind of issue can occur. There is nothing in the explicit
packfile format that states this use of inter-pack REF_DELTA is
incorrect, only that REF_DELTAs should not be used in the on-disk
representation to avoid cycles.

This die() was introduced in ab791dd (index-pack: fix race condition
with duplicate bases, 2014-08-29). Several refactors have adjusted the
error message and the surrounding logic, but this issue has existed for
a longer time as that was only a conversion from an assert().

The tests in t5309 originated in 3b910d0 (add tests for indexing
packs with delta cycles, 2013-08-23) and b2ef3d9 (test index-pack on
packs with recoverable delta cycles, 2013-08-23). These changes make
note that the current behavior of handling "resolvable" cycles is mostly
a documentation-only test, not that this behavior is the best way for
Git to handle the situation.

The fix here is somewhat complicated due to the amount of state being
adjusted by the loop within threaded_second_pass(). Instead of trying to
resume the start of the loop while adjusting the necessary context, I
chose to scan the REF_DELTAs depending on the current 'parent' and skip
any that have already been processed. This necessarily leaves us in a
state where 'child' and 'child_obj' could be left as NULL and that must
be handled later. There is also some careful handling around skipping
REF_DELTAs when there are also OFS_DELTAs depending on that parent.
There may be value in extending 'test-tool pack-deltas' to allow writing
OFS_DELTAs in order to exercise this logic across the delta types.

Signed-off-by: Derrick Stolee <[email protected]>
@derrickstolee derrickstolee self-assigned this Apr 23, 2025
@derrickstolee
Copy link
Author

/submit

Copy link

gitgitgadget bot commented Apr 23, 2025

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1906/derrickstolee/index-pack-ref-deltas-v1

To fetch this version to local tag pr-1906/derrickstolee/index-pack-ref-deltas-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1906/derrickstolee/index-pack-ref-deltas-v1

@@ -821,6 +821,7 @@ TEST_BUILTINS_OBJS += test-mergesort.o
TEST_BUILTINS_OBJS += test-mktemp.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> From: Derrick Stolee <[email protected]>
>
> When trying to demonstrate certain behavior in tests, it can be helpful
> to create packfiles that have specific delta structures. 'git
> pack-objects' uses various algorithms to select deltas based on their
> compression rates, but that does not always demonstrate all possible
> packfile shapes. This becomes especially important when wanting to test
> 'git index-pack' and its ability to parse certain pack shapes.
>
> We have prior art in t/lib-pack.sh, where certain delta structures are
> produced by manually writing certain opaque pack contents. However,
> producing these script updates is cumbersome and difficult to do as a
> contributor.
>
> Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
> list of instructions for which objects to include in a packfile and how
> those objects should be written in delta form.
>
> At the moment, this only supports REF_DELTAs as those are the kinds of
> deltas needed to exercise a bug in 'git index-pack'.

Wonderful writing.  I agree with the destination where this effort
wants to go, including the decision that starting with ref-delta
only is a good enough first step.

As to the implementation, I was a tiny little bit bummed to see
that, even though it does share the code with the real pack-objects
code paths to compute delta data by calling diff_delta(), and to
write per-object header by calling encode_in_pack_object_header(),
it has its own compression loop that does not even do an error
checking after calling into zlib deflate machinery.

Perhaps that is unavoidable due to the code structure of the
production code.

> +static const char usage_str[] = "test-tool pack-deltas <n>";
> ...
> +int cmd__pack_deltas(int argc, const char **argv)
> +{
> +	int N;
> +	struct hashfile *f;
> +	struct strbuf line = STRBUF_INIT;
> +
> +	if (argc != 2) {
> +		usage(usage_str);
> +		return -1;
> +	}
> +
> +	N = atoi(argv[1]);

It somewhat looks strange to see an uppercase N used as a variable
name.  Together with the usage string, how about renaming "N" and
"n" after "number of objects", e.g.

	test-tool pack-deltas <num-objects>
	int num_objects;

or something?

Other than that, looking very good.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 4/23/2025 3:26 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
> 
>> From: Derrick Stolee <[email protected]>
>>
>> When trying to demonstrate certain behavior in tests, it can be helpful
>> to create packfiles that have specific delta structures. 'git
>> pack-objects' uses various algorithms to select deltas based on their
>> compression rates, but that does not always demonstrate all possible
>> packfile shapes. This becomes especially important when wanting to test
>> 'git index-pack' and its ability to parse certain pack shapes.
>>
>> We have prior art in t/lib-pack.sh, where certain delta structures are
>> produced by manually writing certain opaque pack contents. However,
>> producing these script updates is cumbersome and difficult to do as a
>> contributor.
>>
>> Instead, create a new test-tool, 'test-tool pack-deltas', that reads a
>> list of instructions for which objects to include in a packfile and how
>> those objects should be written in delta form.
>>
>> At the moment, this only supports REF_DELTAs as those are the kinds of
>> deltas needed to exercise a bug in 'git index-pack'.
> 
> Wonderful writing.  I agree with the destination where this effort
> wants to go, including the decision that starting with ref-delta
> only is a good enough first step.
> 
> As to the implementation, I was a tiny little bit bummed to see
> that, even though it does share the code with the real pack-objects
> code paths to compute delta data by calling diff_delta(), and to
> write per-object header by calling encode_in_pack_object_header(),
> it has its own compression loop that does not even do an error
> checking after calling into zlib deflate machinery.

I could strengthen these options to help folks more quickly understand
potential failures as being part of the pack write instead of them
failing during the later pack read.

> Perhaps that is unavoidable due to the code structure of the
> production code.

I briefly considered extracting some code out of builtin/pack-objects.c
but it relies heavily on globals and context that I won't have in this
helper. I'm open to suggestions for how I can safely share more code,
but my initial attempt required too much refactoring to be worth it.

I am grateful for the amount of code from pack-write.c that I _was_
able to reuse.

>> +static const char usage_str[] = "test-tool pack-deltas <n>";
>> ...
>> +int cmd__pack_deltas(int argc, const char **argv)
>> +{
>> +	int N;
>> +	struct hashfile *f;
>> +	struct strbuf line = STRBUF_INIT;
>> +
>> +	if (argc != 2) {
>> +		usage(usage_str);
>> +		return -1;
>> +	}
>> +
>> +	N = atoi(argv[1]);
> 
> It somewhat looks strange to see an uppercase N used as a variable
> name.  Together with the usage string, how about renaming "N" and
> "n" after "number of objects", e.g.
> 
> 	test-tool pack-deltas <num-objects>
> 	int num_objects;
> 
> or something?

I definitely should have used a better name here. Thanks.

-Stolee

@@ -75,4 +75,30 @@ test_expect_success 'failover to a duplicate object in the same pack' '
test_must_fail git index-pack --fix-thin --stdin <recoverable.pack
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> From: Derrick Stolee <[email protected]>
>
> This new test demonstrates some behavior where a valid packfile is being
> rejected by the Git client due to the order in which it is resolving
> REF_DELTAs.
>
> The thin packfile has a REF_DELTA chain A->B->C where C is not included
> in the packfile. However, the client repository contains both C and B
> already. Thus, 'git index-pack' is able to resolve A before resolving B.

In order to reconstitute A, B is needed (which recipient has), and
in order to reconstitute B, C is needed (which recipient also has).

The index-pack sees delta based on B to recreate A; it should be
able to reconstitute A using B that already exists.

OK.

> When resolving B, it then attempts to resolve any other REF_DELTAs that
> are pointing to B as a base. This "revisits" A and complains as if there
> is a cycle, but it did not actually detect a cycle.

That's interesting.

> +test_expect_failure 'index-pack works with thin pack A->B->C with B on disk' '
> +	git init server &&
> +	(
> +		cd server &&
> +		test_commit_bulk 4
> +	) &&
> +
> +	A=$(git -C server rev-parse HEAD^{tree}) &&
> +	B=$(git -C server rev-parse HEAD~1^{tree}) &&
> +	C=$(git -C server rev-parse HEAD~2^{tree}) &&
> +	git -C server reset --hard HEAD~1 &&
> +
> +	cat >in <<-EOF &&
> +	REF_DELTA $A $B
> +	REF_DELTA $B $C
> +	EOF
> +
> +	test-tool -C server pack-deltas 2 <in >thin.pack &&

This is minor, but I somehow find it easier to follow without the
temporary file, i.e.

	test-tool -C server pack-deltas 2 >thin.pack <<-EOF &&
	REF_DELTA $A $B
	REF_DELTA $B $C
	EOF

> +	git clone "file://$(pwd)/server" client &&

This truly loses A from the resulting "client" repository, but the
history still can reach B and C.

> +	(
> +		cd client &&
> +		git index-pack --fix-thin --stdin <../thin.pack
> +	)
> +'

Makes sense.

Copy link

gitgitgadget bot commented Apr 23, 2025

This patch series was integrated into seen via git@43235db.

@@ -821,6 +821,7 @@ TEST_BUILTINS_OBJS += test-mergesort.o
TEST_BUILTINS_OBJS += test-mktemp.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

I needed this to make

$ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make
$ cd t && sh t5309-pack-delta-cycles.sh

pass.
--- >8 ------ >8 ------ >8 ---
Subject: [PATCH] fixup! test-tool: add pack-deltas helper

 t/helper/test-pack-deltas.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
index db7d1c3cd1..c8e837ea06 100644
--- a/t/helper/test-pack-deltas.c
+++ b/t/helper/test-pack-deltas.c
@@ -122,6 +122,7 @@ int cmd__pack_deltas(int argc, const char **argv)
 			if (get_oid_hex(base_oid_str, &base_oid))
 				die("invalid object: %s", base_oid_str);
 		}
+		string_list_clear(&items, 0);
 
 		if (!strcmp(type_str, "REF_DELTA"))
 			write_ref_delta(f, &content_oid, &base_oid);
-- 
2.49.0-555-g43235db9c8

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 4/24/2025 3:41 PM, Junio C Hamano wrote:
> I needed this to make
> 
> $ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make
> $ cd t && sh t5309-pack-delta-cycles.sh
> 
> pass.
> --- >8 ------ >8 ------ >8 ---
> Subject: [PATCH] fixup! test-tool: add pack-deltas helper
> 
>  t/helper/test-pack-deltas.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> index db7d1c3cd1..c8e837ea06 100644
> --- a/t/helper/test-pack-deltas.c
> +++ b/t/helper/test-pack-deltas.c
> @@ -122,6 +122,7 @@ int cmd__pack_deltas(int argc, const char **argv)
>  			if (get_oid_hex(base_oid_str, &base_oid))
>  				die("invalid object: %s", base_oid_str);
>  		}
> +		string_list_clear(&items, 0);

Thanks. I'll make sure to apply it. My GGG PR validation was broken
top-to-bottom due to other environmental issues so I had not seen
this failure myself.

Thanks,
-Stolee

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Derrick Stolee <[email protected]> writes:

> On 4/24/2025 3:41 PM, Junio C Hamano wrote:
>> I needed this to make
>> 
>> $ SANITIZE=leak GIT_TEST_PASSING_SANITIZE_LEAK=true make
>> $ cd t && sh t5309-pack-delta-cycles.sh
>> 
>> pass.
>> --- >8 ------ >8 ------ >8 ---
>> Subject: [PATCH] fixup! test-tool: add pack-deltas helper
>> 
>>  t/helper/test-pack-deltas.c | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
>> index db7d1c3cd1..c8e837ea06 100644
>> --- a/t/helper/test-pack-deltas.c
>> +++ b/t/helper/test-pack-deltas.c
>> @@ -122,6 +122,7 @@ int cmd__pack_deltas(int argc, const char **argv)
>>  			if (get_oid_hex(base_oid_str, &base_oid))
>>  				die("invalid object: %s", base_oid_str);
>>  		}
>> +		string_list_clear(&items, 0);
>
> Thanks. I'll make sure to apply it. My GGG PR validation was broken
> top-to-bottom due to other environmental issues so I had not seen
> this failure myself.

I squashed this in so unless there are other things you need to
change, this alone does not make it necessary to reroll the series.

Thanks.

@@ -1109,8 +1109,8 @@ static void *threaded_second_pass(void *data)
set_thread_data(data);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes:

> From: Derrick Stolee <[email protected]>
>
> As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
> logic within 'git index-pack' to analyze an incoming thin packfile with
> REF_DELTAs is suspect. The algorithm is overly cautious around delta
> cycles, and that leads in fact to failing even when there is no cycle.
>
> This change adjusts the algorithm to no longer fail in these cases. In
> fact, these cycle cases will no longer fail but more importantly the
> valid cases will no longer fail, either. The resulting packfile from the
> --fix-thin operation will not have cycles either since REF_DELTAs are
> forbidden from the on-disk format and OFS_DELTAs are impossible to write
> as a cycle.

Loosening cycle avoidance indeed is worrysome.  How do we guarantee
"since REF_DELTAs are forbidden from the on-disk format" (it is
obvious with OFS_DELTA only you cannot form a cycle)?  By code
inspection and in-code comment for the code path of "--fix-thin"?

I think the most interesting question would be how, with the
loosening of the cycle check in this patch, we would still protect
against a malicious on-the-wire packdata that has a cycle in it
already, where deltas cannot be resolved.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 4/24/2025 5:41 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <[email protected]> writes:
> 
>> From: Derrick Stolee <[email protected]>
>>
>> As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
>> logic within 'git index-pack' to analyze an incoming thin packfile with
>> REF_DELTAs is suspect. The algorithm is overly cautious around delta
>> cycles, and that leads in fact to failing even when there is no cycle.
>>
>> This change adjusts the algorithm to no longer fail in these cases. In
>> fact, these cycle cases will no longer fail but more importantly the
>> valid cases will no longer fail, either. The resulting packfile from the
>> --fix-thin operation will not have cycles either since REF_DELTAs are
>> forbidden from the on-disk format and OFS_DELTAs are impossible to write
>> as a cycle.
> 
> Loosening cycle avoidance indeed is worrysome.  How do we guarantee
> "since REF_DELTAs are forbidden from the on-disk format" (it is
> obvious with OFS_DELTA only you cannot form a cycle)?  By code
> inspection and in-code comment for the code path of "--fix-thin"?
> 
> I think the most interesting question would be how, with the
> loosening of the cycle check in this patch, we would still protect
> against a malicious on-the-wire packdata that has a cycle in it
> already, where deltas cannot be resolved.

The good news is two-fold:

1. We have a test in t5309 (index-pack detects REF_DELTA cycles) that
   checks this case and did not change behavior with this patch.

2. If we don't already have any of the objects that exist in the cycle,
   then we can't _start_ expanding the objects to their full contents
   as we don't have their bases. So what is left is a list of REF_DELTAs
   that have not been processed but also a list of base object IDs that
   we can't load. We'll fail without writing those objects to disk.

Does that satisfy your concerns in this space?

Thanks,
-Stolee

Copy link

gitgitgadget bot commented Apr 24, 2025

This branch is now known as ds/fix-thin-fix.

Copy link

gitgitgadget bot commented Apr 24, 2025

This patch series was integrated into seen via git@089a0e4.

Copy link

gitgitgadget bot commented Apr 25, 2025

This patch series was integrated into seen via git@9a5a794.

@@ -821,6 +821,7 @@ TEST_BUILTINS_OBJS += test-mergesort.o
TEST_BUILTINS_OBJS += test-mktemp.o
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> new file mode 100644
> index 00000000000..db7d1c3cd1f
> --- /dev/null
> +++ b/t/helper/test-pack-deltas.c
> @@ -0,0 +1,140 @@
[snip]
> +int cmd__pack_deltas(int argc, const char **argv)
> +{
> +	int N;
> +	struct hashfile *f;
> +	struct strbuf line = STRBUF_INIT;
> +
> +	if (argc != 2) {
> +		usage(usage_str);
> +		return -1;
> +	}
> +
> +	N = atoi(argv[1]);

Is there a reason why we don't use `parse_options()` here? It might make
this tool easier to use and extend going forward, and we wouldn't have
to care about invalid arguments. Right now, we silently accept a
non-integer argument and do the wrong thing.

Patrick

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Patrick,

On Fri, 25 Apr 2025, Patrick Steinhardt wrote:

> On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> > diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> > new file mode 100644
> > index 00000000000..db7d1c3cd1f
> > --- /dev/null
> > +++ b/t/helper/test-pack-deltas.c
> > @@ -0,0 +1,140 @@
> [snip]
> > +int cmd__pack_deltas(int argc, const char **argv)
> > +{
> > +	int N;
> > +	struct hashfile *f;
> > +	struct strbuf line = STRBUF_INIT;
> > +
> > +	if (argc != 2) {
> > +		usage(usage_str);
> > +		return -1;
> > +	}
> > +
> > +	N = atoi(argv[1]);
> 
> Is there a reason why we don't use `parse_options()` here? It might make
> this tool easier to use and extend going forward, and we wouldn't have
> to care about invalid arguments. Right now, we silently accept a
> non-integer argument and do the wrong thing.

I think that `parse_options()` would be overkill here because:

- This is a _mandatory_ argument, not an optional one.

- The required data type is `uint32_t`, and `parse_options()` has no
  support for that.

But you do have a good point in that we may want to validate the data type
(even if technically, this is not a user-facing program, it's a test
helper that is used under tight control by Git's own test suite).

Consequently, I would suggest this fixup instead:

-- snipsnap --
Subject: [PATCH] fixup! test-tool: add pack-deltas helper

Let's make the command-line parsing a bit more stringent. We _could_
use `parse_options()`, but that would be overkill for a single,
non-optional argument. Besides, it would not bring any benefit, as the
parsed value needs to fit in the `uint32_t` type, and `parse_options()`
has no provision to ensure that.

Signed-off-by: Johannes Schindelin <[email protected]>
---
 t/helper/test-pack-deltas.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
index 4af69bdc05d3..f95d8ee16768 100644
--- a/t/helper/test-pack-deltas.c
+++ b/t/helper/test-pack-deltas.c
@@ -8,11 +8,12 @@
 #include "hex.h"
 #include "pack.h"
 #include "pack-objects.h"
+#include "parse.h"
 #include "setup.h"
 #include "strbuf.h"
 #include "string-list.h"
 
-static const char usage_str[] = "test-tool pack-deltas <n>";
+static const char usage_str[] = "test-tool pack-deltas <nr_entries>";
 
 static unsigned long do_compress(void **pptr, unsigned long size)
 {
@@ -79,7 +80,7 @@ static void write_ref_delta(struct hashfile *f,
 
 int cmd__pack_deltas(int argc, const char **argv)
 {
-	int N;
+	unsigned long n;
 	struct hashfile *f;
 	struct strbuf line = STRBUF_INIT;
 
@@ -88,12 +89,13 @@ int cmd__pack_deltas(int argc, const char **argv)
 		return -1;
 	}
 
-	N = atoi(argv[1]);
+	if (!git_parse_ulong(argv[1], &n) || n != (uint32_t)n)
+		die("invalid number of objects: %s", argv[1]);
 
 	setup_git_directory();
 
 	f = hashfd(1, "<stdout>");
-	write_pack_header(f, N);
+	write_pack_header(f, n);
 
 	/* Read each line from stdin into 'line' */
 	while (strbuf_getline_lf(&line, stdin) != EOF) {
-- 
2.49.0.windows.1

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Fri, Apr 25, 2025 at 11:34:01AM +0200, Johannes Schindelin wrote:
> Hi Patrick,
> 
> On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
> 
> > On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> > > diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> > > new file mode 100644
> > > index 00000000000..db7d1c3cd1f
> > > --- /dev/null
> > > +++ b/t/helper/test-pack-deltas.c
> > > @@ -0,0 +1,140 @@
> > [snip]
> > > +int cmd__pack_deltas(int argc, const char **argv)
> > > +{
> > > +	int N;
> > > +	struct hashfile *f;
> > > +	struct strbuf line = STRBUF_INIT;
> > > +
> > > +	if (argc != 2) {
> > > +		usage(usage_str);
> > > +		return -1;
> > > +	}
> > > +
> > > +	N = atoi(argv[1]);
> > 
> > Is there a reason why we don't use `parse_options()` here? It might make
> > this tool easier to use and extend going forward, and we wouldn't have
> > to care about invalid arguments. Right now, we silently accept a
> > non-integer argument and do the wrong thing.
> 
> I think that `parse_options()` would be overkill here because:
> 
> - This is a _mandatory_ argument, not an optional one.
> 
> - The required data type is `uint32_t`, and `parse_options()` has no
>   support for that.

Support for that has been merged just this week via 2bc5414c411 (Merge
branch 'ps/parse-options-integers', 2025-04-24).

> But you do have a good point in that we may want to validate the data type
> (even if technically, this is not a user-facing program, it's a test
> helper that is used under tight control by Git's own test suite).
> 
> Consequently, I would suggest this fixup instead:

But in any case, I'd be equally fine with your suggestion.

Patrick

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Patrick,

On Fri, 25 Apr 2025, Patrick Steinhardt wrote:

> On Fri, Apr 25, 2025 at 11:34:01AM +0200, Johannes Schindelin wrote:
> > 
> > On Fri, 25 Apr 2025, Patrick Steinhardt wrote:
> > 
> > > On Wed, Apr 23, 2025 at 05:40:02PM +0000, Derrick Stolee via GitGitGadget wrote:
> > > > diff --git a/t/helper/test-pack-deltas.c b/t/helper/test-pack-deltas.c
> > > > new file mode 100644
> > > > index 00000000000..db7d1c3cd1f
> > > > --- /dev/null
> > > > +++ b/t/helper/test-pack-deltas.c
> > > > @@ -0,0 +1,140 @@
> > > [snip]
> > > > +int cmd__pack_deltas(int argc, const char **argv)
> > > > +{
> > > > +	int N;
> > > > +	struct hashfile *f;
> > > > +	struct strbuf line = STRBUF_INIT;
> > > > +
> > > > +	if (argc != 2) {
> > > > +		usage(usage_str);
> > > > +		return -1;
> > > > +	}
> > > > +
> > > > +	N = atoi(argv[1]);
> > > 
> > > Is there a reason why we don't use `parse_options()` here? It might make
> > > this tool easier to use and extend going forward, and we wouldn't have
> > > to care about invalid arguments. Right now, we silently accept a
> > > non-integer argument and do the wrong thing.
> > 
> > I think that `parse_options()` would be overkill here because:
> > 
> > - This is a _mandatory_ argument, not an optional one.
> > 
> > - The required data type is `uint32_t`, and `parse_options()` has no
> >   support for that.
> 
> Support for that has been merged just this week via 2bc5414c411 (Merge
> branch 'ps/parse-options-integers', 2025-04-24).

That's too new for me to be useful, as I have to work on top of v2.49.0
(see https://github.com/microsoft/git/pull/745).

> > But you do have a good point in that we may want to validate the data type
> > (even if technically, this is not a user-facing program, it's a test
> > helper that is used under tight control by Git's own test suite).
> > 
> > Consequently, I would suggest this fixup instead:
> 
> But in any case, I'd be equally fine with your suggestion.

Thanks!
Johannes

Copy link

gitgitgadget bot commented Apr 25, 2025

User Patrick Steinhardt <[email protected]> has been added to the cc: list.

Copy link

gitgitgadget bot commented Apr 25, 2025

User Johannes Schindelin <[email protected]> has been added to the cc: list.

dscho added a commit to microsoft/git that referenced this pull request Apr 25, 2025
This is an early version of work already under review upstream:
gitgitgadget#1906
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant