Current behavior 😯
Background
cf7f34d (#1882) bumped prodash from 29.0.0 to 29.0.1 in Cargo.toml, as well as various resolved dependency versions in Cargo.lock, which included bumping miniz_oxide from 0.8.0 to 0.8.5.
Around that time and starting somewhat earlier, I was experimenting with running tests via cross in the hope of finding a way to reproduce some s390x-unknown-linux-gnu failures without s390x hardware. As noted in #1890, I was unable to reproduce those failing s390x tests that way, and I still have not managed to do so.
However, I also experimented with another target in cross, with the intention that it would help me understand what failures were target-specific. I didn't get failures in s390x, but I expected I would and that the other target could serve as a kind of control. It turned out that s390x was the control and the other target eventually had unexpected failures.
Specifically, for the other target, I chose armv7-linux-androideabi. I chose this because it had been partially tested via cross on CI before the test was replaced with a more comprehensive non-Android arm32v7 containerized (but non-cross) test on ARM hardware in #1777.
In its past testing on CI, armv7-linux-androideabi had problems using the external git command. Upgrading git in the cross container and patching the runner script to avoid preloading an unusable shared library overcame this, and I was able to run all tests with cross using that target.
All tests passed--aside from one fuzzing-related performance test that had to be skipped due to the slowdown of unaccelerated QEMU emulation of a foreign architecture, and a couple of cases where tests were marked with the wrong #[cfg(...)], which was easily fixed. But...
Then prodash, and various locked dependencies, were upgraded
Starting after a rebase that included cf7f34d in the history, several test cases failed in cross runs on the armv7-linux-androideabi target. In the recent run presented here, this is eight tests. Running git revert --no-commit cf7f34d and rerunning the tests confirmed that this change is what triggered the failures.
I did a full gix clean -xde between the runs to ensure they did not contaminate each other. I also ran it a number of times with the intention of verifying that the effect is consistent. I initially thought the effect was completely consistent based on some very similar-looking runs, but actually the tests that fail differ. I suspect this is related to threading. I sometimes forced the Docker image to be rebuilt in between as well, to ensure that contamination in the Docker image itself was not the cause (though I was not aware of a way that could happen), and it was not.
Full test run output of the failures, and of how they go away without the changes from cf7f34d, is presented in this gist. Although not shown there, I have also tried running the tests with --locked. This does not make a difference, which makes me think the upgrade to prodash itself is relevant, though the way it is relevant could still in principle be through a dependency.
The failures are all specific to armv7-linux-androideabi with cross, or at least they do not happen in any of the other ways I have tested or that are tested on CI. In particular, this does not happen on the same system when running the tests natively, nor when running them on that system with cross with the s390x-unknown-linux-gnu target. Nor do they fail on either of the 32-bit targets tested (without cross) on CI, one of which is a non-Android ARMv7 target.
Indirect dependencies that may be relevant
The changes in prodash 29.0.1 are listed as replacing humantime with jiff and bumping jiff to version 0.2, which the diff confirms. Assuming the latest patch versions were used in both 0.1 and 0.2 in practice, there are a number of relevant differences, but none seem like they would affect miniz_oxide.
The reason I was looking for an effect on miniz_oxide is that the test failures are panics that happen there. gitoxide uses miniz_oxide through flate2. If there is any connection to prodash or jiff, I don't know what it is. But if there isn't, then I don't know why passing --locked made no difference: with or without --locked, the failures happened with the changes from cf7f34d, and did not happen without them.
But if we look at Cargo.lock changes in cf7f34d, the flate2 and miniz_oxide crates were among those upgraded:
Furthermore, the versions of miniz_oxide listed in Cargo.lock seem to be used even when I do not pass --locked or --frozen. The compiled version is 0.8.5 when the tests fail, and 0.8.0 when the tests pass with the revert, as though locked versions are being used. Perhaps there is something I don't understand about when Cargo.lock versions are guaranteed to be used?
Which test modules have failures
Because I am using cross, I am using cargo test (via cross test) rather than cargo nextest. The output is accordingly much harder (at least for me) to read. Here's the broad summary at the end of the whole run:
error: 3 targets failed:
`-p gix --test gix`
`-p gix-odb-tests --test integrate`
`-p gix-pack-tests --test pack`
error: Recipe `cross-test` failed on line 234 with exit code 101
There are always failures in these three modules and no others. Details about all the specific failures in the linked gist are as follows.
Summary of gix failures
The failures to which -p gix --test gix pertains were:
failures:
remote::fetch::blocking_and_async_io::fetch_more_packs_than_can_be_handled
remote::fetch::blocking_and_async_io::fetch_with_multi_round_negotiation
repository::object::commit_as::specify_committer_and_author
test result: FAILED. 319 passed; 3 failed; 0 ignored; 0 measured; 0 filtered out; finished in 40.12s
error: test failed, to rerun pass `-p gix --test gix`
Summary of gix-odb-tests failures
The failures to which -p gix-odb-tests --test integrate pertains were:
odb::store::dynamic::write
odb::store::loose::write::collisions_do_not_cause_failure
odb::store::loose::write::it_writes_objects_with_similar_permissions
odb::store::loose::write::read_and_write
Summary of gix-pack-tests failures
The failure to which -p gix-pack-tests --test pack pertains was:
pack::data::output::count_and_entries::traversals
Analysis
I don't know what's going on, but all of the failures are "attempt to subtract with overflow" errors in miniz_oxide. This is the case both in the run shown in the gist and in other runs where the tests that fail are sometimes slightly different. All the failures seem to resemble each other pretty closely. Here's the first failure output with its backtrace, to illustrate (the others can be viewed in the gist through the specific links, lest this issue become too long):
---- remote::fetch::blocking_and_async_io::fetch_more_packs_than_can_be_handled stdout ----
thread 'remote::fetch::blocking_and_async_io::fetch_more_packs_than_can_be_handled' panicked at /cargo/registry/src/index.crates.io-1949cf8c6b5b557f/miniz_oxide-0.8.5/src/deflate/core.rs:1990:38:
attempt to subtract with overflow
stack backtrace:
0: rust_begin_unwind
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:692:5
1: core::panicking::panic_fmt
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:75:14
2: core::panicking::panic_const::panic_const_sub_overflow
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:178:21
3: miniz_oxide::deflate::core::compress_fast
at /cargo/registry/src/index.crates.io-1949cf8c6b5b557f/miniz_oxide-0.8.5/src/deflate/core.rs:1990:38
4: miniz_oxide::deflate::core::compress_inner
at /cargo/registry/src/index.crates.io-1949cf8c6b5b557f/miniz_oxide-0.8.5/src/deflate/core.rs:2244:9
5: miniz_oxide::deflate::core::compress
at /cargo/registry/src/index.crates.io-1949cf8c6b5b557f/miniz_oxide-0.8.5/src/deflate/core.rs:2177:5
6: miniz_oxide::deflate::stream::deflate
at /cargo/registry/src/index.crates.io-1949cf8c6b5b557f/miniz_oxide-0.8.5/src/deflate/stream.rs:54:23
7: <flate2::ffi::rust::Deflate as flate2::ffi::DeflateBackend>::compress
at /cargo/registry/src/index.crates.io-1949cf8c6b5b557f/flate2-1.1.0/src/ffi/rust.rs:169:19
8: flate2::mem::Compress::compress
at /cargo/registry/src/index.crates.io-1949cf8c6b5b557f/flate2-1.1.0/src/mem.rs:334:9
9: gix_features::zlib::stream::deflate::impls::<impl gix_features::zlib::stream::deflate::Write<W>>::write_inner
at /project/gix-features/src/zlib/stream/deflate/mod.rs:69:30
10: gix_features::zlib::stream::deflate::impls::<impl std::io::Write for gix_features::zlib::stream::deflate::Write<W>>::flush
at /project/gix-features/src/zlib/stream/deflate/mod.rs:107:13
11: <gix_features::hash::write::Write<T> as std::io::Write>::flush
at /project/gix-features/src/hash.rs:182:13
12: gix_odb::store_impls::loose::write::<impl gix_object::traits::Write for gix_odb::store_impls::loose::Store>::write_stream
at /project/gix-odb/src/store_impls/loose/write.rs:91:9
13: gix_odb::store_impls::dynamic::write::<impl gix_object::traits::Write for gix_odb::store_impls::dynamic::Handle<S>>::write_stream
at /project/gix-odb/src/store_impls/dynamic/write.rs:34:26
14: gix_odb::cache::impls::<impl gix_object::traits::Write for gix_odb::Cache<S>>::write_stream
at /project/gix-odb/src/cache.rs:154:13
15: <gix_odb::memory::Proxy<T> as gix_object::traits::Write>::write_stream
at /project/gix-odb/src/memory.rs:216:20
16: gix_object::traits::Write::write_buf
at /project/gix-object/src/traits/mod.rs:16:9
17: gix::repository::object::<impl gix::types::Repository>::write_object_inner
at ./src/repository/object.rs:180:9
18: gix::repository::object::<impl gix::types::Repository>::write_object
at ./src/repository/object.rs:171:9
19: gix::repository::object::<impl gix::types::Repository>::commit_as_inner
at ./src/repository/object.rs:309:25
20: gix::repository::object::<impl gix::types::Repository>::commit_as
at ./src/repository/object.rs:273:9
21: gix::repository::object::<impl gix::types::Repository>::commit
at ./src/repository/object.rs:373:9
22: gix::remote::fetch::blocking_and_async_io::fetch_more_packs_than_can_be_handled::create_empty_commit
at ./tests/gix/remote/fetch.rs:97:13
23: gix::remote::fetch::blocking_and_async_io::fetch_more_packs_than_can_be_handled
at ./tests/gix/remote/fetch.rs:113:13
24: gix::remote::fetch::blocking_and_async_io::fetch_more_packs_than_can_be_handled::{{closure}}
at ./tests/gix/remote/fetch.rs:90:50
25: core::ops::function::FnOnce::call_once
at /rust/lib/rustlib/src/rust/library/core/src/ops/function.rs:250:5
26: core::ops::function::FnOnce::call_once
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Looking at this step in the backtrace:
3: miniz_oxide::deflate::core::compress_fast
at /cargo/registry/src/index.crates.io-1949cf8c6b5b557f/miniz_oxide-0.8.5/src/deflate/core.rs:1990:38
The blame shows the most recent change to that line as a removal of the as usize cast on probe_pos. This is from commit Frommi/miniz_oxide@0f50464 where several as usize casts are removed. But I think this cannot be the cause of the new failure, since this change is present in all versions of miniz_oxide since 0.7.2.
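To make the failure mode concrete, here is a minimal sketch of the class of panic involved (this is not gitoxide or miniz_oxide code; the variable names are just stand-ins for the kind of position arithmetic done in the deflate fast path). It only fires because overflow checks are enabled, as they are by default in the dev profile that cargo test builds with; a release build without overflow checks would silently wrap instead:
use std::hint::black_box;

fn main() {
    // Stand-ins for the kind of cursor/position arithmetic in the deflate
    // fast path; black_box only prevents the compiler from folding the
    // subtraction away at compile time.
    let lookahead_pos: usize = black_box(2);
    let probe_pos: usize = black_box(5);

    // With overflow checks on (the default for `cargo test` dev builds), this
    // panics with "attempt to subtract with overflow"; with checks off
    // (typical release builds), it wraps around to a huge usize instead.
    let dist = lookahead_pos - probe_pos;
    println!("dist = {dist}");
}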
Inspecting a broader diff, there are substantial changes, including to deflate/core.rs, between 0.8.0 and 0.8.5, but I don't know which would cause this. Maybe something in the changelog would give a hint.
Assuming the version requirements can be resolved without causing problems with other dependencies, I might manage to bisect across miniz_oxide versions, or even miniz_oxide commits, using one of the failing gitoxide tests. But perhaps you will have a more specific idea of what is going on.
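If a reduced reproducer helps, a harness along these lines (hypothetical: it assumes miniz_oxide as a direct dependency, and the one-shot call does not model the streaming flushes that flate2 performs in the failing path) could be fed input captured from one of the failing tests and run on the affected target:
// Hypothetical reduction harness: compress a captured input file directly with
// miniz_oxide at each level, to see whether deflating that input alone
// reproduces the panic without the rest of gitoxide.
fn main() {
    let path = std::env::args().nth(1).expect("usage: harness <input-file>");
    let input = std::fs::read(&path).expect("readable input file");
    for level in 0u8..=10 {
        let compressed = miniz_oxide::deflate::compress_to_vec_zlib(&input, level);
        println!("level {level}: {} -> {} bytes", input.len(), compressed.len());
    }
}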
Expected behavior 🤔
Ordinarily I would say I expect that all tests pass. But cross images are strange environments in which to run gitoxide tests, because they are x86-64 containers in which code of another architecture is built and then, through QEMU, run. I generally expect a cross image to require significant customization to be usable to run gitoxide tests, since gitoxide interacts with various parts of the system, even in a max-pure build as used here.
If I had always observed these failures, then I would have assumed the problem was just with the container as I had set it up, and not assumed a bug. But it is strange that it was working fine (once I got git to a new enough version and kept the ld.so message from polluting stderr, as described below), but then cf7f34d broke it. That suggests to me that there is a bug somewhere.
Maybe there is a long-standing bug in gitoxide that this surfaces. Or maybe there is something wrong with the tools in the cross image I am using (either in the original image, or from my modifications, detailed below). Or maybe there is a bug in miniz_oxide, or in a crate that uses it such as flate2.
Git behavior
In Git itself
As far as I know, this is not applicable: there is no git behavior that corresponds to the affected behavior of gitoxide.
In how we use it
I also suspect that the way the image is customized to be able to use git for tests that run it, either directly or through .gitignored fixtures, is not relevant here. But just in case it is, the most git-relevant changes in the container are:
Enable the git-core PPA.
Upgrade git. This is still the x86-64 git. (cross images are x86-64 images with cross toolchains installed. It is possible to install git in a cross image for some cross targets, but probably not this one.)
Patch the runner script cross uses, to keep it from setting LD_PRELOAD to include a .so file implementing the C++ standard library. This could be useful in a C++ Android program, but when ld.so attempts to preload it in an x86-64 executable, it writes this text to standard error:
ERROR: ld.so: object '/android-ndk/sysroot/usr/lib/arm-linux-androideabi/libc++_shared.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
Then the program actually runs. But when we capture standard error and parse or make assertions about it, we get failures. So I kept LD_PRELOAD from being set to that path, to avoid this.
Full details of how I customized the container, including those not related to making git usable, are included below.
Steps to reproduce 🕹
Customizing the container
I'll open a pull request with the changes. That pull request will not fix this issue, but it will be possible to look at the changes in it, at least initially. However, I also intend to improve that pull request, both to show these failures better--there are other problems on CI, related to configuring binfmt_misc to use QEMU, that currently obscure them--and to include further changes, both already made and planned, for testing s390x. So it seems valuable to present here how the observations reported in this issue were produced.
I made this etc/docker/test-cross.toml file:
# `cross` configuration for running tests. Treated like `Cross.toml` if enabled
# with `CROSS_CONFIG=etc/docker/test-cross.toml`. This avoids affecting other
# `cross` usage, e.g. in `release.yml`. See `cross-test` recipes in `justfile`.
[build.env]
passthrough = [
"CI",
"GITHUB_ACTIONS",
"GIX_CREDENTIALS_HELPER_STDERR",
"GIX_EXTERNAL_COMMAND_STDERR",
"GIX_OBJECT_CACHE_MEMORY",
"GIX_PACK_CACHE_MEMORY",
"GIX_TEST_CREATE_ARCHIVES_EVEN_ON_CI",
"GIX_TEST_EXPECT_REDUCED_TRUST",
"GIX_TEST_IGNORE_ARCHIVES",
"GIX_VERSION",
"NO_PRELOAD_CXX",
"RUST_BACKTRACE",
"RUST_LIB_BACKTRACE",
]
[target.armv7-linux-androideabi]
image = "cross-rs-gitoxide:armv7-linux-androideabi"
[target.s390x-unknown-linux-gnu]
image = "cross-rs-gitoxide:s390x-unknown-linux-gnu"
And added these recipes to the justfile:
# Build a customized `cross` container image for testing
cross-image target:
    docker build --build-arg "TARGET={{ target }}" -t "cross-rs-gitoxide:{{ target }}" \
        - <etc/docker/Dockerfile.test-cross

# Test another platform with `cross`
cross-test target: (cross-image target)
    CROSS_CONFIG=etc/docker/test-cross.toml NO_PRELOAD_CXX=1 \
    cross test --workspace --no-fail-fast --target "{{ target }}" \
    --no-default-features --features max-pure \
    -- --skip realpath::fuzzed_timeout

# Test s390x with `cross`
cross-test-s390x: (cross-test 's390x-unknown-linux-gnu')

# Test Android with `cross`
cross-test-android: (cross-test 'armv7-linux-androideabi')
Running the tests
Then, as shown in the gist, I ran various commands, the most important of which were this command, producing the errors:
RUST_BACKTRACE=1 just cross-test armv7-linux-androideabi
And these commands, cleaning the build directory and running the tests with the changes of cf7f34d undone. (To do this recently without having to manually resolve a conflict--but not originally, as shown in this older gist--I had to also revert 00d1a00, but that is not the cause.)
I have done this quite a few times; the failures appear deterministic. Enabling the backtrace is not the cause. (That would be strange, but who knows--this is already strange.) I have also run the tests multiple times without RUST_BACKTRACE=1.
The images are not stale
I have also tried forcing all Docker images to be reacquired or rebuilt, even before the second run, by running the following commands. The third command is the one that actually deletes images; the purpose of the others is to verify that there are no containers running or stopped, that there are images before I delete them, and that there are no images after I delete them. Please note that you might not want to run these commands, since they remove any Docker images that are not in use by containers.
docker container list -a
docker image list -a
docker image prune -a
docker image list -a
I would not expect that to have made a difference, and it never did. The main value of it was to let me make sure the first run in the gist would show the whole process of customizing the Docker image, in case somehow that turns out to be relevant.
Minor patches to the test suite
Finally, though I think this is the least important for this issue, the changes to allow tests to pass instead of failing, or to run and pass instead of being skipped, were:
diff --git a/tests/tools/tests/umask.rs b/tests/tools/tests/umask.rs
index 0a4cc0425..5463b57ad 100644
--- a/tests/tools/tests/umask.rs
+++ b/tests/tools/tests/umask.rs
@@ -1,6 +1,9 @@
#[test]
#[cfg(unix)]
-#[cfg_attr(not(target_os = "linux"), ignore = "The test itself uses /proc")]
+#[cfg_attr(
+    not(any(target_os = "linux", target_os = "android")),
+    ignore = "The test itself uses /proc"
+)]
fn umask() {
use std::fs::File;
use std::io::{BufRead, BufReader};
There are a few other changes on the feature branch, but you are unlikely to want to see them in this issue. They do not affect code exercised in any of the tests, and they will be included in the forthcoming PR (which, to be clear, is not expected to fix this issue).
Edits: The pull request
That PR is #1895. The commit at the point described above is 5d4eed7. Later commits may include modifications to some of the files shown here. (The history up to that commit can be seen here, though it's possible that could eventually be GC'd following a rebase.)
As of e207b27 ("Configure binfmt_misc for QEMU on CI"), CI in #1895 shows the same bug as here, without any unrelated failures. So that may now be a good way to examine the bug. (Comparison of the failures there to those in the gist may also illuminate something about what changes and what stays the same across different runs.) For convenience, here's a failing run, and its failures:
-p gix --test gix failures
-p gix-odb-tests --test integrate failures
-p gix-pack-tests --test pack failures
The possible role of cargo test (vs. cargo nextest) seems small
On this separate branch (currently at this commit), I ran an experiment where other tests, especially the test-32bit jobs, are changed to be more like these, including by making them use cargo test. This does not ever seem to cause them to fail.
So while the nondeterminism in exactly which test cases fail suggests a possible role of cargo test using multithreading, it does not seem to be a significant contributing factor to the effect:
The effect only occurs since dependencies were updated.
Even after that, it does not occur in any tests except those being experimented with in armv7-linux-androideabi.
Not even if other tests are changed to use cargo test so they are more similar to how armv7-linux-androideabi is tested.
Thanks for the ping - this does look like it could be a regression in miniz_oxide. I can do some testing on my Android TV box that runs armv7 Android and see if I can hit the issue. It would be easiest if it were possible to reduce the issue to just the input to miniz_oxide.
Is there some way to more easily run tests on 32-bit and s390x on GitHub CI? I suspect it's to do with 32-bit rather than Android specifically, though I could be wrong. I currently only have build tests for 32-bit, via the wasm target, as from what I remember that target had issues running tests, but there should really be full tests on both 32-bit and big-endian (so s390x).
EDIT: So I didn't look closely enough at first - it's failing on 32-bit armv7 Android but not on i386 (or arm32v7) - that makes it a lot more confusing. If it were a simple regression in miniz_oxide from some changes I made, I would have expected it to fail on other 32-bit platforms as well, but that doesn't seem to be the case.
The compress_fast function is one part that wasn't changed at all between those versions; the changes were all elsewhere. I did make some changes to it more recently that yielded performance improvements, but those have not landed in a release yet.