Skip to content

FileSnapshot::into_owned_or_cloned intermittently fails according to its journey test #1896

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
EliahKagan opened this issue Mar 18, 2025 · 4 comments · Fixed by #1897 or #1899
Closed

Comments

@EliahKagan
Copy link
Member

EliahKagan commented Mar 18, 2025

Current behavior 😯

The gix-fs::fs snapshot::journey test case was introduced in f93aa61 (#1884) along with the into_owned_or_cloned method that it tests. That test case fails intermittently.

This was first observed in #1892, but it is not related to the changes in that PR. It seems to happen occasionally on all platforms, though I have not found it to fail on macOS, only on Ubuntu and Windows. I suspect that this is merely because we have fewer macOS jobs than Ubuntu or Windows jobs, but I do not know that to be why it hasn't happened on macOS. All failures observed so far have been on CI. I have not experimented locally with it.

Although #1892 makes a change to a Windows job that is one of the jobs where the failure was observed, the change is made for its future effect, and it does not change what software currently gets installed. More importantly, the observed failure in that job on Windows (as well as in another separate Windows job) is from before that PR. The failures observed in that PR have been on Ubuntu (and also in multiple jobs).

Besides being observed in that PR, I have observed failures on the main branch of my fork, where I have rerun jobs to check for it, and unintentionally in #1864 after a rebase that causes it to have f93aa61 in its history. It does not appear that failure is more or less likely on any branch compared to any other branch (for branches that have the test in their history).

The failures look like:

        FAIL [   0.005s] gix-fs::fs snapshot::journey
──── STDOUT:             gix-fs::fs snapshot::journey

running 1 test
test snapshot::journey ... FAILED

failures:

failures:
    snapshot::journey

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 26 filtered out; finished in 0.00s

──── STDERR:             gix-fs::fs snapshot::journey

thread 'snapshot::journey' panicked at gix-fs/tests/fs/snapshot.rs:31:5:
assertion `left == right` failed: it picks up the change
  left: "content"
 right: "change"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The failing part is:

std::fs::write(&file_path, "change")?;
let snap = smut.recent_snapshot(check, open)?.expect("content read");
assert_eq!(&**snap, "change", "it picks up the change");

Specifically, that assert_eq! fails, due to the test observing the old contents content, rather than the new contents change.

Expected behavior 🤔

The test should pass. I don't know if the problem is the test or the code under test though. It seems like the test straightforwardly asserts that any write that is completed from the perspective of Rust code that runs in sequence with a subsequent access to the snapshot should be observed in that subsequent access. If so, then that suggests the bug could be in the code under test.

Git behavior

Not applicable.

Steps to reproduce 🕹

Running the test repeatedly seems to work to reproduce it, at least on CI. Usually the test passes but sometimes it fails.

Observed failures on CI so far, in the order they have been observed, have been:

I originally "reported" this in #1892 (comment) but I figured it should have an issue, since its connection to that PR is coincidental.

@Byron
Copy link
Member

Byron commented Mar 19, 2025

Thanks for reporting and collecting all available information, and of course, sorry for the hassle :/.
When I introduced the test I was lucky enough to believe it works consistently, but thinking about it, the implementation is flawed.

I think the issue is here:

if !has_nanosecond_times(tmp.path())? {
return Ok(());
}

And more specifically, here:

fn has_nanosecond_times(root: &Path) -> std::io::Result<bool> {
let test_file = root.join("nanosecond-test");
std::fs::write(&test_file, "a")?;
let first_time = test_file.metadata()?.modified()?;
std::fs::write(&test_file, "b")?;
let second_time = test_file.metadata()?.modified()?;
Ok(first_time != second_time)
}

The comparison in line 53 would also be true if nanoseconds aren't supported, and coincidentally the time switched from over from one second to another.

Fix is on the way.

Byron added a commit that referenced this issue Mar 19, 2025
improve detection of nanosecond support in `gix-fs` (#1896)
@EliahKagan
Copy link
Member Author

Which filesystems have nanosecond timestamps? If I understand you correctly, the test should be skipped on systems that don't have nanosecond timestamps but was sometimes wrongly not skipped on such systems. Failures occurred just about everywhere except macOS. Do all the non-macOS systems we test lack nanosecond timestamps?

@Byron
Copy link
Member

Byron commented Mar 19, 2025

It seemed so. If you take a look at the PR that fixed it, I also hope you will conclude that this actually is a fix.

I think despite what's written there, it could also be false-negative if the operation takes exactly 1s, or multiple of a second. It can't be false-positive, I think (unless the filesystem is broken).

@EliahKagan
Copy link
Member Author

EliahKagan commented Mar 19, 2025

If you take a look at the PR that fixed it, I also hope you will conclude that this actually is a fix.

The same failure still occurs. Specifically, I rebased #1862 and #1864 onto main again, and the test-fast job on windows-latest failed on 720a23f in #1864 with:

        FAIL [   0.007s] gix-fs::fs snapshot::journey
──── STDOUT:             gix-fs::fs snapshot::journey

running 1 test
test snapshot::journey ... FAILED

failures:

failures:
    snapshot::journey

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 26 filtered out; finished in 0.00s

──── STDERR:             gix-fs::fs snapshot::journey

thread 'snapshot::journey' panicked at gix-fs\tests\fs\snapshot.rs:31:5:
assertion `left == right` failed: it picks up the change
  left: "content"
 right: "change"
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The commit with the failure, 720a23f, has 2c9b214 (#1897) in its history, and contains the code that was intended as a fix:

Ok(second_time.duration_since(first_time).is_ok_and(|d|
// This can be falsely false if a filesystem would be ridiculously fast,
// which means a test won't run even though it could. But that's OK, and unlikely.
d.subsec_nanos() != 0))

So #1897 didn't have the intended effect, and this bug remains unfixed.

Edit: I've looked up the semantics of the operations performed in #1897, and also did some experiments with the function changed there and with some other code that is effectively an instrumented version of it, to measure how often it returns true on systems with only millisecond filesystem timestamp precision, as well as to examine the timestamps themselves and infer a likely mechanism. See #1897 (review) for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants