Fail mirroring more gracefully: #34002

Open
wants to merge 1 commit into
base: main


Contributor

@rremer rremer commented Mar 24, 2025

  • Reuse the recoverable-error checks across mirror_pull
  • Add new cases for 'cannot lock ref' and 'not our ref' (race condition in fetch) and for 'Unable to create' with '.lock'
  • Try a prune for 'broken reference' as well as 'not our ref'
  • Always sync LFS right after the commit-graph write, and before other maintenance that may fail

This handles a few cases where our very large and very active repositories could serve mirrored Git refs but be missing LFS files:

Case 1 (multiple variants): Race condition in git fetch

There was already a check for 'unable to resolve reference' on a failed git fetch, after which a git prune and a second fetch are performed. This works around a race condition in which the git remote tells Gitea about a ref for the HEAD of some branch, and the fetch then fails a few seconds later because the remote branch was deleted or the ref was updated (force push).

The same kind of race condition can surface with two other error messages. These may depend on the version of the git binary Gitea has access to (2.48.1 in my case).
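The prune-and-retry flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in mirror_pull: `FetchError`, `isRefRace`, and `fetchWithPruneRetry` are hypothetical names, the fetch and prune steps are injected as plain functions, and only the pre-existing 'unable to resolve reference' message is checked.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// FetchError is a hypothetical error type that carries git's stderr output.
type FetchError struct {
	Stderr string
}

func (e *FetchError) Error() string { return "git fetch failed: " + e.Stderr }

// isRefRace reports whether err looks like the fetch race described above.
// Only the pre-existing message is checked; other variants could be added.
func isRefRace(err error) bool {
	var fe *FetchError
	if !errors.As(err, &fe) {
		return false
	}
	return strings.Contains(fe.Stderr, "unable to resolve reference")
}

// fetchWithPruneRetry runs fetch once. On a ref-race failure it prunes and
// fetches a second time; any other result is returned unchanged.
func fetchWithPruneRetry(fetch, prune func() error) error {
	err := fetch()
	if err == nil || !isRefRace(err) {
		return err
	}
	if pruneErr := prune(); pruneErr != nil {
		return fmt.Errorf("prune after failed fetch: %w", pruneErr)
	}
	return fetch()
}
```

The retry is deliberately limited to one attempt, so a ref that keeps changing on the remote still surfaces as an error instead of looping.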

Case 2: githttp.go can serve updated git refs before it has synced LFS OIDs

There is probably a more aggressive refactor we could do here: have the cat-file loop use FETCH_HEAD instead of relying on the commit graphs being committed locally (and thus servable to clients of Gitea). A simpler change that reduced the occurrences of this for me was to move the LFS sync block immediately after the commit-graph write, before any other time-consuming (or potentially erroring/exiting) blocks.

@GiteaBot GiteaBot added the lgtm/need 2 label (This PR needs two approvals by maintainers to be considered for merging.) Mar 24, 2025
@pull-request-size pull-request-size bot added the size/M label (Denotes a PR that changes 30-99 lines, ignoring generated files.) Mar 24, 2025
@github-actions github-actions bot added the modifies/go label (Pull requests that update Go code) Mar 24, 2025
@rremer rremer force-pushed the missing-lfs-fixes branch from 39f376f to c6337ab Compare March 24, 2025 21:43
@lunny lunny added the type/enhancement label (An improvement of existing functionality) Mar 24, 2025
@lunny lunny added this to the 1.24.0 milestone Mar 24, 2025
Comment on lines 241 to 250
case strings.Contains(stderrMessage, "unable to resolve reference") && strings.Contains(stderrMessage, "reference broken"):
case strings.Contains(stderrMessage, "remote error") && strings.Contains(stderrMessage, "not our ref"):
case strings.Contains(stderrMessage, "cannot lock ref") && strings.Contains(stderrMessage, "but expected"):
case strings.Contains(stderrMessage, "Unable to create") && strings.Contains(stderrMessage, ".lock"):
default:
return true
Contributor

@wxiaoguang wxiaoguang Mar 25, 2025

This doesn't seem right: a Go switch doesn't "fallthrough" by default.

It's better to create a test for this function, and add comments for each case: why it would happen and why it could be ignored.
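A minimal sketch of what explicit per-case returns could look like, assuming the intent is that any of the four message pairs in the snippet above marks the fetch error as recoverable. The function name `isRecoverableFetchError` is hypothetical, and the per-case comments only restate what the PR description says.

```go
package main

import "strings"

// isRecoverableFetchError is a hypothetical helper. Each case returns
// explicitly, because a Go switch case with an empty body does nothing
// and does not continue into the next case.
func isRecoverableFetchError(stderr string) bool {
	switch {
	case strings.Contains(stderr, "unable to resolve reference") && strings.Contains(stderr, "reference broken"):
		// Pre-existing check: the ref was deleted or force-pushed on the
		// remote between advertisement and fetch.
		return true
	case strings.Contains(stderr, "remote error") && strings.Contains(stderr, "not our ref"):
		// Described in the PR as a variant of the same fetch race.
		return true
	case strings.Contains(stderr, "cannot lock ref") && strings.Contains(stderr, "but expected"):
		// Described in the PR as a variant of the same fetch race.
		return true
	case strings.Contains(stderr, "Unable to create") && strings.Contains(stderr, ".lock"):
		// Listed in the PR as a new case; its cause is not explained there.
		return true
	default:
		return false
	}
}
```

A table-driven test could then pin one sample stderr message per case, plus a message that contains only one half of a pair, to confirm that both substrings are required.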

Contributor Author

Thanks for the feedback. I documented the existing logic in a test, along with the new cases that I've found helpful at my company.

@rremer rremer force-pushed the missing-lfs-fixes branch from c6337ab to 62a85f3 Compare March 25, 2025 17:23
@rremer rremer force-pushed the missing-lfs-fixes branch from 62a85f3 to 1871276 Compare March 25, 2025 20:19