Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save full list of entries for every source #114

Merged
merged 1 commit into from
Feb 21, 2025

Conversation

tianon
Copy link
Member

@tianon tianon commented Feb 21, 2025

When we normalize sources by deduplicating on sourceId, we end up losing a little bit of data that's important for being able to reconnect source objects with their lines in library/foo.

This resolves that by adjusting our entry object to be a vector of entries instead, but sorted using the same SOURCE_DATE_EPOCH tiebreaker we used previously so that .entries[0] is the same as our old .entry, but we keep the full list of values for later use/lookup/cross-referencing.

I have additionally verified that meta.jq here was the only place we are actively consuming from the .entry object (currently), so this should be fully sufficient (no changes necessary elsewhere).

@tianon tianon requested a review from a team as a code owner February 21, 2025 00:59
@tianon
Copy link
Member Author

tianon commented Feb 21, 2025

(I dedicate this change to @mrjoelkamp 🫡 😭 ❤️)

When we normalize sources by deduplicating on `sourceId`, we end up losing a little bit of data that's important for being able to reconnect `source` objects with their lines in `library/foo`.

This resolves that by adjusting our `entry` object to be a vector of `entries` instead, but sorted using the same `SOURCE_DATE_EPOCH` tiebreaker we used previously so that `.entries[0]` is the same as our old `.entry`, but we keep the full list of values for later use/lookup/cross-referencing.

I have additionally verified that `meta.jq` here was the only place we are actively consuming from the `.entry` object (currently), so this should be fully sufficient (no changes necessary elsewhere).
@tianon
Copy link
Member Author

tianon commented Feb 21, 2025

An (easy) alternative we could do would be to calculate this .entries value, and then at the very very end, convert to .entry + .otherEntries or something, so the churn is smaller. 🤔

@tianon
Copy link
Member Author

tianon commented Feb 21, 2025

(but that makes using it for the intended purpose more annoying because the data is in two places, so I think this change is probably more correct/useful as-is and the one-time heavy indentation change is worth it)

@yosifkit yosifkit merged commit 1ac45c0 into docker-library:main Feb 21, 2025
1 check passed
@yosifkit yosifkit deleted the entries branch February 21, 2025 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants