Construct testdata at build time rather than cloning a repo #104
It turns out that making even minor changes to the test data is prohibitively complicated: the tests rely heavily on the git commit SHAs, so any change needs to be reflected here. We also can't rebase the changes, since that would break historical builds.

Rather than rely on an external repo, we now create the data within our own build. The individual states we want to use are modelled as separate directories within the integration test's `testdata` dir, and each state is given a meaningful name.
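To make the idea concrete, here is a minimal sketch of how per-state directories could be turned into a sequence of git commits at build time. It is not the code in this PR: the function name `build_repo`, the state names, and the fixed author identity and dates are all assumptions for illustration. Pinning the identity and dates is one way to keep the resulting SHAs stable across runs.

```python
import os
import shutil
import subprocess
from pathlib import Path

# Assumption: a fixed identity and timestamp, so repeated runs give the same SHAs.
_GIT_ENV = {
    "GIT_AUTHOR_NAME": "Test Data",
    "GIT_AUTHOR_EMAIL": "testdata@example.com",
    "GIT_COMMITTER_NAME": "Test Data",
    "GIT_COMMITTER_EMAIL": "testdata@example.com",
    "GIT_AUTHOR_DATE": "946684800 +0000",
    "GIT_COMMITTER_DATE": "946684800 +0000",
}


def _git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its trimmed stdout."""
    env = {**os.environ, **_GIT_ENV}
    result = subprocess.run(
        ["git", *args], cwd=repo, env=env, check=True,
        capture_output=True, text=True,
    )
    return result.stdout.strip()


def build_repo(testdata: Path, repo: Path, states: list[str]) -> dict[str, str]:
    """Commit each state directory in order; return {state name: commit SHA}."""
    repo.mkdir(parents=True, exist_ok=True)
    _git(repo, "init", "--quiet")
    shas = {}
    for state in states:
        # Replace the working tree with this state's contents, keeping .git.
        for entry in repo.iterdir():
            if entry.name == ".git":
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        shutil.copytree(testdata / state, repo, dirs_exist_ok=True)
        _git(repo, "add", "--all")
        _git(repo, "-c", "commit.gpgsign=false", "commit", "--quiet",
             "--allow-empty", "-m", state)
        shas[state] = _git(repo, "rev-parse", "HEAD")
    return shas
```

A test could then call something like `build_repo(Path("testdata"), tmp_repo, ["initial", "add_target"])` and look up commits by state name instead of hard-coding SHAs.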
illicitonion
left a comment
Totally understand and agree with the problem (and sorry for not having automated the repo creation in the first place, it would've been polite/useful!)
I do think there are a couple of drawbacks to this approach, but I think with a pretty small amount of extra work we can get the best of both worlds?
The two big drawbacks to me are that it makes it hard to verify the tests are actually testing what we think (if there are bugs in the generation, we'll never notice), and it makes it hard to manually run a test (e.g. right now I can just clone the test repo and run a manually built TD against it).
But given you're already generating git repos here, my ideal would be that we make a main function we can run which just generates a bunch of tags in a git repo with the states we care about (maybe named after the directory names + date or timestamp of generation or something), and generates the Commits class that contains the constants with the commit shas.
Which I think is roughly a trivial amount of work to do on top of what you're already doing (just "save and push" the commits, rather than "generate and discard them"), but gives us the best of both worlds?
WDYT?
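As a rough sketch of the generation step suggested above (tag names derived from the directory name plus a date, and a generated `Commits` class holding the SHAs), something like the following could work. The function names, the `com.example.tests` package, and the exact tag format are hypothetical; the real `Commits` class may be laid out differently.

```python
import re
from datetime import date


def tag_name(state: str, generated_on: date) -> str:
    """Tag named after the state directory plus the generation date."""
    return f"{state}-{generated_on.isoformat()}"


def render_commits_class(shas: dict[str, str],
                         package: str = "com.example.tests") -> str:
    """Render Java source for a Commits class with one constant per state."""
    lines = [
        f"package {package};",
        "",
        "/** Generated file; do not edit by hand. */",
        "public final class Commits {",
    ]
    for state, sha in sorted(shas.items()):
        # Turn e.g. "add-target" into the Java constant name ADD_TARGET.
        constant = re.sub(r"[^A-Za-z0-9]+", "_", state).strip("_").upper()
        lines.append(f'  public static final String {constant} = "{sha}";')
    lines += ["", "  private Commits() {}", "}"]
    return "\n".join(lines) + "\n"
```

The generator's main function would create and push one tag per state, then write the rendered source next to the integration tests, so the checked-in constants always match the pushed commits.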
Would you like to have a go at this once this PR lands? Or send a patch on top of this PR?
Sure, I can give it a go, but likely won't be for at least a few days :)
@illicitonion do you suggest we keep a separate testdata repo, or is the idea that the generator creates the git repo ~on the fly before testing and passes it as test data somehow?
This canonicalizes all the labels found in all label-related attributes, using the output of `bazel mod dump_repo_mapping ""` for each commit. This addresses #105.

It doesn't have any integration tests yet, because it would be much better to leverage #104, which is still in the works, but it was tested against our internal monorepo and the reproducer in #105.

Label-like attributes (defined as strings but with nodep = True) are special: they are handled as an exception, as if they were labels, and are thus also converted. (See [this thread](https://bazelbuild.slack.com/archives/CDCMRLS23/p1742821059464199))
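For illustration only, the canonicalization can be sketched as below. This assumes the dump is a JSON object per line mapping apparent repository names to canonical ones, and that the main repository appears under the empty apparent name. The helper names and the sample mapping are made up, and the actual change may treat edge cases (such as the `@repo` shorthand or relative labels) differently.

```python
import json


def parse_repo_mapping(dump: str) -> dict[str, str]:
    """Parse the first non-empty line as {apparent name: canonical name}."""
    first_line = next(line for line in dump.splitlines() if line.strip())
    return json.loads(first_line)


def canonicalize_label(label: str, mapping: dict[str, str]) -> str:
    """Rewrite an apparent label to canonical form where the repo is known."""
    if label.startswith("@@"):
        return label  # already canonical
    if label.startswith("@"):
        apparent, sep, rest = label[1:].partition("//")
        if not sep:
            return label  # "@repo" shorthand: left alone in this sketch
    elif label.startswith("//"):
        apparent, rest = "", label[2:]  # label in the main repository
    else:
        return label  # relative label such as ":target"
    if apparent not in mapping:
        return label  # unknown repository: leave unchanged
    return f"@@{mapping[apparent]}//{rest}"
```

With an illustrative mapping such as `{"": "", "rules_go": "rules_go+"}`, the label `@rules_go//go:def.bzl` becomes `@@rules_go+//go:def.bzl`, so two commits that refer to the same target under different apparent names compare equal.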