Update to controller-runtime 0.19.1 / Kube 1.31 #293
base: main
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xrstf. The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
✅ Deploy Preview for k8s-prow ready!
/test all

/test all

/test all
Force-pushed 9102251 to 97a1840

Force-pushed 97a1840 to 2fc8f26

/test all

/test all

Force-pushed f1a2136 to 68c7dc4
/test all

/cc
pkg/scheduler/reconciler_test.go (outdated)

@@ -61,11 +61,11 @@ func (ft *fakeTracker) Get(gvr schema.GroupVersionResource, ns, name string, opt
	return ft.ObjectTracker.Get(gvr, ns, name, opts...)
}

func (ft *fakeTracker) Update(gvr schema.GroupVersionResource, obj runtime.Object, ns string, opts ...metav1.UpdateOptions) error {
Going to file this away in my mental bank of "reasons the fake client is not what you want".
// can rerun from.
// Horologium itself is pretty good at handling the configmap update, but
// not kubelet, according to
// https://github.com/kubernetes/kubernetes/issues/30189 kubelet syncs
n.b. the linked issue says some semantically useless annotation update should kick the kubelet
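For context, a minimal sketch of what such a semantically useless annotation bump could look like with client-go; the client, namespace, and pod name below are placeholders for illustration, not code from this PR:

```go
import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// kickPod applies a meaningless annotation update so the kubelet re-syncs the
// pod (and its mounted ConfigMaps) sooner than its periodic sync would.
// Sketch only: namespace and pod name are placeholders.
func kickPod(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	patch := []byte(fmt.Sprintf(
		`{"metadata":{"annotations":{"test.prow.k8s.io/kick":"%d"}}}`,
		time.Now().UnixNano(),
	))
	_, err := c.CoreV1().Pods(ns).Patch(ctx, name, types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```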
if !passed {
	t.Fatal("Expected updated job.")
// Wait for the first job to be created by horologium.
initialJob := getLatestJob(t, jobName, nil)
nit: `initial = getLatest()` is confusing: is it the initial or the latest?
}); err != nil {
	t.Logf("ERROR CLEANUP: %v", err)
}
})
ctx := context.Background()

getLatestJob := func(t *testing.T, jobName string, lastRun *v1.Time) *prowjobv1.ProwJob
It seems like we're missing the meaning of what this function was originally written to do - is the issue that the function expected to sort these resources by the `resourceVersion` at which they were created, but when there are interspersed UPDATE calls, the objects' current `resourceVersion` no longer sorts them?

Can we sort by the job ID since we know that's monotonically increasing? Creation timestamp is an awkward choice as it can have ties.
}

// Prevent Deck from being too fast and recreating the new job in the same second
// as the previous one.
time.Sleep(1 * time.Second)
What's the downside of having the second job created in the same second? Can we fix that instead of adding a sleep?
}

ready := true &&
nit: in my experience, the moment this does not correctly happen within the timeout, `return ready` will hide the details from the engineer debugging this, which makes for an unpleasant set of next steps. Could we please format the conditions you're looking for as a string, log it on state transitions (i.e. do not spam the log when nothing has changed), and indicate whether the observed state is as expected or not?
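One possible shape for that suggestion, as a sketch only; the helper and condition names are invented for illustration and are not the PR's actual code:

```go
import (
	"context"
	"fmt"
	"testing"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForState renders the conditions the test is waiting for, logs them only
// when they change, and dumps the last observed state if the timeout hits.
// The condition closures are illustrative placeholders.
func waitForState(ctx context.Context, t *testing.T, checkDeployment, checkJob func(context.Context) bool) {
	var lastState string
	err := wait.PollUntilContextTimeout(ctx, time.Second, 2*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			deploymentReady := checkDeployment(ctx)
			jobSucceeded := checkJob(ctx)
			state := fmt.Sprintf("deploymentReady=%t jobSucceeded=%t", deploymentReady, jobSucceeded)
			if state != lastState {
				t.Logf("observed state changed: %s", state)
				lastState = state
			}
			return deploymentReady && jobSucceeded, nil
		})
	if err != nil {
		t.Fatalf("condition not met before timeout, last observed state: %q", lastState)
	}
}
```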
Mostly looks great! Couple small comments.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This PR brings Prow up to speed with the latest Kubernetes and controller-runtime dependencies, plus a few more changes needed to make these new dependencies work.
controller-tools 0.16.4
Without this update, codegen would fail:
golangci-lint 1.58.0
After updating code-generator, staticcheck suddenly threw false positives like:
However, looking at the code, the `help == nil` check leads to a `t.Fatal`, which should be recognized by staticcheck. I have no idea why this suddenly happened, but updating to the next highest golangci-lint version fixes the issue.
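For reference, a stripped-down Go example of the shape that was flagged; the type and function names are invented for illustration and are not the actual Prow code:

```go
package example

import "testing"

type pluginHelp struct{ Description string } // stand-in type for illustration

func lookupHelp() *pluginHelp { return &pluginHelp{Description: "usage: ..."} }

// TestHelp shows the pattern that was mis-flagged: t.Fatal never returns, so
// the dereference after the nil check can never happen on a nil pointer.
func TestHelp(t *testing.T) {
	help := lookupHelp()
	if help == nil {
		t.Fatal("expected help to be non-nil")
	}
	if help.Description == "" { // falsely reported as a possible nil dereference
		t.Error("expected a non-empty description")
	}
}
```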
Flakiness due to rate limiting

I noticed some tests flaking a lot and started digging. It turns out the issue wasn't actually from loops timing out or contexts getting cancelled, but from the client-side rate limiting that is enabled in the kube clients. I think during integration tests it doesn't make much sense to have rate limiting, as it would mean a lot of code potentially has to handle errors arising from it.
I have therefore disabled the rate limiter by setting `cfg.RateLimiter = flowcontrol.NewFakeAlwaysRateLimiter()` in the integration test utility code.
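Roughly, that amounts to something like the following in the shared test-client setup; the helper below is a paraphrase, not the exact utility code:

```go
import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/flowcontrol"
)

// newTestClient builds a client for the integration tests with client-side
// rate limiting disabled, so tests don't have to absorb throttling errors.
// The kubeconfig handling is a sketch of the test utility code.
func newTestClient(kubeconfig string) (kubernetes.Interface, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return nil, err
	}
	cfg.RateLimiter = flowcontrol.NewFakeAlwaysRateLimiter()
	return kubernetes.NewForConfig(cfg)
}
```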
Deck re-run tests

These tests have been reworked quite a bit, as they were quite flaky. The issue ultimately boiled down to the old code sorting ProwJobs by ResourceVersion; during testing I found that ProwJobs quite often get created/updated nearly simultaneously. This has been resolved by sorting the ProwJobs by CreationTimestamp instead, which is unaffected by update calls.
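A minimal sketch of that ordering, with an invented helper name (not the exact test code): sort by CreationTimestamp, which UPDATE calls don't touch, rather than by ResourceVersion, which changes on every write:

```go
import (
	"sort"

	// Import path assumes the current sigs.k8s.io/prow module layout.
	prowjobv1 "sigs.k8s.io/prow/pkg/apis/prowjobs/v1"
)

// latestJob returns the most recently created ProwJob with the given job name.
// Sorting by CreationTimestamp (newest first) is stable under UPDATE calls,
// unlike ResourceVersion. Helper name and filtering are illustrative.
func latestJob(jobs []prowjobv1.ProwJob, name string) *prowjobv1.ProwJob {
	var matches []prowjobv1.ProwJob
	for _, pj := range jobs {
		if pj.Spec.Job == name {
			matches = append(matches, pj)
		}
	}
	if len(matches) == 0 {
		return nil
	}
	sort.Slice(matches, func(i, j int) bool {
		return matches[i].CreationTimestamp.Time.After(matches[j].CreationTimestamp.Time)
	})
	return &matches[0]
}
```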
However, that is nearly the smallest change in the refactoring. Another part concerns `wait.PollUntilContextTimeout`: it's IMO unnecessary to have a back-off mechanism in integration tests like this, as it just needlessly slows down the test.

The "rotate Deployment instead of deleting Pods manually" method has been applied to all other integration tests, as sketched below.
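For illustration, rotating the Deployment is essentially a programmatic `kubectl rollout restart`: patch the pod template with a restart annotation and let the Deployment controller replace the Pods. The helper below is a sketch with placeholder names, not the PR's actual code:

```go
import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// rolloutRestart triggers a rolling restart of a Deployment, the same way
// `kubectl rollout restart` does, instead of deleting its Pods by hand.
// Namespace and deployment name are placeholders.
func rolloutRestart(ctx context.Context, c kubernetes.Interface, ns, name string) error {
	patch := []byte(fmt.Sprintf(
		`{"spec":{"template":{"metadata":{"annotations":{"kubectl.kubernetes.io/restartedAt":%q}}}}}`,
		time.Now().Format(time.RFC3339),
	))
	_, err := c.AppsV1().Deployments(ns).Patch(ctx, name, types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	return err
}
```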