-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
client: Added new session option to set timeout for session creation #16385
base: main
Are you sure you want to change the base?
client: Added new session option to set timeout for session creation #16385
Conversation
cd6fd75
to
ffe5ef5
Compare
What if the etcdserver shutdown after the connection has already established? Will the session timeout or be blocked waiting for the cluster to recover? The expected behaviour should be the latter. Can you add a test case to verify it? |
ffe5ef5
to
77ec301
Compare
Thanks for pointing this out, I've fixed the bug and added another test case to verify this. Do let me know what you think. |
Hey @ahrtr, any updates regarding the changes I've made? I'm not quite sure how to write a test that verifies that hanging would occur (without using timeout, as the behavior has to be differentiated from the timeout that the new session option creates), so I have verified the context object instead. |
Hey @fuweid, thanks for your review. I've addressed them accordingly, so do let me know what you think 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Please also squash the commits, thx |
9f1ce8d
to
976cff5
Compare
Hey @ahrtr, I've added a new test case where we verify that the default behavior isn't affected if an operation hangs beyond the timeout specified by the new option |
// if Put operation is blocked beyond the timeout specified using WithCreationTimeout, | ||
// that timeout is not used by the Put operation | ||
case <-time.After(2 * time.Second): | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please start the etcdserver again, and verify that eventually the cli.Put
will succeed.
011d138
to
357f70d
Compare
Hey @ahrtr, I've altered the test
How do you think we should proceed? |
4869140
to
d0ac5df
Compare
Hey @ahrtr, I've added the logging as suggested, and the error returned was |
d0ac5df
to
98fa529
Compare
case <-donec: | ||
case err := <-errorc: | ||
t.Errorf("Put failed: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a potential issue here. When the Put finishes, both donec and errorc might be ready to be read, then it's uncertain which case is selected.
So I suggest to remove donec, and always feed the err (returned by Put) into errroc, even it's nil.
Sorry for the confusion.
Hey @ahrtr no worries, I've removed donec as instructed. However I'm still facing the same error |
|
Hmm no, oddly this happens every time I've ran the test |
|
||
// create new cluster | ||
clus := integration2.NewCluster(t, &integration2.ClusterConfig{Size: 1}) | ||
defer clus.Terminate(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
defer clus.Terminate(t) | |
defer clus.Terminate(t) | |
clus.Members[0].KeepDataDirTerminate = true |
} | ||
|
||
// restarting and ensuring that the Put operation will eventually succeed | ||
clus.Members[0].Restart(t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clus.Members[0].Restart(t) | |
clus.Members[0].Restart(t) | |
clus.Members[0].WaitOK(t) |
The reason why the workflow is green is the test was skipped.
When terminating the member, the data is removed automatically. You need to keep the data dir so that the lease is still valid after restarting the member. Please apply the suggested change, and try again. |
288707e
to
b3f356b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @AngstyDuck - Can you please rebase this pr to address conflicts, thanks!
2354cb9
to
895e9b8
Compare
Hey @jmhbnz, I've rebased the PR 👍 thanks! |
Discussed during sig-etcd triage meeting, @AngstyDuck can you please rebase this pr once more? Thanks! |
895e9b8
to
ba58a49
Compare
311eea4
to
6f76f6e
Compare
Hey @jmhbnz, the linting errors have been fixed, do let me know if you need anything else 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - Thanks @AngstyDuck and apologies this pr has been in review for so long.
/test all |
@AngstyDuck to be safe we should rebase one final time before we merge. |
2b5f97c
to
c723ffd
Compare
When communicating via gRPC, etcd sets the gRPC option WaitForReady = true to minimize client request error responses due to transient failures. But because of this option, the creation of new sessions hang without errors when all nodes could not be contacted by the client. This new session option allows client to receive an error if the nodes cannot be contacted after the specified amount of time. Signed-off-by: AngstyDuck <[email protected]>
c723ffd
to
b37c822
Compare
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahrtr, AngstyDuck, fuweid, jmhbnz, lhmzhou The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Any updates on this? |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
@a-nych sorry for late update. Please rebase it |
When communicating via gRPC, etcd sets the gRPC option WaitForReady = true to minimize client request error responses due to transient failures. But because of this option, the creation of new sessions hang without errors when all nodes could not be contacted by the client. This new session option allows client to receive an error if the nodes cannot be contacted after the specified amount of time.
This seeks to resolve #14631
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.