Issue #307 fix: Failure detection by maximum retry fail doesn't take the exact value set in exchange.max-retry-count config parameter #308
lgtm
…alue set in exchange.max-retry-count config parameter
/sync
@sraghunandan: This PR has been synchronized to the Gitee repository.
lgtm
/lgtm
LGTM label has been added. Git tree hash: 473599daf87aad3e140f4236fe8d1cad7af5ce98
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahanapradhan, sraghunandan.
What type of PR is this?
/kind bug
What does this PR do / why do we need it:
For the max-retry-based failure detection mechanism, the exchange.max-retry-count value determines how many times a failed task is retried.
Once that many retries have failed, the query fails.
The issue: the number of retries that actually happen doesn't match the configured exchange.max-retry-count value.
If exchange.max-retry-count=20 is set, retries happen 21, 23, 24, or some other number of times that varies between runs. The number is close to 20, but it is never exactly 20.
When exchange.max-retry-count is not set, the default value of 10 applies, but retries happen about 15 times.
Cause:
The failure count is modified through two synchronized methods: Backoff.failure() and Backoff.maxTried().
Two threads can call these two methods in parallel.
Unless every read and write of the failure count is synchronized, the count cannot match the exact expected number when multiple threads are involved.
This fix uses synchronized getter and setter methods to read and update the failure count.
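To illustrate the idea only, here is a minimal sketch of a failure counter whose reads and writes all go through synchronized accessors. RetryCounter, tryRecordRetry and simulate are hypothetical names invented for this example; this is not the actual Backoff class or the code changed in this PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a failure tracker; NOT the real Backoff class.
final class RetryCounter {
    private final int maxRetryCount;
    private int failureCount; // guarded by "this"

    RetryCounter(int maxRetryCount) {
        this.maxRetryCount = maxRetryCount;
    }

    // Every read of the count goes through this synchronized getter.
    synchronized int getFailureCount() {
        return failureCount;
    }

    // Every write of the count goes through this synchronized setter.
    synchronized void setFailureCount(int value) {
        failureCount = value;
    }

    // The limit check and the increment share one lock, so two threads
    // can never both observe the same count and both be granted a retry.
    synchronized boolean tryRecordRetry() {
        if (getFailureCount() >= maxRetryCount) {
            return false; // limit reached: caller should fail the query
        }
        setFailureCount(getFailureCount() + 1);
        return true;
    }

    // Runs many threads against one counter and returns how many retries
    // were granted in total.
    static int simulate(int maxRetryCount, int threads, int attemptsPerThread) {
        RetryCounter counter = new RetryCounter(maxRetryCount);
        AtomicInteger granted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < attemptsPerThread; i++) {
                    if (counter.tryRecordRetry()) {
                        granted.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return granted.get();
    }
}
```

With this arrangement, a call such as RetryCounter.simulate(20, 8, 100) grants exactly as many retries as the configured limit, regardless of how the threads interleave, provided the total number of attempts is at least that limit.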
Which issue(s) this PR fixes:
Fixes #307
https://gitee.com/openlookeng/hetu-core/issues/I4WGE1
Special notes for your reviewers: