Per Partition Automatic Failover: Adds Hub Region Processing Only While Routing Requests Failed with 404/1002. #5447
base: master
All good!
    && subStatusCode == SubStatusCodes.ReadSessionNotAvailable)
{
{
    this.addHubRegionProcessingOnlyHeader = true;
The addHubRegionProcessingOnlyHeader flag is set for every 404/1002 (read session not available) response, so the header would be set on every 404/1002 retry. Is this expected?
yes
Approved with a comment
}

if (statusCode == HttpStatusCode.NotFound
    && subStatusCode == SubStatusCodes.ReadSessionNotAvailable)
Also double check: is this change targeted at multi-master (MM) accounts as well? With MM, writes can happen in any region, so enabling this for MM might cause a regression.
Confirmed this with the backend team: this change is not intended to be used with multi-master.
if (this.addHubRegionProcessingOnlyHeader)
{
    request.Headers[HubRegionHeader] = bool.TrueString;
    this.addHubRegionProcessingOnlyHeader = false; // reset after applying
What errors would be returned if the SDK tries to read from a non-hub region?
Also falling back to the new hub for that partition.
Waiting for design document
Pull Request Template
Description
During partition-level failover and failback under session consistency, a timing gap can cause read requests to fail with 404/1002 errors. When a partition temporarily fails over to a secondary region and later begins failing back to the primary region, the SDK’s read circuit breaker (PPCB) may start routing reads back to the primary region before it has fully caught up with the writes from the failover region. As a result, reads using session tokens from the previous write region may fail because the primary region does not yet have the corresponding session state. Since the SDK currently does not perform cross-regional retries for 404/1002 responses, these reads continue to fail until the primary region is fully synchronized. The goal is to leverage the new backend header x-ms-cosmos-hub-region-processing-only to detect such conditions and route retry requests to the correct write (hub) region, ensuring successful session-consistent reads during the failback window.
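The flag-and-header flow described above can be sketched roughly as follows. This is a minimal illustration based on the diff fragments quoted in the review, not the SDK's actual retry policy: the class name `HubRegionRetrySketch`, the method names `OnResponse` and `OnBeforeSendRequest`, and the plain dictionary used for headers are all hypothetical stand-ins. Only the header name, the 404/1002 condition, and the set-then-reset flag behavior come from this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

// Hypothetical stand-in for the SDK's retry policy; names are illustrative only.
public sealed class HubRegionRetrySketch
{
    // Header name as given in the PR description.
    public const string HubRegionHeader = "x-ms-cosmos-hub-region-processing-only";

    // Sub-status 1002 (read session not available), as referenced in the PR title.
    public const int ReadSessionNotAvailable = 1002;

    private bool addHubRegionProcessingOnlyHeader;

    // Assumed hook: called with the outcome of each attempt.
    public void OnResponse(HttpStatusCode statusCode, int subStatusCode)
    {
        if (statusCode == HttpStatusCode.NotFound
            && subStatusCode == ReadSessionNotAvailable)
        {
            // Remember that the next retry should ask for hub-region processing.
            this.addHubRegionProcessingOnlyHeader = true;
        }
    }

    // Assumed hook: called just before each attempt (including retries) is sent.
    public void OnBeforeSendRequest(IDictionary<string, string> headers)
    {
        if (this.addHubRegionProcessingOnlyHeader)
        {
            headers[HubRegionHeader] = bool.TrueString;
            this.addHubRegionProcessingOnlyHeader = false; // reset after applying
        }
    }
}
```

Because the flag is reset once applied, the header is attached only to the retry that immediately follows a 404/1002 response. A later 404/1002 sets the flag again, which matches the reviewer's observation that every 404/1002 retry carries the header. The sketch does not model the multi-master exclusion discussed in the review.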
Type of change
Please delete options that are not relevant.
Closing issues
To automatically close an issue: closes #5440