-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
staticd: Fix an issue where SRv6 SIDs may not be allocated on heavily loaded systems #18317
base: master
Are you sure you want to change the base?
staticd: Fix an issue where SRv6 SIDs may not be allocated on heavily loaded systems #18317
Conversation
On a heavily loaded system, it may take some time to establish a connection between staticd and SRv6 SID Manager. As a result, staticd might try to allocate SIDs in SID Manager before the connection is fully established, leading to a failure. This issue is reflected in the log with the following error message: ``` 2025/03/04 15:24:01 STATIC: [HR66R-TWQYD][EC 100663302] srv6_manager_get_locator: invalid zclient socket ``` In this scenario, SIDs are still kept in the staticd configuration, but not allocated and not installed in the RIB. This commit fixes the issue by forcing staticd to retry to allocate the SIDs once the connection with SID Manager is successfully established. Signed-off-by: Carmine Scarpitta <[email protected]>
In same cases, it may happen staticd attempts to allocate a SID when it does not know about the parent locator yet. In this case, the locator pointer stored inside the SID data structure is NULL. When `static_zebra_request_srv6_sid` attempts to dereference the locator pointer, we get a NULL pointer dereference crash. This commit adds a check to prevent the crash from happening. If the locator pointer is NULL, it means that staticd is not aware about the parent locator yet. Then, staticd should call `static_zebra_srv6_manager_get_locator()` to get the locator from SRv6 Manager. Once staticd receives the requested locator, the existing callback `static_zebra_process_srv6_locator_internal()` will take care of retrying to allocate the SID. Signed-off-by: Carmine Scarpitta <[email protected]>
896e120
to
ecad2bd
Compare
@@ -921,6 +921,11 @@ extern void static_zebra_request_srv6_sid(struct static_srv6_sid *sid) | |||
if (!sid) | |||
return; | |||
|
|||
if (!sid->locator) { | |||
static_zebra_srv6_manager_get_locator(sid->locator_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is not it that we missed the srv6 locator event ?
can you please confirm if the srv6 locators are really populated in static?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cscarpitta , I reformulate.
If staticd did not have access to locator information, this means zebra may not have sent locators information to static.
- either static is not part of the daemons interested in locators
- or a retry needs to be done
- or your patch is fine
please advert,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1- This protection should be done for all dameons that read SRv6 SID. The same code pattern should be applied to ISIS and BGP.
2- A retry mechanism for locator should also be put in place. This hook should be provisioned in zebra, so that the current locator are resent to the various CP routing clients.
hook_call(zserv_client_connect, client);
The 2nd part can be part of a separate pull request.
This pull request has conflicts, please resolve those before we can evaluate the pull request. |
On a heavily loaded system, it may take some time to establish a connection between staticd and SRv6 SID Manager.
As a result, staticd might try to allocate SIDs in SID Manager before the connection is fully established, leading to a failure.
This issue is reflected in the log with the following error message:
In this scenario, SIDs are still kept in the staticd configuration, but not allocated and not installed in the RIB.
This commit fixes the issue by forcing staticd to retry to allocate the SIDs once the connection with SID Manager is successfully established.