Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zebra: zebra crash for zapi stream #18359

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

soumyar-roy
Copy link

Issue:
If static route is created with a BGP route as nexthop, which recursively resolves over 512 ECMP v6 nexthops, zapi nexthop encode fails, as there is not enough memory allocated for stream. This causes assert/core dump in zebra. Right now we allocate fixed memory of ZEBRA_MAX_PACKET_SIZ size.

Fix:
1)Dynamically calculate required memory size for the stream 2)try to optimize memory usage

Testing:
No crash happens anymore with the fix
zebra: zebra crash for zapi stream

Issue:
If static route is created with a BGP route as nexthop, which recursively resolves over 512 ECMP v6 nexthops, zapi nexthop encode fails, as there is not enough memory allocated for stream. This causes assert/core dump in zebra. Right now we allocate fixed memory of ZEBRA_MAX_PACKET_SIZ size.

Fix:
1)Dynamically calculate required memory size for the stream 2)try to optimize memory usage

Testing:
No crash happens anymore with the fix
r1#
r1# sharp install routes 2100:cafe:: nexthop 2001:db8::1 1000 r1#

r2# conf
r2(config)# ipv6 route 2503:feca::100/128 2100:cafe::1 r2(config)# exit
r2#

@frrbot frrbot bot added the zebra label Mar 10, 2025
@soumyar-roy soumyar-roy marked this pull request as draft March 10, 2025 22:29
@frrbot frrbot bot added the tests Topotests, make check, etc label Mar 10, 2025
Issue:
If static route is created with a BGP route as nexthop, which
recursively resolves over 512 ECMP v6 nexthops, zapi nexthop encode
fails, as there is not enough memory allocated for stream. This causes
assert/core dump in zebra. Right now we allocate fixed memory
of ZEBRA_MAX_PACKET_SIZ size.

Fix:
1)Dynamically calculate required memory size for the stream
2)try to optimize memory usage

Testing:
No crash happens anymore with the fix
zebra: zebra crash for zapi stream

Issue:
If static route is created with a BGP route as nexthop, which
recursively resolves over 512 ECMP v6 nexthops, zapi nexthop encode
fails, as there is not enough memory allocated for stream. This causes
assert/core dump in zebra. Right now we allocate fixed memory
of ZEBRA_MAX_PACKET_SIZ size.

Fix:
1)Dynamically calculate required memory size for the stream
2)try to optimize memory usage

Testing:
No crash happens anymore with the fix
r1#
r1# sharp install routes 2100:cafe:: nexthop 2001:db8::1 1000
r1#

r2# conf
r2(config)# ipv6 route 2503:feca::100/128 2100:cafe::1
r2(config)# exit
r2#

Signed-off-by: Soumya Roy <[email protected]>

Signed-off-by: Soumya Roy <[email protected]>
@soumyar-roy soumyar-roy marked this pull request as ready for review March 11, 2025 06:21
@@ -1086,6 +1086,93 @@ static void zapi_nexthop_group_sort(struct zapi_nexthop *nh_grp,
&zapi_nexthop_cmp);
}

/* size needed by a stream for a zapi nexthop */
size_t zapi_nh_encode_stream_size(const struct zapi_nexthop *api_nh)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, no, it's not maintainable/safe to try to capture the sizes of our structs in code in this way.
can you take a look at the arithmetic approach that was recently added for the zapi_route struct, and see whether that or some adaptation of it could work here too?

Copy link
Author

@soumyar-roy soumyar-roy Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That model is not directly applicable here. 1) In other model, they have bigger ref max size which is zebra route size(there they are encoding route, not nexthop), does msg_size = sizeof(struct zapi_route); then msg_size - nh size of unused struct zapi_nexthop, in my case, I dont have any reference to start except the max ZEBRA_MAX_PACKET_SIZ, size of zapi_nexthop == 320 bytes, for 520 it is 520x320 >150000, which is more than ZEBRA_MAX_PACKET_SIZ, so I can't do ZEBRA_MAX_PACKET_SIZ - no_of_unused_nhx320. (memory requirement in negative size) 2) Also for other model, it was just lucky that sizeof(struct zapi_route) ~= 320000, so that assuming sizeof(struct zapi_route) - no_of_unused_nh*320, will always work, but in my view this assumption itself can be wrong, considering there is variable part(pointer,array) in zapi_nexthop, i.e szieof(zapi_nexthop) may not reflect the actual size we need to encode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm- I'll try to say what I meant in a different way.
the receiving side of this zapi message in lib/zclient uses struct zapi_route - and sizeof(struct zapi_route) is a good proxy for the maximum message size you need - isn't it? there are a couple of fields that aren't used in this specific path, but those are not large, mostly. the 'opaque' blob is the exception, and you can math that out? you'll be left with the fixed fields, and the embedded nexthops. you can also math out the right number of regular and backup nexthops. since those are used on the decode side, they're a good proxy for the space needed on the encode side?

@soumyar-roy soumyar-roy marked this pull request as draft March 12, 2025 21:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master size/L tests Topotests, make check, etc zebra
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants