-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bgpd: check rmac and nh of evpn imported routes #18372
base: master
Are you sure you want to change the base?
bgpd: check rmac and nh of evpn imported routes #18372
Conversation
could you please add the verbiage you added to the PR to the actual commit message? I do not want to loose this data |
Check rmac and nh of new bestpath for evpn imported prefix before withdrawal. Currently when new bestpath is designated, evpn imported routes are being withdrawn from kernel causing downtime which length depends on amount of routes to process. Basically in a setup where multiple peers advertise identical evpn type-5 routes, if we clear session with any of them (and at least one is still up), we observe route withdrawals from kernel (even though we receive those routes from remaining peers). If the amount of routes is significant it causes a noticeable downtime before routes are readded to kernel. Also menitoned behavior (where routes are being withdrawn immediately even if other peers advertise them) has started to occur after backpressure bgp zebra client FRRouting#15524 Let's check rmac entry and nh of new selected bestpath and do not actually withdraw them from kernel if those two are the same. This fixes and issue where the same routes are being advertised by multiple peers and we clear session with one of them FRRouting#18240 Signed-off-by: Dawid Kopec <[email protected]>
8833051
to
4865a82
Compare
@Mergifyio backport stable/10.3 stable/10.2 stable/10.1 |
🟠 Waiting for conditions to match
|
@raja-rajasekar opinion? |
The withdraw actual shouldn't be at this point here(An action item I had in my plate for some time but never got a chance to do it), and https://github.com/FRRouting/frr/pull/18158/files addressed that issue (Not yet merged). This is because we can always end up in a situation where we immediately withdraw the route while the install is pending for later (backpressure), thereby blackhole-ing the traffic. So I am not sure why we need to patch this code, and rather maybe directly check with the fix in https://github.com/FRRouting/frr/pull/18158/files (Of course this does some other things as well which I haven't looked at it yet, however the commit which removes this block is what I am talking about) |
Indeed #18158 is a better fix and would be ideal to have. I have tested my case with frr build from that PR and there is no downtime (which is expected as the snippet with route withdrawal has been deleted). However there were concerns with breaking a standard pattern in zebra and there were no updates under the PR so I thought that fix might not eventually make it to master and thought of less invasive fix addressing the case where we have an update with old and new bestpaths with the same next-hop resulting in a downtime due to kernel add/del operations. |
Fix for an issue which has been described here: #18240
Basically in a setup where multiple peers advertise identical evpn type-5 rotues, if we clear session with any of them (and at least one is still up), we observe route withdrawals from kernel (even though we receive those routes from remaining peers). If the amount of routes is significant it causes a noticeable downtime before routes are readded to kernel.
I'd appreciate a feedback on why a route withdrawal is necessary in such case (per the comment above I can assume that rmac entry/nh might be lingering somewhere, but where? Change which introduced withdrawal has been merged long time ago, is it still necessary?)
Also menitoned behavior (where routes are being withdrawn immediately even if other peers advertise them) has started to occur after backpressure bgp zebra client #15524.