Adding multiple static routes with interface to same non-existent nexthop fail to install correctly #18321

Open
2 tasks done
mwinter-osr opened this issue Mar 6, 2025 · 6 comments · May be fixed by #18361
Labels
triage Needs further investigation

@mwinter-osr
Member

Description

When installing multiple static routes to the same non-existent nexthop, only the first route is seen in the routing table (although all of them are correctly stored in the config).
As soon as another route is installed (to a different nexthop), the previously missing routes show up.

Version

This is seen from at least FRR 8.4 up to current FRR master.

How to reproduce

Load a config similar to this. No routing protocols needed, just zebra, mgmtd and staticd.

Current configuration:
!
frr version 10.4-dev
frr defaults traditional
hostname r1
log file /tmp/frr.log
log syslog informational
log commands
service integrated-vtysh-config
!
debug zebra events
debug zebra rib
debug zebra nexthop detail
debug static events
debug static route
!
interface ens3
 description Normal
 ip address 192.168.1.1/24
exit
!
end

Please be aware that there is NO default route

r1# sh ip route
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF default:
C>* 192.168.1.0/24 is directly connected, ens3, weight 1, 00:00:32
L>* 192.168.1.1/32 is directly connected, ens3, weight 1, 00:00:32
C>* 192.168.122.0/23 [0/100] is directly connected, ens2, weight 1, 00:00:35
K>* 192.168.122.1/32 [0/100] is directly connected, ens2, weight 1, 00:00:35
L>* 192.168.122.13/32 is directly connected, ens2, weight 1, 00:00:35

Adding 3 routes with the same nexthop and output interface:

vtysh -c conf -c 'ip route 10.1.3.0/24 172.16.1.3 ens3'
vtysh -c conf -c 'ip route 10.1.4.0/24 172.16.1.3 ens3'
vtysh -c conf -c 'ip route 10.1.5.0/24 172.16.1.3 ens3'

Looking at the routes:

vtysh -c 'show ip route'
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF default:
S   10.1.3.0/24 [1/0] via 172.16.1.3, ens3 inactive, weight 1, 00:00:00
C>* 192.168.1.0/24 is directly connected, ens3, weight 1, 00:00:03
L>* 192.168.1.1/32 is directly connected, ens3, weight 1, 00:00:03
C>* 192.168.122.0/23 [0/100] is directly connected, ens2, weight 1, 00:00:06
K>* 192.168.122.1/32 [0/100] is directly connected, ens2, weight 1, 00:00:06
L>* 192.168.122.13/32 is directly connected, ens2, weight 1, 00:00:06

--> Only first route is seen

Now adding a 4th route with a different nexthop:

vtysh -c conf -c 'ip route 10.1.6.0/24 172.16.1.100 ens3'

and all the routes are now seen

vtysh -c 'show ip route'
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF default:
S   10.1.3.0/24 [1/0] via 172.16.1.3, ens3 inactive, weight 1, 00:00:00
S   10.1.4.0/24 [1/0] via 172.16.1.3, ens3 inactive, weight 1, 00:00:00
S   10.1.5.0/24 [1/0] via 172.16.1.3, ens3 inactive, weight 1, 00:00:00
S   10.1.6.0/24 [1/0] via 172.16.1.100, ens3 inactive, weight 1, 00:00:00
C>* 192.168.1.0/24 is directly connected, ens3, weight 1, 00:00:12
L>* 192.168.1.1/32 is directly connected, ens3, weight 1, 00:00:12
C>* 192.168.122.0/23 [0/100] is directly connected, ens2, weight 1, 00:00:15
K>* 192.168.122.1/32 [0/100] is directly connected, ens2, weight 1, 00:00:15
L>* 192.168.122.13/32 is directly connected, ens2, weight 1, 00:00:15

Expected behavior

All routes are seen in the "show ip route" output

Actual behavior

Routes added to an already-registered, non-existent nexthop don't show up until a route to a different nexthop is added.
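
The difference between expected and actual behavior can be checked mechanically. The following is a minimal sketch, assuming the text format of "show ip route" shown above; the function names and the regular expression are illustrative and not part of FRR. It lists the configured static prefixes that are absent from the output.

```python
import re

# Matches static-route lines in "show ip route" text output, for example:
# "S   10.1.3.0/24 [1/0] via 172.16.1.3, ens3 inactive, weight 1, 00:00:00"
# The flags after "S" (">" and "*") are optional.
STATIC_LINE = re.compile(r"^S[ >*]*\s+(\d+\.\d+\.\d+\.\d+/\d+)\s")


def static_prefixes(show_ip_route_output):
    """Return the set of IPv4 prefixes listed with route code S."""
    found = set()
    for line in show_ip_route_output.splitlines():
        m = STATIC_LINE.match(line)
        if m:
            found.add(m.group(1))
    return found


def missing_static_routes(configured, show_ip_route_output):
    """Return configured prefixes absent from the output, in configured order."""
    present = static_prefixes(show_ip_route_output)
    return [p for p in configured if p not in present]


# Excerpt of the output captured after adding the third route.
OUTPUT_AFTER_THIRD_ROUTE = """\
IPv4 unicast VRF default:
S   10.1.3.0/24 [1/0] via 172.16.1.3, ens3 inactive, weight 1, 00:00:00
C>* 192.168.1.0/24 is directly connected, ens3, weight 1, 00:00:03
L>* 192.168.1.1/32 is directly connected, ens3, weight 1, 00:00:03
"""

CONFIGURED = ["10.1.3.0/24", "10.1.4.0/24", "10.1.5.0/24"]

print(missing_static_routes(CONFIGURED, OUTPUT_AFTER_THIRD_ROUTE))
# prints ['10.1.4.0/24', '10.1.5.0/24']
```

With the expected behavior, the printed list would be empty.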

Additional context

No response

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.
@mwinter-osr mwinter-osr added the triage Needs further investigation label Mar 6, 2025
@mwinter-osr
Member Author

frr.log (look for comments starting with *** for annotations of the steps):

2025/03/05 23:57:57 STATIC: [M7Q4P-46WDR] vty[21]@> enable
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@# show ip route
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@# logmsg informational *** Adding 1st route 10.1.3.0/24 via 172.16.1.3
2025/03/05 23:57:57 ZEBRA: *** Adding 1st route 10.1.3.0/24 via 172.16.1.3
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@> enable
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:57 OSPF: [M7Q4P-46WDR] vty[13]@> enable
2025/03/05 23:57:57 STATIC: [M7Q4P-46WDR] vty[21]@> enable
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@# conf
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@# conf
2025/03/05 23:57:57 OSPF: [M7Q4P-46WDR] vty[13]@# conf
2025/03/05 23:57:57 STATIC: [M7Q4P-46WDR] vty[21]@# conf
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@(config)# ip route 10.1.3.0/24 172.16.1.3 ens3
2025/03/05 23:57:57 STATIC: [K32YH-0RHMH] Registering nexthop(172.16.1.3/32) for 10.1.3.0/24
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@# logmsg informational *** Adding 2nd route 10.1.4.0/24 via 172.16.1.3
2025/03/05 23:57:57 ZEBRA: *** Adding 2nd route 10.1.4.0/24 via 172.16.1.3
2025/03/05 23:57:57 ZEBRA: [WEXBA-A02TB] zebra_nhe_find: id 0, lookup 0x55ea0cef3260, vrf 0, type 29, depends 0x0 => Found 0x0([NULL])
2025/03/05 23:57:57 ZEBRA: [PKSQ5-QBSHW] zebra_nhe_find: => created 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:57 ZEBRA: [MGYCQ-XGZST] zebra_nhg_rib_find_nhe: rt_nhe 0x55ea0cef3260(15[]) => nhe 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:57 ZEBRA: [WDEB1-93HCZ] zebra_nhg_increment_ref: nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) 0 => 1
2025/03/05 23:57:57 ZEBRA: [SJ6HS-F486Q] process_subq_early_route_add: (default:?):10.1.3.0/24: Inserting route rn 0x55ea0cf8e870, re 0x55ea0cef2840 (static/IPv4/unicast) existing 0x0, same_count 0
2025/03/05 23:57:57 ZEBRA: [VKWCR-QB19H] zebra_nhg_free: nhe 0x55ea0cef3260 (15[]), refcnt 0, NH 172.16.1.3, via ens3
2025/03/05 23:57:57 ZEBRA: [VTD0C-NR53W] nexthop_active_update: re 0x55ea0cef2840 nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]), curr_nhe 0x55ea0cef3420
2025/03/05 23:57:57 ZEBRA: [T9JWA-N8HM5] nexthop_active_check: re 0x55ea0cef2840, nexthop 172.16.1.3, via ens3
2025/03/05 23:57:57 ZEBRA: [ZG85Y-SBJH3] nexthop_active_update: re 0x55ea0cef2840 curr_active 0
2025/03/05 23:57:57 ZEBRA: [WEXBA-A02TB] zebra_nhe_find: id 0, lookup 0x55ea0cef3420, vrf 0, type 29, depends 0x0 => Found 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:57 ZEBRA: [MGYCQ-XGZST] zebra_nhg_rib_find_nhe: rt_nhe 0x55ea0cef3420(0[]) => nhe 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:57 ZEBRA: [HF388-B440H] nexthop_active_update: re 0x55ea0cef2840 CHANGED: nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) => new_nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) rib_find_nhe returned 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) refcnt: 1
2025/03/05 23:57:57 ZEBRA: [VKWCR-QB19H] zebra_nhg_free: nhe 0x55ea0cef3420 (0[]), refcnt 0, NH 172.16.1.3, via ens3
2025/03/05 23:57:57 STATIC: [S4MGP-4WQTA] route_notify_owner: Route 10.1.3.0/24 failed to install for table: 254
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@> enable
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:57 OSPF: [M7Q4P-46WDR] vty[13]@> enable
2025/03/05 23:57:57 STATIC: [M7Q4P-46WDR] vty[21]@> enable
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@# conf
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@# conf
2025/03/05 23:57:57 OSPF: [M7Q4P-46WDR] vty[13]@# conf
2025/03/05 23:57:57 STATIC: [M7Q4P-46WDR] vty[21]@# conf
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@(config)# ip route 10.1.4.0/24 172.16.1.3 ens3
2025/03/05 23:57:57 STATIC: [MHG7X-3SV2Z] Reusing registered nexthop(172.16.1.3/32) for 10.1.4.0/24 0
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@# logmsg informational *** Adding 3rd route 10.1.5.0/24 via 172.16.1.3
2025/03/05 23:57:57 ZEBRA: *** Adding 3rd route 10.1.5.0/24 via 172.16.1.3
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@> enable
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:57 OSPF: [M7Q4P-46WDR] vty[13]@> enable
2025/03/05 23:57:57 STATIC: [M7Q4P-46WDR] vty[21]@> enable
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@# conf
2025/03/05 23:57:57 ZEBRA: [M7Q4P-46WDR] vty[43]@# conf
2025/03/05 23:57:57 OSPF: [M7Q4P-46WDR] vty[13]@# conf
2025/03/05 23:57:57 STATIC: [M7Q4P-46WDR] vty[21]@# conf
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@(config)# ip route 10.1.5.0/24 172.16.1.3 ens3
2025/03/05 23:57:58 STATIC: [MHG7X-3SV2Z] Reusing registered nexthop(172.16.1.3/32) for 10.1.5.0/24 0
2025/03/05 23:57:58 MGMTD: [M7Q4P-46WDR] vty[17]@> enable
2025/03/05 23:57:58 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:58 OSPF: [M7Q4P-46WDR] vty[13]@> enable
2025/03/05 23:57:58 STATIC: [M7Q4P-46WDR] vty[21]@> enable
2025/03/05 23:57:58 ZEBRA: [M7Q4P-46WDR] vty[43]@# show ip route
2025/03/05 23:57:58 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:58 ZEBRA: [M7Q4P-46WDR] vty[43]@# logmsg informational *** Adding 4th route 10.1.6.0/24 via 172.16.1.100
2025/03/05 23:57:58 ZEBRA: *** Adding 4th route 10.1.6.0/24 via 172.16.1.100
2025/03/05 23:57:58 MGMTD: [M7Q4P-46WDR] vty[17]@> enable
2025/03/05 23:57:58 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:58 OSPF: [M7Q4P-46WDR] vty[13]@> enable
2025/03/05 23:57:58 STATIC: [M7Q4P-46WDR] vty[21]@> enable
2025/03/05 23:57:58 MGMTD: [M7Q4P-46WDR] vty[17]@# conf
2025/03/05 23:57:58 ZEBRA: [M7Q4P-46WDR] vty[43]@# conf
2025/03/05 23:57:58 OSPF: [M7Q4P-46WDR] vty[13]@# conf
2025/03/05 23:57:58 STATIC: [M7Q4P-46WDR] vty[21]@# conf
2025/03/05 23:57:58 MGMTD: [M7Q4P-46WDR] vty[17]@(config)# ip route 10.1.6.0/24 172.16.1.100 ens3
2025/03/05 23:57:58 STATIC: [K32YH-0RHMH] Registering nexthop(172.16.1.100/32) for 10.1.6.0/24
2025/03/05 23:57:58 ZEBRA: [WEXBA-A02TB] zebra_nhe_find: id 0, lookup 0x55ea0cef2d80, vrf 0, type 29, depends 0x0 => Found 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [MGYCQ-XGZST] zebra_nhg_rib_find_nhe: rt_nhe 0x55ea0cef2d80(0[]) => nhe 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [WDEB1-93HCZ] zebra_nhg_increment_ref: nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) 1 => 2
2025/03/05 23:57:58 ZEBRA: [SJ6HS-F486Q] process_subq_early_route_add: (default:?):10.1.4.0/24: Inserting route rn 0x55ea0cef3540, re 0x55ea0cef2c40 (static/IPv4/unicast) existing 0x0, same_count 0
2025/03/05 23:57:58 ZEBRA: [VKWCR-QB19H] zebra_nhg_free: nhe 0x55ea0cef2d80 (0[]), refcnt 0, NH 172.16.1.3, via ens3
2025/03/05 23:57:58 ZEBRA: [WEXBA-A02TB] zebra_nhe_find: id 0, lookup 0x55ea0cef3120, vrf 0, type 29, depends 0x0 => Found 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [MGYCQ-XGZST] zebra_nhg_rib_find_nhe: rt_nhe 0x55ea0cef3120(0[]) => nhe 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [WDEB1-93HCZ] zebra_nhg_increment_ref: nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) 2 => 3
2025/03/05 23:57:58 ZEBRA: [SJ6HS-F486Q] process_subq_early_route_add: (default:?):10.1.5.0/24: Inserting route rn 0x55ea0cef2d80, re 0x55ea0cef2ce0 (static/IPv4/unicast) existing 0x0, same_count 0
2025/03/05 23:57:58 ZEBRA: [VKWCR-QB19H] zebra_nhg_free: nhe 0x55ea0cef3120 (0[]), refcnt 0, NH 172.16.1.3, via ens3
2025/03/05 23:57:58 ZEBRA: [WEXBA-A02TB] zebra_nhe_find: id 0, lookup 0x55ea0cef3380, vrf 0, type 29, depends 0x0 => Found 0x0([NULL])
2025/03/05 23:57:58 ZEBRA: [PKSQ5-QBSHW] zebra_nhe_find: => created 0x55ea0cef3120 (16[172.16.1.100 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [MGYCQ-XGZST] zebra_nhg_rib_find_nhe: rt_nhe 0x55ea0cef3380(16[]) => nhe 0x55ea0cef3120(16[172.16.1.100 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [WDEB1-93HCZ] zebra_nhg_increment_ref: nhe 0x55ea0cef3120 (16[172.16.1.100 if 3 vrfid 0]) 0 => 1
2025/03/05 23:57:58 ZEBRA: [SJ6HS-F486Q] process_subq_early_route_add: (default:?):10.1.6.0/24: Inserting route rn 0x55ea0cf22990, re 0x55ea0cef3080 (static/IPv4/unicast) existing 0x0, same_count 0
2025/03/05 23:57:58 ZEBRA: [VKWCR-QB19H] zebra_nhg_free: nhe 0x55ea0cef3380 (16[]), refcnt 0, NH 172.16.1.100, via ens3
2025/03/05 23:57:58 ZEBRA: [VTD0C-NR53W] nexthop_active_update: re 0x55ea0cef2c40 nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]), curr_nhe 0x55ea0cef3360
2025/03/05 23:57:58 ZEBRA: [T9JWA-N8HM5] nexthop_active_check: re 0x55ea0cef2c40, nexthop 172.16.1.3, via ens3
2025/03/05 23:57:58 ZEBRA: [ZG85Y-SBJH3] nexthop_active_update: re 0x55ea0cef2c40 curr_active 0
2025/03/05 23:57:58 ZEBRA: [WEXBA-A02TB] zebra_nhe_find: id 0, lookup 0x55ea0cef3360, vrf 0, type 29, depends 0x0 => Found 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [MGYCQ-XGZST] zebra_nhg_rib_find_nhe: rt_nhe 0x55ea0cef3360(0[]) => nhe 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [HF388-B440H] nexthop_active_update: re 0x55ea0cef2c40 CHANGED: nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) => new_nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) rib_find_nhe returned 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) refcnt: 3
2025/03/05 23:57:58 ZEBRA: [VKWCR-QB19H] zebra_nhg_free: nhe 0x55ea0cef3360 (0[]), refcnt 0, NH 172.16.1.3, via ens3
2025/03/05 23:57:58 ZEBRA: [VTD0C-NR53W] nexthop_active_update: re 0x55ea0cef2ce0 nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]), curr_nhe 0x55ea0cf22a10
2025/03/05 23:57:58 ZEBRA: [T9JWA-N8HM5] nexthop_active_check: re 0x55ea0cef2ce0, nexthop 172.16.1.3, via ens3
2025/03/05 23:57:58 ZEBRA: [ZG85Y-SBJH3] nexthop_active_update: re 0x55ea0cef2ce0 curr_active 0
2025/03/05 23:57:58 ZEBRA: [WEXBA-A02TB] zebra_nhe_find: id 0, lookup 0x55ea0cf22a10, vrf 0, type 29, depends 0x0 => Found 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [MGYCQ-XGZST] zebra_nhg_rib_find_nhe: rt_nhe 0x55ea0cf22a10(0[]) => nhe 0x55ea0cf8f120(15[172.16.1.3 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [HF388-B440H] nexthop_active_update: re 0x55ea0cef2ce0 CHANGED: nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) => new_nhe 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) rib_find_nhe returned 0x55ea0cf8f120 (15[172.16.1.3 if 3 vrfid 0]) refcnt: 3
2025/03/05 23:57:58 ZEBRA: [VKWCR-QB19H] zebra_nhg_free: nhe 0x55ea0cf22a10 (0[]), refcnt 0, NH 172.16.1.3, via ens3
2025/03/05 23:57:58 ZEBRA: [VTD0C-NR53W] nexthop_active_update: re 0x55ea0cef3080 nhe 0x55ea0cef3120 (16[172.16.1.100 if 3 vrfid 0]), curr_nhe 0x55ea0cf22b00
2025/03/05 23:57:58 ZEBRA: [T9JWA-N8HM5] nexthop_active_check: re 0x55ea0cef3080, nexthop 172.16.1.100, via ens3
2025/03/05 23:57:58 ZEBRA: [ZG85Y-SBJH3] nexthop_active_update: re 0x55ea0cef3080 curr_active 0
2025/03/05 23:57:58 ZEBRA: [WEXBA-A02TB] zebra_nhe_find: id 0, lookup 0x55ea0cf22b00, vrf 0, type 29, depends 0x0 => Found 0x55ea0cef3120(16[172.16.1.100 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [MGYCQ-XGZST] zebra_nhg_rib_find_nhe: rt_nhe 0x55ea0cf22b00(0[]) => nhe 0x55ea0cef3120(16[172.16.1.100 if 3 vrfid 0])
2025/03/05 23:57:58 ZEBRA: [HF388-B440H] nexthop_active_update: re 0x55ea0cef3080 CHANGED: nhe 0x55ea0cef3120 (16[172.16.1.100 if 3 vrfid 0]) => new_nhe 0x55ea0cef3120 (16[172.16.1.100 if 3 vrfid 0]) rib_find_nhe returned 0x55ea0cef3120 (16[172.16.1.100 if 3 vrfid 0]) refcnt: 1
2025/03/05 23:57:58 ZEBRA: [VKWCR-QB19H] zebra_nhg_free: nhe 0x55ea0cf22b00 (0[]), refcnt 0, NH 172.16.1.100, via ens3
2025/03/05 23:57:58 STATIC: [S4MGP-4WQTA] route_notify_owner: Route 10.1.4.0/24 failed to install for table: 254
2025/03/05 23:57:58 STATIC: [S4MGP-4WQTA] route_notify_owner: Route 10.1.5.0/24 failed to install for table: 254
2025/03/05 23:57:58 STATIC: [S4MGP-4WQTA] route_notify_owner: Route 10.1.6.0/24 failed to install for table: 254
2025/03/05 23:57:58 MGMTD: [M7Q4P-46WDR] vty[17]@> enable
2025/03/05 23:57:58 ZEBRA: [M7Q4P-46WDR] vty[43]@> enable
2025/03/05 23:57:58 OSPF: [M7Q4P-46WDR] vty[13]@> enable
2025/03/05 23:57:58 STATIC: [M7Q4P-46WDR] vty[21]@> enable
2025/03/05 23:57:58 ZEBRA: [M7Q4P-46WDR] vty[43]@# show ip route
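
To make the ordering in the log above easier to follow, here is a small sketch that keeps only three kinds of lines: the "ip route" command reaching mgmtd, zebra inserting the route, and staticd being notified. The regular expressions are assumptions derived from the message formats in this particular log, and the sample lines are trimmed copies of lines above. Log order alone does not prove causality, but it shows that zebra inserts 10.1.4.0/24 only after the 4th route has been configured.

```python
import re

# Patterns derived from the log format shown in this issue (an assumption,
# not a stable FRR interface).
PATTERNS = (
    ("configured", re.compile(r"MGMTD: .*@\(config\)# ip route (\S+) ")),
    ("zebra-insert", re.compile(
        r"process_subq_early_route_add: \([^)]*\):(\S+?): Inserting route")),
    ("notify-failed", re.compile(
        r"route_notify_owner: Route (\S+) failed to install")),
)


def route_events(log_text):
    """Return (kind, prefix) tuples in the order they appear in the log."""
    events = []
    for line in log_text.splitlines():
        for kind, pattern in PATTERNS:
            m = pattern.search(line)
            if m:
                events.append((kind, m.group(1)))
                break
    return events


# Trimmed copies of lines from the log above.
SAMPLE_LOG = """\
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@(config)# ip route 10.1.3.0/24 172.16.1.3 ens3
2025/03/05 23:57:57 ZEBRA: [SJ6HS-F486Q] process_subq_early_route_add: (default:?):10.1.3.0/24: Inserting route rn 0x55ea0cf8e870
2025/03/05 23:57:57 STATIC: [S4MGP-4WQTA] route_notify_owner: Route 10.1.3.0/24 failed to install for table: 254
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@(config)# ip route 10.1.4.0/24 172.16.1.3 ens3
2025/03/05 23:57:57 MGMTD: [M7Q4P-46WDR] vty[17]@(config)# ip route 10.1.5.0/24 172.16.1.3 ens3
2025/03/05 23:57:58 MGMTD: [M7Q4P-46WDR] vty[17]@(config)# ip route 10.1.6.0/24 172.16.1.100 ens3
2025/03/05 23:57:58 ZEBRA: [SJ6HS-F486Q] process_subq_early_route_add: (default:?):10.1.4.0/24: Inserting route rn 0x55ea0cef3540
"""

for kind, prefix in route_events(SAMPLE_LOG):
    print(f"{kind:14} {prefix}")
```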

@donaldsharp
Member

I disagree; staticd should not be installing any routes until the nexthop is valid.

@mwinter-osr
Member Author

> I disagree; staticd should not be installing any routes until the nexthop is valid.

So in your view, it's still wrong, but you think none of the routes should be visible in "show ip route" until the nexthop is reachable? (So no inactive routes at all in the output?)

@eqvinox
Contributor

eqvinox commented Mar 6, 2025

> I disagree; staticd should not be installing any routes until the nexthop is valid.

This is not how zebra, staticd, and ZAPI were designed. Daemons submit candidate routes to zebra, zebra resolves and installs them. staticd doesn't even need full NHT, it only needs interface state tracking because we can't submit a nexthop to zebra if the interface doesn't have an ifindex.

And because we never wrote down any of this, we are yet again in a situation where we can't answer this question other than with people's opinions. Therefore we don't know what the proper fix for a bug is, and we will run into this problem again the next time we look at the topic and still don't have anything written down.
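
As an illustration of the "daemons submit candidates, zebra resolves" model described in this comment, here is a toy sketch. It is not FRR code: the ToyRib and Candidate names are hypothetical, and the resolution rule (a nexthop is active only if it is on-link on the named interface) is a simplifying assumption. Under this model all three routes are visible immediately as inactive, and they become active once the nexthop resolves, without the submitting daemon doing anything further.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Candidate:
    """A route as a daemon would hand it over (hypothetical structure)."""
    prefix: str
    nexthop: str
    ifname: str


class ToyRib:
    """Toy stand-in for a RIB that resolves nexthops itself; not FRR code."""

    def __init__(self):
        self.connected = {}   # ifname -> list of connected networks
        self.candidates = []  # every submitted route, resolved or not

    def add_connected(self, ifname, network):
        self.connected.setdefault(ifname, []).append(
            ipaddress.ip_network(network))

    def submit(self, candidate):
        # The daemon submits regardless of nexthop reachability.
        self.candidates.append(candidate)

    def is_active(self, candidate):
        # Simplifying assumption: active means on-link on that interface.
        addr = ipaddress.ip_address(candidate.nexthop)
        nets = self.connected.get(candidate.ifname, [])
        return any(addr in net for net in nets)

    def show(self):
        return [(c.prefix, "active" if self.is_active(c) else "inactive")
                for c in self.candidates]


rib = ToyRib()
rib.add_connected("ens3", "192.168.1.0/24")
for prefix in ("10.1.3.0/24", "10.1.4.0/24", "10.1.5.0/24"):
    rib.submit(Candidate(prefix, "172.16.1.3", "ens3"))

print(rib.show())   # all three listed as inactive

rib.add_connected("ens3", "172.16.1.0/24")
print(rib.show())   # all three listed as active
```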

@donaldsharp
Member

Actually it is. All upper level protocols are not installing routes until they are actually reachable. Before this change, static routes were the only ones that had routes installed that were inactive, and I would argue that was an artifact of the implementation rather than any forward planning from that perspective.

@eqvinox
Contributor

eqvinox commented Mar 10, 2025

"All upper level protocols" does not make sense here to begin with; this is about unresolved nexthops which can't happen with IS-IS, RIP(ng) or OSPFv3. BGP, staticd and OSPF are the only "classic" protocols where this can happen to begin with.

(I'm going to exclude babeld, eigrpd, pbrd & pathd here because I don't know their behavior exactly enough.)

OSPF still installs routes (ASEs with 3rd party nexthops) without NHT to this day. They're normally on-link but AFAIR don't have to be. (I'm only 80% confident here, can go look that up at some point.)

staticd when it was inside zebra was interwoven with the RIB quite tightly, and the code had to be triggered when nexthops change. When it was isolated out, it "took that code with it", which might not have been the right thing to do to begin with.

BGP… is the most complicated setup. It needs NHT in some cases, but in others NHT indirectly substitutes for installation confirmation. But in times before NHT, even BGP just installed routes with no regard to nexthop reachability.

There are two issues here:

  • we don't have any design document. We clearly have different views on what exactly is going on here, and we can't resolve that by looking it up somewhere because that somewhere doesn't exist. The absence of that doc is IMHO a serious problem. I don't even care if I'm right or not, I care that we can't answer the question.

  • we also don't have a rationale for why. And now I'm looking at it and I don't see any reason why indeed we would loop nexthop changes for static routes into NHT messages to staticd and back out into zebra. They have essentially no purpose or effect that I can see: if staticd just feeds the route, zebra has all the information it needs. Are we taking notable cost here just to share behavior with bgpd?

    • and as an extension: we didn't have that rationale included at the time of merging the PR. This moves the design rationale out of view of the PR review process; because it's not in a file visible in the diff, it doesn't get looked at. Not good.

We've somehow ended up midway between behaviors in staticd right now. Sometimes routes are waiting for NHT, sometimes they're installed inactive in zebra. Clearly this needs to be fixed to consistently be one of the two. But I'd say we really need to answer a few questions here before we make that call.
