@sknat sknat commented Apr 8, 2025

This patch changes how we persist data on disk when running Calico/VPP. Instead of using struc and a binary format, we transition to JSON files. Size should not be an issue, as the number of pods per node is typically low (~100). This makes troubleshooting easier and gives clearer errors when parsing fails.

We thus remove the /bin/debug troubleshooting utility, as the data format is now human readable.

In doing so, we address an issue where PBL indexes were reused upon dataplane restart because they were stored in a list. We now use a map to retain the containerIP mapping.
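To illustrate the idea of keeping indexes in a map keyed by container IP, here is a minimal sketch. The `podStatus` type, the `PBLIndexes` field name, and its JSON key are assumptions for illustration only; the sketch does not reproduce the exact failure mode of the list-based storage, it only shows that a keyed mapping survives a save and reload unchanged.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podStatus is a hypothetical, reduced view of the runtime state.
type podStatus struct {
	// PBLIndexes maps a container IP (as a string) to the index allocated
	// for it. The field name and JSON key are assumptions.
	PBLIndexes map[string]uint32 `json:"pblIndexes"`
}

// roundTrip simulates persisting the status and reading it back after a
// restart by encoding to JSON and decoding into a fresh value.
func roundTrip(s podStatus) (podStatus, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return podStatus{}, err
	}
	var out podStatus
	err = json.Unmarshal(data, &out)
	return out, err
}

func main() {
	before := podStatus{PBLIndexes: map[string]uint32{"10.0.0.5": 3, "fd00::5": 7}}
	after, err := roundTrip(before)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Each container IP still resolves to the index it had before the reload.
	fmt.Println(after.PBLIndexes["10.0.0.5"], after.PBLIndexes["fd00::5"])
}
```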

We also split the configuration from the runtime state in LocalPodSpec and add a step to clear the runtime state when the corresponding VRFs are not found in VPP.
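The split and the clearing step can be sketched as follows. The type names `LocalPodSpec` and `LocalPodSpecStatus` and the constructor `NewLocalPodSpecStatus` appear in this patch, but every field, the `invalidID` sentinel, and the `resetIfVRFMissing` helper with its `vrfExists` callback are assumptions made for this example.

```go
package main

import "fmt"

// invalidID is an assumed sentinel meaning "nothing allocated in VPP".
const invalidID = ^uint32(0)

// LocalPodSpecStatus holds runtime state that is only meaningful while the
// dataplane still knows about it. Field names here are illustrative.
type LocalPodSpecStatus struct {
	V4VrfID uint32
	V6VrfID uint32
}

// NewLocalPodSpecStatus returns a status with nothing allocated.
func NewLocalPodSpecStatus() *LocalPodSpecStatus {
	return &LocalPodSpecStatus{V4VrfID: invalidID, V6VrfID: invalidID}
}

// LocalPodSpec keeps the configuration fields next to the embedded status,
// so the status can be reset without touching the configuration.
type LocalPodSpec struct {
	InterfaceName string
	NetnsName     string
	LocalPodSpecStatus
}

// resetIfVRFMissing clears the runtime status when either VRF is unknown to
// the dataplane and reports whether a reset happened. vrfExists is a
// hypothetical lookup standing in for a query to VPP.
func resetIfVRFMissing(spec *LocalPodSpec, vrfExists func(uint32) bool) bool {
	if vrfExists(spec.V4VrfID) && vrfExists(spec.V6VrfID) {
		return false
	}
	spec.LocalPodSpecStatus = *NewLocalPodSpecStatus()
	return true
}

func main() {
	spec := &LocalPodSpec{
		InterfaceName:      "eth0",
		NetnsName:          "/run/netns/web-0",
		LocalPodSpecStatus: LocalPodSpecStatus{V4VrfID: 12, V6VrfID: 13},
	}
	// Simulate a dataplane that has no VRFs at all.
	noVRFs := func(uint32) bool { return false }
	fmt.Println(resetIfVRFMissing(spec, noVRFs), spec.InterfaceName, spec.V4VrfID == invalidID)
}
```

In this sketch a freshly created status also has no VRFs, so the reset is a no-op in effect for new pods and only discards stale values for pods loaded from disk.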

Finally, we address an issue where uRPF was not properly set up for IPv6.

@sknat sknat marked this pull request as draft April 8, 2025 16:17
@sknat sknat requested review from hedibouattour and onong April 8, 2025 16:17
@sknat sknat force-pushed the nsk-move-json-storage branch from 6a4c197 to 3fe1285 Compare April 14, 2025 16:33
@sknat sknat force-pushed the nsk-move-json-storage branch from 3fe1285 to 0c31407 Compare May 26, 2025 16:17
@sknat sknat force-pushed the nsk-move-json-storage branch 4 times, most recently from 579cb63 to b842c3d Compare July 1, 2025 15:19
@sknat sknat marked this pull request as ready for review July 2, 2025 08:29
@sknat sknat force-pushed the nsk-move-json-storage branch 3 times, most recently from 4adc912 to 5d8435c Compare July 23, 2025 09:34
@sknat sknat force-pushed the nsk-move-json-storage branch from 5d8435c to b400da8 Compare August 26, 2025 13:16
@sknat sknat self-assigned this Sep 2, 2025
Signed-off-by: Nathan Skrzypczak <[email protected]>
@sknat sknat force-pushed the nsk-move-json-storage branch from b400da8 to f4b7b9b Compare September 16, 2025 14:52
@@ -39,7 +39,7 @@ func (v *VppLink) SetCustomURPF(swifindex uint32, tableID uint32) error {
 	return nil
 }

-func (v *VppLink) UnsetURPF(swifindex uint32) error {
+func (v *VppLink) UnsetURPF(swifindex uint32, ipFamily IPFamily) error {

Good catch, but I think you forgot to consume this one.

}
// We do not have a VRF in VPP for this pod, VPP has probably
// restarted, so we clear the state we have.
podSpec.LocalPodSpecStatus = *model.NewLocalPodSpecStatus()

Why assume that VPP restarted in this case? That's the normal path for a freshly created pod, no?

@hedibouattour

Amazing change, thanks Nathan!
