
Conversation

@smklein (Collaborator) commented Aug 19, 2025

  • Adds nexus_generation to the blueprint, both on Nexus zones and as a top-level field
    • When provisioning a new Nexus zone: if its image matches any existing Nexus zone, use that zone's nexus_generation value
    • Otherwise: choose a generation number higher than all existing instances (see the sketch below)
  • Changes deployment of Nexus zones to proactively provision new zones alongside old ones, rather than doing a replacement.
    • This PR does not implement the handoff process. However, it does permit "new" Nexus zones to expunge old Nexus zones which have an older nexus_generation, if any of the new Nexuses are running.
  • Adds a do_plan_nexus_generation_update method to the planner, which decides when the top-level Nexus generation number should be incremented.
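
For concreteness, here is the generation-selection rule above as a standalone sketch. The function name, map-based shape, and plain-integer Generation are illustrative only, not the PR's actual code:

use std::collections::BTreeMap;

// Illustrative stand-ins for the blueprint's zone image source and
// generation types.
type ImageSource = String;
type Generation = u64;

/// Hypothetical sketch: pick the nexus_generation for a new Nexus zone.
fn choose_nexus_generation(
    existing_zones: &BTreeMap<ImageSource, Generation>,
    new_zone_image: &ImageSource,
) -> Generation {
    if let Some(generation) = existing_zones.get(new_zone_image) {
        // The image matches an existing zone: reuse its generation.
        *generation
    } else {
        // Otherwise, go one past the highest existing generation.
        existing_zones.values().copied().max().map_or(1, |g| g + 1)
    }
}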

Fixes #8853, #8843

@smklein smklein force-pushed the nexus_generation branch 2 times, most recently from 85446aa to 7a3d744 on August 20, 2025 at 20:54
///
/// If `must_have_nexus_zones` is false, then these settings
/// are permitted to use default values.
pub fn sled_add_zone_nexus_internal(
Collaborator Author

There are some tests which want to "add a Nexus zone, even though we don't have existing Nexus zones".

They previously called sled_add_zone_nexus_with_config directly, but I want them to converge on this common pathway as much as possible, to share nexus_generation calculation logic.

To mitigate:

  • This API exposes a must_have_nexus_zones argument, which toggles whether or not we must copy data from existing Nexus zones
  • Most callers will use sled_add_zone_nexus, which uses must_have_nexus_zones = true
  • Callers in test cases that want to spawn Nexuses from nothing can use must_have_nexus_zones = false.

Collaborator

What about instead having nexus_generation be something that the caller always specifies? Then the code paths also won't diverge. I like this for a few reasons (see the sketch after this list):

  • We can keep the existing sled_add_zone_nexus() / sled_add_zone_nexus_with_config() split. I think this was pretty clean -- it clearly separated the two use cases and was very explicit in the second one ("I'm giving you exactly the config that you need"). must_have_nexus_zones confuses me -- what happens if I "must have them" but I don't? What happens if I don't need them but they're there?
  • It allows reconfigurator-cli (and tests) to control this directly. That in turn means people can test the handoff behavior without worrying about the images. (I guess the way I think about this is: the handoff behavior is purely a function of the generation numbers. For deployed systems, the images are used to determine the generation numbers. But that's a planning choice. Everything would work -- and it's probably useful for testing and such -- if someone used the same images everywhere but picked different generation numbers in order to trigger a handoff.)
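
A self-contained sketch of what that split could look like with a caller-supplied generation. The stub types stand in for omicron's real SledUuid, BlueprintZoneImageSource, and Generation types, and the method shapes are invented for illustration:

type SledUuid = u64;
type ImageSource = String;
type Generation = u64;

struct BlueprintBuilder {
    // (image, generation) of each in-service Nexus zone.
    nexus_zones: Vec<(ImageSource, Generation)>,
}

impl BlueprintBuilder {
    /// Convenience path: derives the generation from existing Nexus zones.
    fn sled_add_zone_nexus(&mut self, sled: SledUuid, image: ImageSource) {
        let generation = self.determine_nexus_generation(&image);
        self.sled_add_zone_nexus_with_config(sled, image, generation);
    }

    /// Explicit path: "I'm giving you exactly the config that you need,"
    /// including the generation -- usable directly by tests and
    /// reconfigurator-cli, e.g. to trigger a handoff with identical images.
    fn sled_add_zone_nexus_with_config(
        &mut self,
        _sled: SledUuid,
        image: ImageSource,
        generation: Generation,
    ) {
        self.nexus_zones.push((image, generation));
    }

    fn determine_nexus_generation(&self, image: &ImageSource) -> Generation {
        // Same rule as the earlier sketch: reuse a matching image's
        // generation, otherwise go one past the highest existing one.
        self.nexus_zones
            .iter()
            .find(|(i, _)| i == image)
            .map(|(_, g)| *g)
            .unwrap_or_else(|| {
                1 + self.nexus_zones.iter().map(|(_, g)| *g).max().unwrap_or(0)
            })
    }
}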

Collaborator Author

This seems reasonable. I still want to test some of the determine_nexus_generation error cases -- that's easier to do if I just call the internal function directly -- but that's definitely still possible with your proposal.

Updated in 5ce4870

@smklein smklein force-pushed the nexus_generation branch 4 times, most recently from e688fc6 to 8235551 on August 21, 2025 at 19:50
@smklein smklein marked this pull request as ready for review August 21, 2025 20:32
parent = blueprint;
}

panic!("did not converge after {MAX_PLANNING_ITERATIONS} iterations");
}

struct BlueprintGenerator {
Collaborator Author

I made this struct to help the actual contents of test_nexus_generation_update be easier to write... but after doing so, I'd be kinda on-board to move more tests over to using this explicitly.

IMO it helps make the test much more concise when blueprint generation is a one-liner.
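
For illustration, a toy version of that shape. Everything here is invented; the real struct would presumably hold the planner's actual inputs (example system, planning input, logger):

#[derive(Debug)]
struct Blueprint {
    id: u64,
    comment: String,
}

// Hypothetical test helper: owns the current blueprint so that each
// planning round in a test collapses to a one-liner.
struct BlueprintGenerator {
    blueprint: Blueprint,
}

impl BlueprintGenerator {
    /// Plans a child blueprint from the current one and makes it current.
    /// (A stand-in for running Planner::new_based_on(...).plan().)
    fn plan_new_blueprint(&mut self, comment: &str) -> &Blueprint {
        self.blueprint = Blueprint {
            id: self.blueprint.id + 1,
            comment: comment.to_string(),
        };
        &self.blueprint
    }
}

// In a test, each round then reads as:
//     let bp2 = generator.plan_new_blueprint("add new nexus zones");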

Comment on lines +1836 to +1839
report.set_waiting_on(
    NexusGenerationBumpWaitingOn::NewNexusBringup,
);
return Ok(report);
Collaborator Author

I actually do not have test coverage for this case, and would like to add it before this PR merges. I have struggled to do it through zone manipulation - because Nexus is discretionary, we'll be eager to add the new Nexus zones if we can (and why not? They should wait on boot for handoff).

To force this to happen, I'm thinking I'll need to construct a scenario where we expunge a sled so that we cannot actually place this new Nexus, and observe that the handoff does not occur while we're operating at a reduced capacity.

@davepacheco (Collaborator) left a comment

Thanks, @smklein!

This PR's gotten pretty big and I think it has at least two pretty separable pieces:

  • Adding nexus_generation to the blueprint (in-memory + database). These parts of this PR already look pretty solid to me. That's also all I need for #8875.
  • The planner changes that implement the behaviors around nexus_generation. This is a lot trickier and will take more time to get to ground.

Could you separate these into separate PRs? That'll make them much easier to review and gain confidence in, and it will also unblock the quiesce work sooner.

You could also separate out the change to the way we report "discretionary zones placed", but that's small and simple enough that it's less critical to me.

@@ -109,6 +112,31 @@ const NUM_CONCURRENT_MGS_UPDATES: usize = 1;
/// A receipt that `check_input_validity` has been run prior to planning.
struct InputChecked;

#[derive(Debug)]
#[expect(dead_code)]
Collaborator

This is admittedly the first time I've seen #[expect(dead_code)], but why is it here? It looks like this struct is used?

Contributor

Ah I think it's because we only use it for the Debug impl.

Collaborator Author

These structs were admittedly already here on main -- I was moving them to be usable outside the single do_plan_zone_updates function. (See:

#[derive(Debug)]
#[expect(dead_code)]
struct ZoneCurrentlyUpdating<'a> {
    zone_id: OmicronZoneUuid,
    zone_kind: ZoneKind,
    reason: UpdatingReason<'a>,
}

#[derive(Debug)]
#[expect(dead_code)]
enum UpdatingReason<'a> {
    ImageSourceMismatch {
        bp_image_source: &'a BlueprintZoneImageSource,
        inv_image_source: &'a OmicronZoneImageSource,
    },
    MissingInInventory {
        bp_image_source: &'a BlueprintZoneImageSource,
    },
    ReconciliationError {
        bp_image_source: &'a BlueprintZoneImageSource,
        inv_image_source: &'a OmicronZoneImageSource,
        message: &'a str,
    },
}
)

I believe this is using expect instead of allow as part of the lint-expectation feature stabilized in Rust 1.81, where the attribute both suppresses the lint and warns if the lint ever stops firing.

We definitely are using these fields, because they're emitted to a log via the Debug implementation, but they're otherwise never read directly. I can confirm that removing the attribute results in (unwanted) compiler warnings about the fields never being read -- even though they do end up in logs.
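
A standalone illustration of that behavior (this is stock Rust 1.81+ semantics; the struct is a made-up stand-in):

// The fields are written at construction and reach logs only through the
// derived Debug impl, which the dead_code lint does not count as a read.
#[derive(Debug)]
#[expect(dead_code)]
struct ZonePropagationStatus {
    zone_id: u64,
    message: String,
}

fn main() {
    let status = ZonePropagationStatus {
        zone_id: 7,
        message: "awaiting inventory".to_string(),
    };
    // Debug formatting uses the fields at runtime, but not in a way the
    // lint recognizes -- so without the attribute, rustc warns that the
    // fields are never read.
    println!("status: {status:?}");
    // Unlike #[allow], #[expect] itself warns (unfulfilled_lint_expectations)
    // if dead_code ever stops firing here -- e.g. because the fields later
    // gain direct readers -- flagging the attribute as stale.
}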

Comment on lines 115 to 138
#[derive(Debug)]
#[expect(dead_code)]
struct ZoneCurrentlyUpdating<'a> {
    zone_id: OmicronZoneUuid,
    zone_kind: ZoneKind,
    reason: UpdatingReason<'a>,
}

#[derive(Debug)]
#[expect(dead_code)]
enum UpdatingReason<'a> {
    ImageSourceMismatch {
        bp_image_source: &'a BlueprintZoneImageSource,
        inv_image_source: &'a OmicronZoneImageSource,
    },
    MissingInInventory {
        bp_image_source: &'a BlueprintZoneImageSource,
    },
    ReconciliationError {
        bp_image_source: &'a BlueprintZoneImageSource,
        inv_image_source: &'a OmicronZoneImageSource,
        message: &'a str,
    },
}
Collaborator

Some doc comments might help here. I'm confused about what these are supposed to mean. I would have thought ZoneCurrentlyUpdating with a reason would mean "this zone is being updated and here's why". But then I don't get why MissingInInventory or ReconciliationError would be a reason that a zone would be updating.

Collaborator Author

As mentioned above, I'm refactoring this to use the get_zones_not_yet_propagated_to_inventory function -- but I believe that's why this is reconciliation-focused. These responses are more about "is the inventory in sync with the blueprint" than the more specific question of "has an update completed".

I'll update names away from "update" and more towards "zone propagation" in 55ebd1c

)
})
.collect(),
let image_sources = match zone_kind {
Collaborator

I feel like we could use a comment explaining more context here. Something like:

// Our goal here is to make sure that if we have less redundancy for discretionary zones than needed, we deploy additional ones. For most zones, we only care about the total count of that kind of zone. The way we deploy Nexus means we need the expected count for redundancy for _both_ active zone images.

No pressure to use any of that text -- it's just an example of what felt missing.

let our_image = self.lookup_current_nexus_image();

let mut images = vec![];
if old_image != new_image {
Collaborator

If we replace the Nexus identity in the PlanningInput with the list of which Nexus instances are in charge, then I think the logic here becomes something like the following (sketched in code after this list):

  • always include the new image
  • also include the image for the Nexus instances currently in charge, if it's different
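
That proposal, as a minimal sketch. The names are invented; in_charge_image would come from whatever records which Nexus instances are currently in charge:

type ImageSource = String;

// Hypothetical: which Nexus images need full discretionary redundancy.
fn nexus_image_sources(
    new_image: ImageSource,
    in_charge_image: ImageSource,
) -> Vec<ImageSource> {
    // Always include the new image.
    let mut images = vec![new_image];
    // Also include the image of the Nexus instances currently in charge,
    // if it's different.
    if in_charge_image != images[0] {
        images.push(in_charge_image);
    }
    images
}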

Comment on lines +2342 to +2347
if self.nexus_generation != current_generation {
    return Err(Error::NexusGenerationMismatch {
        expected: current_generation,
        actual: self.nexus_generation,
    });
}
Collaborator

Why do we check this?

Comment on lines +2221 to +2261
ZoneKind::Nexus => {
    // Get the nexus_generation of the zone being considered for shutdown
    let zone_nexus_generation = match &zone.zone_type {
        BlueprintZoneType::Nexus(nexus_zone) => {
            nexus_zone.nexus_generation
        }
        _ => unreachable!("zone kind is Nexus but type is not"),
    };

    let Some(current_gen) = self.lookup_current_nexus_generation()
    else {
        // If we don't know the current Nexus zone ID, or its
        // generation, we can't perform the handoff safety check.
        report.unsafe_zone(
            zone,
            Nexus {
                zone_generation: zone_nexus_generation,
                current_nexus_generation: None,
            },
        );
        return false;
    };

    // It's only safe to shut down if handoff has occurred.
    //
    // That only happens when the current generation of Nexus (the
    // one running right now) is greater than the zone we're
    // considering expunging.
    if current_gen <= zone_nexus_generation {
        report.unsafe_zone(
            zone,
            Nexus {
                zone_generation: zone_nexus_generation,
                current_nexus_generation: Some(current_gen),
            },
        );
        return false;
    }

    true
}
Collaborator

What problem is this trying to prevent?

Comment on lines +127 to +131
/// ID of the currently running Nexus zone
///
/// This is used to identify which Nexus is currently executing the planning
/// operation, which is needed for safe shutdown decisions during handoff.
current_nexus_zone_id: Option<OmicronZoneUuid>,
Collaborator

I'd strongly suggest that instead of putting the current Nexus zone into the planning input, let's put either the currently in-charge Nexus generation or else the set of Nexus instances currently in control. (If you have the blueprint, you can compute either of these from the other.) PlanningInputFromDb could determine this based on the contents of db_metadata_nexus. (That could be done in a separate PR if we want to keep this PR decoupled from the db_metadata_nexus one.)

There are a few reasons for this (a sketch of the suggested shape follows the list):

  • It's confusing to me that this field is both "important" and "optional". What's the semantics of it being None? Does that mean certain planning operations fail or do the wrong thing? Do we just not do any of those operations from the contexts where we're providing None today? On the other hand, if we say this is the set of instances currently in-charge, I'm hoping we can fill in some values here. I took a quick look through the callers that are providing None here and I think they're basically all either using PlanningInputFromDb::assemble (which can get the real value of "who's in charge" from the database) or else are tests that have a blueprint available (so we could have a helper that pulls them out of that blueprint).
  • In terms of comprehensibility of the system: it feels weird to me that the result of planning would depend on who is doing the planning. With this PR, every Nexus in a running system will be providing different input to its planner, which feels like it confuses the aim of determinism in the planning process.
  • It would eliminate quite a lot of call sites where you've had to add None.
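
A sketch of that suggested shape. The field name is invented; the idea is that PlanningInputFromDb::assemble would fill it from db_metadata_nexus:

type Generation = u64;

struct PlanningInput {
    // ...existing planning-input fields...

    /// Generation of the Nexus instances currently in charge, as recorded
    /// in the database. Every Nexus supplies the same value here, keeping
    /// planning deterministic regardless of which instance runs it.
    in_charge_nexus_generation: Generation,
}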

Comment on lines +254 to +256
pub fn set_current_nexus_zone_id(&mut self, id: OmicronZoneUuid) {
    self.current_nexus_zone_id = Some(id);
}
Collaborator

nit: I've thought of these types as PlanningInput being immutable and PlanningInputBuilder being the mutable version. What do you think of having callers do this:

let mut new_builder = planning_input.into_builder();
new_builder.set(/* ... */);
let planning_input = new_builder.build();

It's a little more verbose, but I feel like it preserves the nice property that when you're modifying it, you're working with a builder. The planning input itself remains immutable. (Again, take it or leave it.)
