Add `nexus_generation` to blueprint #8863
85446aa to 7a3d744
    ///
    /// If `must_have_nexus_zones` is false, then these settings
    /// are permitted to use default values.
    pub fn sled_add_zone_nexus_internal(
There are some tests which want to "add a Nexus zone, even though we don't have existing Nexus zones".
They previously called `sled_add_zone_nexus_with_config` directly, but I want them to converge on this common pathway as much as possible, to share the `nexus_generation` calculation logic.
To mitigate:
- This API exposes a `must_have_nexus_zones` argument, which toggles whether we must copy data from existing Nexus zones.
- Most callers will use `sled_add_zone_nexus`, which uses `must_have_nexus_zones = true`.
- Callers in test cases that want to spawn Nexuses from nothing can use `must_have_nexus_zones = false`.
What about instead having `nexus_generation` be something that the caller always specifies? Then the code paths also won't diverge. I like this for a few reasons:
- We can keep the existing `sled_add_zone_nexus()` / `sled_add_zone_nexus_with_config()` split. I think this was pretty clean -- it clearly separated the two use cases and was very explicit in the second one ("I'm giving you exactly the config that you need"). `must_have_nexus_zones` confuses me -- what happens if I "must have them" but I don't? What happens if I don't need them but they're there?
- It allows `reconfigurator-cli` (and tests) to control this directly. That in turn means people can test the handoff behavior without worrying about the images. (I guess the way I think about this is: the handoff behavior is purely a function of the generation numbers. For deployed systems, the images are used to determine the generation numbers. But that's a planning choice. Everything would work -- and it's probably useful for testing and such -- if someone used the same images everywhere but picked different generation numbers in order to trigger a handoff.)
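To illustrate the point that handoff depends only on generation numbers, here is a minimal sketch. `NexusZone`, `nexus_zone`, and `handoff_complete_for` are hypothetical names made up for this example, not the planner's actual types. The only rule assumed is the one the planner check in this PR uses: a zone's handoff is complete once the active generation is strictly greater than the zone's own generation.

```rust
/// Hypothetical, reduced model of a Nexus zone: an image label plus the
/// generation chosen by the caller.
#[derive(Debug, Clone, PartialEq, Eq)]
struct NexusZone {
    image: String,
    nexus_generation: u64,
}

/// Hypothetical helper: builds a zone with a caller-specified generation.
fn nexus_zone(image: &str, nexus_generation: u64) -> NexusZone {
    NexusZone { image: image.to_string(), nexus_generation }
}

/// Handoff away from `zone` is complete only when the active generation is
/// strictly greater than the zone's generation. The image is deliberately
/// never consulted.
fn handoff_complete_for(zone: &NexusZone, active_generation: u64) -> bool {
    active_generation > zone.nexus_generation
}
```

Under these assumptions, two zones built from the same image but with generations 1 and 2 behave differently once the active generation moves from 1 to 2, which is the testing scenario described above.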
This seems reasonable. I still want to test some of the `determine_nexus_generation` error cases -- and that's easier to do if I just call that internal function directly -- but that's definitely still possible with your proposal.
Updated in 5ce4870
7a3d744 to aba8f4d
e688fc6 to 8235551
8235551 to ecd6a00
        parent = blueprint;
    }

    panic!("did not converge after {MAX_PLANNING_ITERATIONS} iterations");
}

struct BlueprintGenerator {
I made this struct to help the actual contents of `test_nexus_generation_update` be easier to write... but after doing so, I'd be kinda on-board to move more tests over to using this explicitly.
IMO it helps make the test much more concise when blueprint generation is a one-liner.
report.set_waiting_on(
    NexusGenerationBumpWaitingOn::NewNexusBringup,
);
return Ok(report);
I actually do not have test coverage for this case, and would like to add it before this PR merges. I have struggled to do it through zone manipulation - because Nexus is discretionary, we'll be eager to add the new Nexus zones if we can (and why not? They should wait on boot for handoff).
To force this to happen, I'm thinking I'll need to construct a scenario where we expunge a sled so that we cannot actually place this new Nexus, and observe that the handoff does not occur while we're operating at a reduced capacity.
Thanks, @smklein!
This PR's gotten pretty big and I think it has at least two pretty separable pieces:
- Adding `nexus_generation` to the blueprint (in-memory + database). These parts of this PR already look pretty solid to me. That's also all I need for #8875.
- The planner changes that implement the behaviors around `nexus_generation`. This is a lot trickier and will take more time to get to ground.
Could you separate these into separate PRs? That'll be much easier to review, get confidence in, and it will also unblock the quiesce work sooner.
You could also separate out the change to the way we report "discretionary zones placed", but that's small and simple enough that it's less critical to me.
@@ -109,6 +112,31 @@ const NUM_CONCURRENT_MGS_UPDATES: usize = 1;
/// A receipt that `check_input_validity` has been run prior to planning.
struct InputChecked;

#[derive(Debug)]
#[expect(dead_code)]
This is admittedly the first time I've seen `#[expect(dead_code)]`, but why is it here? It looks like this struct is used?
Ah I think it's because we only use it for the `Debug` impl.
These structs were admittedly here on main - I was moving them to be usable outside the single `do_plan_zone_updates` function. (See omicron/nexus/reconfigurator/planning/src/planner.rs, lines 1215 to 1238 in e050434:)
#[derive(Debug)]
#[expect(dead_code)]
struct ZoneCurrentlyUpdating<'a> {
    zone_id: OmicronZoneUuid,
    zone_kind: ZoneKind,
    reason: UpdatingReason<'a>,
}

#[derive(Debug)]
#[expect(dead_code)]
enum UpdatingReason<'a> {
    ImageSourceMismatch {
        bp_image_source: &'a BlueprintZoneImageSource,
        inv_image_source: &'a OmicronZoneImageSource,
    },
    MissingInInventory {
        bp_image_source: &'a BlueprintZoneImageSource,
    },
    ReconciliationError {
        bp_image_source: &'a BlueprintZoneImageSource,
        inv_image_source: &'a OmicronZoneImageSource,
        message: &'a str,
    },
}
I believe that this is using `expect` instead of `allow` as a part of the new Rust 1.81 features, where:
- `allow` is permissive (e.g., lets you have dead code, does not complain if it's used)
- `expect` flags a warning if the lint isn't triggered (e.g., if this is labelled dead code, but that doesn't trigger, the compiler will warn us: https://doc.rust-lang.org/reference/attributes/diagnostics.html#r-attributes.diagnostics.lint.expect)
We definitely are using these fields, because they're emitted to a log via the debug implementation, but they're otherwise not used directly. I can confirm that removing the attributes results in (unwanted) compiler warnings about the fields never being read -- even though they do end up in logs.
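A small standalone sketch of the same situation, assuming Rust 1.81 or later. `PropagationStatus` and `describe` are hypothetical names for this example only. The fields are read solely through the derived `Debug` impl, which the `dead_code` analysis ignores, so the lint fires and the `expect` attribute is satisfied.

```rust
/// Hypothetical struct whose fields are only ever read via `Debug`.
#[derive(Debug)]
#[expect(dead_code)] // the lint fires on the fields; `expect` records that we know
struct PropagationStatus<'a> {
    zone_name: &'a str,
    attempts: u32,
}

/// Formats the struct the way a log line would, using only its `Debug` output.
fn describe(zone_name: &str, attempts: u32) -> String {
    let status = PropagationStatus { zone_name, attempts };
    format!("{status:?}")
}
```

If a later change starts reading the fields directly, the expectation is no longer fulfilled and the compiler reports that instead, which `allow` would never do.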
#[derive(Debug)]
#[expect(dead_code)]
struct ZoneCurrentlyUpdating<'a> {
    zone_id: OmicronZoneUuid,
    zone_kind: ZoneKind,
    reason: UpdatingReason<'a>,
}

#[derive(Debug)]
#[expect(dead_code)]
enum UpdatingReason<'a> {
    ImageSourceMismatch {
        bp_image_source: &'a BlueprintZoneImageSource,
        inv_image_source: &'a OmicronZoneImageSource,
    },
    MissingInInventory {
        bp_image_source: &'a BlueprintZoneImageSource,
    },
    ReconciliationError {
        bp_image_source: &'a BlueprintZoneImageSource,
        inv_image_source: &'a OmicronZoneImageSource,
        message: &'a str,
    },
}
Some doc comments might help here. I'm confused about what these are supposed to mean. I would have thought `ZoneCurrentlyUpdating` with a `reason` would mean "this zone is being updated and here's why". But then I don't get why `MissingInInventory` or `ReconciliationError` would be a reason that a zone would be updating.
As mentioned above, I'm refactoring this to access the `get_zones_not_yet_propagated_to_inventory` function -- but I believe that's why this is reconciliation-focused. These responses are more about "is the inventory in sync with a blueprint" than answering the more specific question of "has an update completed".
I'll update names away from "update" and more towards "zone propagation" in 55ebd1c
)
})
.collect(),
let image_sources = match zone_kind {
I feel like we could use a comment explaining more context here. Something like:
// Our goal here is to make sure that if we have less redundancy for discretionary zones than needed that we deploy additional ones. For most zones, we only care about the total count of that kind of zone. The way we deploy Nexus means we need the expected count for redundancy for _both_ active zone images.
No pressure to use any of that text -- it's just an example of what felt missing.
let our_image = self.lookup_current_nexus_image();

let mut images = vec![];
if old_image != new_image {
If we replace the Nexus identity in the `PlanningInput` with a list of which Nexus instances are in charge, then I think the logic here becomes something like:
- always include the new image
- also include the image for the Nexus instances currently in charge, if it's different
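The two rules above could be sketched roughly as follows. `ImageId` and `nexus_images_needing_redundancy` are hypothetical names for this example, and the in-charge image is assumed to be already known to the caller.

```rust
/// Hypothetical stand-in for a zone image identifier.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ImageId(String);

/// Returns the images for which Nexus redundancy should be maintained:
/// always the new image, plus the in-charge image when it differs.
fn nexus_images_needing_redundancy(
    new_image: &ImageId,
    in_charge_image: &ImageId,
) -> Vec<ImageId> {
    let mut images = vec![new_image.clone()];
    if in_charge_image != new_image {
        images.push(in_charge_image.clone());
    }
    images
}
```

When no update is in progress the two images are equal and the result has a single entry, so the usual per-kind redundancy count applies unchanged.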
if self.nexus_generation != current_generation {
    return Err(Error::NexusGenerationMismatch {
        expected: current_generation,
        actual: self.nexus_generation,
    });
}
Why do we check this?
ZoneKind::Nexus => {
    // Get the nexus_generation of the zone being considered for shutdown
    let zone_nexus_generation = match &zone.zone_type {
        BlueprintZoneType::Nexus(nexus_zone) => {
            nexus_zone.nexus_generation
        }
        _ => unreachable!("zone kind is Nexus but type is not"),
    };

    let Some(current_gen) = self.lookup_current_nexus_generation()
    else {
        // If we don't know the current Nexus zone ID, or its
        // generation, we can't perform the handoff safety check.
        report.unsafe_zone(
            zone,
            Nexus {
                zone_generation: zone_nexus_generation,
                current_nexus_generation: None,
            },
        );
        return false;
    };

    // It's only safe to shut down if handoff has occurred.
    //
    // That only happens when the current generation of Nexus (the
    // one running right now) is greater than the zone we're
    // considering expunging.
    if current_gen <= zone_nexus_generation {
        report.unsafe_zone(
            zone,
            Nexus {
                zone_generation: zone_nexus_generation,
                current_nexus_generation: Some(current_gen),
            },
        );
        return false;
    }

    true
}
What problem is this trying to prevent?
/// ID of the currently running Nexus zone
///
/// This is used to identify which Nexus is currently executing the planning
/// operation, which is needed for safe shutdown decisions during handoff.
current_nexus_zone_id: Option<OmicronZoneUuid>,
I'd strongly suggest that instead of putting the current Nexus zone into the planning input, let's put either the currently in-charge Nexus generation or else the set of Nexus instances currently in control. (If you have the blueprint, you can compute either of these from the other.) `PlanningInputFromDb` could determine this based on the contents of `db_metadata_nexus`. (That could be done in a separate PR if we want to keep this PR decoupled from the `db_metadata_nexus` one.)
There are a few reasons for this:
- It's confusing to me that this field is both "important" and "optional". What's the semantics of it being `None`? Does that mean certain planning operations fail or do the wrong thing? Do we just not do any of those operations from the contexts where we're providing `None` today? On the other hand, if we say this is the set of instances currently in-charge, I'm hoping we can fill in some values here. I took a quick look through the callers that are providing `None` here and I think they're basically all either using `PlanningInputFromDb::assemble` (which can get the real value of "who's in charge" from the database) or else are tests that have a blueprint available (so we could have a helper that pulls them out of that blueprint).
- In terms of comprehensibility of the system: it feels weird to me that the result of planning would depend on who is doing the planning. With this PR, every Nexus in a running system will be providing different input to its planner, which feels like it confuses the aim of determinism in the planning process.
- It would eliminate quite a lot of call sites where you've had to add `None`.
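As a rough sketch of "you can compute either of these from the other": here the blueprint is reduced to a hypothetical map from zone ID to `nexus_generation`. The type aliases and both function names are made up for this example, and it assumes all in-charge instances share a single generation.

```rust
use std::collections::{BTreeMap, BTreeSet};

type ZoneId = u32; // hypothetical stand-in for a zone UUID
type Generation = u64; // hypothetical stand-in for the generation type

/// Derives the in-charge generation from the set of in-charge zones.
/// Returns `None` if the set is empty, names a zone that is missing from
/// the blueprint, or (contrary to the assumption) mixes generations.
fn in_charge_generation(
    blueprint: &BTreeMap<ZoneId, Generation>,
    in_charge: &BTreeSet<ZoneId>,
) -> Option<Generation> {
    let mut gens = in_charge.iter().map(|id| blueprint.get(id).copied());
    let first = gens.next()??;
    gens.all(|g| g == Some(first)).then_some(first)
}

/// Derives the set of in-charge zones from the in-charge generation.
fn zones_in_charge(
    blueprint: &BTreeMap<ZoneId, Generation>,
    generation: Generation,
) -> BTreeSet<ZoneId> {
    blueprint
        .iter()
        .filter(|(_, zone_gen)| **zone_gen == generation)
        .map(|(id, _)| *id)
        .collect()
}
```

Neither function takes the identity of the Nexus doing the planning, so every Nexus given the same blueprint and the same database state would compute the same input.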
pub fn set_current_nexus_zone_id(&mut self, id: OmicronZoneUuid) {
    self.current_nexus_zone_id = Some(id);
}
nit: I've thought of these types like `PlanningInput` being immutable and `PlanningInputBuilder` being the mutable version. What do you think of having callers do this:
let mut new_builder = planning_input.into_builder();
new_builder.set( ... );
let planning_input = new_builder.build();
It's a little more verbose but I feel like it preserves the nice property that: when you're modifying it, you're working with a builder. The planning input itself remains immutable. (Again, take it or leave it.)
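A self-contained sketch of that round trip, using heavily reduced stand-ins for `PlanningInput` and `PlanningInputBuilder`. The single `active_nexus_generation` field, `new`, and the setter name are assumptions made for this example. Only `into_builder` and `build` are named in the snippet above.

```rust
/// Reduced stand-in for the immutable planning input (hypothetical field).
#[derive(Debug, Clone, PartialEq, Eq)]
struct PlanningInput {
    active_nexus_generation: u64,
}

/// Reduced stand-in for the mutable builder.
#[derive(Debug)]
struct PlanningInputBuilder {
    active_nexus_generation: u64,
}

impl PlanningInput {
    fn new(active_nexus_generation: u64) -> Self {
        Self { active_nexus_generation }
    }

    /// Read-only accessor; there are no setters on the input itself.
    fn active_nexus_generation(&self) -> u64 {
        self.active_nexus_generation
    }

    /// Consumes the input, so the old value cannot be used by accident.
    fn into_builder(self) -> PlanningInputBuilder {
        PlanningInputBuilder {
            active_nexus_generation: self.active_nexus_generation,
        }
    }
}

impl PlanningInputBuilder {
    /// Hypothetical setter; all mutation happens on the builder.
    fn set_active_nexus_generation(&mut self, generation: u64) {
        self.active_nexus_generation = generation;
    }

    fn build(self) -> PlanningInput {
        PlanningInput {
            active_nexus_generation: self.active_nexus_generation,
        }
    }
}
```

Because `into_builder` takes `self` by value, a caller ends up rebinding `planning_input` after `build()`, which keeps every value of the input type immutable once constructed.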
- [...] `nexus_generation` to the blueprint, for Nexus zones, and also as a top-level field
- [...] `nexus_generation` value
- [...] `nexus_generation`, if any of the "new Nexuses" are running.
- [...] `do_plan_nexus_generation_update` method to the planner, which decides when the top-level Nexus generation number should be incremented.

Fixes #8853, #8843