
[sled-agent] Report boot partition contents of M.2 drives in inventory (PR 2/2) #8451


Merged: jgallagher merged 10 commits into main from john/config-reconciler-report-phase2 on Jul 7, 2025


jgallagher (Contributor)

This is stacked on top of #8450. It adds to the inventory reported by sled-agent-config-reconciler a BootPartitionContents structure, which contains:

  • which slot we booted from (A or B)
  • the contents of each slot

Each of these three items (the boot slot, slot A's contents, and slot B's contents) is a Result, in case we fail to determine it. In practice, I'd expect us to basically never fail to report which slot we booted from, and to fail on a slot's contents only if that slot doesn't contain a valid phase 2 image.
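A minimal Rust sketch of the shape described above. The names here are assumptions for illustration: the M2Slot enum, the BootPartitionDetails fields, the String error type, and the active_slot_details helper are hypothetical and may not match the actual sled-agent types. It shows why each item gets its own Result: a sled whose slot B has no valid phase 2 image can still report its boot slot and slot A.

```rust
// Hypothetical types sketching the structure described in the PR; the real
// omicron definitions may use different names, fields, and error types.

/// Which M.2 boot slot a sled booted from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum M2Slot {
    A,
    B,
}

/// Details of a valid phase 2 image found in one slot (fields assumed).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BootPartitionDetails {
    pub artifact_hash: String,
    pub artifact_size: u64,
    pub image_name: String,
}

/// Each item is a separate Result, so failing to read one slot does not
/// hide the boot slot or the other slot's contents.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BootPartitionContents {
    pub boot_disk: Result<M2Slot, String>,
    pub slot_a: Result<BootPartitionDetails, String>,
    pub slot_b: Result<BootPartitionDetails, String>,
}

impl BootPartitionContents {
    /// Returns the details of the slot we booted from, if both the boot slot
    /// and that slot's contents were determined successfully.
    pub fn active_slot_details(&self) -> Result<&BootPartitionDetails, String> {
        let slot = self
            .boot_disk
            .as_ref()
            .map_err(|e| format!("unknown boot slot: {e}"))?;
        let details = match slot {
            M2Slot::A => &self.slot_a,
            M2Slot::B => &self.slot_b,
        };
        details
            .as_ref()
            .map_err(|e| format!("slot {slot:?} unreadable: {e}"))
    }
}
```

With this shape, a consumer can still display slot A's details even when slot B reports an error, which matches the expectation above that slot errors should mostly mean "no valid phase 2 image here".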

Almost all of the changes here are related to updating the db schema to store this new information (and updating the queries to insert into, read from, and delete from the new tables). up3.sql is worth a particularly close look: because we've moved a field out of inv_sled_agent and into a new table, that part of the migration attempts to ensure we create a row in the new table for each row in inv_sled_agent that had a value for that field. There's a data migration test that covers this.
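The backfill in up3.sql can be modeled in memory as a filter-and-map over the old rows. This Rust sketch is not the actual migration (which is SQL); the row structs, column names, and dummy error text are hypothetical. It shows the invariant the data migration test covers: exactly one new row per old row with a non-NULL last_reconciled_config, none for rows where it was NULL, and dummy errors in the boot partition columns because old collections never recorded that data.

```rust
// In-memory model of the up3.sql backfill. All names and the dummy error
// text are illustrative assumptions, not the real schema.

#[derive(Debug, Clone)]
struct OldSledAgentRow {
    inv_collection_id: u64,
    sled_id: u64,
    last_reconciled_config: Option<String>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct NewReconcilerRow {
    inv_collection_id: u64,
    sled_id: u64,
    last_reconciled_config: String,
    boot_disk_error: Option<String>,
    boot_partition_a_error: Option<String>,
    boot_partition_b_error: Option<String>,
}

const OLD_COLLECTION: &str = "old collection, data not available";

fn migrate(old: &[OldSledAgentRow]) -> Vec<NewReconcilerRow> {
    old.iter()
        .filter_map(|row| {
            // Rows with a NULL config had nothing to move, so they are skipped.
            let config = row.last_reconciled_config.clone()?;
            Some(NewReconcilerRow {
                inv_collection_id: row.inv_collection_id,
                sled_id: row.sled_id,
                last_reconciled_config: config,
                // Old collections have no boot partition data, so these
                // columns get dummy errors.
                boot_disk_error: Some(OLD_COLLECTION.to_string()),
                boot_partition_a_error: Some(OLD_COLLECTION.to_string()),
                boot_partition_b_error: Some(OLD_COLLECTION.to_string()),
            })
        })
        .collect()
}
```

A data migration test can then seed old rows (some with a config, some without) and assert on the count and contents of the new rows, which is the same check the assertions on this model would make.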

jgallagher (Contributor Author)

From testing on dublin: the latest inventory collection shows the contents of each M.2 in each of the four sleds. Slot A on each sled is active and matches the image built from commit 7777e64 on this branch:

root@oxz_switch0:~# omdb db inventory collections show latest 2>/dev/null | grep -A 8 'boot disk'
    boot disk slot: A
    slot A details:
        artifact: 4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc (1048580096 bytes)
        image name: ci 7777e64/d632cf2 2025-06-25 17:33
        phase 2 hash: 1dd883934a9824cdc8bc01971ca817d77dfd42da27efbe0a4d9600cf7859af57
    slot B details:
        artifact: e3265b54d91e380b2e735d01d9e0b3c94ab73e705d7184307a6d79f7e95e1b47 (1048580096 bytes)
        image name: ci bc0fded/f395100 2025-06-06 21:40
        phase 2 hash: 0adfb349eeec9fb76d6dc2540223f6cfbcb303f7e4541d40e9a6504b58d59f38
--
    boot disk slot: A
    slot A details:
        artifact: 4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc (1048580096 bytes)
        image name: ci 7777e64/d632cf2 2025-06-25 17:33
        phase 2 hash: 1dd883934a9824cdc8bc01971ca817d77dfd42da27efbe0a4d9600cf7859af57
    slot B details:
        artifact: c6d1d03889191622a09011936fa701382a6ae74bd6437bd14013e8044dcec108 (1048580096 bytes)
        image name: ci 5637b78/2c8e34f 2025-06-18 02:17
        phase 2 hash: 2a40db3bd34664a0b644bea43c25089ffbd763474f2ffb34ce80d443b872f53e
--
    boot disk slot: A
    slot A details:
        artifact: 4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc (1048580096 bytes)
        image name: ci 7777e64/d632cf2 2025-06-25 17:33
        phase 2 hash: 1dd883934a9824cdc8bc01971ca817d77dfd42da27efbe0a4d9600cf7859af57
    slot B details:
        artifact: c6d1d03889191622a09011936fa701382a6ae74bd6437bd14013e8044dcec108 (1048580096 bytes)
        image name: ci 5637b78/2c8e34f 2025-06-18 02:17
        phase 2 hash: 2a40db3bd34664a0b644bea43c25089ffbd763474f2ffb34ce80d443b872f53e
--
    boot disk slot: A
    slot A details:
        artifact: 4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc (1048580096 bytes)
        image name: ci 7777e64/d632cf2 2025-06-25 17:33
        phase 2 hash: 1dd883934a9824cdc8bc01971ca817d77dfd42da27efbe0a4d9600cf7859af57
    slot B details:
        artifact: c6d1d03889191622a09011936fa701382a6ae74bd6437bd14013e8044dcec108 (1048580096 bytes)
        image name: ci 5637b78/2c8e34f 2025-06-18 02:17
        phase 2 hash: 2a40db3bd34664a0b644bea43c25089ffbd763474f2ffb34ce80d443b872f53e
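Checking the output above by eye works for four sleds; a small Rust sketch like the following could automate it. It assumes omdb's human-readable layout stays exactly as shown (a "boot disk slot:" line, "slot X details:" headers, and "artifact: <hash> (<size> bytes)" lines, with sleds separated by "--"), which is not a stable interface. active_artifacts is a hypothetical helper, not part of omdb.

```rust
// Parses the "boot disk" excerpt printed above (one block per sled,
// separated by "--") and returns, for each sled, the boot slot and the
// artifact hash of that slot. The text layout is assumed from the sample.

fn active_artifacts(output: &str) -> Vec<(String, String)> {
    let mut result = Vec::new();
    for block in output.split("\n--\n") {
        let mut boot_slot: Option<String> = None;
        let mut current_slot: Option<String> = None;
        let mut active_hash: Option<String> = None;
        for line in block.lines().map(str::trim) {
            if let Some(slot) = line.strip_prefix("boot disk slot: ") {
                boot_slot = Some(slot.to_string());
            } else if let Some(rest) = line.strip_prefix("slot ") {
                // e.g. "slot A details:" starts a new slot section.
                current_slot = rest.split_whitespace().next().map(str::to_string);
            } else if let Some(rest) = line.strip_prefix("artifact: ") {
                // Only record the artifact of the slot we booted from.
                if current_slot.is_some() && current_slot == boot_slot {
                    active_hash = rest.split_whitespace().next().map(str::to_string);
                }
            }
        }
        if let (Some(slot), Some(hash)) = (boot_slot, active_hash) {
            result.push((slot, hash));
        }
    }
    result
}
```

Comparing every returned hash against the expected artifact hash would confirm that each sled booted the same image.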

After uploading this same commit's TUF repo, we can confirm that the artifact hash and size are correct, and that the artifact has been distributed to every sled:

root@oxz_switch0:~# pilot host exec -c 'echo . && ls -l /pool/int/*/update/4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc' 14-17
14  BRM42220026        ok: .
-rw-r--r--   1 root     root     1048580096 Jun 25 20:12 /pool/int/2196ba5d-e211-4495-9c0b-e0948ba0afbd/update/4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc
-rw-r--r--   1 root     root     1048580096 Jun 25 20:12 /pool/int/f3673b2a-0cdc-4315-ac53-7889918cee4c/update/4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc
15  BRM27230037        ok: .
-rw-r--r--   1 root     root     1048580096 Jun 25 20:12 /pool/int/8d21e428-0e1d-4282-b7d5-a69c5655685b/update/4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc
-rw-r--r--   1 root     root     1048580096 Jun 25 20:12 /pool/int/d82ba65f-6b72-425b-98f7-60842853459a/update/4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc
16  BRM23230018        ok: .
-rw-r--r--   1 root     root     1048580096 Jun 25 20:12 /pool/int/0727496f-d365-41d9-b9ca-e2e99f4534ff/update/4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc
-rw-r--r--   1 root     root     1048580096 Jun 25 20:12 /pool/int/9262f68b-91b1-4ab1-b836-5c4e7cbeddf5/update/4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc
17  BRM23230010        ok: .
-rw-r--r--   1 root     root     1048580096 Jun 25 20:12 /pool/int/655c88c8-a114-4251-b598-11638274fb9c/update/4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc
-rw-r--r--   1 root     root     1048580096 Jun 25 20:12 /pool/int/928f7fe8-d89a-4edd-885f-242f44d47999/update/4ee09067977a1e934bb86aff11b2f05af9c4b9937c03d02c1c9fb0d54dae58bc
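The distribution check could be scripted in the same spirit. This sketch assumes each sled header line begins with a numeric cubby followed by a serial, and that each file line is standard ls -l output with the size in the fifth column; copies_per_sled is a hypothetical name. It only checks file names and sizes; it does not rehash the file contents.

```rust
use std::collections::BTreeMap;

// Counts, per sled header (e.g. "14  BRM42220026  ok: ."), how many `ls -l`
// lines show the expected artifact at the expected size. The line layout is
// assumed from the sample output, not a stable format.
fn copies_per_sled(output: &str, hash: &str, size: u64) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    let mut current: Option<String> = None;
    let suffix = format!("/update/{hash}");
    for line in output.lines() {
        let fields: Vec<&str> = line.split_whitespace().collect();
        if line.starts_with('-') {
            // ls -l columns: mode, links, owner, group, size, month, day, time, path
            let size_ok = fields.get(4).and_then(|s| s.parse::<u64>().ok()) == Some(size);
            let name_ok = fields.last().map_or(false, |p| p.ends_with(&suffix));
            if let (Some(sled), true, true) = (&current, size_ok, name_ok) {
                *counts.get_mut(sled).unwrap() += 1;
            }
        } else if fields.len() >= 2 && fields[0].parse::<u32>().is_ok() {
            // A new sled header: "<cubby> <serial> ok: ."
            let sled = format!("{} {}", fields[0], fields[1]);
            counts.insert(sled.clone(), 0);
            current = Some(sled);
        }
    }
    counts
}
```

For the listing above, with the expected hash and a size of 1048580096, each of sleds 14 through 17 would report 2: one copy in each of the two /pool/int/ datasets shown for that sled.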

Base automatically changed from john/config-reconciler-read-phase2 to main July 3, 2025 15:18
jgallagher force-pushed the john/config-reconciler-report-phase2 branch from 7777e64 to 325544c on July 3, 2025 15:42
andrewjstone (Contributor) left a comment


Looks great!

-- Move any non-NULL `last_reconciled_config` values out of `inv_sled_agent`
-- and into new rows in `inv_sled_config_reconciler`. We fill in the rest of the
-- columns with dummy errors for old collections that don't have data.
INSERT INTO omicron.public.inv_sled_config_reconciler (
andrewjstone (Contributor):

LGTM!

@@ -2505,6 +2505,151 @@ fn after_155_0_0<'a>(ctx: &'a MigrationContext<'a>) -> BoxFuture<'a, ()> {
})
}

mod migration_156 {
andrewjstone (Contributor):

Nice test!

jgallagher merged commit 28bb4b0 into main on Jul 7, 2025
18 checks passed
jgallagher deleted the john/config-reconciler-report-phase2 branch on July 7, 2025 13:31
2 participants