[WIP] Simplify/remove /persist/status/zedagent/* #5584

Draft
eriknordmark wants to merge 9 commits into lf-edge:master from eriknordmark:persist2memory
@eriknordmark
Contributor

Description

We currently keep a checkpoint of the protobuf config in /persist/checkpoint/lastconfig, which is signed by the controller. Its signature is verified before it is used, and it is then used to populate zedagent's publications even if there is no connection to the controller.
Thus the persistent publications (with files in /persist/status/zedagent) should not be needed, and getting rid of them simplifies analyzing the security impact of unauthorized modifications to such files.

However, we need to have the ControllerCerts available since the CipherContext (used for object encryption) depends on them. This is addressed by introducing a new /persist/checkpoint/controllercerts, which contains the protobuf objects received from the controller. This file is verified at boot the same way as when we receive an update: the certificate chain must verify all the way to the cert in /config/root-certificate.pem.
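The chain check described above can be illustrated with Go's standard crypto/x509 package. This is only a minimal sketch under stated assumptions, not the code in this PR: verifyChain, makeCert and demo are hypothetical names, and the certificates are generated in memory instead of being read from /config/root-certificate.pem or from the checkpointed protobuf.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// verifyChain (hypothetical helper) checks that leaf chains up to root,
// optionally through the given intermediate certificates.
func verifyChain(leaf, root *x509.Certificate, intermediates []*x509.Certificate) error {
	roots := x509.NewCertPool()
	roots.AddCert(root)
	inter := x509.NewCertPool()
	for _, c := range intermediates {
		inter.AddCert(c)
	}
	_, err := leaf.Verify(x509.VerifyOptions{
		Roots:         roots,
		Intermediates: inter,
		KeyUsages:     []x509.ExtKeyUsage{x509.ExtKeyUsageAny},
	})
	return err
}

// makeCert creates a throwaway certificate for the demo. It is signed by
// parent, or self-signed when parent is nil.
func makeCert(cn string, isCA bool, parent *x509.Certificate, parentKey *ecdsa.PrivateKey) (*x509.Certificate, *ecdsa.PrivateKey, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(time.Now().UnixNano()),
		Subject:               pkix.Name{CommonName: cn},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  isCA,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageDigitalSignature,
	}
	if isCA {
		tmpl.KeyUsage |= x509.KeyUsageCertSign
	}
	if parent == nil {
		parent, parentKey = tmpl, key
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, &key.PublicKey, parentKey)
	if err != nil {
		return nil, nil, err
	}
	cert, err := x509.ParseCertificate(der)
	return cert, key, err
}

// demo verifies one signing cert against the root that issued it and
// against an unrelated root, returning both verification results.
func demo() (trustedErr, rogueErr, err error) {
	root, rootKey, err := makeCert("root", true, nil, nil)
	if err != nil {
		return nil, nil, err
	}
	signing, _, err := makeCert("controller-signing", false, root, rootKey)
	if err != nil {
		return nil, nil, err
	}
	rogue, _, err := makeCert("rogue-root", true, nil, nil)
	if err != nil {
		return nil, nil, err
	}
	return verifyChain(signing, root, nil), verifyChain(signing, rogue, nil), nil
}

func main() {
	trustedErr, rogueErr, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("against issuing root:", trustedErr)
	fmt.Println("against unrelated root:", rogueErr)
}
```

The point of the sketch is that the same verification function can be applied both to freshly received certificates and to the checkpointed copy read at boot, so a tampered checkpoint file fails in the same way a bad update would.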

Both lastconfig and controllercerts have a .bak file, which should ensure that even if there is a power outage while the file is being written, we still have a valid backup that we can use.
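One way such a primary-plus-.bak scheme can work is sketched below. This is an illustration under assumptions, not the implementation in this PR: saveCheckpoint, loadCheckpoint and demo are hypothetical names, the ".tmp" suffix is made up for the sketch, and the verify callback stands in for the real signature or chain verification.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// saveCheckpoint (hypothetical) writes data to a temporary file, syncs it,
// keeps the previous checkpoint as <path>.bak, and then renames the
// temporary file into place.
func saveCheckpoint(path string, data []byte) error {
	tmp := path + ".tmp"
	f, err := os.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	// Preserve the previous checkpoint, if any, as the backup.
	if err := os.Rename(path, path+".bak"); err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	return os.Rename(tmp, path)
}

// loadCheckpoint (hypothetical) returns the first of <path> and <path>.bak
// that can be read and passes verify.
func loadCheckpoint(path string, verify func([]byte) error) ([]byte, error) {
	var lastErr error
	for _, p := range []string{path, path + ".bak"} {
		data, err := os.ReadFile(p)
		if err == nil {
			err = verify(data)
		}
		if err == nil {
			return data, nil
		}
		lastErr = fmt.Errorf("%s: %w", p, err)
	}
	return nil, lastErr
}

// demo saves two versions, corrupts the primary file, and loads again.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "checkpoint")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "lastconfig")
	for _, v := range []string{"good-v1", "good-v2"} {
		if err := saveCheckpoint(path, []byte(v)); err != nil {
			return "", err
		}
	}
	// Simulate a torn write of the primary file.
	if err := os.WriteFile(path, []byte("bad"), 0o600); err != nil {
		return "", err
	}
	// Stand-in for real verification: accept only content starting with "good".
	verify := func(b []byte) error {
		if !strings.HasPrefix(string(b), "good") {
			return errors.New("verification failed")
		}
		return nil
	}
	data, err := loadCheckpoint(path, verify)
	return string(data), err
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

In the demo the corrupted primary fails verification, so the loader falls back to the previous version held in the .bak file.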

Note that more of the publications in /persist/status/zedagent need to be addressed.
Next is ConfigItemValueMap, which might have some chicken-and-egg problems at bootup; it needs to be published based on the checkpointed lastconfig as the agents start.

How to test and validate this PR

TBD

Since we are touching code which relates to rolling the controller certificates, that needs to be tested very carefully, including any corner cases.
And since the purpose of the checkpoints is to allow the device (including datastore and WiFi credentials) and app instances (including those with cloud-init) to boot even if there is no network, that needs to be carefully tested as well.

It is not clear whether we can test the corruption of /persist/checkpoint/lastconfig or /persist/checkpoint/controllercerts, but that is why we have the .bak files (to handle inopportune power outages while the checkpoint files are updated).

Changelog notes

TBD

PR Backports

TBD

Here is the list of current LTS branches (it should be always up to date):

  • 16.0-stable
  • 14.5-stable
  • 13.4-stable

For example, if this PR fixes a bug in a feature that was introduced in 14.5,
you can write:

- 16.0-stable: To be backported.
- 14.5-stable: No, as the feature is not available there.
- 13.4-stable: No, as the feature is not available there.

Also, to the PRs that should be backported into any stable branch, please
add a label stable.

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

For backport PRs (remove it if it's not a backport):

  • I've added a reference link to the original PR
  • PR's title follows the template

And the last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please, check the boxes above after submitting the PR in interactive mode.

@codecov

codecov bot commented Jan 31, 2026

Codecov Report

❌ Patch coverage is 0% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 29.49%. Comparing base (2281599) to head (6310bd7).
⚠️ Report is 264 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/pillar/cmd/vcomlink/vcomlink.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5584      +/-   ##
==========================================
+ Coverage   19.52%   29.49%   +9.96%     
==========================================
  Files          19       18       -1     
  Lines        3021     2417     -604     
==========================================
+ Hits          590      713     +123     
+ Misses       2310     1552     -758     
- Partials      121      152      +31     

This is used as the startup config if we crash and don't have
controller connectivity. It is saved after we have been running
with an updated config for at least 10 minutes.

Signed-off-by: eriknordmark <erik@zededa.com>
@eriknordmark eriknordmark force-pushed the persist2memory branch 3 times, most recently from a2babff to 2c66d50 Compare February 4, 2026 00:48
And its .bak file. Those are checkpointed protobuf files which
have their chains verified before writing them, but also when they
are read after a reboot. When we load from
/persist/checkpoint/controllercerts we also publish ControllerCerts for
use by others.
This will remove the need for /persist/certs/server-signing-cert.pem.

Signed-off-by: eriknordmark <erik@zededa.com>
Instead it is only kept in memory/pubsub and lookupControllerSigningCert
can be used to fetch it.

Signed-off-by: eriknordmark <erik@zededa.com>
The need for Touch went away when we started accepting arbitrarily old
checkpoints.

Signed-off-by: eriknordmark <erik@zededa.com>
And make sure we update the checkpoint when there are real
changes to the controller certs. This requires comparing the
set of Keys, aka hashes of the certificates, to avoid falsely
detecting changes due to ordering differences in the protobuf
bytes.

Signed-off-by: eriknordmark <erik@zededa.com>
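
The order-insensitive comparison described in the commit message above can be sketched as follows. This is an illustration only: certKey, keySet and sameCertSet are hypothetical names, and using the hex SHA-256 of the certificate bytes as the key is an assumption made for the sketch, not necessarily how the Keys are derived in the code.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// certKey (hypothetical) derives a key for a certificate from a hash of
// its bytes. SHA-256 is an assumption made for this sketch.
func certKey(cert []byte) string {
	sum := sha256.Sum256(cert)
	return hex.EncodeToString(sum[:])
}

// keySet collects the keys of all certificates into a set.
func keySet(certs [][]byte) map[string]struct{} {
	set := make(map[string]struct{}, len(certs))
	for _, c := range certs {
		set[certKey(c)] = struct{}{}
	}
	return set
}

// sameCertSet reports whether a and b contain the same certificates,
// regardless of the order in which they appear.
func sameCertSet(a, b [][]byte) bool {
	ka, kb := keySet(a), keySet(b)
	if len(ka) != len(kb) {
		return false
	}
	for k := range ka {
		if _, ok := kb[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	old := [][]byte{[]byte("cert-A"), []byte("cert-B")}
	reordered := [][]byte{[]byte("cert-B"), []byte("cert-A")}
	changed := [][]byte{[]byte("cert-A"), []byte("cert-C")}
	fmt.Println(sameCertSet(old, reordered)) // same set, different order
	fmt.Println(sameCertSet(old, changed))   // one certificate replaced
}
```

Comparing key sets rather than the serialized protobuf bytes means a reordered but otherwise identical certificate list does not trigger a checkpoint rewrite.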
They are created from the checkpointed controllercerts and lastconfig
when zedagent starts, and then they are published to other agents.

Signed-off-by: eriknordmark <erik@zededa.com>
The ConfigItemValueMap will no longer be a persistent publication,
hence there will be no need to convert from old to new formats,
nor to set default values. The defaults will be applied by zedagent
on startup.

Signed-off-by: eriknordmark <erik@zededa.com>
Zedagent initializes it from /persist/checkpoint/lastconfig
on startup so that other agents can get their global config.

Signed-off-by: eriknordmark <erik@zededa.com>
Since some persistent publications are no longer persistent.

Signed-off-by: eriknordmark <erik@zededa.com>