[nexus] config flag to disable SP ereport ingestion #8709
Merged
PR #8296 added the `sp_ereport_ingester` background task to Nexus for periodically collecting ereports from SPs via MGS. However, the Hubris PR adding the Hubris task that actually responds to these requests from the control plane, oxidecomputer/hubris#2126, won't make it in until after R17. This means that if we release R17 with a control plane that tries to collect ereports and SP firmware that doesn't know how to respond to such requests, the Nexus logs will be littered with 36 error log lines every 30 seconds.

Similarly, MGS will also have a bunch of noisy complaints about these requests failing.
The consequences of this are really not terrible: it just means we'll be logging a lot of errors. But it seems mildly unfortunate to be constantly trying to do something that's invariably doomed to failure, and then yelling about how it didn't work. So, this commit adds a config flag for disabling SP ereport ingestion entirely. We can set the flag in R17's production Nexus config to disable ingestion, and then unset it once the Hubris changes make it in. I did this using a config setting, rather than hard-coding ingestion to always be disabled, because there are also integration tests for this stuff, which would break if we disabled it everywhere.
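As a rough illustration of the approach described above, here is a minimal sketch of a background task gated by a config flag. The names `SpEreportIngesterConfig`, `disable`, `period_secs`, `ActivationStatus`, and `activate` are hypothetical and chosen for this example; they are not taken from the actual Nexus code, and the real change may be structured differently.

```rust
/// Hypothetical config for the ereport ingestion background task.
/// Field names are assumptions made for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct SpEreportIngesterConfig {
    /// How often the task activates, in seconds.
    pub period_secs: u64,
    /// When true, the task skips ereport collection entirely.
    pub disable: bool,
}

impl Default for SpEreportIngesterConfig {
    fn default() -> Self {
        // Ingestion stays enabled by default so that integration tests
        // exercising it keep working; only production configs opt out.
        Self { period_secs: 30, disable: false }
    }
}

/// Hypothetical result of one activation of the task.
#[derive(Debug, PartialEq)]
pub enum ActivationStatus {
    /// The flag was set, so no requests were sent.
    Disabled,
    /// Requests were attempted for this many SPs.
    Collected { sps_queried: usize },
}

/// Runs one activation. `sps` stands in for the set of SPs that would be
/// queried via MGS; the actual request logic is omitted from this sketch.
pub fn activate(config: &SpEreportIngesterConfig, sps: &[&str]) -> ActivationStatus {
    if config.disable {
        // Return early: no doomed requests, and no error log lines.
        return ActivationStatus::Disabled;
    }
    ActivationStatus::Collected { sps_queried: sps.len() }
}
```

Defaulting `disable` to `false` mirrors the reasoning above: tests keep ingestion enabled without extra setup, while a production config can set the flag until the SP firmware supports these requests.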