
Conversation

@labbott (Collaborator) commented Jul 29, 2025:

check the various DIMM PCAMP signals

($dev:ident, $index:expr) => {
    // Check this DIMM's PCAMP signal; record a ringbuf entry on failure.
    let $dev = self.dimms.dimm_pcamp.$dev();
    if !$dev {
        ringbuf_entry!(Trace::DimmFailure { index: $index });
    }
};
Member (inline review comment):

Eventually, we'll definitely want to record an ereport here as well --- I opened an issue for that: #2173
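
Since the diff only shows the macro arm, here is a minimal, self-contained sketch of the pattern it implements. Everything around the arm (the Sequencer and DimmPcamp types, the dimm0/dimm1 accessors, and the check_pcamp! name) is an assumption for illustration, not the actual Hubris code, and Hubris's ringbuf_entry! is stubbed out with println! so the example runs standalone.

    #[derive(Debug)]
    enum Trace {
        DimmFailure { index: usize },
    }

    // Stand-in for Hubris's ringbuf_entry! tracing macro (assumption:
    // the real one appends to a task-local ring buffer, not stdout).
    macro_rules! ringbuf_entry {
        ($entry:expr) => {
            println!("trace: {:?}", $entry);
        };
    }

    // Hypothetical register block exposing one PCAMP bit per DIMM.
    struct DimmPcamp {
        bits: [bool; 2],
    }

    impl DimmPcamp {
        fn dimm0(&self) -> bool { self.bits[0] }
        fn dimm1(&self) -> bool { self.bits[1] }
    }

    struct Dimms {
        dimm_pcamp: DimmPcamp,
    }

    struct Sequencer {
        dimms: Dimms,
    }

    impl Sequencer {
        fn check_all(&self) {
            // Mirrors the macro arm in the diff: read the named signal
            // and record a failure entry when it is deasserted.
            macro_rules! check_pcamp {
                ($dev:ident, $index:expr) => {
                    let $dev = self.dimms.dimm_pcamp.$dev();
                    if !$dev {
                        ringbuf_entry!(Trace::DimmFailure { index: $index });
                    }
                };
            }

            check_pcamp!(dimm0, 0);
            check_pcamp!(dimm1, 1);
        }
    }

    fn main() {
        let seq = Sequencer {
            dimms: Dimms {
                dimm_pcamp: DimmPcamp { bits: [true, false] },
            },
        };
        seq.check_all(); // prints: trace: DimmFailure { index: 1 }
    }

Note that a ringbuf entry only leaves a breadcrumb in the trace buffer; whether logging alone is an adequate response to a PCAMP failure is exactly what the discussion below is about.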

@labbott force-pushed the cosmo_check_dimm_failure branch from fae58fa to 433155c on July 29, 2025 at 18:38.
@hawkw (Member) left a comment:

Looks good to me --- perhaps it's worth waiting for Matt or Nathanael to have an opinion, but this is pretty straightforward.

@rmustacc (Contributor) commented:

Is there an assumption that something else has already taken corrective action on the box here? If we lose a DIMM regulator power good (which I realize is only one part of the PCAMP situation, but is the default behavior based on the PMIC situation), then that implies that we need to MAPO, capture information, and transition to A2. Is this merely logging here because the FPGA has issued the MAPO? Running with a failed DIMM here is not helpful to anyone, and the host is going to panic mere moments after this if we actually have this failure.

Relatedly, if we do, is the normal hot-swap logic that we have in A2 actually going to handle clearing faults and the like?

@labbott marked this pull request as draft on July 30, 2025 at 17:25.

@labbott (Collaborator, Author) commented Jul 30, 2025:

Keeping this as a draft until we finalize how we want to handle our response/reporting.
