Check for DIMM failures in cosmo #2174
base: master
task/cosmo-spd/src/main.rs
Outdated
($dev:ident, $index:expr) => {
    let $dev = self.dimms.dimm_pcamp.$dev();
    if !$dev {
        ringbuf_entry!(Trace::DimmFailure { index: $index });
Eventually, we'll definitely want to record an ereport here as well. I opened an issue for that: #2173
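For readers without the full diff, the pattern in the reviewed macro arm can be sketched in a self-contained form. This is only an illustration under stated assumptions: `DimmPcamp`, its `a`/`b`/`c` accessors, the bit layout, and `check_dimms` are hypothetical stand-ins for the FPGA register view, and a `Vec` stands in for the ring buffer that `ringbuf_entry!` writes to. As in the snippet, a `false` reading is treated as a failure.

```rust
/// Stand-in for the task's trace entry; only the variant seen in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Trace {
    DimmFailure { index: usize },
}

/// Hypothetical register view: one "good" bit per DIMM (bit set = OK).
struct DimmPcamp {
    bits: u8,
}

impl DimmPcamp {
    fn a(&self) -> bool {
        self.bits & (1 << 0) != 0
    }
    fn b(&self) -> bool {
        self.bits & (1 << 1) != 0
    }
    fn c(&self) -> bool {
        self.bits & (1 << 2) != 0
    }
}

/// Check each DIMM signal and record a trace entry for every failed one.
fn check_dimms(pcamp: &DimmPcamp) -> Vec<Trace> {
    // Stand-in for the ring buffer used by the real task.
    let mut log = Vec::new();

    // Same shape as the reviewed arm: read the named signal, log if it is low.
    macro_rules! check {
        ($dev:ident, $index:expr) => {
            let $dev = pcamp.$dev();
            if !$dev {
                log.push(Trace::DimmFailure { index: $index });
            }
        };
    }

    check!(a, 0);
    check!(b, 1);
    check!(c, 2);
    log
}
```

With this sketch, `check_dimms(&DimmPcamp { bits: 0b101 })` records a single `DimmFailure` entry for index 1, and a value with all three bits set records nothing.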
check the various DIMM PCAMP signals
fae58fa to 433155c
Looks good to me. It may be worth waiting for Matt or Nathanael to weigh in, but this is pretty straightforward.
Is there an assumption that something else has already taken corrective action on the box here? Losing a DIMM regulator power good is only one part of the PCAMP situation, but it is the default behavior based on the PMIC situation, and it implies that we need to MAPO, capture information, and transition to A2. Is this merely logging because the FPGA has already issued the MAPO? Running with a failed DIMM is not helpful to anyone, and the host is going to panic mere moments after this if we actually have this failure. Relatedly, if we do, is the normal hot-swap logic that we have in A2 actually going to handle clearing faults and the like?
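To make the question concrete, the response sequence described in this comment could be sketched as follows. This is not the project's behavior or API: `Action`, `respond_to_dimm_failure`, and the `fpga_already_mapo` flag are hypothetical names used only to illustrate the "log, MAPO if nobody else has, capture, go to A2" ordering being asked about.

```rust
/// Hypothetical steps a sequencer could take after a DIMM failure.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Action {
    LogFailure { index: usize },
    Mapo,
    CaptureDiagnostics,
    TransitionToA2,
}

/// Plan the response to a failed DIMM at `index`.
///
/// `fpga_already_mapo` is an assumed input saying whether the FPGA has
/// already issued the MAPO, which is the open question in this thread.
fn respond_to_dimm_failure(index: usize, fpga_already_mapo: bool) -> Vec<Action> {
    // Logging always happens, as in the change under review.
    let mut actions = vec![Action::LogFailure { index }];

    // Only issue the MAPO ourselves if nothing else has done so.
    if !fpga_already_mapo {
        actions.push(Action::Mapo);
    }

    // Capture information before leaving the failed state, then drop to A2.
    actions.push(Action::CaptureDiagnostics);
    actions.push(Action::TransitionToA2);
    actions
}
```

Whether the logging-only behavior in the PR is sufficient depends on which branch of that `if` the hardware actually takes, and on whether the A2 hot-swap logic then clears the faults.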
Keeping this as a draft until we finalize how we want to handle our response/reporting.
check the various DIMM PCAMP signals