Cosmo support #2013
if !okay {
    // We'll return to A2, leaving jefe and our local state
    // unchanged (since they're set after this block).
    self.log_pg_registers();
It would be nice to know what seq API state and seq raw state we were in at the point of timeout, in addition to the PGs. Combining these pieces of information can help us pinpoint which rail(s) didn't come up or let us figure out what we are waiting on.
Sure, I added a call to self.log_state_registers() here as well (and below) in 128d6f9.
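A minimal sketch of why the sequencer state helps alongside the PGs, assuming a power-good (PG) status register with one bit per rail: the sequencer state tells you which rails *should* be up, and masking that against the PG bits yields the rail(s) that never came up. The register layout, rail names, `TimeoutSnapshot`, and `missing_rails` are all hypothetical; they are not the Cosmo FPGA register map or the actual output of `log_state_registers()`.

```rust
/// Placeholder rail names, indexed by bit position in a hypothetical
/// power-good (PG) status register. Real names come from the board design.
const RAIL_NAMES: [&str; 4] = ["rail_0", "rail_1", "rail_2", "rail_3"];

/// Hypothetical snapshot captured when a power-up times out.
#[derive(Debug)]
struct TimeoutSnapshot {
    seq_api_state: u8,
    seq_raw_state: u8,
    pg_status: u32,
}

/// Returns the names of rails we expected to be up (per `expected_mask`)
/// whose PG bit is not set in `pg_status`.
fn missing_rails(expected_mask: u32, pg_status: u32) -> Vec<&'static str> {
    RAIL_NAMES
        .iter()
        .enumerate()
        .filter(|&(bit, _)| {
            let m = 1u32 << bit;
            (expected_mask & m) != 0 && (pg_status & m) == 0
        })
        .map(|(_, name)| *name)
        .collect()
}

impl TimeoutSnapshot {
    /// One-line summary combining both sequencer states with the missing rails.
    fn describe(&self, expected_mask: u32) -> String {
        format!(
            "api state {}, raw state {:#04x}, missing rails {:?}",
            self.seq_api_state,
            self.seq_raw_state,
            missing_rails(expected_mask, self.pg_status)
        )
    }
}
```

In practice `expected_mask` would be derived from the sequencer state at the time of the timeout, which is exactly the information the reviewer asked to log.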
Looking good; two potentially incorrect values plus some questions/nits.
task/thermal/src/bsp/cosmo_a.rs
// Bonus bits for M.2 power, which is switched separately. We *cannot*
// read the M.2 drives when they are unpowered; otherwise, we risk
// locking up the I2C bus (see hardware-gimlet#1804 for the gory
We should check and see if this is still necessary!
It should be unnecessary, so I removed it (see oxidecomputer/quartz#321 and https://github.com/oxidecomputer/hardware-cosmo/issues/641)
    sensors::NUM_NVME_BMC_TEMPERATURE_SENSORS;

// The control loop is driven by CPU, NIC, and BMC temperatures
// XXX we should also monitor DIMM temperatures here
@nathanaelhuffman / @Aaron-Hartwig I think we need integration with the main FPGA here, because we'll be reading DIMM temperatures through the I3C proxy.
This is true, though it may be a post-bringup activity. We've been having a number of signal-integrity issues with the ruby+grapefruit setup that keep us from properly prototyping this, and while I'm still working to improve the test fixture over the next couple of days, we're running out of time before lab pack-up must commence.
task-slots = ["sys"]
notifications = ["spi-irq"]

# XXX this is only used by cosmo_seq; could we merge it?
XXX comment
This remains a potential future improvement, but I don't want to handle it right now!
build/fpga-regmap/src/lib.rs
1 => unreachable!(),
2..=8 => "u8",
9..=16 => "u16",
17..=32 => "u32",
_ => panic!("invalid width {width}"),
It feels a little strange to have 1 be unreachable!() while everything else panics.
They are meaningfully different, though! We should never hit the 1-bit path, because cases where lsb == msb are handled as bool. I'll add some text to the unreachable!(..) to make that more clear.
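A minimal sketch of the distinction being drawn here: the match arms are taken from the excerpt above, while `int_type_for_width` and `field_type` are hypothetical helpers, not the actual `fpga-regmap` code. The caller routes single-bit fields to `bool`, so width 1 can only be reached through a bug in the crate itself (`unreachable!`), whereas any other out-of-range width reflects a malformed register map (`panic!`).

```rust
/// Hypothetical helper mirroring the match in the excerpt: choose the
/// narrowest unsigned type that can hold a multi-bit field.
fn int_type_for_width(width: u32) -> &'static str {
    match width {
        // Callers route single-bit fields (lsb == msb) to `bool`, so reaching
        // this arm means an internal logic error, not bad input.
        1 => unreachable!("1-bit fields are emitted as bool"),
        2..=8 => "u8",
        9..=16 => "u16",
        17..=32 => "u32",
        // Widths of 0 or more than 32 bits indicate a malformed register map.
        _ => panic!("invalid width {width}"),
    }
}

/// Hypothetical caller for a field spanning bits `lsb..=msb`.
/// Assumes `msb >= lsb`.
fn field_type(msb: u32, lsb: u32) -> &'static str {
    if msb == lsb {
        "bool"
    } else {
        int_type_for_width(msb - lsb + 1)
    }
}
```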

use fmc_periph::A0Sm;
match (self.get_state_impl(), state) {
    (PowerState::A2, PowerState::A0) => {
Did we want to add the CPU_PRESENT / CORETYPE / SP5 checks before merging? e.g. https://github.com/oxidecomputer/hubris/blob/master/drv/gimlet-seq-server/src/main.rs#L763
Opened #2051 in the interest of getting this merged
/// TODO: explain rationale for this value.
const TRACE_DEPTH: usize = 52;
TODO
Copy-pasta from Gimlet 🙃
task/thermal/src/bsp/cosmo_a.rs
    gain_i: 0.0135,
    gain_d: 0.4,
    min_output: 0.0,
    max_output: 10.0, // XXX fix this before merging
Fix this before merging
@nathanaelhuffman now that we're getting CPU temperatures, can we bump this back up to 100%?
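To see why `max_output: 10.0` matters, here is a simplified PID sketch that clamps its output to `[min_output, max_output]`, assuming (as the comment above suggests) that the output is a fan PWM duty cycle in percent. The `gain_i` and `gain_d` values in the usage come from the excerpt; `gain_p` is not visible there, so the `1.75` used below is a hypothetical value. This is a generic textbook controller with conditional-integration anti-windup, not the thermal task's implementation.

```rust
/// Hypothetical, simplified PID controller; not the thermal task's code.
/// Output is assumed to be a fan PWM duty cycle in percent.
struct Pid {
    gain_p: f32,
    gain_i: f32,
    gain_d: f32,
    min_output: f32,
    max_output: f32,
    integral: f32,
    prev_error: Option<f32>,
}

impl Pid {
    fn new(gain_p: f32, gain_i: f32, gain_d: f32, min_output: f32, max_output: f32) -> Self {
        Pid { gain_p, gain_i, gain_d, min_output, max_output, integral: 0.0, prev_error: None }
    }

    /// `error` is (measured - target) in degrees C, positive when too hot;
    /// `dt` is the time since the previous step, in seconds.
    fn step(&mut self, error: f32, dt: f32) -> f32 {
        let derivative = match self.prev_error {
            Some(prev) if dt > 0.0 => (error - prev) / dt,
            _ => 0.0,
        };
        self.prev_error = Some(error);

        let candidate_integral = self.integral + error * dt;
        let unclamped = self.gain_p * error
            + self.gain_i * candidate_integral
            + self.gain_d * derivative;
        let output = unclamped.clamp(self.min_output, self.max_output);

        // Conditional integration (simple anti-windup): only keep the new
        // integral if the output did not saturate.
        if output == unclamped {
            self.integral = candidate_integral;
        }
        output
    }
}
```

With `max_output: 10.0`, a CPU running 20 °C over target still only gets a 10% fan command; raising the cap to `100.0` lets the same error produce roughly 35%.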
}

// In general, see RFD 276 Detailed Thermal Loop Design for references.
// TODO: temperature_slew_deg_per_sec is made up.
Is this still true?
Absolutely!
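One reason a made-up slew rate deserves a real number: any worst-case temperature bound built from it scales linearly with the value. The sketch below shows one generic way such a parameter can be used, bounding how hot a component might have become since its last reading. The helper names and this usage are assumptions for illustration; we don't claim this is how the thermal task consumes `temperature_slew_deg_per_sec`.

```rust
/// Hypothetical helper: upper bound on a component's temperature given its
/// last reading (deg C), how old that reading is (seconds), and an assumed
/// maximum slew rate (deg C per second). Negative ages are treated as zero.
fn worst_case_temperature(last_reading_c: f32, age_secs: f32, slew_deg_per_sec: f32) -> f32 {
    last_reading_c + slew_deg_per_sec * age_secs.max(0.0)
}

/// True if, in the worst case, the component may already be at or above `limit_c`.
fn may_exceed(last_reading_c: f32, age_secs: f32, slew_deg_per_sec: f32, limit_c: f32) -> bool {
    worst_case_temperature(last_reading_c, age_secs, slew_deg_per_sec) >= limit_c
}
```

If the assumed slew rate is too low, a stale reading looks safer than it really is; if it is too high, the bound trips on readings that are actually fine.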
⛵
I'm opening this early to run it through CI and do preliminary self-review, but don't feel obliged to look at it yet