-
Notifications
You must be signed in to change notification settings - Fork 52
kernel_npem: add support for kernel NPEM driver #270
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
mtkaczyk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Stuart ,
Thank you for your contribution, Nice work!
After the review, I have a few points to clarify and some minor corrections.
It seems that the NPEM kernel still operates per slots, so my question is: did you use the implemented slot tests for the "NPEM kernel"? If not, please run them and verify that everything works correctly.
Also, please rebase the branch.
Thanks!
Add a new controller for the recently-added kernel NPEM/_DSM driver (in the kernel at drivers/pci/npem.c) that provides access to NPEM LEDs via sysfs. This allows access even if the kernel doesn't allow direct access to PCI config space (for NPEM), and also allows access to the PCI _DSM method of controlling PCIe SSD LEDs (PCI Firmware Specification, Revision 3.3, section 4.7 "_DSM Definitions for PCIe SSD Status LED"). The code was based on the npem controller code, but uses the sysfs interface rather than directly accessing the NPEM capability in PCI config space. Signed-off-by: Stuart Hayes <[email protected]>
A few cosmetic changes for kernel_npem. Signed-off-by: Stuart Hayes <[email protected]>
Include LED_CNTRL_TYPE_KERNEL_NPEM in led_controller_slot_support(). Signed-off-by: Stuart Hayes <[email protected]>
Add the newly-added KERNEL_NPEM controller type to two places in block.c that were missed in the initial commit. Signed-off-by: Stuart Hayes <[email protected]>
Update lib_unit_test.c to allow 7 controller types so it doesn't fail with KERNEL_NPEM. Signed-off-by: Stuart Hayes <[email protected]>
b645630 to
7088fe0
Compare
I hadn't run the tests. I found an issue when I ran the tests, which I fixed with another commit (7088fe0). |
Done, thanks for all the feedback! |
|
I manually tested this on a Dell PowerEdge R750. I'm seeing the following let states when listing the slots: The hardware only supports locate, and I can both set and read that state: Given that, I think it’s reasonable to report this as a known state (e.g., “OK”) rather than “UNKNOWN” when the hardware only implements a subset of the standard. We still need to be able to distinguish between:
On a related note, I’m a little unsure about exposing yet another controller type to the end user. From their perspective, this feels more like an implementation detail: the same hardware on an older kernel would use a different implementation but should ideally look the same from the tool’s interface. That said, I understand why you introduced a new controller type here; I just want to flag the end user/API difference. Notes
# ledctl failure=/dev/nvme16n1 ; echo $?;
/dev/shm/ledmon.conf: does not exist, using global config file
/etc/ledmon.conf: does not exist, using built-in defaults
0
# ledctl --set-slot --slot 0000:f4:00.0 --controller-type "Kernel NPEM" --state failure ; echo $?
/dev/shm/ledmon.conf: does not exist, using global config file
/etc/ledmon.conf: does not exist, using built-in defaults
10 |
| [LED_CNTRL_TYPE_AHCI] = "AHCI", | ||
| [LED_CNTRL_TYPE_NPEM] = "NPEM", | ||
| [LED_CNTRL_TYPE_AMD] = "AMD", | ||
| [LED_CNTRL_TYPE_KERNEL_NPEM] = "Kernel NPEM", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh one last thing. On Linux systems spaces are avoided, so please check it to KERNEL_NPEM
I see that there is "Dell SSD" and you did it for consistently but I still think that changing it to avoid "" every time is better
I would say that the right it to say that it belongs to hardware in this case, we would consider "HARDWARE_MANAGED" or something like that. I don't think that saying OK is correct because hardware may behave differently. Anyway, it is probably not big issue so it should be fine anyway.
This is reasonable. it should configurable by conf, I don't see a strong need to give users raw access to both interfaces directly. |
This is what we have today enum led_ibpi_pattern {
LED_IBPI_PATTERN_UNKNOWN = 0,
LED_IBPI_PATTERN_NONE = 1, /* used only to initialize ibpi_prev */
LED_IBPI_PATTERN_NORMAL = 2,
LED_IBPI_PATTERN_ONESHOT_NORMAL = 3,
LED_IBPI_PATTERN_DEGRADED = 4,
LED_IBPI_PATTERN_HOTSPARE = 5,
LED_IBPI_PATTERN_REBUILD = 6,
LED_IBPI_PATTERN_FAILED_ARRAY = 7,
LED_IBPI_PATTERN_PFA = 8,
LED_IBPI_PATTERN_FAILED_DRIVE = 9,
LED_IBPI_PATTERN_LOCATE = 10,
LED_IBPI_PATTERN_LOCATE_OFF = 11,
LED_IBPI_PATTERN_ADDED = 12,
LED_IBPI_PATTERN_REMOVED = 13,
LED_IBPI_PATTERN_LOCATE_AND_FAIL = 14,How about IMHO we need to let the API caller know that we have some level of hardware support, NOTE: We can certainly add an additional value here. |
For example, for NPEM it means that OK capability is ON. This would be also kind of inconsistently when we are presetting NORMAL when all LEDs are OFF. Unknown at least suggests that we don't know what is happening.
That would be perfect but probably we don't have such info in all cases. For NPEM, this information is visible as NPEM_ENABLE and it is turned on persistently when first NPEM request happens. There is no option to disable this and NPEM enable is not presented by driver: /* This bit is always required */
ctrl = inds | PCI_NPEM_CTRL_ENABLE;So, if all kernel interface NPEM capabilities are off before ledctl is used first time - it is hardware managed but if we turned NORMAL and disabled NORMAL, all are off but it is not hardware managed because NPEM_ENABLE is kept. But overall, If we have good argument for that we can change UNKNOWN to NORMAL, I'm not against. It cannot be handled perfectly because kernel driver interface does not provide functionality to determine it (NPEM_ENABLE). We have chosen to not present NPEM_ENABLE on purpose because for the regular user it provides unnecessary level of complexity and was not kernel ledclass interface friendly. |
Add a new controller for the recently-added kernel NPEM/_DSM driver (in the kernel at drivers/pci/npem.c) that provides access to NPEM LEDs via sysfs. This allows access even if the kernel doesn't allow direct access to PCI config space (for NPEM), and also allows access to the PCI _DSM method of controlling PCIe SSD LEDs (PCI Firmware Specification, Revision 3.3, section 4.7 "_DSM Definitions for PCIe SSD Status LED").
The code was based on the npem controller code, but uses the sysfs interface rather than directly accessing the NPEM capability in PCI config space.