
Report kernel module signing errors to prevent silent failures #496

Merged
1 commit merged into dell:master on Mar 5, 2025

Conversation

@KernelGhost (Contributor) commented Feb 22, 2025

This PR improves the kernel module signing process by ensuring failures are properly detected and reported. Previously, dkms suppressed the signing command output and did not check the exit status, leading to silent failures. This was a personal issue for me, as I had a malformed X.509 certificate and spent an hour debugging why dkms wasn't signing kernel modules. Now, if signing fails, a clear error is displayed along with the signing command output for easier debugging. These changes were tested successfully on Fedora Linux (6.12.13-200.fc41.x86_64).
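
For illustration, the shape of the change is roughly the following (a minimal sketch only, not the actual patch; the variable names and the sign-file invocation are assumptions based on how the kernel's scripts/sign-file is typically called):

    # Sketch: capture the signing tool's output and check its exit status
    # instead of discarding both. All variable names are illustrative.
    sign_output=$("$sign_file" "$hash_algo" "$key_file" "$cert_file" "$module" 2>&1)
    if [ $? -ne 0 ]; then
        echo "Warning: Failed to sign module '$module'!" >&2
        echo "$sign_output" >&2
    fi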

@scaronni (Collaborator) commented

Looks good, thanks. Can you also please add a test for it in run_test.sh?

@scaronni self-assigned this on Feb 22, 2025
@anbe42 (Collaborator) left a comment

Please update run_test.sh for the changed output of some signing tests (which now show error messages) so that the checks pass again.

@KernelGhost (Contributor, Author) commented

I’ve added a test case for the revised code in run_test.sh, but I’m not very familiar with this process and seem to have introduced a failure in the "Testing dkms.conf specifying a module twice" case. I've attached the output from the failing test below.

Given my lack of experience writing test cases for dkms and my limited familiarity with the codebase, I'd appreciate guidance from someone more acquainted with the project. If anyone could provide insights or a commit to fix the issue, that would be greatly appreciated.

Testing dkms.conf specifying a module twice
 Building and installing the test module
--- test_cmd_expected_output.log	2025-02-24 09:31:17.418198816 +1100
+++ test_cmd_output.log	2025-02-24 09:31:28.343245758 +1100
@@ -9,6 +9,8 @@
 Signing module /var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko
 strip: '/var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko': No such file
 Signing module /var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko
+Warning: Failed to sign module '/var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko'!
+sign-file: /var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko
 xz: /var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko: No such file or directory
 cp: cannot stat '/var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko': No such file or directory
 Cleaning build area...(bad exit status: 1)
Error: unexpected output from: dkms install -k 6.12.15-200.fc41.x86_64 -m dkms_duplicate_test -v 1.0

@KernelGhost requested a review from @anbe42 on February 25, 2025 at 08:15
@KernelGhost (Contributor, Author) commented Feb 25, 2025

With the latest change in commit 896e3e4, running sudo ./run_test.sh now returns *** All tests successful :). However, as I mentioned earlier, I'm not very familiar with this project, and I found modifying the testing script somewhat challenging. The testing script's logic was difficult to follow, and much of the code felt opaque to someone unfamiliar with it. Given this, I'd appreciate a review of my changes to the testing script by a more experienced maintainer.

Specifically, I’d like to ensure that I haven’t:

  1. Introduced unnecessarily hard-coded strings
  2. Violated any assumptions or structural integrity of the testing script
  3. Made the script less flexible or incompatible with other Linux distributions
  4. Contributed low-quality or sub-optimal code

Thank you in advance for your review!

@anbe42 (Collaborator) commented Mar 3, 2025

I'm working on a solution for the failing test ...
... and I think I just found more ways in which a bad dkms.conf with duplicate settings could make dkms explode ...

@KernelGhost (Contributor, Author) commented

I've noticed that the testing script relies heavily on matching specific output strings from DKMS, which seems to make it quite fragile and challenging to adapt when there are minor changes in the main program. A more robust approach might be to use specific exit status codes to represent different errors or error categories rather than relying on exact string comparisons. Alternatively, leveraging regular expressions and grep to match expected DKMS output could make the tests more resilient and maintainable.
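
As an illustration of that suggestion (a sketch only, not how run_test.sh currently works; the command line and the pattern are placeholders):

    # Sketch: rely on the exit status plus a pattern match instead of a
    # byte-for-byte comparison of the full output.
    output=$(dkms install -m dkms_test -v 1.0 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "dkms install failed with exit status $status" >&2
        exit 1
    fi
    if ! printf '%s\n' "$output" | grep -q 'Signing module .*\.ko'; then
        echo "expected signing message not found in output" >&2
        exit 1
    fi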

I also found the script somewhat opaque and difficult to modify. Despite making only a single-line change, I wasn't able to get all tests to pass, which suggests that the current approach might introduce unnecessary complexity. That said, I fully appreciate the effort that has gone into this testing framework. It's clear that it's designed to ensure a high-quality, battle-tested tool that works across various Linux distributions and configurations.

This is just a suggestion, and I completely understand if there are reasons for the current implementation that I’m not aware of. I’m not deeply involved in the project, so please feel free to disregard this if it doesn’t align with the overall goals. Just wanted to share some thoughts in case they might be helpful!

@anbe42 (Collaborator) left a comment

Please rebase; I changed the logic for the duplicate-module case, so the last commit will most likely no longer be necessary.

@anbe42 (Collaborator) commented Mar 3, 2025

I've noticed that the testing script relies heavily on matching specific output strings from DKMS, which seems to make it quite fragile and challenging to adapt when there are minor changes in the main program.

The problem you encountered is in strings outside the control of dkms: distribution-specific errors (or non-errors) with distribution-specific error messages from distribution-specific commands called by dkms ...

A more robust approach might be to use specific exit status codes to represent different errors or error categories rather than relying on exact string comparisons.

I'm at least happy that dkms no longer exits with 0 in case of an error ... which was not always the case. For a cleanup of the error codes used, see #463.

Alternatively, leveraging regular expressions and grep to match expected DKMS output could make the tests more resilient and maintainable.

If we change strings within dkms, the tests should immediately blow up and be trivial to fix. Ideally, every string dkms can emit is covered by a test ...
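
In spirit, the current style of check looks roughly like this (a simplified sketch, not the actual run_test.sh code; the command is a placeholder and the file names merely mirror the log shown earlier in this conversation):

    # Sketch: run the command, capture everything it prints, and diff it
    # against a file containing the exact expected output.
    dkms install -m dkms_test -v 1.0 > test_cmd_output.log 2>&1
    if ! diff -u test_cmd_expected_output.log test_cmd_output.log; then
        echo "Error: unexpected output from: dkms install -m dkms_test -v 1.0" >&2
        exit 1
    fi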

I also found the script somewhat opaque and difficult to modify. Despite making only a single-line change, I wasn't able to get all tests to pass, which suggests that the current approach might introduce unnecessary complexity. That said, I fully appreciate the effort that has gone into this testing framework. It's clear that it's designed to ensure a high-quality, battle-tested tool that works across various Linux distributions and configurations.

You are right, it's a non-trivial piece of code, but so far it's the best we have. It looks like we are approaching a scalability limit, though: right now a run takes about 20 minutes while still testing only a fraction of the functionality ... Parallelization won't be trivial, since we cannot modify /lib/modules or /var/lib/dkms, or run depmod, in parallel.

This is just a suggestion, and I completely understand if there are reasons for the current implementation that I’m not aware of. I’m not deeply involved in the project, so please feel free to disregard this if it doesn’t align with the overall goals. Just wanted to share some thoughts in case they might be helpful!

Trying to fix it yourself and providing feedback about the problems you encountered is very welcome and helpful!

And as a follow-up, trying to fix the test on the failing distributions made me notice and fix two more things:

  • Why do we need the distribution-specific error messages at all? We already checked that the module was built successfully; if it is going to disappear during later processing, just emit a warning and skip it.
  • zstd's default behavior is slightly different from gzip/bzip2/xz (it does not delete the input file after compression), which made the error case very distribution-specific (see the sketch below).
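
To illustrate the zstd point (the file name is hypothetical):

    # gzip/bzip2/xz delete the input file after compressing it by default;
    # zstd keeps it unless told to remove it.
    xz example.ko          # leaves only example.ko.xz
    zstd --rm example.ko   # --rm makes zstd delete example.ko, matching the others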

@KernelGhost (Contributor, Author) commented

@anbe42 I've rebased my branch on the upstream repository's master branch and resolved the conflict between my most recent commit and the changes you pushed to resolve the failing test. Additionally, I have squashed all my changes into a single commit and force-pushed the updated branch to keep the history clean.

@anbe42 (Collaborator) left a comment

You lost the commit adding the new test; I'll cherry-pick that from the previous version.

@anbe42 merged commit be42f2c into dell:master on Mar 5, 2025
26 checks passed