
Report kernel module signing errors to prevent silent failures #496

Merged
1 commit merged into dell:master on Mar 5, 2025

Conversation

@KernelGhost (Contributor) commented Feb 22, 2025

This PR improves the kernel module signing process by ensuring failures are properly detected and reported. Previously, dkms suppressed the signing command output and did not check the exit status, leading to silent failures. This was a personal issue for me, as I had a malformed X.509 certificate and spent an hour debugging why dkms wasn't signing kernel modules. Now, if signing fails, a clear error is displayed along with the signing command output for easier debugging. These changes were tested successfully on Fedora Linux (6.12.13-200.fc41.x86_64).
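
For illustration, the shape of the change is roughly the following (a minimal sketch only, not the actual patch; the variable names and the sign-file invocation are assumptions based on how the kernel's scripts/sign-file is typically called):

    # Sketch: capture the signing tool's output and check its exit status
    # instead of discarding both. All variable names are illustrative.
    sign_output=$("$sign_file" "$hash_algo" "$key_file" "$cert_file" "$module" 2>&1)
    if [ $? -ne 0 ]; then
        echo "Warning: Failed to sign module '$module'!" >&2
        echo "$sign_output" >&2
    fi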

@scaronni (Collaborator) commented

Looks good, thanks. Can you also please add a test for it in run_test.sh?

@scaronni self-assigned this on Feb 22, 2025
@anbe42 (Collaborator) left a comment

Please update run_test.sh for the changed output of some signing tests (which now show error messages) so that the checks pass again.

@KernelGhost (Contributor, Author) commented

I’ve added a test case for the revised code in run_test.sh, but I’m not very familiar with this process and seem to have introduced a failure in the "Testing dkms.conf specifying a module twice" case. I've attached the output from the failing test below.

Given my lack of experience writing test cases for dkms and my limited familiarity with the codebase, I'd appreciate guidance from someone more acquainted with the project. If anyone could provide insights or a commit to fix the issue, that would be greatly appreciated.

Testing dkms.conf specifying a module twice
 Building and installing the test module
--- test_cmd_expected_output.log	2025-02-24 09:31:17.418198816 +1100
+++ test_cmd_output.log	2025-02-24 09:31:28.343245758 +1100
@@ -9,6 +9,8 @@
 Signing module /var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko
 strip: '/var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko': No such file
 Signing module /var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko
+Warning: Failed to sign module '/var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko'!
+sign-file: /var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko
 xz: /var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko: No such file or directory
 cp: cannot stat '/var/lib/dkms/dkms_duplicate_test/1.0/build/dkms_duplicate_test.ko': No such file or directory
 Cleaning build area...(bad exit status: 1)
Error: unexpected output from: dkms install -k 6.12.15-200.fc41.x86_64 -m dkms_duplicate_test -v 1.0

@KernelGhost requested a review from @anbe42 on February 25, 2025 at 08:15
@KernelGhost (Contributor, Author) commented Feb 25, 2025

With the latest change in commit 896e3e4, running sudo ./run_test.sh now returns *** All tests successful :). However, as I mentioned earlier, I'm not very familiar with this project, and I found modifying the testing script somewhat challenging. The testing script's logic was difficult to follow, and much of the code felt opaque to someone unfamiliar with it. Given this, I'd appreciate a review of my changes to the testing script by a more experienced maintainer.

Specifically, I’d like to ensure that I haven’t:

  1. Introduced unnecessarily hard-coded strings
  2. Violated any assumptions or structural integrity of the testing script
  3. Made the script less flexible or incompatible with other Linux distributions
  4. Contributed low-quality or sub-optimal code

Thank you in advance for your review!

@anbe42 (Collaborator) commented Mar 3, 2025

I'm working on a solution for the failing test ...
... and I think I just found more ways in which a bad dkms.conf with duplicate settings could make dkms explode ...

@KernelGhost (Contributor, Author) commented

I've noticed that the testing script relies heavily on matching specific output strings from DKMS, which seems to make it quite fragile and challenging to adapt when there are minor changes in the main program. A more robust approach might be to use specific exit status codes to represent different errors or error categories rather than relying on exact string comparisons. Alternatively, leveraging regular expressions and grep to match expected DKMS output could make the tests more resilient and maintainable.
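
As an illustration of that suggestion (a sketch only, not how run_test.sh currently works; the command line and the pattern are placeholders):

    # Sketch: rely on the exit status plus a pattern match instead of a
    # byte-for-byte comparison of the full output.
    output=$(dkms install -m dkms_test -v 1.0 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "dkms install failed with exit status $status" >&2
        exit 1
    fi
    if ! printf '%s\n' "$output" | grep -q 'Signing module .*\.ko'; then
        echo "expected signing message not found in output" >&2
        exit 1
    fi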

I also found the script somewhat opaque and difficult to modify. Despite making only a single-line change, I wasn't able to get all tests to pass, which suggests that the current approach might introduce unnecessary complexity. That said, I fully appreciate the effort that has gone into this testing framework. It's clear that it's designed to ensure a high-quality, battle-tested tool that works across various Linux distributions and configurations.

This is just a suggestion, and I completely understand if there are reasons for the current implementation that I’m not aware of. I’m not deeply involved in the project, so please feel free to disregard this if it doesn’t align with the overall goals. Just wanted to share some thoughts in case they might be helpful!

@anbe42 (Collaborator) left a comment

Please rebase; I changed the logic for the duplicate-module case, so the last commit will most likely no longer be necessary.

@anbe42 (Collaborator) commented Mar 3, 2025

I've noticed that the testing script relies heavily on matching specific output strings from DKMS, which seems to make it quite fragile and challenging to adapt when there are minor changes in the main program.

The problem you encountered is in strings outside the control of dkms: distribution-specific errors (or non-errors) with distribution-specific error messages from distribution-specific commands called by dkms ...

A more robust approach might be to use specific exit status codes to represent different errors or error categories rather than relying on exact string comparisons.

I'm at least happy that dkms no longer exits with 0 in case of an error ... which was not always the case. For a cleanup of the error codes used, see #463.

Alternatively, leveraging regular expressions and grep to match expected DKMS output could make the tests more resilient and maintainable.

If we change strings within dkms, the tests should immediately blow up and be trivial to fix. Ideally, every string dkms can emit is covered by a test ...
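
In spirit, the current style of check looks roughly like this (a simplified sketch, not the actual run_test.sh code; the command is a placeholder and the file names merely mirror the log shown earlier in this conversation):

    # Sketch: run the command, capture everything it prints, and diff it
    # against a file containing the exact expected output.
    dkms install -m dkms_test -v 1.0 > test_cmd_output.log 2>&1
    if ! diff -u test_cmd_expected_output.log test_cmd_output.log; then
        echo "Error: unexpected output from: dkms install -m dkms_test -v 1.0" >&2
        exit 1
    fi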

I also found the script somewhat opaque and difficult to modify. Despite making only a single-line change, I wasn't able to get all tests to pass, which suggests that the current approach might introduce unnecessary complexity. That said, I fully appreciate the effort that has gone into this testing framework. It's clear that it's designed to ensure a high-quality, battle-tested tool that works across various Linux distributions and configurations.

You are right, it's a non-trivial piece of code, but so far it's the best we have. It looks like we are approaching a scalability limit, though: right now a run takes about 20 minutes while still testing only a fraction of the functionality ... Parallelization won't be trivial, since we cannot modify /lib/modules or /var/lib/dkms, or run depmod, in parallel.

This is just a suggestion, and I completely understand if there are reasons for the current implementation that I’m not aware of. I’m not deeply involved in the project, so please feel free to disregard this if it doesn’t align with the overall goals. Just wanted to share some thoughts in case they might be helpful!

Trying to fix it yourself and providing feedback about the problems you encountered is very welcome and helpful!

And as a follow-up, trying to fix the test on the failing distributions made me notice and fix two more things:

  • Why do we need the distribution-specific error messages at all? We already checked that the module was built successfully; if it is going to disappear during later processing, just emit a warning and skip it.
  • zstd's default behavior is slightly different from gzip/bzip2/xz (it does not delete the input file after compression), which made the error case very distribution-specific (see the sketch below).
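
To illustrate the zstd point (the file name is hypothetical):

    # gzip/bzip2/xz delete the input file after compressing it by default;
    # zstd keeps it unless told to remove it.
    xz example.ko          # leaves only example.ko.xz
    zstd --rm example.ko   # --rm makes zstd delete example.ko, matching the others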

@KernelGhost (Contributor, Author) commented

@anbe42 I've rebased my branch on the upstream repository's master branch and resolved the conflict between my most recent commit and the changes you pushed to resolve the failing test. Additionally, I have squashed all my changes into a single commit and force-pushed the updated branch to keep the history clean.

@anbe42 (Collaborator) left a comment

You lost the commit adding the new test; I'll cherry-pick that from the previous version.

@anbe42 merged commit be42f2c into dell:master on Mar 5, 2025
26 checks passed