Skip to content

Conversation

@sarroutbi
Copy link
Contributor

@sarroutbi sarroutbi commented Dec 1, 2025

The agent was incorrectly interpreting 403 Forbidden responses from the registrar as API version incompatibility errors. This caused two problems:

  1. The agent would try all enabled API versions, even though 403 indicates a permanent security rejection (e.g., TPM identity mismatch during re-registration)

  2. The agent would continue running after registration failure, making it appear operational when it was not properly registered

This issue became apparent with the Python keylime registrar security fix for CVE-2025-13609 (duplicate UUID vulnerability), which returns 403 Forbidden when an agent attempts to re-register with a different TPM identity.

The agent will now correctly fail fast when the registrar rejects registration for security reasons, allowing proper error detection in tests and production deployments.

Related: keylime/keylime#1820 (Python registrar UUID spoofing fix)

@sarroutbi sarroutbi force-pushed the 202512011610-fail-on-registration-failure branch 2 times, most recently from 300d1a3 to 052b1b5 Compare December 1, 2025 15:17
@codecov
Copy link

codecov bot commented Dec 1, 2025

Codecov Report

❌ Patch coverage is 20.83333% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.32%. Comparing base (a7cafe7) to head (eab1619).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
keylime/src/registrar_client.rs 29.41% 12 Missing ⚠️
keylime-agent/src/main.rs 0.00% 7 Missing ⚠️
Additional details and impacted files
Flag Coverage Δ
e2e-testsuite 57.32% <20.83%> (-0.11%) ⬇️
upstream-unit-tests 57.32% <20.83%> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
keylime/src/config/testing.rs 33.81% <ø> (ø)
keylime-agent/src/main.rs 15.28% <0.00%> (-0.29%) ⬇️
keylime/src/registrar_client.rs 75.71% <29.41%> (-3.00%) ⬇️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sarroutbi sarroutbi force-pushed the 202512011610-fail-on-registration-failure branch from 052b1b5 to 7ecb4ae Compare December 1, 2025 17:13
@sarroutbi sarroutbi marked this pull request as ready for review December 2, 2025 09:21
error!("Failed to register agent: Registration forbidden - {message}");
error!("This indicates a security rejection (403 Forbidden), likely due to TPM identity mismatch or UUID spoofing attempt.");
error!("The existing agent record must be deleted before re-registering with a different TPM.");
std::process::exit(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling process::exit directly in the business logic is problematic because it skips resource cleanup (e.g. drop handles will not run, leaving TPM handles, file locks, network connections and other resources unreleased), and it also makes it not testable. You should likely use a more idiomatic pattern of propagating the error via Result and have main() decide to exit.

Something like this (change main() signature to return Result):

  async fn main() -> Result<()> {
      // ... existing code ...

      match keylime::agent_registration::register_agent(aa, &mut ctx).await {
          Ok(()) => (),
          Err(Error::RegistrarClient(
              RegistrarClientError::RegistrationForbidden { message },
          )) => {
              error!("Failed to register agent: Registration forbidden - {message}");
              error!("This indicates a security rejection (403 Forbidden), likely due to TPM identity mismatch or UUID
  spoofing attempt.");
              error!("The existing agent record must be deleted before re-registering with a different TPM.");
              return Err(Error::RegistrarClient(
                  RegistrarClientError::RegistrationForbidden { message },
              ));
          }
          Err(e) => {
              error!("Failed to register agent: {e:?}");
              // Decide: should other registration errors also be fatal?
              return Err(e);
          }
      }

      // ... remaining of main ...

      Ok(())
  }

Note: The Rust runtime automatically converts Err from main() to a non-zero exit code, so the external behavior (process exits with error) remains the same while ensuring proper cleanup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the rest of the registration issues, I just wanted to be "as less aggressive as possible". I am not sure we should change Pull Mode agent behavior for registration issues not related to registry duplicate UUID issue.

@sarroutbi sarroutbi force-pushed the 202512011610-fail-on-registration-failure branch from 7ecb4ae to ede0932 Compare December 2, 2025 10:52
Copy link
Contributor

@sergio-correia sergio-correia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks good to me. In the future we can add some testing to cover the new RegistrationForbidden path.

@sarroutbi sarroutbi force-pushed the 202512011610-fail-on-registration-failure branch from ede0932 to 70a0b23 Compare December 2, 2025 11:38
@sarroutbi sarroutbi closed this Dec 2, 2025
@sarroutbi sarroutbi reopened this Dec 2, 2025
@sarroutbi sarroutbi force-pushed the 202512011610-fail-on-registration-failure branch from 70a0b23 to d638060 Compare December 2, 2025 11:40
@sarroutbi sarroutbi force-pushed the 202512011610-fail-on-registration-failure branch from d638060 to 2bc9d13 Compare December 2, 2025 12:21
The agent was incorrectly interpreting 403 Forbidden responses from
the registrar as API version incompatibility errors. This caused two
problems:

1. The agent would try all enabled API versions, even though 403
   indicates a permanent security rejection (e.g., TPM identity
   mismatch during re-registration)

2. The agent would continue running after registration failure,
   making it appear operational when it was not properly registered

This issue became apparent with the Python keylime registrar security
fix for CVE-2025-13609 (duplicate UUID vulnerability), which returns
403 Forbidden when an agent attempts to re-register with a different
TPM identity.

The agent will now correctly fail fast when the registrar rejects
registration for security reasons.

Related: keylime/keylime#1820 (Python registrar UUID spoofing fix)
Co-Authored-By: Claude <[email protected]>
Signed-off-by: Sergio Arroutbi <[email protected]>
@sarroutbi sarroutbi force-pushed the 202512011610-fail-on-registration-failure branch from 2bc9d13 to eab1619 Compare December 2, 2025 12:24
@sarroutbi sarroutbi merged commit ec3e412 into keylime:master Dec 2, 2025
13 of 14 checks passed
@sarroutbi sarroutbi deleted the 202512011610-fail-on-registration-failure branch December 2, 2025 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants