Skip to content

RUST-2235 Implement GSSAPI auth support for Linux and macOS #1413

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Jul 18, 2025

Conversation

mattChiaravalloti
Copy link
Collaborator

This PR implements the GSSAPI auth mechanism, as described by the spec. This took a bit longer than anticipated because manual end-to-end testing revealed that the initial implementation was not correct. Subsequent fixes made slow progress towards correctness. Ultimately, the implementation you see here has been manually tested on macOS (locally, against a local KDC and mongod) and an ubuntu2204-arm64-small evergreen host (against the LDAPTEST KDC/mongod). I still need to test this on Windows. I have a separate follow-up ticket to implement automated end-to-end testing, RUST-2236, which I will put up a PR for after confirming this works on Windows and actually writing end-to-end tests.

This implementation is guarded by a feature flag, and passes the existing spec tests when the feature flag is enabled.

@mattChiaravalloti mattChiaravalloti requested a review from a team as a code owner July 8, 2025 15:42
@mattChiaravalloti mattChiaravalloti requested a review from abr-egn July 8, 2025 15:42
@mattChiaravalloti
Copy link
Collaborator Author

@abr-egn I am not able to add other reviewers, so if you know who else could/should be tagged on this, can you add them? Thank you!

Copy link

codeowners-service-app bot commented Jul 9, 2025

Assigned rishitb-mongodb for team dbx-rust because abr-egn is out of office.
Assigned rishitb-mongodb for team dbx-rust because abr-egn is out of office.
Assigned rishitb-mongodb for team dbx-rust because abr-egn is out of office.
Assigned rishitb-mongodb for team dbx-rust because abr-egn is out of office.

@mattChiaravalloti mattChiaravalloti removed the request for review from rishitb-mongodb July 9, 2025 17:02
@mattChiaravalloti
Copy link
Collaborator Author

Ok I believe I've addressed the majority of failures now. The two remaining issues are:

  1. Compiling with the MSRV (1.81.0) fails because libgssapi-sys (a deeply nested indirect dependency) uses unsafe extern "C" blocks, which are disallowed in that version. I'm still trying to figure out how best to address this. Any ideas from reviewers are appreciated.
  2. It seems the Graviton tests are failing because libgssapi-sys cannot compile without setting an ENV variable. The relevant error seems to be: Unable to find libclang: "couldn't find any valid shared libraries matching: ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the LIBCLANG_PATH environment variable to a path where one of these files can be found (invalid: [])". So we should be able to set LIBCLANG_PATH for that distro and be good to go. I know nothing about this distro though. Perhaps @abr-egn can help out with this one when he's back!

Beyond addressing those two issues, the last thing I need to do is double check the spec for how to handle password being provided on Windows. It looks like there is supposed to be some special handling that happens only on Windows if the password is provided as a uri option. I'll look into that and make any relevant changes.

@mattChiaravalloti mattChiaravalloti removed the request for review from rishitb-mongodb July 10, 2025 20:09
pmeredit
pmeredit previously approved these changes Jul 11, 2025
Copy link
Contributor

@pmeredit pmeredit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once everything is working on every OS/tests are passing. Good design. Just fix the format!s because clippy will start complaining whenever they update.

Cargo.toml Outdated
@@ -58,6 +58,9 @@ gcp-oidc = ["dep:reqwest"]
# This can only be used with the tokio-runtime feature flag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abr-egn

Unrelated to this PR, really (though we could fix it here), should we update this comment since async-std support was dropped?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, yes, good catch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the update should be here. Do I just delete this comment on line 58? Or change it to say something else (if so, what should it say)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, yup, just delete the comment. The driver only supports tokio now so the there's nothing that needs to be documented about this feature.

}

if credential.source.as_deref().unwrap_or("$external") != "$external" {
return Err(ErrorKind::InvalidArgument {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, just noticed that the "must be $external" message for OIDC is slightly different than everything else, maybe we should update it 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated 👍

hostname: &str,
) -> Result<(Self, Vec<u8>)> {
let service_name: &str = properties.service_name.as_ref();
let mut service_principal = format!("{}/{}", service_name, hostname);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut service_principal = format!("{}/{}", service_name, hostname);
let mut service_principal = format!("{service_name}/{hostname}");

Let's just fix this before clippy starts whining about it the next time they update it :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated all 👍

let service_name: &str = properties.service_name.as_ref();
let mut service_principal = format!("{}/{}", service_name, hostname);
if let Some(service_realm) = properties.service_realm.as_ref() {
service_principal = format!("{}@{}", service_principal, service_realm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
service_principal = format!("{}@{}", service_principal, service_realm);
service_principal = format!("{service_principal}@{service_realm}");

ditto

// If no SERVICE_REALM was specified, use realm specified in the
// username. Note that `realm` starts with '@'.
let (_, realm) = user_principal.split_at(idx);
service_principal = format!("{}{}", service_principal, realm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
service_principal = format!("{}{}", service_principal, realm);
service_principal = format!("{service_principal}{realm}");

@pmeredit
Copy link
Contributor

@mattChiaravalloti all the failures atm seem to be from missing libclang, it may make sense to turn off the gssapi feature flag for those tests

@abr-egn abr-egn removed the request for review from rishitb-mongodb July 14, 2025 08:38
Copy link
Contributor

@abr-egn abr-egn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very reasonable!

The MSRV issue can be solved by just bumping up MSRV to 1.82 which introduced the unsafe extern block syntax; that's well past our six month rolling window.

I agree with Patrick that it makes sense to turn off the GSSAPI feature flag for the Graviton variants; I don't think there's enough value in the marginal test coverage increase to justify the engineer time to figure out if it can be made to work.

Cargo.toml Outdated
@@ -58,6 +58,9 @@ gcp-oidc = ["dep:reqwest"]
# This can only be used with the tokio-runtime feature flag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah, yes, good catch.

Cargo.toml Outdated
derive_more = "0.99.17"
derive-where = "1.2.7"
dns-lookup = { version = "2.0", optional = true }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use hickory-resolver which we already have as an optional dep?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't even realize we had that imported at first to be honest. Just updated to use it. The interface is a bit clunkier than dns-lookup but at the end of the day I think it's preferable to use the same crate. Thanks for pointing this out.

@mattChiaravalloti
Copy link
Collaborator Author

The MSRV issue can be solved by just bumping up MSRV to 1.82 which introduced the unsafe extern block syntax; that's well past our six month rolling window.

Thanks! I believe I updated 1.81 -> 1.82 in all appropriate places. Time to see if it works as expected 🤞

I agree with Patrick that it makes sense to turn off the GSSAPI feature flag for the Graviton variants; I don't think there's enough value in the marginal test coverage increase to justify the engineer time to figure out if it can be made to work.

Amazing, love when this is a solution. Given this, and the upcoming end-to-end GSSAPI auth testing coming up, I refactored gssapi-auth to be its own variant with its own tasks/functions, similar to all other auth mechanisms.

@mattChiaravalloti
Copy link
Collaborator Author

@abr-egn I discovered that cross-krb5 is not quite as "cross-platform" as I initially believed: it does not work on Windows the way we need it to. It either requires that the Kerberos credentials be the same as the currently logged-in Windows user (not an assumption we could/should make), or that the Windows client install MIT Kerberos. That wouldn't be the worst thing to require but also isn't an ideal user experience. I started trying to write an alternative implementation for Windows using the originally planned sspi crate.

I talked to Natacha and the rest of my team and we decided I'll split this up. This ticket/PR will just cover GSSAPI auth support for Linux and Mac. I'll also only cover end-to-end testing for those two in the next ticket I have in progress, RUST-2236. I'll file a follow-up ticket (in the epic) to implement and e2e test GSSAPI auth support on Windows. I'll pick that one up a bit later, though, since I've been struggling with the kerberos API for about 3 weeks now and need to step away from it a bit to catch my breath.

Just an FYI for why Windows support wasn't done as part of this PR!

@mattChiaravalloti mattChiaravalloti changed the title RUST-2235 Implement GSSAPI auth support RUST-2235 Implement GSSAPI auth support for Linux and macOS Jul 16, 2025
return Ok(hostname.to_string());
}

let resolver = TokioAsyncResolver::tokio_from_system_conf().map_err(|e| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should use ClientOptions::resolver_config, so that'll likely need to be added to HandshakerOptions and plumbed through in EstablisherOptions::from_client_options and Credential::authenticate_stream.

If you're feeling ambitious it's feeling like it's about time for an AuthenticateOptions struct that bundles together the non-Connection arguments to authenticate_stream and has a From<ClientOptions> impl so it can just get passed through Handshaker without having to update all the call sites each time new data like this is needed, but absolutely reasonable if you'd rather not include that in this PR 🙂

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, I only piped through resolver_config so that it makes it all the way down to canonicalize_hostname. Let me know if you think it all looks good since it involved touching a big number of files. I decided to go with [cfg(any(feature = "dns-lookup", feature = "gssapi-auth"))] in the places where it's possible that only gssapi-auth is enabled without explicitly enabling dns-lookup (and vice versa). Curious to see what you think about that decision in particular.

Also, I was considering how to best introduce the AuthenticateOptions idea you proposed, but I'm a bit confused what you're envisioning. You suggested having it implement From<ClientOptions>, but where would those ClientOptions come from? By the time we call authenticate_stream we're already 2 steps removed from ClientOptions: it goes ClientOptions --into--> HandshakerOptions --into--> Handshaker, and then Handshaker calls authenticate_stream. Are you suggesting that Handshaker be updated to contain an AuthenticateOptions struct field instead of all the non-connection individual fields it currently has? Would that same thing apply to HandshakerOptions? Maybe it would be best to leave this to someone on your team in a follow-up ticket since you guys have better expertise on the design practices you use in this repo!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, I only piped through resolver_config so that it makes it all the way down to canonicalize_hostname. Let me know if you think it all looks good since it involved touching a big number of files. I decided to go with [cfg(any(feature = "dns-lookup", feature = "gssapi-auth"))] in the places where it's possible that only gssapi-auth is enabled without explicitly enabling dns-lookup (and vice versa). Curious to see what you think about that decision in particular.

I think the right thing here is for the 'gssapi-auth' feature to require 'dns-lookup', not 'dep:hickory-resolver' (after all, it doesn't really care how the dns resolution happens). That would mean all the existing places that are conditioned on dns-lookup can continue to be conditioned on just that and don't need to change, and anything added for GSSAPI support specifically can be conditioned on just #[cfg(feature = "gssapi-auth")].

Also, I was considering how to best introduce the AuthenticateOptions idea you proposed, but I'm a bit confused what you're envisioning. You suggested having it implement From<ClientOptions>, but where would those ClientOptions come from? By the time we call authenticate_stream we're already 2 steps removed from ClientOptions: it goes ClientOptions --into--> HandshakerOptions --into--> Handshaker, and then Handshaker calls authenticate_stream. Are you suggesting that Handshaker be updated to contain an AuthenticateOptions struct field instead of all the non-connection individual fields it currently has? Would that same thing apply to HandshakerOptions? Maybe it would be best to leave this to someone on your team in a follow-up ticket since you guys have better expertise on the design practices you use in this repo!

Agreed, definitely better for one of us in a follow-up ticket, there's a lot of context here 🙂

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smart, yes that's a better solution. I'll work on that change and get it back in shortly. Thanks!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abr-egn It is all updated appropriately now 👍 Do you want me to file the follow-up ticket for AuthenticationOptions, or do you want to file it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed RUST-2247; it'll make a nice capper for the end of spec fest 😁

@mattChiaravalloti
Copy link
Collaborator Author

MacOS hosts are pretty busy now so those might take a while. They've been sporadically failing the last few commits. Hopefully those are just transient errors that don't need to be addressed as part of this PR.

What's more concerning was oidc-auth-test-gcp-latest's first run failing (I restarted it once for now). I double checked and nothing in this PR should have impacted that test. Not sure if that kind of failure is transient though. Hoping the second run sheds more light on this situation.

@abr-egn
Copy link
Contributor

abr-egn commented Jul 18, 2025

MacOS hosts are pretty busy now so those might take a while. They've been sporadically failing the last few commits. Hopefully those are just transient errors that don't need to be addressed as part of this PR.

What's more concerning was oidc-auth-test-gcp-latest's first run failing (I restarted it once for now). I double checked and nothing in this PR should have impacted that test. Not sure if that kind of failure is transient though. Hoping the second run sheds more light on this situation.

There's an uncommon but persistent flaky failure in that test (example) that we haven't been able to nail down, I suspect that's what you ran into.

Copy link
Contributor

@abr-egn abr-egn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@abr-egn abr-egn merged commit d750b6c into mongodb:main Jul 18, 2025
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants