Implements Token Federation for Python Driver #552

Open: wants to merge 39 commits into main (base branch)

madhav-db (Contributor) commented May 7, 2025

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • Other

Description

This PR adds token federation support to the Databricks SQL Python connector, which allows using external identity provider tokens (like GitHub Actions OIDC tokens) with Databricks SQL.

Key Changes

Core Implementation

  • Added token federation as a new auth type with supporting classes and methods
  • Implemented token exchange mechanism to convert external tokens to Databricks tokens

Code Architecture

  • Added DatabricksTokenFederationProvider class to handle token federation
  • Added Token class to manage token lifecycle and expiry
  • Implemented timezone-aware datetime handling so that expiry checks never compare naive and aware datetimes
  • Added IdP detection to support various identity providers (Azure AD, GitHub, Google, AWS)
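
A minimal sketch of how a token holder with timezone-aware expiry might look. The class below is a hypothetical stand-in, not the PR's actual `Token` class; its field names and the choice to treat naive datetimes as UTC are assumptions made for illustration.

```python
from datetime import datetime, timezone


class Token:
    """Hypothetical stand-in for a token with a timezone-aware expiry."""

    def __init__(self, access_token: str, token_type: str, expiry: datetime):
        self.access_token = access_token
        self.token_type = token_type
        # Assumption: a naive expiry is interpreted as UTC. Normalizing here
        # means later comparisons never mix naive and aware datetimes,
        # which would raise TypeError.
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        self.expiry = expiry

    def is_expired(self, now: datetime = None) -> bool:
        # "now" is injectable so the check can be tested deterministically.
        now = now or datetime.now(timezone.utc)
        return now >= self.expiry
```

Normalizing once in the constructor keeps every later comparison free of timezone special cases.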

API & Configuration

  • Added identity_federation_client_id parameter for token federation
  • Added OIDC discovery to locate the token endpoint
  • Added a fallback to the original external token when token exchange or refresh fails

Testing

  • Added unit tests with mocking for token federation components
  • Added end-to-end test for GitHub OIDC tokens

Future Improvements

  • Token federation should be refactored as a feature that works with different auth types instead of being an auth type itself
  • OAuthProvider should be integrated with token federation to allow token exchange for OAuth-acquired tokens
  • Use a standardized approach for feature flags across the codebase

This PR enables Databricks SQL connector users to leverage external identity providers for authentication, particularly useful in CI/CD environments like GitHub Actions.

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually (via CI/CD)
  • N/A

Related Tickets & Documents

Notes for reviewers:

Token Federation Flow

1. Client Initialization

  • User creates a SQL connection with auth_type="token-federation" and provides an external token
  • Can be initialized either with access_token or a custom credentials_provider
  • LIMITATION: Currently implemented as a standalone auth type, not a feature that can be combined with other auth types
  • TODO: Refactor to make token federation a feature that works with any auth type via a use_token_federation flag
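
For reviewers who want to try the flow, the following sketch shows how a connection might be opened with the new auth type. `auth_type="token-federation"` and `identity_federation_client_id` come from this PR's description; the environment variable names and both helper functions are assumptions for illustration only.

```python
import os


def build_connect_kwargs(env=os.environ):
    """Collect connection arguments; the env var names are hypothetical."""
    return {
        "server_hostname": env["DATABRICKS_SERVER_HOSTNAME"],
        "http_path": env["DATABRICKS_HTTP_PATH"],
        "auth_type": "token-federation",
        # The external IdP token (for example a GitHub Actions OIDC token).
        "access_token": env["EXTERNAL_OIDC_TOKEN"],
        # Optional; None when the variable is not set.
        "identity_federation_client_id": env.get("IDENTITY_FEDERATION_CLIENT_ID"),
    }


def open_connection(env=os.environ):
    # Imported lazily so the sketch can be read without the connector installed.
    from databricks import sql

    return sql.connect(**build_connect_kwargs(env))
```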

2. Auth Provider Selection

  • get_auth_provider() in auth.py detects token federation auth type
  • Creates a DatabricksTokenFederationProvider wrapper around the credential source
  • TODO: Remove TOKEN_FEDERATION as an auth_type while maintaining backward compatibility
  • TODO: Allow wrapping of existing providers (DatabricksOAuthProvider, AccessTokenAuthProvider, etc.)

3. Token Evaluation

  • When headers are requested, the federation provider:
    1. Gets external token from underlying provider
    2. Parses JWT claims to check token issuer
    3. Determines if token needs exchange based on issuer comparison
  • The token evaluation works with any valid JWT, regardless of how it was obtained
  • TODO: Design interfaces to wrap any auth provider with token federation capability
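
The evaluation step can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the function names are hypothetical, the claims are read without signature verification (only to inspect `iss`), and "same issuer" is approximated by comparing hostnames.

```python
import base64
import json
from urllib.parse import urlparse


def parse_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _host(url: str) -> str:
    # Accept both "https://host/path" and bare "host" forms.
    parsed = urlparse(url if "://" in url else "https://" + url)
    return (parsed.hostname or "").lower()


def needs_exchange(token: str, databricks_host: str) -> bool:
    """True when the token's issuer host differs from the target host."""
    issuer = parse_jwt_claims(token).get("iss", "")
    return _host(issuer) != _host(databricks_host)
```

Skipping signature verification is acceptable here only because the result decides whether to attempt an exchange; the server still validates whichever token is finally presented.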

4. Token Exchange

  • If token is from a different issuer than the target Databricks host:
    1. Uses OIDC discovery to find token endpoint
    2. Exchanges external token for Databricks token via token exchange protocol
    3. Stores exchanged token and original external token for future reference
  • If token is from same issuer, uses original token without exchange
  • This process works correctly for any token regardless of source
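
As a rough illustration of the exchange request, the sketch below builds an OAuth 2.0 Token Exchange (RFC 8693) form body and reads `token_endpoint` from an already-fetched discovery document. The helper names and the default `scope` value are assumptions, and no network call is made; the PR's actual request parameters may differ.

```python
from urllib.parse import urlencode

# Standard identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def token_endpoint_from_discovery(discovery_doc: dict) -> str:
    """Pick the token endpoint out of an OIDC discovery document."""
    endpoint = discovery_doc.get("token_endpoint")
    if not endpoint:
        raise ValueError("discovery document has no token_endpoint")
    return endpoint


def build_exchange_form(external_token: str, client_id: str = None, scope: str = "sql") -> dict:
    """Form fields for the exchange; the default scope is an assumption."""
    form = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": external_token,
        "subject_token_type": JWT_TOKEN_TYPE,
        "scope": scope,
    }
    if client_id:
        form["client_id"] = client_id
    return form


def encode_body(form: dict) -> str:
    # Sent as application/x-www-form-urlencoded in a POST to the token endpoint.
    return urlencode(form)
```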

5. Token Refresh

  • Before token expiry (controlled by TOKEN_REFRESH_BUFFER_SECONDS = 10):
    1. Requests fresh external token from underlying provider
    2. Exchanges this fresh token for a new Databricks token
    3. Updates stored tokens and headers
  • LIMITATION: Relies on underlying provider for fresh tokens
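
The refresh rule can be sketched like this. `TOKEN_REFRESH_BUFFER_SECONDS = 10` is taken from the description above; the `RefreshingTokenCache` class and its callables are hypothetical and only illustrate the "refresh shortly before expiry" logic.

```python
from datetime import datetime, timedelta, timezone

TOKEN_REFRESH_BUFFER_SECONDS = 10  # value quoted in the PR description


def should_refresh(expiry, now=None, buffer_seconds=TOKEN_REFRESH_BUFFER_SECONDS):
    """True once we are within buffer_seconds of the expiry time."""
    now = now or datetime.now(timezone.utc)
    return now >= expiry - timedelta(seconds=buffer_seconds)


class RefreshingTokenCache:
    """Hypothetical cache that re-exchanges a token shortly before expiry."""

    def __init__(self, get_external_token, exchange):
        self._get_external_token = get_external_token
        # exchange: external token -> (access_token, aware expiry datetime)
        self._exchange = exchange
        self._access_token = None
        self._expiry = None

    def headers(self, now=None):
        if self._access_token is None or should_refresh(self._expiry, now):
            # Always ask the underlying provider for a fresh external token.
            external = self._get_external_token()
            self._access_token, self._expiry = self._exchange(external)
        return {"Authorization": f"Bearer {self._access_token}"}
```

As the limitation above notes, this only helps if the underlying provider can actually return a fresh external token; a static token would simply be exchanged again.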

6. Fallback Handling

  • If token exchange or refresh fails, falls back to original external token
  • Logs appropriate warnings/errors
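
A minimal sketch of the fallback behavior, assuming the exchange is passed in as a callable. The function name is hypothetical, and the broad `except` is a simplification; a real implementation would likely catch narrower error types.

```python
import logging

logger = logging.getLogger(__name__)


def exchange_with_fallback(external_token, exchange):
    """Return the exchanged token, or the original token if exchange fails."""
    try:
        return exchange(external_token)
    except Exception as exc:
        # Log the failure reason only; never log the token itself.
        logger.warning("Token exchange failed, using external token as-is: %s", exc)
        return external_token
```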

Future Provider Integration Plan

To properly integrate token federation with all auth providers in authenticators.py:

  1. Decorator Pattern Implementation:

    • Create a wrapper class that can decorate any AuthProvider with token federation capabilities
    • Allow wrapping of DatabricksOAuthProvider, AccessTokenAuthProvider, etc.
  2. Configuration Changes:

    • Add a use_token_federation boolean flag to connection parameters
    • Modify get_auth_provider() to apply token federation wrapper when flag is set
  3. Provider Interface Enhancement:

    • Update CredentialsProvider interface to expose necessary token information
    • Ensure DatabricksOAuthProvider properly implements this interface for token access
  4. Backward Compatibility:

    • Maintain support for existing auth_type="token-federation" during transition
    • Add deprecation warnings and migration guidance
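
One possible shape for the decorator approach is sketched below. Everything here is hypothetical: `AuthProvider` is a local stand-in for the connector's provider interface, and `TokenFederationWrapper` and `wrap_if_enabled` are illustrative names, not part of this PR.

```python
from abc import ABC, abstractmethod


class AuthProvider(ABC):
    """Local stand-in for the connector's auth provider interface."""

    @abstractmethod
    def add_headers(self, request_headers: dict) -> None:
        ...


class TokenFederationWrapper(AuthProvider):
    """Decorates any provider, exchanging its Bearer token before use."""

    def __init__(self, inner: AuthProvider, exchange):
        self._inner = inner
        self._exchange = exchange  # callable: external token -> Databricks token

    def add_headers(self, request_headers: dict) -> None:
        # Let the wrapped provider write into a scratch dict first.
        scratch = {}
        self._inner.add_headers(scratch)
        request_headers.update(scratch)
        scheme, _, token = scratch.get("Authorization", "").partition(" ")
        if scheme == "Bearer" and token:
            request_headers["Authorization"] = f"Bearer {self._exchange(token)}"


def wrap_if_enabled(inner, use_token_federation=False, exchange=None):
    """Apply the wrapper only when the (proposed) flag is set."""
    if not use_token_federation:
        return inner
    if exchange is None:
        raise ValueError("an exchange callable is required when federation is enabled")
    return TokenFederationWrapper(inner, exchange)
```

Because the wrapper only relies on `add_headers`, it would apply equally to OAuth-based and static-token providers, which is the point of the proposed refactor.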

The core token exchange works for any token, but the current architecture restricts token federation to a separate auth type. The main improvement needed is architectural: token federation should work alongside other auth types (including OAuth-based ones) while maintaining backward compatibility.

github-actions bot commented May 7, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

…nd enhance unit tests for accurate expiry verification
@madhav-db madhav-db deployed to azure-prod May 12, 2025 06:10 — with GitHub Actions Active
@madhav-db madhav-db requested a review from jprakash-db May 12, 2025 06:11
2 participants