Skip to content

Add OAuth2 refresh_token support#705

Open
talatuyarer wants to merge 6 commits intoduckdb:mainfrom
talatuyarer:oauth-refresh-token
Open

Add OAuth2 refresh_token support#705
talatuyarer wants to merge 6 commits intoduckdb:mainfrom
talatuyarer:oauth-refresh-token

Conversation

@talatuyarer
Copy link
Contributor

OAuth2's refresh token flow (https://datatracker.ietf.org/doc/html/rfc6749#section-6) is not supported yet in iceberg extension.

This PR adds RFC 6749 refresh token grant support to DuckDB’s Iceberg extension OAuth2 flow. The Iceberg extension previously supported acquiring access tokens but did not implement the refresh path, causing long-running sessions to fail once the access token expired.

Refresh token rotation is tracked in-memory only. Rotated refresh tokens are not persisted back into DuckDB secrets; after DETACH/re-ATTACH, the original refresh token is used.

@talatuyarer
Copy link
Contributor Author

@Tmonster Do you want to review this ?

…T body credentials for Google compatibility. Add tests to validate behavior for refresh_token and client_credentials grants.
@talatuyarer
Copy link
Contributor Author

You can also give a try by using this commands

Extract Your Credentials from gcloud ADC file:

CLIENT_ID=$(cat ~/.config/gcloud/application_default_credentials.json | jq -r '.client_id')
CLIENT_SECRET=$(cat ~/.config/gcloud/application_default_credentials.json | jq -r '.client_secret')
REFRESH_TOKEN=$(cat ~/.config/gcloud/application_default_credentials.json | jq -r '.refresh_token')
LOAD iceberg;

-- Create secret with refresh_token for automatic refresh
CREATE SECRET biglake_auto_refresh (
    TYPE ICEBERG,
    CLIENT_ID '<your-client-id>',
    CLIENT_SECRET '<your-client-secret>',
    REFRESH_TOKEN '<your-refresh-token>',
    OAUTH2_SERVER_URI 'https://oauth2.googleapis.com/token',
    EXTRA_HTTP_HEADERS MAP {'x-goog-user-project': '<your-project-id>'}
);

-- Attach to BigLake
ATTACH 'gs://biglake-public-nyc-taxi-iceberg' AS biglake_public (
    TYPE ICEBERG,
    SECRET biglake_auto_refresh,
    ENDPOINT 'https://biglake.googleapis.com/iceberg/v1/restcatalog'
);

-- Query data - token will auto-refresh when needed!
SELECT COUNT(*) FROM biglake_public.public_data.nyc_taxicab;

@talatuyarer
Copy link
Contributor Author

Just additional context. I had to change Original code (line 193):

doc = std::unique_ptr<yyjson_doc, YyjsonDocDeleter>(ICUtils::api_result_to_doc(response->body));

The old code was using an Iceberg catalog API parser for OAuth2 token responses that matches Iceberg catalog APIs (e.g. { "error": { "message": "..." } }), but it does not match OAuth2, which violates RFC 6749. The new code parses OAuth2 responses according to the RFC 6749 specification.

OAuth2 token endpoints return successful responses like:

{ "access_token": "...", "token_type": "Bearer", "expires_in": 3600 }

And per RFC 6749 §5.2, OAuth2 errors look like:

{ "error": "invalid_scope", "error_description": "..." }

Notice error is a string, not an object. So when api_result_to_doc does yyjson_obj_get(error, "message"), it returns NULL, and we end up throwing "No message available" which is super misleading and hides the real OAuth2 error.

Switching to:

auto *doc = yyjson_read(response->body.c_str(), response->body.size(), 0);

means we’re just parsing JSON without imposing Iceberg catalog semantics on it. Then OAuthTokenResponse::FromJSON can validate the token response in the correct OAuth2-aware way and surface meaningful errors.

Copy link
Collaborator

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
I will try and take a closer look tomorrow, today my schedule is pretty busy.

@talatuyarer talatuyarer requested a review from Tmonster February 12, 2026 17:46
Copy link
Collaborator

@Tmonster Tmonster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! a couple of comments.
Also, do you know of any cloud based catalogs I could test this against? I'm guessing BigLake supports this?

post_data);
} catch (HTTPException &http_ex) {
// APIUtils::Request throws HTTPException for non-2xx responses.
throw InvalidConfigurationException("Could not get token from %s: HTTP error: %s", uri, http_ex.what());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APIUtils actually doesn't throw for non-2XX responses. 400 responses can be parsed via the response object so you can inspect codes and error messages directly

// Compute expiry time if we have both token and expires_in
if (!token.IsNull() && expires_in_seconds > 0) {
// Pass empty string for refresh_token to preserve the one we just set
result->UpdateTokenState(token.ToString(), expires_in_seconds, "");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accoring to src/storage/authorization/oauth2.cpp:630 we should have a lock when calling UpdateTokenState. Seems Seems like we don't have it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No lock needed here. we're still constructing this object and haven't shared it with other threads yet. UpdateTokenState is safe to call during construction. Once this object is returned and shared, UpdateTokenState will only be called from RefreshAccessTokenUnlocked which requires holding token_mutex.


string OAuth2Authorization::GetToken(ClientContext &context, const string &grant_type, const string &uri,
const string &client_id, const string &client_secret, const string &scope) {
namespace {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you merge with the other namespace above?

ATTACH '' AS quick_expiry_lake (
TYPE ICEBERG,
CLIENT_ID 'client1',
CLIENT_SECRET 's3cr3t',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also pass an expires_in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expires_in comes from the OAuth2 server response. No need to add expires_in her. it's automatically retrieved from the OAuth2 server response when using client credentials!

@talatuyarer
Copy link
Contributor Author

Hi @Tmonster Thank you for reviewing.

Yes Google Could implement this RFC standard. You can use Biglake to test it.

I gave instruction here. If you dont have credentail file

gcloud auth application-default login authenticates you with Google Cloud using your user account and creates a local credentials file.

It should be under here:

Linux, macOS: ~/.config/gcloud/application_default_credentials.json
Windows: %APPDATA%\gcloud\application_default_credentials.json

#705 (comment)

@Tmonster
Copy link
Collaborator

Ah yes, thanks. Will do that then. Also working on fixing the Polaris tests, will post here when they are fixed

…g a lock for token state checks and refresh operations.

Addresed comments on PR
@talatuyarer talatuyarer requested a review from Tmonster February 14, 2026 21:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants