Core: REST Catalog OAuth2 Support Refresh Token Flow #12362

Open

nika-qubit

Supported refreshing token using the refresh token flow in addition to token exchange flow for OAuth2.

Closes #12196

@nika-qubit
Author

Rebased to the latest main HEAD at this moment.

The added test can be run with ./gradlew :iceberg-core:test --tests org.apache.iceberg.rest.TestRESTCatalog.testCatalogTokenRefreshByRefreshTokenFlow

Could someone please review it? Thanks!

@nika-qubit nika-qubit changed the title from "Core: REST Catalog" to "Core: REST Catalog OAuth2 Support Refresh Token Flow" on Feb 20, 2025
// Catalog headers are used to send requests to the Catalog REST endpoint.
Map<String, String> catalogHeaders =
ImmutableMap.of("Authorization", "Bearer client-credentials-token:sub=catalog");
// Basic headers are used to send requests to the oauth2 server enepoint.
Author

Had a typo here, will fix it with other changes based on comments.

@adutra
Contributor

adutra commented Feb 21, 2025

Hi @nika-qubit, thank you so much for starting work on this; I've been wanting to tackle it myself for a while.

However I am not sure I agree with the direction taken in this PR:

First off, refresh tokens should not be exposed as a client configuration option, for a few reasons:

  1. Unlike access tokens, refresh tokens are intended for the client acting on behalf of the resource owner (the user), not for the users themselves. IOW, they should stay internal to the communication between the client and the authorization server.
  2. Moreover, refresh tokens are typically long-lived, and thus have a higher risk of being compromised.
  3. And finally, refresh tokens are generally revoked by the IDP when they are first used, and a new refresh token is issued and returned to the client. This is part of the security recommendations from RFC 9700. So it doesn't really make sense to provide a refresh token via configuration, knowing that it would probably be revoked the first time it is used.
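
To illustrate point 3, here is a minimal sketch of refresh token rotation. It is not code from this PR or from any IDP: RotatingTokenServer is a hypothetical in-memory stand-in for an authorization server, and the token format is made up. It only shows why a refresh token supplied once through configuration stops working after its first use.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/** Hypothetical in-memory stand-in for an authorization server that rotates refresh tokens. */
final class RotatingTokenServer {
  // Refresh tokens that have been issued and not yet redeemed.
  private final Set<String> activeRefreshTokens = new HashSet<>();

  /** Issues a fresh refresh token (made-up format, for illustration only). */
  String issueRefreshToken() {
    String token = "refresh-" + UUID.randomUUID();
    activeRefreshTokens.add(token);
    return token;
  }

  /** Redeems a refresh token: the old one is revoked and a new one is returned. */
  String refresh(String refreshToken) {
    if (!activeRefreshTokens.remove(refreshToken)) {
      throw new IllegalStateException("refresh token revoked or unknown");
    }
    return issueRefreshToken();
  }

  public static void main(String[] args) {
    RotatingTokenServer server = new RotatingTokenServer();
    String configured = server.issueRefreshToken();
    String rotated = server.refresh(configured); // first use succeeds and rotates the token
    System.out.println("rotated token differs: " + !rotated.equals(configured));
    try {
      server.refresh(configured); // second use of the configured token is rejected
    } catch (IllegalStateException e) {
      System.out.println("reuse rejected: " + e.getMessage());
    }
  }
}
```

With rotation in place, the client has to persist the newest refresh token itself, which a static configuration value cannot do.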

Secondly, I would argue that since the primary grant type is always client_credentials, we shouldn't have to use refresh tokens at all.

Let me expand on that:

The refresh_token grant type defined in RFC 6749 is meant for grant types that involve human interaction, such as authorization_code. This is to allow the client to request another access token without asking the user to login again, which would be really annoying. But in the client_credentials grant, there is no user interaction; it is just as simple for the client to send another client_credentials request and fetch another token.

That's why RFC 6749 explicitly states that "a refresh token SHOULD NOT be included" in responses to the client_credentials flow.

Auth0, for instance, does not issue refresh tokens for the client_credentials grant type. Keycloak can be configured to do so, but does not by default.
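
As a rough illustration of how little separates the two requests, the sketch below builds the form-encoded bodies for both grant types using the parameter names from RFC 6749. The GrantRequests class and its method names are hypothetical, the scope value is a placeholder, and client authentication (for example an HTTP Basic header) is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that builds token request bodies; not part of Iceberg. */
final class GrantRequests {

  /** Body for the client_credentials grant (RFC 6749, section 4.4.2). */
  static Map<String, String> clientCredentials(String scope) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("grant_type", "client_credentials");
    params.put("scope", scope);
    return params;
  }

  /** Body for the refresh_token grant (RFC 6749, section 6). */
  static Map<String, String> refreshToken(String refreshToken, String scope) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("grant_type", "refresh_token");
    params.put("refresh_token", refreshToken);
    params.put("scope", scope);
    return params;
  }

  /** Encodes parameters as application/x-www-form-urlencoded, preserving insertion order. */
  static String formEncode(Map<String, String> params) {
    return params.entrySet().stream()
        .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
            + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
        .collect(Collectors.joining("&"));
  }

  public static void main(String[] args) {
    System.out.println(formEncode(clientCredentials("catalog")));
    System.out.println(formEncode(refreshToken("abc", "catalog")));
  }
}
```

The client_credentials body needs no state from a previous response, which is why repeating it is as simple as refreshing.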

So I would suggest instead:

  • if token is provided, then it should be used as is and never refreshed (as is the case today)
  • if credential is provided, then I would introduce a flag to determine how to refresh an expired access token automatically:
    • if the flag value is legacy, then use the current token exchange flow to refresh the token;
    • if the flag value is strict, then discard the token and use client_credentials to fetch a new one.
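
The suggestion above could be sketched roughly as follows. Everything here is hypothetical: RefreshPlanner, RefreshMode and Action are illustrative names, not existing Iceberg classes or properties, and the sketch assumes the caller already knows whether the credential option was provided.

```java
/** Hypothetical sketch of the proposed refresh decision; not an existing Iceberg API. */
final class RefreshPlanner {

  /** Proposed flag values. */
  enum RefreshMode { LEGACY, STRICT }

  /** What the client does when its access token expires. */
  enum Action { NO_REFRESH, TOKEN_EXCHANGE, CLIENT_CREDENTIALS }

  static Action onTokenExpiry(boolean credentialProvided, RefreshMode mode) {
    if (!credentialProvided) {
      // Only a static token was configured: use it as is and never refresh it.
      return Action.NO_REFRESH;
    }
    if (mode == RefreshMode.LEGACY) {
      // Current behavior: refresh through the token exchange flow.
      return Action.TOKEN_EXCHANGE;
    }
    // STRICT: discard the expired token and fetch a new one with client_credentials.
    return Action.CLIENT_CREDENTIALS;
  }

  public static void main(String[] args) {
    System.out.println(onTokenExpiry(false, RefreshMode.STRICT));
    System.out.println(onTokenExpiry(true, RefreshMode.LEGACY));
    System.out.println(onTokenExpiry(true, RefreshMode.STRICT));
  }
}
```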

Does that make sense?

cc @danielcweeks @nastra

@nika-qubit
Author

@adutra Thanks for the detailed explanation.

A little background of why I was adding the refresh token flow:

I'm trying to make the auth package work with Google OAuth 2.0. Since the client is not the (cloud) resource owner initially, user consent is always needed at the very beginning.

  1. The user consent gets you an authorization code. (This is the only place where a user interaction is needed).
  2. Then your (default) application can exchange the authorization code for an access token through the authorization_code grant type (this additionally requires the client id and secret in the request, and returns a long-lived refresh token along with the access token).
  3. Then your application can either continuously use the previous step or a refresh token flow to get new access tokens.
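
For reference, steps 1 and 2 can be sketched with the standard RFC 6749 parameter names. This is only an illustration: AuthCodeFlowSketch is a hypothetical class, and the endpoint, client id, secret and redirect URI in the demo are placeholder values rather than real Google or Iceberg settings.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of the first two steps of the authorization code flow. */
final class AuthCodeFlowSketch {

  /** Form/query encoding that preserves parameter order. */
  static String encode(Map<String, String> params) {
    return params.entrySet().stream()
        .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
            + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
        .collect(Collectors.joining("&"));
  }

  /** Step 1: the URL the user visits to give consent (RFC 6749, section 4.1.1). */
  static String authorizationUrl(
      String authorizeEndpoint, String clientId, String redirectUri, String scope, String state) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("response_type", "code");
    params.put("client_id", clientId);
    params.put("redirect_uri", redirectUri);
    params.put("scope", scope);
    params.put("state", state);
    return authorizeEndpoint + "?" + encode(params);
  }

  /** Step 2: token request body exchanging the code for tokens (RFC 6749, section 4.1.3). */
  static String codeExchangeBody(
      String code, String redirectUri, String clientId, String clientSecret) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("grant_type", "authorization_code");
    params.put("code", code);
    params.put("redirect_uri", redirectUri);
    params.put("client_id", clientId);
    params.put("client_secret", clientSecret);
    return encode(params);
  }

  public static void main(String[] args) {
    String redirect = "http://localhost:8080/callback"; // placeholder redirect URI
    System.out.println(authorizationUrl(
        "https://auth.example.com/authorize", "my-client", redirect, "catalog", "xyz"));
    System.out.println(codeExchangeBody("abc", redirect, "my-client", "s3cret"));
  }
}
```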

To my knowledge, Google OAuth 2.0 does not support the client_credentials grant type (nor the token_exchange grant type, though some other auth API endpoint may).

From my perspective, both point 2 ("authorization code flow") and point 3 ("refresh token flow") send requests to the auth server with the client credentials plus their own additional code or token, without user interaction.

Do you think I should start looking for Google's auth endpoints that do support the "token_exchange" grant type?

A side question is that an access token usually expires in an hour. Does it mean if we set the option, the REST client can only work for that amount of time?

@adutra
Contributor

adutra commented Feb 22, 2025

@nika-qubit thanks for the detailed context!

Indeed, Google OAuth 2.0 uses the so-called "Authorization Code" flow for authentication. Iceberg REST unfortunately does not have support for this flow.

I understand and share your pain here, but imho the right way to proceed is to provide Iceberg REST with support for this flow natively.

Just implementing support for refreshing tokens, as you did in this PR, imho is not going to solve this problem since you still need to pass the refresh token manually. Instead, we need the Iceberg REST client to handle that transparently, which would include redirecting the user to their browser for authentication whenever required.

But this is a complex flow. The good news is, I've been working on this for quite a while. The plan would be as follows:

  1. Get the AuthManager API merged (Auth Manager API part 6: API enablement #12197) and released, possibly in Iceberg 1.9.0.
  2. Provide support for the Authorization Code flow as an implementation of that API.

So, how urgent is this for you? Can you wait a few weeks more?

@adutra
Contributor

adutra commented Feb 22, 2025

A side question is that an access token usually expires in an hour. Does it mean if we set the option, the REST client can only work for that amount of time?

If you set the token option and the access token expires in one hour, then yes, the client can only work for one hour. That's why using the credential option instead is better – but this option for now does not support the Authorization Code flow.
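
A minimal sketch of the two configurations, written as plain property maps. The token and credential keys are the options discussed above; the catalog URI and the values are placeholders, RestCatalogAuthOptions is a hypothetical class name, and the client-id:client-secret form of the credential value is my understanding of the expected format.

```java
import java.util.Map;

/** Hypothetical holder for the two ways of configuring REST catalog authentication. */
final class RestCatalogAuthOptions {

  /** Static access token: the client works only until this token expires. */
  static Map<String, String> withStaticToken() {
    return Map.of(
        "uri", "https://catalog.example.com", // placeholder catalog endpoint
        "token", "<access-token>");
  }

  /** Client credentials: the client can obtain new access tokens on its own. */
  static Map<String, String> withCredential() {
    return Map.of(
        "uri", "https://catalog.example.com", // placeholder catalog endpoint
        "credential", "<client-id>:<client-secret>");
  }

  public static void main(String[] args) {
    System.out.println(withStaticToken().keySet());
    System.out.println(withCredential().keySet());
  }
}
```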

Development

Successfully merging this pull request may close these issues.

[REST Catalog] OAuth 2 grant type "refresh_token" not implemented
2 participants