Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Spec: Add ConnectionConfigInfo to ExternalCatalog #1026

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dennishuo
Copy link
Contributor

Remove the currently unused remoteUrl field from the top-level ExternalCatalog into the ConnectionConfigInfo as remoteUri instead for better consistency; remote catalogs in the future may be defined by arbitrary URIs that are not, for example, http(s) URLs.

This is just the spec definition for now, so it's not yet wired into the internal entity layer or persistence objects.

Allow extensibility of different connection types in the future even if we start with only an ICEBERG_REST type.

Similarly, provide extensibility for different authn mechanisms to use with the connection.

Remove the currently unused remoteUrl field from the top-level ExternalCatalog
into the ConnectionConfigInfo as remoteUri instead for better consistency;
remote catalogs in the future may be defined by arbitrary URIs that are not,
for example, http(s) URLs.

This is just the spec definition for now, so it's not yet wired into the
internal entity layer or persistence objects.

Allow extensibility of different connection types in the future even if we
start with only an ICEBERG_REST type.

Similarly, provide extensibility for different authn mechanisms to use
with the connection.
connectionConfigInfo:
$ref: "#/components/schemas/ConnectionConfigInfo"

ConnectionConfigInfo:
Copy link
Member

@XJDKC XJDKC Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we allow updating the ConnectionConfigInfo on catalog entity? See UpdateCatalogRequest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, yes, I'll add it to the update as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! For updating connection configs, I think we should only allow modifying the secret info so that users can rotate the secrets. For other connection configs, if we allow customers to modify them, customers may point to another remote catalog.

@@ -850,9 +850,92 @@ components:
- $ref: "#/components/schemas/Catalog"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For external catalog, is the storage config required if Polaris just passes through the response sent from remote catalog and not generates the subscoped creds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, StorageConfig might become more optional. However, this is actually an important design point about whether we're willing to return the remote catalog's subscoped creds.

At least some of the known use cases explicitly want Polaris to be the one responsible for access control and credential vending, while the remote catalog does not perform credential vending. So we want the ability for Polaris to mix-in vended credentials.

Returning the remote catalog's vended credentials will probably need to be configurable. For most real use cases we'd probably want some formal protocol for declaring the "on-behalf-of delegation chain"; e.g. the ConnectionConfig contains a "system identity" but we'd want a way to declare the identity of the calling Principal in the request to the remote catalog.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that we should provide a configuration for customers!

The vended credentials send back from the remote catalog represents the system identity, it's very powerful and we need some sorts of request-scoped identity. This could be achieved by passing a http header to polaris.

type: string
description: URL to the remote catalog API
connectionConfigInfo:
$ref: "#/components/schemas/ConnectionConfigInfo"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make ConnectionConfigInfo a required property for external catalogs, just like StorageConfigInfo is for internal catalogs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For current backwards-compatibility, I was actually thinking of it like:

  1. if (ExternalCatalog.getConnectionConfigInfo() == null) { internalSubtype = STATIC_FACADE; }
  2. else { internalSubtype = PASSTHROUGH_FACADE; }

Admittedly that might be kind of a hack to only use the presence of connection config to determine static vs passthrough facade, but conceptually, it makes sense that an ExternalCatalog that can't dial out to a remote catalog fundamentally must behave as a "static facade" where content is "pushed" into the ExternalCatalog.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or should we introduce a new catalog type like federated?

enum:
- ICEBERG_REST
description: The type of remote catalog service represented by this connection
remoteUri:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remote for a URI is superfluous, IMHO


OauthRestAuthenticationInfo:
type: object
description: OAuth authentication based on client_id/client_secret
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's just about the client credentials flow, maybe we can rename the type to be something like OAuthClientCredentialsParameters?.. Iceberg REST Servers do not have to be restricted to client credentials, other flows may be available (e.g. delegation), which are still OAuth2, but will require different parameters.

type: string
description: oauth scopes to specify when exchanging for a short-lived access token

BearerRestAuthenticationInfo:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bearer auth is not specific to Iceberg REST. How about BearerAuthenticationParameters?

Copy link
Member

@XJDKC XJDKC Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed! We might need to create an authentication config specifically for ConnectionConfig. For other connections, like IcebergHiveConnectionConfig, we can still reuse the existing authentication parameters.

Also, using Parameters as part of the name is a great approach. Since we’ll be generating some classes based on the spec, suffixing them with Parameters makes sense as they are part of requests. This also provides more flexibility in naming our internal classes.

For example, we currently have AwsStorageConfigInfo (generated by OpenAPI) and AwsStorageConfigurationInfo (internal representation of the AWS storage config), which can be quite confusing. Using a clearer naming convention will help distinguish between generated request models and internal representations.

Maybe we can use IcebergRestConnectionConfigInfo in our spec and IcebergRestConnectionConfig as the name of internal class, or other suffix like Model. If we follow the same pattern as the storage config, the name of the internal name would be very long, e.g. IcebergRestConnectionConfigurationInfo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion above makes sense to me (my first comment was just a minor naming concern)

remoteUrl:
type: string
description: URL to the remote catalog API
connectionConfigInfo:
Copy link
Contributor

@dimas-b dimas-b Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do updates happen? It looks like changing any config property in an ExternalCatalog requires re-submitting the whole object... Is that so?

Having client re-submit credentials on every config change is probably not ideal 🤔 WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants