-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API Spec: Add ConnectionConfigInfo to ExternalCatalog #1026
base: main
Are you sure you want to change the base?
Conversation
Remove the currently unused remoteUrl field from the top-level ExternalCatalog into the ConnectionConfigInfo as remoteUri instead for better consistency; remote catalogs in the future may be defined by arbitrary URIs that are not, for example, http(s) URLs. This is just the spec definition for now, so it's not yet wired into the internal entity layer or persistence objects. Allow extensibility of different connection types in the future even if we start with only an ICEBERG_REST type. Similarly, provide extensibility for different authn mechanisms to use with the connection.
connectionConfigInfo: | ||
$ref: "#/components/schemas/ConnectionConfigInfo" | ||
|
||
ConnectionConfigInfo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we allow updating the ConnectionConfigInfo on catalog entity? See UpdateCatalogRequest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, yes, I'll add it to the update as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! For updating connection configs, I think we should only allow modifying the secret info so that users can rotate the secrets. For other connection configs, if we allow customers to modify them, customers may point to another remote catalog.
@@ -850,9 +850,92 @@ components: | |||
- $ref: "#/components/schemas/Catalog" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For external catalog, is the storage config required if Polaris just passes through the response sent from remote catalog and not generates the subscoped creds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, StorageConfig might become more optional. However, this is actually an important design point about whether we're willing to return the remote catalog's subscoped creds.
At least some of the known use cases explicitly want Polaris to be the one responsible for access control and credential vending, while the remote catalog does not perform credential vending. So we want the ability for Polaris to mix-in vended credentials.
Returning the remote catalog's vended credentials will probably need to be configurable. For most real use cases we'd probably want some formal protocol for declaring the "on-behalf-of delegation chain"; e.g. the ConnectionConfig contains a "system identity" but we'd want a way to declare the identity of the calling Principal in the request to the remote catalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that we should provide a configuration for customers!
The vended credentials send back from the remote catalog represents the system identity, it's very powerful and we need some sorts of request-scoped identity
. This could be achieved by passing a http header to polaris.
type: string | ||
description: URL to the remote catalog API | ||
connectionConfigInfo: | ||
$ref: "#/components/schemas/ConnectionConfigInfo" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to make ConnectionConfigInfo
a required property for external catalogs, just like StorageConfigInfo
is for internal catalogs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For current backwards-compatibility, I was actually thinking of it like:
- if (ExternalCatalog.getConnectionConfigInfo() == null) { internalSubtype = STATIC_FACADE; }
- else { internalSubtype = PASSTHROUGH_FACADE; }
Admittedly that might be kind of a hack to only use the presence of connection config to determine static vs passthrough facade, but conceptually, it makes sense that an ExternalCatalog that can't dial out to a remote catalog fundamentally must behave as a "static facade" where content is "pushed" into the ExternalCatalog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or should we introduce a new catalog type like federated
?
enum: | ||
- ICEBERG_REST | ||
description: The type of remote catalog service represented by this connection | ||
remoteUri: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remote
for a URI is superfluous, IMHO
|
||
OauthRestAuthenticationInfo: | ||
type: object | ||
description: OAuth authentication based on client_id/client_secret |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's just about the client credentials flow, maybe we can rename the type to be something like OAuthClientCredentialsParameters
?.. Iceberg REST Servers do not have to be restricted to client credentials, other flows may be available (e.g. delegation), which are still OAuth2, but will require different parameters.
type: string | ||
description: oauth scopes to specify when exchanging for a short-lived access token | ||
|
||
BearerRestAuthenticationInfo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bearer auth is not specific to Iceberg REST. How about BearerAuthenticationParameters
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed! We might need to create an authentication config specifically for ConnectionConfig. For other connections, like IcebergHiveConnectionConfig
, we can still reuse the existing authentication parameters.
Also, using Parameters
as part of the name is a great approach. Since we’ll be generating some classes based on the spec, suffixing them with Parameters
makes sense as they are part of requests. This also provides more flexibility in naming our internal classes.
For example, we currently have AwsStorageConfigInfo
(generated by OpenAPI) and AwsStorageConfigurationInfo
(internal representation of the AWS storage config), which can be quite confusing. Using a clearer naming convention will help distinguish between generated request models and internal representations.
Maybe we can use IcebergRestConnectionConfigInfo
in our spec and IcebergRestConnectionConfig
as the name of internal class, or other suffix like Model
. If we follow the same pattern as the storage config, the name of the internal name would be very long, e.g. IcebergRestConnectionConfigurationInfo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion above makes sense to me (my first comment was just a minor naming concern)
remoteUrl: | ||
type: string | ||
description: URL to the remote catalog API | ||
connectionConfigInfo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do updates happen? It looks like changing any config property in an ExternalCatalog
requires re-submitting the whole object... Is that so?
Having client re-submit credentials on every config change is probably not ideal 🤔 WDYT?
Remove the currently unused remoteUrl field from the top-level ExternalCatalog into the ConnectionConfigInfo as remoteUri instead for better consistency; remote catalogs in the future may be defined by arbitrary URIs that are not, for example, http(s) URLs.
This is just the spec definition for now, so it's not yet wired into the internal entity layer or persistence objects.
Allow extensibility of different connection types in the future even if we start with only an ICEBERG_REST type.
Similarly, provide extensibility for different authn mechanisms to use with the connection.