-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added freshness aware table loading using entityId:entityVersion for ETag #1037
base: main
Are you sure you want to change the base?
Added freshness aware table loading using entityId:entityVersion for ETag #1037
Conversation
…g and added feature to Polaris
Please use "Draft PRs" for this - not "ready for review" |
...ce/common/src/main/java/org/apache/polaris/service/catalog/PolarisCatalogHandlerWrapper.java
Outdated
Show resolved
Hide resolved
...ce/common/src/main/java/org/apache/polaris/service/catalog/PolarisCatalogHandlerWrapper.java
Outdated
Show resolved
Hide resolved
...ce/common/src/main/java/org/apache/polaris/service/catalog/PolarisCatalogHandlerWrapper.java
Outdated
Show resolved
Hide resolved
...ce/common/src/main/java/org/apache/polaris/service/catalog/PolarisCatalogHandlerWrapper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm supportive of the idea, but we need some tests. Otherwise left a few comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this PR implements ETag
and If-None-Match
in a non-HTTP-spec conformant way.
I'd really prefer to respect RFC 9110 here.
/** | ||
* Entities that can expose an ETag that can uniquely identify their current state. | ||
*/ | ||
public interface ETaggableEntity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interface mixes REST/HTTP concerns w/ persistence concerns. Please remove the ETag functionality from persistence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed! Removed all references to it in the persistence layer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this separation is wise. Besides the fact that it complicates the code, we'll eventually want to expose etag semantics to the persistence layer in some form. i.e. if the entity an etag refers to is indeed still in the version specified in the etag, we don't need to pull the entity out of the metastore at all.
…ed persistence from etag logic
I've gone ahead and added HTTP compliant representations and parsing of the
|
|
||
protected static Pattern ETAG_PATTERN = Pattern.compile("(W/)?\"([^\"]*)\""); | ||
|
||
private final boolean weak; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to support weak & strong etags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I supported their parsing, but in the current implementation all table operations only return weak etags. So, we never generate a strong etag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the strong etag code is dead code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followed up in #1037 (comment)
@Override | ||
public boolean equals(Object o) { | ||
if (o instanceof ETag other) { | ||
return weak == other.weak && value.equals(other.value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right. If anything, all of our etags are technically weak because the credentials would be different but we think the responses are semantically identical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused at what's being pointed out here- the line commented on compares two etags to see if they're both strong or weak validators, and then that they contain the same internal value. I've marked all of our ETags as weak since we never do a byte for byte comparison. My thinking was, we don't know the content that we encode in a weak vs strong etag if we choose to support both in future, so comparisons should only be between weak etags and other weak etags and strong etags and other strong etags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our etags are never strong etags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. As such, if we receive a strong etag we should ensure that we don't match it. This requires us to define a way to tell strong etags from weak ones.
While we don't generate strong etags, I don't believe the distinction being available is dead code. If I receive a strong etag in the header, the distinction that isWeak()
provides me is the ability to check if the received etag was strong or weak, and then only return 304 if the received header was weak.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you somehow receive a strong etag in the header, simple string matching will show that it doesn't match any etag our application can generate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I've removed the wrapper and now we do just regular string comparisons.
private final List<ETag> etags; | ||
|
||
/** | ||
* Parses a non wildcard If-None-Match header value into ETags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the reason to support this? You can use an etag of *
to check if a table exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only added this to be compliant with the HTTP spec as @snazy requested, since *
is a valid If-None-Match
value with the desired behavior of matching any resourxe. I believe the wildcard is supposed to be useful to see if something already exists on a put, kind of like a create if not exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is lacking wildcard support really noncompliant? Can you link the relevant part of the RFC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. But in our case the PUT (createTable) already has semantics around failing the request if the table exists. What additional semantics would having an *
on that request provide?
I wouldn't have a problem with just failing on an etag like this as we really don't expect one from the client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The endpoint now rejects the wildcard value. 👍
service/common/src/main/java/org/apache/polaris/service/http/ETag.java
Outdated
Show resolved
Hide resolved
/** | ||
* Logical representation of an HTTP compliant If-None-Match header. | ||
*/ | ||
public class IfNoneMatch { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: couldn't this be a record? The current constructor could be turned into a static factory method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are certain semantics surrounding the If-None-Match header that I feel would not be as clear with a record.
The If-None-Match
header can either take the value of the wildcard *
or a (possibly empty) list of etags, not both. The canonical constructor for a class like this would be defined like IfNoneMatch(boolean isWildcard, List<ETag> etags)
. However, an If-None-Match header is one or the other, not both. I shouldn't be able to construct a header that's both a wildcard and contains etags.
It's easier to capture this when we don't expose the canonical constructor. I've improved the semantics a bit. Now, I can either construct the header as a wildcard by calling IfNoneMatch.wildcard()
to construct a wildcard header, or I can init the header by calling new IfNoneMatch(List.of(etag1, etag2, etag3))
, but I can't give it both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public record IfNoneMatch(Boolean isWildcard, List<ETag> etags) {
public IfNoneMatch(boolean isWildcard) {
this(isWildcard, null);
}
public IfNoneMatch(List<ETag> etags) {
this(null, etags);
}
public IfNoneMatch {
if (!(isWildcard == null ^ etags == null)) {
throw new IllegalArgumentException(...);
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used suggestion! Changed to record with similar semantics.
Ok- I've gone ahead and removed the |
service/common/src/test/java/org/apache/polaris/service/http/IfNoneMatchTest.java
Show resolved
Hide resolved
*/ | ||
public record IfNoneMatch(boolean isWildcard, @Nonnull List<String> eTags) { | ||
|
||
protected static Pattern ETAG_PATTERN = Pattern.compile("(W/)?\"([^\"]*)\""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we don't need the first capture group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right. I have taken it out and wrapped the entire thing in a capture group so it can be used to extract the etag from the header/validate an individual etag. See #1037 (comment)
public static IfNoneMatch fromHeader(String rawValue) { | ||
// parse null header as an empty header | ||
if (rawValue == null) { | ||
return new IfNoneMatch(List.of()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make IfNoneMatch(List.of())
a constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made constant IfNoneMatch.EMPTY
} | ||
|
||
rawValue = rawValue.trim(); | ||
if (rawValue.equals("*")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use a constant here, too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made constant for the value, WILDCARD_HEADER_VALUE
and made the header object a constant, IfNoneMatch.WILDCARD
} else { | ||
|
||
List<String> parts = Stream.of(rawValue.split("\\s+")) // Tokenizes string , eg. we will now have [`W/"etag1",`, `W/"etag2"`] | ||
.map(s -> s.endsWith(",") ? s.substring(0, s.length() - 1) : s) // Remove trailing comma from each part, so we now have [`W/"etag1"`, `W/"etag2"`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just split on ,?\\s+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have taken this logic out entirely, see #1037 (comment) for an explanation as to why the parsing was incorrect.
boolean allValid = parts.stream().allMatch(s -> { | ||
Matcher matcher = ETAG_PATTERN.matcher(s); | ||
return matcher.matches(); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks redundant with the static block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have this part out, now we just use the pattern to extract the etags out so we can build the object. The static block still has the validation so that we can validate objects built from the constructor instead of from fromHeader(..)
I realized a bug I introduced with the parsing of the header, splitting by whitespace or commas is incorrect because these are not necessarily always delimiters as they can be contained within the value of an ETag. I have updated it to now use two RegExs, one to validate the format of the entire header to be HTTP compliant (eg. comma and space between each ETag, each ETag matches, no trailing comma) and then one to extract the captured ETags from the header. I have removed unnecessary capture groups from both of these. Previously, an invalid header like I have added a test to ensure this is not passed going forward. I have also replaced the empty and wildcard ETags as constants. |
if (ifNoneMatch.isWildcard()) | ||
throw new BadRequestException("If-None-Match may not take the value of '*'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use braces here
* @return the Polaris table entity for the table | ||
*/ | ||
private TableLikeEntity getTableEntity(TableIdentifier tableIdentifier) { | ||
PolarisResolvedPathWrapper target = resolutionManifest.getPassthroughResolvedPath(tableIdentifier); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we potentially adding another trip to the metastore here after an operation like createTable?
* @param eTag the eTag value | ||
* @param <T> The type of the encapsulated response object | ||
*/ | ||
public record ETaggedResponse<T> (@Nonnull T response, @Nonnull String eTag) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the trailing newline got dropped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very close to me, commented on a few lingering concerns. It seems like we are just trusting the cache to avoid extra hits against the metastore here
return new ETaggedResponse<>( | ||
doCatalogOperation(() -> CatalogHandlers.createTable(baseCatalog, namespace, request)), | ||
generateETagValueForTable(getTableEntity(identifier)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to ensure that the entity we're using to generate the etag here is the exact same one returned in the response.
Otherwise, imagine the response body describes the table at time T but you call getTableEntity
at time T+1 and generate the etag accordingly. A user who keeps polling for the table using the etag will never see the table's state at time T+1
Adding freshness aware table loading to Polaris to match the Iceberg 1.8.0 REST Spec change.
createTable
,registerTable
, andloadTable
will now issue an ETag, and loadTable will consume an If-None-Match header and compare it against the current ETag in the TableLikeEntity to determine if the the client's version is current.The ETag value was decided to be a tuple of entityId:entityVersion.