Added freshness aware table loading using entityId:entityVersion for ETag #1037

mansehajsingh · 2025-02-20T19:49:41Z

Adding freshness aware table loading to Polaris to match the Iceberg 1.8.0 REST Spec change. createTable, registerTable, and loadTable will now issue an ETag, and loadTable will consume an If-None-Match header and compare it against the current ETag in the TableLikeEntity to determine if the client's version is current.

The ETag value was decided to be a tuple of entityId:entityVersion.

Added authorization unit tests to ensure the new PolarisCatalogHandlerWrapper APIs adhere to the same privileges
Added integration tests to ensure correct behavior of registerTable, loadTable, and createTable via new API changes

…g and added feature to Polaris

snazy · 2025-02-21T12:31:53Z

NOTE: DO NOT REVIEW. Currently only to demonstrate a POC.

Please use "Draft PRs" for this - not "ready for review"

spec/rest-catalog-open-api.yaml

...ce/common/src/main/java/org/apache/polaris/service/catalog/PolarisCatalogHandlerWrapper.java

eric-maynard

I'm supportive of the idea, but we need some tests. Otherwise left a few comments

…ware-table-loading

snazy

It looks like this PR implements ETag and If-None-Match in a non-HTTP-spec conformant way.

I'd really prefer to respect RFC 9110 here.

snazy · 2025-02-27T18:59:23Z

polaris-core/src/main/java/org/apache/polaris/core/entity/ETaggableEntity.java

+/**
+ * Entities that can expose an ETag that can uniquely identify their current state.
+ */
+public interface ETaggableEntity {


This interface mixes REST/HTTP concerns w/ persistence concerns. Please remove the ETag functionality from persistence.

Removed! Removed all references to it in the persistence layer.

I'm not sure this separation is wise. Besides the fact that it complicates the code, we'll eventually want to expose etag semantics to the persistence layer in some form. i.e. if the entity an etag refers to is indeed still in the version specified in the etag, we don't need to pull the entity out of the metastore at all.
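A minimal sketch of the short-circuit this comment has in mind, assuming the metastore could answer a cheap "current version" lookup separately from a full entity load. The `MetaStore` interface, its two methods, and `loadIfChanged` are hypothetical names for illustration only, not Polaris APIs, and the ETag uses the plain entityId:entityVersion form from the PR description.

```java
import java.util.Optional;

final class VersionCheckSketch {
  /** Hypothetical metastore facade: a cheap version lookup next to a costly full load. */
  interface MetaStore {
    Optional<Integer> currentVersion(long entityId);

    String loadEntity(long entityId);
  }

  /** Returns empty when the client's ETag is still current; otherwise loads the entity. */
  static Optional<String> loadIfChanged(MetaStore store, long entityId, String clientETag) {
    Optional<Integer> version = store.currentVersion(entityId);
    if (version.isPresent() && (entityId + ":" + version.get()).equals(clientETag)) {
      return Optional.empty(); // still fresh: the full entity is never pulled
    }
    return Optional.of(store.loadEntity(entityId));
  }

  public static void main(String[] args) {
    MetaStore store = new MetaStore() {
      @Override public Optional<Integer> currentVersion(long entityId) { return Optional.of(2); }
      @Override public String loadEntity(long entityId) { return "entity-" + entityId; }
    };
    System.out.println(loadIfChanged(store, 850L, "850:2").isPresent()); // false
    System.out.println(loadIfChanged(store, 850L, "850:1").isPresent()); // true
  }
}
```

Whether such a lookup is actually cheaper than a full load depends on the metastore implementation, which this sketch does not model.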

…ed persistence from etag logic

mansehajsingh · 2025-02-28T00:22:43Z

I've gone ahead and added HTTP compliant representations and parsing of the ETag and If-None-Match headers. Now,

The returned ETag will always be of the form W/"entityId:entityVersion" since we never do byte by byte comparisons
The provided If-None-Match header can specify the wildcard * to match any ETag or can specify the format W/"entityId:entityVersion", and even multiple ETags in the header as is defined in the HTTP standard. For example, if the returned ETag was W/"850:2" all of *, W/"850:2", and W/"850:2", "some-other-strong-etag" would return 304. However, W/"850:1" or "850:2" would not match it.
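The matching rules in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the class and method names (`WeakETagSketch`, `etagFor`, `notModified`) are hypothetical, and the regular expression is only one plausible way to pick individual ETags out of the header.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WeakETagSketch {
  // One entity-tag: an optional weak prefix followed by a quoted opaque value.
  private static final Pattern ETAG = Pattern.compile("(?:W/)?\"[^\"]*\"");

  /** Builds the weak validator described above, e.g. W/"850:2". */
  static String etagFor(long entityId, int entityVersion) {
    return "W/\"" + entityId + ":" + entityVersion + "\"";
  }

  /** Returns true when the request would be answered with 304 Not Modified. */
  static boolean notModified(String ifNoneMatch, String currentETag) {
    if (ifNoneMatch == null) {
      return false;
    }
    String header = ifNoneMatch.trim();
    if (header.equals("*")) {
      return true; // wildcard matches any current representation
    }
    Matcher matcher = ETAG.matcher(header);
    while (matcher.find()) {
      if (matcher.group().equals(currentETag)) {
        return true; // exact string match, including the W/ prefix
      }
    }
    return false;
  }

  public static void main(String[] args) {
    String current = etagFor(850L, 2);
    System.out.println(notModified("*", current)); // true
    System.out.println(notModified("W/\"850:2\", \"some-other-strong-etag\"", current)); // true
    System.out.println(notModified("\"850:2\"", current)); // false
  }
}
```

The comparison here is exact string equality, which reproduces the examples in the comment. It is stricter than the weak comparison function that RFC 9110 specifies for If-None-Match, under which "850:2" and W/"850:2" would be considered equivalent.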

eric-maynard · 2025-02-28T00:27:04Z

service/common/src/main/java/org/apache/polaris/service/http/ETag.java

+
+    protected static Pattern ETAG_PATTERN = Pattern.compile("(W/)?\"([^\"]*)\"");
+
+    private final boolean weak;


Do we really need to support weak & strong etags?

I supported their parsing, but in the current implementation all table operations only return weak etags. So, we never generate a strong etag.

So the strong etag code is dead code?

Followed up in #1037 (comment)

eric-maynard · 2025-02-28T00:28:21Z

service/common/src/main/java/org/apache/polaris/service/http/ETag.java

+    @Override
+    public boolean equals(Object o) {
+        if (o instanceof ETag other) {
+            return weak == other.weak && value.equals(other.value);


This doesn't look right. If anything, all of our etags are technically weak because the credentials would be different but we think the responses are semantically identical

I'm a bit confused about what's being pointed out here: the line commented on compares two etags to check that they're both strong or both weak validators, and then that they contain the same internal value. I've marked all of our ETags as weak since we never do a byte-for-byte comparison. My thinking was that we don't know what content we would encode in a weak vs. strong etag if we choose to support both in the future, so comparisons should only be between weak etags and other weak etags, and between strong etags and other strong etags.

Our etags are never strong etags.

I agree. As such, if we receive a strong etag we should ensure that we don't match it. This requires us to define a way to tell strong etags from weak ones.

While we don't generate strong etags, I don't believe the distinction being available is dead code. If I receive a strong etag in the header, the distinction that isWeak() provides me is the ability to check if the received etag was strong or weak, and then only return 304 if the received header was weak.

If you somehow receive a strong etag in the header, simple string matching will show that it doesn't match any etag our application can generate

Sounds good, I've removed the wrapper and now we do just regular string comparisons.

eric-maynard · 2025-02-28T00:29:02Z

service/common/src/main/java/org/apache/polaris/service/http/IfNoneMatch.java

+    private final List<ETag> etags;
+
+    /**
+     * Parses a non wildcard If-None-Match header value into ETags


What would be the reason to support this? You can use an etag of * to check if a table exists?

I only added this to be compliant with the HTTP spec as @snazy requested, since * is a valid If-None-Match value with the desired behavior of matching any resource. I believe the wildcard is supposed to be useful to see if something already exists on a PUT, kind of like a create-if-not-exists.

Is lacking wildcard support really noncompliant? Can you link the relevant part of the RFC?

This is the section I was trying to address https://httpwg.org/specs/rfc9110.html#field.if-none-match:~:text=If%20the%20field%20value%20is%20%22*%22%2C%20the%20condition%20is%20false%20if%20the%20origin%20server%20has%20a%20current%20representation%20for%20the%20target%20resource.

I see. But in our case the PUT (createTable) already has semantics around failing the request if the table exists. What additional semantics would having an * on that request provide?

I wouldn't have a problem with just failing on an etag like this as we really don't expect one from the client.

The endpoint now rejects the wildcard value. 👍

service/common/src/main/java/org/apache/polaris/service/http/ETag.java

adutra · 2025-02-28T09:32:52Z

service/common/src/main/java/org/apache/polaris/service/http/IfNoneMatch.java

+/**
+ * Logical representation of an HTTP compliant If-None-Match header.
+ */
+public class IfNoneMatch {


Same here: couldn't this be a record? The current constructor could be turned into a static factory method.

There are certain semantics surrounding the If-None-Match header that I feel would not be as clear with a record.

The If-None-Match header can either take the value of the wildcard * or a (possibly empty) list of etags, not both. The canonical constructor for a class like this would be defined like IfNoneMatch(boolean isWildcard, List<ETag> etags). However, an If-None-Match header is one or the other, not both. I shouldn't be able to construct a header that's both a wildcard and contains etags.

It's easier to capture this when we don't expose the canonical constructor. I've improved the semantics a bit. Now I can either construct a wildcard header by calling IfNoneMatch.wildcard(), or construct a header from ETags by calling new IfNoneMatch(List.of(etag1, etag2, etag3)), but I can't give it both.

public record IfNoneMatch(Boolean isWildcard, List<ETag> etags) {
  public IfNoneMatch(boolean isWildcard) {
    this(isWildcard, null);
  }
  public IfNoneMatch(List<ETag> etags) {
    this(null, etags);
  }
  public IfNoneMatch {
    if (!(isWildcard == null ^ etags == null)) {
      throw new IllegalArgumentException(...);
    }
  }
}

Used suggestion! Changed to record with similar semantics.

mansehajsingh · 2025-03-01T00:46:16Z

OK, I've gone ahead and removed the ETag wrapper. We no longer need explicit distinctions between strong and weak etags; we can just use regular string matches. I fixed some parsing logic for the If-None-Match header and made the wrapper class into a record. It also now fails on * since we don't really have a use for that functionality of If-None-Match.

service/common/src/test/java/org/apache/polaris/service/http/IfNoneMatchTest.java

eric-maynard · 2025-03-03T21:28:43Z

service/common/src/main/java/org/apache/polaris/service/http/IfNoneMatch.java

+ */
+public record IfNoneMatch(boolean isWildcard, @Nonnull List<String> eTags) {
+
+    protected static Pattern ETAG_PATTERN = Pattern.compile("(W/)?\"([^\"]*)\"");


It looks like we don't need the first capture group

Yes, you're right. I have taken it out and wrapped the entire thing in a capture group so it can be used to extract the etag from the header/validate an individual etag. See #1037 (comment)

eric-maynard · 2025-03-03T21:29:43Z

service/common/src/main/java/org/apache/polaris/service/http/IfNoneMatch.java

+    public static IfNoneMatch fromHeader(String rawValue) {
+        // parse null header as an empty header
+        if (rawValue == null) {
+            return new IfNoneMatch(List.of());


Can we make IfNoneMatch(List.of()) a constant?

Made constant IfNoneMatch.EMPTY

eric-maynard · 2025-03-03T21:29:53Z

service/common/src/main/java/org/apache/polaris/service/http/IfNoneMatch.java

+        }
+
+        rawValue = rawValue.trim();
+        if (rawValue.equals("*")) {


Let's use a constant here, too

Made constant for the value, WILDCARD_HEADER_VALUE and made the header object a constant, IfNoneMatch.WILDCARD

eric-maynard · 2025-03-03T21:30:58Z

service/common/src/main/java/org/apache/polaris/service/http/IfNoneMatch.java

+        } else {
+
+            List<String> parts = Stream.of(rawValue.split("\\s+")) // Tokenizes string , eg. we will now have [`W/"etag1",`, `W/"etag2"`]
+                    .map(s -> s.endsWith(",") ? s.substring(0, s.length() - 1) : s) // Remove trailing comma from each part, so we now have [`W/"etag1"`, `W/"etag2"`]


Can we just split on ,?\\s+

I have taken this logic out entirely, see #1037 (comment) for an explanation as to why the parsing was incorrect.

eric-maynard · 2025-03-03T21:31:38Z

service/common/src/main/java/org/apache/polaris/service/http/IfNoneMatch.java

+            boolean allValid = parts.stream().allMatch(s -> {
+                Matcher matcher = ETAG_PATTERN.matcher(s);
+                return matcher.matches();
+            });


This looks redundant with the static block

I have taken this part out; now we just use the pattern to extract the etags so we can build the object. The static block still has the validation so that we can validate objects built from the constructor instead of from fromHeader(..)

mansehajsingh · 2025-03-04T18:32:10Z

I realized I introduced a bug in the parsing of the header: splitting on whitespace or commas is incorrect because these are not always delimiters; they can also appear within the value of an ETag. I have updated it to use two regexes: one to validate that the entire header is HTTP compliant (e.g. a comma and space between each ETag, each ETag well-formed, no trailing comma), and one to extract the captured ETags from the header. I have removed unnecessary capture groups from both of these.

Previously, an invalid header like W/"etag" W/"etag" would have passed through, now the delimiter will be properly validated and parsed.

I have added a test to ensure such a header is rejected going forward.

I have also replaced the empty and wildcard If-None-Match values with constants.
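The two-regex approach described in this comment might look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's code: the class name, the exact patterns, and the choice of a comma followed by at least one whitespace character as the delimiter are all hypothetical, and wildcard handling is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IfNoneMatchParseSketch {
  // A single entity-tag: optional weak prefix plus a quoted value that may contain commas or spaces.
  private static final String ETAG = "(?:W/)?\"[^\"]*\"";
  // Validates the whole header: ETags separated by a comma and whitespace, no trailing comma.
  private static final Pattern HEADER = Pattern.compile(ETAG + "(?:,\\s+" + ETAG + ")*");
  // Extracts each ETag once the header as a whole is known to be well-formed.
  private static final Pattern SINGLE = Pattern.compile(ETAG);

  static List<String> parse(String rawValue) {
    if (rawValue == null) {
      return List.of(); // a missing header is treated as an empty one
    }
    String header = rawValue.trim();
    if (!HEADER.matcher(header).matches()) {
      throw new IllegalArgumentException("Invalid If-None-Match header: " + rawValue);
    }
    List<String> eTags = new ArrayList<>();
    Matcher matcher = SINGLE.matcher(header);
    while (matcher.find()) {
      eTags.add(matcher.group());
    }
    return eTags;
  }

  public static void main(String[] args) {
    // The comma and space inside the first quoted value do not split it.
    System.out.println(parse("W/\"a, b\", W/\"c\"").size()); // 2
  }
}
```

Validating first and extracting second means a header such as W/"etag" W/"etag", which lacks the comma, is rejected before any ETag is read out of it.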

eric-maynard · 2025-03-05T17:53:59Z

service/common/src/main/java/org/apache/polaris/service/catalog/IcebergCatalogAdapter.java

+    if (ifNoneMatch.isWildcard())
+      throw new BadRequestException("If-None-Match may not take the value of '*'");


Let's use braces here

eric-maynard · 2025-03-05T17:57:06Z

...ce/common/src/main/java/org/apache/polaris/service/catalog/PolarisCatalogHandlerWrapper.java

+   * @return the Polaris table entity for the table
+   */
+  private TableLikeEntity getTableEntity(TableIdentifier tableIdentifier) {
+    PolarisResolvedPathWrapper target = resolutionManifest.getPassthroughResolvedPath(tableIdentifier);


Are we potentially adding another trip to the metastore here after an operation like createTable?

eric-maynard · 2025-03-05T17:59:30Z

service/common/src/main/java/org/apache/polaris/service/catalog/response/ETaggedResponse.java

+ * @param eTag the eTag value
+ * @param <T> The type of the encapsulated response object
+ */
+public record ETaggedResponse<T> (@Nonnull T response, @Nonnull String eTag) {}


Looks like the trailing newline got dropped

eric-maynard

Looks very close to me, commented on a few lingering concerns. It seems like we are just trusting the cache to avoid extra hits against the metastore here

eric-maynard · 2025-03-05T18:02:31Z

...ce/common/src/main/java/org/apache/polaris/service/catalog/PolarisCatalogHandlerWrapper.java

+    return new ETaggedResponse<>(
+            doCatalogOperation(() -> CatalogHandlers.createTable(baseCatalog, namespace, request)),
+            generateETagValueForTable(getTableEntity(identifier))


I think we need to ensure that the entity we're using to generate the etag here is the exact same one returned in the response.

Otherwise, imagine the response body describes the table at time T but you call getTableEntity at time T+1 and generate the etag accordingly. A user who keeps polling for the table using the etag will never see the table's state at time T+1
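One way to picture the fix being asked for is to read the entity once and derive both the response body and the ETag from that single read. The sketch below assumes a simplified in-memory store; `TableState`, `Tagged`, and `respond` are hypothetical stand-ins, not the PR's TableLikeEntity or ETaggedResponse.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SnapshotETagSketch {
  /** Hypothetical stand-in for a table entity: id, version, and the state a response describes. */
  record TableState(long id, int version, String metadataLocation) {}

  /** Hypothetical wrapper pairing a response body with the ETag computed for it. */
  record Tagged<T>(T body, String eTag) {}

  static String etagFor(TableState state) {
    return "W/\"" + state.id() + ":" + state.version() + "\"";
  }

  /** Reads the entity once and derives both the body and the ETag from that one read. */
  static Tagged<String> respond(AtomicReference<TableState> store) {
    TableState snapshot = store.get(); // single read; no second lookup for the ETag
    return new Tagged<>(snapshot.metadataLocation(), etagFor(snapshot));
  }

  public static void main(String[] args) {
    AtomicReference<TableState> store = new AtomicReference<>(new TableState(850L, 1, "v1.json"));
    Tagged<String> first = respond(store);
    store.set(new TableState(850L, 2, "v2.json")); // a concurrent commit lands afterwards
    System.out.println(first.body() + " " + first.eTag()); // v1.json W/"850:1"
  }
}
```

Because the ETag comes from the same snapshot as the body, a later If-None-Match with W/"850:1" no longer matches after the commit, so the client is served the newer state rather than a 304.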

Pulled in iceberg 1.8.0 spec changes for freshness aware table loadin…

39e0ea4

…g and added feature to Polaris

mansehajsingh requested review from adutra, ashvina, dennishuo, dimas-b, eric-maynard, jackye1995, jbonofre, vvcephei, collado-mike, snazy, RussellSpitzer, takidau, MonkeyCanCode, flyrain and ebyhr as code owners February 20, 2025 19:49

mansehajsingh changed the title ~~Pulled in iceberg 1.8.0 spec changes for freshness aware table loadin…~~ Added freshness aware table loading using metadata location for ETag Feb 20, 2025