
Incompatible schema changes break file skipping #712

Open
scovich opened this issue Feb 21, 2025 · 2 comments
Labels: bug (Something isn't working)

scovich commented Feb 21, 2025

Describe the bug

Suppose a query includes a skipping-eligible predicate over LONG column c.

Then we expect the `add.stats` column to include min/max stats that can parse as LONG.

However, it is possible that a recent table replace operation changed the schema of c -- previously it was a STRING column, which is incompatible with c's new LONG type. In that case, every file that had been in the table will be canceled by a remove (table replacement always truncates the original table), ensuring that no incompatible file actions survive log replay.

Unfortunately, kernel currently attempts to parse the entire `add.stats` column before deduplicating (in order to avoid tracking pruned files), and is thus exposed to parsing failures for rows that contain canceled add actions (e.g. `add.stats.minValues.c = 'A'` cannot parse as LONG).

This issue has a second aspect: Data skipping doesn't track or exclude partition columns directly. So we attempt data skipping over c, with the same risk of parsing failures, even if it's now a partition column. Fixing the general problem would make this harmless, but it's probably worth specifically tracking and excluding partition columns from the data skipping machinery so we don't waste time trying to parse (usually non-existent) stats and evaluating (provably useless) data skipping expressions for partition columns.

NOTE: Ideally, this issue should not arise if column mapping is enabled, because the physical names of the new columns should differ from the originals even if their logical names still seem to match.
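
For concreteness, the offending log history looks roughly like this (paths and stats are hypothetical and abbreviated):

```
# commit 0: c is a STRING column
{"add":{"path":"part-0.parquet","stats":"{\"minValues\":{\"c\":\"A\"},\"maxValues\":{\"c\":\"Z\"}}", ...}}

# commit 1: table replace, c is now LONG
{"metaData":{ ... new schema with c: long ... }}
{"remove":{"path":"part-0.parquet", ...}}
{"add":{"path":"part-1.parquet","stats":"{\"minValues\":{\"c\":1},\"maxValues\":{\"c\":10}}", ...}}
```

During (newest-first) log replay, the remove from commit 1 marks part-0.parquet as seen and should cancel the matching add from commit 0, but today the old add's stats string reaches the JSON parser anyway.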

To Reproduce

Invoke `LogReplayScanner::process_scan_batch` twice -- once with a batch containing an incompatible remove (to mark the file as "seen"), and again with a batch containing a matching incompatible add. It will fail with an error like:

Arrow(JsonError("whilst decoding field 'minValues': whilst decoding field 'c': failed to parse \"A\" as Int64"))
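
In test form, the failing sequence is roughly the following sketch (the scanner construction and the `json_batch` helper are hypothetical stand-ins, and the real `process_scan_batch` signature may differ):

```rust
// Hypothetical reproduction sketch -- helper names and signatures are
// illustrative only, not the kernel's actual API.
let mut scanner = LogReplayScanner::new(/* schema with c: LONG, predicate over c */);

// Batch 1: the remove written by the table replace. Processing it marks
// the file as "seen", so a later add for the same file should be dropped.
let removes = json_batch(r#"{"remove":{"path":"part-0.parquet"}}"#);
scanner.process_scan_batch(removes)?; // ok: removes carry no stats

// Batch 2: the matching add from before the replace. Its stats were
// written while c was a STRING, so parsing them against the new LONG
// schema fails before deduplication gets a chance to discard the row.
let adds = json_batch(
    r#"{"add":{"path":"part-0.parquet","stats":"{\"minValues\":{\"c\":\"A\"}}"}}"#,
);
let err = scanner.process_scan_batch(adds).unwrap_err();
// Arrow(JsonError("... failed to parse \"A\" as Int64"))
```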

Expected behavior

The previously-seen remove should eliminate the add before it gets a chance to cause trouble.

Additional context

No response

scovich added the bug label Feb 21, 2025

scovich commented Feb 24, 2025

NOTE: The Java kernel avoids this issue by deduplicating file actions before attempting to parse `add.stats`, and because its JSON parser honors selection vectors and ignores unselected rows.

The Rust kernel's JSON parser also ignores null rows, but we don't currently (have a way to) update the null mask based on the deduplication the kernel performed. We'll need to figure out how to do that. Additionally, we would want to split the deduplication into "check" and "update" passes, so that we can do:

  1. Sanitize the rows of a batch (eliminate non-file action rows, eliminate previously seen files, etc.)
  2. Parse stats and partition values of surviving rows, apply further pruning
  3. Update the "seen" set only for files that survived pruning

That way, we get the best of both worlds: pruning minimizes the cardinality of the "seen" set, but the "seen" set can still protect pruning attempts from incompatible schema changes.
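
A sketch of that three-pass shape, with illustrative stand-in types (`FileAction`, `Dedup`, and a path-only seen-set key) in place of the kernel's real ones:

```rust
use std::collections::HashSet;

// Illustrative stand-ins; the real kernel keys its seen-set on
// (path, deletion vector id) and carries full action data.
struct FileAction {
    path: String,
    is_add: bool,
    stats_json: Option<String>,
}

struct Dedup {
    seen: HashSet<String>,
}

impl Dedup {
    fn process_batch(&mut self, batch: Vec<FileAction>) -> Vec<FileAction> {
        // Pass 1 ("check"): drop rows for files we already saw. No stats
        // have been parsed yet, so a canceled add with incompatible stats
        // is discarded before it can fail the query.
        let candidates: Vec<FileAction> = batch
            .into_iter()
            .filter(|a| !self.seen.contains(&a.path))
            .collect();

        // Pass 2 ("parse"): parse stats only for surviving adds and apply
        // data skipping; pruned rows simply drop out here.
        let survivors: Vec<FileAction> = candidates
            .into_iter()
            .filter(|a| !a.is_add || self.passes_skipping(a))
            .collect();

        // Pass 3 ("update"): record only the survivors, keeping the
        // seen-set's cardinality as small as pruning allows.
        for a in &survivors {
            self.seen.insert(a.path.clone());
        }
        survivors
    }

    fn passes_skipping(&self, _add: &FileAction) -> bool {
        true // stand-in for parsing `stats_json` + predicate evaluation
    }
}
```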

scovich added a commit that referenced this issue Mar 12, 2025
## What changes are proposed in this pull request?

Add basic support for partition pruning by combining two pieces of
existing infra:
1. The log replay row visitor already needs to parse partition values
and already filters out unwanted rows
2. The default predicate evaluator works directly with scalars

Result: partition pruning gets applied during log replay, just before
deduplication so we don't have to remember pruned files.
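
In miniature, the combination amounts to the following (illustrative types, not the kernel's real API):

```rust
use std::collections::HashMap;

// Illustrative scalar; the kernel's evaluator works over its own Scalar.
#[derive(Debug, PartialEq)]
enum Scalar {
    Long(i64),
    Str(String),
    Null,
}

// Toy predicate: partition_column = literal. The real evaluator handles
// full expression trees.
struct EqPredicate {
    column: String,
    literal: Scalar,
}

// Tri-state evaluation: Some(false) means provably false; None means
// indeterminate (e.g. a missing or null partition value).
fn evaluate(pred: &EqPredicate, values: &HashMap<String, Scalar>) -> Option<bool> {
    match values.get(&pred.column) {
        None | Some(Scalar::Null) => None,
        Some(v) => Some(v == &pred.literal),
    }
}

// Only a provably-false predicate may prune the file.
fn keep_file(pred: &EqPredicate, values: &HashMap<String, Scalar>) -> bool {
    evaluate(pred, values) != Some(false)
}
```

The important property is the tri-state semantics: an indeterminate result keeps the file, so pruning can never wrongly drop data.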

WARNING: The implementation currently has a flaw, in case the history
contains a table-replace that affected partition columns. For example,
changing a value column into a non-nullable partition column, or an
incompatible type change to a partition column. In such cases, the
partition values recorded in the remove actions generated by the table-replace
operation (for old files) would have the wrong type or even be entirely
absent. While the code can
handle an absent partition value, an incompatibly typed value would
cause a parsing error that fails the whole query. Note that stats-based
data skipping already has the same flaw, so we are not making the
problem worse. We will fix the problem for both as a follow-up item,
tracked by #712

NOTE: While this is a convenient way to achieve partition pruning in the
immediate term, Delta
[checkpoints](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#checkpoints-1)
can provide strongly-typed `stats_parsed` and `partitionValues_parsed`
columns, which would require a completely different access path.
* For `stats` vs. `stats_parsed`, the likely solution is simple enough
because we already json-parse `stats` into a strongly-typed nested
struct in order to evaluate the data skipping predicate over its record
batch. We just avoid the parsing overhead if `stats_parsed` is already
available.
* The `partitionValues` field poses a bigger challenge, because it's a
string-string map, not a JSON literal. In order to turn it into a
strongly-typed nested struct, we would need a SQL expression that can
extract the string values and try-cast them to the desired types. That's
ugly enough we might prefer to keep completely different code paths for
parsed vs. string partition values, but then there's a risk that
partition pruning behavior changes depending on which path got invoked.
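
For illustration only, the try-cast over raw string values could look like this sketch (types are hypothetical; whether to express it as a SQL expression or keep a dedicated code path is exactly the open question above):

```rust
// Illustrative try-cast from a raw partitionValues string to a typed
// scalar. A failed parse yields None rather than failing the query --
// the forgiving behavior an incompatible table-replace history needs.
enum DataType {
    Long,
    Str,
}

enum Scalar {
    Long(i64),
    Str(String),
}

fn try_cast(raw: &str, to: &DataType) -> Option<Scalar> {
    match to {
        DataType::Long => raw.parse::<i64>().ok().map(Scalar::Long),
        DataType::Str => Some(Scalar::Str(raw.to_string())),
    }
}
```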

## How was this change tested?

New unit tests, and adjusted one unit test that assumed no partition
pruning.

scovich commented Mar 13, 2025

The same issue affects partition pruning, and the same check-parse-update approach should fix it.
