refactor: Improve metadata expression extraction and validation; Update split metadata config format; Add split metadata columns during metadata resolution. #89
base: release-0.293-clp-connector
Conversation
Walkthrough

Replaces the split-filter abstraction with a structured split-metadata configuration: adds ClpSplitMetadataConfig and expression-based metadata handling, removes ClpSplitFilterProvider and related files, updates connector/module/optimizer/split-provider/metadata wiring, converts metadata SQL strings to RowExpression-based flows, updates tests/resources and docs, and removes jackson-core from the POM.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Planner as QueryPlanner
    participant Optimizer as ClpComputePushDown
    participant Config as ClpSplitMetadataConfig
    participant Converter as ClpFilterToKqlConverter
    participant SplitProv as ClpMySqlSplitProvider
    participant DB as MySQL
    Note over Planner,Optimizer: Planning phase
    Planner->>Optimizer: plan with filter RowExpression
    Optimizer->>Config: request metadata columns & required columns
    alt metadata pushdown allowed
        Optimizer->>Converter: convert filter -> {KQL?, metadataExpression}
        Converter-->>Optimizer: returns KQL and metadataExpression
        Optimizer->>SplitProv: include metadataExpression in layout
        SplitProv->>Converter: transform metadataExpression -> SQL predicate
        Converter-->>SplitProv: SQL predicate
        SplitProv->>DB: execute archive query + predicate
    else fallback
        Optimizer->>Planner: leave filter as remaining expression
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:
Possibly related issues
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review details

Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📒 Files selected for processing (2)

🧰 Additional context used

🧠 Learnings (6)

📓 Common learnings
📚 Learning: 2025-06-13T12:56:06.325Z — Applied to files:
📚 Learning: 2025-09-16T17:39:45.102Z — Applied to files:
📚 Learning: 2025-11-08T02:05:14.212Z — Applied to files:
📚 Learning: 2025-07-30T15:27:04.862Z — Applied to files:
📚 Learning: 2025-09-12T14:47:20.734Z — Applied to files:

🧬 Code graph analysis (2)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
56-61: Remove duplicate TypeManager binding

We now bind TypeManager twice in the same module (binder.bind(TypeManager.class).toInstance(...) appears on both Line 57 and Line 60). Guice rejects duplicate explicit bindings, so the connector bootstrap will fail with a CreationException as soon as this path is executed. Drop one of the bindings (keep the existing one at Line 60, or the new one, but not both) so the injector can be created successfully.

```diff
     binder.bind(FunctionMetadataManager.class).toInstance(context.getFunctionMetadataManager());
-    binder.bind(TypeManager.class).toInstance(context.getTypeManager());
     binder.bind(NodeManager.class).toInstance(context.getNodeManager());
     binder.bind(StandardFunctionResolution.class).toInstance(context.getStandardFunctionResolution());
     binder.bind(TypeManager.class).toInstance(context.getTypeManager());
```
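Why a duplicate explicit binding is fatal can be illustrated with a minimal plain-Java model of a DI binder. This is a sketch, not Guice itself: the `Binder` class below is illustrative and simply assumes that explicit bindings are keyed by type, so registering the same key twice is an error (Guice surfaces this as a `CreationException` when the injector is created).

```java
import java.util.HashMap;
import java.util.Map;

// Minimal model of duplicate-binding rejection: one binding per key,
// and a second registration of the same key fails fast.
public class DuplicateBindingSketch {
    static final class Binder {
        private final Map<Class<?>, Object> bindings = new HashMap<>();

        <T> void bindInstance(Class<T> type, T instance) {
            if (bindings.containsKey(type)) {
                // Guice reports this at injector-creation time as a CreationException
                throw new IllegalStateException("Duplicate binding for " + type.getName());
            }
            bindings.put(type, instance);
        }
    }

    public static void main(String[] args) {
        Binder binder = new Binder();
        binder.bindInstance(String.class, "typeManager");
        boolean duplicateRejected = false;
        try {
            // Second bind of the same key, like the repeated TypeManager binding
            binder.bindInstance(String.class, "typeManagerAgain");
        }
        catch (IllegalStateException e) {
            duplicateRejected = true;
        }
        System.out.println(duplicateRejected);
    }
}
```

Removing one of the two `bind(TypeManager.class)` calls leaves exactly one binding per key, which is what the injector requires.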
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (30)
- presto-clp/pom.xml (0 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (3 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (5 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (4 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (4 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (16 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (3 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (4 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (0 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (0 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java (0 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (0 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (0 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (5 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMySqlSplitMetadataExpressionConverter.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplitMetadataConfig.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (7 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java (0 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (0 hunks)
- presto-clp/src/test/resources/test-mysql-split-filter.json (0 hunks)
- presto-clp/src/test/resources/test-mysql-split-metadata.json (1 hunks)
- presto-docs/src/main/sphinx/connector/clp.rst (2 hunks)
💤 Files with no reviewable changes (9)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
- presto-clp/pom.xml
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java
- presto-clp/src/test/resources/test-mysql-split-filter.json
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Applied to files:
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
Repo: y-scope/presto PR: 15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Applied to files:
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
Applied to files:
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
- presto-docs/src/main/sphinx/connector/clp.rst
📚 Learning: 2025-06-19T08:41:46.288Z
Learnt from: anlowee
Repo: y-scope/presto PR: 0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.
Applied to files:
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
📚 Learning: 2025-09-12T14:47:20.734Z
Learnt from: wraymo
Repo: y-scope/presto PR: 64
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:419-483
Timestamp: 2025-09-12T14:47:20.734Z
Learning: In Presto, SQL validation (including type compatibility checks) happens before reaching the connector optimizer, so invalid SQL with type mismatches will be rejected upstream rather than needing validation in the connector-specific optimization code.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
🧬 Code graph analysis (12)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
ClpSplitMetadataConfig(58-277)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplitMetadataConfig.java (3)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1): ClpMySqlSplitMetadataExpressionConverter (60-245)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1): ClpSplitMetadataConfig (58-277)
- presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1): PrestoException (18-56)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (4)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/mockdb/ClpMockMetadataDatabase.java (1): ClpMockMetadataDatabase (49-288)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/mockdb/table/ColumnMetadataTableRows.java (1): ColumnMetadataTableRows (30-63)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1): ClpComputePushDown (44-157)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1): ClpSplitMetadataConfig (58-277)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMySqlSplitMetadataExpressionConverter.java (2)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1): ClpMySqlSplitMetadataExpressionConverter (60-245)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1): ClpSplitMetadataConfig (58-277)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
ClpSplitMetadataConfig(58-277)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
ClpSplitMetadataConfig(58-277)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (1): ClpExpression (29-102)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (1): ClpFilterToKqlConverter (106-1071)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (2)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1): ClpConfig (21-186)
- presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1): PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
ClpSplitMetadataConfig(58-277)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (2)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1): ClpSplitMetadataConfig (58-277)
- presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1): PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h (1)
RowExpression(394-416)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: conventional-commits
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)
118-121: LGTM! The new helper method correctly delegates to the existing toRowExpression implementation and enables tests to provide custom TypeProvider instances. This addition supports the metadata-expression testing infrastructure described in the PR objectives.
```java
        TableHandle tableHandle = node.getTable();
        ClpTableHandle clpTableHandle = (ClpTableHandle) tableHandle.getConnectorHandle();
        tableScopeSet.add(format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName()));
        if (!metadataConfig.getRequiredColumns(clpTableHandle.getSchemaTableName()).isEmpty()) {
            throw new PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified");
        }
```
Don’t throw before metadata filters are processed.
visitTableScan is invoked while rewriting every table scan, including ones that still have a metadata filter sitting above them. Because the new guard executes before the filter is analysed, any table that lists required columns now fails even when the query provides the correct metadata filter. We need to defer the “required filters must be specified” check until after we’ve inspected the filter’s RowExpression (so genuine missing filters still fail, but valid ones pass).
Apply this diff to remove the premature throw:

```diff
-        if (!metadataConfig.getRequiredColumns(clpTableHandle.getSchemaTableName()).isEmpty()) {
-            throw new PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified");
-        }
```

Then, inside processFilter, enforce the requirement only when no metadata expression was produced:

```diff
@@
-        SchemaTableName schemaTableName = clpTableHandle.getSchemaTableName();
+        SchemaTableName schemaTableName = clpTableHandle.getSchemaTableName();
+        Set<String> requiredColumns = metadataConfig.getRequiredColumns(schemaTableName);
@@
-        Optional<RowExpression> metadataExpression = clpExpression.getMetadataExpression();
+        Optional<RowExpression> metadataExpression = clpExpression.getMetadataExpression();
+
+        if (!requiredColumns.isEmpty() && !metadataExpression.isPresent()) {
+            throw new PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified");
+        }
```

Don't forget to add import java.util.Set; at the top. This keeps the enforcement intact for missing filters while allowing valid filtered queries to proceed.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java around lines 82-86, remove the premature PrestoException throw that checks metadataConfig.getRequiredColumns(...) in visitTableScan so the check does not run before the metadata filter is processed; then update processFilter to enforce the "required filters must be specified" rule only when no metadata expression was produced (i.e. when the metadata expression is absent/empty), and add an import line for java.util.Set at the top of the file.
There could be no filters. No filter should be treated as a failure if required columns are specified
That's why we need to do it in visitTableScan
@wraymo, you're absolutely right! The check in visitTableScan is necessary and correct for the no-filter case.
The visitor pattern here works as follows:
- When there's no FilterNode above the TableScanNode: visitTableScan is called, and the required-columns check appropriately throws an exception.
- When there IS a FilterNode above the TableScanNode: visitFilter is called, which directly invokes processFilter on the source TableScanNode without delegating to visitTableScan. This means visitTableScan is never invoked in this path.

So the check in visitTableScan only fires when there's no filter, which is exactly when you want to reject queries that don't provide the required metadata filters. It doesn't interfere with queries that have filters because those go through the visitFilter → processFilter path instead.

CodeRabbit's concern about a "premature throw before metadata filters are processed" doesn't apply here: when a metadata filter exists in the plan, visitTableScan isn't called at all.
One minor observation (not blocking): processFilter could also add a check to ensure that when required columns exist and the filter produces no metadataExpression, an exception is thrown. This would catch the edge case where a filter exists but doesn't cover the required metadata columns. However, that might be a separate consideration beyond the scope of this specific discussion.
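The dispatch described above can be sketched in a few lines of standalone Java. This is an illustration only, not the connector's actual classes: the node types, the Rewriter, and the result strings are hypothetical, and the sketch just shows that when a filter sits above a scan, the filter handler consumes the scan directly, so the scan handler (and its required-columns check) never runs on that path.

```java
// Sketch of the visitor dispatch: visitFilter() processes its source
// TableScanNode itself, so visitTableScan() only runs for bare scans.
public class VisitorDispatchSketch {
    interface Node {}
    static final class TableScanNode implements Node {}
    static final class FilterNode implements Node {
        final TableScanNode source;
        FilterNode(TableScanNode source) { this.source = source; }
    }

    static final class Rewriter {
        // Stands in for metadataConfig.getRequiredColumns(...).isEmpty() == false
        final boolean requiredColumnsConfigured = true;

        String visit(Node node) {
            if (node instanceof FilterNode) {
                // processFilter consumes the source scan directly;
                // visitTableScan is never called on this path.
                return processFilter((FilterNode) node);
            }
            return visitTableScan((TableScanNode) node);
        }

        String visitTableScan(TableScanNode scan) {
            if (requiredColumnsConfigured) {
                return "rejected: required filters must be specified";
            }
            return "scan";
        }

        String processFilter(FilterNode filter) {
            return "filter processed";
        }
    }

    public static void main(String[] args) {
        Rewriter rewriter = new Rewriter();
        // A filtered scan goes through processFilter and is not rejected.
        System.out.println(rewriter.visit(new FilterNode(new TableScanNode())));
        // A bare scan hits visitTableScan, where the check fires.
        System.out.println(rewriter.visit(new TableScanNode()));
    }
}
```

This is why placing the check in visitTableScan rejects exactly the no-filter queries while leaving filtered queries to the processFilter path.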
🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
@wraymo, you're absolutely right! I see the distinction now.
The visitor flow means:
- Query with no filter → visitTableScan is called → your check catches it ✓
- Query with filter → visitFilter is called → processFilter handles it directly → visitTableScan is NOT called for that TableScanNode
So the check in visitTableScan specifically guards against queries that have zero filters when required columns are configured. That's the correct placement for that scenario.
One follow-up question: for queries that do have a filter but the filter doesn't reference the required columns (doesn't produce a metadataExpression), where is that validated? I see that processFilter extracts the metadataExpression but doesn't appear to check whether required columns were satisfied. Is that validation happening later in ClpMySqlSplitMetadataExpressionConverter.transform() during split generation, or should we add an explicit check in processFilter to fail fast at plan time?
🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
```diff
+        List<String> pushdownKql = new ArrayList<>();
+        List<RowExpression> metadataExpressions = new ArrayList<>();
+        boolean hasUnpushable = false;
+
         for (RowExpression argument : node.getArguments()) {
-            ClpExpression expression = argument.accept(this, null);
-            // Note: It is possible in the future that an expression cannot be pushed down as a KQL query, but can be
-            // pushed down as a metadata SQL query.
-            if (expression.getRemainingExpression().isPresent() || !expression.getPushDownExpression().isPresent()) {
-                allPushedDown = false;
-                continue;
+            ClpExpression expr = argument.accept(this, null);
+
+            boolean hasRemaining = expr.getRemainingExpression().isPresent();
+            boolean hasKql = expr.getPushDownExpression().isPresent();
+            boolean hasMeta = expr.getMetadataExpression().isPresent();
+
+            // If this arg cannot be pushed down at all, bail early
+            if (hasRemaining || (!hasKql && !hasMeta)) {
+                hasUnpushable = true;
+                break;
             }
-            queryBuilder.append(expression.getPushDownExpression().get());
-            queryBuilder.append(" OR ");
-            if (hasAllMetadataSql && expression.getMetadataSqlQuery().isPresent()) {
-                metadataQueryBuilder.append(expression.getMetadataSqlQuery().get());
-                metadataQueryBuilder.append(" OR ");
+
+            if (hasKql) {
+                pushdownKql.add(expr.getPushDownExpression().get());
             }
-            else {
-                hasAllMetadataSql = false;
+
+            if (hasMeta) {
+                metadataExpressions.add(expr.getMetadataExpression().get());
             }
         }
-        if (allPushedDown) {
-            // Remove the last " OR " from the query
-            return new ClpExpression(
-                    queryBuilder.substring(0, queryBuilder.length() - 4) + ")",
-                    hasAllMetadataSql ? metadataQueryBuilder.substring(0, metadataQueryBuilder.length() - 4) + ")" : null);
+
+        if (hasUnpushable) {
+            return new ClpExpression(node);
         }
-        return new ClpExpression(node);
+
+        String combinedKql = null;
+        if (!pushdownKql.isEmpty()) {
+            combinedKql = "(" + String.join(" OR ", pushdownKql) + ")";
+        }
+
+        // Only use metadata if every argument has metadata and none had KQL
+        RowExpression combinedMetadata = null;
+        if (metadataExpressions.size() == node.getArguments().size() && pushdownKql.isEmpty()) {
+            combinedMetadata = (metadataExpressions.size() == 1)
+                    ? metadataExpressions.get(0)
+                    : new SpecialFormExpression(
+                            node.getSourceLocation(),
+                            OR,
+                            BOOLEAN,
+                            metadataExpressions);
+        }
+
+        return new ClpExpression(combinedKql, combinedMetadata);
     }
```
Do not drop metadata-only branches when mixing with KQL in OR.

When an OR clause contains one operand that is metadata-only (e.g., fare = 0) and another that produces KQL (e.g., city.Name LIKE 'b%'), this block treats the node as pushable, yet the resulting ClpExpression only keeps the KQL branch: no metadata expression and no remaining expression. The metadata predicate is therefore lost, so rows satisfying fare = 0 but not the KQL branch are incorrectly filtered out. Please fall back to the original expression whenever OR mixes metadata-only and KQL-only operands.
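The underlying logic error is easy to demonstrate in isolation: rewriting (metaPred OR kqlPred) as just kqlPred drops every row that matches only the metadata branch. The sketch below is standalone and illustrative (the Row record, field names, and predicates are hypothetical, not connector code):

```java
import java.util.List;
import java.util.function.Predicate;

// Shows why narrowing (metaPred OR kqlPred) to kqlPred alone is unsound:
// rows matching only the metadata-only branch are silently dropped.
public class OrNarrowingSketch {
    record Row(int fare, String cityName) {}

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(0, "austin"),   // matches only the metadata branch (fare = 0)
                new Row(5, "boston"),   // matches only the KQL branch (name LIKE 'b%')
                new Row(7, "chicago")); // matches neither

        Predicate<Row> metaPred = r -> r.fare() == 0;
        Predicate<Row> kqlPred = r -> r.cityName().startsWith("b");

        // Correct semantics: keep rows satisfying either branch.
        long correct = rows.stream().filter(metaPred.or(kqlPred)).count();
        // Buggy narrowing: the metadata-only branch is dropped.
        long narrowed = rows.stream().filter(kqlPred).count();

        System.out.println(correct);  // the fare = 0 row and the "boston" row
        System.out.println(narrowed); // only the "boston" row survives
    }
}
```

Falling back to the original expression whenever the OR mixes metadata-only and KQL-only operands avoids this narrowing entirely.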
Apply this diff to restore correctness:

```diff
-        boolean hasUnpushable = false;
+        boolean hasUnpushable = false;
+        boolean sawMetadataOnly = false;
+        boolean sawKqlOnly = false;
@@
             if (hasRemaining || (!hasKql && !hasMeta)) {
                 hasUnpushable = true;
                 break;
             }
+            if (hasMeta && !hasKql) {
+                sawMetadataOnly = true;
+            }
+            if (hasKql && !hasMeta) {
+                sawKqlOnly = true;
+            }
+
             if (hasKql) {
                 pushdownKql.add(expr.getPushDownExpression().get());
             }
             if (hasMeta) {
                 metadataExpressions.add(expr.getMetadataExpression().get());
             }
         }
-        if (hasUnpushable) {
+        if (hasUnpushable || (sawMetadataOnly && sawKqlOnly)) {
             return new ClpExpression(node);
         }
```
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
        List<String> pushdownKql = new ArrayList<>();
        List<RowExpression> metadataExpressions = new ArrayList<>();
        boolean hasUnpushable = false;
        boolean sawMetadataOnly = false;
        boolean sawKqlOnly = false;

        for (RowExpression argument : node.getArguments()) {
            ClpExpression expr = argument.accept(this, null);

            boolean hasRemaining = expr.getRemainingExpression().isPresent();
            boolean hasKql = expr.getPushDownExpression().isPresent();
            boolean hasMeta = expr.getMetadataExpression().isPresent();

            // If this arg cannot be pushed down at all, bail early
            if (hasRemaining || (!hasKql && !hasMeta)) {
                hasUnpushable = true;
                break;
            }

            if (hasMeta && !hasKql) {
                sawMetadataOnly = true;
            }
            if (hasKql && !hasMeta) {
                sawKqlOnly = true;
            }

            if (hasKql) {
                pushdownKql.add(expr.getPushDownExpression().get());
            }

            if (hasMeta) {
                metadataExpressions.add(expr.getMetadataExpression().get());
            }
        }

        if (hasUnpushable || (sawMetadataOnly && sawKqlOnly)) {
            return new ClpExpression(node);
        }

        String combinedKql = null;
        if (!pushdownKql.isEmpty()) {
            combinedKql = "(" + String.join(" OR ", pushdownKql) + ")";
        }

        // Only use metadata if every argument has metadata and none had KQL
        RowExpression combinedMetadata = null;
        if (metadataExpressions.size() == node.getArguments().size() && pushdownKql.isEmpty()) {
            combinedMetadata = (metadataExpressions.size() == 1)
                    ? metadataExpressions.get(0)
                    : new SpecialFormExpression(
                            node.getSourceLocation(),
                            OR,
                            BOOLEAN,
                            metadataExpressions);
```
| } | |
| return new ClpExpression(combinedKql, combinedMetadata); | |
| } |
Yeah, because it's an OR: if we push down metadata in this case, it will turn it into an AND.
@wraymo, you're absolutely right about the root cause—pushing down both metadata and KQL predicates in an OR would apply them at different stages (metadata filters determine which archives to scan, while KQL filters within those archives), effectively turning the OR into an AND.
However, the current implementation has a correctness issue: when mixing metadata-only and KQL-only predicates in an OR, the metadata predicate is silently dropped.
For example, with WHERE fare = 0 OR city.Name LIKE 'b%':
- fare = 0 (metadata-only) gets collected but not used
- city.Name LIKE 'b%' (KQL-only) is pushed down
- Result: only the KQL filter is applied, so rows satisfying fare = 0 but not the LIKE pattern would be incorrectly filtered out
This produces wrong results. The safe approach is to fall back to the original expression (no pushdown) when OR mixes metadata-only and KQL-only predicates, as CodeRabbit suggested. This ensures Presto evaluates the full OR condition correctly, even though it's less efficient.
Would you like to apply CodeRabbit's fix to prevent incorrect results, or is there a different strategy you have in mind for handling these mixed OR cases?
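The under-selection CodeRabbit describes can be reproduced with a few lines of plain Java. This is an illustration only; the `Row` type and the two predicates are stand-ins for the metadata and KQL disjuncts, not connector types:

```java
import java.util.Arrays;
import java.util.List;

public class OrPushdownSemantics {
    static final class Row {
        final int fare;
        final String name;
        Row(int fare, String name) { this.fare = fare; this.name = name; }
    }

    // Full OR semantics: fare = 0 OR name LIKE 'b%'
    static long fullOrCount(List<Row> rows) {
        return rows.stream().filter(r -> r.fare == 0 || r.name.startsWith("b")).count();
    }

    // What a KQL-only pushdown returns when the metadata disjunct is dropped
    static long kqlOnlyCount(List<Row> rows) {
        return rows.stream().filter(r -> r.name.startsWith("b")).count();
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(0, "alpha"), new Row(5, "bravo"), new Row(0, "bravo"));
        System.out.println(fullOrCount(rows));  // 3: every row matches the true OR
        System.out.println(kqlOnlyCount(rows)); // 2: (0, "alpha") is lost
    }
}
```

The row `(0, "alpha")` satisfies only the metadata disjunct, so any evaluation that applies just the KQL part drops it.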
🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
fare is a metadata column, dropping it is the expected behaviour
@wraymo, thank you for clarifying! I understand now that dropping the metadata predicate in mixed OR scenarios is the intended design behavior.
To confirm my understanding: metadata columns (like fare in your example) are used for archive selection at the split/metadata filtering stage, while KQL predicates operate at the row level within archives. When you have fare = 0 OR city.Name LIKE 'b%', the OR semantics would require evaluating both predicates at the same level, but since metadata columns can't be evaluated at the row level, the expected behavior is to:
- Drop the metadata predicate (fare = 0)
- Push down only the KQL part (city.Name LIKE 'b%')
- Scan all archives (no metadata filtering) and apply the KQL filter
This is a trade-off where the connector prioritizes pushing down what it can (KQL) while accepting that metadata-based archive filtering isn't possible in this mixed OR scenario. The alternative would be to throw an error for such queries, but silently dropping the metadata predicate allows the query to proceed (albeit scanning more archives than optimal).
This design makes sense given the two-stage filtering architecture. Thanks for the clarification!
✏️ Learnings added
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.192Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
| for (String ns : namespaces) { | ||
| TableConfig cfg = tableConfigs.get(ns); | ||
| if (cfg != null) { | ||
| merged.metaColumns.putAll(cfg.metaColumns); | ||
| for (FilterRule rule : cfg.filterRules) { | ||
| boolean exists = merged.filterRules.stream() | ||
| .anyMatch(r -> r.column.equals(rule.column)); | ||
| if (!exists) { | ||
| merged.filterRules.add(rule); | ||
| } | ||
| } |
Allow table-level filter rules to override broader scopes.
This merge keeps the first occurrence of each rule (global) and ignores schema/table overrides, so a table-specific rule (e.g., marking a column required) never takes effect. That contradicts the documented precedence and breaks configuration correctness. Please let later scopes replace earlier ones.
Apply this diff to honour overrides:
- for (String ns : namespaces) {
- TableConfig cfg = tableConfigs.get(ns);
- if (cfg != null) {
- merged.metaColumns.putAll(cfg.metaColumns);
-
- for (FilterRule rule : cfg.filterRules) {
- boolean exists = merged.filterRules.stream()
- .anyMatch(r -> r.column.equals(rule.column));
- if (!exists) {
- merged.filterRules.add(rule);
- }
- }
- }
- }
+ Map<String, FilterRule> mergedRules = new LinkedHashMap<>();
+ for (String ns : namespaces) {
+ TableConfig cfg = tableConfigs.get(ns);
+ if (cfg != null) {
+ merged.metaColumns.putAll(cfg.metaColumns);
+ for (FilterRule rule : cfg.filterRules) {
+ mergedRules.put(rule.column, rule);
+ }
+ }
+ }
+ merged.filterRules.addAll(mergedRules.values());
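The "later scope wins" behaviour that the suggested `LinkedHashMap` merge relies on can be sketched in isolation. `mergeRules` and the string-valued rule maps below are hypothetical stand-ins for the connector's `FilterRule` handling, not its actual API:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScopeMergeDemo {
    // Merge scope maps keyed by column name; LinkedHashMap keeps first-seen
    // order while letting later scopes overwrite earlier entries.
    public static Map<String, String> mergeRules(List<Map<String, String>> scopes) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> scope : scopes) {
            merged.putAll(scope); // later scopes (schema, table) override earlier (global)
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> global = new LinkedHashMap<>();
        global.put("timestamp", "optional");
        Map<String, String> table = new LinkedHashMap<>();
        table.put("timestamp", "required"); // table-level override must win
        System.out.println(mergeRules(Arrays.asList(global, table)).get("timestamp")); // required
    }
}
```

This is exactly why the original "keep first occurrence" loop was wrong: `put`/`putAll` naturally gives the documented last-scope-wins precedence.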
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
📚 Learning: 2025-11-08T02:05:14.192Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.192Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
🧬 Code graph analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestissimo-worker-images-build
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (4)
94-104: LGTM: Required-column tracking correctly reset. The addition of seenRequired.clear() on line 96 ensures that each transform invocation independently validates required columns without contamination from prior calls.
156-178: LGTM: Constant handling is correct and secure. The method properly handles:
- SQL string escaping (line 168: replace("'", "''")),
- decimal scale preservation for both Slice and Long representations, and
- safe conversion of numeric types.
180-186: LGTM: Column reference handling is correct. The method correctly:
- tracks the exposed name for required-column validation (line 184), and
- returns the original MySQL column name for SQL generation (line 185).
188-245: LGTM: Range-bound rewriting logic is correct. The method correctly rewrites comparisons using metadata bounds:
- greater-than operators check the upper bound,
- less-than operators check the lower bound, and
- equality rewrites to a range containment check: (lower <= literal) AND (upper >= literal).

The fallback cases (lines 233-238) reasonably handle single-bound scenarios.
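The bound-rewriting rules described above can be sketched as a small string-level transform. The `<col>_lower`/`<col>_upper` naming and the `rewrite` method are illustrative only, not the connector's actual schema or API:

```java
public class RangeBoundRewrite {
    // Rewrite a row-level comparison on a column into an archive-level bounds
    // check: the archive can only contain a matching row if its [lower, upper]
    // range could satisfy the predicate.
    public static String rewrite(String column, String op, String literal) {
        String lower = column + "_lower";
        String upper = column + "_upper";
        switch (op) {
            case ">":  return upper + " > " + literal;
            case ">=": return upper + " >= " + literal;
            case "<":  return lower + " < " + literal;
            case "<=": return lower + " <= " + literal;
            case "=":  return "(" + lower + " <= " + literal + ") AND (" + upper + " >= " + literal + ")";
            default: throw new IllegalArgumentException("unsupported operator: " + op);
        }
    }

    public static void main(String[] args) {
        System.out.println(rewrite("ts", "=", "100"));
        // (ts_lower <= 100) AND (ts_upper >= 100)
    }
}
```

Note these checks only prune archives that definitely cannot match; rows inside a surviving archive are still filtered at scan time.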
e5bfe33 to ddfca65 (Compare)
ddfca65 to bfa18ff (Compare)
…lution; fixed a few bugs
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java(5 hunks)presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java(4 hunks)presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java(6 hunks)presto-docs/src/main/sphinx/connector/clp.rst(2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
Applied to files:
presto-docs/src/main/sphinx/connector/clp.rstpresto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.javapresto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
Repo: y-scope/presto PR: 15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.javapresto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.javapresto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
📚 Learning: 2025-11-08T02:05:14.212Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-09-12T14:47:20.734Z
Learnt from: wraymo
Repo: y-scope/presto PR: 64
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:419-483
Timestamp: 2025-09-12T14:47:20.734Z
Learning: In Presto, SQL validation (including type compatibility checks) happens before reaching the connector optimizer, so invalid SQL with type mismatches will be rejected upstream rather than needing validation in the connector-specific optimization code.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
🧬 Code graph analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
ClpSplitMetadataConfig(58-277)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
ClpSplitMetadataConfig(58-277)presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
ClpSplitMetadataConfig(58-277)
| if (!metadataExpression.isPresent() && | ||
| !metadataConfig.getRequiredColumns(clpTableHandle.getSchemaTableName()).isEmpty()) { | ||
| throw new PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified"); | ||
| } |
Required columns still go unenforced.
metadataExpression.isPresent() only tells us that some metadata predicate was extracted; it does not guarantee that every entry in metadataConfig.getRequiredColumns(...) appeared in the filter. For example, with a config that marks msg.timestamp as required, a query filtering only on a different metadata column (e.g., level = 5) yields a metadata expression, so this guard passes and the query is accepted even though the required column is absent. That regresses the “must supply timestamp” contract and re-opens the expensive full-scan path the config was supposed to forbid.
Please track the specific metadata/data columns referenced during conversion and compare them against the required set, e.g. have ClpFilterToKqlConverter return the set of satisfied columns (or extend ClpExpression to expose it) and reject whenever requiredColumns - satisfiedColumns is non-empty.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
around lines 120-123, the current check only tests
metadataExpression.isPresent() which does not guarantee that all configured
required columns were actually referenced; change the flow so the filter-to-KQL
conversion also returns (or exposes via ClpExpression) the set of metadata/data
columns that were satisfied, then in this location compute missing =
requiredColumns - satisfiedColumns and throw
PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "...") whenever missing is
non-empty; implement the minimal API change by extending ClpFilterToKqlConverter
to return both the converted expression and the Set<String> of satisfied column
names (or augment ClpExpression to expose satisfiedColumns), update callers
accordingly, and use that satisfied set here to enforce requiredColumns instead
of relying solely on metadataExpression.isPresent().
It's enforced in the converter.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check ClpFilterToKqlConverter for required column enforcement
cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java | grep -A 20 -B 5 "required"
Length of output: 1243
🏁 Script executed:
#!/bin/bash
# Check ClpMySqlSplitMetadataExpressionConverter for required column enforcement
cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java | grep -A 30 "transform\|required"
Length of output: 4969
🏁 Script executed:
#!/bin/bash
# Check how ClpMySqlSplitMetadataExpressionConverter is used in ClpMySqlSplitProvider
cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java | grep -A 40 -B 10 "ClpMySqlSplitMetadataExpressionConverter"
Length of output: 2667
@wraymo You're correct! I can see that required column enforcement happens in ClpMySqlSplitMetadataExpressionConverter. The architecture uses a two-tier validation approach:
- Early fail-fast check (lines 120-123 in ClpComputePushDown): If no metadata expression was generated at all but required columns are configured, it fails immediately with "required filters must be specified."
- Fine-grained enforcement (in ClpMySqlSplitMetadataExpressionConverter.transform()): When a metadata expression IS present, this converter tracks which required columns were actually referenced via the seenRequired set. After transformation, it computes missing = requiredColumns - seenRequired and throws CLP_MANDATORY_COLUMN_NOT_IN_FILTER with "Missing required filter columns: " + missing if any required columns weren't referenced.
🧬 Code Graph Analysis Results
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
- Summary: Enforcement of mandatory columns is performed directly in the CLP push-down converter/rewriter logic.
- Where to look:
  - In ClpComputePushDown.Rewriter.visitTableScan: if there are any required columns for the target CLP table (as reported by metadataConfig.getRequiredColumns(schemaTableName)) and the scan does not have those columns specified, a PrestoException is thrown with error code CLP_MANDATORY_COLUMN_NOT_IN_FILTER and message "required filters must be specified".
  - In ClpComputePushDown.Rewriter.processFilter: after converting the filter to a KQL/metadata expression, if no metadata expression is produced (i.e., nothing was pushed down to metadata) and there are still required columns for the target table, the same PrestoException is thrown with the same message, enforcing that required columns must be included in the filter/request.
- Key behavior:
  - Throws PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified") when mandatory metadata/filter columns are not present in the input filter table scan.
  - This enforcement ensures that queries cannot proceed without pushing down the required metadata-filter columns.
- Related snippet for context (non-code excerpt):
  - getRequiredColumns(...) is invoked to determine the set of columns deemed mandatory for filtering, from ClpSplitMetadataConfig, which aggregates global/schema/table-level rules.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java
- Summary: getRequiredColumns(SchemaTableName name)
- Description: Returns the set of metadata columns that are marked as required for filtering for the given table. This set is used by the enforcement logic in ClpComputePushDown to ensure mandatory filters are supplied.
- Parameters:
- name: SchemaTableName of the target table.
- Returns:
- Set of column names that are required for filtering.
- Exceptions:
- None described; it relies on the merged configuration built from global/schema/table rules.
- Important implementation details:
- Walks the merged TableConfig (including hierarchical global/schema/table configs) and collects rule.column for rules where rule.required is true.
- Used by CLP push-down logic to decide whether to error if required columns are not part of the filter.
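The collection step described above ("collects rule.column for rules where rule.required is true") can be sketched as a simple stream pipeline. `Rule` here is a hypothetical stand-in for the connector's FilterRule:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class RequiredColumnCollector {
    static final class Rule {
        final String column;
        final boolean required;
        Rule(String column, boolean required) { this.column = column; this.required = required; }
    }

    // Walk the merged rules and keep only the columns flagged as required,
    // preserving their declaration order.
    public static Set<String> requiredColumns(List<Rule> mergedRules) {
        return mergedRules.stream()
                .filter(r -> r.required)
                .map(r -> r.column)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(new Rule("msg.timestamp", true), new Rule("level", false));
        System.out.println(requiredColumns(rules)); // [msg.timestamp]
    }
}
```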
🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.
| "default": { | ||
| "metaColumns": { | ||
| "partition_date": { | ||
| "type": "DATE", | ||
| "exposedAs": "partition_date", | ||
| "description": "Logical partition of the data file" | ||
| } | ||
| }, | ||
| }, | ||
| "default.table_1": { |
Fix the JSON example’s dangling comma.
The example namespace object under "default" ends with a trailing comma after metaColumns, leaving no subsequent property. That’s invalid JSON and will break anyone who copies it verbatim. Drop that comma (or add the intended filterRules block) so the sample parses cleanly.
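For reference, a minimal corrected form of the "default" namespace object — with the trailing comma removed — parses as valid JSON (the filterRules block the docs may intend is omitted here rather than guessed at):

```json
{
  "default": {
    "metaColumns": {
      "partition_date": {
        "type": "DATE",
        "exposedAs": "partition_date",
        "description": "Logical partition of the data file"
      }
    }
  }
}
```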
🤖 Prompt for AI Agents
In presto-docs/src/main/sphinx/connector/clp.rst around lines 235 to 244, the
JSON example for the "default" namespace contains a trailing comma after the
"metaColumns" object which results in invalid JSON; remove the dangling comma
(or alternatively add the intended "filterRules" property if that was omitted)
so the "default" object is valid JSON and the sample parses correctly.
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
📚 Learning: 2025-11-08T02:05:14.212Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
🧬 Code graph analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (4)
95-105: LGTM! Required column validation is correct. The method properly resets state before each transformation and validates that all required columns appear in the expression.
140-156: LGTM! Special form handling is correct. The method properly handles AND, OR, and IS_NULL with appropriate parenthesization and throws for unsupported forms.
158-180: LGTM! Constant formatting handles decimals and strings correctly. The SQL escaping and decimal handling (both short and long forms) are implemented correctly.
182-188: LGTM! Variable reference handling correctly tracks exposed names. The method properly tracks exposed names for required-column validation and returns original names for SQL output.
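As a concrete illustration of the short-decimal formatting noted above: Presto represents short decimals as unscaled longs, so rendering a SQL literal means reapplying the type's scale. The sketch below is a standalone example, not the connector's code; the class and method names are hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalLiteralSketch
{
    // Render a short decimal (unscaled long + scale) as a plain SQL literal.
    static String formatShortDecimal(long unscaledValue, int scale)
    {
        return new BigDecimal(BigInteger.valueOf(unscaledValue), scale).toPlainString();
    }

    public static void main(String[] args)
    {
        // A DECIMAL(5, 2) value of 123.45 is stored as the unscaled long 12345
        System.out.println(formatShortDecimal(12345, 2)); // 123.45
        System.out.println(formatShortDecimal(-7, 3));    // -0.007
    }
}
```

`toPlainString` is used instead of `toString` so that large or small values are never rendered in scientific notation, which SQL would reject.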
```java
public ClpMySqlSplitMetadataExpressionConverter(
        FunctionMetadataManager functionManager,
        StandardFunctionResolution functionResolution,
        Map<String, String> exposedToOriginal,
        Map<String, Map<String, String>> dataToMetadataBounds,
        Set<String> requiredColumns)
{
    this.functionManager = requireNonNull(functionManager, "functionManager is null");
    this.functionResolution = requireNonNull(functionResolution, "functionResolution is null");
    this.exposedToOriginal = exposedToOriginal;
    this.dataToMetadataBounds = dataToMetadataBounds;
    this.requiredColumns = requiredColumns;
}
```
🛠️ Refactor suggestion | 🟠 Major
Add null checks for remaining constructor parameters.
The constructor validates functionManager and functionResolution with requireNonNull, but the Map and Set parameters (exposedToOriginal, dataToMetadataBounds, requiredColumns) are not checked. If any are null, the code will fail with an NPE during traversal rather than at construction time.
Apply this diff to add the missing checks:
```diff
 public ClpMySqlSplitMetadataExpressionConverter(
         FunctionMetadataManager functionManager,
         StandardFunctionResolution functionResolution,
         Map<String, String> exposedToOriginal,
         Map<String, Map<String, String>> dataToMetadataBounds,
         Set<String> requiredColumns)
 {
     this.functionManager = requireNonNull(functionManager, "functionManager is null");
     this.functionResolution = requireNonNull(functionResolution, "functionResolution is null");
-    this.exposedToOriginal = exposedToOriginal;
-    this.dataToMetadataBounds = dataToMetadataBounds;
-    this.requiredColumns = requiredColumns;
+    this.exposedToOriginal = requireNonNull(exposedToOriginal, "exposedToOriginal is null");
+    this.dataToMetadataBounds = requireNonNull(dataToMetadataBounds, "dataToMetadataBounds is null");
+    this.requiredColumns = requireNonNull(requiredColumns, "requiredColumns is null");
 }
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```java
public ClpMySqlSplitMetadataExpressionConverter(
        FunctionMetadataManager functionManager,
        StandardFunctionResolution functionResolution,
        Map<String, String> exposedToOriginal,
        Map<String, Map<String, String>> dataToMetadataBounds,
        Set<String> requiredColumns)
{
    this.functionManager = requireNonNull(functionManager, "functionManager is null");
    this.functionResolution = requireNonNull(functionResolution, "functionResolution is null");
    this.exposedToOriginal = requireNonNull(exposedToOriginal, "exposedToOriginal is null");
    this.dataToMetadataBounds = requireNonNull(dataToMetadataBounds, "dataToMetadataBounds is null");
    this.requiredColumns = requireNonNull(requiredColumns, "requiredColumns is null");
}
```
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
around lines 71 to 83, the constructor currently null-checks functionManager and
functionResolution but not exposedToOriginal, dataToMetadataBounds, and
requiredColumns; add requireNonNull checks for each of those three parameters
(e.g., this.exposedToOriginal = requireNonNull(exposedToOriginal,
"exposedToOriginal is null"); and similarly for dataToMetadataBounds and
requiredColumns) so that any null is detected at construction time and clear
error messages are provided.
```java
if (operatorType == OperatorType.NEGATION) {
    String value = node.getArguments().get(0).accept(this, null);
    return "-" + value;
}
```
🧹 Nitpick | 🔵 Trivial
Consider adding parentheses around negation operand for robustness.
While the NEGATION operator typically applies to simple numeric values or column references, wrapping the operand in parentheses would ensure correctness if it ever receives a complex expression.
Apply this diff for defensive coding:
```diff
 if (operatorType == OperatorType.NEGATION) {
     String value = node.getArguments().get(0).accept(this, null);
-    return "-" + value;
+    return "-(" + value + ")";
 }
```
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
around lines 119 to 122, the NEGATION branch returns "-" + value without
parentheses which can produce incorrect SQL for complex operands; wrap the
operand in parentheses by returning "-(" + value + ")" so the negation always
applies to the entire expression.
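To see why the parentheses matter, consider an operand that is itself rendered as a compound SQL expression. The standalone sketch below (hypothetical helper names, not the converter's actual methods) contrasts the two renderings.

```java
public class NegationParens
{
    // Naive rendering: prepend "-" to whatever SQL the operand produced.
    static String naiveNegate(String operandSql)
    {
        return "-" + operandSql;
    }

    // Defensive rendering: parenthesize so negation binds to the whole operand.
    static String safeNegate(String operandSql)
    {
        return "-(" + operandSql + ")";
    }

    public static void main(String[] args)
    {
        String operand = "a - b";
        // "-a - b" parses as (-a) - b, which is not equal to -(a - b)
        System.out.println(naiveNegate(operand)); // -a - b
        System.out.println(safeNegate(operand));  // -(a - b)
    }
}
```

For a simple column reference or numeric literal both renderings are equivalent, which is why the bug is easy to miss; it only surfaces once a compound operand reaches the NEGATION branch.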
```java
private String rewriteComparisonWithBounds(String variableName, OperatorType operator, String literal)
{
    String original = exposedToOriginal.getOrDefault(variableName, variableName);
    Map<String, String> bounds = dataToMetadataBounds.get(original);
    if (bounds == null) {
        return null;
    }
```
🛠️ Refactor suggestion | 🟠 Major
Remove redundant mapping—variableName is already the original name.
Line 209 attempts to map variableName from exposed to original, but variableName is already the original name (returned by visitVariableReference at line 187). Since exposedToOriginal maps exposed→original, this lookup will fail and fall back to the default, making line 209 a no-op. This is confusing and could break if the calling logic changes.
Apply this diff to clarify the code:
```diff
 private String rewriteComparisonWithBounds(String variableName, OperatorType operator, String literal)
 {
-    String original = exposedToOriginal.getOrDefault(variableName, variableName);
-    Map<String, String> bounds = dataToMetadataBounds.get(original);
+    // variableName is already the original name (from visitVariableReference)
+    Map<String, String> bounds = dataToMetadataBounds.get(variableName);
     if (bounds == null) {
         return null;
     }
```
Alternatively, rename the parameter to originalName for clarity:
```diff
-private String rewriteComparisonWithBounds(String variableName, OperatorType operator, String literal)
+private String rewriteComparisonWithBounds(String originalName, OperatorType operator, String literal)
 {
-    String original = exposedToOriginal.getOrDefault(variableName, variableName);
-    Map<String, String> bounds = dataToMetadataBounds.get(original);
+    Map<String, String> bounds = dataToMetadataBounds.get(originalName);
```
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
around lines 207 to 214, remove the redundant exposedToOriginal lookup for
variableName (it is already the original name from visitVariableReference) —
either replace the lookup with direct use of variableName when fetching bounds
from dataToMetadataBounds, or rename the method parameter to originalName and
use that variable directly; update any references and javadoc/comments
accordingly to avoid confusion.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
PrestoException(18-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (:presto-docs)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: prestissimo-worker-images-build
```java
if (type instanceof DecimalType && value instanceof Long) {
    DecimalType decimalType = (DecimalType) type;
    BigDecimal decimalValue = new BigDecimal(BigInteger.valueOf((Long) value), decimalType.getScale());
    return decimalValue.toPlainString();
}

return value.toString();
```
Fix literal formatting for DATE/TIMESTAMP types.
visitConstant currently falls back to value.toString() for every non-Slice value. For DATE, TIME, and TIMESTAMP RowExpressions, Presto encodes the literal as a long (days or micros since the epoch). With the current logic, a predicate like metadata_date >= DATE '2024-01-01' is emitted as metadata_date >= 19723, which MySQL interprets numerically and yields the wrong archives. This breaks metadata pushdown for every non-decimal temporal column.
Please materialize proper SQL literals before returning. For example, convert DateType longs via LocalDate.ofEpochDay, and format TimestampType/TimeType values using an appropriate precision-aware helper (e.g., Instant/LocalDateTime plus DateTimeFormatter), quoting the result so MySQL receives ISO strings.
```diff
 if (value instanceof Slice) {
     ...
 }
+
+if (type instanceof DateType && value instanceof Long) {
+    LocalDate date = LocalDate.ofEpochDay((Long) value);
+    return format("DATE '%s'", date);
+}
+if (type instanceof TimestampType && value instanceof Long) {
+    long micros = (Long) value;
+    LocalDateTime dateTime = LocalDateTime.ofEpochSecond(
+            floorDiv(micros, MICROSECONDS_PER_SECOND),
+            (int) floorMod(micros, MICROSECONDS_PER_SECOND) * NANOSECONDS_PER_MICROSECOND,
+            ZoneOffset.UTC);
+    return format("TIMESTAMP '%s'", dateTime);
+}
+// add similar handling for TIME / TIMESTAMP WITH TIME ZONE as needed
+
 return value.toString();
```
Make sure to add the necessary java.time imports and constants when implementing this. Until these conversions are in place, temporal metadata filters will continue to misfire.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
around lines 172 to 179, the code falls back to value.toString() for non-Slice
literals which causes DATE/TIME/TIMESTAMP longs to be emitted as raw numbers;
update visitConstant to detect DateType, TimeType and TimestampType and
materialize SQL literals: for DateType convert the long (days) with
LocalDate.ofEpochDay(...) and format as an ISO date string quoted for SQL; for
TimestampType and TimeType treat the long as microseconds (or the Presto-encoded
unit), convert to Instant/LocalTime/LocalDateTime using
Instant.ofEpochSecond(micros/1_000_000, (micros%1_000_000)*1000) or appropriate
epoch/micro->nano math, format with a precision-aware DateTimeFormatter, and
return the quoted ISO string; keep existing Decimal handling intact; add
necessary java.time and java.time.format imports and any constants (e.g.,
MICROS_PER_SECOND, NANOS_PER_MICRO) used for conversions.
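For reference, the epoch-encoded longs can be materialized into SQL literals along these lines. This is a standalone sketch under the assumption that DATE is encoded as epoch days and TIMESTAMP as epoch microseconds; the class and helper names are hypothetical, and real code would branch on the Presto Type of the constant.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TemporalLiteralSketch
{
    private static final long MICROSECONDS_PER_SECOND = 1_000_000L;
    private static final int NANOSECONDS_PER_MICROSECOND = 1_000;
    private static final DateTimeFormatter TIMESTAMP_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS");

    // DATE literal: the long is days since the epoch.
    static String formatDate(long epochDays)
    {
        return String.format("DATE '%s'", LocalDate.ofEpochDay(epochDays));
    }

    // TIMESTAMP literal: the long is assumed to be microseconds since the epoch.
    static String formatTimestamp(long epochMicros)
    {
        LocalDateTime dateTime = LocalDateTime.ofEpochSecond(
                Math.floorDiv(epochMicros, MICROSECONDS_PER_SECOND),
                (int) Math.floorMod(epochMicros, MICROSECONDS_PER_SECOND) * NANOSECONDS_PER_MICROSECOND,
                ZoneOffset.UTC);
        return String.format("TIMESTAMP '%s'", TIMESTAMP_FORMAT.format(dateTime));
    }

    public static void main(String[] args)
    {
        System.out.println(formatDate(19723));  // DATE '2024-01-01'
        System.out.println(formatTimestamp(0)); // TIMESTAMP '1970-01-01 00:00:00.000000'
    }
}
```

Math.floorDiv/floorMod are used rather than plain division so that negative microsecond values (timestamps before 1970) round toward the correct second.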
Description
Background
To support metadata filtering, we maintain a split metadata configuration file that specifies available metadata columns and the possible mappings between data columns and metadata columns. However, the current design has several major issues:
- The columnName field is inconsistent. It is intended to represent a metadata column name, but when a range mapping is included, it actually refers to a data column name, which is confusing.
To address these issues and provide more robust support for metadata column projection, we redesigned the configuration format and reimplemented metadata expression extraction and validation using Presto RowExpression instead of plain strings. Validation and SQL translation are now handled in the SplitProvider, since split filtering logic is inherently tied to the split provider rather than the plan optimizer.
Implementation
This PR includes:
New metadata config format
- Documented in presto-docs/src/main/sphinx/connector/clp.rst.
- Added ClpSplitMetadataConfig.
- Added ClpMySqlSplitMetadataExpressionConverter, which translates RowExpression trees to SQL filters for metadata pushdown.
Refactored ClpFilterToKqlConverter
- Produces a RowExpression for metadata pushdown instead of strings.
- Handles AND and OR.
- Supports VARCHAR metadata filters.
Updated ClpTableLayoutHandle and ClpExpression
- Both now carry a RowExpression instead of strings.
Removed old metadata-related classes
Added split metadata columns in ClpMySqlMetadataProvider::listColumnHandles.
Removed ClpMetadataDbSetup.
Integrated with ClpMySqlSplitProvider listSplits.
Updated tests and documentation
Checklist
breaking change.
Validation performed
And here are the query results.
In this setup, we have two archives containing identical data except for the begin_timestamp metadata column. From the results of the first two queries, we can see that metadata filters are correctly extracted and applied during split filtering. In the third query, the system successfully verifies whether a required metadata column is present in the filter.
Summary by CodeRabbit
New Features
Bug Fixes / API
Tests
Documentation
Chores