

@wraymo wraymo commented Nov 7, 2025

Description

Background

To support metadata filtering, we maintain a split metadata configuration file that specifies available metadata columns and the possible mappings between data columns and metadata columns. However, the current design has several major issues:

  • The columnName field is inconsistent. It is intended to represent a metadata column name, but when a range mapping is included, it actually refers to a data column name, which is confusing.
  • Metadata columns lack explicit type definitions.
  • The logic for determining whether a column should be included in a filter is tightly coupled with metadata column definitions, and column name remapping (e.g., exposing a different metadata column name to Presto) is not supported.
  • Metadata column matching and data-to-metadata mapping rely on string manipulation and regex replacement, which is error-prone and fragile.

To address these issues and provide more robust support for metadata column projection, we redesigned the configuration format and reimplemented metadata expression extraction and validation using Presto RowExpression instead of plain strings. Validation and SQL translation are now handled in the SplitProvider, since split filtering logic is inherently tied to the split provider rather than the plan optimizer.
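As a rough sketch of this rewrite (not the connector's actual code, which walks Presto RowExpression trees and resolves functions through the SPI), a comparison on a range-mapped data column can be turned into a predicate over an archive's metadata bound columns. Every name below (RangeMapping, rewriteGreaterOrEqual, the timestamp mapping) is hypothetical:

```java
import java.util.Map;

public class MetadataPredicateSketch
{
    // Hypothetical range mapping: a data column maps to a pair of metadata
    // columns holding the per-archive lower and upper bounds of its values.
    record RangeMapping(String lowerBound, String upperBound) {}

    static final Map<String, RangeMapping> RANGE_MAPPINGS =
            Map.of("timestamp", new RangeMapping("begin_timestamp", "end_timestamp"));

    // Rewrites `dataColumn >= literal` into a predicate that keeps every
    // archive whose value range could still contain a matching row: the
    // archive survives if its upper bound is at least the literal.
    static String rewriteGreaterOrEqual(String dataColumn, long literal)
    {
        RangeMapping mapping = RANGE_MAPPINGS.get(dataColumn);
        if (mapping == null) {
            throw new IllegalArgumentException("no range mapping for " + dataColumn);
        }
        return mapping.upperBound() + " >= " + literal;
    }

    public static void main(String[] args)
    {
        System.out.println(rewriteGreaterOrEqual("timestamp", 0));
    }
}
```

Other comparison operators follow the same pattern (for example, `<=` would test the lower bound instead), and the real converter additionally formats literals such as decimals without losing scale.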

Implementation

This PR includes:

  • New metadata config format

    • Defines metadata columns, exposed names, and range mappings.
    • Supports hierarchical namespaces (global → schema → table).
    • Documented in the updated presto-docs/src/main/sphinx/connector/clp.rst.
  • ClpSplitMetadataConfig

    • Loads and interprets the new config.
    • Provides access to metadata columns and range mappings.
  • ClpMySqlSplitMetadataExpressionConverter

    • Converts RowExpression trees to SQL filters for metadata pushdown.
    • Validates required columns and maps data columns to metadata bounds.
  • Refactored ClpFilterToKqlConverter

    • Uses RowExpression for metadata pushdown instead of strings.
    • Separates metadata pushdown from KQL pushdown for the different cases and throws exceptions when metadata pushdown is unsupported.
    • Updates the logic for handling AND and OR.
    • Fixes two issues:
      • Adds support for non-VARCHAR metadata filters.
      • Correctly handles decimal constants (previously, scale information was lost).
  • Updated ClpTableLayoutHandle and ClpExpression

    • Store RowExpression instead of strings.
  • Removed old metadata-related classes

    • Replaced with the new config-driven logic.
  • Added split metadata columns in ClpMySqlMetadataProvider::listColumnHandles.

  • Removed ClpMetadataDbSetUp.

    • Replaced by the mock database, which has only been partially refactored so far; after the full refactoring, the whole class can be removed.
  • Integrated with ClpMySqlSplitProvider

    • Uses the new converter in listSplits.
  • Updated tests and documentation

    • All tests updated to the new format.
    • Added the new metadata config format documentation.
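For example, under the hierarchical namespace rules above (global → schema → table), a config might be layered like the following sketch. The schema and table names (`default`, the `default.orders` table-level key) and the merge semantics are illustrative assumptions, not taken from this PR; see the updated clp.rst for the exact key format:

```json
{
  "": {
    "metaColumns": { "size": { "type": "BIGINT" } }
  },
  "default": {
    "metaColumns": { "begin_timestamp": { "type": "BIGINT" } }
  },
  "default.orders": {
    "metaColumns": { "end_timestamp": { "type": "BIGINT" } },
    "filterRules": [
      { "column": "begin_timestamp", "required": true }
    ]
  }
}
```

Assuming table-level entries merge on top of the schema and global levels, a query on `default.orders` would see `size`, `begin_timestamp`, and `end_timestamp` as metadata columns, with `begin_timestamp` required in filters.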

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and it has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Passed all unit tests
  • Ran an end-to-end test with the following split metadata config file:
{
  "": {
    "metaColumns": {
      "size": {
        "type": "BIGINT"
      }
    }
  },
  "default": {
    "metaColumns": {
      "begin_timestamp": {
        "type": "BIGINT"
      },
      "end_timestamp": {
        "type": "BIGINT"
      }
    },
    "filterRules": [
      {
        "column": "begin_timestamp",
        "required": true,
        "reason": "Full scan would be too expensive without timestamp filtering."
      }
    ]
  }
}

And here are the query results:

presto:default> select to_unixtime(from_iso8601_timestamp(timestamp)), clp_get_json_string() from default where service = 'user-service' and begin_timestamp >= 0 order by timestamp;
     _col0     |                                                                                    _col1                                                                                     
---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1.747148401E9 | {"timestamp":"2025-05-13T15:00:01Z","level":"INFO","service":"user-service","event":{"action":"login","status":"success"},"user":{"id":"12345","email":"[email protected]"}} 
 1.747148401E9 | {"timestamp":"2025-05-13T15:00:01Z","level":"INFO","service":"user-service","event":{"action":"login","status":"success"},"user":{"id":"12345","email":"[email protected]"}} 
 1.747148406E9 | {"timestamp":"2025-05-13T15:00:06Z","level":"INFO","service":"user-service","event":{"action":"profile_update"},"user":{"id":"67890","name":"Jane Doe"}}                     
 1.747148406E9 | {"timestamp":"2025-05-13T15:00:06Z","level":"INFO","service":"user-service","event":{"action":"profile_update"},"user":{"id":"67890","name":"Jane Doe"}}                     
 1.747148411E9 | {"timestamp":"2025-05-13T15:00:11Z","level":"WARN","service":"user-service","event":{"action":"multiple_login_attempts"},"user":{"id":"67890"},"meta":{"attempts":5}}        
 1.747148411E9 | {"timestamp":"2025-05-13T15:00:11Z","level":"WARN","service":"user-service","event":{"action":"multiple_login_attempts"},"user":{"id":"67890"},"meta":{"attempts":5}}        
 1.747148414E9 | {"timestamp":"2025-05-13T15:00:14Z","level":"INFO","service":"user-service","event":{"action":"password_change"},"user":{"id":"12345"}}                                      
 1.747148414E9 | {"timestamp":"2025-05-13T15:00:14Z","level":"INFO","service":"user-service","event":{"action":"password_change"},"user":{"id":"12345"}}                                      
  1.74714842E9 | {"timestamp":"2025-05-13T15:00:20Z","level":"INFO","service":"user-service","event":{"action":"logout"},"user":{"id":"12345"}}                                               
  1.74714842E9 | {"timestamp":"2025-05-13T15:00:20Z","level":"INFO","service":"user-service","event":{"action":"logout"},"user":{"id":"12345"}}                                               
(10 rows)

Query 20251110_183238_00009_pfytp, FINISHED, 1 node
Splits: 4 total, 4 done (100.00%)
[Latency: client-side: 143ms, server-side: 124ms] [10 rows, 0B] [80 rows/s, 0B/s]

presto:default> select to_unixtime(from_iso8601_timestamp(timestamp)), clp_get_json_string() from default where service = 'user-service' and begin_timestamp > 0 order by timestamp;
     _col0     |                                                                                    _col1                                                                                     
---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1.747148401E9 | {"timestamp":"2025-05-13T15:00:01Z","level":"INFO","service":"user-service","event":{"action":"login","status":"success"},"user":{"id":"12345","email":"[email protected]"}} 
 1.747148406E9 | {"timestamp":"2025-05-13T15:00:06Z","level":"INFO","service":"user-service","event":{"action":"profile_update"},"user":{"id":"67890","name":"Jane Doe"}}                     
 1.747148411E9 | {"timestamp":"2025-05-13T15:00:11Z","level":"WARN","service":"user-service","event":{"action":"multiple_login_attempts"},"user":{"id":"67890"},"meta":{"attempts":5}}        
 1.747148414E9 | {"timestamp":"2025-05-13T15:00:14Z","level":"INFO","service":"user-service","event":{"action":"password_change"},"user":{"id":"12345"}}                                      
  1.74714842E9 | {"timestamp":"2025-05-13T15:00:20Z","level":"INFO","service":"user-service","event":{"action":"logout"},"user":{"id":"12345"}}                                               
(5 rows)

Query 20251110_183245_00010_pfytp, FINISHED, 1 node
Splits: 3 total, 3 done (100.00%)
[Latency: client-side: 123ms, server-side: 107ms] [5 rows, 0B] [46 rows/s, 0B/s]

presto:default> select to_unixtime(from_iso8601_timestamp(timestamp)), clp_get_json_string() from default where service = 'user-service'  order by timestamp;
Query 20251110_183255_00011_pfytp failed: required filters must be specified

In this setup, we have two archives containing identical data except for the begin_timestamp metadata column. The results of the first two queries show that metadata filters are correctly extracted and applied during split filtering. The third query is correctly rejected because the required begin_timestamp metadata column is missing from its filter.
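The failure in the third query comes from the required-column check. A minimal sketch of that check, with hypothetical names (the actual validation operates on RowExpression columns inside the split provider):

```java
import java.util.List;
import java.util.Set;

public class RequiredFilterCheck
{
    // Throws if any column marked `required: true` in the filter rules is
    // absent from the set of columns referenced by the query's filter.
    static void validate(Set<String> columnsInFilter, List<String> requiredColumns)
    {
        for (String required : requiredColumns) {
            if (!columnsInFilter.contains(required)) {
                throw new IllegalStateException("required filters must be specified");
            }
        }
    }

    public static void main(String[] args)
    {
        // Mirrors the third query above: the filter only references `service`,
        // so the required `begin_timestamp` filter is missing and this throws.
        validate(Set.of("service"), List.of("begin_timestamp"));
    }
}
```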

Summary by CodeRabbit

  • New Features

    • Introduced split‑metadata configuration (namespaced meta‑columns, exposed mappings, range bounds) and metadata‑expression pushdown; metadata columns are now returned with table listings.
    • Added MySQL metadata expression converter to translate metadata expressions into WHERE clauses for pushdown.
  • Bug Fixes / API

    • Replaced legacy split‑filter API with the split‑metadata model; updated error codes and stricter required‑column validation.
  • Tests

    • Added metadata config/converter tests, updated pushdown tests, removed obsolete split‑filter tests.
  • Documentation

    • Updated connector docs and config examples to the split‑metadata format.
  • Chores

    • Removed an unused JSON library dependency.


coderabbitai bot commented Nov 7, 2025

Walkthrough

Replaces the split-filter abstraction with a structured split-metadata configuration: adds ClpSplitMetadataConfig and expression-based metadata handling, removes ClpSplitFilterProvider and related files, updates connector/module/optimizer/split-provider/metadata wiring, converts metadata SQL strings to RowExpression-based flows, updates tests/resources and docs, and removes jackson-core from the POM.

Changes

Cohort / File(s) Summary
Dependency Management
presto-clp/pom.xml
Removed jackson-core dependency; other Jackson deps remain.
Configuration
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java
Replaced splitFilterConfig with splitMetadataConfigPath config key and removed SplitFilterProviderType.
Connector & DI Wiring
ClpConnector / ClpConnectorFactory / ClpModule / ClpPlanOptimizerProvider
presto-clp/src/main/java/.../ClpConnector.java, .../ClpConnectorFactory.java, .../ClpModule.java, .../ClpPlanOptimizerProvider.java
Replaced ClpSplitFilterProvider usage with ClpSplitMetadataConfig; updated imports, fields, constructors and bindings; duplicate early TypeManager binding added in factory; module binds ClpSplitMetadataConfig.
Error Codes
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
Removed split-filter-specific codes; added CLP_SPLIT_METADATA_CONFIG_NOT_FOUND and CLP_MANDATORY_COLUMN_NOT_IN_FILTER (reused numeric codes).
Table/Layout & Expression Types
presto-clp/src/main/java/.../ClpTableLayoutHandle.java, presto-clp/src/main/java/.../ClpExpression.java
Replaced Optional metadata SQL fields with Optional metadataExpression; updated constructors, getters, equals/hashCode/toString and package for ClpExpression.
Optimizer / Pushdown
presto-clp/src/main/java/.../optimization/ClpComputePushDown.java, presto-clp/src/main/java/.../optimization/ClpFilterToKqlConverter.java, presto-clp/src/main/java/.../optimization/ClpPlanOptimizerProvider.java
Swapped split-filter provider for ClpSplitMetadataConfig; refactored pushdown to produce/consume metadata RowExpression; extended filter-to-KQL conversion to produce separate KQL and metadata expressions and accept data-columns-with-range-bounds; removed table-scope state.
Split Metadata Config (new)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java
New class parsing hierarchical JSON (global/schema/table), exposing metadata column types, required columns, exposed→original mappings, data columns with range bounds, and data→metadata range mappings; DI-enabled.
MySQL Split Provider & Converter
presto-clp/src/main/java/.../split/ClpMySqlSplitProvider.java, presto-clp/src/main/java/.../split/ClpMySqlSplitMetadataExpressionConverter.java
ClpMySqlSplitProvider now injected with FunctionMetadataManager, StandardFunctionResolution, and ClpSplitMetadataConfig; when metadataExpression present, builds SQL predicate via new converter which rewrites comparisons to metadata bound columns and formats literals (including decimals).
Removed Filter Abstraction
presto-clp/src/main/java/.../split/filter/ClpSplitFilterProvider.java, .../ClpMySqlSplitFilterProvider.java, .../ClpSplitFilterConfig.java, .../ClpSplitFilterConfigCustomOptionsDeserializer.java
Entire split-filter provider hierarchy, MySQL-specific remapper, split-filter config classes and custom-options deserializer removed.
Metadata Provider
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
Now accepts ClpSplitMetadataConfig and appends configured metadata columns to returned column handles.
Documentation
presto-docs/src/main/sphinx/connector/clp.rst
Reworked docs from split-filter to split-metadata terminology and examples; updated config key to clp.split-metadata-config-path and described namespaces/metaColumns/filterRules model.
Tests — updated / added
presto-clp/src/test/java/.../TestClpFilterToKql.java, .../TestClpSplit.java, .../TestClpUdfRewriter.java, .../TestClpSplitMetadataConfig.java, .../TestClpMySqlSplitMetadataExpressionConverter.java, .../TestClpQueryBase.java
Tests updated to use metadata expressions and ClpSplitMetadataConfig; added tests for split-metadata config and MySQL expression converter; helper to build RowExpression with TypeProvider added.
Tests — removed
presto-clp/src/test/java/.../split/filter/TestClpMySqlSplitFilterConfig.java, .../TestClpSplitFilterConfigCommon.java, presto-clp/src/test/java/.../ClpMetadataDbSetUp.java
Removed legacy split-filter tests and embedded metadata DB setup utility.
Test resources
presto-clp/src/test/resources/test-mysql-split-filter.json (deleted), presto-clp/src/test/resources/test-mysql-split-metadata.json (new)
Replaced split-filter JSON resource with new split-metadata JSON describing metaColumns and filterRules.

Sequence Diagram(s)

sequenceDiagram
  participant Planner as QueryPlanner
  participant Optimizer as ClpComputePushDown
  participant Config as ClpSplitMetadataConfig
  participant Converter as ClpFilterToKqlConverter
  participant SplitProv as ClpMySqlSplitProvider
  participant DB as MySQL

  Note over Planner,Optimizer: Planning phase
  Planner->>Optimizer: plan with filter RowExpression
  Optimizer->>Config: request metadata columns & required columns
  alt metadata pushdown allowed
    Optimizer->>Converter: convert filter -> {KQL?, metadataExpression}
    Converter-->>Optimizer: returns KQL and metadataExpression
    Optimizer->>SplitProv: include metadataExpression in layout
    SplitProv->>Converter: transform metadataExpression -> SQL predicate
    Converter-->>SplitProv: SQL predicate
    SplitProv->>DB: execute archive query + predicate
  else fallback
    Optimizer->>Planner: leave filter as remaining expression
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • ClpComputePushDown.java — control-flow refactor, creation of layout handles with RowExpression and required-column checks.
  • ClpFilterToKqlConverter.java — complex expression translation, dual KQL vs metadata-expression flows, decimal and literal formatting.
  • ClpSplitMetadataConfig.java & ClpMySqlSplitMetadataExpressionConverter.java — JSON parsing/merging precedence, mappings, bound rewriting and SQL generation correctness.
  • Tests & resources — ensure new JSON schema, TypeProvider-RowExpression plumbing and new tests cover previous behaviors removed with split-filter.

Possibly related issues

Possibly related PRs

Suggested reviewers

  • kirkrodrigues

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 22.58%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): the title accurately summarizes the main changes: metadata expression extraction/validation improvements, split metadata config format updates, and split metadata columns added during resolution.
  • Description check (✅ Passed): the description is comprehensive and well-structured, covering background, implementation details, validation, and all required template sections with proper documentation and examples.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e58a60 and 8137399.

📒 Files selected for processing (2)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (5 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
Repo: y-scope/presto PR: 15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-11-08T02:05:14.212Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-09-12T14:47:20.734Z
Learnt from: wraymo
Repo: y-scope/presto PR: 64
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:419-483
Timestamp: 2025-09-12T14:47:20.734Z
Learning: In Presto, SQL validation (including type compatibility checks) happens before reaching the connector optimizer, so invalid SQL with type mismatches will be rejected upstream rather than needing validation in the connector-specific optimization code.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
🧬 Code graph analysis (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (2)
presto-native-execution/presto_cpp/presto_protocol/connector/clp/presto_protocol_clp.h (6)
  • facebook (25-32)
  • facebook (36-50)
  • facebook (51-87)
  • facebook (56-66)
  • facebook (67-76)
  • facebook (77-87)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: prestissimo-worker-images-build


@wraymo wraymo changed the title from "Metadata refactor" to "refactor: Improve metadata expression extraction and validation; Update metadata config format." Nov 7, 2025
@wraymo wraymo marked this pull request as ready for review November 8, 2025 01:37

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)

56-61: Remove duplicate TypeManager binding

We now bind TypeManager twice in the same module (binder.bind(TypeManager.class).toInstance(...) appears on both Line 57 and Line 60). Guice rejects duplicate explicit bindings, so the connector bootstrap will fail with a CreationException as soon as this path is executed. Drop one of the bindings (keep the existing one at Line 60, or the new one, but not both) so the injector can be created successfully.

                 binder.bind(FunctionMetadataManager.class).toInstance(context.getFunctionMetadataManager());
-                binder.bind(TypeManager.class).toInstance(context.getTypeManager());
                 binder.bind(NodeManager.class).toInstance(context.getNodeManager());
                 binder.bind(StandardFunctionResolution.class).toInstance(context.getStandardFunctionResolution());
                 binder.bind(TypeManager.class).toInstance(context.getTypeManager());
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41aa302 and a6b71f6.

📒 Files selected for processing (30)
  • presto-clp/pom.xml (0 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (5 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (16 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (0 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java (0 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java (0 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (0 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (0 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (5 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMySqlSplitMetadataExpressionConverter.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (4 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplitMetadataConfig.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (7 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java (0 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (0 hunks)
  • presto-clp/src/test/resources/test-mysql-split-filter.json (0 hunks)
  • presto-clp/src/test/resources/test-mysql-split-metadata.json (1 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (2 hunks)
💤 Files with no reviewable changes (9)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpMySqlSplitFilterConfig.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfigCustomOptionsDeserializer.java
  • presto-clp/pom.xml
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterConfig.java
  • presto-clp/src/test/resources/test-mysql-split-filter.json
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
Repo: y-scope/presto PR: 15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
  • presto-docs/src/main/sphinx/connector/clp.rst
📚 Learning: 2025-06-19T08:41:46.288Z
Learnt from: anlowee
Repo: y-scope/presto PR: 0
File: :0-0
Timestamp: 2025-06-19T08:41:46.288Z
Learning: In the y-scope/presto repository, prefer using Guava's Immutable collections over regular collections when possible.

Applied to files:

  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
📚 Learning: 2025-09-12T14:47:20.734Z
Learnt from: wraymo
Repo: y-scope/presto PR: 64
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:419-483
Timestamp: 2025-09-12T14:47:20.734Z
Learning: In Presto, SQL validation (including type compatibility checks) happens before reaching the connector optimizer, so invalid SQL with type mismatches will be rejected upstream rather than needing validation in the connector-specific optimization code.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
🧬 Code graph analysis (12)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplitMetadataConfig.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
  • ClpMySqlSplitMetadataExpressionConverter (60-245)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (4)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/mockdb/ClpMockMetadataDatabase.java (1)
  • ClpMockMetadataDatabase (49-288)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/mockdb/table/ColumnMetadataTableRows.java (1)
  • ColumnMetadataTableRows (30-63)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
  • ClpComputePushDown (44-157)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMySqlSplitMetadataExpressionConverter.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
  • ClpMySqlSplitMetadataExpressionConverter (60-245)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpSplit.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (1)
  • ClpExpression (29-102)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (1)
  • ClpFilterToKqlConverter (106-1071)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-186)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnector.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h (1)
  • RowExpression (394-416)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: conventional-commits
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (1)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)

118-121: LGTM!

The new helper method correctly delegates to the existing toRowExpression implementation and enables tests to provide custom TypeProvider instances. This addition supports the metadata-expression testing infrastructure described in the PR objectives.

Comment on lines 82 to 86
TableHandle tableHandle = node.getTable();
ClpTableHandle clpTableHandle = (ClpTableHandle) tableHandle.getConnectorHandle();
tableScopeSet.add(format("%s.%s", CONNECTOR_NAME, clpTableHandle.getSchemaTableName()));
if (!metadataConfig.getRequiredColumns(clpTableHandle.getSchemaTableName()).isEmpty()) {
throw new PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified");
}
@coderabbitai coderabbitai bot Nov 8, 2025

⚠️ Potential issue | 🔴 Critical

Don’t throw before metadata filters are processed.

visitTableScan is invoked while rewriting every table scan, including ones that still have a metadata filter sitting above them. Because the new guard executes before the filter is analysed, any table that lists required columns now fails even when the query provides the correct metadata filter. We need to defer the “required filters must be specified” check until after we’ve inspected the filter’s RowExpression (so genuine missing filters still fail, but valid ones pass).

Apply this diff to remove the premature throw:

-            if (!metadataConfig.getRequiredColumns(clpTableHandle.getSchemaTableName()).isEmpty()) {
-                throw new PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified");
-            }

Then, inside processFilter, enforce the requirement only when no metadata expression was produced:

@@
-            SchemaTableName schemaTableName = clpTableHandle.getSchemaTableName();
+            SchemaTableName schemaTableName = clpTableHandle.getSchemaTableName();
+            Set<String> requiredColumns = metadataConfig.getRequiredColumns(schemaTableName);
@@
-            Optional<RowExpression> metadataExpression = clpExpression.getMetadataExpression();
+            Optional<RowExpression> metadataExpression = clpExpression.getMetadataExpression();
+
+            if (!requiredColumns.isEmpty() && !metadataExpression.isPresent()) {
+                throw new PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified");
+            }

Don’t forget to add import java.util.Set; at the top. This keeps the enforcement intact for missing filters while allowing valid filtered queries to proceed.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
around lines 82-86, remove the premature PrestoException throw that checks
metadataConfig.getRequiredColumns(...) in visitTableScan so the check does not
run before the metadata filter is processed; then update processFilter to
enforce the "required filters must be specified" rule only when no metadata
expression was produced (i.e. when the metadata expression is absent/empty), and
add an import line for java.util.Set at the top of the file.

Author

There could be no filters. No filter should be treated as a failure if required columns are specified

Author

That's why we need to do it in visitTableScan


@wraymo, you're absolutely right! The check in visitTableScan is necessary and correct for the no-filter case.

The visitor pattern here works as follows:

  • When there's no FilterNode above the TableScanNode: visitTableScan is called, and the required-columns check appropriately throws an exception.
  • When there IS a FilterNode above the TableScanNode: visitFilter is called, which directly invokes processFilter on the source TableScanNode without delegating to visitTableScan. This means visitTableScan is never invoked in this path.

So the check in visitTableScan only fires when there's no filter, which is exactly when you want to reject queries that don't provide the required metadata filters. It doesn't interfere with queries that have filters because those go through the visitFilter → processFilter path instead.

CodeRabbit's concern about a "premature throw before metadata filters are processed" doesn't apply here—when a metadata filter exists in the plan, visitTableScan isn't called at all.


One minor observation (not blocking): processFilter could also add a check to ensure that when required columns exist and the filter produces no metadataExpression, an exception is thrown. This would catch the edge case where a filter exists but doesn't cover the required metadata columns. However, that might be a separate consideration beyond the scope of this specific discussion.
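The dispatch described above can be sketched with a toy visitor. This is a simplified stand-in for illustration only: PlanNode, TableScanNode, FilterNode, and Rewriter below are hypothetical minimal classes, not Presto's actual planner API.

```java
// Toy sketch of the dispatch described above. These classes are simplified
// stand-ins, not Presto's actual planner classes.
abstract class PlanNode {}

class TableScanNode extends PlanNode {}

class FilterNode extends PlanNode {
    final TableScanNode source;

    FilterNode(TableScanNode source) {
        this.source = source;
    }
}

class Rewriter {
    String visit(PlanNode node) {
        if (node instanceof FilterNode) {
            // A FilterNode above the scan: processFilter handles the source
            // directly, so visitTableScan never runs for that TableScanNode.
            return processFilter(((FilterNode) node).source);
        }
        // Bare scan with no filter: the only path where the
        // required-columns check in visitTableScan can fire.
        return visitTableScan((TableScanNode) node);
    }

    String visitTableScan(TableScanNode node) {
        return "required-columns checked";
    }

    String processFilter(TableScanNode source) {
        return "filter processed";
    }
}

class VisitorDemo {
    public static void main(String[] args) {
        Rewriter rewriter = new Rewriter();
        System.out.println(rewriter.visit(new FilterNode(new TableScanNode())));
        System.out.println(rewriter.visit(new TableScanNode()));
    }
}
```

With a filter present, only the processFilter path runs; the required-columns check is reached solely for the bare-scan case, matching the behaviour discussed above.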


🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.


@wraymo, you're absolutely right! I see the distinction now.

The visitor flow means:

  • Query with no filter → visitTableScan is called → your check catches it ✓
  • Query with filter → visitFilter is called → processFilter handles it directly → visitTableScan is NOT called for that TableScanNode

So the check in visitTableScan specifically guards against queries that have zero filters when required columns are configured. That's the correct placement for that scenario.

One follow-up question: for queries that do have a filter but the filter doesn't reference the required columns (doesn't produce a metadataExpression), where is that validated? I see that processFilter extracts the metadataExpression but doesn't appear to check whether required columns were satisfied. Is that validation happening later in ClpMySqlSplitMetadataExpressionConverter.transform() during split generation, or should we add an explicit check in processFilter to fail fast at plan time?


🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Comment on lines +835 to 884
+        List<String> pushdownKql = new ArrayList<>();
+        List<RowExpression> metadataExpressions = new ArrayList<>();
+
+        boolean hasUnpushable = false;
+
         for (RowExpression argument : node.getArguments()) {
-            ClpExpression expression = argument.accept(this, null);
-            // Note: It is possible in the future that an expression cannot be pushed down as a KQL query, but can be
-            // pushed down as a metadata SQL query.
-            if (expression.getRemainingExpression().isPresent() || !expression.getPushDownExpression().isPresent()) {
-                allPushedDown = false;
-                continue;
+            ClpExpression expr = argument.accept(this, null);
+
+            boolean hasRemaining = expr.getRemainingExpression().isPresent();
+            boolean hasKql = expr.getPushDownExpression().isPresent();
+            boolean hasMeta = expr.getMetadataExpression().isPresent();
+
+            // If this arg cannot be pushed down at all, bail early
+            if (hasRemaining || (!hasKql && !hasMeta)) {
+                hasUnpushable = true;
+                break;
             }
-            queryBuilder.append(expression.getPushDownExpression().get());
-            queryBuilder.append(" OR ");
-            if (hasAllMetadataSql && expression.getMetadataSqlQuery().isPresent()) {
-                metadataQueryBuilder.append(expression.getMetadataSqlQuery().get());
-                metadataQueryBuilder.append(" OR ");
+
+            if (hasKql) {
+                pushdownKql.add(expr.getPushDownExpression().get());
             }
-            else {
-                hasAllMetadataSql = false;
+
+            if (hasMeta) {
+                metadataExpressions.add(expr.getMetadataExpression().get());
             }
         }
-        if (allPushedDown) {
-            // Remove the last " OR " from the query
-            return new ClpExpression(
-                    queryBuilder.substring(0, queryBuilder.length() - 4) + ")",
-                    hasAllMetadataSql ? metadataQueryBuilder.substring(0, metadataQueryBuilder.length() - 4) + ")" : null);
+
+        if (hasUnpushable) {
+            return new ClpExpression(node);
+        }
+
+        String combinedKql = null;
+        if (!pushdownKql.isEmpty()) {
+            combinedKql = "(" + String.join(" OR ", pushdownKql) + ")";
         }
-        return new ClpExpression(node);
+
+        // Only use metadata if every argument has metadata and none had KQL
+        RowExpression combinedMetadata = null;
+        if (metadataExpressions.size() == node.getArguments().size() && pushdownKql.isEmpty()) {
+            combinedMetadata = (metadataExpressions.size() == 1)
+                    ? metadataExpressions.get(0)
+                    : new SpecialFormExpression(
+                            node.getSourceLocation(),
+                            OR,
+                            BOOLEAN,
+                            metadataExpressions);
+        }
+
+        return new ClpExpression(combinedKql, combinedMetadata);
     }
@coderabbitai coderabbitai bot Nov 8, 2025

⚠️ Potential issue | 🔴 Critical

Do not drop metadata-only branches when mixing with KQL in OR.

When an OR clause contains one operand that is metadata-only (e.g., fare = 0) and another that produces KQL (e.g., city.Name LIKE 'b%'), this block treats the node as pushable yet the resulting ClpExpression only keeps the KQL branch—no metadata expression and no remaining expression. The metadata predicate is therefore lost, so rows satisfying fare = 0 but not the KQL branch are incorrectly filtered out. Please fall back to the original expression whenever OR mixes metadata-only and KQL-only operands.

Apply this diff to restore correctness:

-        boolean hasUnpushable = false;
+        boolean hasUnpushable = false;
+        boolean sawMetadataOnly = false;
+        boolean sawKqlOnly = false;
@@
-            if (hasRemaining || (!hasKql && !hasMeta)) {
+            if (hasRemaining || (!hasKql && !hasMeta)) {
                 hasUnpushable = true;
                 break;
             }
 
+            if (hasMeta && !hasKql) {
+                sawMetadataOnly = true;
+            }
+            if (hasKql && !hasMeta) {
+                sawKqlOnly = true;
+            }
+
             if (hasKql) {
                 pushdownKql.add(expr.getPushDownExpression().get());
             }
 
             if (hasMeta) {
                 metadataExpressions.add(expr.getMetadataExpression().get());
             }
         }
 
-        if (hasUnpushable) {
+        if (hasUnpushable || (sawMetadataOnly && sawKqlOnly)) {
             return new ClpExpression(node);
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
List<String> pushdownKql = new ArrayList<>();
List<RowExpression> metadataExpressions = new ArrayList<>();
boolean hasUnpushable = false;
boolean sawMetadataOnly = false;
boolean sawKqlOnly = false;
for (RowExpression argument : node.getArguments()) {
ClpExpression expr = argument.accept(this, null);
boolean hasRemaining = expr.getRemainingExpression().isPresent();
boolean hasKql = expr.getPushDownExpression().isPresent();
boolean hasMeta = expr.getMetadataExpression().isPresent();
// If this arg cannot be pushed down at all, bail early
if (hasRemaining || (!hasKql && !hasMeta)) {
hasUnpushable = true;
break;
}
if (hasMeta && !hasKql) {
sawMetadataOnly = true;
}
if (hasKql && !hasMeta) {
sawKqlOnly = true;
}
if (hasKql) {
pushdownKql.add(expr.getPushDownExpression().get());
}
if (hasMeta) {
metadataExpressions.add(expr.getMetadataExpression().get());
}
}
if (hasUnpushable || (sawMetadataOnly && sawKqlOnly)) {
return new ClpExpression(node);
}
String combinedKql = null;
if (!pushdownKql.isEmpty()) {
combinedKql = "(" + String.join(" OR ", pushdownKql) + ")";
}
// Only use metadata if every argument has metadata and none had KQL
RowExpression combinedMetadata = null;
if (metadataExpressions.size() == node.getArguments().size() && pushdownKql.isEmpty()) {
combinedMetadata = (metadataExpressions.size() == 1)
? metadataExpressions.get(0)
: new SpecialFormExpression(
node.getSourceLocation(),
OR,
BOOLEAN,
metadataExpressions);
}
return new ClpExpression(combinedKql, combinedMetadata);
}

Author

Yeah, because it's an OR, if we push down metadata in this case, it will turn it into an AND.


@wraymo, you're absolutely right about the root cause—pushing down both metadata and KQL predicates in an OR would apply them at different stages (metadata filters determine which archives to scan, while KQL filters within those archives), effectively turning the OR into an AND.

However, the current implementation has a correctness issue: when mixing metadata-only and KQL-only predicates in an OR, the metadata predicate is silently dropped.

For example, with WHERE fare = 0 OR city.Name LIKE 'b%':

  • fare = 0 (metadata-only) gets collected but not used
  • city.Name LIKE 'b%' (KQL-only) is pushed down
  • Result: only the KQL filter is applied, so rows satisfying fare = 0 but not the LIKE pattern would be incorrectly filtered out

This produces wrong results. The safe approach is to fall back to the original expression (no pushdown) when OR mixes metadata-only and KQL-only predicates, as CodeRabbit suggested. This ensures Presto evaluates the full OR condition correctly, even though it's less efficient.

Would you like to apply CodeRabbit's fix to prevent incorrect results, or is there a different strategy you have in mind for handling these mixed OR cases?
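The OR-becomes-AND hazard discussed above can be seen in a toy model. Everything below is invented for illustration: the archive layout, the Row record, and the column names stand in for CLP's archives and their metadata ranges, not the connector's real data model.

```java
import java.util.List;

// Toy model: each inner list is an "archive"; the metadata predicate is
// fare == 0 and the row-level (KQL-like) predicate is name LIKE 'b%'.
public class TwoStageOrDemo {
    record Row(int fare, String name) {}

    // Correct OR semantics, evaluated per row across all archives.
    static long correctOrCount(List<List<Row>> archives) {
        return archives.stream()
                .flatMap(List::stream)
                .filter(r -> r.fare() == 0 || r.name().startsWith("b"))
                .count();
    }

    // Wrong: prune archives by the metadata predicate first, then apply the
    // row predicate inside the survivors -- the OR has silently become an AND.
    static long pruneThenFilterCount(List<List<Row>> archives) {
        return archives.stream()
                .filter(a -> a.stream().anyMatch(r -> r.fare() == 0)) // archive selection
                .flatMap(List::stream)
                .filter(r -> r.name().startsWith("b"))                // row filtering
                .count();
    }

    public static void main(String[] args) {
        List<Row> archiveA = List.of(new Row(5, "boston"), new Row(7, "akron"));
        List<Row> archiveB = List.of(new Row(0, "zurich"));
        List<List<Row>> archives = List.of(archiveA, archiveB);
        System.out.println(correctOrCount(archives));       // 2: boston and zurich
        System.out.println(pruneThenFilterCount(archives)); // 0: OR became AND
    }
}
```

The two counts differ, which is why pushing both halves of a mixed OR down to different stages cannot be correct, and why the connector instead drops the metadata half and scans all archives.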


🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Author

fare is a metadata column; dropping it is the expected behaviour


@wraymo, thank you for clarifying! I understand now that dropping the metadata predicate in mixed OR scenarios is the intended design behavior.

To confirm my understanding: metadata columns (like fare in your example) are used for archive selection at the split/metadata filtering stage, while KQL predicates operate at the row level within archives. When you have fare = 0 OR city.Name LIKE 'b%', the OR semantics would require evaluating both predicates at the same level, but since metadata columns can't be evaluated at the row level, the expected behavior is to:

  1. Drop the metadata predicate (fare = 0)
  2. Push down only the KQL part (city.Name LIKE 'b%')
  3. Scan all archives (no metadata filtering) and apply the KQL filter

This is a trade-off where the connector prioritizes pushing down what it can (KQL) while accepting that metadata-based archive filtering isn't possible in this mixed OR scenario. The alternative would be to throw an error for such queries, but silently dropping the metadata predicate allows the query to proceed (albeit scanning more archives than optimal).

This design makes sense given the two-stage filtering architecture. Thanks for the clarification!


✏️ Learnings added
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.192Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Comment on lines +260 to +271
for (String ns : namespaces) {
TableConfig cfg = tableConfigs.get(ns);
if (cfg != null) {
merged.metaColumns.putAll(cfg.metaColumns);

for (FilterRule rule : cfg.filterRules) {
boolean exists = merged.filterRules.stream()
.anyMatch(r -> r.column.equals(rule.column));
if (!exists) {
merged.filterRules.add(rule);
}
}

⚠️ Potential issue | 🟠 Major

Allow table-level filter rules to override broader scopes.

This merge keeps the first occurrence of each rule (global) and ignores schema/table overrides, so a table-specific rule (e.g., marking a column required) never takes effect. That contradicts the documented precedence and breaks configuration correctness. Please let later scopes replace earlier ones.

Apply this diff to honour overrides:

-        for (String ns : namespaces) {
-            TableConfig cfg = tableConfigs.get(ns);
-            if (cfg != null) {
-                merged.metaColumns.putAll(cfg.metaColumns);
-
-                for (FilterRule rule : cfg.filterRules) {
-                    boolean exists = merged.filterRules.stream()
-                            .anyMatch(r -> r.column.equals(rule.column));
-                    if (!exists) {
-                        merged.filterRules.add(rule);
-                    }
-                }
-            }
-        }
+        Map<String, FilterRule> mergedRules = new LinkedHashMap<>();
+        for (String ns : namespaces) {
+            TableConfig cfg = tableConfigs.get(ns);
+            if (cfg != null) {
+                merged.metaColumns.putAll(cfg.metaColumns);
+                for (FilterRule rule : cfg.filterRules) {
+                    mergedRules.put(rule.column, rule);
+                }
+            }
+        }
+        merged.filterRules.addAll(mergedRules.values());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Map<String, FilterRule> mergedRules = new LinkedHashMap<>();
for (String ns : namespaces) {
TableConfig cfg = tableConfigs.get(ns);
if (cfg != null) {
merged.metaColumns.putAll(cfg.metaColumns);
for (FilterRule rule : cfg.filterRules) {
mergedRules.put(rule.column, rule);
}
}
}
merged.filterRules.addAll(mergedRules.values());
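The last-scope-wins merge suggested above relies on Map.put replacing earlier entries while LinkedHashMap preserves first-insertion order. A minimal sketch of that precedence, with FilterRule reduced to a hypothetical record rather than the connector's real class:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScopeMergeDemo {
    record FilterRule(String column, boolean required) {}

    // Merge scopes in global -> schema -> table order. put() replaces any
    // earlier mapping for the same column, so the most specific scope wins.
    static Map<String, FilterRule> merge(List<List<FilterRule>> scopes) {
        Map<String, FilterRule> merged = new LinkedHashMap<>();
        for (List<FilterRule> scope : scopes) {
            for (FilterRule rule : scope) {
                merged.put(rule.column(), rule);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<FilterRule> global = List.of(new FilterRule("ts", false));
        List<FilterRule> table = List.of(new FilterRule("ts", true)); // table-level override
        Map<String, FilterRule> merged = merge(List.of(global, table));
        System.out.println(merged.get("ts").required()); // true: the override took effect
    }
}
```

The first-occurrence-wins loop in the original code inverts this: global rules shadow table-level ones, which is what the review flags.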

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a6b71f6 and d988cbc.

📒 Files selected for processing (1)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
📚 Learning: 2025-11-08T02:05:14.192Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.192Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
🧬 Code graph analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestissimo-worker-images-build
  • GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (4)

94-104: LGTM: Required-column tracking correctly reset.

The addition of seenRequired.clear() on line 96 ensures that each transform invocation independently validates required columns without contamination from prior calls.


156-178: LGTM: Constant handling is correct and secure.

The method properly handles:

  • SQL string escaping (line 168: replace("'", "''")),
  • Decimal scale preservation for both Slice and Long representations, and
  • Safe conversion of numeric types.
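The quote-doubling escape noted above can be sketched in isolation. This is an illustrative helper, not the converter's actual method; it mirrors the `replace("'", "''")` call the review references:

```java
// Minimal sketch of SQL single-quote escaping: embedded quotes are
// doubled, then the whole value is wrapped in single quotes. The class
// and method names are illustrative, not from the connector.
public class SqlStringEscape {
    static String toSqlLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        System.out.println(toSqlLiteral("O'Brien")); // 'O''Brien'
    }
}
```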

180-186: LGTM: Column reference handling is correct.

The method correctly:

  • Tracks the exposed name for required-column validation (line 184), and
  • Returns the original MySQL column name for SQL generation (line 185).

188-245: LGTM: Range-bound rewriting logic is correct.

The method correctly rewrites comparisons using metadata bounds:

  • Greater-than operators check the upper bound,
  • Less-than operators check the lower bound, and
  • Equality rewrites to a range containment check: (lower <= literal) AND (upper >= literal).

The fallback cases (lines 233-238) reasonably handle single-bound scenarios.
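The three rewrite shapes summarized above can be sketched as pure string builders. The bound column names (`begin_ts`, `end_ts`) and the method names are assumptions for illustration; the real converter operates on `RowExpression` trees rather than strings:

```java
// Sketch of the range-bound rewrites described in the review:
// greater-than checks the upper bound, less-than checks the lower
// bound, and equality becomes a range containment check.
public class RangeRewriteSketch {
    // "col > literal": a split may match only if its upper bound exceeds the literal.
    static String greaterThan(String upperBound, String literal) {
        return upperBound + " > " + literal;
    }

    // "col < literal": a split may match only if its lower bound is below the literal.
    static String lessThan(String lowerBound, String literal) {
        return lowerBound + " < " + literal;
    }

    // "col = literal": the literal must fall within [lower, upper].
    static String equalsRewrite(String lowerBound, String upperBound, String literal) {
        return "(" + lowerBound + " <= " + literal + " AND " + upperBound + " >= " + literal + ")";
    }

    public static void main(String[] args) {
        System.out.println(greaterThan("end_ts", "100"));
        System.out.println(lessThan("begin_ts", "100"));
        System.out.println(equalsRewrite("begin_ts", "end_ts", "100"));
    }
}
```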

@jackluo923 jackluo923 force-pushed the release-0.293-clp-connector branch 2 times, most recently from e5bfe33 to ddfca65 on November 10, 2025 13:38
@jackluo923 jackluo923 force-pushed the release-0.293-clp-connector branch from ddfca65 to bfa18ff on November 10, 2025 15:07
@wraymo wraymo changed the title refactor: Improve metadata expression extraction and validation; Update metadata config format. refactor: Improve metadata expression extraction and validation; Update split metadata config format; Add split metadata columns during metadata resolution. Nov 10, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d988cbc and fbc6498.

📒 Files selected for processing (4)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (5 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (4 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (6 hunks)
  • presto-docs/src/main/sphinx/connector/clp.rst (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Applied to files:

  • presto-docs/src/main/sphinx/connector/clp.rst
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
Repo: y-scope/presto PR: 15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java
📚 Learning: 2025-11-08T02:05:14.212Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-09-12T14:47:20.734Z
Learnt from: wraymo
Repo: y-scope/presto PR: 64
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:419-483
Timestamp: 2025-09-12T14:47:20.734Z
Learning: In Presto, SQL validation (including type compatibility checks) happens before reaching the connector optimizer, so invalid SQL with type mismatches will be rejected upstream rather than needing validation in the connector-specific optimization code.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
🧬 Code graph analysis (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMySqlMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMetadata.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java (1)
  • ClpSplitMetadataConfig (58-277)

Comment on lines 120 to 123
if (!metadataExpression.isPresent() &&
        !metadataConfig.getRequiredColumns(clpTableHandle.getSchemaTableName()).isEmpty()) {
    throw new PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified");
}

@coderabbitai coderabbitai bot Nov 10, 2025


⚠️ Potential issue | 🔴 Critical

Required columns still go unenforced.

metadataExpression.isPresent() only tells us that some metadata predicate was extracted; it does not guarantee that every entry in metadataConfig.getRequiredColumns(...) appeared in the filter. For example, with a config that marks msg.timestamp as required, a query filtering only on a different metadata column (e.g., level = 5) yields a metadata expression, so this guard passes and the query is accepted even though the required column is absent. That regresses the “must supply timestamp” contract and re-opens the expensive full-scan path the config was supposed to forbid.

Please track the specific metadata/data columns referenced during conversion and compare them against the required set, e.g. have ClpFilterToKqlConverter return the set of satisfied columns (or extend ClpExpression to expose it) and reject whenever requiredColumns - satisfiedColumns is non-empty.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
around lines 120-123, the current check only tests
metadataExpression.isPresent() which does not guarantee that all configured
required columns were actually referenced; change the flow so the filter-to-KQL
conversion also returns (or exposes via ClpExpression) the set of metadata/data
columns that were satisfied, then in this location compute missing =
requiredColumns - satisfiedColumns and throw
PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "...") whenever missing is
non-empty; implement the minimal API change by extending ClpFilterToKqlConverter
to return both the converted expression and the Set<String> of satisfied column
names (or augment ClpExpression to expose satisfiedColumns), update callers
accordingly, and use that satisfied set here to enforce requiredColumns instead
of relying solely on metadataExpression.isPresent().

wraymo (Author) replied:

It's enforced in the converter.

coderabbitai bot replied:

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check ClpFilterToKqlConverter for required column enforcement
cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java | grep -A 20 -B 5 "required"

Length of output: 1243


🏁 Script executed:

#!/bin/bash
# Check ClpMySqlSplitMetadataExpressionConverter for required column enforcement
cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java | grep -A 30 "transform\|required"

Length of output: 4969


🏁 Script executed:

#!/bin/bash
# Check how ClpMySqlSplitMetadataExpressionConverter is used in ClpMySqlSplitProvider
cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java | grep -A 40 -B 10 "ClpMySqlSplitMetadataExpressionConverter"

Length of output: 2667


@wraymo You're correct! I can see that required column enforcement happens in ClpMySqlSplitMetadataExpressionConverter. The architecture uses a two-tier validation approach:

  1. Early fail-fast check (lines 120-123 in ClpComputePushDown): If no metadata expression was generated at all but required columns are configured, it fails immediately with "required filters must be specified."

  2. Fine-grained enforcement (in ClpMySqlSplitMetadataExpressionConverter.transform()): When a metadata expression IS present, this converter tracks which required columns were actually referenced via the seenRequired set. After transformation, it computes missing = requiredColumns - seenRequired and throws CLP_MANDATORY_COLUMN_NOT_IN_FILTER with "Missing required filter columns: " + missing if any required columns weren't referenced.

This means the scenario CodeRabbit flagged (filtering on one metadata column while missing a different required metadata column) is caught downstream in ClpMySqlSplitProvider when it calls converter.transform(). The enforcement is sound—it's just split between the optimizer layer (coarse check) and the split provider layer (detailed validation).
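The second-tier check described above, computing `missing = requiredColumns - seenRequired` after the expression walk, can be sketched as a small set difference. The class and method names here are illustrative, not the converter's actual API:

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the fine-grained required-column check: after traversing the
// expression, compare the configured required set against the columns
// actually referenced, and fail if any are missing.
public class RequiredColumnCheck {
    static Set<String> missing(Set<String> required, Set<String> seen) {
        Set<String> missing = new HashSet<>(required);
        missing.removeAll(seen);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> required = Set.of("msg.timestamp");
        Set<String> seen = Set.of("level"); // filter touched a different metadata column
        Set<String> gap = missing(required, seen);
        if (!gap.isEmpty()) {
            // In the real converter this is where
            // PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, ...) is thrown.
            System.out.println("Missing required filter columns: " + gap);
        }
    }
}
```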


🧬 Code Graph Analysis Results

presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java

  • Summary: Enforcement of mandatory columns is performed directly in the CLP push-down converter/rewriter logic.

    • Where to look:
      • In ClpComputePushDown.Rewriter.visitTableScan:
        • If there are any required columns for the target CLP table (as reported by metadataConfig.getRequiredColumns(schemaTableName)) and the scan does not have those columns specified, a PrestoException is thrown with error code CLP_MANDATORY_COLUMN_NOT_IN_FILTER and message "required filters must be specified".
      • In ClpComputePushDown.Rewriter.processFilter:
        • After converting the filter to a KQL/metadata expression, if no metadata expression is produced (i.e., nothing was pushed down to metadata) and there are still required columns for the target table, the same PrestoException is thrown with the same message, enforcing that required columns must be included in the filter/request.
    • Key behavior:
      • Throws PrestoException(CLP_MANDATORY_COLUMN_NOT_IN_FILTER, "required filters must be specified") when mandatory metadata/filter columns are not present in the input filter table scan.
      • This enforcement ensures that queries cannot proceed without pushing down the required metadata-filter columns.
  • Related snippet for context (non-code excerpt):

    • getRequiredColumns(...) is invoked to determine the set of columns deemed mandatory for filtering, from ClpSplitMetadataConfig, which aggregates global/schema/table-level rules.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpSplitMetadataConfig.java

  • Summary: getRequiredColumns(SchemaTableName name)
    • Description: Returns the set of metadata columns that are marked as required for filtering for the given table. This set is used by the enforcement logic in ClpComputePushDown to ensure mandatory filters are supplied.
    • Parameters:
      • name: SchemaTableName of the target table.
    • Returns:
      • Set of column names that are required for filtering.
    • Exceptions:
      • None described; it relies on the merged configuration built from global/schema/table rules.
    • Important implementation details:
      • Walks the merged TableConfig (including hierarchical global/schema/table configs) and collects rule.column for rules where rule.required is true.
      • Used by CLP push-down logic to decide whether to error if required columns are not part of the filter.

🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.

Comment on lines +235 to +244
"default": {
"metaColumns": {
"partition_date": {
"type": "DATE",
"exposedAs": "partition_date",
"description": "Logical partition of the data file"
}
},
},
"default.table_1": {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix the JSON example’s dangling comma.

The example namespace object under "default" ends with a trailing comma after metaColumns, leaving no subsequent property. That’s invalid JSON and will break anyone who copies it verbatim. Drop that comma (or add the intended filterRules block) so the sample parses cleanly.

🤖 Prompt for AI Agents
In presto-docs/src/main/sphinx/connector/clp.rst around lines 235 to 244, the
JSON example for the "default" namespace contains a trailing comma after the
"metaColumns" object which results in invalid JSON; remove the dangling comma
(or alternatively add the intended "filterRules" property if that was omitted)
so the "default" object is valid JSON and the sample parses correctly.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbc6498 and f2db72c.

📒 Files selected for processing (1)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
📚 Learning: 2025-11-08T02:05:14.212Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
🧬 Code graph analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: prestissimo-worker-images-build
🔇 Additional comments (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (4)

95-105: LGTM! Required column validation is correct.

The method properly resets state before each transformation and validates that all required columns appear in the expression.


140-156: LGTM! Special form handling is correct.

The method properly handles AND, OR, and IS_NULL with appropriate parenthesization and throws for unsupported forms.


158-180: LGTM! Constant formatting handles decimals and strings correctly.

The SQL escaping and decimal handling (both short and long forms) are implemented correctly.


182-188: LGTM! Variable reference handling correctly tracks exposed names.

The method properly tracks exposed names for required-column validation and returns original names for SQL output.

Comment on lines +71 to +83
public ClpMySqlSplitMetadataExpressionConverter(
        FunctionMetadataManager functionManager,
        StandardFunctionResolution functionResolution,
        Map<String, String> exposedToOriginal,
        Map<String, Map<String, String>> dataToMetadataBounds,
        Set<String> requiredColumns)
{
    this.functionManager = requireNonNull(functionManager, "functionManager is null");
    this.functionResolution = requireNonNull(functionResolution, "functionResolution is null");
    this.exposedToOriginal = exposedToOriginal;
    this.dataToMetadataBounds = dataToMetadataBounds;
    this.requiredColumns = requiredColumns;
}

🛠️ Refactor suggestion | 🟠 Major

Add null checks for remaining constructor parameters.

The constructor validates functionManager and functionResolution with requireNonNull, but the Map and Set parameters (exposedToOriginal, dataToMetadataBounds, requiredColumns) are not checked. If any are null, the code will fail with an NPE during traversal rather than at construction time.

Apply this diff to add the missing checks:

     public ClpMySqlSplitMetadataExpressionConverter(
             FunctionMetadataManager functionManager,
             StandardFunctionResolution functionResolution,
             Map<String, String> exposedToOriginal,
             Map<String, Map<String, String>> dataToMetadataBounds,
             Set<String> requiredColumns)
     {
         this.functionManager = requireNonNull(functionManager, "functionManager is null");
         this.functionResolution = requireNonNull(functionResolution, "functionResolution is null");
-        this.exposedToOriginal = exposedToOriginal;
-        this.dataToMetadataBounds = dataToMetadataBounds;
-        this.requiredColumns = requiredColumns;
+        this.exposedToOriginal = requireNonNull(exposedToOriginal, "exposedToOriginal is null");
+        this.dataToMetadataBounds = requireNonNull(dataToMetadataBounds, "dataToMetadataBounds is null");
+        this.requiredColumns = requireNonNull(requiredColumns, "requiredColumns is null");
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
public ClpMySqlSplitMetadataExpressionConverter(
        FunctionMetadataManager functionManager,
        StandardFunctionResolution functionResolution,
        Map<String, String> exposedToOriginal,
        Map<String, Map<String, String>> dataToMetadataBounds,
        Set<String> requiredColumns)
{
    this.functionManager = requireNonNull(functionManager, "functionManager is null");
    this.functionResolution = requireNonNull(functionResolution, "functionResolution is null");
    this.exposedToOriginal = requireNonNull(exposedToOriginal, "exposedToOriginal is null");
    this.dataToMetadataBounds = requireNonNull(dataToMetadataBounds, "dataToMetadataBounds is null");
    this.requiredColumns = requireNonNull(requiredColumns, "requiredColumns is null");
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
around lines 71 to 83, the constructor currently null-checks functionManager and
functionResolution but not exposedToOriginal, dataToMetadataBounds, and
requiredColumns; add requireNonNull checks for each of those three parameters
(e.g., this.exposedToOriginal = requireNonNull(exposedToOriginal,
"exposedToOriginal is null"); and similarly for dataToMetadataBounds and
requiredColumns) so that any null is detected at construction time and clear
error messages are provided.

Comment on lines +119 to +122
if (operatorType == OperatorType.NEGATION) {
    String value = node.getArguments().get(0).accept(this, null);
    return "-" + value;
}

🧹 Nitpick | 🔵 Trivial

Consider adding parentheses around negation operand for robustness.

While the NEGATION operator typically applies to simple numeric values or column references, wrapping the operand in parentheses would ensure correctness if it ever receives a complex expression.

Apply this diff for defensive coding:

             if (operatorType == OperatorType.NEGATION) {
                 String value = node.getArguments().get(0).accept(this, null);
-                return "-" + value;
+                return "-(" + value + ")";
             }
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
around lines 119 to 122, the NEGATION branch returns "-" + value without
parentheses which can produce incorrect SQL for complex operands; wrap the
operand in parentheses by returning "-(" + value + ")" so the negation always
applies to the entire expression.

Comment on lines +207 to +214
private String rewriteComparisonWithBounds(String variableName, OperatorType operator, String literal)
{
    String original = exposedToOriginal.getOrDefault(variableName, variableName);
    Map<String, String> bounds = dataToMetadataBounds.get(original);
    if (bounds == null) {
        return null;
    }


🛠️ Refactor suggestion | 🟠 Major

Remove redundant mapping—variableName is already the original name.

Line 209 attempts to map variableName from exposed to original, but variableName is already the original name (returned by visitVariableReference at line 187). Since exposedToOriginal maps exposed→original, this lookup will fail and fall back to the default, making line 209 a no-op. This is confusing and could break if the calling logic changes.

Apply this diff to clarify the code:

     private String rewriteComparisonWithBounds(String variableName, OperatorType operator, String literal)
     {
-        String original = exposedToOriginal.getOrDefault(variableName, variableName);
-        Map<String, String> bounds = dataToMetadataBounds.get(original);
+        // variableName is already the original name (from visitVariableReference)
+        Map<String, String> bounds = dataToMetadataBounds.get(variableName);
         if (bounds == null) {
             return null;
         }

Alternatively, rename the parameter to originalName for clarity:

-    private String rewriteComparisonWithBounds(String variableName, OperatorType operator, String literal)
+    private String rewriteComparisonWithBounds(String originalName, OperatorType operator, String literal)
     {
-        String original = exposedToOriginal.getOrDefault(variableName, variableName);
-        Map<String, String> bounds = dataToMetadataBounds.get(original);
+        Map<String, String> bounds = dataToMetadataBounds.get(originalName);
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
around lines 207 to 214, remove the redundant exposedToOriginal lookup for
variableName (it is already the original name from visitVariableReference) —
either replace the lookup with direct use of variableName when fetching bounds
from dataToMetadataBounds, or rename the method parameter to originalName and
use that variable directly; update any references and javadoc/comments
accordingly to avoid confusion.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f2db72c and 5e58a60.

📒 Files selected for processing (1)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
📚 Learning: 2025-11-08T02:05:14.212Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
🧬 Code graph analysis (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java (1)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (20)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (:presto-docs)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: prestissimo-worker-images-build

Comment on lines +172 to +179
if (type instanceof DecimalType && value instanceof Long) {
DecimalType decimalType = (DecimalType) type;
BigDecimal decimalValue = new BigDecimal(BigInteger.valueOf((Long) value), decimalType.getScale());
return decimalValue.toPlainString();
}

return value.toString();
}
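The decimal branch in the quoted snippet can be exercised in isolation. This is a standalone sketch under one assumption taken from the snippet itself: a short `DECIMAL` literal arrives as an unscaled `long` plus a scale from `DecimalType`, so `DECIMAL(10, 3)` value `12.345` is encoded as `12345` with scale `3`, and `BigDecimal.toPlainString()` reconstructs the plain literal without scientific notation.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Standalone illustration of the decimal formatting shown above:
// rebuild a plain decimal literal from an unscaled long and a scale.
public class DecimalLiteralDemo
{
    static String formatDecimal(long unscaled, int scale)
    {
        return new BigDecimal(BigInteger.valueOf(unscaled), scale).toPlainString();
    }

    public static void main(String[] args)
    {
        System.out.println(formatDecimal(12345L, 3)); // prints 12.345
        System.out.println(formatDecimal(-500L, 2));  // prints -5.00
    }
}
```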

⚠️ Potential issue | 🔴 Critical

Fix literal formatting for DATE/TIMESTAMP types.

visitConstant currently falls back to value.toString() for every non-Slice value. For DATE, TIME, and TIMESTAMP RowExpressions, Presto encodes the literal as a long (days or micros since epoch). With the current logic, a predicate like metadata_date >= DATE '2024-01-01' is emitted as metadata_date >= 19723, which MySQL interprets numerically and yields the wrong archives. This breaks metadata pushdown for every non-decimal temporal column.

Please materialize proper SQL literals before returning—for example, convert DateType longs via LocalDate.ofEpochDay, and format TimestampType/TimeType using the appropriate precision-aware helper (e.g., Instant/LocalDateTime plus DateTimeFormatter), quoting the result so MySQL receives ISO strings.

         if (value instanceof Slice) {
             ...
         }
+
+        if (type instanceof DateType && value instanceof Long) {
+            LocalDate date = LocalDate.ofEpochDay((Long) value);
+            return format("DATE '%s'", date);
+        }
+        if (type instanceof TimestampType && value instanceof Long) {
+            long micros = (Long) value;
+            LocalDateTime dateTime = LocalDateTime.ofEpochSecond(
+                    floorDiv(micros, MICROSECONDS_PER_SECOND),
+                    (int) floorMod(micros, MICROSECONDS_PER_SECOND) * NANOSECONDS_PER_MICROSECOND,
+                    ZoneOffset.UTC);
+            return format("TIMESTAMP '%s'", dateTime);
+        }
+        // add similar handling for TIME / TIMESTAMP WITH TIME ZONE as needed
+
         return value.toString();

Make sure to add the necessary java.time imports and constants when implementing this. Until these conversions are in place, temporal metadata filters will continue to misfire.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitMetadataExpressionConverter.java
around lines 172 to 179, the code falls back to value.toString() for non-Slice
literals which causes DATE/TIME/TIMESTAMP longs to be emitted as raw numbers;
update visitConstant to detect DateType, TimeType and TimestampType and
materialize SQL literals: for DateType convert the long (days) with
LocalDate.ofEpochDay(...) and format as an ISO date string quoted for SQL; for
TimestampType and TimeType treat the long as microseconds (or the Presto-encoded
unit), convert to Instant/LocalTime/LocalDateTime using
Instant.ofEpochSecond(micros/1_000_000, (micros%1_000_000)*1000) or appropriate
epoch/micro->nano math, format with a precision-aware DateTimeFormatter, and
return the quoted ISO string; keep existing Decimal handling intact; add
necessary java.time and java.time.format imports and any constants (e.g.,
MICROS_PER_SECOND, NANOS_PER_MICRO) used for conversions.
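The conversions the review suggests can be tried out as a self-contained program. This mirrors the suggested diff but is a sketch, not the actual fix: the class and method names are made up, and it assumes Presto's encodings (DATE as epoch days, TIMESTAMP as epoch microseconds, UTC). Like the suggested diff, it relies on `LocalDate`/`LocalDateTime` string formatting for the quoted literal.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Hypothetical sketch of the temporal literal conversions: turn
// Presto-encoded longs into quoted SQL literals instead of raw numbers.
public class TemporalLiteralDemo
{
    private static final long MICROSECONDS_PER_SECOND = 1_000_000L;
    private static final int NANOSECONDS_PER_MICROSECOND = 1_000;

    // DATE is encoded as days since the epoch.
    static String formatDateLiteral(long epochDays)
    {
        return String.format("DATE '%s'", LocalDate.ofEpochDay(epochDays));
    }

    // TIMESTAMP is assumed encoded as microseconds since the epoch (UTC).
    static String formatTimestampLiteral(long micros)
    {
        LocalDateTime dateTime = LocalDateTime.ofEpochSecond(
                Math.floorDiv(micros, MICROSECONDS_PER_SECOND),
                (int) Math.floorMod(micros, MICROSECONDS_PER_SECOND) * NANOSECONDS_PER_MICROSECOND,
                ZoneOffset.UTC);
        return String.format("TIMESTAMP '%s'", dateTime);
    }

    public static void main(String[] args)
    {
        System.out.println(formatDateLiteral(19723L));      // prints DATE '2024-01-01'
        System.out.println(formatTimestampLiteral(0L));     // prints TIMESTAMP '1970-01-01T00:00'
    }
}
```

Note that `LocalDateTime.toString()` uses a `T` separator and drops trailing zero seconds; a production version would likely use a precision-aware `DateTimeFormatter` matching the column's `TimestampType` precision.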
