-
Notifications
You must be signed in to change notification settings - Fork 13.7k
[FLINK-38425][table] Add rule to convert correlate node to vector search physical node #27121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…rch node in physical phase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LG overall. Left a few minor comments.
BTW, why don't we check column existence at https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/sql/ml/SqlVectorSearchTableFunction.java#L169?
<Resource name="optimized rel plan"> | ||
<![CDATA[ | ||
Calc(select=[a, b, c, d, rowtime, PROCTIME_MATERIALIZE(proctime) AS proctime, e, g, PROCTIME_MATERIALIZE(proctime0) AS proctime0, score]) | ||
+- VectorSearchTableFunction(table=[default_catalog.default_database.VectorTableWithProctime], joinType=[InnerJoin], columnToSearch=[g], columnToQuery=[d], topK=[10], select=[a, b, c, d, rowtime, proctime, e, g, PROCTIME() AS proctime, score]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proctime appear from both tables, is it a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. It just influences the plan display. But I try to fix the problem.
.../test/java/org/apache/flink/table/planner/plan/stream/sql/VectorSearchTableFunctionTest.java
Outdated
Show resolved
Hide resolved
if (parentNode != null) { | ||
throw new RelOptPlanner.CannotPlanException( | ||
String.format( | ||
"%s assumes calc to be the first node in parameter search_table, but it has a parent %s.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test to test this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can add a test here, because the relational structure we currently support is too limited — only scan and calc nodes are allowed.
However, it’s not possible to construct a calc node as a child of a scan node, or to build a tree like calc → calc → scan. This is because calc nodes are often merged or eliminated during optimization (see CalcRemoveRule and CalcMergeRule).
Because we check the existence at |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
What is the purpose of the change
Add a rule to convert a Correlate node into a VectorSearchPhysicalNode. Currently, this rule requires the right subtree to be simple — that is, it must consist of either a single TableScan, or a Calc operator followed by a TableScan.
Brief change log