[ES|QL] MMR Command - MMROperator and Execution#141324
[ES|QL] MMR Command - MMROperator and Execution#141324markjhoy wants to merge 37 commits intoelastic:mainfrom
Conversation
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/MMROperator.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Outdated
Show resolved
Hide resolved
| import static org.elasticsearch.xpack.esql.core.type.DataType.INTEGER; | ||
|
|
||
| public class MMR extends UnaryPlan implements TelemetryAware, ExecutesOn.Coordinator, PostAnalysisVerificationAware { | ||
| public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(LogicalPlan.class, "Mmr", MMR::new); |
There was a problem hiding this comment.
Although serialization is not strictly necessary here as this should only run on the coordinator, certain parent plans (e.g. LIMIT) may try and serialize any child plans - and without this (and the writeTo(), etc. implementations) will throw. It theoretically should not hurt anything to allow this to de/serialize.
There was a problem hiding this comment.
I don't think we need serialization, if there's any serialization happening that is a separate bug we need to dig into.
|
Note: seeing a weird issue with the output of the dense vectors where it's rounding the values. For example in one of the CSV tests, I'm getting a mismatch error due to the results: ... the input vectors appear to be rounding on input. Shouldn't affect the functionality of the operator here, but we need to look a bit further into this. |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
carlosdelest
left a comment
There was a problem hiding this comment.
Looks really good! 💯
Some questions about exception usage and releasing pages.
|
|
||
| @Override | ||
| protected Page createPage(int positionOffset, int length) { | ||
| length = Integer.min(length, remaining()); |
There was a problem hiding this comment.
I think it would be useful to add random nulls (rarely) and we check nothing breaks...
| FROM mmr_text_vector_keyword | ||
| | SORT keyword_field | ||
| | LIMIT 10 | ||
| | MMR [0.1, 0.2, 0.3]::dense_vector ON text_vector LIMIT 3 WITH { "lambda": 0.1 } |
There was a problem hiding this comment.
We could do implicit casting here! Check how it's done now for functions - we may expand that to commands, or just for this specific instance.
I realize that users will probably not use directly a vector as an argument, but the result of some function / expression. Just mentioning it here if you think it would be useful. In that case, we can do that on a separate PR.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/MMR.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/physical/MMRExec.java
Outdated
Show resolved
Hide resolved
...ck/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/planner/LocalExecutionPlanner.java
Show resolved
Hide resolved
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/MMROperator.java
Outdated
Show resolved
Hide resolved
| } | ||
|
|
||
| @Override | ||
| protected void onClose() { |
There was a problem hiding this comment.
Do we need to override? Is this not handled on the superclass?
There was a problem hiding this comment.
It's an abstract function of the superclass, called after any close() items performed there
There was a problem hiding this comment.
Yep - I mean, I think the exact same thing is being done on the superclass before invoking onClose()?
x-pack/plugin/esql/compute/src/main/java/org/elasticsearch/compute/operator/MMROperator.java
Outdated
Show resolved
Hide resolved
| import static org.elasticsearch.xpack.esql.core.type.DataType.INTEGER; | ||
|
|
||
| public class MMR extends UnaryPlan implements TelemetryAware, ExecutesOn.Coordinator, PostAnalysisVerificationAware { | ||
| public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(LogicalPlan.class, "Mmr", MMR::new); |
There was a problem hiding this comment.
I don't think we need serialization, if there's any serialization happening that is a separate bug we need to dig into.
| Integer scoreChannel = null; | ||
| for (Attribute input : mmr.inputSet()) { | ||
| if (input instanceof MetadataAttribute && MetadataAttribute.SCORE.equals(input.name())) { | ||
| scoreChannel = source.layout.get(input.id()).channel(); |
There was a problem hiding this comment.
do we need to pass in the scoreChannel?
when I have seen this initially, I thought that maybe we use these in the MMR formula, but that's not the case 😌
it seems like we only need this for RankDoc ? I wonder what's the side effect of always choosing a constant score of 1.0f when we create the RankDocs objects 🤔 . Can we do that and simplify the code here a bit so we don't need to pass _score here?
| return new MMROperator.Status(emitNanos, pagesReceived, pagesProcessed, rowsReceived, rowsEmitted); | ||
| } | ||
|
|
||
| public record Status(long emitNanos, int pagesReceived, int pagesProcessed, long rowsReceived, long rowsEmitted) |
There was a problem hiding this comment.
I wonder if we can abstract the Status too in CompleteInputCollectorOperator 🤔 .
Other abstract operators seem to do this:
Adds the operator and execution flow for the MMR command as well as pertinent CSV tests
Also performs a light refactoring to add a CompleteInputCollectorOperator base class for operations that require the full set of input pages to be input / processed before any output