Plan nodes

Goal

Document what the plan nodes mean.

This is useful because the plan is the interface between the planner, the optimizations, and the evaluator; thus, components need to be written to a common understanding of the plan's meaning.

The information would be better located in comments in the nodes' source files.

Common fields

PlanNodeId id
- Must be unique within a given plan tree
List outputSymbols

Nodes

In the order they appear in PlanNode.java

A check mark ✓ indicates that the description of the node is complete.

OutputNode ✓

Returns rows back to the user as a side effect. The columns sent to the user are those of its output symbol list.

Fields:

PlanNode source
List columnNames
- Labels for the returned columns, in a parallel list to the output symbols list

ProjectNode ✓

Creates new columns as functions of existing ones.

Fields:

PlanNode source
Assignments assignments
- maps Symbol to defining Expression

TableScanNode

Reads rows in from a table provided by a connector.

Fields:

TableHandle table
- TODO this should be some specification of the table, as opposed to a transaction-scoped execution object, so that unit tests don't have to create transactions just to make plans.
Optional tableLayout
- TODO Why is this optional?
Map<Symbol, ColumnHandle> assignments
- the columns that are being read from the Table
TupleDomain currentConstraint
- temporary field used during predicate pushdown?
Expression originalConstraint
- for human-readable output of pushed-down predicate

ValuesNode

Produces a literal sequence of rows.

Fields:

List<List> rows
- Columns are parallel to the output symbol list

AggregationNode

For each set of grouping keys:

For each bag of rows with common values of the grouping:
- Feed that bag of rows into a set of aggregators, with each producing a single column.

Produces a column for each aggregation, and for each of the keys used in any grouping set.

Implemented by HashAggregationOperator, or for an empty set of grouping keys, AggregationOperator.

TODO fix comment at the top of AggregationOperator.java
TODO why does HashAggregationOperator have getGlobalAggregationOutput, if global aggregation uses AggregationOperator instead?

HashAggregationOperator is implemented by either SpillableHashAggregationBuilder or InMemoryHashAggregationBuilder, depending on the Session's "spill_enabled" property. Presumably the future goal is for that to be determined by the planner and/or executor.

As a special case, for FINAL/SINGLE aggregation with an empty set of grouping keys, it produces a single row.

TODO why not handle that in the translation from SQL? Because it's implied by handling multiple grouping sets within a single plan node?

TODO what about null values of the grouping keys?

Fields:

PlanNode source
Map<Symbol, Aggregation> assignments
- FunctionCall call
- Signature signature
- Optional mask
  - A boolean used for implementing per-aggregate FILTER clauses (coming from a boolean projection), or for implementing DISTINCT aggregators (coming from a MarkDistinctNode)
List<List> groupingSets
Step step { PARTIAL, FINAL, INTERMEDIATE, SINGLE }
- An AggregationNode is either SINGLE, or a part of a sequence: PARTIAL, INTERMEDIATE*, FINAL
- Note that this is global across all aggregations done by this node
Optional hashSymbol
- If specified, the node will expect, and pass through, a column with a hash of the all of the keys used in any grouping set, for each input row. This can save work if the same hash can be used e.g. for aggregation and exchange.
  - TODO what about multiple grouping sets? Which input row's hash gets passed through for a less specific grouping set?
Optional groupIdSymbol
- If there are multiple grouping sets, this output column will produce a unique value for each of the grouping sets
  - TODO add verify check to AggregationNode ctor that this is present iff there are multiple grouping sets.

MarkDistinctNode ✓

Projects on the markerSymbol, which is a boolean which is true for only one row per unique combination of distinctSymbols.

Uses:

to create a masking flag for use with a DISTINCT aggregator.
in scalar correlated subqueries

Fields:

PlanNode source
Symbol markerSymbol
List distinctSymbols
Optional hashSymbol

FilterNode ✓

Only passes rows for which the boolean predicate is non-null and true.

Fields:

PlanNode source
Expression predicate

WindowNode

Fields:

PlanNode source
Set prePartitionedInputs
Specification specification
- List partitionBy
- List orderBy
- Map<Symbol, SortOrder> orderings
  - SortOrder is actually a collation (ascending/descending, and nulls first/last), and doesn't specify what to order on.
    - TODO Presumably this is actually just the collations, parallel to the orderBy list?
int preSortedOrderPrefix
Map<Symbol, Function> windowFunctions
- Function fields:
  - FunctionCall functionCall (type shared with simple row expressions)
    - QualifiedName name
    - Optional window
      - List partitionBy
      - Optional orderBy
        
        List sortItems
        
        Expression sortKey
        
        Ordering ordering {ASCENDING, DESCENDING}
        
        NullOrdering nullOrdering {FIRST, LAST, UNDEFINED}
      - Optional frame
        
        Type type {RANGE, ROWS}
        
        FrameBound start
        
        Type type {UNBOUNDED_PRECEDING, PRECEDING, CURRENT_ROW, FOLLOWING, UNBOUNDED_FOLLOWING}
        
        Optional value
        
        Optional originalValue
        
        Optional end
    - Optional filter
    - boolean distinct
    - List arguments
  - Signature signature
  - Frame frame
    - WindowFrame.Type type {RANGE, ROWS}
    - FrameBound.Type startType {UNBOUNDED_PRECEDING, PRECEDING, CURRENT_ROW, FOLLOWING, UNBOUNDED_FOLLOWING}
    - Optional startValue
    - FrameBound.Type endType
    - Optional endValue
Optional hashSymbol

RowNumberNode

foo

TopNRowNumberNode

foo

LimitNode

foo

DistinctLimitNode

foo

TopNNode ✓

Passes the top count rows according to the given order.

Fields:

PlanNode source
long count
List orderBy
Map<Symbol, SortOrder> orderings
- actually the collations of the columns in the orderBy
Step step { PARTIAL, FINAL, SINGLE }
- A TopNNode is either SINGLE, or a part of a sequence: PARTIAL, FINAL

SampleNode

foo

SortNode

foo

RemoteSourceNode

foo

JoinNode ✓

Implements either cross join or hash join or spatial join.

For hash join, it builds up the hash table from the right child, and then streams through the rows of the left child.

A "match" occurs if all the equijoin criteria match, and the filter evaluates to true. An equijoin criterion does not match if one of the join keys are null.

The outer join types produce an output row even when there is not a match, filling in the other side with nulls.

For spatial join, it builds up the R-Tree from the right child, and then streams through the rows of the left child. The equijoin criterion must be empty. The filter must be made of conjuncts one of which represents a spatial relationship ST_Contains, ST_Intersects or ST_Distance.

The arguments of the spatial function must be non-scalar expressions. One of the arguments must use symbols from left side of the join, the other from right side.

Spatial relationship expressed using ST_Distance function must use less than or less than or equals operator to compare ST_Distance value with a radius. The radius must be either scalar expression or must use symbols only from the right (build) side of the join.

A "match" occurs if spatial relationship holds and the remainder of the filter evaluates to true.

Fields:

Type type {InnerJoin, LeftJoin, RightJoin, FullJoin}
PlanNode left
PlanNode right
List criteria
- Symbol left
- Symbol right
List outputSymbols
- Symbols from left, then symbols from right
Optional filter
Optional leftHashSymbol
Optional rightHashSymbol
Optional distributionType {PARTITIONED, REPLICATED}
- Set by DetermineJoinDistributionType optimization, and then used by the AddExchanges optimization to enforce partitioning of children

SemiJoinNode

Builds a hash table of the keys in filteringSource, and projects onto each row from source a boolean which says whether the key matched in the hash table. That boolean is null if the key is null in the source.

Rationale for producing a boolean: use of CASE with IN: https://blogs.msdn.microsoft.com/craigfr/2006/08/23/subqueries-in-case-expressions/

Fields:

PlanNode source
PlanNode filteringSource
Symbol sourceJoinSymbol
Symbol filteringSourceJoinSymbol
Symbol semiJoinOutput
- TODO assert that this is boolean in ValidateDependenciesChecker or some other place
Optional sourceHashSymbol
Optional filteringSourceHashSymbol
Optional distributionType {PARTITIONED, REPLICATED}
- Set by DetermineJoinDistributionType optimization, and then used by the AddExchanges optimization to enforce partitioning of children

IndexJoinNode

Raptor only, going away in some future Raptor v2

Output includes all symbols from both sources.

Type type {INNER, SOURCE_OUTER}
PlanNode probeSource
PlanNode indexSource
List criteria
Optional probeHashSymbol
Optional indexHashSymbol

IndexSourceNode

Raptor only, going away in some future Raptor v2

Fields:

IndexHandle indexHandle
TableHandle tableHandle
Optional tableLayout
- "Only necessary for event listeners"
Set lookupSymbols
- Must be included in outputSymbols, and in keys of assignments
List outputSymbols
Map<Symbol, ColumnHandle> assignments
TupleDomain effectiveTupleDomain
- "General summary of how the output columns will be constrained"

TableWriterNode

foo

DeleteNode

foo

MetadataDeleteNode

foo

TableFinishNode

foo

UnnestNode

foo

ExchangeNode

Fields:

Type type {GATHER, REPARTITION, REPLICATE}
Scope scope {LOCAL, REMOTE}
List sources
- The exchange node consumes the union of all the rows provided by the sources. The columns are matched up by position in the "inputs" list entries.
- TODO Why not use a child UnionNode instead?
PartitioningScheme partitioningScheme
- Partitioning partitioning
  - PartitioningHandle handle
    - Optional connectorId
    - Optional transactionHandle
    - ConnectorPartitioningHandle connectorHandle
      - boolean isSingleNode()
      - boolean isCoordinatorOnly()
  - List arguments
    - Each ArgumentBinding has a non-null value of exactly one of:
      - Symbol column
      - NullableValue constant
- List outputLayout
  - These are the output columns of the ExchangeNode
- Optional hashColumn
  - If present, it must be contained in the outputLayout
- boolean replicateNulls
- Optional<int[]> bucketToPartition
List<List> inputs
- Parallel to the sources

UnionNode

foo

IntersectNode

foo

EnforceSingleRowNode

foo

GroupIdNode

foo

ExplainAnalyzeNode ✓

Waits for the query to finish, then outputs a single row with a single symbol, containing a string describing the plan.

Fields:

PlanNode source
Symbol outputSymbol

ApplyNode

foo

AssignUniqueId

foo

LateralJoin

foo

Plan nodes

Goal

Common fields

Nodes

OutputNode ✓

ProjectNode ✓

TableScanNode

ValuesNode

AggregationNode

MarkDistinctNode ✓

FilterNode ✓

WindowNode

RowNumberNode

TopNRowNumberNode

LimitNode

DistinctLimitNode

TopNNode ✓

SampleNode

SortNode

RemoteSourceNode

JoinNode ✓

SemiJoinNode

IndexJoinNode

IndexSourceNode

TableWriterNode

DeleteNode

MetadataDeleteNode

TableFinishNode

UnnestNode

ExchangeNode

UnionNode

IntersectNode

EnforceSingleRowNode

GroupIdNode

ExplainAnalyzeNode ✓

ApplyNode

AssignUniqueId

LateralJoin

Clone this wiki locally