Skip to content
Maria Basmanova edited this page Mar 28, 2018 · 4 revisions

Goal

Document what the plan nodes mean.

This is useful because the plan is the interface between the planner, the optimizations, and the evaluator; thus, components need to be written to a common understanding of the plan's meaning.

The information would be better located in comments in the nodes' source files.

Common fields

  • PlanNodeId id
    • Must be unique within a given plan tree
  • List outputSymbols

Nodes

In the order they appear in PlanNode.java

A check mark ✓ indicates that the description of the node is complete.

OutputNode ✓

Returns rows back to the user as a side effect. The columns sent to the user are those of its output symbol list.

Fields:

  • PlanNode source

  • List columnNames

    • Labels for the returned columns, in a parallel list to the output symbols list

ProjectNode ✓

Creates new columns as functions of existing ones.

Fields:

  • PlanNode source
  • Assignments assignments
    • maps Symbol to defining Expression

TableScanNode

Reads rows in from a table provided by a connector.

Fields:

  • TableHandle table
    • TODO this should be some specification of the table, as opposed to a transaction-scoped execution object, so that unit tests don't have to create transactions just to make plans.
  • Optional tableLayout
    • TODO Why is this optional?
  • Map<Symbol, ColumnHandle> assignments
    • the columns that are being read from the Table
  • TupleDomain currentConstraint
    • temporary field used during predicate pushdown?
  • Expression originalConstraint
    • for human-readable output of pushed-down predicate

ValuesNode

Produces a literal sequence of rows.

Fields:

  • List<List> rows
    • Columns are parallel to the output symbol list

AggregationNode

For each set of grouping keys:

  • For each bag of rows with common values of the grouping:
    • Feed that bag of rows into a set of aggregators, with each producing a single column.

Produces a column for each aggregation, and for each of the keys used in any grouping set.

Implemented by HashAggregationOperator, or for an empty set of grouping keys, AggregationOperator.

  • TODO fix comment at the top of AggregationOperator.java
  • TODO why does HashAggregationOperator have getGlobalAggregationOutput, if global aggregation uses AggregationOperator instead?

HashAggregationOperator is implemented by either SpillableHashAggregationBuilder or InMemoryHashAggregationBuilder, depending on the Session's "spill_enabled" property. Presumably the future goal is for that to be determined by the planner and/or executor.

As a special case, for FINAL/SINGLE aggregation with an empty set of grouping keys, it produces a single row.

  • TODO why not handle that in the translation from SQL? Because it's implied by handling multiple grouping sets within a single plan node?

TODO what about null values of the grouping keys?

Fields:

  • PlanNode source
  • Map<Symbol, Aggregation> assignments
    • FunctionCall call
    • Signature signature
    • Optional mask
      • A boolean used for implementing per-aggregate FILTER clauses (coming from a boolean projection), or for implementing DISTINCT aggregators (coming from a MarkDistinctNode)
  • List<List> groupingSets
  • Step step { PARTIAL, FINAL, INTERMEDIATE, SINGLE }
    • An AggregationNode is either SINGLE, or a part of a sequence: PARTIAL, INTERMEDIATE*, FINAL
    • Note that this is global across all aggregations done by this node
  • Optional hashSymbol
    • If specified, the node will expect, and pass through, a column with a hash of the all of the keys used in any grouping set, for each input row. This can save work if the same hash can be used e.g. for aggregation and exchange.
      • TODO what about multiple grouping sets? Which input row's hash gets passed through for a less specific grouping set?
  • Optional groupIdSymbol
    • If there are multiple grouping sets, this output column will produce a unique value for each of the grouping sets
      • TODO add verify check to AggregationNode ctor that this is present iff there are multiple grouping sets.

MarkDistinctNode ✓

Projects on the markerSymbol, which is a boolean which is true for only one row per unique combination of distinctSymbols.

Uses:

  • to create a masking flag for use with a DISTINCT aggregator.
  • in scalar correlated subqueries

Fields:

  • PlanNode source
  • Symbol markerSymbol
  • List distinctSymbols
  • Optional hashSymbol

FilterNode ✓

Only passes rows for which the boolean predicate is non-null and true.

Fields:

  • PlanNode source
  • Expression predicate

WindowNode

Fields:

  • PlanNode source

  • Set prePartitionedInputs

  • Specification specification

    • List partitionBy

    • List orderBy

    • Map<Symbol, SortOrder> orderings

      • SortOrder is actually a collation (ascending/descending, and nulls first/last), and doesn't specify what to order on.

        • TODO Presumably this is actually just the collations, parallel to the orderBy list?
  • int preSortedOrderPrefix

  • Map<Symbol, Function> windowFunctions

    • Function fields:
      • FunctionCall functionCall (type shared with simple row expressions)

        • QualifiedName name

        • Optional window

          • List partitionBy

          • Optional orderBy

            • List sortItems

              • Expression sortKey

              • Ordering ordering {ASCENDING, DESCENDING}

              • NullOrdering nullOrdering {FIRST, LAST, UNDEFINED}

          • Optional frame

            • Type type {RANGE, ROWS}

            • FrameBound start

              • Type type {UNBOUNDED_PRECEDING, PRECEDING, CURRENT_ROW, FOLLOWING, UNBOUNDED_FOLLOWING}

              • Optional value

              • Optional originalValue

            • Optional end

        • Optional filter

        • boolean distinct

        • List arguments

      • Signature signature

      • Frame frame

        • WindowFrame.Type type {RANGE, ROWS}

        • FrameBound.Type startType {UNBOUNDED_PRECEDING, PRECEDING, CURRENT_ROW, FOLLOWING, UNBOUNDED_FOLLOWING}

        • Optional startValue

        • FrameBound.Type endType

        • Optional endValue

  • Optional hashSymbol

RowNumberNode

foo

TopNRowNumberNode

foo

LimitNode

foo

DistinctLimitNode

foo

TopNNode ✓

Passes the top count rows according to the given order.

Fields:

  • PlanNode source

  • long count

  • List orderBy

  • Map<Symbol, SortOrder> orderings

    • actually the collations of the columns in the orderBy
  • Step step { PARTIAL, FINAL, SINGLE }

    • A TopNNode is either SINGLE, or a part of a sequence: PARTIAL, FINAL

SampleNode

foo

SortNode

foo

RemoteSourceNode

foo

JoinNode ✓

Implements either cross join or hash join or spatial join.

For hash join, it builds up the hash table from the right child, and then streams through the rows of the left child.

A "match" occurs if all the equijoin criteria match, and the filter evaluates to true. An equijoin criterion does not match if one of the join keys are null.

The outer join types produce an output row even when there is not a match, filling in the other side with nulls.

For spatial join, it builds up the R-Tree from the right child, and then streams through the rows of the left child. The equijoin criterion must be empty. The filter must be made of conjuncts one of which represents a spatial relationship ST_Contains, ST_Intersects or ST_Distance.

The arguments of the spatial function must be non-scalar expressions. One of the arguments must use symbols from left side of the join, the other from right side.

Spatial relationship expressed using ST_Distance function must use less than or less than or equals operator to compare ST_Distance value with a radius. The radius must be either scalar expression or must use symbols only from the right (build) side of the join.

A "match" occurs if spatial relationship holds and the remainder of the filter evaluates to true.

Fields:

  • Type type {InnerJoin, LeftJoin, RightJoin, FullJoin}
  • PlanNode left
  • PlanNode right
  • List criteria
    • Symbol left
    • Symbol right
  • List outputSymbols
    • Symbols from left, then symbols from right
  • Optional filter
  • Optional leftHashSymbol
  • Optional rightHashSymbol
  • Optional distributionType {PARTITIONED, REPLICATED}
    • Set by DetermineJoinDistributionType optimization, and then used by the AddExchanges optimization to enforce partitioning of children

SemiJoinNode

Builds a hash table of the keys in filteringSource, and projects onto each row from source a boolean which says whether the key matched in the hash table. That boolean is null if the key is null in the source.

Rationale for producing a boolean: use of CASE with IN: https://blogs.msdn.microsoft.com/craigfr/2006/08/23/subqueries-in-case-expressions/

Fields:

  • PlanNode source

  • PlanNode filteringSource

  • Symbol sourceJoinSymbol

  • Symbol filteringSourceJoinSymbol

  • Symbol semiJoinOutput

    • TODO assert that this is boolean in ValidateDependenciesChecker or some other place
  • Optional sourceHashSymbol

  • Optional filteringSourceHashSymbol

  • Optional distributionType {PARTITIONED, REPLICATED}

    • Set by DetermineJoinDistributionType optimization, and then used by the AddExchanges optimization to enforce partitioning of children

IndexJoinNode

Raptor only, going away in some future Raptor v2

Output includes all symbols from both sources.

  • Type type {INNER, SOURCE_OUTER}

  • PlanNode probeSource

  • PlanNode indexSource

  • List criteria

  • Optional probeHashSymbol

  • Optional indexHashSymbol

IndexSourceNode

Raptor only, going away in some future Raptor v2

Fields:

  • IndexHandle indexHandle

  • TableHandle tableHandle

  • Optional tableLayout

    • "Only necessary for event listeners"
  • Set lookupSymbols

    • Must be included in outputSymbols, and in keys of assignments
  • List outputSymbols

  • Map<Symbol, ColumnHandle> assignments

  • TupleDomain effectiveTupleDomain

    • "General summary of how the output columns will be constrained"

TableWriterNode

foo

DeleteNode

foo

MetadataDeleteNode

foo

TableFinishNode

foo

UnnestNode

foo

ExchangeNode

Fields:

  • Type type {GATHER, REPARTITION, REPLICATE}
  • Scope scope {LOCAL, REMOTE}
  • List sources
    • The exchange node consumes the union of all the rows provided by the sources. The columns are matched up by position in the "inputs" list entries.
    • TODO Why not use a child UnionNode instead?
  • PartitioningScheme partitioningScheme
    • Partitioning partitioning
      • PartitioningHandle handle
        • Optional connectorId
        • Optional transactionHandle
        • ConnectorPartitioningHandle connectorHandle
          • boolean isSingleNode()
          • boolean isCoordinatorOnly()
      • List arguments
        • Each ArgumentBinding has a non-null value of exactly one of:
          • Symbol column
          • NullableValue constant
    • List outputLayout
      • These are the output columns of the ExchangeNode
    • Optional hashColumn
      • If present, it must be contained in the outputLayout
    • boolean replicateNulls
    • Optional<int[]> bucketToPartition
  • List<List> inputs
    • Parallel to the sources

UnionNode

foo

IntersectNode

foo

EnforceSingleRowNode

foo

GroupIdNode

foo

ExplainAnalyzeNode ✓

Waits for the query to finish, then outputs a single row with a single symbol, containing a string describing the plan.

Fields:

  • PlanNode source
  • Symbol outputSymbol

ApplyNode

foo

AssignUniqueId

foo

LateralJoin

foo