#Generator functions
##What are generator functions? Generator functions are functions that can return a sequence of values rather than a single value. In many programming languages, they look like functions but behave like iterators.
In Quickstep, a generator function takes a list of constant arguments at query compile time, and populates a relation at query run time. One example is the generate_series
function, which in the simplest case takes two integer arguments (start, end)
and generates a single-column table with values start, start+1, ..., end
.
##Use generator functions
Usage of a generator function is similar to that of a normal function except that it is placed inside a SQL query's FROM
clause.
For example:
SELECT *
FROM generate_series(1, 5);
And the output will be:
+---------------+
|generate_series|
+---------------+
| 1|
| 2|
| 3|
| 4|
| 5|
+---------------+
Here generate_series(1, 5)
stands for a relation with 5 rows.
A more complex example is:
SELECT i, j, i*j
FROM generate_series(1, 3) AS gs1(i),
generate_series(1, 3) AS gs2(j)
WHERE i <= j;
It is a theta-join between two generated relations. The output will be:
+-----------+-----------+-----------+
|i |j |(i*j) |
+-----------+-----------+-----------+
| 1| 1| 1|
| 1| 2| 2|
| 1| 3| 3|
| 2| 2| 4|
| 2| 3| 6|
| 3| 3| 9|
+-----------+-----------+-----------+
##Overview of the design that supports the feature
- Given a SQL query, at first each generator function call is parsed as an abstract syntax tree node carrying the function's name and arguments information.
- See
ParseGeneratorTableReference
as the abstract syntax tree node for generator functions.
- Then, in
Resolver
, the generator function is resolved and its arguments are validated. If this process is successful, we obtain a function handle that contains the actual implementation of this function. A logic plan node is then created to hold the function handle and to represent the relation that this function will generate.
- See
GeneratorFunctionFactory
as the factory class that resolves a function from its name. - See
GeneratorFunction
as the abstract function class that checks the validity of arguments and creates the function handle. - See
GeneratorFunctionHandle
as the function handle class that provides the actual implementation of a particular generator function. - See
logical::TableGenerator
as the logical plan node class that represents the relation that a generator function will create.
- The logical plan node
logical::TableGenerator
is then transformed into a physical plan nodephysical::TableGenerator
.
- See
physical::TableGenerator
as the physical plan node class that represents the relation that a generator function will create.
- At the final stage of query planning, the physical plan node is transformed into a relational operator.
- See
TableGeneratorOperator
as the relational operator class that represents a operator that will produce run-time work orders.
- At the time of query execution, the function handle's
populateColumns()
method is invoked to actually generate a relation.
- See
TableGeneratorWorkOrder
as the work order class that actually invokes the generator function to populate a relation at run time.
##Implement a new generator function There are just three steps to implement a new generator function.
- Subclass
GeneratorFunctionHandle
and implement all virtual methods.
- See
GenerateSeriesHandle
as an example.
- Subclass
GeneratorFunction
and implement all virtual methods. Also define a staticInstance()
method in the subclass to return a singleton instance reference.
- See
GenerateSeries
as an example.
- Register the
GeneratorFunction
subclass intoGeneratorFunctionFactory
by adding intoGeneratorFunctionFactory
's constructor a new line:
GeneratorFunctionFactory::GeneratorFunctionFactory() {
...
// Register all generator functions here.
REGISTER_GENERATOR_FUNCTION_(YourGeneratorFunctionSubclass);
...
}