{% hint style="danger" %} Multi-stage engine warning
This document describes null handling for the single-stage query engine. At this time, the multi-stage query engine (v2) does not support null handling. Queries involving null values in a multi-stage environment may return unexpected results. {% endhint %}
By default, null handling is disabled (`nullHandlingEnabled=false`) in the Table index configuration (`tableIndexConfig`). When null support is disabled, `IS NOT NULL` evaluates to `true`, and `IS NULL` evaluates to `false`. For example, the predicate in the query below matches all records.
select count(*) from my_table where column IS NOT NULL
To enable basic null support (`IS NULL` and `IS NOT NULL`) and generate the null index, set `nullHandlingEnabled=true` in the Table index configuration (`tableIndexConfig`).
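As a minimal sketch of where this flag lives, the snippet below builds a partial table config in Python and serializes it to JSON. Only `tableIndexConfig` and `nullHandlingEnabled` come from the text above; the table name `my_table` and the `tableType` value are placeholder assumptions, and a real table config needs further sections that are omitted here.

```python
import json

# Hypothetical, partial table config: only the null-handling flag matters here.
table_config = {
    "tableName": "my_table",   # placeholder name, matching the example queries
    "tableType": "OFFLINE",    # assumption for illustration only
    "tableIndexConfig": {
        "nullHandlingEnabled": True,  # generate the null index during ingestion
    },
}

# json.dumps renders Python's True as JSON's lowercase true.
table_config_json = json.dumps(table_config, indent=2)
print(table_config_json)
```

Because the flag is read at ingestion time, it would need to be in place before the data is loaded.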
When null support is enabled, `IS NOT NULL` and `IS NULL` evaluate to `true` or `false` according to whether a null is detected.
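The difference between the two settings can be illustrated with a small model. This is only a sketch of the semantics described above, not Pinot code; the function `evaluate` and the sample `rows` are hypothetical, and Python's `None` stands in for a null value.

```python
def evaluate(predicate: str, value, null_handling_enabled: bool) -> bool:
    """Evaluate 'IS NULL' or 'IS NOT NULL' for one value under the given setting."""
    if predicate not in ("IS NULL", "IS NOT NULL"):
        raise ValueError(f"unsupported predicate: {predicate}")
    if not null_handling_enabled:
        # Disabled: IS NOT NULL is always true, IS NULL is always false.
        return predicate == "IS NOT NULL"
    is_null = value is None
    return is_null if predicate == "IS NULL" else not is_null

rows = [10, None, 30]  # hypothetical column values; None models a null

# Model of: select count(*) from my_table where column IS NOT NULL
count_disabled = sum(evaluate("IS NOT NULL", v, False) for v in rows)
count_enabled = sum(evaluate("IS NOT NULL", v, True) for v in rows)
print(count_disabled, count_enabled)
```

With null support disabled, every row is counted; with it enabled, only the rows that actually hold a value are counted.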
{% hint style="info" %} Important
You must run `SET enableNullHandling=true;` before you query. Setting `"nullHandlingEnabled": true` in your table config does not automatically apply `enableNullHandling=true` when you execute a query. Basic null handling supports `IS NOT NULL` and `IS NULL` predicates. Advanced null handling adds SQL compatibility.
{% endhint %}
If you're not able to generate the null index for your use case, you may filter for null values using a default value specified in your schema or a specific value included in your query.
{% hint style="info" %} The following example queries work only when the chosen null value does not occur as a real value in the dataset. Errors may occur if the specified null value is a valid value in the dataset. {% endhint %}
- Specify a default null value (`defaultNullValue`) in your schema for dimension fields (`dimensionFieldSpecs`), metric fields (`metricFieldSpecs`), and date time fields (`dateTimeFieldSpecs`).
- To filter out the specified default null value, for example, you could write a query like the following:
select count(*) from my_table where column <> 'default_null_value'
Filter for a specific value in your query that will not be included in the dataset. For example, to calculate the average age, use `-1` to indicate the value of `Age` is `null`.
- Rewrite the following query:
select avg(Age) from my_table
- To cover null values as follows:
select avg(Age) from my_table WHERE Age <> -1
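To see why the filter matters, here is a small worked example in Python. The `ages` data is invented for illustration, and `-1` is the sentinel from the text above; the two computations mirror the two queries.

```python
# Hypothetical Age column in which -1 stands in for "unknown" (null).
NULL_SENTINEL = -1
ages = [30, NULL_SENTINEL, 40, NULL_SENTINEL, 50]

def avg(values):
    """Plain arithmetic mean; returns None for an empty input."""
    values = list(values)
    return sum(values) / len(values) if values else None

# Equivalent of: select avg(Age) from my_table
naive_avg = avg(ages)

# Equivalent of: select avg(Age) from my_table WHERE Age <> -1
filtered_avg = avg(a for a in ages if a != NULL_SENTINEL)

print(naive_avg, filtered_avg)
```

Without the filter, the sentinel rows pull the average down (118 / 5); with it, only the three real ages contribute (120 / 3). The approach breaks down if `-1` can ever be a legitimate value in the column.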
Work to improve the performance of advanced null handling is under development.
Pinot provides advanced null handling support similar to standard SQL null handling. Because this feature carries a notable performance impact (even for queries without null values), it is not enabled by default. For optimal query latency, we recommend enabling basic null support.
To enable `NULL` handling, do the following:
- To enable null handling during ingestion, in `tableIndexConfig`, set **`nullHandlingEnabled=true`**.
- To enable null handling for queries, set the **`enableNullHandling`** query option.
{% hint style="info" %} Important
You must run `SET enableNullHandling=true;` before you query. Setting `"nullHandlingEnabled": true` in your table config does not automatically apply `enableNullHandling=true` when you execute a query. Basic null handling supports `IS NOT NULL` and `IS NULL` predicates. Advanced null handling adds SQL compatibility.
{% endhint %}
To store null values in a segment, you must enable `nullHandlingEnabled` in the `tableIndexConfig` section before ingesting the data.
During real-time or offline ingestion, Pinot checks whether null handling is enabled and, if so, stores null values in the segment itself. Data ingested while null handling was disabled does not store null values and must be ingested again to capture them.
The `nullHandlingEnabled` configuration affects all columns in a Pinot table.
{% hint style="info" %} Column-level null support is under development. {% endhint %}
By default, null usage in the predicate is disabled.
For handling nulls in aggregation functions, explicitly enable the null support by setting the query option `enableNullHandling` to `true`. Configure this option in one of the following ways:
- Set `enableNullHandling=true` at the beginning of the query.
- If using JDBC, set the connection option `enableNullHandling=true` (either in the URL or as a property).
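For the first option, a client-side helper could prepend the option to each query before sending it. The sketch below only manipulates the query string; the helper name `with_null_handling` is hypothetical, and how the resulting text is submitted to Pinot is left out.

```python
def with_null_handling(sql: str) -> str:
    """Prefix a query with the enableNullHandling query option."""
    option = "SET enableNullHandling=true;"
    stripped = sql.strip()
    # Avoid adding the option twice if the caller already set it.
    if stripped.lower().startswith(option.lower()):
        return stripped
    return f"{option}\n{stripped}"

query = with_null_handling(
    "select count(*) from my_table where column IS NOT NULL"
)
print(query)
```

Keeping this in one helper makes it harder to forget the option on individual queries, at the cost of applying the slower execution path to every query that passes through it.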
When this option is enabled, the Pinot query engine uses a different execution path that checks null predicates. Therefore, some indexes may not be usable, and the query is significantly more expensive. This is the main reason why null handling is not enabled by default.
If the query includes an `IS NULL` or `IS NOT NULL` predicate, Pinot fetches the `NULL` value vector for the corresponding column within `FilterPlanNode` and retrieves the corresponding bitmap that represents all document IDs containing `NULL` values for that column. This bitmap is then used to create a `BitmapBasedFilterOperator` to do the filtering operation.
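The idea of filtering from a null bitmap can be modeled in a few lines. This is a toy illustration, not Pinot's implementation: a Python set stands in for the bitmap, the function names are hypothetical, and treating `IS NOT NULL` as the complement of the bitmap is an assumption made for the sketch.

```python
# Toy model: a column of 6 documents; None marks a null value.
column_values = ["a", None, "b", None, None, "c"]

# The "null value vector": the set of document IDs whose value is null.
# A Python set stands in for the bitmap here.
null_doc_ids = {doc_id for doc_id, v in enumerate(column_values) if v is None}

def filter_is_null(null_bitmap, num_docs):
    """Doc IDs matching `column IS NULL`: exactly the IDs in the bitmap."""
    return sorted(d for d in null_bitmap if d < num_docs)

def filter_is_not_null(null_bitmap, num_docs):
    """Doc IDs matching `column IS NOT NULL`: the complement of the bitmap."""
    return [d for d in range(num_docs) if d not in null_bitmap]

num_docs = len(column_values)
is_null_docs = filter_is_null(null_doc_ids, num_docs)
is_not_null_docs = filter_is_not_null(null_doc_ids, num_docs)
print(is_null_docs, is_not_null_docs)
```

The point of the bitmap is that the filter never has to inspect the column values themselves; membership in the set of null document IDs is enough to answer the predicate.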