Skip to content

Latest commit

 

History

History
248 lines (202 loc) · 9.94 KB

timeseries_pipeline_elements.md

File metadata and controls

248 lines (202 loc) · 9.94 KB

Timevector Pipelines experimental

Description
Example
Pipeline Elements

Description

Timescale timevector objects are just a convenient and efficient way of tracking a single value over time and are detailed a bit more here. One of our primary goals with timevector is that they should be easy and efficient to perform basic operations on, and that is where pipelines enter the picture. At its simplest, a pipeline is just a timevector connected to a pipeline element via the pipeline operator ->. However, most pipeline operations output new timevector, so it's possible to chain many pipeline elements together such that the output from one element become the input to the next.

A note on operator associativity and grouping

Due to limitations in the PostgresQL parser, custom operators are required to be left associative. The following pipeline will always result in elementA being applied to timevector and then elementB being applied to the result.

SELECT timevector -> elementA -> elementB;

However, it is possible to explicitly group elements using parentheses:

SELECT timevector -> (elementA -> elementB);

This will result in a pipeline object being created from elements A and B, which will then be applied to the timevector. While we don't presently take maximum advantage of this internally, these multiple element pipelines should enable optimizations moving forward. Therefore, this second form should be preferred where possible.

Usage Example

For this example let start with a table of temperatures collected from different devices at different times.

SET TIME ZONE 'UTC';
CREATE TABLE test_data(time TIMESTAMPTZ, device INTEGER, temperature DOUBLE PRECISION);

In order to have some nominally interesting data to look at, let's populate this table with random data covering 30 days of readings over 10 devices.

INSERT INTO test_data
    SELECT
        '2020-01-01 00:00:00+00'::timestamptz + ((test_random() * 2592000)::int * '1 second'::interval),
        floor(test_random() * 10 + 1),
        50 + test_random() * 20
    FROM generate_series(1,10000);

Now suppose we want to know how much the temperature fluctuates on a daily basis for each device. Using timevector and pipelines can simplify the process of finding the answer:

CREATE VIEW daily_delta AS
    SELECT device,
        timevector(time, temperature)
            -> (toolkit_experimental.sort()
            ->  delta()) AS deltas
    FROM test_data
    GROUP BY device;

This command creates a timevector from the time and temperature columns (grouped by device), sorts them in increasing time, and computes the deltas between values. Now we can look at the deltas for a specific device. Note that the output for this test is inaccurate as we've removed some of the pipeline elements for the moment.

SELECT time, value::numeric(4,2) AS delta FROM unnest((SELECT deltas FROM daily_delta WHERE device = 3));
          time          | delta
------------------------+-------
 2020-01-02 00:00:00+00 | -0.54
 2020-01-03 00:00:00+00 |  0.29
 2020-01-04 00:00:00+00 | -0.25
 2020-01-05 00:00:00+00 |  0.07
 2020-01-06 00:00:00+00 |  0.80
 2020-01-07 00:00:00+00 | -0.27
 2020-01-08 00:00:00+00 | -2.55
 2020-01-09 00:00:00+00 |  3.51
 2020-01-10 00:00:00+00 | -0.78
 2020-01-11 00:00:00+00 | -0.39
 2020-01-12 00:00:00+00 |  0.55
 2020-01-13 00:00:00+00 | -0.87
 2020-01-14 00:00:00+00 |  1.17
 2020-01-15 00:00:00+00 | -2.49
 2020-01-16 00:00:00+00 |  0.10
 2020-01-17 00:00:00+00 |  1.09
 2020-01-18 00:00:00+00 | -0.09
 2020-01-19 00:00:00+00 |  1.14
 2020-01-20 00:00:00+00 | -1.23
 2020-01-21 00:00:00+00 | -0.29
 2020-01-22 00:00:00+00 | -0.37
 2020-01-23 00:00:00+00 |  1.48
 2020-01-24 00:00:00+00 | -0.52
 2020-01-25 00:00:00+00 |  1.34
 2020-01-26 00:00:00+00 | -0.95
 2020-01-27 00:00:00+00 | -0.65
 2020-01-28 00:00:00+00 | -0.42
 2020-01-29 00:00:00+00 |  1.42
 2020-01-30 00:00:00+00 | -0.66

Or even run one of our device's deltas through lttb to get a nice graphable set of points:

SELECT (deltas -> toolkit_experimental.lttb(10))::TEXT FROM daily_delta where device = 7;
                                                                            text
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
(version:1,num_points:10,flags:1,internal_padding:(0,0,0),points:[(ts:"2020-01-01 23:45:36+00",val:0),(ts:"2020-01-02 00:28:48+00",val:0.01999999999999602),(ts:"2020-01-02 17:45:36+00",val:0.020000000000003126),(ts:"2020-01-02 17:45:36+00",val:0),(ts:"2020-01-03 03:07:12+00",val:0.020000000000003126),(ts:"2020-01-03 20:24:00+00",val:0.01999999999999602),(ts:"2020-01-03 20:24:00+00",val:0),(ts:"2020-01-04 05:45:36+00",val:0.020000000000003126),(ts:"2020-01-04 23:02:24+00",val:0.020000000000003126),(ts:"2020-01-04 23:02:24+00",val:0)],null_val:[0,0])

Current Pipeline Elements(A-Z)

As of the current timescale release, these elements are all experimental.


delta

delta(
) RETURNS TimevectorPipelineElement

This element will return a new timevector where each point is the difference between the current and preceding value in the input timevector. The new series will be one point shorter as it will not have a preceding value to return a delta for the first point.

Required Arguments

Name Type Description

Pipeline Execution Returns

Column Type Description
timevector Timevector The result of applying this pipeline element will be a new time series where each point contains the difference in values from the prior point in the input timevector.

Sample Usage

SELECT time, value
FROM unnest(
    (SELECT timevector('2020-01-01'::timestamptz + step * '1 day'::interval, step * step)
        -> delta()
    FROM generate_series(1, 5) step)
);
          time          | value
------------------------+-------
 2020-01-03 00:00:00+00 |     3
 2020-01-04 00:00:00+00 |     5
 2020-01-05 00:00:00+00 |     7
 2020-01-06 00:00:00+00 |     9

lttb

lttb(
    resolution int,
) RETURNS TimevectorPipelineElement

This element will return a largest triangle three buckets approximation of a given timevector. Its behavior is the same as the lttb function documented here, save that it expects the series to be sorted.

SELECT lttb(time, value, 40) FROM data;

is equivalent to

SELECT timevector(time, value) -> sort() -> lttb() FROM data;

Required Arguments

Name Type Description
resolution INTEGER Number of points the output should have.

Pipeline Execution Returns

Column Type Description
timevector Timevector The result of applying this pipeline element will be a new timevector with resolution point that is visually similar to the input series.

Sample Usage

SELECT time, value
FROM unnest(
    (SELECT timevector('2020-01-01 UTC'::TIMESTAMPTZ + make_interval(days=>(foo*10)::int), 10 + 5 * cos(foo))
        -> toolkit_experimental.lttb(4)
    FROM generate_series(1,11,0.1) foo)
);
          time          |       value
------------------------+--------------------
 2020-01-11 00:00:00+00 |   12.7015115293407
 2020-02-01 00:00:00+00 |  5.004324248633603
 2020-03-03 00:00:00+00 | 14.982710485116087
 2020-04-20 00:00:00+00 | 10.022128489940254

sort

sort(
) RETURNS TimevectorPipelineElement

This element takes in a timevector and returns a timevector consisting of the same points, but in order of increasing time values.

Required Arguments

Name Type Description

Pipeline Execution Returns

Column Type Description
timevector Timevector The result of applying this pipeline element will be a time sorted version of the incoming timevector.

Sample Usage

SELECT time, value
FROM unnest(
    (SELECT timevector('2020-01-06'::timestamptz - step * '1 day'::interval, step * step)
        -> toolkit_experimental.sort()
    FROM generate_series(1, 5) step)
);
          time          | value
------------------------+-------
 2020-01-01 00:00:00+00 |    25
 2020-01-02 00:00:00+00 |    16
 2020-01-03 00:00:00+00 |     9
 2020-01-04 00:00:00+00 |     4
 2020-01-05 00:00:00+00 |     1