Record SMA-X history in PostgreSQL / TimescaleDB.
- API documentation
- Project pages on github.io
Author: Attila Kovacs
Last Updated: 18 September 2024
- Introduction
- Prerequisites
- Building
smax-postgres
- Installation
- Database organization (for clients)
- Configuration reference
smax-postgres
is a daemon application, which can collect data from an
SMA information eXchange (SMA-X)
realtime database and insert these into a PostgreSQL database to create a time-series historical record for all or
selected SMA-X variables. The program is highly customizable and supports both regular updates for changing variables
as well as regular snapshots of all selected SMA-X variables.
The smax-postgres
application has build and runtime dependencies on the following software:
- PostgreSQL installation and development files (
libpq.so
andlipq.fe.h
). - Smithsonian/smax-clib
- Smithsonian/redisx
- Smithsonian/xchange
- Popt development libraries (
libpopt-dev
in Debian, orpopt-devel
in RPM distros) - (optional) TimescaleDB extensions.
- (optional) systemd development files (
libsystemd.so
andsd-daemon.h
).
Additionally, to configure your SMA-X server, you will need the Smithsonian/smax-server repo also.
You can configure the build, either by editing config.mk
or else by defining the relevant environment variables
prior to invoking make
. The following build variables can be configured:
-
PGDIR
: Root directory of a specific PostgreSQL installation to build against (not set by default). -
XCHANGE
: the root of the location where the Smithsonian/xchange library is installed. It expects to findxchange.h
under$(XCHANGE)/include
andlibxchange.so
under$(XCHANGE)/lib
or else in the defaultLD_LIBRARY_PATH
. -
REDISX
: the root of the location where the Smithsonian/redisx library is installed. It expects to findredisx.h
under$(REDISX)/include
andlibredisx.so
under$(REDISX)/lib
or else in the defaultLD_LIBRARY_PATH
. -
SMAXLIB
: the root of the location where the Smithsonian/smax-clib library is installed. It expects to findsmax.h
under$(SMAXLIB)/include
andlibsmax.so
under$(SMAXLIB)/lib
or else in the defaultLD_LIBRARY_PATH
. -
SYSTEMD
: Sets whether to compile with SystemD integration (needslibsystemd.so
andsd-daemon.h
). Default is 1 (enabled). -
CC
: The C compiler to use (default:gcc
). -
CPPFLAGS
: C preprocessor flags, such as externally defined compiler constants. -
CFLAGS
: Flags to pass onto the C compiler (default:-Os -Wall -std=c99
). Note,-Iinclude
will be added automatically. -
LDFLAGS
: Extra linker flags (default: not set). Note,-lm -pthread -lsmax -lredisx -lxchange -lpq -lpopt
will be added automatically. -
BUILD_MODE
: You can set it todebug
to enable debugging features: it will initialize the globalxDebug
variable toTRUE
and add-g
toCFLAGS
. -
CHECKEXTRA
: Extra options to pass tocppcheck
for themake check
target
After configuring, you can simply run make
, which will build bin/smax-postgres
, and user documentation. You may
also build other make
target(s). (You can use make help
to get a summary of the available make
targets).
Now you may compile smax-postgres
:
$ make
Prior to installation, you should check that the PostgreSQL service name is correct in smax-postgres.service
, and
edit it as necessary for your system configuration. You may also edit the cfg/smax-postgres.cfg
now, or after it
is installed.
Provided the build was successful, you can install the executables, configuration files, and optionally the SystemD unit files via:
$ sudo make install
(When installing at the SMA, you may want make install-sma
instead, to install with SMA-specific configuration).
In case of SystemD integration you should also reload the SystemD daemon so smax-postgres.service
can be enabled
and managed as desired:
$ sudo systemd daemon-reload
After that you can start the service as:
$ sudo systemd start smax-postgres
By default, make install
will install the smax-postgress
executable to /usr/bin
, configuration under /etc/
,
SystemD service unit under /etc/systemd/system
, and documentation under /usr/share/doc/smax-postgres/
. Instead of
/usr
, you may want to install into another destination, such as /opt/
or /usr/local
. You can do that by setting
the DESTDIR
environment prior to make install
, e.g.:
$ export DESTDIR="/opt"
Additionally, you can also stage the installation under a different root, by setting the PREFIX
environment variable,
e.g.:
$ export PREFIX="~/rmpbuild/BUILD/smax-postgres"
In case of SystemD integration, errors will get logged to the journal, and can be investigated by journalctl
. E.g. to
see the errors in the last 3 hours, you may:
$ journalctl -u smax-postgres --since "3 hours ago"
Standard output is logged to the file /var/log/smax-postgres.out
for the current session. Restarting smax-postgres
will
start a new file. (Because of buffering, there may be a long lag before you'll see stuff appear in this log file.)
Prior to using smax-postgres
, you will need to configure users and access privileges (roles) for the database, and may
want to create the database instance manually. You will need to create a user designated for the smax-postgres
program,
and specify its credentials in the smax-postgres
configuration file. This user will not require CREATEDB
permission,
but it will need permission to create tables in the existing database and to insert data or to search the tables
(i.e. read/write privileges). Additionally, you may also create the designated database instance assigned to whatever
user to own. If you create the database manually, do not forget to set the name of the designated database in the
smax-postgres
configuration file.
Normally smax-postgres
will assume that the database to use has been fully set up, including a 'titles' table
(containing 2 columns: text variable IDs, and auto-incremented integer serial numbers). However, smax-postgres
can create the database and set up the required 'titles' table as needed (including indexing, and TimescaleDB
extension as appropriate), when launched with the -b
(or --bootstrap
) option.
$ smax-postgres -c /usr/local/etc/smax-postgres/myconfig.cfg -b
Will log into the existing (new) database using the credentials specified in the configuration file
/usr/local/etc/smax-postgres/myconfig.cfg
, then configures that database (e.g. set up the TimescaleSB extension as
appropriate), and creates the 'titles' table and its index.
You may also let the bootstrapping process create the database itself, in which case you may have to provide the
password for the 'postgres' admin account, or the credentials for another account with CREATEDB
privileges with
the necessary privileges to create databases. E.g.:
$ smax-postgres -c /usr/local/etc/myconfig.cfg -b -p "S3cur1ty!"
will attempt create the database as the 'postgres' admin with password 'S3curity!', before proceeding to configure the
newly created database as the user designated for the smax-postgres
program. (Alternatively, you may use the -a
and
-p
options together to create the database with another privileged user). The newly created database will be
automatically assigned to the designated smax-postgres
user as its owner).
Once the database is configured, you will not need the -b
option again (but it also will not wreck the previous
initialization if accidentally used again after the initial setup).
Each SMA-X variable has its own time-series data table in the SQL DB. These tables are named var_<tid>
, where
<tid>
is a 6-digit serial number, e.g. var_000001
for the first variable added to the SQL database. The variable
name to <tid>
pairings are listed in the titles
table, which contains just two columns: (1) the full SMA-X
variable id (text), and (2) the corresponding <tid>
(integer serial) used in the SQL database to store the time
series of the variable. (Thus, to figure out what table stored data for a given SMA-X variable you will need to search
the 'titles' table for the <tid>
first.)
The time series data in the var_<tid>
tables contains (at least) 2 + n
columns for an SMA-X variable, which has
n
array elements. The first column is the UTC timestamp at which the data was pulled from the SMA-X database. The
second is an integer 'age' (in seconds) that informs how much before the pull was the last time that variable was
updated in the SMA-X database. After that, the remaining columns list the array elements stored in the SMA-X variable.
(Thus scalar entries will have just one additional column, labeled as 'c0').
Because the SMA-X variables may have dynamic types and array dimensions, the SQL tables may automatically expand
to an enclosing type (for example if a variable changes from int16
to int32
or to a float32
), and columns will
be added as necessary to store an expanded set of array elements. When SMA-X data 'shrinks', containing fewer
elements than the existing SQL record, the SQL entry will be padded with NULL
values as necessary.
In addition to the time-series data stored in the SQL database, it also stores versioned metadata in tables with
matching <tid>
values. For example, the metadata for the time-series var_000001
is stored in the table
var_000001_meta
. Metadata tables contained serial-numbered versions of infrequently changing metadata, such
as array dimensions and shapes (scalar values are stored with ndim = 0
and shape = NULL
), associated physical
units, and downsampling factors. Each metadata entry is timestamped also to indicate when a change (if any)
occurred in these characteristics.
Thus, to query an SMA-X variable system:subsystem:property
, you first want to find the 'tid' of the variable in
'titles':
SELECT tid FROM titles WHERE name = 'system.subsystem.property';
Say the query returns the tid 192
, then the time-series for that variable will be stored in the table named
var_000192
, while metadata versions are stored in the table named var_000192_meta
.
See cfg/example.cfg
as an example configuration file. Based on it may create your own configuration file, which
you can then load via the -c
option to smax-postgres
at startup. If using SystemD integration, you may want to update
/etc/systemd/system/smax-postgres.service
to load the configuration file from the location of your choice when the
service is started via systemd
.
Host name or IP address of the SMA-X server (default 'smax').
Password for authenticating user on the SQL server (no default).
SQL database name to use (default is 'smax_db').
Host name or IP address of the SMA-X server (default 'localhost').
SQL user name to use (default is 'smax_db').
Determines whether to create hypertables via the TimescaleDB extension. The value 1 enables, 0 disables the used of hypertables (the default is to not use hypertables). TimescaleDB hypertables allow for faster access of time series data by organizing large datasets into smaller blocks of data, which can be handled more efficiently.
Timescales may be specified by a numeric value (integer or decimal) followed immediately by a unit designator, such as '1d' for one day. Alternatively, the value 'none' can be used to disable a timescale-specific option. The following timescale units are understood:
unit | description |
---|---|
s |
second(s) |
m |
minute(s) |
h |
hour(s) |
d |
day(s) |
w |
week(s) |
y |
year(s) |
Specifies the interval at which all designated variables are pushed into the SQL database, regardless whether they have changed or not since the last time they were pushed (default '1h'). See the section further above on interval specifications.
Specifies the regular interval at which to push changing variables into the database (default: '1m'). See the section furtherabove on interval specifications. Variables that have no changed since the last update will be excluded from the regular updates until it is time for the next full snapshot. The snapshot interval is controlled separately (see above).
SMA-X variables can be specified individually or via glob patterns, similarly to how these are used in UNIX shells.
Specifies a variable or a glob pattern of variables, which are to be logged into the SQL database regardless of all other directives, which may otherwise limit if and when they are to be pushed. Reserve using this option only for the most critical cases, when the other configuration options do not provide the desired level of assurances for some absolutely critical data points.
Specifies a variable or a glob pattern of variables that are to be excluded from logging to the SQL database.
exclude
and include
directives take effect in the order they were specified, so for a given variable only the last
include
or exclude
statement, which pertains to it, will decide whether or not to log that given variable.
Variables that are configured with an always
directive will be logged to the SQL database regardless of any
exclusions that may have been specified, either before or after. By default all metadata variables (ones whose name
begin with <
) and all temporary variable (whose names begin with an underscore _
) are excluded from logging,
unless they are explicitly re-included.
Specifies a variable or a glob pattern of variables that are to be excluded from logging to the SQL database.
exclude
and include
directives take effect in the order they were specified, so for a given variable only the last
include
or exclude
statement, which pertains to it, will decide whether or not to log that given variable.
Variables that are configured with an always
directive will be logged to the SQL database regardless of any
exclusions that may have been specified, either before or after. By default, all SMA-X variables are included in the
logging, except for the metadata and temporary data that are excluded by default (see exclude
).
Sets a maximum age for variables to push to the SQL database (default: '90d'). Variables that have not been updated in
SMA-X for longer than the specified interval will not be pushed to the SQL database. Variables that are configured
with an always
directive will be logged to the SQL database regardless of their age.
Sets a maximum byte size for variables to push to the SQL database (default: '1024'). Variables that have larger
binary representations (after downsampling via the sampling
directive, if any) will not be logged to the database to
avoid bloating. However, variables explicitly configured via an always
directive will be logged to the SQL database
regardless of their storage requirements.
Log sparse samples of data for a variable or a glob pattern of variables. In some cases you may store large arrays in
the SMA-X database, logging of which may bloat the time series history stored in the SQL database. However, you may
want to still get a preview of what that data was, by storing every n'th sample of the original only. For example, for
an array with 1000 entries, you may want to store say 20 samples. Setting <n>
to 50 will achieve that, by storing
every 50th element in the SQL database only. (Still, the SQL database will store the original dimensionality of the
downsampled variables, and note the downsampling factor used also as metadata).
Copyright (C) 2024 Attila Kovács