What type of enhancement is this?

Performance

What subsystems and features will be improved?

Compression

What does the enhancement do?

I'm migrating a histories table to a hypertable to make use of TimescaleDB's compression. The data starts from 2024-01-01 and is too large to migrate in one shot, so I'm migrating it month by month. Here is what I did to create an empty hypertable:
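(The create statements themselves didn't come through in this copy of the report; the following is only a minimal sketch of the usual shape, with hypothetical column names, chunk interval, and compression settings.)

-- Hypothetical reconstruction; the actual schema is not shown in the report.
CREATE TABLE histories (
    datetime  timestamptz NOT NULL,
    device_id bigint      NOT NULL,
    value     double precision
);

-- Turn the plain table into a hypertable partitioned on the time column.
SELECT create_hypertable('histories', 'datetime', chunk_time_interval => INTERVAL '1 day');

-- Enable native compression; the segmentby/orderby columns are assumptions.
ALTER TABLE histories SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'datetime DESC'
);

-- Background policy that compresses chunks older than 7 days (interval assumed).
SELECT add_compression_policy('histories', INTERVAL '7 days');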
Now migrate the first month from the production DB instance to the TimescaleDB instance:
-- In the production DB instance
\copy (SELECT * FROM histories WHERE NOT datetime < '2024-01-01' AND datetime < '2024-02-01') TO 'histories-2024-01.csv' WITH CSV HEADER

-- In the TimescaleDB instance
\copy histories FROM 'histories-2024-01.csv' WITH CSV HEADER
-- COPY 90892167 (rows)
And manually trigger a compression run:
CALL run_job(1000);
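(Job id 1000 here is assumed to be the compression policy created for this hypertable; it can be looked up with something like this:)

-- Find the compression policy's job id for the histories hypertable.
SELECT job_id, proc_name, hypertable_name, schedule_interval
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression' AND hypertable_name = 'histories';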
So far, it works perfectly. It only took around 30-60 minutes to compress roughly 50 GB down to 2 GB, and I could almost immediately see disk usage decreasing in df -h.
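(A way to confirm the before/after sizes from inside the database, rather than via df -h, is hypertable_compression_stats; a minimal sketch:)

-- Overall compression ratio for the hypertable.
SELECT total_chunks,
       number_compressed_chunks,
       pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('histories');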
Repeating the same steps for the second month showed no noticeable difference:

\copy (SELECT * FROM histories WHERE NOT datetime < '2024-02-01' AND datetime < '2024-03-01') TO 'histories-2024-02.csv' WITH CSV HEADER
\copy histories FROM 'histories-2024-02.csv' WITH CSV HEADER
CALL run_job(1000);
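(To verify that the newly loaded month's chunks actually got compressed, a per-chunk view like this can be used:)

-- Per-chunk compression status, oldest chunks first.
SELECT chunk_name, range_start, range_end, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'histories'
ORDER BY range_start;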
The performance issue begins with the 3rd month.

For the observed 4th month, it takes over 3 hours of sustained, heavy disk read I/O before any difference shows up in df -h, and (I think) another hour to actually execute the compression. (I'm measuring the compression procedure from the tooltip to the point where disk BPS drops back to zero.)

Reading at 130 MB/s for 3 hours is around 1371 GB of data, while all of my uncompressed CSV dataset files combined take only about 200 GB.

I want to know where this is blocking and why it needs this much read I/O.
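(One way to see what the backend is actually doing while the reads are happening, run from a second session, is to poll pg_stat_activity; this is a generic sketch, not specific to the compression job:)

-- Run repeatedly while the compression is in progress to watch the
-- active query and its wait events.
SELECT pid, backend_type, state, wait_event_type, wait_event,
       left(query, 100) AS query
FROM pg_stat_activity
WHERE state = 'active';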
\dx
List of installed extensions
Name | Version | Schema | Description
-------------+---------+------------+---------------------------------------------------------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
timescaledb | 2.13.0 | public | Enables scalable inserts and complex queries for time-series data (Community Edition)
(2 rows)
SELECT version();
version
------------------------------------------------------------------------------
PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 12.3.0, 64-bit
(1 row)
Implementation challenges
No response