[Bug]: missing values on large inserts #59

hjfeldy · 2022-05-03T19:06:20Z

What type of bug is this?

Data corruption

What subsystems and features are affected?

Data ingestion

What happened?

Upon inserting a large CSV of cryptocurrency data, certain rows are missing.
timescaleIssue.zip

TimescaleDB version affected

2.6.1

PostgreSQL version used

12

What operating system did you use?

Ubuntu 20.04 LTS x86_64

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

psql:testDB.sql:1: NOTICE:  extension "timescaledb" already exists, skipping
     create_hypertable
 -------------------------
  (8,public,test_table,t)
 (1 row)

                add_dimension
 --------------------------------------------
  (30,public,test_table,collection_window,t)
 (1 row)

             add_dimension
 --------------------------------------
  (31,public,test_table,roll_window,t)
 (1 row)

          add_dimension
 -------------------------------
  (32,public,test_table,pair,t)
 (1 row)

 Performing vanilla postgres COPY command...
 COPY 3748508
 Results:
  count
 -------
      1
 (1 row)

 Resetting DB
 psql:testDB.sql:1: NOTICE:  extension "timescaledb" already exists, skipping
     create_hypertable
 -------------------------
  (9,public,test_table,t)
 (1 row)

                add_dimension
 --------------------------------------------
  (34,public,test_table,collection_window,t)
 (1 row)

             add_dimension
 --------------------------------------
  (35,public,test_table,roll_window,t)
 (1 row)

          add_dimension
 -------------------------------
  (36,public,test_table,pair,t)
 (1 row)

 Performing timescaledb-parallel-copy command...
  ttaCOPY 3748509
 Results:
  count
 -------
      0
 (1 row)

How can we reproduce the bug?

To reproduce, unzip the attached archive and run the "run" shell script. Out of caution, I included an obfuscated version of the crypto data in question; it has the same datatypes and the same dimensions as the original, unobfuscated CSV. The script performs a vanilla Postgres \COPY command on the CSV and then selects the row that I found to be missing when I discovered the bug, proving that COPY does, in fact, include the row. The script then clears all data out of the DB and repeats the copy, this time using timescaledb-parallel-copy. Finally, it performs the same selection to show that the row in question is no longer there, despite supposedly being copied from an identical CSV.

mkindahl · 2022-05-04T08:00:29Z

@hjfeldy Thanks for the bug report. Since this seems to be an issue with timescaledb-parallel-copy, I'm moving it over there.

jchampio · 2022-06-09T18:22:07Z

@hjfeldy Thanks for the report. It looks like the utility doesn't play well with the HEADER option you've specified -- Postgres will ignore the first line from every chunk, which is where your missing rows are going.

I suggest using -skip-header instead. If I switch your script from using

timescaledb-parallel-copy -file obf.csv -workers 1 -db-name $DB_NAME -connection "${CON_STRING}" -table test_table -copy-options "CSV HEADER"

to

timescaledb-parallel-copy -file obf.csv -workers 1 -db-name $DB_NAME -connection "${CON_STRING}" -table test_table -skip-header

then your test passes, and I can see the expected number of rows in the table. Does that work for you?
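To illustrate the failure mode described above, here is a minimal simulation. It is not the utility's actual code: it assumes the file is split into fixed-size batches and that each batch is sent as its own COPY with the given options. The function names and the batch size of 4 are hypothetical, chosen only to keep the example small.

```python
def copy_with_header(lines):
    """Mimic COPY ... CSV HEADER: the first line received is discarded as a header."""
    return lines[1:]


def load_in_batches(lines, batch_size, header_per_batch):
    """Send lines in fixed-size batches; optionally apply HEADER to every batch."""
    loaded = []
    for start in range(0, len(lines), batch_size):
        batch = lines[start:start + batch_size]
        loaded.extend(copy_with_header(batch) if header_per_batch else batch)
    return loaded


# One header line followed by nine data rows.
csv_lines = ["id,value"] + [f"{i},{i * 10}" for i in range(1, 10)]

# HEADER passed as a copy option: every batch loses its first line.
lossy = load_in_batches(csv_lines, batch_size=4, header_per_batch=True)

# Header skipped once up front, then plain CSV for every batch.
safe = load_in_batches(csv_lines[1:], batch_size=4, header_per_batch=False)

missing = sorted(set(safe) - set(lossy))
print(len(lossy), len(safe), missing)
```

In this sketch the first batch loses only the real header, but every later batch loses a data row, which matches the pattern of scattered missing rows in the report. Skipping the header once before batching keeps all nine rows.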

hjfeldy added the bug label May 3, 2022

mkindahl transferred this issue from timescale/timescaledb May 4, 2022
