Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: missing values on large inserts #59

Open
hjfeldy opened this issue May 3, 2022 · 2 comments
Open

[Bug]: missing values on large inserts #59

hjfeldy opened this issue May 3, 2022 · 2 comments
Labels

Comments

@hjfeldy
Copy link

hjfeldy commented May 3, 2022

What type of bug is this?

Data corruption

What subsystems and features are affected?

Data ingestion

What happened?

Upon inserting a large CSV of cryptocurrency data, certain rows are missing.
timescaleIssue.zip

TimescaleDB version affected

2.6.1

PostgreSQL version used

12

What operating system did you use?

Ubuntu 20.04 LTS x86_64

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

psql:testDB.sql:1: NOTICE:  extension "timescaledb" already exists, skipping
     create_hypertable
 -------------------------
  (8,public,test_table,t)
 (1 row)

                add_dimension
 --------------------------------------------
  (30,public,test_table,collection_window,t)
 (1 row)

             add_dimension
 --------------------------------------
  (31,public,test_table,roll_window,t)
 (1 row)

          add_dimension
 -------------------------------
  (32,public,test_table,pair,t)
 (1 row)

 Performing vanilla postgres COPY command...
 COPY 3748508
 Results:
  count
 -------
      1
 (1 row)

 Resetting DB
 psql:testDB.sql:1: NOTICE:  extension "timescaledb" already exists, skipping
     create_hypertable
 -------------------------
  (9,public,test_table,t)
 (1 row)

                add_dimension
 --------------------------------------------
  (34,public,test_table,collection_window,t)
 (1 row)

             add_dimension
 --------------------------------------
  (35,public,test_table,roll_window,t)
 (1 row)

          add_dimension
 -------------------------------
  (36,public,test_table,pair,t)
 (1 row)

 Performing timescaledb-parallel-copy command...
  ttaCOPY 3748509
 Results:
  count
 -------
      0
 (1 row)

How can we reproduce the bug?

Verify by unzipping the attached archive and running the "run" shell script. Out of caution, I included an obfuscated version of the crypto data in question. It has the same datatypes and the same dimensions as the original, unobfuscated csv. The script performs a vanilla postgres \COPY command on the csv and selects the row that I found to be missing when I discovered the bug (proving that COPY does, in fact, include the row). Then, the script clears all data out of the DB and repeats the copy, this time using timescaledb-parallel-copy. It then performs a selection to show the row in question is no longer there, despite supposedly being copied from an identical csv

@hjfeldy hjfeldy added the bug label May 3, 2022
@mkindahl mkindahl transferred this issue from timescale/timescaledb May 4, 2022
@mkindahl
Copy link

mkindahl commented May 4, 2022

@hjfeldy Thanks for the bug report. Since this seems to be an issue with timescaledb-parallel-copy I move it over there.

@jchampio
Copy link
Contributor

jchampio commented Jun 9, 2022

@hjfeldy Thanks for the report. It looks like the utility doesn't play well with the HEADER option you've specified -- Postgres will ignore the first line from every chunk, which is where your missing rows are going.

I suggest using -skip-header instead. If I switch your script from using

timescaledb-parallel-copy -file obf.csv -workers 1 -db-name $DB_NAME -connection "${CON_STRING}" -table test_table -copy-options "CSV HEADER"

to

timescaledb-parallel-copy -file obf.csv -workers 1 -db-name $DB_NAME -connection "${CON_STRING}" -table test_table -skip-header

then your test passes, and I can see the expected number of rows in the table. Does that work for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants