Any numbers on insertion times? #2

Open
RajKarri opened this issue Jan 31, 2025 · 3 comments
Comments

@RajKarri

I understand this study focuses on "Data Analytics" benchmarks, but it would be nice if you could show how long it took to insert all those rows.

@alexey-milovidov
Member

Let me copy the answer from the README:

The benchmark does not record data loading times. While it was one of the initial goals, many systems require a finicky multi-step data preparation process, which makes them difficult to compare.

And the comments from the author, @tom-clickhouse:

We fought a lot to get each system to ingest the JSON data properly. Each data store, including ClickHouse, had issues with different subsets of the data (parsing issues, special-character issues, etc.), so for each system individually we take some extra steps in the data loading scripts, which makes it tricky to compare ingest time fairly.
We are still planning to add an ingest-time metric, though; for that we need to split out the data preparation steps.

Here is the extra data-massaging step for PostgreSQL, for example. For DuckDB, we split the 1-million-document files into smaller chunks, because if even one document has an issue, the whole file is discarded.
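
As a rough illustration of the DuckDB chunking step mentioned above, here is a minimal sketch. It is not the benchmark's actual loading script. It assumes the input is newline-delimited JSON (one document per line); the function name `split_ndjson`, the default chunk size, and the output file naming are all hypothetical.

```python
import itertools
from pathlib import Path


def split_ndjson(src: Path, out_dir: Path, lines_per_chunk: int = 10_000) -> list[Path]:
    """Split a newline-delimited JSON file into files of at most
    `lines_per_chunk` lines each, so one bad document only costs one chunk.

    Hypothetical helper: name, chunk size, and file naming are assumptions.
    """
    if lines_per_chunk <= 0:
        raise ValueError("lines_per_chunk must be positive")

    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths: list[Path] = []

    # Binary mode keeps every byte of each document unchanged.
    with src.open("rb") as f:
        for index in itertools.count():
            # Read the next batch of lines without loading the whole file.
            lines = list(itertools.islice(f, lines_per_chunk))
            if not lines:
                break
            chunk_path = out_dir / f"{src.stem}_{index:05d}.json"
            chunk_path.write_bytes(b"".join(lines))
            chunk_paths.append(chunk_path)

    return chunk_paths
```

Smaller chunks reduce how much data is lost when a single document fails to parse, at the cost of more files to load. The right chunk size depends on how often bad documents occur.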

@RajKarri
Author

RajKarri commented Feb 1, 2025

Yeah, makes sense. It's hard to set these things up before running queries. Please share any other studies on ClickHouse insert speeds (they need not be related to this one). Thanks.

@rschu1ze
Member

Please share any other studies on ClickHouse insert speeds (they need not be related to this one). Thanks.

https://clickhouse.com/blog/clickhouse-input-format-matchup-which-is-fastest-most-efficient


3 participants