
Add new JSON type.#478

Open
gsergey418 wants to merge 4 commits into mymarilyn:master from gsergey418:newjson

Conversation

@gsergey418 commented Feb 17, 2025

Add support for the new JSON type. My company uses this driver lib to interact with CH and we rely heavily on the JSON type. Any help would be greatly appreciated since I don't know much about writing database drivers.

Checklist:

  • Add tests that demonstrate the correct behavior of the change. Tests should fail without the change.
  • Add or update relevant docs, in the docs folder and in code.
  • Ensure PR doesn't contain untouched code reformatting: spaces, etc.
  • Run flake8 and fix issues.
  • Run pytest and verify that no tests fail. See https://clickhouse-driver.readthedocs.io/en/latest/development.html.

@gsergey418 changed the title from "Add a test and column for the new JSON type." to "Add new JSON type." Feb 17, 2025
@gsergey418 gsergey418 marked this pull request as ready for review February 27, 2025 14:37
@gsergey418
Author

@xzkostyan I've marked the PR as ready, please review.

@gsergey418
Author

Not ready yet: after some testing with our software, we found that this column has trouble parsing Array(JSON) and Array(Array) values. I'll fix this in the next few days; the majority of the code should still be good for review.

@gsergey418
Author

Seems to be ready now; I added additional tests to check compatibility with the CLI.

@gsergey418
Author

There's a remaining problem: it's not compatible with the CLI when using array-nested dictionaries at more than five layers of nesting. I probably won't add that functionality, since it's too niche a use case.

@gsergey418
Author

gsergey418 commented Mar 4, 2025

The problem was the inability to deserialize shared JSON when max_dynamic_paths and max_dynamic_types are exceeded; that case uses a separate, undocumented format I sadly don't have time to work on right now. The dynamic JSON serialization I implemented appears to be fully compatible with itself, and it works well for objects inserted with other libraries/clients that don't exceed the dynamic-paths limit (1024 dynamic paths by default, divided by 4 for every level of array-nested dictionary). Requesting review @xzkostyan
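As a rough illustration of the budget described above (this sketch is not part of the driver or of ClickHouse itself; it only restates the numbers from this comment), the effective dynamic-paths limit at a given array-nesting depth works out to:

```python
# Illustrative sketch only, based on the rule stated in the comment above:
# 1024 dynamic paths by default, divided by 4 for every level of
# array-nested dictionary. Not part of clickhouse-driver.
def effective_dynamic_paths(depth, base=1024):
    """Dynamic-paths budget for a JSON object nested `depth` arrays deep."""
    return base // (4 ** depth)

for depth in range(4):
    print(depth, effective_dynamic_paths(depth))
# depth 0 -> 1024, depth 1 -> 256, depth 2 -> 64, depth 3 -> 16
```

Under this rule the budget shrinks quickly with nesting, which is consistent with deeply array-nested dictionaries hitting the limit first.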

@jhKessler

Any indication on when this will be merged?

@KingMohan45

+1

@chrisgoddard

Any update on this? This is a big blocker for me currently.

@gsergey418
Author

> Any update on this? This is a big blocker for me currently.

Not yet, the maintainer has to review the PR. You can try this out for now and let me know if it works.

$ pip uninstall clickhouse-driver
$ pip install git+https://github.com/gsergey418/clickhouse-driver@newjson#egg=clickhouse-driver

@etos

etos commented May 5, 2025

+1

@yuangezhizao

> Any update on this? This is a big blocker for me currently.
>
> Not yet, the maintainer has to review the PR. You can try this out for now and let me know if it works.
>
> $ pip uninstall clickhouse-driver
> $ pip install git+https://github.com/gsergey418/clickhouse-driver@newjson#egg=clickhouse-driver

In my test case, it works. Thanks for the PR. 👍

clickhouse-driver execute result

/Users/yuangezhizao/Documents/GitLab/<rm>-capacity-poc/.venv/bin/python /Users/yuangezhizao/Documents/GitLab/<rm>-capacity-poc/<rm>_capacity_poc/clickhouse/test_new_json_type.py 
DEBUG:clickhouse_driver.connection:Connecting. Database: default. User: default
DEBUG:clickhouse_driver.connection:Connecting to <rm>:9001
DEBUG:clickhouse_driver.connection:Connected to ClickHouse server version 25.4.2, revision: 54477
DEBUG:clickhouse_driver.connection:Query: DROP TABLE IF EXISTS `default`.test_new_json_type
DEBUG:clickhouse_driver.connection:Block "" send time: 0.000023
INFO:__main__:Table deleted
DEBUG:clickhouse_driver.connection:Query: 
                CREATE TABLE `default`.test_new_json_type
                (
                    `id` UInt64,
                    `json_data` JSON()
                )
                ENGINE = MergeTree()
                ORDER BY id;
                
DEBUG:clickhouse_driver.connection:Block "" send time: 0.000024
INFO:__main__:Table created
DEBUG:clickhouse_driver.connection:Query: INSERT INTO `default`.test_new_json_type (id, json_data) VALUES
DEBUG:clickhouse_driver.connection:Block "" send time: 0.000028
DEBUG:clickhouse_driver.streams.native:Writing column id
DEBUG:clickhouse_driver.streams.native:Writing column json_data
DEBUG:clickhouse_driver.connection:Block "" send time: 0.000856
DEBUG:clickhouse_driver.streams.native:Reading column host_name
DEBUG:clickhouse_driver.streams.native:Reading column current_time
DEBUG:tzlocal:/etc/localtime found
DEBUG:tzlocal:1 found:
 {'/etc/localtime is a symlink to': 'Asia/Shanghai'}
DEBUG:clickhouse_driver.streams.native:Reading column thread_id
DEBUG:clickhouse_driver.streams.native:Reading column type
DEBUG:clickhouse_driver.streams.native:Reading column name
DEBUG:clickhouse_driver.streams.native:Reading column value
DEBUG:clickhouse_driver.connection:Block "" send time: 0.000056
DEBUG:clickhouse_driver.streams.native:Reading column host_name
DEBUG:clickhouse_driver.streams.native:Reading column current_time
DEBUG:clickhouse_driver.streams.native:Reading column thread_id
DEBUG:clickhouse_driver.streams.native:Reading column type
DEBUG:clickhouse_driver.streams.native:Reading column name
DEBUG:clickhouse_driver.streams.native:Reading column value
INFO:__main__:Data created
DEBUG:clickhouse_driver.connection:Query: SELECT id, JSONExtractKeysAndValuesRaw(toJSONString(json_data)) FROM `default`.test_new_json_type
DEBUG:clickhouse_driver.connection:Block "" send time: 0.000015
DEBUG:clickhouse_driver.streams.native:Reading column id
DEBUG:clickhouse_driver.streams.native:Reading column JSONExtractKeysAndValuesRaw(toJSONString(json_data))
DEBUG:clickhouse_driver.streams.native:Reading column host_name
DEBUG:clickhouse_driver.streams.native:Reading column current_time
DEBUG:clickhouse_driver.streams.native:Reading column thread_id
DEBUG:clickhouse_driver.streams.native:Reading column type
DEBUG:clickhouse_driver.streams.native:Reading column name
DEBUG:clickhouse_driver.streams.native:Reading column value
INFO:__main__:ID: 1, Extracted JSON: [('attr', '{"attr16":"100"}')]
INFO:__main__:ID: 2, Extracted JSON: [('attr', '{"attr16":"string"}')]
INFO:__main__:ID: 3, Extracted JSON: [('attr', '{"attr16":[["aaaaa","bbbbb"]]}')]
INFO:__main__:ID: 4, Extracted JSON: [('attr', '{"attr16":[["ccccc"],"aaaaa","bbbbb"]}')]
INFO:__main__:ID: 5, Extracted JSON: [('attr', '{"attr_test":[["ccccc"],"aaaaa","bbbbb"]}')]
INFO:__main__:Data selected

Before

(screenshot)

After

(screenshot)

@gsergey418
Author

@yuangezhizao Thanks for the test! Glad to hear it works

@maciej-or

maciej-or commented May 22, 2025

@gsergey418 thanks for the PR. It works smoothly with JSON types, but with Array(JSON) I get an error on insert:

DB::Exception: Array sizes are too large: 7164775599414010632

even if I try to insert empty arrays.

Sample table:

CREATE TABLE gcp_billing_test (
  user_id Int64,
  cost Float64,
  resource JSON DEFAULT '{}',
  service JSON DEFAULT '{}',
  labels Array(JSON) DEFAULT [],
  credits Array(JSON) DEFAULT []
)
ENGINE = MergeTree
PARTITION BY (user_id)
ORDER BY (user_id)

sample insert:

INSERT INTO gcp_billing_test (
  user_id,
  cost,
  resource,
  service,
  labels,
  credits
)
VALUES (
  123456789,
  42.75,
  '{"type": "compute", "location": "us-central1"}',
  '{"name": "Compute Engine", "category": "IaaS"}',
  [
    '{"env": "prod"}',
    '{"team": "infra"}'
  ],
  [
    '{"name": "Sustained Use Discount", "amount": 5.25}',
    '{"name": "Committed Use Discount", "amount": 3.10}'
  ]
);

stack trace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/clickhouse_driver/dbapi/cursor.py", line 111, in execute
    response = execute(
  File "/usr/local/lib/python3.10/site-packages/clickhouse_backend/driver/client.py", line 44, in execute
    rv = self.process_insert_query(
  File "/usr/local/lib/python3.10/site-packages/clickhouse_driver/client.py", line 607, in process_insert_query
    rv = self.send_data(sample_block, data,
  File "/usr/local/lib/python3.10/site-packages/clickhouse_driver/client.py", line 665, in send_data
    self.receive_profile_events()
  File "/usr/local/lib/python3.10/site-packages/clickhouse_driver/client.py", line 747, in receive_profile_events
    raise packet.exception
clickhouse_driver.errors.ServerException: Code: 128.
DB::Exception: Array sizes are too large: 7164775599414010632

@ronaldorcampos

ronaldorcampos commented Aug 4, 2025

Thanks for the PR. We really need this supported officially, as we rely heavily on the JSON type.

@jgb

jgb commented Aug 5, 2025

I would also like to voice my support for this PR (or any other PR that implements JSON support for this library).
clickhouse-connect offers JSON support, but it isn't able to automatically transform null values into default values for non-nullable columns; clickhouse-driver does offer that feature, but doesn't support JSON.

@PierreF

PierreF commented Sep 3, 2025

When I tested this PR, I hit one issue: when the JSON contains dynamic keys (like user-defined keys), the SELECT can fail if the data part contains more than 128 distinct keys.

My use case is JSON fields that store the attributes of a log line. Each log line's attributes can be user-defined (not a fixed set of keys; e.g. one line could have http.response.status = 200, another postgresql.version = 17). As a result, all log lines collectively can have more than 128 keys.

The following test exhibits the bug:

    def test_json_13(self):
        """ Reproduce an issue when reading a JSON column that has more than 127 distinct keys across all rows.
        """
        # The issue doesn't occur with the Memory engine, so use MergeTree.
        create_table_template = 'CREATE TABLE test ({}) ENGINE = MergeTree ORDER BY tuple()'

        with self.create_table("a JSON", template=create_table_template):
            unique_keys = [
                f"unique_key_{idx}"
                for idx in range(128)
            ]

            # There is also an issue when inserting more than 127 distinct keys in a single
            # query, but that isn't what this test shows, so send the unique keys in multiple inserts.
            max_batch_size = 64
            batches = [unique_keys[i:i + max_batch_size] for i in range(0, len(unique_keys), max_batch_size)]
            for batch in batches:
                data = [
                    ({key: 1},)
                    for key in batch
                ]
                self.client.execute("INSERT INTO test (a) VALUES", data)

            # We submitted multiple inserts (which create multiple data parts), each containing
            # only max_batch_size distinct keys. Once merged into a single data part, the result
            # will contain all unique_keys, which exceeds the threshold of ~120 unique keys.
            # Force a merge of the data parts into a single one.
            self.client.execute("OPTIMIZE TABLE test FINAL")

            query = "SELECT * FROM test"
            result = self.client.execute(query)
            result_keys = {key for row in result for key in row[0].keys()}
            self.assertEqual(result_keys, set(unique_keys))

@shachibista

Any updates on this?

@KabirSingh-Marrow

When will this get merged?

@anand-bakerev

Hi team, any update on this?



Development

Successfully merging this pull request may close these issues.

  • JSON Column Type support
  • Support the new JSON type