snowflake: adbc_ingest will fail with "double free" segmentation fault if record batch schema is incorrect #2108

Open
pkit opened this issue Aug 29, 2024 · 3 comments
Labels
Type: bug Something isn't working

Comments

@pkit

pkit commented Aug 29, 2024

What happened?

If the schema of a RecordBatchReader doesn't match the actual batch columns, the ADBC driver crashes.
It's also pretty hard to debug why, because the only lead is "double free or corruption (out)".
I needed to run it under valgrind to understand what's going on.
For some reason it fails in Go with a proper "index out of bounds" panic, but that error is not propagated to the Python code.

Stack Trace

No response

How can we reproduce the bug?

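    import pyarrow as pa

    # `c` is an already-open ADBC Snowflake DBAPI connection.
    schema = pa.schema(fields=[
        pa.field("name1", pa.string()),
        pa.field("name2", pa.string()),
    ])
    data = [
        {"name1": "aaa"},
        {"name1": "bbb"},
    ]
    # from_pylist infers its own schema from the dicts, so the batch has only
    # "name1" while the reader declares both "name1" and "name2".
    reader = pa.RecordBatchReader.from_batches(schema, [pa.RecordBatch.from_pylist(data)])
    with c.cursor() as cur:
        cur.adbc_ingest("test2", reader, mode="create_append")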

Environment/Setup

Latest

@pkit pkit added the Type: bug Something isn't working label Aug 29, 2024
@lidavidm lidavidm added this to the ADBC Libraries 15 milestone Aug 29, 2024
@joellubi
Member

@pkit Can you please share the package version(s) for which this issue occurred, and any other configuration you may have passed to the driver/connection?

I failed to reproduce this using adbc-driver-snowflake = 1.1.0. For me it failed with the stack trace I would have expected it to:

panic: arrow/array: number of columns/fields mismatch

goroutine 38 [running]:
github.com/apache/arrow/go/v17/arrow/array.NewRecord(0x1400018e480, {0x14000e82010, 0x1, 0x160009760?}, 0x2)
        /Users/runner/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/array/record.go:151 +0x198
github.com/apache/arrow/go/v17/arrow/cdata.ImportCRecordBatchWithSchema(0x14000581f80?, 0x1400018e480)
        /Users/runner/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/interface.go:131 +0x248
github.com/apache/arrow/go/v17/arrow/cdata.(*nativeCRecordBatchReader).next(0x14000a9a340)
        /Users/runner/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/cdata.go:997 +0x1bc
github.com/apache/arrow/go/v17/arrow/cdata.(*nativeCRecordBatchReader).Next(0x14000a9a340)
        /Users/runner/go/pkg/mod/github.com/apache/arrow/go/[email protected]/arrow/cdata/cdata.go:956 +0x20
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.readRecords({0x161bd3558, 0x140005d0aa0}, {0x108fb3898, 0x14000a9a340}, 0x14000118840)
        /Users/runner/work/arrow-adbc/arrow-adbc/adbc/go/adbc/driver/snowflake/bulk_ingestion.go:315 +0x78
github.com/apache/arrow-adbc/go/adbc/driver/snowflake.(*statement).ingestStream.func3()
        /Users/runner/work/arrow-adbc/arrow-adbc/adbc/go/adbc/driver/snowflake/bulk_ingestion.go:249 +0x34
golang.org/x/sync/errgroup.(*Group).Go.func1()
        /Users/runner/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 +0x58
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 17
        /Users/runner/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75 +0x98
Abort trap: 6

@pkit
Author

pkit commented Aug 30, 2024

$ pip freeze | grep adbc
adbc-driver-manager==1.1.0
adbc-driver-snowflake==1.1.0

I will add a full repro soon. Yes, it involves custom configuration for adbc.snowflake.statement.ingest_* stuff

@pkit
Author

pkit commented Aug 30, 2024

I lied, it fails even with no custom config.
snowflake_connector_profile.url is just a Snowflake URL as a string.
pytest:

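import pyarrow as pa
# Assumption: `connect` is the DBAPI entry point of the Snowflake driver package.
from adbc_driver_snowflake.dbapi import connect


def test_adbc_bug(snowflake_connector_profile):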
    c = connect(snowflake_connector_profile.url, db_kwargs={
        "adbc.snowflake.sql.schema": "PUBLIC",
        "adbc.snowflake.sql.db": "TEST1",
    })
    schema = pa.schema(
        fields=[
            pa.field("name1", pa.string()),
            pa.field("name2", pa.string()),
        ]
    )
    data = [
        {"name1": "aaa"},
        {"name1": "bbb"},
    ]
    reader = pa.RecordBatchReader.from_batches(schema, [pa.RecordBatch.from_pylist(data)])
    with c.cursor() as cur:
        cur.adbc_ingest("test2", reader, mode="create_append")

Exception:

=================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0 -- /home/user/adbc_bug/.venv/bin/python
cachedir: .pytest_cache
rootdir: /home/user/adbc_bug
configfile: pyproject.toml
plugins: asyncio-0.24.0, anyio-3.7.1, Faker-28.1.0
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 1 item                                                                                                                                                                                                          

tests/functional/python/test_sf_transform.py::test_adbc_bug Fatal Python error: Aborted

Thread 0x00007f3febe29740 (most recent call first):
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/adbc_driver_manager/dbapi.py", line 937 in adbc_ingest
  File "/home/user/adbc_bug/tests/functional/python/test_sf_transform.py", line 148 in test_adbc_bug
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/python.py", line 1627 in runtest
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/main.py", line 337 in _main
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/home/user/adbc_bug/.venv/lib/python3.11/site-packages/pytest/__main__.py", line 9 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, pyarrow.lib, adbc_driver_manager._lib, pyarrow._compute, pyarrow._acero, pyarrow._fs, pyarrow._csv, pyarrow._json, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, adbc_driver_manager._reader, pydantic.typing, pydantic.errors, pydantic.version, pydantic.utils, pydantic.class_validators, pydantic.config, pydantic.color, pydantic.datetime_parse, pydantic.validators, pydantic.networks, pydantic.types, pydantic.json, pydantic.error_wrappers, pydantic.fields, pydantic.parse, pydantic.schema, pydantic.main, pydantic.dataclasses, pydantic.annotated_types, pydantic.decorator, pydantic.env_settings, pydantic.tools, pydantic, clickhouse_connect.driverc.buffer, clickhouse_connect.driverc.dataconv, clickhouse_connect.driverc.npconv, zstandard.backend_c, lz4._version, lz4.frame._frame, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, psycopg2._psycopg, regex._regex, _cffi_backend, charset_normalizer.md, snowflake.connector.nanoarrow_arrow_iterator (total: 54)
Aborted (core dumped)
