python/postgresql: NUMERIC values corrupted in fetched results #1501
Comments
Hmm. @WillAyd or @paleolimbot happen to have any ideas?
Also, I forgot to mention in the ticket: re-running the reproducer very often leads to different corruption, e.g. the trailing garbage changes between test runs.
Thank you for this and the reproducer! The code to translate Postgres COPY for NUMERIC into a string is very complicated (I translated the Postgres implementation as literally as I could): arrow-adbc/c/driver/postgresql/copy/reader.h Lines 275 to 393 in d6694a2
I am guessing there is an error there. I can take a look in the next few days!
My guess is that the value of |
@paleolimbot I have some free time, so I've been poking around the code. I started comparing the funky code with the original from PostgreSQL (it's a bit late here so I'm on sort of autopilot :)). I think the missing leading zero is due to this: maint-0.9.0...lupko:arrow-adbc:maint-0.9.0#diff-c5dc283c26154e7a7e1291b73f5131cf4a180240729fea7f459980f60b08486cR438 It seems to be a clear difference from the original code. Trying locally with those changes, I see the leading zero is restored.
The trailing corruption is likely due to this: lupko@f171a17
Here is a PR that makes the conversion faithful to the PostgreSQL counterpart: https://github.com/apache/arrow-adbc/pull/1502/files Local testing shows that the single zero before the decimal point is now present and the corruption is gone.
- conversion from numeric to string had two bugs due to deviations from the original PostgreSQL code
  - the single leading zero before the decimal point would always be dropped
  - in some cases, the digits after the decimal point would be incomplete and instead replaced with 'garbage'

Here is the PostgreSQL code: https://github.com/postgres/postgres/blob/9589b038d3203cd5ba708fb4f5c23182c88ad0b3/src/backend/utils/adt/numeric.c#L7443 (the DEC_DIGITS=4 variant)

Fixes #1501.
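For illustration, here is a minimal Python sketch of the digit-group logic in the linked PostgreSQL routine (the DEC_DIGITS=4 variant). It assumes the binary NUMERIC layout of an 8-byte header (ndigits, weight, sign, dscale) followed by base-10000 digits. The function name `numeric_binary_to_str` is hypothetical; this is not the driver's C++ code, only a sketch of the two behaviours the PR restores: the single zero before the decimal point, and exactly dscale digits after it.

```python
import struct

DEC_DIGITS = 4          # each wire digit holds four decimal digits (base 10000)
NUMERIC_POS = 0x0000    # sign word for positive values
NUMERIC_NEG = 0x4000    # sign word for negative values


def numeric_binary_to_str(payload: bytes) -> str:
    """Render one binary NUMERIC value as text (hypothetical helper)."""
    # Header: ndigits, weight, sign, dscale (network byte order, 16 bits each).
    ndigits, weight, sign, dscale = struct.unpack_from("!hhHh", payload, 0)
    digits = struct.unpack_from(f"!{ndigits}h", payload, 8)
    if sign not in (NUMERIC_POS, NUMERIC_NEG):
        raise ValueError("NaN and infinity are not handled in this sketch")

    out = ["-"] if sign == NUMERIC_NEG else []

    if weight < 0:
        # No integer digits on the wire: emit the single leading zero.
        d = weight + 1
        out.append("0")
    else:
        for d in range(weight + 1):
            dig = digits[d] if d < ndigits else 0
            # Only the first group suppresses its leading zeroes.
            out.append(f"{dig:d}" if d == 0 else f"{dig:04d}")
        d = weight + 1

    if dscale > 0:
        frac = []
        i = 0
        while i < dscale:
            # Digits outside the stored range count as zero.
            dig = digits[d] if 0 <= d < ndigits else 0
            frac.append(f"{dig:04d}")
            d += 1
            i += DEC_DIGITS
        # Whole groups may overshoot dscale, so cut back to exactly dscale.
        out.append("." + "".join(frac)[:dscale])

    return "".join(out)


# Zero stored as NUMERIC with scale 4 has no digits on the wire.
print(numeric_binary_to_str(struct.pack("!hhHh", 0, 0, NUMERIC_POS, 4)))  # 0.0000
```

The final slice to dscale characters corresponds to the step in the PostgreSQL routine that moves the output pointer back to the end of the requested scale before terminating the string.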
Hello,
I have just recently started prototyping with ADBC, with the goal of eventually integrating it into our system. Running queries and fetching results is the main use case for us.
I started with the PostgreSQL driver and have run into issues with NUMERIC columns in results. I understand the choice to do lossless transfer as strings; however, I find that these strings end up corrupted.
In my tests, I consume results as Arrow Tables.
There are two main things:
- A value of 0 is missing the zero before the decimal point, so for example it comes back as .0000
- When using a particular numeric scale, all values get corrupted: there is trailing garbage that cannot be converted to UTF-8.
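As a rough sketch, both symptoms can be detected on the raw bytes of a fetched value. `classify_numeric_text` is a hypothetical helper written for illustration; it is not part of ADBC or of the reproducer, and it assumes the raw bytes of each string value are available.

```python
from decimal import Decimal, InvalidOperation


def classify_numeric_text(raw: bytes) -> str:
    """Classify the raw bytes of one NUMERIC value returned as a string."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Trailing garbage that is not valid UTF-8.
        return "not-utf8"
    try:
        Decimal(text)
    except InvalidOperation:
        # Decodable, but not a decimal literal (e.g. stray trailing bytes).
        return "not-a-number"
    # Decimal accepts ".0000", so the leading zero is checked separately.
    if text.lstrip("+-").startswith("."):
        return "missing-leading-zero"
    return "ok"


print(classify_numeric_text(b".0000"))            # missing-leading-zero
print(classify_numeric_text(b"12345.67891\xff"))  # not-utf8
```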
Here is an all-in-one reproducer: https://gist.github.com/lupko/336da65b37aade5dc2433004e2720d8e
A sample output (NUMERIC(16, 5)):
Thanks for this great project. Please let me know if I can be of any further help.