Fix: Deserialization failure for u64 values larger than i64::MAX#67
Conversation
There was a problem hiding this comment.
Review
The core approach is sound — splitting Token::Integer(i64) into SignedInteger(i64) / UnsignedInteger(u64) and trying i64 parse first then u64 fallback is the right fix. A few issues to address:
Bug: Missing UnsignedInteger arm in parse_object_value
At parser.rs around line 1278 in the diff, only SignedInteger is handled:
Token::SignedInteger(i) => {
let val = *i;
self.advance()?;
Ok(Number::from(val).into())
}There's no corresponding Token::UnsignedInteger arm. Large u64 values in this object-value parsing path would fail to match.
Code duplication in parse_value
The Token::UnsignedInteger block is a near-complete copy-paste of the Token::SignedInteger block (~40 lines). The only difference is the type. These should be combined — either with a helper function, or by extracting the value as serde_json::Number early and sharing the rest of the logic.
Unrelated changes
- Import reformatting (multi-line → single-line) throughout both files adds noise to the diff and should be a separate commit or omitted.
#[derive(Debug)]added toScanneris unrelated to the u64 fix.
Missing integration test
The test updates only rename Integer → SignedInteger and add one scanner-level assertion. There should be a test that roundtrips a large u64 value through encode_default/decode_default — that's exactly what issue #64 describes and it would catch regressions like the missing arm above.
missing UnsignedInteger arm in parse_primitive()
Reproduction — this TOON input with a list array of large u64 values:
values[3]:
- 12591154125385152738
- 18446744073709551615
- 9223372036854775808
Fails with:
Expected primitive value, found UnsignedInteger(12591154125385152738)
The issue is in parse_primitive() (parser.rs line ~1281) — it handles Token::SignedInteger but has no arm for Token::UnsignedInteger, so it falls through to the catch-all error.
|
@shanithkk please fix lint too |
There was a problem hiding this comment.
Thanks for the updates @shanithkk — the parse_primitive fix and the round-trip tests are good additions.
I went through the latest diff carefully. Here's where things stand:
Correctness
All 15 match sites in the parser now handle both SignedInteger and UnsignedInteger — verified each one. The core fix in the scanner (try i64 first, fall back to u64) is correct. I also tested a bunch of edge cases locally and everything passes:
- list arrays with large u64 (the parse_primitive path that was broken before) — works now
- large u64 as object key, in tabular arrays, inline arrays
- i64::MAX boundary, i64::MAX+1, u64::MAX
- values > u64::MAX correctly fall back to string
- large u64 followed by trailing tokens becomes string as expected
- negative numbers still parse as SignedInteger
So functionally this is solid.
Code duplication is still the main concern
The Token::UnsignedInteger arm in parse_value (lines 184-222) is a character-for-character copy of the SignedInteger arm (lines 145-183) — 39 lines of identical logic. The only actual difference is Number::from(i64) vs Number::from(u64) on the fallback line.
This is exactly the kind of thing that caused the parse_primitive bug in the first place — when you duplicate N places, it's easy to miss one. A helper function that takes the key string, first text, and a serde_json::Number could collapse both arms to ~3 lines each.
Same pattern repeats in parse_primitive (two identical 4-line arms) and across ~10 other match sites, though those are smaller.
I'd really prefer to see this cleaned up before merging. Not asking for a huge refactor — just extract the shared logic so we don't have to update two identical blocks every time something changes in that path.
Minor stuff
#[derive(Debug)]on Scanner (line 28) — unrelated to the u64 fix, would be cleaner as a separate commit or dropped from this PR- The round-trip tests are good but don't cover list arrays (the path that was actually broken), tabular arrays, or object keys with large u64. Would be nice to have at least one test per parser path
Test I used locally
For reference, here's the TOON input that was failing before your latest push and now works:
values[3]:
- 12591154125385152738
- 18446744073709551615
- 9223372036854775808
This hits parse_primitive() and was erroring with Expected primitive value, found UnsignedInteger(...).
Also please fix broken clippy lint
|
@shanithkk fix the nightly fmt check , so we can merge this PR |
Linked Issue
Fix: #64
Closes #
Description
Issue was during deserialization, u64 values larger than i64::MAX (9223372036854775807) fail because they are incorrectly handled as i64. This causes errors when values exceed the i64 range.
Updated the decoding logic by introducing SignedInteger(i64) and UnsignedInteger(u64) variants, allowing values to be deserialized using the appropriate integer type and correctly handling u64 values larger than i64::MAX.
Type of Change
Changes Made
SPEC Compliance
Testing
Pre-submission Checklist
Breaking Changes
Additional Context