Skip to content

Fix: Deserialization failure for u64 values larger than i64::MAX#67

Merged
shreyasbhat0 merged 3 commits into
toon-format:mainfrom
shanithkk:main
May 22, 2026
Merged

Fix: Deserialization failure for u64 values larger than i64::MAX#67
shreyasbhat0 merged 3 commits into
toon-format:mainfrom
shanithkk:main

Conversation

@shanithkk

@shanithkk shanithkk commented May 16, 2026

Copy link
Copy Markdown
Contributor

Linked Issue

Fix: #64

Closes #

Description

Issue was during deserialization, u64 values larger than i64::MAX (9223372036854775807) fail because they are incorrectly handled as i64. This causes errors when values exceed the i64 range.

Updated the decoding logic by introducing SignedInteger(i64) and UnsignedInteger(u64) variants, allowing values to be deserialized using the appropriate integer type and correctly handling u64 values larger than i64::MAX.

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Changes Made

  • Introduced SignedInteger(i64) and UnsignedInteger(u64) variants.
  • Updated the decoding logic to use the appropriate integer type during deserialization.

SPEC Compliance

  • This PR implements/fixes spec compliance
  • Spec section(s) affected:
  • Spec version:

Testing

  • All existing tests pass
  • Added new tests for changes
  • Tests cover edge cases and spec compliance

Pre-submission Checklist

  • My code follows the project's coding standards
  • I have run code formatting/linting tools
  • I have added tests that prove my fix/feature works
  • New and existing tests pass locally
  • I have updated documentation if needed
  • I have reviewed the TOON specification for relevant sections

Breaking Changes

  • No breaking changes
  • Breaking changes (describe migration path below)

Additional Context

@shanithkk shanithkk requested a review from a team as a code owner May 16, 2026 16:38

@shreyasbhat0 shreyasbhat0 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The core approach is sound — splitting Token::Integer(i64) into SignedInteger(i64) / UnsignedInteger(u64) and trying i64 parse first then u64 fallback is the right fix. A few issues to address:

Bug: Missing UnsignedInteger arm in parse_object_value

At parser.rs around line 1278 in the diff, only SignedInteger is handled:

Token::SignedInteger(i) => {
    let val = *i;
    self.advance()?;
    Ok(Number::from(val).into())
}

There's no corresponding Token::UnsignedInteger arm. Large u64 values in this object-value parsing path would fail to match.

Code duplication in parse_value

The Token::UnsignedInteger block is a near-complete copy-paste of the Token::SignedInteger block (~40 lines). The only difference is the type. These should be combined — either with a helper function, or by extracting the value as serde_json::Number early and sharing the rest of the logic.

Unrelated changes

  • Import reformatting (multi-line → single-line) throughout both files adds noise to the diff and should be a separate commit or omitted.
  • #[derive(Debug)] added to Scanner is unrelated to the u64 fix.

Missing integration test

The test updates only rename IntegerSignedInteger and add one scanner-level assertion. There should be a test that roundtrips a large u64 value through encode_default/decode_default — that's exactly what issue #64 describes and it would catch regressions like the missing arm above.

missing UnsignedInteger arm in parse_primitive()

Reproduction — this TOON input with a list array of large u64 values:

values[3]:
  - 12591154125385152738
  - 18446744073709551615
  - 9223372036854775808

Fails with:

Expected primitive value, found UnsignedInteger(12591154125385152738)

The issue is in parse_primitive() (parser.rs line ~1281) — it handles Token::SignedInteger but has no arm for Token::UnsignedInteger, so it falls through to the catch-all error.

@shreyasbhat0

Copy link
Copy Markdown
Member

@shanithkk please fix lint too

@shanithkk shanithkk requested a review from shreyasbhat0 May 20, 2026 05:10

@shreyasbhat0 shreyasbhat0 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @shanithkk — the parse_primitive fix and the round-trip tests are good additions.

I went through the latest diff carefully. Here's where things stand:

Correctness

All 15 match sites in the parser now handle both SignedInteger and UnsignedInteger — verified each one. The core fix in the scanner (try i64 first, fall back to u64) is correct. I also tested a bunch of edge cases locally and everything passes:

  • list arrays with large u64 (the parse_primitive path that was broken before) — works now
  • large u64 as object key, in tabular arrays, inline arrays
  • i64::MAX boundary, i64::MAX+1, u64::MAX
  • values > u64::MAX correctly fall back to string
  • large u64 followed by trailing tokens becomes string as expected
  • negative numbers still parse as SignedInteger

So functionally this is solid.

Code duplication is still the main concern

The Token::UnsignedInteger arm in parse_value (lines 184-222) is a character-for-character copy of the SignedInteger arm (lines 145-183) — 39 lines of identical logic. The only actual difference is Number::from(i64) vs Number::from(u64) on the fallback line.

This is exactly the kind of thing that caused the parse_primitive bug in the first place — when you duplicate N places, it's easy to miss one. A helper function that takes the key string, first text, and a serde_json::Number could collapse both arms to ~3 lines each.

Same pattern repeats in parse_primitive (two identical 4-line arms) and across ~10 other match sites, though those are smaller.

I'd really prefer to see this cleaned up before merging. Not asking for a huge refactor — just extract the shared logic so we don't have to update two identical blocks every time something changes in that path.

Minor stuff

  • #[derive(Debug)] on Scanner (line 28) — unrelated to the u64 fix, would be cleaner as a separate commit or dropped from this PR
  • The round-trip tests are good but don't cover list arrays (the path that was actually broken), tabular arrays, or object keys with large u64. Would be nice to have at least one test per parser path

Test I used locally

For reference, here's the TOON input that was failing before your latest push and now works:

values[3]:
  - 12591154125385152738
  - 18446744073709551615
  - 9223372036854775808

This hits parse_primitive() and was erroring with Expected primitive value, found UnsignedInteger(...).

Also please fix broken clippy lint

@shreyasbhat0

shreyasbhat0 commented May 21, 2026

Copy link
Copy Markdown
Member

@shanithkk fix the nightly fmt check , so we can merge this PR

@shreyasbhat0 shreyasbhat0 merged commit b72975c into toon-format:main May 22, 2026
3 checks passed
@github-actions github-actions Bot mentioned this pull request May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

large u64 values cannot roundtrip (serialized as string, deserialization fails)

2 participants