A production-grade Python library and CLI that converts data between JSON, YAML, and TOON (Token-Oriented Object Notation) while fully conforming to TOON SPEC v2.0. Perfect for developers and data engineers who need efficient, token-optimized data serialization.
Current Version: 0.4.0 - YAML support added with an optional dependency model! See the What's New in v0.4.0 and Performance sections for details.
Full TOON SPEC v2.0 Compliance - This library implements all examples from the official TOON specification repository, ensuring complete compatibility with the standard.
YAML Support Release (November 2025) - This version adds comprehensive YAML support with a smart optional dependency model:
- YAML ↔ TOON conversion - Bidirectional conversion with streaming support
- Optional dependency model - Zero-dependency core, install YAML support only if needed: pip install toontools[yaml]
- CLI commands - New yaml-to-toon and toon-to-yaml commands
- High performance - YAML conversion with minimal overhead (2-9%)
- Design philosophy docs - New DESIGN_PHILOSOPHY.md explaining architectural decisions
- 22 new tests - Comprehensive YAML test coverage
Why Optional Dependencies?
- Lightweight core: Keep toontools dependency-free for JSON ↔ TOON workflows
- Install what you need: Only add PyYAML if you need YAML support
- Best of both worlds: Zero-dependency simplicity + extended format support
Previous Release - v0.3.0 (November 2025):
- Parser: 20-50% faster - Optimized literal parsing, comment removal, and table processing
- Serializer: Up to 70% faster - Streamlined type checking and container handling
- Utils: 10-15% faster - Improved number parsing and string operations
Backward Compatibility: 100% compatible with all previous versions - drop-in replacement, no code changes required!
See RELEASE_NOTES.md for complete details and CHANGELOG.md for the full changelog.
The toonpy library provides comprehensive JSON ↔ TOON conversion capabilities:
- Bidirectional conversion between JSON-compatible Python objects and TOON text
- Round-trip preservation - data integrity guaranteed
- Supports all JSON data types (objects, arrays, scalars)
- Handles nested structures of any depth
- LL(1) parser with indentation tracking
- Comment support - inline (#, //) and block (/* */) comments
- ABNF-backed grammar - fully compliant with TOON SPEC v2.0
- Error reporting with line and column numbers
- Smart detection of uniform-object arrays
- Automatic emission of efficient tabular mode (key[N]{fields}:)
- Token savings estimation using tiktoken (optional)
- Configurable modes: auto, compact, readable
- Command-line interface (toonpy) for file conversion
- Validation API for syntax checking
- Streaming helpers for large files
- Formatting tools for code style consistency
- YAML ↔ TOON conversion with optimized performance
- Streaming YAML to TOON for large files
- CLI commands for YAML file conversion
- Full Unicode support and proper type handling
pip install toontools
Or install a specific version:
pip install toontools==0.4.0
PyPI Package: toontools on PyPI | Latest: v0.4.0
# Clone the repository
git clone https://github.com/shinjidev/toonpy.git
cd toonpy
# Install the package
pip install .
# Or install with optional extras
pip install .[tests] # Include testing dependencies
pip install .[examples] # Include tiktoken for token counting
pip install .[yaml]      # Include PyYAML for YAML support
Requirements: Python 3.9+
Core Philosophy: toontools follows a "zero-dependency core" design. The base installation requires no external packages, ensuring fast installs and minimal footprint. Additional format support (YAML, etc.) is available as optional dependencies.
To enable YAML ↔ TOON conversion:
pip install toontools[yaml]
# or
pip install "PyYAML>=6.0"
Why optional? YAML support is opt-in to keep the core library lightweight (~60KB, 0 dependencies). Most users only need JSON ↔ TOON conversion. If you need YAML support, simply install the extra and all YAML functions become available automatically.
from toontools import to_toon, from_toon
# Convert Python object to TOON
data = {
"crew": [
{"id": 1, "name": "Luz", "role": "Light glyph"},
{"id": 2, "name": "Amity", "role": "Abomination strategist"}
],
"active": true,
"ship": {
"name": "Owl House",
"location": "Bonesborough"
}
}
toon_text = to_toon(data, mode="auto")
print(toon_text)
# Output:
# crew[2]{id,name,role}:
#   1,Luz,"Light glyph"
#   2,Amity,"Abomination strategist"
# active: true
# ship:
#   name: "Owl House"
#   location: Bonesborough
# Convert TOON back to Python object
round_trip = from_toon(toon_text)
assert round_trip == data  # Perfect round-trip!
from toontools import to_toon, from_toon
# JSON → TOON
data = {"name": "Luz", "age": 16, "active": True}
toon = to_toon(data, indent=2, mode="auto")
# TOON → JSON
parsed = from_toon(toon)
assert parsed == data
from toontools import validate_toon
toon_text = """
crew[2]{id,name}:
  1,Luz
  2,Amity
"""
is_valid, errors = validate_toon(toon_text, strict=True)
if not is_valid:
    for error in errors:
        print(f"Error: {error}")
from toontools import suggest_tabular
crew = [
{"id": 1, "name": "Luz"},
{"id": 2, "name": "Amity"}
]
suggestion = suggest_tabular(crew)
if suggestion.use_tabular:
print(f"Use tabular format! Estimated savings: {suggestion.estimated_savings} tokens")
print(f"Fields: {suggestion.keys}")from toontools import stream_to_toon
with open("large_data.json", "r") as fin, open("output.toon", "w") as fout:
    bytes_written = stream_to_toon(fin, fout, mode="compact")
print(f"Converted {bytes_written} bytes")
Convert YAML to TOON:
from toontools import to_toon_from_yaml
yaml_str = """
crew:
  - id: 1
    name: Luz
    role: Magic user
  - id: 2
    name: Amity
    role: Strategist
"""
toon_str = to_toon_from_yaml(yaml_str, mode="auto")
print(toon_str)
# Output:
# crew[2]{id,name,role}:
#   1,Luz,"Magic user"
#   2,Amity,Strategist
Convert TOON to YAML:
from toontools import to_yaml_from_toon
toon_str = """
crew[2]{id,name}:
  1,Luz
  2,Amity
active: true
"""
yaml_str = to_yaml_from_toon(toon_str)
print(yaml_str)
# Output:
# crew:
#   - id: 1
#     name: Luz
#   - id: 2
#     name: Amity
# active: true
Stream YAML to TOON:
from toontools import stream_yaml_to_toon
with open("data.yaml", "r") as fin, open("output.toon", "w") as fout:
    bytes_written = stream_yaml_to_toon(fin, fout, mode="auto")
print(f"Converted {bytes_written} bytes")
Note: Requires pip install toontools[yaml] or pip install "PyYAML>=6.0"
toonpy to --in data.json --out data.toon --mode readable --indent 2
toonpy from --in data.toon --out data.json --permissive
toonpy fmt --in data.toon --out data.formatted.toon --mode readable
toonpy yaml-to-toon --in data.yaml --out data.toon --mode auto
toonpy toon-to-yaml --in data.toon --out data.yaml
Note: YAML commands require pip install toontools[yaml]
Exit Codes:
- 0 - Success
- 2 - TOON syntax error
- 3 - General error
- 4 - I/O error
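In scripts, these exit codes make it easy to branch on the outcome. A minimal sketch using Python's subprocess module (assumes toonpy is on your PATH and data.toon exists):
import subprocess

result = subprocess.run(["toonpy", "from", "--in", "data.toon", "--out", "data.json"])
if result.returncode == 0:
    print("Converted successfully")
elif result.returncode == 2:
    print("data.toon contains a TOON syntax error")
elif result.returncode == 4:
    print("Could not read or write the files")
else:
    print(f"Conversion failed (exit code {result.returncode})")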
The library includes comprehensive unit tests, property-based tests, and performance benchmarks:
# Run all tests
pytest
# Run with coverage
pytest --cov=toonpy --cov-report=html
# Run performance benchmarks
pytest tests/test_benchmark.py -v -s
# Run specific test file
pytest tests/test_parser.py -v
Test Coverage:
- Unit tests for parser, serializer, API, and CLI
- Property-based tests with Hypothesis for round-trip verification
- Performance benchmarks for speed validation
- Edge cases: multiline strings, comments, empty containers
- Error handling and validation
Example Test Output:
============================= test session starts =============================
tests/test_parser.py::test_parse_object_and_array PASSED
tests/test_parser.py::test_parse_table_block PASSED
tests/test_serializer.py::test_round_trip_simple PASSED
tests/test_benchmark.py::test_serialize_small_data PASSED
...
============================== 20+ passed in 3.45s ==============================
toonpy v0.3.0 delivers exceptional performance with major speed improvements across all components. This release represents a comprehensive optimization effort with measurable gains of 20-70% in key operations.
| Component | Key Operation | Improvement | Impact |
|---|---|---|---|
| Parser | Comment-free files | +70% | Dramatically faster parsing when no comments present |
| Parser | Literal parsing | +30-40% | Common values (true, false, null) cached |
| Parser | Overall parsing | +20-50% | Comprehensive optimizations across all operations |
| Serializer | Key serialization paths | +70% | Type checking streamlined |
| Serializer | Container handling | +35-40% | Reduced redundant isinstance() checks |
| Utils | Number parsing | +10-15% | Try/except approach with regex fallback |
| Utils | Row splitting | Significant | String slicing instead of char-by-char building |
| Parallel | Memory usage | Improved | executor.map() for better efficiency |
Run the benchmarks to see real-time performance metrics:
# Run comprehensive benchmark suite
pytest tests/test_benchmark.py -v -s
# Run module-specific benchmarks
python benchmark_optimizations.py # Parser benchmarks
python benchmark_serializer.py # Serializer benchmarks
python benchmark_parallel.py       # Parallel module benchmarks
Typical Performance (v0.3.0 on modern hardware):
| Operation | Dataset Size | Time | Throughput | vs v0.2.0 |
|---|---|---|---|---|
| Serialize small data | 3 fields | ~0.010 ms | ~100K ops/s | +30% faster |
| Parse small data | 3 fields | ~0.012 ms | ~83K ops/s | +40% faster |
| Serialize tabular | 100 rows | ~0.30 ms | ~3,300 ops/s | ~70% faster |
| Parse tabular | 100 rows | ~1.20 ms | ~830 ops/s | ~40% faster |
| Round-trip | 500 rows | ~8.5 ms | ~118 ops/s | ~40% faster |
| Large file (1000 rows) | 1K records | ~3-4 ms | ~250-330 ops/s | ~50% faster |
| Nested structures | Depth 10 | ~0.25 ms | ~4,000 ops/s | ~170% faster |
| Comment removal | Comment-free | ~0.05 ms | 20K ops/s | ~70% faster |
Performance Characteristics:
- Blazing fast serialization - Optimized with literal caching and streamlined logic
- Efficient tabular format - Automatic detection reduces token count by 30-50%
- Competitive with JSON - Now only 3-5x slower than JSON (vs 7-12x in v0.2.0)
- Fast round-trips - Complete JSON → TOON → JSON conversion in single-digit milliseconds
- Token savings - Tabular format ideal for LLM applications
- Production-ready - Optimized for real-world workloads
Example Benchmark Output (v0.3.0):
[Benchmark] Small data serialization: 0.010 ms/op (30% faster)
[Benchmark] Small data parsing: 0.012 ms/op (40% faster)
[Benchmark] Tabular data serialization (100 rows): 0.300 ms (70% faster)
[Benchmark] Tabular data parsing (100 rows): 1.200 ms (40% faster)
[Benchmark] Round-trip (500 rows): 8.500 ms (40% faster)
[Benchmark] Performance comparison (100 rows):
JSON: 0.080 ms
TOON: 0.350 ms (v0.3.0)
Ratio: 4.37x (vs 7.41x in v0.2.0)
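To reproduce the JSON-vs-TOON comparison on your own hardware, a small timeit script along these lines works; the dataset below is illustrative and absolute numbers will differ from the table above:
import json
import timeit
from toontools import to_toon

# 100 uniform rows, similar in shape to the benchmark's tabular dataset
data = {"users": [{"id": i, "name": f"user{i}", "active": True} for i in range(100)]}

json_time = timeit.timeit(lambda: json.dumps(data), number=1000)
toon_time = timeit.timeit(lambda: to_toon(data, mode="auto"), number=1000)
print(f"json.dumps: {json_time:.4f} s for 1000 calls")
print(f"to_toon:    {toon_time:.4f} s for 1000 calls")
print(f"Ratio:      {toon_time / json_time:.2f}x")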
The v0.3.0 release includes comprehensive optimizations across all modules. Below are the key improvements:
What was done:
- Implemented a _LITERAL_CACHE dictionary for frequently used tokens
- Pre-stores parsed values for "true", "false", "null", "[]", "{}"
- Early return pattern in _parse_token() to check the cache first
Why it's faster:
- Before: Every literal required string processing, type detection, and conversion
- After: Common literals return cached value instantly, skipping all parsing logic
- Impact: Massive speedup for files with many boolean/null values
Code example:
# Before (slow):
if token.lower() == "true":
    return True
elif token.lower() == "false":
    return False
# ... more checks

# After (fast):
key = token.lower()
if key in _LITERAL_CACHE:
    return _LITERAL_CACHE[key]  # Instant return
What was done:
- Refactored _remove_block_comments() to use io.StringIO
- Added an early return if no block comments are detected
- Eliminated character-by-character string building
Why it's faster:
- Before: Always processed entire file character-by-character, building result with string concatenation
- After: Early exit if no /* is found; uses efficient StringIO when needed
- Impact: Most TOON files have no block comments, so they skip this processing entirely (see the sketch below)
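A simplified sketch of this pattern (the real _remove_block_comments() also has to respect comment markers inside quoted strings, which this version ignores):
import io

def remove_block_comments(text: str) -> str:
    # Early exit: most TOON files contain no block comments at all.
    if "/*" not in text:
        return text
    out = io.StringIO()
    i = 0
    while i < len(text):
        start = text.find("/*", i)
        if start == -1:
            out.write(text[i:])      # no more comments; copy the rest
            break
        out.write(text[i:start])     # copy the text before the comment
        end = text.find("*/", start + 2)
        if end == -1:
            break                    # unterminated comment; drop the remainder
        i = end + 2                  # continue after the closing */
    return out.getvalue()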
What was done:
- Changed guess_number() to use try/except for int() and float()
- Regex is used only for strict validation, not primary parsing
- Early rejection based on first character
Why it's faster:
- Before: Regex pattern matching for every number, which is relatively slow
- After: Native Python int/float conversion (fast path), regex only for edge cases
- Impact: Number-heavy files parse significantly faster
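A minimal sketch of the try/except fast path (the real guess_number() adds first-character rejection and a regex for strict validation, both omitted here):
def guess_number(token: str):
    """Sketch: try native conversion first, fall back to the raw string."""
    try:
        return int(token)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return token  # not a number; the caller treats it as a plain string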
What was done:
- Optimized _inline_container_repr() to minimize isinstance() calls
- Removed redundant type checks in _write_value()
- Better code flow to avoid repeated checks
Why it's faster:
- Before: Multiple isinstance() checks on the same object
- After: Check once, remember the result, and use an efficient logic flow
- Impact: Especially noticeable when serializing many objects
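An illustrative before/after of the single-check pattern (hypothetical helper names, not the library's actual _write_value() code):
def classify_slow(value):
    # Before: the same object is type-checked several times on one path
    if isinstance(value, dict) and len(value) == 0:
        return "empty object"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, (list, tuple)) and len(value) == 0:
        return "empty array"
    if isinstance(value, (list, tuple)):
        return "array"
    return "scalar"

def classify_fast(value):
    # After: each isinstance() check runs at most once per value
    if isinstance(value, dict):
        return "empty object" if not value else "object"
    if isinstance(value, (list, tuple)):
        return "empty array" if not value else "array"
    return "scalar"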
What was done:
- Replaced character-by-character list building in split_escaped_row()
- Used efficient string slicing to extract segments
- Eliminated the intermediate list and join() overhead
Why it's faster:
- Before: Loop through each char, append to list, join at end
- After: Slice string directly at split points
- Impact: Much faster for tabular data with many rows
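A simplified sketch of the slicing approach (the real split_escaped_row() in toonpy/utils.py also handles quoted cells; this version only honors backslash escapes):
def split_escaped_row(row: str, sep: str = ",") -> list[str]:
    cells = []
    start = 0
    i = 0
    while i < len(row):
        if row[i] == "\\":
            i += 2                      # skip the escaped character
            continue
        if row[i] == sep:
            cells.append(row[start:i])  # slice out the whole segment at once
            start = i + 1
        i += 1
    cells.append(row[start:])
    return cells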
What was done:
- Implemented a cache for indentation strings (0-20 levels)
- Pre-computes common indentation strings instead of creating them repeatedly
- Uses a _get_indent() method with a _indent_cache dictionary
Why it's faster:
- Before: Each line required creating a new string with " " * (level * indent), which allocates memory and performs string multiplication repeatedly
- After: Common indentation levels are computed once and reused, eliminating redundant string creation
- Impact: Most noticeable in deeply nested structures where the same indentation levels are used many times
Code example:
# Before (slow):
lines.append(" " * level + content) # Creates new string every time
# After (fast):
indent_str = self._get_indent(level) # Uses cache
lines.append(indent_str + content)What was done:
- Eliminated string concatenation with the + operator in loops
- Pre-compute common prefixes (like "-" for arrays)
- Use join() once at the end instead of multiple concatenations
- Build rows as lists and join once per row
Why it's faster:
- Before: Python's + operator for strings creates a new string object each time, which is O(n) per concatenation
- After: Building a list and using join() is O(n) total, which is much more efficient
- Impact: Especially noticeable in tabular format where many rows are processed
Code example:
# Before (slow):
row = ""
for cell in cells:
    row += cell + ","  # Creates a new string each iteration
# After (fast):
row_str = ",".join(cells)  # Single join operation
What was done:
- Compiled regex patterns as class attributes instead of compiling them on each call
- Patterns are compiled once when the class is defined, not per instance
Why it's faster:
- Before: re.match(pattern, text) has to look up (or recompile) the pattern on every call
- After: Pre-compiled patterns stored as _QUOTED_TABLE_PATTERN and _UNQUOTED_TABLE_PATTERN are reused
- Impact: Most noticeable when parsing many table headers
Code example:
# Before (slow):
match = re.match(r'^"([^"]+)"\[(\d+)\]\{([^}]+)\}:$', content)
# After (fast):
match = self._QUOTED_TABLE_PATTERN.match(content)  # Pre-compiled
What was done:
- Only normalize line endings if \r is present in the source
- Avoids unnecessary string operations on Unix-style text
Why it's faster:
- Before: Always performed replace("\r\n", "\n").replace("\r", "\n") even when not needed
- After: Checks for \r first and only normalizes if necessary
- Impact: Small but consistent improvement, especially for large files (see the sketch below)
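A sketch of the guard (helper name assumed for illustration):
def normalize_newlines(source: str) -> str:
    # Skip the replace() calls entirely for Unix-style text.
    if "\r" in source:
        source = source.replace("\r\n", "\n").replace("\r", "\n")
    return source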
What was done:
- Created a toonpy.parallel module with parallel_serialize_chunks()
- Uses concurrent.futures (ThreadPoolExecutor or ProcessPoolExecutor)
- Allows processing large arrays in parallel chunks
Why it's faster:
- Before: Large arrays processed sequentially on a single core
- After: Arrays divided into chunks, each processed in parallel
- Impact: Significant speedup for very large datasets (>10K elements) on multi-core systems
Usage:
from toonpy.parallel import parallel_serialize_chunks, chunk_sequence
from toonpy import ToonSerializer
large_array = [{"id": i} for i in range(50000)]
chunks = chunk_sequence(large_array, chunk_size=5000)
serializer = ToonSerializer()
results = parallel_serialize_chunks(
    chunks,
    serializer.dumps,
    use_threads=False,  # Use processes for CPU-bound work
    max_workers=4,
)
| Optimization | Improvement | Best For | Version |
|---|---|---|---|
| Literal Caching | 30-40% | Files with many booleans/nulls | v0.3.0 |
| StringIO Comment Removal | 70% | Comment-free files (most common) | v0.3.0 |
| Try/Except Number Parsing | 10-15% | Number-heavy data | v0.3.0 |
| Streamlined Type Checking | 35-40% | Object serialization | v0.3.0 |
| String Slicing Row Parsing | Significant | Tabular data with many rows | v0.3.0 |
| Indentation Caching | 15-20% | Nested structures, deep hierarchies | v0.2.0 |
| String Concatenation | 5-10% general, 60% tabular | Tabular arrays, large datasets | v0.2.0 |
| Compiled Regex | 3-5% | Table parsing, repeated patterns | v0.2.0 |
| Line Ending Optimization | 1-2% | Large files, Unix-style text | v0.2.0 |
| Parallelism | 2-4x | Arrays >10K elements | v0.2.0 |
Overall Impact (v0.3.0 vs v0.2.0):
- Parser: 20-50% faster overall, 70% faster for comment-free files
- Serializer: Up to 70% faster in key paths, 35-40% faster container handling
- Utils: 10-15% faster number parsing, significant row parsing improvement
- Tabular serialization: ~70% faster (0.30 ms vs 0.55 ms)
- Tabular parsing: ~40% faster (1.20 ms vs 1.70 ms)
- Round-trip: ~40% faster (8.5 ms vs 11.9 ms)
- Nested structures: ~170% faster throughput (4,000 ops/s vs 2,300 ops/s)
v0.3.0 vs v0.1.0 (Initial Release):
- Parser: ~100-150% faster (2-2.5x speedup)
- Serializer: ~200% faster (3x speedup)
- Overall throughput: ~140% improvement
These optimizations maintain full TOON SPEC v2.0 compliance while dramatically improving performance. All improvements are production-tested with 24/24 tests passing.
Detailed Documentation:
- RELEASE_NOTES.md - Complete v0.3.0 release notes
- OPTIMIZATIONS_DOCUMENTED.md - 23-page technical analysis
- ALL_OPTIMIZATIONS_SUMMARY.md - Comprehensive overview
- Run benchmark_optimizations.py, benchmark_serializer.py, and benchmark_parallel.py for detailed metrics
Input JSON:
{
"crew": [
{"id": 1, "name": "Luz", "role": "Light glyph"},
{"id": 2, "name": "Amity", "role": "Abomination strategist"}
],
"active": true,
"ship": {
"name": "Owl House",
"location": "Bonesborough"
}
}Output TOON (auto mode):
crew[2]{id,name,role}:
  1,Luz,"Light glyph"
  2,Amity,"Abomination strategist"
active: true
ship:
  name: "Owl House"
  location: Bonesborough
Token Savings: The tabular format (crew[2]{id,name,role}:) reduces token count by ~40% compared to standard JSON array format!
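You can measure the savings yourself with the optional tiktoken extra; a quick sketch (exact counts depend on the tokenizer and data):
import json
import tiktoken
from toontools import to_toon

data = {"crew": [{"id": 1, "name": "Luz"}, {"id": 2, "name": "Amity"}]}
enc = tiktoken.get_encoding("cl100k_base")

json_tokens = len(enc.encode(json.dumps(data)))
toon_tokens = len(enc.encode(to_toon(data, mode="auto")))
print(f"JSON: {json_tokens} tokens, TOON: {toon_tokens} tokens")
print(f"Savings: {1 - toon_tokens / json_tokens:.0%}")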
Convert a Python object to TOON format string.
Parameters:
- obj (Any): Python object compatible with the JSON data model
- indent (int): Number of spaces per indentation level (default: 2)
- mode (str): Serialization mode - "auto", "compact", or "readable"
Returns: str - TOON-formatted string
Example:
data = {"name": "Luz", "active": True}
toon = to_toon(data, mode="auto")
Parse a TOON string into a Python object.
Parameters:
- source (str): TOON-formatted string to parse
- mode (str): Parsing mode - "strict" or "permissive"
Returns: Any - Python object (dict, list, or scalar)
Raises: ToonSyntaxError if TOON string is malformed
Example:
toon = 'name: "Luz"\nactive: true'
data = from_toon(toon)
Validate a TOON string for syntax errors.
Parameters:
- source (str): TOON-formatted string to validate
- strict (bool): If True, use strict parsing mode
Returns: tuple[bool, List[ValidationError]] - (is_valid, list_of_errors)
Suggest whether an array should use tabular format.
Parameters:
- obj (Sequence): Sequence to analyze
Returns: TabularSuggestion - Recommendation with estimated savings
Stream JSON from input file to TOON output file.
Parameters:
- fileobj_in (TextIO): Input file object containing JSON
- fileobj_out (TextIO): Output file object for TOON
- chunk_size (int): Size of chunks to read (default: 65536)
- indent (int): Indentation level
- mode (str): Serialization mode
Returns: int - Number of bytes written
Raised when TOON input does not conform to the grammar.
Attributes:
- message (str): Error message
- line (int | None): Line number (1-indexed)
- column (int | None): Column number (1-indexed)
Example:
try:
data = from_toon("invalid syntax")
except ToonSyntaxError as e:
print(f"Error at line {e.line}, column {e.column}: {e.message}")- Python >= 3.9
- No external dependencies (pure Python)
- Optional: tiktoken >= 0.5.2 for token counting (install with pip install .[examples])
Comprehensive documentation is available in the repository:
- docs/spec_summary.md - Concise TOON SPEC v2.0 overview with ABNF notes
- docs/examples.md - JSON ↔ TOON conversion examples
- docs/assumptions.md - Documented gaps/assumptions and strict vs. permissive behavior
- DESIGN_PHILOSOPHY.md - Architecture decisions and design principles (why zero-dependency core, optional features, etc.)
- RELEASE_NOTES.md - Complete v0.3.0 release notes with upgrade guide
- CHANGELOG.md - Traditional changelog with version history
- YAML_SUPPORT_SUMMARY.md - Complete YAML support implementation details
- OPTIMIZATION_README.md - Quick start guide to the optimization docs
- OPTIMIZATIONS_DOCUMENTED.md - 23-page detailed technical analysis
- ALL_OPTIMIZATIONS_SUMMARY.md - Comprehensive optimization overview
- SERIALIZER_OPTIMIZATIONS.md - Serializer-specific optimizations
- UTILS_OPTIMIZATIONS.md - Utils module improvements
- PARALLEL_OPTIMIZATIONS.md - Parallel processing enhancements
- OPTIMIZATION_PROJECT_SUMMARY.md - Executive summary of the optimization project
- benchmark_optimizations.py - Parser performance benchmarks
- benchmark_serializer.py - Serializer performance benchmarks
- benchmark_parallel.py - Parallel module benchmarks
- benchmark_summary.py - Visual benchmark summary generator
Note: Tabular format heuristics are documented in the code (see toonpy/serializer.py and toonpy/utils.py). The library automatically detects uniform arrays and uses tabular format when it saves tokens.
- Data Serialization: Efficient storage and transmission of structured data
- API Development: Lightweight data format for REST APIs
- Configuration Files: Human-readable config format with comments support
- Data Pipelines: Stream processing of large JSON datasets
- ML/AI Projects: Token-optimized format for LLM training data
- Documentation: Self-documenting data format with inline comments
This library includes comprehensive examples covering all use cases from the official TOON specification examples. Check out the examples/ directory:
- example1 - Basic tabular array with nested objects
- example2 - Nested objects with arrays
- example3 - Mixed array types
- example4 - Multiline strings
- example5 - Empty containers and scalars
- example6 - Large tabular arrays
- example7 - Complex nested structures
- example8 - Deep nesting examples
All examples are compatible with the official TOON specification and can be validated against the reference implementation.
Try them with the CLI:
toonpy to --in examples/example1.json --out examples/example1.generated.toon
toonpy from --in examples/example1.toon --out examples/example1.generated.json
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Guidelines:
- Follow PEP 8 style guidelines
- Add tests for new features
- Update documentation as needed
- Ensure all tests pass: pytest
- Keep additions aligned with TOON SPEC v2.0
This project is licensed under the MIT License - see the LICENSE file for details.
Christian Palomares - @shinjidev
If you find this project helpful, consider supporting my work:
Buy me a coffee to help me continue developing open-source tools for the developer community!
- Built following TOON SPEC v2.0
- Inspired by the need for efficient, token-optimized data serialization
- Uses property-based testing with Hypothesis for robust validation
Star this repository if you find it useful!