[RFC]: [Proposal] Token-Efficient Value Aliasing using $ID References

### Type of Change

- [ ] Breaking change (incompatible with current spec)
- [ ] Backward-compatible addition
- [ ] Clarification or editorial improvement
- [x] New optional feature
- [ ] Changes to conformance requirements

### Summary

**Description:**
While TOON is already excellent at reducing token overhead by eliminating key repetition, there is still significant waste when large string values or long numbers (like IDs, hashes, or company names) repeat across multiple rows in a dataset.

I propose a native way to define **Value Aliases** at the start of the payload to achieve maximum compression.

**The Concept (Value Aliasing):**

- Define a reference at the top using the `$` prefix (e.g., `$1: Very Long Repetitive String`).
- Use the alias (`$1`) throughout the document instead of the full value.

**Comparative Example**

**Current TOON (Standard):**

```
provider: Global Logistics and International Foods Services S.A.

items:
  [product, carrier]

  - Rice, Global Logistics and International Foods Services S.A.
  - Beans, Global Logistics and International Foods Services S.A.
  - Corn, Global Logistics and International Foods Services S.A.
```

**Proposed TOON (With Aliasing):**

```
$1: Global Logistics and International Foods Services S.A.

provider: $1

items:
  [product, carrier]

  - Rice, $1
  - Beans, $1
  - Corn, $1
```

**Why this is a game-changer for LLMs:**

**1. Significant Cost Reduction**
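In a list of 50+ items, repeating a 50-character string costs roughly 10 to 15 tokens per row with typical subword tokenizers, which adds up to several hundred redundant tokens per payload.
With aliasing, each repetition costs only the one or two tokens of the reference itself, plus a one-time definition line.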

**2. KV Cache Efficiency**
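Fewer input tokens mean a smaller KV cache and less attention work per request.
Modern LLMs (GPT-4, Claude) generally resolve symbolic pointers defined earlier in the context well, so defining a value once and referencing it later should shorten both prompts and outputs, although the effect on accuracy is worth benchmarking.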

**3. Large Number Compression**
Perfect for blockchain hashes, UUIDs, or long transaction IDs that appear multiple times in a single context.

**4. Implementation Simplicity**
Requires only a very simple regex-based parser on the client side to “hydrate” the data back to its original form.

**Suggested Syntax**

- Use `$n:` for definitions (where `n` is a numeric ID or a short mnemonic).
- Use `$n` as a value placeholder.

### Motivation

LLM (Large Language Model) costs are tied directly to the number of tokens in the input and output, and repeated long values inflate that count. While TOON already optimizes data structures by removing redundant JSON keys, it does not yet address the redundancy of long string values (such as company names, addresses, or UUIDs) that repeat within the data itself.

The main problems this proposal solves are:

**Token Bloat in Large Datasets:** When a specific value (e.g., a "Service Provider" name) repeats across 50+ rows, we pay for those tokens 50 times. With a single reference like `$1`, each repetition costs only a token or two.

**Context Window Efficiency:** Long repetitive strings take up valuable space in the model's context window. Aliasing allows us to pack much more actual information into a single request.

**LLM Pattern Recognition:** Models like GPT-4 and Claude are generally good at maintaining symbolic associations. Defining a variable once at the top (`$1: Value`) and referencing it later should be reliable, and it reduces the risk of the model truncating long strings in the middle of a list.

This "Token-Efficient Aliasing" turns TOON into a more compact transport format for high-volume AI agents and enterprise-level automation.

### Detailed Design

The proposal introduces a Reference Header section at the very top of the TOON payload, followed by the data body.

1. **Syntax for Definitions:** Variable definitions must start with a `$` followed by an identifier (numeric or mnemonic) and a colon separator.
   - Format: `$ID: Long String Value`
   - Example: `$1: Global Logistics and International Foods Services S.A.`
2. **Syntax for Referencing:** Throughout the TOON structure (both in simple fields and inside table rows), the `$ID` acts as a pointer to the defined value.
   - Example in fields: `provider: $1`
   - Example in rows: `- ItemName, $1, $2`
3. **Parsing Logic (the "De-aliasing" Process):** The parser should follow a two-step approach:
   - Pre-processing: Identify and store all header lines starting with `$` in a temporary dictionary/map.
   - Hydration: Before converting the TOON structure to a final JSON/Object, replace every `$ID` reference with its mapped value, matching whole identifiers so that `$1` is never replaced inside `$10`.
4. **Scope and Rules:**
   - Global Scope: Variables defined at the top apply to the entire payload.
   - Type Neutrality: While primarily intended for long strings, aliases can also be used for large numbers or repetitive complex IDs (like blockchain hashes) to ensure consistency and save tokens.
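The two-step parsing logic described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `hydrate`, `DEFINITION`, and `REFERENCE` are hypothetical, and the identifier grammar (`\w+` after the `$`) is an assumption, since the proposal does not pin it down.

```python
import re

# Assumed alias grammar: "$" followed by letters, digits, or underscores.
DEFINITION = re.compile(r"^\$(\w+):\s*(.*)$")
REFERENCE = re.compile(r"\$(\w+)")


def hydrate(payload: str) -> str:
    """Resolve $ID aliases in a TOON-like payload (illustrative only)."""
    aliases: dict[str, str] = {}
    body: list[str] = []
    in_header = True

    # Step 1 (pre-processing): collect definitions from the reference header.
    for line in payload.splitlines():
        match = DEFINITION.match(line) if in_header else None
        if match:
            aliases[match.group(1)] = match.group(2)
        elif in_header and not line.strip():
            continue  # skip blank lines before and inside the header
        else:
            in_header = False  # first non-definition line starts the body
            body.append(line)

    # Step 2 (hydration): substitute whole identifiers, so "$1" never
    # matches inside "$10". Unknown aliases are treated as errors.
    def resolve(m: re.Match) -> str:
        name = m.group(1)
        if name not in aliases:
            raise KeyError(f"undefined alias ${name}")
        return aliases[name]

    return "\n".join(REFERENCE.sub(resolve, line) for line in body)
```

Because this sketch substitutes text before the TOON structure is parsed, an aliased value that contains a delimiter (such as a comma) would change how a row splits, and a literal value like `$5` would be rejected as an undefined alias. A real implementation would likely resolve aliases per value after parsing and define an escaping rule for literal `$`.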

### Examples

Below is a comparison between the current TOON specification and the proposed Value Aliasing model.

**1. Current TOON Specification (Redundant Values):**
In this example, the long company name and the status message are repeated multiple times, wasting tokens.

```
order_id: 5520
default_warehouse: Logística Global de Alimentos do Brasil S.A.
status_message: Product successfully dispatched to destination

items:
  [prod, warehouse, status]
  - Monitor Gamer, Logística Global de Alimentos do Brasil S.A., Product successfully dispatched to destination
  - Mechanical Keyboard, Logística Global de Alimentos do Brasil S.A., Product successfully dispatched to destination
  - Gaming Mouse, Logística Global de Alimentos do Brasil S.A., Product successfully dispatched to destination
```

**2. Proposed TOON with Value Aliasing (Optimized):**
The same data, but significantly more compact and token-efficient.

```
$1: Logística Global de Alimentos do Brasil S.A.
$2: Product successfully dispatched to destination

order_id: 5520
default_warehouse: $1
status_message: $2

items:
  [prod, warehouse, status]
  - Monitor Gamer, $1, $2
  - Mechanical Keyboard, $1, $2
  - Gaming Mouse, $1, $2
```
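To put a rough number on the savings in this example, the sketch below counts characters saved per aliased value, net of the one-time definition line. Character count is only a proxy for tokens (real counts depend on the tokenizer), and the helper name `char_savings` is hypothetical.

```python
def char_savings(value: str, alias: str, uses: int) -> int:
    """Characters saved by aliasing `value`, net of the definition line."""
    definition_cost = len(f"{alias}: {value}\n")
    return uses * (len(value) - len(alias)) - definition_cost


warehouse = "Logística Global de Alimentos do Brasil S.A."
status = "Product successfully dispatched to destination"

# Each value appears four times in the example above:
# once as a top-level field and three times in the table rows.
total = char_savings(warehouse, "$1", 4) + char_savings(status, "$2", 4)
print(f"approximate characters saved: {total}")
```

The same formula shows the break-even point: a value used only once costs more with an alias than without, because the definition line repeats the full value.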

### Drawbacks

While the benefits in token savings are significant, there are a few trade-offs to consider:

**Client-Side Processing:** This introduces a small overhead for the application layer. The client (or server) must implement a "hydration" step to replace aliases with their actual values before saving data to a database.

**Readability for Humans:** While LLMs generally handle symbolic references well, a raw TOON file with many `$1`, `$2`, `$3` variables becomes slightly harder for a human to read at a glance compared to the full-text version.

**Parsing Complexity:** The parser needs to be slightly more robust to handle cases where a user defines a variable but never uses it, or references a variable that was never defined. Both cases can be detected with a simple regex scan and a basic dictionary map.
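As an illustration of the last point, a check for unused and undefined aliases might look like the sketch below. The function name `check_aliases` and the `\w+` identifier pattern are assumptions, and for simplicity any line that starts with `$ID:` is treated as a definition wherever it appears.

```python
import re

DEFINITION = re.compile(r"^\$(\w+):")
REFERENCE = re.compile(r"\$(\w+)")


def check_aliases(payload: str) -> tuple[set[str], set[str]]:
    """Return (unused, undefined) alias identifiers found in the payload."""
    defined: set[str] = set()
    used: set[str] = set()
    for line in payload.splitlines():
        match = DEFINITION.match(line)
        if match:
            defined.add(match.group(1))  # a "$ID: value" definition line
        else:
            used.update(REFERENCE.findall(line))  # "$ID" references in data
    return defined - used, used - defined


unused, undefined = check_aliases("$1: A\n$2: B\n\nx: $1\ny: $3")
print(unused, undefined)
```

An unused definition is probably only worth a warning, since it wastes tokens but loses no data, whereas an undefined reference should likely be a hard error.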

### Alternatives Considered

**Standard JSON/YAML:** These formats are natively supported but are extremely token-heavy due to repeated keys and structural syntax (brackets, quotes, indentation). They do not offer a built-in way to alias values without increasing schema complexity.

**Schema-only TOON (Current):** The current TOON spec handles key repetition by defining a header `[key1, key2]`. However, it lacks a mechanism for **Value Aliasing**. Without this proposal, long strings must be repeated in every row, leading to high costs in large datasets.

**Compression Algorithms (Gzip/Zlib):** While highly effective for storage, LLMs cannot "read" binary compressed data. We need a "semantic compression" that is human-readable and LLM-understandable, which is exactly what the $ID aliasing provides.

**Positional References:** Using only numbers (e.g., `- Item, 1, 2`) without the `$` prefix. We considered this, but the `$` prefix is safer because it clearly distinguishes a reference from a literal number and avoids ambiguous parses.

### Impact on Implementations

The introduction of **Value Aliasing** has a low-to-moderate impact on existing TOON parsers and generators, as it follows a non-breaking incremental approach.

**Parser Updates:** Current parsers will need a pre-processing step. This involves a single regex pass or a line-by-line scan to identify $ID: definitions at the beginning of the payload.

**State Management:** The parser must maintain a simple key-value map (dictionary) during the lifecycle of the "hydration" process.

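**Backward Compatibility:** Existing TOON files that do not use the `$` symbol will remain 100% compatible. The aliasing logic only triggers when a `$` prefix is detected, so the only existing payloads that need attention are those with literal values that begin with `$`.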

**Generator Logic:** Libraries that generate TOON from JSON/Objects can be optimized to automatically detect repetitive strings and convert them into aliases to save user tokens.
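A generator-side heuristic for this could be sketched as follows. It is an assumption-laden illustration rather than a proposed algorithm: `build_aliases` is a hypothetical helper, aliases are numbered `$1`, `$2`, ... in order of frequency, and a value is aliased only when the characters saved outweigh the cost of its definition line.

```python
from collections import Counter


def build_aliases(values: list[str], min_uses: int = 2) -> dict[str, str]:
    """Map repeated string values to $n aliases when aliasing saves characters."""
    counts = Counter(values)
    aliases: dict[str, str] = {}
    for value, uses in counts.most_common():
        alias = f"${len(aliases) + 1}"  # next free numeric alias
        definition_cost = len(f"{alias}: {value}\n")
        saved = uses * (len(value) - len(alias)) - definition_cost
        if uses >= min_uses and saved > 0:
            aliases[value] = alias
    return aliases


cells = ["Acme International Holdings"] * 3 + ["Rice", "Rice", "Beans"]
print(build_aliases(cells))
```

In this example only the long company name gets an alias. `Rice` repeats, but it is so short that the definition line would cost more than the references save.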

### Migration Strategy

_No response_

### Test Cases

_No response_

### Affected Specification Sections

**Grammar and Syntax:** A new rule must be added to support the definition of aliases using the $ prefix followed by a colon (e.g., $ID: value).

**Data Types and References:** Introduction of a "Reference Type" or "Pointer" to differentiate between literal strings and aliased values during the parsing process.

**Header Structure:** Modification to the top-level structure to allow a "Metadata/Reference Header" section before the main object or table body.

**Parsing Algorithm:** Addition of a mandatory "Hydration" step in the reference implementation to ensure aliases are resolved before the data is consumed by applications.


### Unresolved Questions

_No response_

### Additional Context

_No response_

### Checklist

- [x] I have read the RFC process in CONTRIBUTING.md
- [x] I have searched for similar proposals
- [x] I have considered backward compatibility
- [x] I understand this may require community discussion before acceptance

[RFC]: [Proposal] Token-Efficient Value Aliasing using $ID References #36
