
perf: improve WABinary encoder and decoder (+3.7x faster)#2513

Open
Santosl2 wants to merge 20 commits into WhiskeySockets:master from Santosl2:feat/new-binary-encoder-and-decoder

Conversation

Contributor

@Santosl2 Santosl2 commented Apr 29, 2026

Improved Issue

Refactors the binary encoder/decoder to a class-based, allocation-efficient architecture with strict bounds checking, stable object shapes, and optimized buffer usage. Eliminates per-call closure allocation overhead, improves JIT optimization (inline caching), and reduces GC pressure under deep recursion.

Also introduces explicit JID decoding paths, correct empty string handling, and preallocated structures for predictable performance.


Core Changes

Architecture

  • Replaced closure-heavy encoder/decoder with ByteEncoder / ByteDecoder classes
  • All hot-path methods live on prototype → zero function allocations per encode/decode
  • Stable object shape → better V8 inline caching + inlining
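
The closure-versus-class distinction above can be sketched as follows (an illustrative toy, not the actual Baileys code):

```typescript
// Illustrative toy, not the actual Baileys code: contrast between the
// closure style the PR removes and the class style it introduces.

// Closure-based: every call to makeEncoder() allocates a fresh writeByte
// function object, and each instance gets its own object shape.
function makeEncoder() {
  const bytes: number[] = []
  const writeByte = (b: number) => bytes.push(b)
  return { writeByte, bytes }
}

// Class-based: writeByte lives once on ByteEncoder.prototype, so creating
// instances allocates no functions and all instances share one shape.
class ByteEncoder {
  readonly bytes: number[] = []
  writeByte(b: number): void {
    this.bytes.push(b)
  }
}

const a = new ByteEncoder()
const b = new ByteEncoder()
console.log(a.writeByte === b.writeByte) // true: one shared method object
console.log(makeEncoder().writeByte === makeEncoder().writeByte) // false
```

Because every instance resolves `writeByte` to the same prototype slot, V8's inline caches stay monomorphic across calls.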

Memory & Performance

  • Dynamic buffer growth (doubling strategy) in encoder
    → minimizes reallocations and copies
  • Buffer.subarray used instead of copies
    → zero-copy slicing
  • Preallocated arrays in decoder (readList)
    → avoids push/resize overhead
  • Removed closure allocations (~10 per node previously)
    → eliminates GC churn in recursive trees
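
The doubling-growth and zero-copy points above can be sketched in a minimal form (the method name `ensure` follows the PR description; the real `ByteEncoder` internals may differ):

```typescript
// Minimal sketch of a doubling-growth buffer with a zero-copy finish.
class GrowableBuffer {
  private buf = Buffer.alloc(16)
  private len = 0

  private ensure(extra: number): void {
    if (this.len + extra <= this.buf.length) return
    let cap = this.buf.length
    while (this.len + extra > cap) {
      cap *= 2 // doubling keeps total copy work amortized O(1) per byte
    }
    const grown = Buffer.alloc(cap)
    this.buf.copy(grown, 0, 0, this.len)
    this.buf = grown
  }

  writeBytes(src: Buffer): void {
    this.ensure(src.length)
    src.copy(this.buf, this.len)
    this.len += src.length
  }

  finish(): Buffer {
    // subarray returns a zero-copy view over the written region
    return this.buf.subarray(0, this.len)
  }
}

const gb = new GrowableBuffer()
gb.writeBytes(Buffer.alloc(100, 0xab)) // grows 16 → 32 → 64 → 128 internally
console.log(gb.finish().length) // 100
```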

Token System

  • Replaced object lookup with Map (TOKEN_MAP_REFACTOR)
    → predictable O(1), no prototype pollution risk
  • Tokens initialized once via Map#set
  • Supports both single and double byte dictionaries
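
The Map-based registry can be sketched like this (the token lists below are made-up samples; the real dictionaries live in src/WABinary/constants.ts):

```typescript
// Sketch of the Map-based token lookup with hypothetical sample tokens.
const SINGLE_BYTE_TOKENS = ['', 'stream:features', 'message'] // hypothetical
const DOUBLE_BYTE_TOKENS = [['media', 'photo']] // hypothetical

type TokenRef = { dict?: number; index: number }
const TOKEN_MAP_REFACTOR = new Map<string, TokenRef>()

// Tokens are initialized once via Map#set, for both dictionary kinds.
for (const [i, token] of SINGLE_BYTE_TOKENS.entries()) {
  TOKEN_MAP_REFACTOR.set(token, { index: i })
}
for (const [i, dict] of DOUBLE_BYTE_TOKENS.entries()) {
  for (const [j, token] of dict.entries()) {
    TOKEN_MAP_REFACTOR.set(token, { dict: i, index: j })
  }
}

// Map.get never walks the prototype chain, so keys such as '__proto__'
// or 'constructor' cannot collide with Object.prototype members.
console.log(TOKEN_MAP_REFACTOR.get('message')) // { index: 2 }
console.log(TOKEN_MAP_REFACTOR.get('photo')) // { dict: 0, index: 1 }
console.log(TOKEN_MAP_REFACTOR.get('__proto__')) // undefined
```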

Encoding Improvements

  • Bounds-checked writes (ensure) for all operations

  • Unified integer writing (writeUInt*)

  • Correct empty string handling ('' !== LIST_EMPTY)

  • Packed encoding (nibble/hex):

    • Shared helpers
    • Length validation
    • Strict charset validation

Decoding Improvements

  • Bounds-safe reads (checkEOS) on every operation
  • Improved packed decoding logic symmetry with encoder
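
The bounds-checked read pattern can be sketched as follows (method names follow the PR description — `checkEOS`, `readByte`, `readBytes` — but the real `ByteDecoder` is larger):

```typescript
// Sketch of bounds-safe reads with an internal cursor.
class ByteDecoderSketch {
  private index = 0
  private readonly buffer: Buffer

  constructor(buffer: Buffer) {
    this.buffer = buffer
  }

  private checkEOS(length: number): void {
    if (this.index + length > this.buffer.length) {
      throw new Error('end of stream') // fail fast instead of reading garbage
    }
  }

  readByte(): number {
    this.checkEOS(1)
    return this.buffer[this.index++]!
  }

  readBytes(n: number): Buffer {
    this.checkEOS(n)
    const view = this.buffer.subarray(this.index, this.index + n) // zero-copy
    this.index += n
    return view
  }
}

const dec = new ByteDecoderSketch(Buffer.from([1, 2, 3]))
console.log(dec.readByte()) // 1
console.log(dec.readBytes(2)) // <Buffer 02 03>
// a further dec.readByte() would throw 'end of stream'
```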

Performance Rationale

Before (closure-based)

  • ~10 function allocations per node
  • Deep recursion → O(n * children) allocations
  • Frequent GC + inline cache pollution

After (class-based)

  • 0 allocations per call (methods on prototype)
  • Stable hidden classes
  • JIT-friendly → aggressive inlining

Observed effect

  • Small payloads: minor improvement (I/O bound)
  • Deep trees: >100% faster (allocation bound)

Bench BEFORE

 BENCHMARK: WABinary encode.ts + decode.ts
============================================================

── ENCODE ──────────────────────────────────────────────────

  ┌─ Simple node (string attrs, no content)
  │  Iterations : 100,000
  │  Total time : 409.69 ms
  │  CPU ops/s  : 244,084
  │  CPU ms/op  : 0.0041 ms
  │  Heap before: 128.95 MB
  │  Heap after : 128.89 MB
  │  ΔHeap      : -0.06 MB
  └─ Alloc/s    : -0.14 MB/s

  ┌─ Node with binary content (512 bytes)
  │  Iterations : 100,000
  │  Total time : 871.39 ms
  │  CPU ops/s  : 114,759
  │  CPU ms/op  : 0.0087 ms
  │  Heap before: 128.91 MB
  │  Heap after : 128.97 MB
  │  ΔHeap      : +0.06 MB
  └─ Alloc/s    : 0.07 MB/s

  ┌─ Node with JIDs (@s.whatsapp.net)
  │  Iterations : 100,000
  │  Total time : 836.77 ms
  │  CPU ops/s  : 119,508
  │  CPU ms/op  : 0.0084 ms
  │  Heap before: 128.96 MB
  │  Heap after : 128.98 MB
  │  ΔHeap      : +0.02 MB
  └─ Alloc/s    : 0.02 MB/s

  ┌─ Node with nibble-packed strings
  │  Iterations : 100,000
  │  Total time : 225.79 ms
  │  CPU ops/s  : 442,898
  │  CPU ms/op  : 0.0023 ms
  │  Heap before: 128.95 MB
  │  Heap after : 128.96 MB
  │  ΔHeap      : +0.01 MB
  └─ Alloc/s    : 0.07 MB/s

  ┌─ Nested node (3 children, mixed content)
  │  Iterations : 100,000
  │  Total time : 568.43 ms
  │  CPU ops/s  : 175,923
  │  CPU ms/op  : 0.0057 ms
  │  Heap before: 128.96 MB
  │  Heap after : 128.96 MB
  │  ΔHeap      : -0.00 MB
  └─ Alloc/s    : -0.00 MB/s

  ┌─ Large array (50 children)
  │  Iterations : 10,000
  │  Total time : 845.85 ms
  │  CPU ops/s  : 11,822
  │  CPU ms/op  : 0.0846 ms
  │  Heap before: 128.96 MB
  │  Heap after : 128.96 MB
  │  ΔHeap      : +0.01 MB
  └─ Alloc/s    : 0.01 MB/s

── DECODE ──────────────────────────────────────────────────

  ┌─ Simple node (string attrs, no content)
  │  Iterations : 100,000
  │  Total time : 188.61 ms
  │  CPU ops/s  : 530,208
  │  CPU ms/op  : 0.0019 ms
  │  Heap before: 128.99 MB
  │  Heap after : 129.03 MB
  │  ΔHeap      : +0.04 MB
  └─ Alloc/s    : 0.19 MB/s

  ┌─ Node with binary content (512 bytes)
  │  Iterations : 100,000
  │  Total time : 102.25 ms
  │  CPU ops/s  : 977,984
  │  CPU ms/op  : 0.0010 ms
  │  Heap before: 129.01 MB
  │  Heap after : 129.07 MB
  │  ΔHeap      : +0.06 MB
  └─ Alloc/s    : 0.59 MB/s

  ┌─ Node with JIDs (@s.whatsapp.net)
  │  Iterations : 100,000
  │  Total time : 307.08 ms
  │  CPU ops/s  : 325,646
  │  CPU ms/op  : 0.0031 ms
  │  Heap before: 129.06 MB
  │  Heap after : 129.06 MB
  │  ΔHeap      : -0.01 MB
  └─ Alloc/s    : -0.03 MB/s

  ┌─ Node with nibble-packed strings
  │  Iterations : 100,000
  │  Total time : 125.13 ms
  │  CPU ops/s  : 799,145
  │  CPU ms/op  : 0.0013 ms
  │  Heap before: 129.06 MB
  │  Heap after : 129.06 MB
  │  ΔHeap      : -0.00 MB
  └─ Alloc/s    : -0.03 MB/s

  ┌─ Nested node (3 children, mixed content)
  │  Iterations : 100,000
  │  Total time : 299.19 ms
  │  CPU ops/s  : 334,232
  │  CPU ms/op  : 0.0030 ms
  │  Heap before: 129.04 MB
  │  Heap after : 129.06 MB
  │  ΔHeap      : +0.02 MB
  └─ Alloc/s    : 0.08 MB/s

  ┌─ Large array (50 children)
  │  Iterations : 10,000
  │  Total time : 585.79 ms
  │  CPU ops/s  : 17,071
  │  CPU ms/op  : 0.0586 ms
  │  Heap before: 129.06 MB
  │  Heap after : 129.07 MB
  │  ΔHeap      : +0.00 MB
  └─ Alloc/s    : 0.00 MB/s

── ROUNDTRIP (encode → decode) ─────────────────────────────

  ┌─ Simple node roundtrip
  │  Iterations : 100,000
  │  Total time : 644.18 ms
  │  CPU ops/s  : 155,237
  │  CPU ms/op  : 0.0064 ms
  │  Heap before: 129.08 MB
  │  Heap after : 129.07 MB
  │  ΔHeap      : -0.01 MB
  └─ Alloc/s    : -0.01 MB/s

  ┌─ Binary node roundtrip
  │  Iterations : 100,000
  │  Total time : 998.44 ms
  │  CPU ops/s  : 100,156
  │  CPU ms/op  : 0.0100 ms
  │  Heap before: 129.07 MB
  │  Heap after : 129.07 MB
  │  ΔHeap      : -0.00 MB
  └─ Alloc/s    : -0.00 MB/s

  ┌─ Large array roundtrip
  │  Iterations : 10,000
  │  Total time : 1415.24 ms
  │  CPU ops/s  : 7,066
  │  CPU ms/op  : 0.1415 ms
  │  Heap before: 129.07 MB
  │  Heap after : 129.07 MB
  │  ΔHeap      : +0.01 MB
  └─ Alloc/s    : 0.00 MB/s

── MEMORY STRESS (encode + decode, 1M simple operations) ──

  ┌─ Stress: 1,000,000 roundtrips
  │  Time       : 6.35 s
  │  ops/s      : 157,404
  │  Heap before: 129.07 MB
  │  Heap after : 129.08 MB
  └─ ΔHeap      : +0.01 MB

============================================================
 Benchmark complete.
============================================================

Bench AFTER

 BENCHMARK: WABinary encode.ts + decode.ts
============================================================

── ENCODE ──────────────────────────────────────────────────

  ┌─ Simple node (string attrs, no content)
  │  Iterations : 100,000
  │  Total time : 110.76 ms
  │  CPU ops/s  : 902,887
  │  CPU ms/op  : 0.0011 ms
  │  Heap before: 130.59 MB
  │  Heap after : 130.60 MB
  │  ΔHeap      : +0.02 MB
  └─ Alloc/s    : 0.14 MB/s

  ┌─ Node with binary content (512 bytes)
  │  Iterations : 100,000
  │  Total time : 45.29 ms
  │  CPU ops/s  : 2,207,980
  │  CPU ms/op  : 0.0005 ms
  │  Heap before: 130.60 MB
  │  Heap after : 130.67 MB
  │  ΔHeap      : +0.07 MB
  └─ Alloc/s    : 1.49 MB/s

  ┌─ Node with JIDs (@s.whatsapp.net)
  │  Iterations : 100,000
  │  Total time : 182.51 ms
  │  CPU ops/s  : 547,916
  │  CPU ms/op  : 0.0018 ms
  │  Heap before: 130.60 MB
  │  Heap after : 130.69 MB
  │  ΔHeap      : +0.09 MB
  └─ Alloc/s    : 0.51 MB/s

  ┌─ Node with nibble-packed strings
  │  Iterations : 100,000
  │  Total time : 71.30 ms
  │  CPU ops/s  : 1,402,565
  │  CPU ms/op  : 0.0007 ms
  │  Heap before: 130.60 MB
  │  Heap after : 130.68 MB
  │  ΔHeap      : +0.08 MB
  └─ Alloc/s    : 1.14 MB/s

  ┌─ Nested node (3 children, mixed content)
  │  Iterations : 100,000
  │  Total time : 138.14 ms
  │  CPU ops/s  : 723,916
  │  CPU ms/op  : 0.0014 ms
  │  Heap before: 130.60 MB
  │  Heap after : 129.97 MB
  │  ΔHeap      : -0.63 MB
  └─ Alloc/s    : -4.55 MB/s

  ┌─ Large array (50 children)
  │  Iterations : 10,000
  │  Total time : 118.34 ms
  │  CPU ops/s  : 84,503
  │  CPU ms/op  : 0.0118 ms
  │  Heap before: 129.92 MB
  │  Heap after : 129.97 MB
  │  ΔHeap      : +0.04 MB
  └─ Alloc/s    : 0.38 MB/s

── DECODE ──────────────────────────────────────────────────

  ┌─ Simple node (string attrs, no content)
  │  Iterations : 100,000
  │  Total time : 55.27 ms
  │  CPU ops/s  : 1,809,457
  │  CPU ms/op  : 0.0006 ms
  │  Heap before: 129.99 MB
  │  Heap after : 130.05 MB
  │  ΔHeap      : +0.06 MB
  └─ Alloc/s    : 1.10 MB/s

  ┌─ Node with binary content (512 bytes)
  │  Iterations : 100,000
  │  Total time : 19.00 ms
  │  CPU ops/s  : 5,264,236
  │  CPU ms/op  : 0.0002 ms
  │  Heap before: 130.01 MB
  │  Heap after : 130.09 MB
  │  ΔHeap      : +0.08 MB
  └─ Alloc/s    : 4.41 MB/s

  ┌─ Node with JIDs (@s.whatsapp.net)
  │  Iterations : 100,000
  │  Total time : 91.24 ms
  │  CPU ops/s  : 1,096,049
  │  CPU ms/op  : 0.0009 ms
  │  Heap before: 130.03 MB
  │  Heap after : 130.08 MB
  │  ΔHeap      : +0.05 MB
  └─ Alloc/s    : 0.56 MB/s

  ┌─ Node with nibble-packed strings
  │  Iterations : 100,000
  │  Total time : 43.89 ms
  │  CPU ops/s  : 2,278,191
  │  CPU ms/op  : 0.0004 ms
  │  Heap before: 130.04 MB
  │  Heap after : 130.09 MB
  │  ΔHeap      : +0.05 MB
  └─ Alloc/s    : 1.14 MB/s

  ┌─ Nested node (3 children, mixed content)
  │  Iterations : 100,000
  │  Total time : 85.04 ms
  │  CPU ops/s  : 1,175,929
  │  CPU ms/op  : 0.0009 ms
  │  Heap before: 130.04 MB
  │  Heap after : 130.11 MB
  │  ΔHeap      : +0.08 MB
  └─ Alloc/s    : 0.90 MB/s

  ┌─ Large array (50 children)
  │  Iterations : 10,000
  │  Total time : 65.47 ms
  │  CPU ops/s  : 152,744
  │  CPU ms/op  : 0.0065 ms
  │  Heap before: 130.07 MB
  │  Heap after : 130.10 MB
  │  ΔHeap      : +0.03 MB
  └─ Alloc/s    : 0.48 MB/s

── ROUNDTRIP (encode → decode) ─────────────────────────────

  ┌─ Simple node roundtrip
  │  Iterations : 100,000
  │  Total time : 161.97 ms
  │  CPU ops/s  : 617,403
  │  CPU ms/op  : 0.0016 ms
  │  Heap before: 130.00 MB
  │  Heap after : 130.11 MB
  │  ΔHeap      : +0.11 MB
  └─ Alloc/s    : 0.70 MB/s

  ┌─ Binary node roundtrip
  │  Iterations : 100,000
  │  Total time : 67.61 ms
  │  CPU ops/s  : 1,479,068
  │  CPU ms/op  : 0.0007 ms
  │  Heap before: 130.03 MB
  │  Heap after : 130.12 MB
  │  ΔHeap      : +0.09 MB
  └─ Alloc/s    : 1.39 MB/s

  ┌─ Large array roundtrip
  │  Iterations : 10,000
  │  Total time : 191.88 ms
  │  CPU ops/s  : 52,116
  │  CPU ms/op  : 0.0192 ms
  │  Heap before: 130.04 MB
  │  Heap after : 130.11 MB
  │  ΔHeap      : +0.06 MB
  └─ Alloc/s    : 0.33 MB/s

── MEMORY STRESS (encode + decode, 1M simple operations) ──

  ┌─ Stress: 1,000,000 roundtrips
  │  Time       : 1.54 s
  │  ops/s      : 649,972
  │  Heap before: 130.11 MB
  │  Heap after : 130.12 MB
  └─ ΔHeap      : +0.02 MB

============================================================
 Benchmark complete.

Summary by CodeRabbit

  • Refactor
    • Improved binary encoding and decoding infrastructure for enhanced performance and maintainability.
    • Optimized buffer handling with read-only views to reduce memory overhead.
    • Refactored token lookup structure for more efficient retrieval operations.
    • Streamlined internal encoding pipeline with optimized data structure handling.


coderabbitai Bot commented Apr 29, 2026

📝 Walkthrough


Token storage refactors from object to Map structure across the constants file. Decode and encode modules are restructured into dedicated ByteDecoder and ByteEncoder classes respectively, replacing helper closures and number array buffering while preserving public API behavior.

Changes

  • Token Map Refactoring (src/WABinary/constants.ts)
    Replaces the TOKEN_MAP object with a TOKEN_MAP_REFACTOR Map structure. Population logic is updated to use the .set() method with an identical { dict?, index } payload, maintaining computed indices.
  • Binary Node Decoding (src/WABinary/decode.ts)
    Extracts decompression logic into a dedicated ByteDecoder class with internal index tracking and instance methods (readByte, readBytes, checkEOS). Switches from slice() to subarray() for buffer operations. A wrapper function syncs the external indexRef.index with the decoder cursor.
  • Binary Node Encoding (src/WABinary/encode.ts)
    Refactors from number[] buffer building to a ByteEncoder class with direct resizable Buffer writes. Tokenization is updated to use TOKEN_MAP_REFACTOR.get() instead of object property access. Element counting now uses explicit iteration rather than array filtering.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • purpshell

Poem

🐰 The tokens hop into a Map so fine,
Decoders class-ified, encoders align,
Subarrays dance where slices once played,
A refactored warren where bytes are arrayed! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check ✅ Passed: Check skipped because no linked issues were found for this pull request.
  • Title check ✅ Passed: The title accurately summarizes the main change: a performance optimization refactoring of the WABinary encoder and decoder with a claimed 3.7x speedup.
  • Description Check ✅ Passed: Check skipped because CodeRabbit's high-level summary is enabled.


Contributor

whiskeysockets-bot commented Apr 29, 2026

Thanks for opening this pull request and contributing to the project!

The next step is for the maintainers to review your changes. If everything looks good, it will be approved and merged into the main branch.

In the meantime, anyone in the community is encouraged to test this pull request and provide feedback.

✅ How to confirm it works

If you’ve tested this PR, please comment below with:

Tested and working ✅

This helps us speed up the review and merge process.

📦 To test this PR locally:

# NPM
npm install @whiskeysockets/baileys@Santosl2/Baileys-1#feat/new-binary-encoder-and-decoder

# Yarn (v2+)
yarn add @whiskeysockets/baileys@Santosl2/Baileys-1#feat/new-binary-encoder-and-decoder

# PNPM
pnpm add @whiskeysockets/baileys@Santosl2/Baileys-1#feat/new-binary-encoder-and-decoder

If you encounter any issues or have feedback, feel free to comment as well.

Contributor Author

@jlucaso1 or @purpshell can you run a real stress test?

Contributor Author

Benchmark comparison

(image: benchmark comparison chart)

@Santosl2 Santosl2 marked this pull request as ready for review April 30, 2026 14:10
@Santosl2 Santosl2 changed the title Feat/new binary encoder and decoder perf: new binary encoder and decoder Apr 30, 2026
@Santosl2 Santosl2 changed the title perf: new binary encoder and decoder perf: improve WABinary encoder and decoder Apr 30, 2026
@Santosl2 Santosl2 changed the title perf: improve WABinary encoder and decoder perf: improve WABinary encoder and decoder (+3.7x faster) Apr 30, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/WABinary/constants.ts`:
- Around line 1295-1305: Restore the old exported symbol TOKEN_MAP as a
compatibility alias to the new TOKEN_MAP_REFACTOR so external callers and
BinaryNodeCodingOptions keep the same shape; add a re-export or alias that makes
TOKEN_MAP refer to the same Map instance as TOKEN_MAP_REFACTOR (e.g., export
TOKEN_MAP aliasing TOKEN_MAP_REFACTOR) near where TOKEN_MAP_REFACTOR is defined
so both names point to the identical Map.

In `@src/WABinary/decode.ts`:
- Around line 91-99: The readInt implementation in readInt(n, littleEndian) uses
bitwise << and | which coerce to signed 32-bit and can produce negative values
for 4-byte lengths (BINARY_32), allowing checkEOS to be bypassed; update readInt
to compute unsigned values (either by delegating to Buffer.readUIntBE/readUIntLE
when n <= 6 or by using arithmetic accumulation: val = val * 256 + byte) so
lengths remain unsigned, and keep references to checkEOS, readInt, next,
readBytes and readStringFromChars to verify callers still receive a non-negative
length. Ensure littleEndian handling is preserved and tests for 4-byte high-bit
cases are added.

In `@src/WABinary/encode.ts`:
- Around line 247-263: The attrCount computation currently increments for any
non-null/undefined attribute but encoder only serializes string values in the
later loop (see attrs, attrCount, encoder.writeListStart, and the string-write
loop), causing an incorrect list size when non-strings (e.g., numbers) are
present; fix by either (A) changing the counting loop to only increment when
typeof v === 'string', or (B) validate attrs before encoding and throw an error
if any attr value is non-string so encoder.writeListStart is sized correctly;
apply the same string-only rule consistently before writing tag/content.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: de6deb48-39ce-4157-948f-e2acd765e608

📥 Commits

Reviewing files that changed from the base of the PR and between ca61ac1 and 93046e3.

📒 Files selected for processing (3)
  • src/WABinary/constants.ts
  • src/WABinary/decode.ts
  • src/WABinary/encode.ts

Comment thread src/WABinary/constants.ts
Comment on lines +1295 to 1305
+export const TOKEN_MAP_REFACTOR: Map<string, { dict?: number; index: number }> = new Map()

 for (const [i, SINGLE_BYTE_TOKEN] of SINGLE_BYTE_TOKENS.entries()) {
-	TOKEN_MAP[SINGLE_BYTE_TOKEN] = { index: i }
+	TOKEN_MAP_REFACTOR.set(SINGLE_BYTE_TOKEN, { index: i })
 }

 for (const [i, DOUBLE_BYTE_TOKEN] of DOUBLE_BYTE_TOKENS.entries()) {
 	for (const [j, element] of DOUBLE_BYTE_TOKEN.entries()) {
-		TOKEN_MAP[element] = { dict: i, index: j }
+		TOKEN_MAP_REFACTOR.set(element, { dict: i, index: j })
 	}
 }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the old TOKEN_MAP export for compatibility.

Removing TOKEN_MAP changes the public BinaryNodeCodingOptions shape and breaks existing imports/custom opts immediately. Unless this PR is intentionally shipping a semver break, please re-export the same Map under the old name.

Compatibility alias
 export const TOKEN_MAP_REFACTOR: Map<string, { dict?: number; index: number }> = new Map()
+export const TOKEN_MAP = TOKEN_MAP_REFACTOR
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
	export const TOKEN_MAP_REFACTOR: Map<string, { dict?: number; index: number }> = new Map()

	for (const [i, SINGLE_BYTE_TOKEN] of SINGLE_BYTE_TOKENS.entries()) {
		TOKEN_MAP_REFACTOR.set(SINGLE_BYTE_TOKEN, { index: i })
	}

	for (const [i, DOUBLE_BYTE_TOKEN] of DOUBLE_BYTE_TOKENS.entries()) {
		for (const [j, element] of DOUBLE_BYTE_TOKEN.entries()) {
			TOKEN_MAP_REFACTOR.set(element, { dict: i, index: j })
		}
	}

After:
	export const TOKEN_MAP_REFACTOR: Map<string, { dict?: number; index: number }> = new Map()
	export const TOKEN_MAP = TOKEN_MAP_REFACTOR

	for (const [i, SINGLE_BYTE_TOKEN] of SINGLE_BYTE_TOKENS.entries()) {
		TOKEN_MAP_REFACTOR.set(SINGLE_BYTE_TOKEN, { index: i })
	}

	for (const [i, DOUBLE_BYTE_TOKEN] of DOUBLE_BYTE_TOKENS.entries()) {
		for (const [j, element] of DOUBLE_BYTE_TOKEN.entries()) {
			TOKEN_MAP_REFACTOR.set(element, { dict: i, index: j })
		}
	}

Comment thread src/WABinary/decode.ts
Comment on lines +91 to 99
 private readInt(n: number, littleEndian = false): number {
 	this.checkEOS(n)
 	let val = 0
 	for (let i = 0; i < n; i++) {
 		const shift = littleEndian ? i : n - 1 - i
-		val |= next()! << (shift * 8)
+		val |= this.next() << (shift * 8)
 	}

 	return val

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

In JavaScript, do bitwise shift/or operators coerce numbers to signed 32-bit integers, and does Node.js Buffer.readUIntBE(offset, 4) return an unsigned value up to 0xffffffff?

💡 Result:

Yes, in JavaScript, bitwise shift and logical operators (such as <<, >>, >>>, &, |, ^, ~) coerce their operands to signed 32-bit integers using the ToInt32 abstract operation, which converts numbers to the range [-2^31, 2^31 - 1] in two's complement representation. This applies to all bitwise operators except the unsigned right shift (>>>) right operand, which uses ToUint32. Yes, Node.js Buffer.readUIntBE(offset, 4) reads 4 bytes from the buffer at the given offset and interprets them as an unsigned big-endian 32-bit integer, returning a Number value from 0 to 0xffffffff (4294967295). This is distinct from signed methods like readInt32BE and matches the behavior of the specialized readUInt32BE method. The returned value is a safe JavaScript Number since it fits within the safe integer range up to 2^53 - 1.



🏁 Script executed:

# Check if the file exists and read the exact code
cat -n src/WABinary/decode.ts | sed -n '85,105p'

Repository: WhiskeySockets/Baileys

Length of output: 680


🏁 Script executed:

# Find all usages of readInt(4) and BINARY_32
rg "readInt\(4\)|BINARY_32" -B 2 -A 2

Repository: WhiskeySockets/Baileys

Length of output: 1171


🏁 Script executed:

# Inspect the checkEOS method and readBytes signature
ast-grep --pattern 'checkEOS($_, $_) { $$$ }' || rg "checkEOS|readBytes" -B 1 -A 3 | head -50

Repository: WhiskeySockets/Baileys

Length of output: 2085


🏁 Script executed:

# Search for other bitwise operations on multiple bytes
rg "<<\s*\(" -B 3 -A 1 src/WABinary/decode.ts | head -40

Repository: WhiskeySockets/Baileys

Length of output: 199


🏁 Script executed:

# Check for readInt usage with other byte counts
rg "readInt\(" -B 1 -A 1

Repository: WhiskeySockets/Baileys

Length of output: 1108


Stop building 4-byte lengths with bitwise operators.

<< and | coerce to signed 32-bit ints. For any BINARY_32 length with the high bit set, readInt(4) becomes negative, so checkEOS() can be bypassed and the subsequent readBytes() / readStringFromChars() calls run with an invalid length. Switch to Buffer.readUIntBE/LE() or arithmetic accumulation.
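
The coercion hazard is easy to demonstrate in isolation:

```typescript
// Standalone demonstration: << and | operate on signed 32-bit integers,
// while Buffer.readUIntBE and plain arithmetic stay unsigned.
const buf = Buffer.from([0xff, 0x00, 0x00, 0x01])

// Bitwise accumulation, the pattern flagged in this review comment:
let bitwise = 0
for (let i = 0; i < 4; i++) {
  bitwise |= buf[i]! << ((3 - i) * 8)
}

// Unsigned alternatives suggested above:
const viaRead = buf.readUIntBE(0, 4)
const viaMul = buf.reduce((acc, b) => acc * 256 + b, 0)

console.log(bitwise) // -16777215: negative, so a naive length check passes
console.log(viaRead) // 4278190081
console.log(viaMul) // 4278190081
```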

Use Buffer's unsigned readers
 private readInt(n: number, littleEndian = false): number {
 	this.checkEOS(n)
-	let val = 0
-	for (let i = 0; i < n; i++) {
-		const shift = littleEndian ? i : n - 1 - i
-		val |= this.next() << (shift * 8)
-	}
-
-	return val
+	const val = littleEndian
+		? this.buffer.readUIntLE(this.index, n)
+		: this.buffer.readUIntBE(this.index, n)
+	this.index += n
+	return val
 }

Suggested change

Before:
	private readInt(n: number, littleEndian = false): number {
		this.checkEOS(n)
		let val = 0
		for (let i = 0; i < n; i++) {
			const shift = littleEndian ? i : n - 1 - i
			val |= this.next() << (shift * 8)
		}

		return val
	}

After:
	private readInt(n: number, littleEndian = false): number {
		this.checkEOS(n)
		const val = littleEndian
			? this.buffer.readUIntLE(this.index, n)
			: this.buffer.readUIntBE(this.index, n)
		this.index += n
		return val
	}

Comment thread src/WABinary/encode.ts
Comment on lines +247 to +263
+let attrCount = 0
+if (attrs) {
+	for (const k in attrs) {
+		const v = attrs[k]
+		if (v !== undefined && v !== null) attrCount++
+	}
+}

-writeListStart(2 * validAttributes.length + 1 + (typeof content !== 'undefined' ? 1 : 0))
-writeString(tag)
+encoder.writeListStart(2 * attrCount + 1 + (content !== undefined ? 1 : 0))
+encoder.writeString(tag)

-for (const key of validAttributes) {
-	if (typeof attrs[key] === 'string') {
-		writeString(key)
-		writeString(attrs[key])
+if (attrs) {
+	for (const key in attrs) {
+		const val = attrs[key]
+		if (typeof val === 'string') {
+			encoder.writeString(key)
+			encoder.writeString(val)

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make attribute counting match attribute serialization.

Lines 247-252 count any non-null attribute, but Lines 259-264 only write string values. If a JS caller passes something like { foo: 1 }, the list header is too large and the rest of the node is encoded at the wrong offsets. Count only strings here, or throw before writing.
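
The mismatch is easy to reproduce with a plain object (the attrs value here is hypothetical; the real encoder operates on BinaryNode attributes):

```typescript
// Standalone illustration: the list header is sized from every non-null
// attribute, but only string values are serialized.
const attrs: Record<string, unknown> = { id: 'abc', retry: 1 }

let counted = 0
for (const k in attrs) {
  const v = attrs[k]
  if (v !== undefined && v !== null) counted++ // counts both entries
}

let written = 0
for (const k in attrs) {
  if (typeof attrs[k] === 'string') written++ // only 'id' would be written
}

console.log(counted, written) // 2 1: header claims one more pair than exists
```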

Guard non-string attrs before sizing the list
 let attrCount = 0
 if (attrs) {
-	for (const k in attrs) {
-		const v = attrs[k]
-		if (v !== undefined && v !== null) attrCount++
+	for (const k in attrs) {
+		const v = attrs[k]
+		if (v === undefined || v === null) {
+			continue
+		}
+		if (typeof v !== 'string') {
+			throw new Error(`invalid attribute value for "${k}"`)
+		}
+		attrCount++
 	}
 }

Suggested change

Before:
	let attrCount = 0
	if (attrs) {
		for (const k in attrs) {
			const v = attrs[k]
			if (v !== undefined && v !== null) attrCount++
		}
	}

	encoder.writeListStart(2 * attrCount + 1 + (content !== undefined ? 1 : 0))
	encoder.writeString(tag)

	if (attrs) {
		for (const key in attrs) {
			const val = attrs[key]
			if (typeof val === 'string') {
				encoder.writeString(key)
				encoder.writeString(val)

After:
	let attrCount = 0
	if (attrs) {
		for (const k in attrs) {
			const v = attrs[k]
			if (v === undefined || v === null) {
				continue
			}
			if (typeof v !== 'string') {
				throw new Error(`invalid attribute value for "${k}"`)
			}
			attrCount++
		}
	}

	encoder.writeListStart(2 * attrCount + 1 + (content !== undefined ? 1 : 0))
	encoder.writeString(tag)

	if (attrs) {
		for (const key in attrs) {
			const val = attrs[key]
			if (typeof val === 'string') {
				encoder.writeString(key)
				encoder.writeString(val)

Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 3 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/WABinary/decode.ts">

<violation number="1" location="src/WABinary/decode.ts:96">
P1: `readInt(4)` uses `<<` and `|` which coerce operands to **signed** 32-bit integers. When a BINARY_32 length has its high bit set (byte ≥ 0x80 shifted left by 24), the result becomes negative. A negative length passes `checkEOS` (since `index + negative < buffer.length`) and then feeds an invalid value into `readBytes`/`readStringFromChars`. Use `Buffer.readUIntBE`/`readUIntLE` or arithmetic multiplication instead of bitwise shifts.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread src/WABinary/decode.ts
 for (let i = 0; i < n; i++) {
 	const shift = littleEndian ? i : n - 1 - i
-	val |= next()! << (shift * 8)
+	val |= this.next() << (shift * 8)
Contributor


P1: readInt(4) uses << and | which coerce operands to signed 32-bit integers. When a BINARY_32 length has its high bit set (byte ≥ 0x80 shifted left by 24), the result becomes negative. A negative length passes checkEOS (since index + negative < buffer.length) and then feeds an invalid value into readBytes/readStringFromChars. Use Buffer.readUIntBE/readUIntLE or arithmetic multiplication instead of bitwise shifts.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/WABinary/decode.ts, line 96:

<comment>`readInt(4)` uses `<<` and `|` which coerce operands to **signed** 32-bit integers. When a BINARY_32 length has its high bit set (byte ≥ 0x80 shifted left by 24), the result becomes negative. A negative length passes `checkEOS` (since `index + negative < buffer.length`) and then feeds an invalid value into `readBytes`/`readStringFromChars`. Use `Buffer.readUIntBE`/`readUIntLE` or arithmetic multiplication instead of bitwise shifts.</comment>

<file context>
@@ -8,110 +8,120 @@ const inflatePromise = promisify(inflate)
 		for (let i = 0; i < n; i++) {
 			const shift = littleEndian ? i : n - 1 - i
-			val |= next()! << (shift * 8)
+			val |= this.next() << (shift * 8)
 		}
 
</file context>
