perf: improve WABinary encoder and decoder (+3.7x faster) #2513
Santosl2 wants to merge 20 commits into WhiskeySockets:master from
Conversation
…e IDs" This reverts commit c03f9d8.
…and improve class structure
📝 Walkthrough
Token storage refactors from object to Map structure across the constants file. Decode and encode modules are restructured into dedicated classes.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Thanks for opening this pull request and contributing to the project! The next step is for the maintainers to review your changes. If everything looks good, it will be approved and merged into the main branch. In the meantime, anyone in the community is encouraged to test this pull request and provide feedback.
✅ How to confirm it works: if you've tested this PR, please comment below. This helps us speed up the review and merge process.
📦 To test this PR locally: if you encounter any issues or have feedback, feel free to comment as well.
…anced JID handling Co-authored-by: Copilot <copilot@github.com>
…ance Co-authored-by: Copilot <copilot@github.com>
@jlucaso1 or @purpshell can you run real stress test? |
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/WABinary/constants.ts`:
- Around line 1295-1305: Restore the old exported symbol TOKEN_MAP as a
compatibility alias to the new TOKEN_MAP_REFACTOR so external callers and
BinaryNodeCodingOptions keep the same shape; add a re-export or alias that makes
TOKEN_MAP refer to the same Map instance as TOKEN_MAP_REFACTOR (e.g., export
TOKEN_MAP aliasing TOKEN_MAP_REFACTOR) near where TOKEN_MAP_REFACTOR is defined
so both names point to the identical Map.
In `@src/WABinary/decode.ts`:
- Around line 91-99: The readInt implementation in readInt(n, littleEndian) uses
bitwise << and | which coerce to signed 32-bit and can produce negative values
for 4-byte lengths (BINARY_32), allowing checkEOS to be bypassed; update readInt
to compute unsigned values (either by delegating to Buffer.readUIntBE/readUIntLE
when n <= 6 or by using arithmetic accumulation: val = val * 256 + byte) so
lengths remain unsigned, and keep references to checkEOS, readInt, next,
readBytes and readStringFromChars to verify callers still receive a non-negative
length. Ensure littleEndian handling is preserved and tests for 4-byte high-bit
cases are added.
In `@src/WABinary/encode.ts`:
- Around line 247-263: The attrCount computation currently increments for any
non-null/undefined attribute but encoder only serializes string values in the
later loop (see attrs, attrCount, encoder.writeListStart, and the string-write
loop), causing an incorrect list size when non-strings (e.g., numbers) are
present; fix by either (A) changing the counting loop to only increment when
typeof v === 'string', or (B) validate attrs before encoding and throw an error
if any attr value is non-string so encoder.writeListStart is sized correctly;
apply the same string-only rule consistently before writing tag/content.
📒 Files selected for processing (3)
src/WABinary/constants.ts, src/WABinary/decode.ts, src/WABinary/encode.ts
```diff
 export const TOKEN_MAP_REFACTOR: Map<string, { dict?: number; index: number }> = new Map()

 for (const [i, SINGLE_BYTE_TOKEN] of SINGLE_BYTE_TOKENS.entries()) {
-  TOKEN_MAP[SINGLE_BYTE_TOKEN] = { index: i }
+  TOKEN_MAP_REFACTOR.set(SINGLE_BYTE_TOKEN, { index: i })
 }

 for (const [i, DOUBLE_BYTE_TOKEN] of DOUBLE_BYTE_TOKENS.entries()) {
   for (const [j, element] of DOUBLE_BYTE_TOKEN.entries()) {
-    TOKEN_MAP[element] = { dict: i, index: j }
+    TOKEN_MAP_REFACTOR.set(element, { dict: i, index: j })
   }
 }
```
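The diff above replaces plain-object token indexing with a Map. A quick sketch (standalone, not the project's actual module) of why Map lookups are more predictable than indexing a plain object:

```typescript
// Sketch: a plain-object token index can resolve inherited keys such as
// "toString" via the prototype chain; a Map only contains keys it was
// explicitly given, and lookups stay predictable O(1).
const objTokens: Record<string, { index: number }> = { message: { index: 1 } }
const mapTokens = new Map<string, { index: number }>([['message', { index: 1 }]])

// Plain object: inherited keys leak through an unchecked lookup
console.log('toString' in objTokens) // true, found on Object.prototype

// Map: only explicitly set keys exist
console.log(mapTokens.has('toString')) // false
console.log(mapTokens.get('message')?.index) // 1
```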
Preserve the old TOKEN_MAP export for compatibility.
Removing TOKEN_MAP changes the public BinaryNodeCodingOptions shape and breaks existing imports/custom opts immediately. Unless this PR is intentionally shipping a semver break, please re-export the same Map under the old name.
Compatibility alias:

```diff
 export const TOKEN_MAP_REFACTOR: Map<string, { dict?: number; index: number }> = new Map()
+export const TOKEN_MAP = TOKEN_MAP_REFACTOR
```
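A minimal sketch of how the suggested alias behaves (the `export` keywords are omitted here so the snippet runs standalone): both names reference the same Map instance, so a write through either name is visible to both.

```typescript
// Sketch of the compatibility alias: assigning a Map to a second const
// copies the reference, not the contents, so callers importing either
// name observe the identical token table.
const TOKEN_MAP_REFACTOR: Map<string, { dict?: number; index: number }> = new Map()
const TOKEN_MAP = TOKEN_MAP_REFACTOR // in the real module these would be exports

TOKEN_MAP_REFACTOR.set('message', { index: 42 })
console.log(TOKEN_MAP.get('message')) // { index: 42 }, the alias sees the write
console.log(TOKEN_MAP === TOKEN_MAP_REFACTOR) // true, identical instance
```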
```diff
 private readInt(n: number, littleEndian = false): number {
   this.checkEOS(n)
   let val = 0
   for (let i = 0; i < n; i++) {
     const shift = littleEndian ? i : n - 1 - i
-    val |= next()! << (shift * 8)
+    val |= this.next() << (shift * 8)
   }

   return val
```
🧩 Analysis chain
🌐 Web query: In JavaScript, do bitwise shift/or operators coerce numbers to signed 32-bit integers, and does Node.js `Buffer.readUIntBE(offset, 4)` return an unsigned value up to `0xffffffff`?
💡 Result:
Yes: bitwise operators (`<<`, `>>`, `&`, `|`, `^`, `~`) coerce their operands to signed 32-bit integers via the ToInt32 abstract operation, which maps numbers into the range [-2^31, 2^31 - 1] in two's complement. The exception is the unsigned right shift (`>>>`), whose left operand is converted with ToUint32. And yes: `Buffer.readUIntBE(offset, 4)` reads 4 bytes at the given offset as an unsigned big-endian integer, returning a Number from 0 to 0xffffffff (4294967295). This is distinct from signed methods like `readInt32BE` and matches the specialized `readUInt32BE`. The returned value is a safe JavaScript Number, well within the 2^53 - 1 integer range.
Stop building 4-byte lengths with bitwise operators.
<< and | coerce to signed 32-bit ints. For any BINARY_32 length with the high bit set, readInt(4) becomes negative, so checkEOS() can be bypassed and the subsequent readBytes() / readStringFromChars() calls run with an invalid length. Switch to Buffer.readUIntBE/LE() or arithmetic accumulation.
Use Buffer's unsigned readers:

```diff
 private readInt(n: number, littleEndian = false): number {
   this.checkEOS(n)
-  let val = 0
-  for (let i = 0; i < n; i++) {
-    const shift = littleEndian ? i : n - 1 - i
-    val |= this.next() << (shift * 8)
-  }
-
-  return val
+  const val = littleEndian
+    ? this.buffer.readUIntLE(this.index, n)
+    : this.buffer.readUIntBE(this.index, n)
+  this.index += n
+  return val
 }
```
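The coercion pitfall is easy to demonstrate standalone: for the bytes `80 00 00 01` (high bit set), the bitwise accumulation goes negative, while arithmetic accumulation and `Buffer.readUIntBE` both stay unsigned.

```typescript
// Demonstrates the signed-coercion bug: building a 4-byte length with
// << and | coerces to a signed 32-bit integer, so 0x80000001 comes out
// negative. Arithmetic accumulation (val * 256 + byte) stays unsigned.
const buf = Buffer.from([0x80, 0x00, 0x00, 0x01])

// Bitwise version (buggy): ToInt32 coercion on every | step
let signed = 0
for (let i = 0; i < 4; i++) {
  signed |= buf[i] << ((4 - 1 - i) * 8)
}

// Arithmetic accumulation (safe): never touches 32-bit bitwise ops
let unsigned = 0
for (let i = 0; i < 4; i++) {
  unsigned = unsigned * 256 + buf[i]
}

console.log(signed) // -2147483647
console.log(unsigned) // 2147483649
console.log(buf.readUIntBE(0, 4)) // 2147483649
```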
```diff
 let attrCount = 0
 if (attrs) {
   for (const k in attrs) {
     const v = attrs[k]
     if (v !== undefined && v !== null) attrCount++
   }
 }

-writeListStart(2 * validAttributes.length + 1 + (typeof content !== 'undefined' ? 1 : 0))
-writeString(tag)
+encoder.writeListStart(2 * attrCount + 1 + (content !== undefined ? 1 : 0))
+encoder.writeString(tag)

-for (const key of validAttributes) {
-  if (typeof attrs[key] === 'string') {
-    writeString(key)
-    writeString(attrs[key])
+if (attrs) {
+  for (const key in attrs) {
+    const val = attrs[key]
+    if (typeof val === 'string') {
+      encoder.writeString(key)
+      encoder.writeString(val)
```
Make attribute counting match attribute serialization.
Lines 247-252 count any non-null attribute, but Lines 259-264 only write string values. If a JS caller passes something like { foo: 1 }, the list header is too large and the rest of the node is encoded at the wrong offsets. Count only strings here, or throw before writing.
Guard non-string attrs before sizing the list:

```diff
 let attrCount = 0
 if (attrs) {
-  for (const k in attrs) {
-    const v = attrs[k]
-    if (v !== undefined && v !== null) attrCount++
+  for (const k in attrs) {
+    const v = attrs[k]
+    if (v === undefined || v === null) {
+      continue
+    }
+    if (typeof v !== 'string') {
+      throw new Error(`invalid attribute value for "${k}"`)
+    }
+    attrCount++
   }
 }
```
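The size-mismatch hazard can be sketched in isolation (simplified, assumed shapes; not the project's actual encoder): the list header must count exactly the attribute pairs the write loop will emit, so counting and writing must share the same string-only predicate.

```typescript
// Sketch: count only string-valued attributes so the list header size
// matches what the serializer actually writes. A numeric attribute here
// is skipped by both the count and the (hypothetical) write loop.
type Attrs = Record<string, unknown>

function countStringAttrs(attrs: Attrs): number {
  let count = 0
  for (const k in attrs) {
    if (typeof attrs[k] === 'string') count++
  }
  return count
}

const attrs: Attrs = { id: 'abc', t: 1700000000, from: 'user@s.whatsapp.net' }
// Header size: 2 slots per attribute pair + 1 for the tag
const listSize = 2 * countStringAttrs(attrs) + 1
console.log(listSize) // 5, since only 'id' and 'from' are strings
```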
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| let attrCount = 0 | |
| if (attrs) { | |
| for (const k in attrs) { | |
| const v = attrs[k] | |
| if (v !== undefined && v !== null) attrCount++ | |
| } | |
| } | |
| writeListStart(2 * validAttributes.length + 1 + (typeof content !== 'undefined' ? 1 : 0)) | |
| writeString(tag) | |
| encoder.writeListStart(2 * attrCount + 1 + (content !== undefined ? 1 : 0)) | |
| encoder.writeString(tag) | |
| for (const key of validAttributes) { | |
| if (typeof attrs[key] === 'string') { | |
| writeString(key) | |
| writeString(attrs[key]) | |
| if (attrs) { | |
| for (const key in attrs) { | |
| const val = attrs[key] | |
| if (typeof val === 'string') { | |
| encoder.writeString(key) | |
| encoder.writeString(val) | |
| let attrCount = 0 | |
| if (attrs) { | |
| for (const k in attrs) { | |
| const v = attrs[k] | |
| if (v === undefined || v === null) { | |
| continue | |
| } | |
| if (typeof v !== 'string') { | |
| throw new Error(`invalid attribute value for "${k}"`) | |
| } | |
| attrCount++ | |
| } | |
| } | |
| encoder.writeListStart(2 * attrCount + 1 + (content !== undefined ? 1 : 0)) | |
| encoder.writeString(tag) | |
| if (attrs) { | |
| for (const key in attrs) { | |
| const val = attrs[key] | |
| if (typeof val === 'string') { | |
| encoder.writeString(key) | |
| encoder.writeString(val) |
1 issue found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/WABinary/decode.ts">
<violation number="1" location="src/WABinary/decode.ts:96">
P1: `readInt(4)` uses `<<` and `|` which coerce operands to **signed** 32-bit integers. When a BINARY_32 length has its high bit set (byte ≥ 0x80 shifted left by 24), the result becomes negative. A negative length passes `checkEOS` (since `index + negative < buffer.length`) and then feeds an invalid value into `readBytes`/`readStringFromChars`. Use `Buffer.readUIntBE`/`readUIntLE` or arithmetic multiplication instead of bitwise shifts.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```diff
 for (let i = 0; i < n; i++) {
   const shift = littleEndian ? i : n - 1 - i
-  val |= next()! << (shift * 8)
+  val |= this.next() << (shift * 8)
 }
```
P1: readInt(4) uses << and | which coerce operands to signed 32-bit integers. When a BINARY_32 length has its high bit set (byte ≥ 0x80 shifted left by 24), the result becomes negative. A negative length passes checkEOS (since index + negative < buffer.length) and then feeds an invalid value into readBytes/readStringFromChars. Use Buffer.readUIntBE/readUIntLE or arithmetic multiplication instead of bitwise shifts.
Improved Issue
Refactors the binary encoder/decoder to a class-based, allocation-efficient architecture with strict bounds checking, stable object shapes, and optimized buffer usage. Eliminates per-call closure allocation overhead, improves JIT optimization (inline caching), and reduces GC pressure under deep recursion.
Also introduces explicit JID decoding paths, correct empty string handling, and preallocated structures for predictable performance.
Core Changes
Architecture
- ByteEncoder/ByteDecoder classes
Memory & Performance
- Preallocated structures → minimizes reallocations and copies
- Buffer.subarray used instead of copies → zero-copy slicing
- Preallocated arrays (readList) → avoids push/resize overhead
- → eliminates GC churn in recursive trees
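The zero-copy point above is a standard Node.js Buffer property and can be shown directly: `subarray` returns a view over the same backing memory, so slicing during decode allocates no new buffer.

```typescript
// Sketch: Buffer.subarray shares memory with its parent, so a slice is
// just a view. Mutating the parent is visible through the view, which
// is what makes decode-time slicing allocation-free.
const buf = Buffer.from([1, 2, 3, 4, 5])
const view = buf.subarray(1, 4) // bytes [2, 3, 4], no copy made

buf[2] = 99
console.log(view[1]) // 99, the view observes the parent's mutation
console.log(view.length) // 3
```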
Token System
- Token lookups via Map (TOKEN_MAP_REFACTOR) → predictable O(1), no prototype pollution risk; populated via Map#set
Encoding Improvements
- Bounds-checked writes (ensure) for all operations
- Unified integer writing (writeUInt*)
- Correct empty string handling ('' !== LIST_EMPTY)
- Packed encoding (nibble/hex)
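For illustration of the packed-encoding idea, here is a minimal nibble packer (an assumed, simplified scheme for digit strings; the actual WhatsApp packed formats include details such as odd-length padding that are not modeled here): two 4-bit digits fit in one byte, halving the size of numeric strings.

```typescript
// Sketch (assumed scheme): pack decimal digit characters two per byte,
// high nibble first. '1234' becomes the two bytes 0x12 0x34.
function packNibbles(digits: string): Buffer {
  const out = Buffer.alloc(Math.ceil(digits.length / 2))
  for (let i = 0; i < digits.length; i++) {
    const nibble = digits.charCodeAt(i) - 48 // '0'..'9' -> 0..9
    if (i % 2 === 0) out[i >> 1] = nibble << 4
    else out[i >> 1] |= nibble
  }
  return out
}

console.log(packNibbles('1234')) // <Buffer 12 34>
```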
Decoding Improvements
checkEOS) on every operationPerformance Rationale
- Before (closure-based)
- After (class-based)
- Observed effect
[Bench BEFORE / Bench AFTER: benchmark screenshots not preserved in this capture]