fix(embed): truncate oversized chunks to prevent context window crash #316

Open

debugerman wants to merge 1 commit into tobi:main from debugerman:fix/embed-oversized-chunk-crash

Conversation

@debugerman
Contributor

The embedding model (embeddinggemma-300M) has a 2048-token context window. Chunks exceeding this limit cause node-llama-cpp to crash with SIGABRT on Apple Silicon, or silently return null embeddings.

While the chunker targets 900 tokens, edge cases (dense code, base64, format prefixes) can produce chunks that exceed the context window. The reranker already had truncation logic; the embedding path did not.

Changes:

  • Add truncateForEmbedding() in LlamaCpp that tokenizes and truncates text exceeding the 2048-token context window (minus a 100-token overhead); sketched just after this list
  • Apply truncation in both embed() and embedBatch() before calling into node-llama-cpp, preventing SIGABRT and null results
  • Replace first-chunk dimension probing with a virtual probe text, decoupling dimension detection from user data
  • Add test/oversized-chunk.test.ts covering oversized single embed, mixed batch, and formatted chunks with titles
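
The first bullet's helper, as a minimal sketch. It assumes node-llama-cpp's model.tokenize()/model.detokenize() API, and the constants mirror the numbers above; the real diff may differ in details.

```ts
import type { LlamaModel } from "node-llama-cpp";

const CONTEXT_SIZE = 2048;  // embeddinggemma-300M context window
const TOKEN_OVERHEAD = 100; // headroom for special tokens and format prefixes

// Tokenize once, and detokenize only when the text is actually oversized,
// so normal (<=900-token) chunks pass through unchanged.
function truncateForEmbedding(model: LlamaModel, text: string): string {
  const maxTokens = CONTEXT_SIZE - TOKEN_OVERHEAD;
  const tokens = model.tokenize(text);
  if (tokens.length <= maxTokens) return text;
  return model.detokenize(tokens.slice(0, maxTokens));
}
```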

Normal chunks (<=900 tokens) are unaffected; truncation only activates on abnormally large inputs that would otherwise crash or be silently dropped.
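
Concretely, the guard sits in front of every call into node-llama-cpp, reusing truncateForEmbedding() from the sketch above. Another sketch under the same assumptions: getEmbeddingFor() and the .vector property are node-llama-cpp v3's embedding API, while the freestanding signatures here are illustrative stand-ins for the actual LlamaCpp methods.

```ts
import type { LlamaModel, LlamaEmbeddingContext } from "node-llama-cpp";

async function embed(
  model: LlamaModel,
  ctx: LlamaEmbeddingContext,
  text: string
): Promise<readonly number[]> {
  // Truncate before node-llama-cpp ever sees the text, so an oversized
  // chunk can no longer SIGABRT the process or come back null.
  const safe = truncateForEmbedding(model, text);
  return (await ctx.getEmbeddingFor(safe)).vector;
}

async function embedBatch(
  model: LlamaModel,
  ctx: LlamaEmbeddingContext,
  texts: string[]
): Promise<(readonly number[])[]> {
  // Apply the same guard per item, so one oversized chunk cannot
  // abort or poison the whole batch.
  return Promise.all(texts.map((t) => embed(model, ctx, t)));
}
```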

Fixes #303
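
The virtual probe from the third bullet, similarly sketched. The probe string and function name are assumptions; any short, fixed text of known-safe length works, which is the point of decoupling it from user data.

```ts
import type { LlamaEmbeddingContext } from "node-llama-cpp";

// Hypothetical constant: only the output vector's length matters,
// never the content, so user chunks are no longer involved.
const DIMENSION_PROBE_TEXT = "qmd dimension probe";

async function detectEmbeddingDimension(
  ctx: LlamaEmbeddingContext
): Promise<number> {
  const probe = await ctx.getEmbeddingFor(DIMENSION_PROBE_TEXT);
  return probe.vector.length;
}
```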

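And a rough shape for the new test/oversized-chunk.test.ts, assuming a Bun-style test runner and an illustrative import path; neither is confirmed in this thread, and the real exports presumably close over the model and embedding context internally.

```ts
import { test, expect } from "bun:test";
// Hypothetical import path for the embedding wrapper under test.
import { embed, embedBatch } from "../src/llamacpp";

test("oversized single embed returns a vector instead of crashing", async () => {
  const oversized = "token ".repeat(5000); // far beyond the 2048-token window
  const vec = await embed(oversized);
  expect(vec).not.toBeNull();
  expect(vec.length).toBeGreaterThan(0);
});

test("mixed batch embeds normal and oversized chunks alike", async () => {
  const vecs = await embedBatch(["a short, normal chunk", "token ".repeat(5000)]);
  expect(vecs).toHaveLength(2);
  for (const v of vecs) expect(v.length).toBeGreaterThan(0);
});
```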


Development

Successfully merging this pull request may close these issues.

qmd embed crashes when first chunk exceeds EmbeddingGemma context size (should skip gracefully)
