feat: Add EmbeddingCache component to avoid re-embedding identical text #11476

@Aarkin7

Is your feature request related to a problem? Please describe.
When the same string is embedded more than once, Haystack has no built-in way to skip the second call. This shows up in a few common cases:

  • Re-indexing after a cleaner/splitter change, where most resulting chunks are identical to the previous run.
  • Repeated queries in any user-facing app (FAQs, password resets, etc.): every repeat re-hits the embedding model.
  • CI and test suites that embed the same fixtures on every run.

On hosted embedders this is a direct cost (e.g. text-embedding-3-large at $0.13 / 1M tokens). On local Sentence-Transformers it's CPU time. Either way, it's avoidable work.

I checked for prior art:

  • haystack/components/caching/cache_checker.py is keyed on a Document metadata field (typically the URL for crawled content); it doesn't handle text → vector lookups.
  • The cache_params in the Sentence-Transformers backends caches the model load, not embedding results.

Describe the solution you'd like
A new EmbeddingCache component that wraps any existing TextEmbedder or DocumentEmbedder. Additive only, no existing API changes.

from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.caching import EmbeddingCache, InMemoryEmbeddingStore

embedder = EmbeddingCache(
  embedder=OpenAITextEmbedder(model="text-embedding-3-large"),
  store=InMemoryEmbeddingStore(),
)

# First call → underlying embedder runs, vector stored.

# Repeat call → store hit, same output shape as the underlying embedder.
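
The hit/miss behavior described in the comments above could look roughly like the following. This is a minimal, dependency-free sketch, not the proposed implementation: `InMemoryStore`, `FakeTextEmbedder`, and `CachingTextEmbedder` are hypothetical stand-ins, the key is a plain SHA-256 of the text, and only the `embedding` output is cached (a real embedder may return additional keys that a full wrapper would need to preserve).

```python
import hashlib
from typing import Any, Dict, List, Optional


class InMemoryStore:
    """Hypothetical store: maps a cache key to an embedding vector."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def get(self, key: str) -> Optional[List[float]]:
        return self._data.get(key)

    def set(self, key: str, embedding: List[float]) -> None:
        self._data[key] = embedding


class FakeTextEmbedder:
    """Stand-in for a real text embedder; counts how often the 'model' runs."""

    def __init__(self) -> None:
        self.calls = 0

    def run(self, text: str) -> Dict[str, Any]:
        self.calls += 1
        # Deterministic toy vector; a real embedder would call a model here.
        return {"embedding": [float(len(text)), float(sum(map(ord, text)) % 97)]}


class CachingTextEmbedder:
    """Hypothetical wrapper: check the store first, call the embedder on a miss."""

    def __init__(self, embedder: FakeTextEmbedder, store: InMemoryStore) -> None:
        self.embedder = embedder
        self.store = store

    def run(self, text: str) -> Dict[str, Any]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        cached = self.store.get(key)
        if cached is not None:
            # Store hit: the wrapped embedder is not called.
            return {"embedding": cached}
        # Store miss: run the wrapped embedder and remember its vector.
        result = self.embedder.run(text=text)
        self.store.set(key, result["embedding"])
        return result


inner = FakeTextEmbedder()
cached_embedder = CachingTextEmbedder(embedder=inner, store=InMemoryStore())
first = cached_embedder.run(text="How do I reset my password?")
second = cached_embedder.run(text="How do I reset my password?")
```

After both calls, `inner.calls` is 1 and `first` and `second` carry the same vector, which is the behavior the two comments describe.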

Describe alternatives you've considered

  1. Extend CacheChecker. Different semantics: it operates on Documents and metadata, not string → vector. Bolting embedding-cache logic onto it would muddy both.
  2. Rely on provider-side caching. OpenAI/Anthropic offer prompt caching for chat completions, not embeddings. No fallback here.
  3. LangChain's CacheBackedEmbeddings. Prior art confirming that users comparing frameworks expect this pattern. Not a reason to import LangChain.

Additional context

  • Storage backends in core vs. integrations. Only in-memory in core to keep the dependency surface tiny. Disk/Redis backends belong in haystack-core-integrations.
  • Async. Mirrors the wrapped embedders; store interface has async variants.
  • Invalidation. Keying off embedder.to_dict() covers model/dim/normalization changes for free; an explicit version param can be added later if needed.
  • Pipeline-level result caching is intentionally out of scope; that's a bigger design conversation about idempotency. This issue is just the embedder step.
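
To illustrate the invalidation point, one way to derive a cache key is to hash the serialized embedder configuration together with the input text. The sketch below is an assumption about how that could work, not a settled design: `cache_key` is a hypothetical helper, and the two dictionaries are only loosely shaped like a component's `to_dict()` output.

```python
import hashlib
import json
from typing import Any, Dict


def cache_key(embedder_config: Dict[str, Any], text: str) -> str:
    """Hash the serialized embedder configuration together with the text."""
    # sort_keys gives a stable serialization regardless of dict insertion order.
    config_blob = json.dumps(embedder_config, sort_keys=True, default=str)
    digest = hashlib.sha256()
    digest.update(config_blob.encode("utf-8"))
    digest.update(b"\x00")  # separator so config and text cannot run together
    digest.update(text.encode("utf-8"))
    return digest.hexdigest()


# Illustrative configurations; the field names here are assumptions.
large = {"type": "OpenAITextEmbedder", "init_parameters": {"model": "text-embedding-3-large"}}
small = {"type": "OpenAITextEmbedder", "init_parameters": {"model": "text-embedding-3-small"}}

key_large = cache_key(large, "hello")
key_small = cache_key(small, "hello")
# Same configuration with its keys inserted in the opposite order.
key_large_again = cache_key(dict(reversed(list(large.items()))), "hello")
```

Here `key_large` and `key_small` differ, so switching models never returns a stale vector, while `key_large_again` equals `key_large` because the serialization is order-independent. One trade-off of hashing the whole configuration is that a change to any serialized field, including ones that do not affect the vectors, also causes cache misses.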

👋 Hello there! This issue will be handled internally and isn’t open for external contributions. If you’d like to contribute, please take a look at issues labeled contributions welcome or good first issue. We’d really appreciate it!

Labels: P3 (Low priority, leave it in the backlog)