feat: Add EmbeddingCache component to avoid re-embedding identical text #11476

**Is your feature request related to a problem? Please describe.**
When the same string is embedded more than once, Haystack has no built-in way to skip the second call. This shows up in a few common cases:

- Re-indexing after a cleaner/splitter change, where most resulting chunks are identical to those from the previous run.
- Repeated queries in user-facing apps (FAQs, password resets, etc.): every repeat re-hits the embedding model.
- CI and test suites that embed the same fixtures on every run.

On hosted embedders this is a direct cost (e.g. text-embedding-3-large at $0.13 per 1M tokens). With local Sentence-Transformers models it's compute time. Either way, it's avoidable work.

I checked for prior art:

- haystack/components/caching/cache_checker.py (CacheChecker) is keyed on a Document metadata field (typically the URL for crawled content); it doesn't handle text → vector lookups.
- The cache_params in the Sentence-Transformers backends caches the model load, not embedding results.

**Describe the solution you'd like**
A new EmbeddingCache component that wraps any existing TextEmbedder or DocumentEmbedder. Additive only, no existing API changes.

```python
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.caching import EmbeddingCache, InMemoryEmbeddingStore

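embedder = EmbeddingCache(
    embedder=OpenAITextEmbedder(model="text-embedding-3-large"),
    store=InMemoryEmbeddingStore(),
)

# Assumes EmbeddingCache.run() mirrors the wrapped embedder's run(text=...) signature.
# First call → underlying embedder runs, vector stored.
result = embedder.run(text="How do I reset my password?")

# Repeat call → store hit, same output shape as the underlying embedder.
result = embedder.run(text="How do I reset my password?")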
```
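To make the proposed hit/miss behaviour concrete, here is a minimal, framework-free sketch of what the wrapper's run() could do. It does not use Haystack: FakeTextEmbedder and SimpleEmbeddingCache are hypothetical stand-ins, and the embedding/meta output keys are assumed to loosely mirror Haystack's text embedders. The cache key hashes the wrapped embedder's serialized config (its to_dict()) together with the text, so a config change produces a miss rather than a stale vector, which is the invalidation idea listed under Additional context.

```python
import hashlib
import json
from typing import Any, Dict


class FakeTextEmbedder:
    """Hypothetical stand-in for a real text embedder; counts how often it runs."""

    def __init__(self, model: str) -> None:
        self.model = model
        self.calls = 0

    def to_dict(self) -> Dict[str, Any]:
        # Serialized config; any change here changes the cache key.
        return {"type": "FakeTextEmbedder", "init_parameters": {"model": self.model}}

    def run(self, text: str) -> Dict[str, Any]:
        self.calls += 1
        # Toy deterministic "vector"; a real embedder would call a model here.
        return {"embedding": [float(len(text)), float(len(self.model))], "meta": {"model": self.model}}


class SimpleEmbeddingCache:
    """Hypothetical wrapper: check the store first, run the embedder only on a miss."""

    def __init__(self, embedder: FakeTextEmbedder) -> None:
        self.embedder = embedder
        self.store: Dict[str, Dict[str, Any]] = {}

    def _key(self, text: str) -> str:
        # sort_keys keeps the serialized config stable regardless of dict ordering.
        config = json.dumps(self.embedder.to_dict(), sort_keys=True)
        return hashlib.sha256(f"{config}\x00{text}".encode("utf-8")).hexdigest()

    def run(self, text: str) -> Dict[str, Any]:
        key = self._key(text)
        cached = self.store.get(key)
        if cached is None:
            cached = self.embedder.run(text=text)
            self.store[key] = cached
            hit = False
        else:
            hit = True
        # Same output shape on hit and miss; copy the vector so callers can't mutate the store.
        return {"embedding": list(cached["embedding"]), "meta": {**cached["meta"], "cache_hit": hit}}


cache = SimpleEmbeddingCache(FakeTextEmbedder(model="model-a"))
first = cache.run(text="How do I reset my password?")
second = cache.run(text="How do I reset my password?")
print(cache.embedder.calls)  # 1: the repeat was served from the store

cache.embedder.model = "model-b"  # config change -> new key -> miss
third = cache.run(text="How do I reset my password?")
print(cache.embedder.calls)  # 2
```

For the DocumentEmbedder case the same idea would apply per document: look up every text, send only the misses to the wrapped embedder in one batch, then merge the results back in the original order.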

**Describe alternatives you've considered**
1. Extend CacheChecker. Different semantics: it operates on Documents and metadata, not string → vector mappings. Bolting embedding-cache logic onto it would muddy both.
2. Rely on provider-side caching. OpenAI and Anthropic offer prompt caching for chat completions, not for embeddings, so there is nothing provider-side to fall back on.
3. LangChain's CacheBackedEmbeddings. Prior art confirming that users comparing frameworks expect this pattern, but not a reason to import LangChain.

**Additional context**
- Storage backends in core vs. integrations. Only the in-memory backend ships in core, to keep the dependency surface tiny; disk/Redis backends belong in haystack-core-integrations.
- Async. Mirrors the wrapped embedders; the store interface has async variants.
- Invalidation. Keying off embedder.to_dict() covers model/dim/normalization changes for free; an explicit version param can be added later if needed.
- Pipeline-level result caching is intentionally out of scope; that's a bigger design conversation about idempotency. This issue covers only the embedder step.
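The storage and async points above could translate into something like the following: a small store protocol with sync methods and async twins, plus a dict-backed implementation of the kind that could ship in core without extra dependencies. EmbeddingStore, InMemoryStoreSketch, and the get/put/get_async/put_async method names are hypothetical, chosen for this sketch; they are not an existing Haystack API. A Redis or disk backend in haystack-core-integrations would implement the same methods, doing real async I/O in the async variants.

```python
import asyncio
from typing import Dict, List, Optional, Protocol


class EmbeddingStore(Protocol):
    """Hypothetical store interface: sync methods plus async twins."""

    def get(self, key: str) -> Optional[List[float]]: ...

    def put(self, key: str, vector: List[float]) -> None: ...

    async def get_async(self, key: str) -> Optional[List[float]]: ...

    async def put_async(self, key: str, vector: List[float]) -> None: ...


class InMemoryStoreSketch:
    """Dict-backed store with no external dependencies."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def get(self, key: str) -> Optional[List[float]]:
        vector = self._data.get(key)
        # Return a copy so callers cannot mutate the cached vector in place.
        return list(vector) if vector is not None else None

    def put(self, key: str, vector: List[float]) -> None:
        self._data[key] = list(vector)

    async def get_async(self, key: str) -> Optional[List[float]]:
        # Nothing to await for a dict; a network-backed store would await I/O here.
        return self.get(key)

    async def put_async(self, key: str, vector: List[float]) -> None:
        self.put(key, vector)


async def demo(store: EmbeddingStore) -> Optional[List[float]]:
    # The async path an async embedder wrapper would use.
    await store.put_async("k1", [0.1, 0.2])
    return await store.get_async("k1")


store = InMemoryStoreSketch()
print(asyncio.run(demo(store)))  # [0.1, 0.2]
print(store.get("missing"))  # None
```

Keeping the async methods on the same store object lets one backend serve both the sync and async variants of the wrapped embedders.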


---
👋 Hello there! This issue will be handled internally and isn’t open for external contributions. If you’d like to contribute, please take a look at issues labeled contributions welcome or good first issue. We’d really appreciate it!