Is your feature request related to a problem? Please describe.
When the same string is embedded more than once, Haystack has no built-in way to skip the second call. This shows up in a few common cases:
- Re-indexing after a cleaner/splitter change, where most resulting chunks are identical to the previous run.
- Repeated queries in any user-facing app (FAQs, password resets, etc.), where every repeat re-hits the embedding model.
- CI and test suites that embed the same fixtures on every run.
On hosted embedders this is a direct cost (e.g. text-embedding-3-large at $0.13 / 1M tokens). On local Sentence-Transformers it's CPU time. Either way, it's avoidable work.
I checked for prior art:
- haystack/components/caching/cache_checker.py is keyed on a Document metadata field (typically the URL for crawled content); it doesn't handle text → vector lookups.
- The cache_params in the Sentence-Transformers backends caches the model load, not embedding results.
Describe the solution you'd like
A new EmbeddingCache component that wraps any existing TextEmbedder or DocumentEmbedder. It is additive only: no existing APIs change.
from haystack.components.embedders import OpenAITextEmbedder
from haystack.components.caching import EmbeddingCache, InMemoryEmbeddingStore

embedder = EmbeddingCache(
    embedder=OpenAITextEmbedder(model="text-embedding-3-large"),
    store=InMemoryEmbeddingStore(),
)

# First call → underlying embedder runs, vector stored.
# Repeat call → store hit, same output shape as the underlying embedder.
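To make the intended behavior concrete, here is a rough, self-contained sketch of the lookup path. It is not a proposed implementation: the store's get/set methods, the _key helper, and FakeEmbedder (a stand-in so the snippet runs without an API key) are all assumptions for illustration. It relies only on the wrapped embedder exposing run(text=...) and to_dict(), and for brevity it drops the meta field on a cache hit.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


class InMemoryEmbeddingStore:
    """Hypothetical store: a plain dict from cache key to vector."""

    def __init__(self) -> None:
        self._data: Dict[str, List[float]] = {}

    def get(self, key: str) -> Optional[List[float]]:
        return self._data.get(key)

    def set(self, key: str, embedding: List[float]) -> None:
        self._data[key] = embedding


class EmbeddingCache:
    """Hypothetical wrapper around anything with run(text=...) and to_dict()."""

    def __init__(self, embedder: Any, store: InMemoryEmbeddingStore) -> None:
        self.embedder = embedder
        self.store = store
        # Serialize the embedder config once; a config change yields new keys.
        self._config = json.dumps(embedder.to_dict(), sort_keys=True, default=str)

    def _key(self, text: str) -> str:
        # Hash config and text together so the key has a fixed size.
        payload = f"{self._config}\x00{text}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def run(self, text: str) -> Dict[str, Any]:
        key = self._key(text)
        cached = self.store.get(key)
        if cached is not None:
            # Cache hit: the underlying embedder is not called.
            return {"embedding": cached}
        result = self.embedder.run(text=text)
        self.store.set(key, result["embedding"])
        return result


class FakeEmbedder:
    """Stand-in for a real text embedder; counts how often it is called."""

    def __init__(self, model: str = "fake-model") -> None:
        self.model = model
        self.calls = 0

    def to_dict(self) -> Dict[str, Any]:
        return {"type": "FakeEmbedder", "init_parameters": {"model": self.model}}

    def run(self, text: str) -> Dict[str, Any]:
        self.calls += 1
        return {"embedding": [float(len(text)), 1.0]}
```

With FakeEmbedder wrapped this way, two run(text=...) calls with the same string invoke the underlying embedder once; a different string triggers a second call.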
Describe alternatives you've considered
- Extend CacheChecker. Different semantics: it operates on Documents and metadata, not string → vector. Bolting embedding-cache logic onto it would muddy both.
- Rely on provider-side caching. OpenAI/Anthropic offer prompt caching for chat completions, not embeddings. No fallback here.
- LangChain's CacheBackedEmbeddings. Prior art that confirms the pattern is expected by users comparing frameworks. Not a reason to import LangChain.
Additional context
- Storage backends in core vs. integrations. Only in-memory in core to keep the dependency surface tiny. Disk/Redis backends belong in haystack-core-integrations.
- Async. Mirrors the wrapped embedders; store interface has async variants.
- Invalidation. Keying off embedder.to_dict() covers model/dim/normalization changes for free; an explicit version param can be added later if needed.
- Pipeline-level result caching is intentionally out of scope; that's a bigger design conversation about idempotency. This issue is just the embedder step.
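The invalidation and batching points can be sketched together. This is a minimal illustration under assumptions, not the final design: cache_key and embed_with_cache are hypothetical helpers, the store is a plain dict, embed_batch stands in for a DocumentEmbedder call, and the config dict stands in for the output of embedder.to_dict().

```python
import hashlib
import json
from typing import Any, Callable, Dict, List


def cache_key(embedder_config: Dict[str, Any], text: str) -> str:
    """Derive a key from the serialized embedder config plus the text."""
    # sort_keys makes the key independent of dict ordering.
    config = json.dumps(embedder_config, sort_keys=True, default=str)
    return hashlib.sha256(f"{config}\x00{text}".encode("utf-8")).hexdigest()


def embed_with_cache(
    texts: List[str],
    embedder_config: Dict[str, Any],
    embed_batch: Callable[[List[str]], List[List[float]]],
    store: Dict[str, List[float]],
) -> List[List[float]]:
    """Embed only the texts missing from the store, then return all vectors in order."""
    keys = [cache_key(embedder_config, text) for text in texts]

    # Collect misses once each, preserving first-seen order.
    missing: Dict[str, str] = {}
    for key, text in zip(keys, texts):
        if key not in store and key not in missing:
            missing[key] = text

    if missing:
        vectors = embed_batch(list(missing.values()))
        for key, vector in zip(missing, vectors):
            store[key] = vector

    return [store[key] for key in keys]
```

Changing any serialized parameter (model name, dimensions, normalization flag) produces a different key, so stale vectors are never returned for a reconfigured embedder. A silent provider-side change behind an unchanged model name would not alter the key, which is the case an explicit version param would cover.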
👋 Hello there! This issue will be handled internally and isn’t open for external contributions. If you’d like to contribute, please take a look at issues labeled contributions welcome or good first issue. We’d really appreciate it!