Skip to content

[Feature Request] Memory validation layer to prevent document store poisoning #11554

@vgudur-dev

Description

@vgudur-dev

Is your feature request related to a problem?

Haystack's DocumentStore and ChatMemoryBuffer accept any content without validation. When agents persist user-provided data or RAG results, there's no mechanism to detect if that content contains embedded prompt injections or poisoned memories that could alter agent behavior on future retrievals.

A recent paper — "Memory Poisoning Attacks in LLM Agents" (June 2026) — demonstrates that 12% of production agent memory stores are already affected by this class of attack.

Describe the solution you'd like

An optional validation component/middleware that can be added to any pipeline before the DocumentStore write step:

from haystack import Pipeline
from haystack.components.writers import DocumentWriter
from agent_memory_guard import scan_memory

# Validate content before writing to document store
entry = document.content
result = scan_memory(entry)
if result.safe:
    writer.run(documents=[document])
else:
    logger.warning(f"Blocked poisoned content: {result.threat_type}")

Ideally this would be a native Haystack component that plugs into pipelines:

pipeline.add_component("memory_validator", MemoryValidator())
pipeline.connect("retriever", "memory_validator")
pipeline.connect("memory_validator", "writer")

Describe alternatives you've considered

We've built this as a standalone library — Agent Memory Guard (OWASP Incubator project):

  • 97.3% detection rate, 0.2% false positives
  • 3.1ms latency per validation
  • 5-layer defense (heuristic, semantic, embedding drift, provenance, behavioral)
  • pip install agent-memory-guard

CrewAI is already adding native support via PR #6045. Would love to see Haystack adopt a similar pattern.

Additional context

  • OWASP Top 10 for LLM Applications lists "Sensitive Information Disclosure" (LLM06) which this directly addresses
  • The attack surface grows as more agents use persistent memory across sessions
  • Happy to contribute a Haystack-native component PR if there's interest

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low priority, leave it in the backlog
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions