fix: resolve 25 test regressions from streaming retain pipeline #836

Merged
nicoloboschi merged 1 commit into main from fix/streaming-retain-regressions on Apr 1, 2026

Conversation

@nicoloboschi
Collaborator

Summary

Fixes 25 test regressions introduced by the 3-phase streaming retain pipeline (#722):

  • Per-content tags lost (18 tests): the streaming pipeline flattened all chunks and assigned contents[0].tags to every chunk, breaking tag-based memory visibility and isolation. Fixed by adding a chunk_to_content mapping so each chunk keeps its source content's tags, context, event_date, etc.
  • Multi-document batch tracking (1 test): batches with per-content document_id values were merged into a single document with a random UUID. Fixed by grouping contents by document_id and processing each group independently.
  • Migration ID collision (1 test): two migration files shared revision ID d6e7f8a9b0c1. Renamed the trgm index migration to e8f9a0b1c2d3, fixed the dependency chain, and added the missing schema prefix on DROP INDEX for multi-tenant correctness.
  • Graph entity inheritance (1 test): get_graph_data queried unit_entities for observation IDs only, but observations inherit entities from source memories via source_memory_ids. Fixed by querying all_relevant_ids.
  • Docstring false positives (1 test): link_utils.py docstrings contained SQL-like patterns that triggered the unqualified table reference safety check.
  • Config test count (1 test): retain_chunk_batch_size was added to _CONFIGURABLE_FIELDS without updating the hierarchical config test.
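
The per-content tag fix can be illustrated with a minimal sketch. This is not the code from the PR: `Content`, `chunk_text`, and `flatten_with_mapping` are hypothetical names, and the fixed-size chunker is a stand-in for whatever the pipeline really uses. Only the idea of a `chunk_to_content` index list comes from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Content:
    """Hypothetical stand-in for one item passed to retain."""
    text: str
    tags: list[str] = field(default_factory=list)


def chunk_text(text: str, size: int) -> list[str]:
    """Naive fixed-size chunker, used only for illustration."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def flatten_with_mapping(contents: list[Content], size: int) -> tuple[list[str], list[int]]:
    """Flatten all contents into chunks and record each chunk's source index."""
    chunks: list[str] = []
    chunk_to_content: list[int] = []
    for idx, content in enumerate(contents):
        for piece in chunk_text(content.text, size):
            chunks.append(piece)
            chunk_to_content.append(idx)  # remember which content produced this chunk
    return chunks, chunk_to_content


contents = [
    Content("aaaabbbb", tags=["user:alice"]),
    Content("cccc", tags=["user:bob"]),
]
chunks, chunk_to_content = flatten_with_mapping(contents, size=4)

# Look tags up through the mapping instead of reusing contents[0].tags.
tags_per_chunk = [contents[i].tags for i in chunk_to_content]
```

With the mapping in place, the third chunk resolves to the second content's tags rather than inheriting the first content's, which is the behavior the tag-isolation tests depend on.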

Test plan

  • All 25 previously failing tests now pass (83 total in the affected test files)
  • Lint passes
  • Full test suite CI

The 3-phase retain pipeline (914ba79) introduced several regressions:

1. **Per-content tags lost** — streaming pipeline used `contents[0].tags`
   for ALL chunks, breaking tag-based visibility. Fixed by tracking
   chunk-to-content mapping so each chunk uses its source content's tags.

2. **Multi-document batches broken** — batches with per-content
   `document_id` values were merged into a single document. Fixed by
   grouping by document_id and processing each group independently.

3. **Migration ID collision** — `d6e7f8a9b0c1` was used by both
   `drop_documents_metadata` and `case_insensitive_entities_trgm_index`.
   Renamed trgm migration to `e8f9a0b1c2d3`, fixed chain, added missing
   schema prefix on DROP INDEX.

4. **Graph entity inheritance** — `get_graph_data` queried entities for
   observation IDs only, but observations inherit entities from source
   memories. Fixed by querying `all_relevant_ids`.

5. **Docstring false positives** — link_utils.py docstrings triggered
   the SQL schema safety test's unqualified table reference check.

6. **Config test count** — `retain_chunk_batch_size` added to
   `_CONFIGURABLE_FIELDS` without updating the test assertion.
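
As a rough illustration of item 2, grouping a batch by `document_id` before processing might look like the sketch below. The dict-shaped items and the `group_by_document_id` helper are assumptions made for this example, not the project's actual types or API. Items without a `document_id` are collected under the `None` key here, which is also an assumption about how such items would be handled.

```python
from collections import defaultdict
from typing import Any, Optional


def group_by_document_id(
    contents: list[dict[str, Any]],
) -> dict[Optional[str], list[dict[str, Any]]]:
    """Group batch items by document_id, preserving first-seen order."""
    groups: dict[Optional[str], list[dict[str, Any]]] = defaultdict(list)
    for item in contents:
        # Items lacking a document_id land under None (assumption for this sketch).
        groups[item.get("document_id")].append(item)
    return dict(groups)


batch = [
    {"content": "first", "document_id": "doc-a"},
    {"content": "second", "document_id": "doc-b"},
    {"content": "third", "document_id": "doc-a"},
]
groups = group_by_document_id(batch)

# Each group can then be processed independently, one document per group.
processed = {doc_id: [item["content"] for item in items] for doc_id, items in groups.items()}
```

Processing each group separately keeps one document per `document_id` instead of merging the whole batch into a single document.
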
nicoloboschi force-pushed the fix/streaming-retain-regressions branch from 46a5d9e to 470ed96 on April 1, 2026 at 15:31
nicoloboschi merged commit 7415ebf into main on Apr 1, 2026
46 of 47 checks passed
