fix: resolve 25 test regressions from streaming retain pipeline by nicoloboschi · Pull Request #836 · vectorize-io/hindsight

nicoloboschi · 2026-04-01T15:23:19Z

Summary

Fixes 25 test regressions introduced by the 3-phase streaming retain pipeline (#722):

Per-content tags lost (18 tests) — the streaming pipeline flattened all chunks and assigned contents[0].tags to every chunk, breaking tag-based memory visibility/isolation. Fixed by adding chunk_to_content mapping so each chunk preserves its source content's tags, context, event_date, etc.
Multi-document batch tracking (1 test) — batches with per-content document_id values were merged into a single document with a random UUID. Fixed by grouping contents by document_id and processing each group independently.
Migration ID collision (1 test) — two migration files shared revision ID d6e7f8a9b0c1. Renamed trgm index migration to e8f9a0b1c2d3, fixed the dependency chain, and added missing schema prefix on DROP INDEX for multi-tenant correctness.
Graph entity inheritance (1 test) — get_graph_data queried unit_entities for observation IDs only, but observations inherit entities from source memories via source_memory_ids. Fixed by querying all_relevant_ids.
Docstring false positives (1 test) — link_utils.py docstrings contained SQL-like patterns that triggered the unqualified table reference safety check.
Config test count (1 test) — retain_chunk_batch_size was added to _CONFIGURABLE_FIELDS without updating the hierarchical config test.
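
For reviewers who want a concrete picture of the per-content tag fix, here is a minimal sketch of a chunk-to-content mapping. It is not the code from this PR: `Content`, `split_into_chunks`, `flatten_with_mapping`, and `tags_for_chunk` are hypothetical names, and the fixed-width chunker is a stand-in for whatever chunking the pipeline really uses. Only the idea of a `chunk_to_content` index comes from the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Content:
    # Hypothetical stand-in for one retain item; field names follow the PR text.
    text: str
    tags: list[str] = field(default_factory=list)
    context: Optional[str] = None
    event_date: Optional[datetime] = None


def split_into_chunks(text: str, size: int) -> list[str]:
    # Naive fixed-width chunker, for illustration only.
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]


def flatten_with_mapping(contents: list[Content], size: int = 10):
    """Flatten all contents into one chunk list and record each chunk's source."""
    chunks: list[str] = []
    chunk_to_content: list[int] = []  # chunk_to_content[i] is the index of the source content
    for content_idx, content in enumerate(contents):
        for chunk in split_into_chunks(content.text, size):
            chunks.append(chunk)
            chunk_to_content.append(content_idx)
    return chunks, chunk_to_content


def tags_for_chunk(contents: list[Content], chunk_to_content: list[int], chunk_idx: int) -> list[str]:
    # Look up the chunk's own source content instead of always using contents[0].
    return contents[chunk_to_content[chunk_idx]].tags


contents = [
    Content("a" * 25, tags=["alice"]),
    Content("b" * 5, tags=["bob"]),
]
chunks, chunk_to_content = flatten_with_mapping(contents, size=10)
```

With this mapping, the last chunk resolves to the second content's tags, whereas reading `contents[0].tags` for every chunk would have labeled it with the first content's tags. The same lookup would apply to `context` and `event_date`.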

Test plan

All 25 previously failing tests now pass (83 total in the affected test files)
Lint passes
Full test suite CI

The 3-phase retain pipeline (914ba79) introduced several regressions:

1. **Per-content tags lost**: the streaming pipeline used `contents[0].tags` for ALL chunks, breaking tag-based visibility. Fixed by tracking a chunk-to-content mapping so each chunk uses its source content's tags.
2. **Multi-document batches broken**: batches with per-content `document_id` values were merged into a single document. Fixed by grouping by `document_id` and processing each group independently.
3. **Migration ID collision**: `d6e7f8a9b0c1` was used by both `drop_documents_metadata` and `case_insensitive_entities_trgm_index`. Renamed the trgm migration to `e8f9a0b1c2d3`, fixed the chain, and added the missing schema prefix on DROP INDEX.
4. **Graph entity inheritance**: `get_graph_data` queried entities for observation IDs only, but observations inherit entities from source memories. Fixed by querying `all_relevant_ids`.
5. **Docstring false positives**: `link_utils.py` docstrings triggered the SQL schema safety test's unqualified table reference check.
6. **Config test count**: `retain_chunk_batch_size` was added to `_CONFIGURABLE_FIELDS` without updating the test assertion.
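
The multi-document fix (grouping by `document_id`) can be pictured with the short sketch below. It is an illustration under assumptions, not the merged implementation: `group_by_document_id` is a hypothetical helper, the items are plain dicts, and items without a `document_id` are simply collected under the key `None`, since this PR does not say how the real pipeline treats them.

```python
from typing import Optional


def group_by_document_id(contents: list[dict]) -> dict[Optional[str], list[dict]]:
    """Group retain items by their document_id, keeping first-seen order."""
    groups: dict[Optional[str], list[dict]] = {}
    for item in contents:
        # Assumption: a missing document_id maps to the key None.
        doc_id = item.get("document_id")
        groups.setdefault(doc_id, []).append(item)
    return groups


batch = [
    {"content": "first note", "document_id": "doc-1"},
    {"content": "second note", "document_id": "doc-2"},
    {"content": "third note", "document_id": "doc-1"},
]
groups = group_by_document_id(batch)

# Each group would then go through the pipeline on its own, so that
# doc-1 and doc-2 stay tracked as separate documents.
for doc_id, items in groups.items():
    print(doc_id, [item["content"] for item in items])
```

Processing the groups one at a time keeps each `document_id` attached to its own items instead of merging the whole batch under a single generated ID.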

nicoloboschi force-pushed the fix/streaming-retain-regressions branch from 46a5d9e to 470ed96 on April 1, 2026 15:31

nicoloboschi merged commit 7415ebf into main Apr 1, 2026
46 of 47 checks passed

nicoloboschi merged 1 commit into main from fix/streaming-retain-regressions