Summary
The test "handles relation tracker restart" in publication_manager_test.exs:503 has a race condition that causes intermittent CI failures.
Observed in run 22354924262 on main (2026-02-24).
Error
** (exit) exited in: GenServer.call({:via, Registry, {:"Electric.ProcessRegistry:...", {Electric.Replication.PublicationManager.RelationTracker, nil}}}, {:remove_shape, "36215155-..."}, 5000)
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name
Root Cause
The test at test/electric/replication/publication_manager_test.exs:503:
- Line 515: GenServer.stop(relation_tracker_name) kills the RelationTracker
- Line 519: assert_pub_tables(ctx, [ctx.relation], 2_000) polls Postgres publication tables until they match
- Line 522: PublicationManager.remove_shape(ctx.stack_id, shape_handle) does a GenServer.call to the RelationTracker
The problem is that assert_pub_tables checks Postgres state, not whether the RelationTracker GenServer has been re-registered by the supervisor. There's a window where publication tables are correct (from the previous state) but the new RelationTracker process isn't yet alive or hasn't finished handle_continue(:restore_relations, ...).
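The window can be illustrated outside the test suite with a minimal sketch. Everything here is a hypothetical stand-in: RaceDemo.Tracker is not the real RelationTracker, the name is a plain atom rather than a Registry via-tuple, and the restart is done by hand instead of by a supervisor so that the gap is deterministic.

```elixir
defmodule RaceDemo.Tracker do
  # Hypothetical stand-in for RelationTracker, not the real module.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  def remove_shape(name, handle), do: GenServer.call(name, {:remove_shape, handle}, 5_000)

  @impl true
  def init(:ok), do: {:ok, MapSet.new()}

  @impl true
  def handle_call({:remove_shape, handle}, _from, shapes) do
    {:reply, :ok, MapSet.delete(shapes, handle)}
  end
end

defmodule RaceDemo do
  alias RaceDemo.Tracker

  # Returns :ok, or {:exit, reason} when the call exits (e.g. no process).
  def try_remove(name, handle) do
    Tracker.remove_shape(name, handle)
  catch
    :exit, {reason, _call} -> {:exit, reason}
  end

  def run do
    name = :race_demo_tracker
    {:ok, _pid} = Tracker.start_link(name)

    # Stop the process; until something restarts it, the name is unregistered.
    :ok = GenServer.stop(name)
    during_gap = try_remove(name, "shape-1")

    # Simulate the supervisor restart by hand, then call again.
    {:ok, _pid} = Tracker.start_link(name)
    after_restart = try_remove(name, "shape-1")

    # Clean up so the sketch can be run more than once.
    :ok = GenServer.stop(name)
    {during_gap, after_restart}
  end
end

IO.inspect(RaceDemo.run())
# {{:exit, :noproc}, :ok}
```

In the real test the gap is closed by the supervisor rather than by hand, so whether the call lands inside it depends on scheduling, which would be consistent with the failure being intermittent.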
Suggested Fix
Call RelationTracker.wait_for_restore(ctx.stack_id) before remove_shape on line 522. This function already exists (lines 79-82 of relation_tracker.ex) and blocks until handle_continue(:restore_relations, ...) completes, which guarantees the process is registered and ready.
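A rough, self-contained sketch of the ordering this fix relies on is shown below. FixDemo.Tracker is a hypothetical stand-in, and its wait_for_restore is only an assumed shape (a GenServer.call that is answered after handle_continue has run); the real implementation in relation_tracker.ex is not reproduced here. Because a plain call would itself exit if the name were not yet registered, the sketch also polls GenServer.whereis/1 first. That helper (await_restart) is an addition for illustration and may be unnecessary with the real function.

```elixir
defmodule FixDemo.Tracker do
  # Hypothetical stand-in for RelationTracker, not the real module.
  use GenServer

  def start_link(name), do: GenServer.start_link(__MODULE__, :ok, name: name)

  # Assumed shape: replies only once handle_continue/2 has finished, because
  # a continue runs before any message in the mailbox is handled.
  def wait_for_restore(name, timeout \\ 5_000) do
    GenServer.call(name, :wait_for_restore, timeout)
  end

  def remove_shape(name, handle), do: GenServer.call(name, {:remove_shape, handle})

  @impl true
  def init(:ok), do: {:ok, %{restored?: false}, {:continue, :restore_relations}}

  @impl true
  def handle_continue(:restore_relations, state) do
    # Simulate the time spent restoring state after a restart.
    Process.sleep(50)
    {:noreply, %{state | restored?: true}}
  end

  @impl true
  def handle_call(:wait_for_restore, _from, state), do: {:reply, :ok, state}
  def handle_call({:remove_shape, _handle}, _from, state), do: {:reply, :ok, state}
end

defmodule FixDemo do
  alias FixDemo.Tracker

  # Poll until `name` is registered to a pid other than `old_pid`.
  def await_restart(name, old_pid, retries \\ 100) do
    case GenServer.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        {:ok, pid}

      _ when retries > 0 ->
        Process.sleep(10)
        await_restart(name, old_pid, retries - 1)

      _ ->
        {:error, :timeout}
    end
  end

  def run do
    name = :fix_demo_tracker
    {:ok, sup} = Supervisor.start_link([{Tracker, name}], strategy: :one_for_one)
    old_pid = GenServer.whereis(name)

    # Kill the tracker, then wait for the restarted process to be ready
    # before issuing the call that previously raced.
    :ok = GenServer.stop(name)
    {:ok, new_pid} = await_restart(name, old_pid)
    :ok = Tracker.wait_for_restore(name)
    result = Tracker.remove_shape(name, "shape-1")

    :ok = Supervisor.stop(sup)
    {result, new_pid != old_pid}
  end
end
```

Calling FixDemo.run() should return a tuple whose first element is the remove_shape reply and whose second element confirms that a new process answered it.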
Context: Broader CI Flakiness
While investigating, I looked at all sync-service workflow failures from the last 2 days: 12 failures vs 14 successes (~46% failure rate). The failures are spread across many test files — only 1 of the 12 was this publication_manager_test:
| Test file | Failures |
|---|---|
| shape_cache_test.exs:501 | 4 |
| request_batcher_test.exs:100 | 2 |
| publication_manager_test.exs:503 | 1 |
| api_test.exs:925 | 1 |
| delete_shape_plug_test.exs:100 | 1 |
| shape_db_test.exs:553 | 1 |
| shape_cache_test.exs:877 | 1 |