
Make MSC4102 "prefer unthreaded receipt" durable at insert time#19838

Open
erikjohnston wants to merge 3 commits into develop from erikj/fix_receipts_fed


erikjohnston commented Jun 9, 2026

Member

Fix #19171

Part of #18537

Problem

TestThreadReceiptsInSyncMSC4102 is a persistent Complement flake (workers + federation). Example failing run.

MSC4102 requires that an unthreaded read receipt always wins over a clashing threaded one (same user, same event). Today that's enforced only at read time, in ReceiptInRoom.merge_to_content, which dedupes a clashing pair within a single /sync response.
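A minimal sketch of the read-time rule, to make its limitation concrete. It assumes a simplified, hypothetical `Receipt` type and `dedupe_for_response` helper (not Synapse's actual `ReceiptInRoom` API): a threaded receipt is dropped only when an unthreaded receipt for the same (user_id, event_id) is present in the same batch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Receipt:
    # Hypothetical, simplified receipt: thread_id None means "unthreaded".
    user_id: str
    event_id: str
    thread_id: Optional[str]


def dedupe_for_response(receipts: list[Receipt]) -> list[Receipt]:
    """Drop threaded receipts that clash with an unthreaded receipt for the
    same (user_id, event_id), looking only at this one batch."""
    unthreaded = {(r.user_id, r.event_id) for r in receipts if r.thread_id is None}
    return [
        r
        for r in receipts
        if r.thread_id is None or (r.user_id, r.event_id) not in unthreaded
    ]
```

Because the check only sees a single batch, a response that carries just the threaded receipt passes it through unchanged, which is the gap described next.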

That breaks down when the two receipts are persisted at different stream positions and get served in separate /sync responses. Tracing the failing run:

  1. Alice sends an unthreaded receipt for event B, then a threaded receipt for the same event B. Both are federated to hs2 as two separate m.receipt EDUs and persisted there at receipts stream positions 2 (unthreaded) and 3 (threaded).
  2. Bob's initial sync advanced its receipt token to 2 but emitted no receipt; his next (incremental) sync window was (2,3], containing only the threaded receipt.
  3. The unthreaded receipt was never surfaced, so the threaded one "won" → MSC4102 violation → timeout.
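The window arithmetic in this trace can be modelled in a few lines. This is a hypothetical model, assuming receipts are (stream_position, thread_id) pairs and that an incremental sync returns the positions in the half-open window (from_token, to_token], as the trace describes.

```python
from typing import Optional

# Hypothetical model of hs2's receipts stream for Alice's two receipts on
# event B: (stream_position, thread_id), where thread_id None means unthreaded.
STREAM: list[tuple[int, Optional[str]]] = [(2, None), (3, "$thread_root")]


def receipts_in_window(
    stream: list[tuple[int, Optional[str]]], from_token: int, to_token: int
) -> list[tuple[int, Optional[str]]]:
    """Return the receipts with from_token < position <= to_token."""
    return [r for r in stream if from_token < r[0] <= to_token]


# The initial sync advanced the token to 2 without emitting anything, so the
# next incremental sync only covers (2, 3].
incremental = receipts_in_window(STREAM, 2, 3)
print(incremental)  # [(3, '$thread_root')]
```

The incremental window contains only the threaded receipt, so per-response dedup has nothing to clash it against.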

Fix

Make "unthreaded wins" durable at insert time. In _insert_linearized_receipt_txn, drop a threaded receipt if an unthreaded receipt for the same (room, type, user) already exists for the same event. It is then never persisted, streamed, or federated, so no sync-window split can resurface it, and on hs1 the clashing threaded receipt is never sent over federation in the first place.
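The new drop condition can be sketched against an in-memory stand-in for the receipts table. Everything here (`insert_receipt`, the dict layout) is hypothetical and omits the existing same-thread ordering check; it only shows the shape of the rule.

```python
from typing import Optional

# Hypothetical stand-in for the receipts table:
# (room_id, receipt_type, user_id, thread_id) -> event_id,
# with thread_id None for the unthreaded receipt.
ReceiptStore = dict[tuple[str, str, str, Optional[str]], str]


def insert_receipt(
    store: ReceiptStore,
    room_id: str,
    receipt_type: str,
    user_id: str,
    thread_id: Optional[str],
    event_id: str,
) -> bool:
    """Store the receipt and return True, or return False if it was dropped."""
    if thread_id is not None:
        unthreaded_event = store.get((room_id, receipt_type, user_id, None))
        if unthreaded_event == event_id:
            # An unthreaded receipt already covers this event, so the threaded
            # one is never persisted (and hence never streamed or federated).
            return False
    # The PR keeps the existing same-thread "don't clobber a more recent
    # event" check; it is omitted from this sketch.
    store[(room_id, receipt_type, user_id, thread_id)] = event_id
    return True
```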

Key design points:

  • The check keys off event id, mirroring the read-time dedup in ReceiptInRoom.merge_to_content (which clashes purely on (user_id, event_id), never on ordering). This keeps the two layers consistent and order-independent.
  • Because we match on event id rather than event_stream_ordering, it also covers a remote unthreaded receipt whose event we hadn't seen when it arrived (NULL event_stream_ordering, which is never backfilled outside the one-time background update). (Thanks to the Copilot review for catching this case.)
  • The same-thread "don't clobber a more recent event" check is unchanged; both checks now share a single query.
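The shared query might look roughly like the following. This is a sketch, not the PR's SQL: the simplified schema (loosely modelled on Synapse's `receipts_linearized` table, column set assumed) and the `should_drop` helper are hypothetical, and it uses sqlite3 so it runs standalone. It shows why keying the unthreaded check on event_id still works when the unthreaded row's event_stream_ordering is NULL.

```python
import sqlite3
from typing import Optional


def should_drop(
    conn: sqlite3.Connection,
    room_id: str,
    receipt_type: str,
    user_id: str,
    thread_id: Optional[str],
    event_id: str,
    stream_ordering: Optional[int],
) -> bool:
    """Return True if the new receipt should not be inserted (hypothetical)."""
    # One query fetches the user's unthreaded row and, if the new receipt is
    # threaded, their row for the same thread. For an unthreaded receipt,
    # `thread_id = NULL` never matches, so only the IS NULL branch applies.
    rows = conn.execute(
        """
        SELECT thread_id, event_id, event_stream_ordering
        FROM receipts_linearized
        WHERE room_id = ? AND receipt_type = ? AND user_id = ?
          AND (thread_id IS NULL OR thread_id = ?)
        """,
        (room_id, receipt_type, user_id, thread_id),
    ).fetchall()
    for row_thread, row_event, row_ordering in rows:
        # Unthreaded wins: matched on event_id, so a NULL
        # event_stream_ordering on the unthreaded row does not matter.
        if thread_id is not None and row_thread is None and row_event == event_id:
            return True
        # Same-thread check: don't replace a receipt at a more recent event.
        if (
            row_thread == thread_id
            and row_ordering is not None
            and stream_ordering is not None
            and row_ordering > stream_ordering
        ):
            return True
    return False
```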

This is safe for notification counts: EventPushSummary.is_unread already treats the unthreaded receipt as a floor across all threads, so a threaded receipt at the same event is redundant for counts.
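A toy model of the notification-count argument. It assumes non-negative integer stream orderings and that an event is unread when it is strictly after the effective receipt for its thread; `is_unread` here is hypothetical and is not `EventPushSummary.is_unread`.

```python
from typing import Optional


def is_unread(
    event_ordering: int,
    unthreaded_ordering: Optional[int],
    thread_ordering: Optional[int],
) -> bool:
    """Hypothetical model: an event in a thread is unread if it is after both
    the unthreaded receipt (a floor across all threads) and the thread's own
    receipt."""
    floor = -1  # no receipts at all: every (non-negative) ordering is unread
    for ordering in (unthreaded_ordering, thread_ordering):
        if ordering is not None:
            floor = max(floor, ordering)
    return event_ordering > floor


# A threaded receipt at the same event as the unthreaded one changes nothing.
assert all(is_unread(e, 10, None) == is_unread(e, 10, 10) for e in range(20))
```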

erikjohnston force-pushed the erikj/fix_receipts_fed branch 2 times, most recently from 73621b4 to a7434ba, June 9, 2026 13:01
erikjohnston requested a review from Copilot June 9, 2026 13:02

Copilot AI left a comment

Contributor

Pull request overview

This PR makes MSC4102’s “prefer unthreaded read receipt” behavior durable by enforcing the preference at receipt insert time, preventing conflicting threaded receipts from being persisted (and later resurfacing across split /sync windows).

Changes:

  • Update receipt insertion logic to drop threaded receipts when an unthreaded receipt for the same user/room/type already supersedes it.
  • Add a regression test covering the threaded-vs-unthreaded insert-time behavior.
  • Add a Towncrier bugfix newsfragment describing the user-visible fix.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

  • synapse/storage/databases/main/receipts.py: Enforce MSC4102 preference at insert time by dropping semantically-redundant threaded receipts.
  • tests/storage/test_receipts.py: Add regression coverage to ensure conflicting threaded receipts are dropped when an unthreaded receipt exists.
  • changelog.d/19838.bugfix: Add release note for the MSC4102 /sync receipt ordering bugfix.


Comment thread synapse/storage/databases/main/receipts.py
Previously the "unthreaded receipt always wins over a clashing threaded
one" behaviour was only applied at read time, in
`ReceiptInRoom.merge_to_content`, which dedupes the pair within a single
/sync response. When the two receipts end up at different stream
positions and are served in separate /sync responses (e.g. when they
arrive over federation as separate EDUs), the threaded receipt could be
served on its own and incorrectly win, which is what caused the
`TestThreadReceiptsInSyncMSC4102` Complement flake.

Drop a threaded receipt at insert time if an unthreaded receipt for the
same user already exists at the same or a later event, so it never gets
persisted, streamed or federated. This is safe for notification counts,
since the unthreaded receipt already acts as a floor across all threads.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
erikjohnston force-pushed the erikj/fix_receipts_fed branch from a7434ba to 8750a01, June 9, 2026 15:59
erikjohnston marked this pull request as ready for review June 9, 2026 17:34
erikjohnston requested a review from a team as a code owner June 9, 2026 17:34
)
self.assertEqual(res, {self.room_id1: event1_2_id, self.room_id2: event2_1_id})

def test_threaded_receipt_dropped_when_unthreaded_exists(self) -> None:

Contributor

Seems like the TestThreadReceiptsInSyncMSC4102 Complement test should also be updated to better separate read receipts across EDUs so it can consistently stress the failure case. Or this could be an additional new test that does that (with an engineered homeserver).

Member Author

Have updated it in matrix-org/complement#881

Comment on lines +943 to +949
# Doing this at insert time, as well as when serving receipts,
# makes the "prefer unthreaded" behaviour durable: otherwise the
# threaded receipt could be served on its own in a later /sync
# response (e.g. when the unthreaded and threaded receipts
# arrive in separate federation EDUs and so end up at different
# stream positions), causing the client to incorrectly see it
# win.

Contributor

Can we drop the /sync behavior in favor of the insert solution we have here?

Perhaps instead an assert at the /sync level.

Member Author

Mmm, possibly. We wouldn't want to do it immediately, though, since we still need to handle existing receipts in the DB.
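The suggested /sync-level assert could be as simple as the hypothetical helper below, run over the receipts a response is about to merge. Per the reply above, it could only be enabled once no pre-existing clashing rows remain in the database. The tuple layout is an assumption for illustration.

```python
from typing import Optional

# (user_id, event_id, thread_id), with thread_id None for unthreaded receipts.
ReceiptTuple = tuple[str, str, Optional[str]]


def assert_no_clashing_receipts(receipts: list[ReceiptTuple]) -> None:
    """Hypothetical /sync-level check: fail if any (user_id, event_id) has
    both an unthreaded and a threaded receipt."""
    unthreaded = {(user, event) for user, event, thread in receipts if thread is None}
    threaded = {(user, event) for user, event, thread in receipts if thread is not None}
    clashes = unthreaded & threaded
    if clashes:
        # Raised explicitly so the check still runs under `python -O`.
        raise AssertionError(f"clashing receipts: {sorted(clashes)}")
```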

Comment thread synapse/storage/databases/main/receipts.py Outdated
Comment thread synapse/storage/databases/main/receipts.py Outdated
Comment thread synapse/storage/databases/main/receipts.py Outdated
Comment thread changelog.d/19838.bugfix
@@ -0,0 +1 @@
Fix a bug where a threaded read receipt could incorrectly win over a clashing unthreaded one (MSC4102) when the two were served in separate `/sync` responses, e.g. when received over federation as separate EDUs.

Contributor

How did you reproduce the TestThreadReceiptsInSyncMSC4102 failure locally? Or did the LLM just spot what was wrong based on the test flake output from CI?

I struggled: #19171

Member Author

This was a case of throwing an LLM at the logs of a flaky run. What it spit out made sense and seemed to match what was happening, and the suggested fix (this) made sense in the context of what we were doing when reading as well.

Comment thread changelog.d/19838.bugfix

Contributor

Per my points in matrix-org/complement#881, I think this may be missing the mark on the behavior of MSC4102.

Member Author

How so? We want to make sure that, when combining receipts for the same event ID, the unthreaded one wins? MSC4102 only talks about EDUs, but I think the idea is still that you want unthreaded to win everywhere?

Contributor

but I think the idea is still that you want unthreaded to win everywhere?

I don't think there is any spec for this.

We want to record all of the read receipts and only deduplicate if there are conflicting read receipts when creating m.receipt EDUs. We seem to already be doing this.

And to better clarify (and exaggerate) why this is fine: it's okay to send and receive a threaded read receipt for the same event two hours after the unthreaded read receipt. See MSC3771, which explains:

This MSC proposes allowing the same receipt type to exist multiple times in a room per user:

  • Once for the unthreaded timeline.
  • Once for the main timeline in the room.
  • Once per threaded timeline.

And the only restriction is that "this still does not allow a caller to move their receipts backwards in a room".
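To illustrate this reading of MSC3771, here is a hypothetical per-(room, user, receipt type) store with one slot per timeline. It assumes integer stream orderings and that "backwards" is judged per slot; under that model a threaded receipt at the same event as the unthreaded one simply occupies its own slot. `UserReceipts` is an invented name, not a Synapse class.

```python
from typing import Optional

# Slot keys: None for the unthreaded timeline, "main" for the main timeline
# (the thread_id MSC3771 uses for it), or a thread root event ID.


class UserReceipts:
    """Hypothetical receipts for one (room, user, receipt type)."""

    def __init__(self) -> None:
        self._slots: dict[Optional[str], int] = {}

    def update(self, thread_id: Optional[str], stream_ordering: int) -> bool:
        """Record a receipt; return False if it would move the slot backwards."""
        current = self._slots.get(thread_id)
        if current is not None and stream_ordering < current:
            return False
        self._slots[thread_id] = stream_ordering
        return True

    def get(self, thread_id: Optional[str]) -> Optional[int]:
        return self._slots.get(thread_id)
```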


If this breakdown is to be believed, the bug appears to be in the following behavior, especially the "The unthreaded receipt was never surfaced" part. Why is /sync advancing past its persisted position?

  1. Alice sends an unthreaded receipt for event B, then a threaded receipt for the same event B. Both are federated to hs2 as two separate m.receipt EDUs and persisted there at receipts stream positions 2 (unthreaded) and 3 (threaded).
  2. Bob's initial sync advanced its receipt token to 2 but emitted no receipt; his next (incremental) sync window was (2,3], containing only the threaded receipt.
  3. The unthreaded receipt was never surfaced, so the threaded one "won" → MSC4102 violation → timeout.
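The question can be framed as an invariant on the sync token: any receipt at or below the token handed to the client should already have been delivered. A hypothetical check, assuming we can record which stream positions were delivered (bookkeeping that exists only in this sketch, and which ignores receipts legitimately superseded in the same slot):

```python
def undelivered_receipts(
    stream_positions: list[int], delivered: set[int], token: int
) -> list[int]:
    """Return stream positions at or below `token` that were never delivered.

    A non-empty result means the token advanced past a receipt the client
    never saw, as in step 2 of the trace above.
    """
    return sorted(p for p in stream_positions if p <= token and p not in delivered)
```

In the trace, the initial sync returns token 2 with nothing delivered, so position 2 is already missing; after the incremental sync delivers position 3, position 2 is still missing.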


Development

Successfully merging this pull request may close these issues.

Complement TestThreadReceiptsInSyncMSC4102 is flaky

3 participants