Fix two bugs in the `flag_existing_quarantined_media` background update by erikjohnston · Pull Request #19901 · element-hq/synapse

erikjohnston · 2026-07-01T10:46:10Z

The flag_existing_quarantined_media background update (added in #19558, shipped in v1.152.0) back-populates the quarantined_media_changes table with media that was already quarantined. It has two bugs, fixed here in one commit each.

1. Some quarantined remote media is silently skipped

The remote-media query paged through remote_media_cache with:

WHERE quarantined_by IS NOT NULL
    AND media_origin >= ? AND media_id > ?

This ANDs the two key columns independently rather than comparing them as a tuple. Once an origin has been fully processed (e.g. media_id reaches zzz for origin a.example), rows in a later origin whose media_id is <= the last processed media_id (e.g. b.example / aaa) fail the media_id > ? test and are never flagged.

Fixed by using a proper row-value tuple comparison:

WHERE quarantined_by IS NOT NULL
    AND (media_origin, media_id) > (?, ?)

Both the minimum supported SQLite (3.37.2) and PostgreSQL support row-value comparisons.

2. Exhausted queries keep re-running every iteration

flag_quarantined ran both the local and remote queries on every iteration. When one table was exhausted but the other still had rows, the update kept returning a positive count, so the finished table's (now empty) query needlessly re-ran on every subsequent iteration until the whole update completed.

This could add significant time to the transaction, as since the rows were deleted it could scan a significant portion of the table each time.

Fixed by tracking per-table completion (local_done / remote_done) in the background-update progress and skipping a table's query once it has returned an empty batch. The progress dict was also restructured into an incremental build for readability.

A thrid commit will cause this background job to run again. This will re-insert media again into the quarantined_media_changes, but that should be fine.

cc @turt2live @MadLittleMods

…media The `flag_existing_quarantined_media` background update paged through `remote_media_cache` with `media_origin >= ? AND media_id > ?`, which ANDs the two columns independently. Once an origin was fully processed, rows in a later origin whose `media_id` was <= the last processed `media_id` would be silently skipped and never flagged. Use a proper row-value tuple comparison `(media_origin, media_id) > (?, ?)` (matching the table's unique constraint/index), as done elsewhere in the codebase. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`flag_quarantined` ran both the local and remote queries on every iteration. Once one table was exhausted but the other still had rows to process, the background update kept returning a positive count, so the finished table's (now empty) query needlessly re-ran on every subsequent iteration until the whole update completed. Track per-table completion (`local_done`/`remote_done`) in the background update progress and skip a table's query once it has returned an empty batch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The `flag_existing_quarantined_media` background update shipped (in 94/03) with a broken remote media query that silently skipped some already-quarantined remote media, and servers may have already run it to completion. Now that the query is fixed, add a schema delta that deletes any existing background update row and re-inserts it so the update runs again from scratch and flags the media missed on the first run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add the changelog entry for PR #19901, noting the bug was introduced in v1.152.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

MadLittleMods · 2026-07-01T17:05:13Z

-            # We use a >= ? on the media origin to avoid missing records when media IDs
-            # collide between origins (the table's unique constraint is on `(media_origin, media_id)`).
-            # Filtering by `(media_origin, media_id)` also makes sure we're using an index.


We're losing this context

I'm not quite sure what context, but 99b235f makes it clearer that we're using an index and that we'll paginate over every row. The >= comment is no longer relevant.

MadLittleMods · 2026-07-01T17:05:55Z

+            # We page through `remote_media_cache` with a tuple comparison on
+            # `(media_origin, media_id)` (matching the table's unique constraint and
+            # index). Comparing the columns independently (e.g.
+            # `media_origin >= ? AND media_id > ?`) would incorrectly skip rows in a
+            # newly-reached origin whose media_id is <= the last processed media_id.


This comment is a bit clunky for what it's trying to convey.

Have tried to reword a little 99b235f

MadLittleMods · 2026-07-01T17:10:22Z

@@ -0,0 +1 @@
+Fix the `flag_existing_quarantined_media` background update skipping some quarantined remote media and re-running exhausted queries unnecessarily. Introduced in v1.152.0.


How did you notice these flaws?

I saw that it was taking ~2s to do 1 item, and while staring at the query realised that it was wrong. That didn't explain the slowness, but then I realised we were repeatedly calling the query with the same args.

turt2live · 2026-07-01T20:07:46Z

fwiw, we spotted the remote media bug after the merge but didn't consider it important enough to fix. Remote media is purged quickly enough where it doesn't provide much value into the hashing system, and if it was to be re-downloaded or accessed then the built-in Synapse auto-quarantine function would have the same effect as our tooling. Further, as we roll out hash matching at more and more levels, we're likely to catch future media before it reaches Synapse's stores anyway.

The initial import is fully intended to be best effort, not complete.

turt2live · 2026-07-01T20:11:52Z

+-- We delete any existing row first: on servers where the update already completed the row
+-- was removed, and on servers where it's still pending/mid-run this clears the stale
+-- progress so the re-insert below starts cleanly (and avoids a primary key collision).
+DELETE FROM background_updates WHERE update_name = 'flag_existing_quarantined_media';


it's a huge shame to lose so many months of progress, especially when we're still running the background update on matrix.org. Can we update the job to run after the existing one or otherwise de-duplicate data going into the table?

This is at odds with what you wrote above. Why do you care?

fwiw, we spotted the remote media bug after the merge but didn't consider it important enough to fix.

The initial import is fully intended to be best effort, not complete.

On matrix.org we can manually set to "last_local_media_id":"xJFXwTD" and then it will start from where it left of for local media.

This is at odds with what you wrote above. Why do you care?

I think we care about local media, but not about remote media, maybe?

Co-authored-by: Eric Eastwood <erice@element.io>

erikjohnston · 2026-07-02T09:00:22Z

fwiw, we spotted the remote media bug after the merge but didn't consider it important enough to fix.

I mean, fair, but I'm not a fan of leaving around broken code. I reckon it missed the majority of remote media.

(It is also causing the local media to be handled at ~0.5Hz)

erikjohnston and others added 4 commits July 1, 2026 10:15

Add changelog

5ed45d6

Add the changelog entry for PR #19901, noting the bug was introduced in v1.152.0. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

erikjohnston force-pushed the erikj/fix/flag-existing-quarantined-media branch from 88661e0 to 5ed45d6 Compare July 1, 2026 10:51

erikjohnston marked this pull request as ready for review July 1, 2026 13:52

erikjohnston requested a review from a team as a code owner July 1, 2026 13:52

MadLittleMods added A-Media-Repository A-Abuse Reports, media quarantine, policy servers, etc labels Jul 1, 2026

MadLittleMods approved these changes Jul 1, 2026

View reviewed changes

turt2live reviewed Jul 1, 2026

View reviewed changes

erikjohnston and others added 2 commits July 2, 2026 09:54

Update changelog.d/19901.bugfix

1c9879b

Co-authored-by: Eric Eastwood <erice@element.io>

Fixup comment

99b235f

erikjohnston merged commit cb60c15 into develop Jul 2, 2026
45 of 47 checks passed

erikjohnston deleted the erikj/fix/flag-existing-quarantined-media branch July 2, 2026 09:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix two bugs in the `flag_existing_quarantined_media` background update#19901

Fix two bugs in the `flag_existing_quarantined_media` background update#19901
erikjohnston merged 6 commits into
developfrom
erikj/fix/flag-existing-quarantined-media

erikjohnston commented Jul 1, 2026

Uh oh!

MadLittleMods Jul 1, 2026

Uh oh!

erikjohnston Jul 2, 2026

Uh oh!

MadLittleMods Jul 1, 2026

Uh oh!

erikjohnston Jul 2, 2026

Uh oh!

Uh oh!

MadLittleMods Jul 1, 2026

Uh oh!

erikjohnston Jul 2, 2026

Uh oh!

turt2live commented Jul 1, 2026

Uh oh!

turt2live Jul 1, 2026

Uh oh!

MadLittleMods Jul 1, 2026

Uh oh!

erikjohnston Jul 2, 2026 •

edited

Loading

Uh oh!

erikjohnston commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		@@ -0,0 +1 @@
		Fix the `flag_existing_quarantined_media` background update skipping some quarantined remote media and re-running exhausted queries unnecessarily. Introduced in v1.152.0.

Uh oh!

Conversation

erikjohnston commented Jul 1, 2026

1. Some quarantined remote media is silently skipped

2. Exhausted queries keep re-running every iteration

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

turt2live commented Jul 1, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erikjohnston Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

erikjohnston commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

erikjohnston Jul 2, 2026 •

edited

Loading