MSC4140: rate limit management against delay_id by AndrewFerr · Pull Request #19830 · element-hq/synapse

AndrewFerr · 2026-06-05T16:07:50Z

Do this instead of rate limiting based on IP address, to avoid the possibility of multiple unauthenticated clients on a shared IP address getting put into the same rate-limit quota and getting unfairly blocked.

This follows the suggestion made by the MSC: https://github.com/matrix-org/matrix-spec-proposals/blob/toger5/expiring-events-keep-alive/proposals/4140-delayed-events-futures.md#rate-limiting-for-heartbeats

Depends on #19794. Either that should be merged first, or this PR should absorb it. (Perhaps this would be a good use-case for stacked PRs.)
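To illustrate the keying described above, here is a minimal sketch, not the code in this PR. The function name `mgmt_ratelimit_key` and its parameters are hypothetical. It assumes the quota is selected by `delay_id` plus the requester's user ID when the request is authenticated, falling back to the source IP address otherwise.

```python
from typing import Optional, Tuple


def mgmt_ratelimit_key(
    delay_id: str,
    requester_user_id: Optional[str],
    source_ip: str,
) -> Tuple[str, str, str]:
    """Build the key that selects a rate-limit quota (hypothetical helper).

    Authenticated requests are counted per (delay_id, user); unauthenticated
    requests fall back to (delay_id, source IP). The middle element tags
    which kind of identity was used, so a user ID can never collide with an
    IP address string.
    """
    if requester_user_id is not None:
        return (delay_id, "user", requester_user_id)
    return (delay_id, "ip", source_ip)


# Two unauthenticated clients behind one shared IP, managing different
# delayed events, end up in different quotas.
key_a = mgmt_ratelimit_key("delay-a", None, "203.0.113.7")
key_b = mgmt_ratelimit_key("delay-b", None, "203.0.113.7")
# An authenticated request is keyed on the user rather than the IP.
key_c = mgmt_ratelimit_key("delay-a", "@alice:example.org", "203.0.113.7")
```

With this shape, clients sharing an IP address only compete for quota when they target the same `delay_id`.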

Pull Request Checklist

Pull request is based on the develop branch
Pull request includes a changelog file. The entry should:
- Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
- Use markdown where necessary, mostly for code blocks.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
Code style is correct (run the linters)

AndrewFerr · 2026-06-18T14:25:35Z

If it would help with the review, I could rebase this PR's branch to exclude commits from #19794, as that is now merged.

anoadragon453 · 2026-06-25T17:04:41Z

        """
        assert self._is_master
-        await self._mgmt_ratelimit(request)
+        await self._mgmt_ratelimit(request, delay_id)


Does this weaken the rate-limiting too much? Now that a user can spam requests, as long as they cycle through (effectively unlimited) delay IDs?

I get the reasoning that an external service would end up getting rate-limited if it's managing lots of delay IDs for a bunch of users... but would it not make sense to just exclude such a service from rate-limiting?

And finally, I get that lots of logged-out users on the same corporate VPN would end up getting rate-limited due to sharing one IP... but at this point, what is the rate-limit protecting against? Seems it's only stopping a buggy client from spamming the same endpoint accidentally?

AndrewFerr · 2026-06-26

> Does this weaken the rate-limiting too much? Now that a user can spam requests, as long as they cycle through (effectively unlimited) delay IDs?

The worst that such spammy requests would do is make the server perform a lookup against a database index (to check whether the target delayed event exists). IMO this is an acceptable cost for the kind of rate-limiting this PR is after, but if that's still too costly, we can consider another approach.

> would it not make sense to just exclude such a service from rate-limiting?

That's not a bad idea, but then the question is how Synapse would identify such an external service, given that the "service" is currently just any arbitrary HTTP client that has been given some delay IDs.

> what is the rate-limit protecting against? Seems it's only stopping a buggy client from spamming the same endpoint accidentally?

Yes, be it a buggy or malicious client. It takes non-trivial work to reset a delayed event (past the initial quick check for the existence of the delayed event), so it's worth protecting that work with a rate limit.

It bears mentioning that the long-term solution to all of this is to have the external services be OAuth 2.0 clients, with a token that has scoped access to these management endpoints (detailed in MSC4140). Rate limits could then be applied per client, as the clients would be identifiable by their token. The idea of using delay_id grouping is just a stopgap measure to apply some kind of rate limit in the absence of a way to identify an unauthed external service.
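The trade-off discussed in this thread can be shown with a small sketch. It uses a simplified sliding-window limiter as a stand-in and is not Synapse's `Ratelimiter`. The class name `WindowLimiter`, the key layout, and the limits are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, List


class WindowLimiter:
    """Allow at most `max_actions` per key within `window_s` seconds."""

    def __init__(self, max_actions: int, window_s: float) -> None:
        self.max_actions = max_actions
        self.window_s = window_s
        self._hits: DefaultDict[Hashable, List[float]] = defaultdict(list)

    def allow(self, key: Hashable, now: float) -> bool:
        # Keep only the timestamps that still fall inside the window.
        recent = [t for t in self._hits[key] if now - t < self.window_s]
        allowed = len(recent) < self.max_actions
        if allowed:
            recent.append(now)
        self._hits[key] = recent
        return allowed


limiter = WindowLimiter(max_actions=2, window_s=10.0)
ip = "198.51.100.4"

# Hammering a single delay_id: the third and fourth requests are rejected.
same_id = [limiter.allow(("delay-a", ip), now=float(i)) for i in range(4)]

# Cycling through delay IDs: every request lands in a fresh quota.
cycled = [limiter.allow((f"delay-{i}", ip), now=float(i)) for i in range(4)]
```

In this sketch `same_id` is `[True, True, False, False]` while `cycled` is all `True`. That is the weakening raised in the review: per the reply above, each cycled request still costs the server an index lookup to check that the delayed event exists.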

AndrewFerr added 6 commits May 21, 2026 11:28

MSC4140: allow auth on management endpoints

a016de0

This is to allow authed requests to have their own ratelimit quotas.

Add changelog

6526be9

Lint

33eca5e

Update ratelimit documentation

f2ba5b3

Plus, since element-hq#19152, the delayed event management ratelimit hasn't considered the requesting device ID, so don't mention that anymore.

Merge with 'develop'

9d9f542

Merge with 'develop' again

97a65a8

AndrewFerr requested a review from a team as a code owner June 5, 2026 16:07

AndrewFerr marked this pull request as draft June 5, 2026 16:15

Johennes mentioned this pull request Jun 8, 2026

MSC4140: Cancellable delayed events matrix-org/matrix-spec-proposals#4140

Open

7 tasks

AndrewFerr marked this pull request as ready for review June 8, 2026 15:46

AndrewFerr added 8 commits June 10, 2026 10:43

Merge with 'develop' again

dc3772e

Add changelog

410f4fa

Fixup

0199f65

Still use source IP address as ratelimit fallback

5e0f44a

For unauthenticated requests to delayed event management endpoints, use different rate limits for different source IP addresses, instead of having all such requests share the same limit (per delay_id).

Update tests

f43bc62

Satisfy mypy

efe144c

Revert always awaiting on _initialized_from_db

3745bd7

Currently, it should be awaited only by the /cancel and /send delayed event management endpoints, but not /restart, which can be run by workers that do not set _initialized_from_db and thus can't await it.

AndrewFerr force-pushed the msc4140-ratelimit-delay_id branch from 8a56dfe to 3745bd7 Compare June 10, 2026 14:44

AndrewFerr commented Jun 10, 2026

Comment thread synapse/handlers/delayed_events.py

AndrewFerr mentioned this pull request Jun 10, 2026

MSC4140: allow auth on management endpoints for delayed events #19794

Merged

3 tasks

anoadragon453 and others added 5 commits June 18, 2026 10:58

Fix plural in comment

0ad97ef

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Minor wording changes

863e586

Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>

Update generated documentation

c17cdf1

Merge with updates to authed management endpoints

c50e040

Also update docstring on _mgmt_ratelimit to mention how it now keys ratelimits on delay_id + requester

Merge with 'develop' after element-hq#19794

881e33e

MadLittleMods added the A-Rate-Limit label Jun 18, 2026

anoadragon453 requested a review from Copilot June 24, 2026 20:42

Copilot started reviewing on behalf of anoadragon453 June 24, 2026 20:42

This comment was marked as resolved.

anoadragon453 reviewed Jun 26, 2026

AndrewFerr wants to merge 19 commits into element-hq:develop from AndrewFerr:msc4140-ratelimit-delay_id

4 participants
