# MSC4280: Hint that a /rooms/{room_id}/messages request is interactive

The endpoint [/rooms/{room_id}/messages](https://spec.matrix.org/latest/client-server-api/#get_matrixclientv3roomsroomidmessages)
is used by clients to retrieve older events from a homeserver, when the direction is set to
backwards (also called "back-pagination" throughout this MSC). This can be useful in a few
contexts:

- after a gappy sync (i.e. one that set the `limited` flag), to retrieve the events included in the
  gap, that is, all the events that were sent to the homeserver since the last sync but were not
  included in the last sync response. This applies to both sync v2 and simplified sliding sync.
- as a mechanism, separate from sync, to go through all the events in a room from the end to the
  start, so as to apply some mass operation on them, like indexing them for a search engine.

In fact, this mechanism is crucial in the context of [simplified sliding sync](https://github.com/matrix-org/matrix-spec-proposals/pull/4186).
That sync mechanism generates thin server responses including a minimal set of events (controlled
by the `timeline_limit` request parameter), so as to provide better initial sync times and
ultimately more responsive clients. The client is then expected to use the
`/rooms/{room_id}/messages` endpoint to retrieve a room's earlier events.

As a result, clients should be able to expect this endpoint to be *fast* when the user session is
interactive (i.e. a user is waiting for these events to be retrieved). While it's hard to define
*how* fast, it's expected that this endpoint would return within a matter of seconds, even in the
worst case. Otherwise, the user experience in clients may be severely degraded.

However, some server implementations, including
[Synapse](https://github.com/element-hq/synapse/blob/5c84f258095535aaa2a4a04c850f439fd00735cc/synapse/handlers/pagination.py#L575-L584),
[Conduit](https://gitlab.com/famedly/conduit/-/blob/a7e6f60b41122761422df2b7bcc0c192416f9a28/src/api/client_server/message.rs#L201)
and
[Conduwuit](https://github.com/girlbossceo/conduwuit/blob/0f81c1e1ccdcb0c5c6d5a27e82f16eb37b1e61c8/src/api/client/message.rs#L94-L101),
may generate, under some implementation-specific conditions, federation requests to
[backfill](https://spec.matrix.org/v1.14/server-server-api/#backfilling-and-retrieving-missing-events)
the room timeline and fetch more events from other servers. This slows down reception of the
response in the client, since it is now blocked on the server waiting for the federation responses
to arrive. Moreover, the time spent retrieving those responses is theoretically unbounded, so the
homeserver and the clients may have to wait forever for such requests to complete.

We need a more responsive way to fetch older events from the server, without having to wait for
federation responses to come back. This is the *raison d'être* of this MSC.

## Proposal

It is proposed that the `/rooms/{room_id}/messages` endpoint be modified to allow clients to
specify a new boolean query parameter, `interactive`, which indicates that the client is interested
in getting the response *quickly*.

If the parameter is missing, it defaults to `false`. Thus, this is not a breaking change to the
endpoint's semantics: server behavior remains the same when the query parameter is not set.

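For illustration, a back-paginating request with the hint set might look as follows (the room id,
`from` token and `limit` value here are hypothetical, since pagination tokens are opaque):

```
GET /_matrix/client/v3/rooms/!room:example.org/messages?dir=b&from=t42-1234&limit=20&interactive=true
Authorization: Bearer <access_token>
```
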
When the query parameter is set to `true`, the server is expected to make a best-effort attempt to
provide a response *in a reasonably short time*. Implementations may use one of the following
strategies to achieve this:

- avoid blocking on a backfill request to other homeservers, either by not starting such requests
  at all, or by starting them in the background in a non-blocking way.
- start the backfill request, and race its completion against a short timeout. This can be a good
  tradeoff when backfill requests tend to resolve quickly.
- not do anything differently. This doesn't solve the problem, but the query parameter really is a
  hint that the response is expected to come quickly, not a strong requirement.
- do something else entirely, not mentioned in this MSC, that achieves the same goal.

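The second strategy above could be sketched as follows. This is a minimal, hypothetical
illustration, not taken from any real homeserver: `fetch_backfill`, `get_messages` and the timeout
value are invented names standing in for a server's internal backfill and pagination machinery.

```python
import asyncio

# How long an interactive request is willing to wait for backfill (illustrative).
BACKFILL_TIMEOUT = 0.1  # seconds

async def fetch_backfill(room_id: str) -> list:
    """Simulated federation backfill; in reality its latency is unbounded."""
    await asyncio.sleep(0.5)  # pretend the remote homeserver is slow
    return ["$older_event:example.org"]

async def get_messages(room_id: str, local_events: list, interactive: bool) -> list:
    backfill = asyncio.ensure_future(fetch_backfill(room_id))
    timeout = BACKFILL_TIMEOUT if interactive else None
    try:
        # `shield` keeps the backfill task running in the background if the
        # timeout fires, so its results can still benefit later requests.
        remote_events = await asyncio.wait_for(asyncio.shield(backfill), timeout)
        return local_events + remote_events
    except asyncio.TimeoutError:
        # Timed out: answer with what we already have locally.
        return local_events

local = ["$recent_event:example.org"]
result = asyncio.run(get_messages("!room:example.org", local, interactive=True))
print(result)  # prints only the local event, since the simulated backfill is too slow
```

With `interactive=False`, the same call would block until the backfill completes and return both
events, matching the current behavior of implementations that backfill synchronously.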
## Potential issues

Previously, it was already possible for clients to miss events in a room: they back-paginated
through it using `/messages`, and the server later received, after a netsplit, new events at a
position the client had already paginated through. This would result in the client either not
receiving those events, or receiving them through sync but in a non-topological ordering (i.e. an
ordering different from the one they would have observed by paginating with `/messages`).

This MSC doesn't resolve this problem; on the contrary, it may make it more apparent, if *all*
`/messages` requests end up *not* causing any federation backfill. The most likely consequence is
that events might be more frequently misordered across clients.

## Alternatives

Instead of an additional query parameter, this MSC could mandate that this become the expected
behavior of all implementations. This would be an implicit breaking change, and it may inhibit use
cases where clients might prefer a perfectly backfilled room over a quick response time.

Since this problem is more frequent with simplified sliding sync, one could imagine a client
adopting a simplified-sliding-sync-specific solution. For instance, it could increase the
`timeline_limit` window to get more and more events from the end of the room, up to the latest
event it previously knew about, and thus *not* cause backfill requests. This workaround would
work, but would not be optimal in terms of bandwidth and server CPU usage, as it would mean
including lots of events the client has already seen (namely, the ever-growing tail of the room's
timeline).

We could also add a new, separate paginated endpoint to retrieve previous events in the *sync*
ordering, thus not causing any backfill requests. That would be strictly more work to implement,
and it is unclear whether it would achieve more than the current proposal.