Rust database access via Python database connection pool v2 by MadLittleMods · Pull Request #19878 · element-hq/synapse

MadLittleMods · 2026-06-24T23:54:55Z

Based on the explorations done in #19824 and #19846,

Rust database access via Python database connection pool

This is a stepping stone before we can go full Rust everywhere. We're providing a generic interface as we want database access to work in Synapse and synapse-rust-apps. In synapse-rust-apps, we will use a tokio-postgres based database connection pool so it's full Rust.

We want to avoid the situation where we have two database connection pools (one for Python, one for Rust) as we've run into connection exhaustion problems on Matrix.org before.

Why `runInteraction(...)`?

Using the same runInteraction pattern that we already have in Synapse means we can port over existing Synapse code/endpoints without much thought. But this pattern also makes sense because we want¹ transactions to have repeatable-read isolation (easy to think about, less foot-guns). Having everything thappen in a function callback means we can do retries for serialization/deadlock errors.

When an application receives this error message, it should abort the current transaction and retry the whole transaction from the beginning. The second time through, the transaction will see the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update.

Note that only updating transactions might need to be retried; read-only transactions will never have serialization conflicts.

-- https://www.postgresql.org/docs/current/transaction-iso.html#XACT-REPEATABLE-READ

As a note, this strategy is less of an impedance mismatch (aligns more closely) with Synapse so the glue code for the python_db_pool should also be simpler.

How does this interact with logcontext (`LoggingContext`)?

See docs on log contexts for more background.

We already support normal logging from Rust -> Python with pyo3-log and log but as soon as we pass a thread boundary, everything is logged against the sentinel log context. Normally, we want logs and CPU/DB usage correlated with the request that spawned the work.

You can see how I took a stab at fixing this in #19846 by capturing the logcontext in a Tokio task local and re-activating as necessary. For example, in that PR, I reactivated the logcontext in run_python_awaitable(...) which we use to call runInteraction(...) from the Rust side which means all of the database usage is correlated with the request as expected. It also means any log:info!(...) done in run_interaction(...) is correlated correctly. But there needs to be a better story for when you want to log everywhere else.

I haven't explored tracking CPU usage on the Rust side.

I've left all of this out of this PR as I think it will be better to tackle this as a dedicated follow-up. For example, I'm thinking about instead creating a new LoggingContext with the parent_context set to the calling context and try to avoid needing to call set_current_context(...) on the Python side where possible (like tracking CPU).

Testing strategy

Added some tests that exercise some async Rust handlers for the /versions endpoint:

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.rest.client.test_versions.VersionsTestCase

Real-world:

poetry run synapse_homeserver --config-path homeserver.yaml
GET http://localhost:8008/_matrix/client/versions

Pull Request Checklist

Pull request is based on the develop branch
Pull request includes a changelog file. The entry should:
- Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
- Use markdown where necessary, mostly for code blocks.
- End with either a period (.) or an exclamation mark (!).
- Start with a capital letter.
- Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
Code style is correct (run the linters)

To note: Ideally, we'd want the least isolation possible but the problem is that there is no tooling to yell at you when your queries/logic is wrong so repeatable-read isolation is a great balance. ↩

`/versions` is a good candidate as its is the simplest endpoint that uses the database. We use `tests/rest/client/test_versions.py` as a sanity check that all of this new Rust database access works.

MadLittleMods · 2026-06-25T00:09:19Z

@@ -0,0 +1 @@
+Allow Rust code to have database access via Python database connection pool.


I suggest reviewing this commit by commit.

It first introduces the database interface structure, then Store usage, and then finally refactoring the /versions endpoint to be handled in Rust.

The /versions endpoint is the simplest endpoint I could find that still had some database access. Hopefully the refactor on /versions isn't that controversial as it's not really the point of this PR. We can always remove it from this PR but it's just here as a sanity check that all of this works.

I've looked over and refined all of this but feel free to call out whatever as I might be cargo-culting too much from the LLM splat that this is based on, #19846 (that PR is a combination of LLM splats and manual refinement)

MadLittleMods · 2026-06-25T00:12:00Z

+        # XXX: We must create the Rust HTTP client before we call `reactor.run()` below.
+        # Twisted's `MemoryReactor` doesn't invoke `callWhenRunning` callbacks if it's
+        # already running and we rely on that to start the Tokio thread pool in Rust. In
+        # the future, this may not matter, see https://github.com/twisted/twisted/pull/12514


As an update, twisted/twisted#12514 is finally part of a Twisted release 26.4.0 (2026-05-11). If we updated our Twisted version, we could probably get rid of all of this ugliness.

But that may be a few years away given our deprecation policy considers a "no-brainer" upgrade once it's available in both the latest Debian Stable (currently Twisted 24.11.0) and Ubuntu LTS repositories (currently Twisted 25.5.0)

MadLittleMods · 2026-06-25T00:12:33Z

+    // We can't check this here because of circular import issues
+    // logging_context_module(py)?;


Perhaps this will get fixed by #19876 (comment)

MadLittleMods · 2026-06-25T00:13:29Z

                RoomCreationPreset.TRUSTED_PRIVATE_CHAT,
                RoomCreationPreset.PUBLIC_CHAT,
-            ]
+            }


Made these a set to better represent what it is and more efficient lookups. Touched this because I had to model it in SynapseHomeServerConfig on the Rust side

MadLittleMods · 2026-06-25T00:15:22Z

+
+/// Experimental features the server supports
+#[derive(Serialize, Debug, Clone)]
+pub struct UnstableFeatureMap {


For reference, this is a manual translation from synapse/rest/client/versions.py#L105-L213 (not a prompt and pray it's right). I also already asked an LLM to review and spot any mistakes.

MadLittleMods · 2026-06-25T00:17:39Z

@@ -0,0 +1,343 @@
+/*


For reference, you can also see what a Rust tokio-postgres based DatabasePool looks like in #19846

It's completely unscrutinized but the point is just that it's possible and we can care about it once we start using it in synapse-rust-apps.

MadLittleMods · 2026-06-25T00:19:51Z

+                // We fire-and-forget using `run_in_background`. Re-using
+                // `run_in_background` also makes sure the awaitable gets run with the
+                // current logcontext while following the logcontext rules.
+                //
+                // FIXME: Currently runs in the sentinel logcontext because we don't manage it here


You can see how handling logcontext might look like in #19846 but per my explanation in the PR description here, it's being left out in favor of a follow-up PR.

MadLittleMods · 2026-06-25T00:23:32Z

+logger = logging.getLogger(__name__)
+
+
+class VersionsTestCase(unittest.HomeserverTestCase):


I'm relying on these tests to sanity check that the whole Rust database access thing works.

Is this good enough? Should I be writing tests on the Rust side? Seems hard/weird as our only DatabasePool implementation requires Synapse to be running. How do we make that happen?

I wish I could write some specific Rust db_pool.run_interaction(...)/txn.query(...) scenarios and test results better but generally I guess this is good enough. We could expose that function and run it from here 🤷

MadLittleMods · 2026-06-25T00:24:39Z

@@ -301,15 +302,59 @@ def transport(self) -> "FakeChannel":
    def await_result(self, timeout_ms: int = 1000) -> None:


These changes are based on the same decisions being made in #19871

This seems to cause our trial tests not to finish in CI 🤔

Even tried these changes by themselves in #19879 which also reproduces.

I'll have to figure it out tomorrow but I think this PR is still good to review otherwise.

Split off to #19879

MadLittleMods · 2026-06-25T00:38:07Z

+struct RustHandlers {
+    versions: Py<versions::VersionsHandler>,
+}


Opinions on self.rust_handlers.versions vs exposing handlers individually?

If this is the pattern we like, we should also move rust/src/rendezvous / rust/src/msc4388_rendezvous here as well (follow-up PR).

MadLittleMods · 2026-06-25T00:39:39Z

+
+#[pyclass]
+pub struct VersionsHandler {
+    pub global_unstable_feature_map: Arc<UnstableFeatureMap>,


We have the UnstableFeatureMap abstraction separate from SynapseHomeServerConfig as we potentially want to re-use these handlers as part of synapse-rust-apps, Rusty homeserver, etc. We don't want to bog it down with Synapse specific things.

MadLittleMods · 2026-06-25T00:42:06Z

+            "v1.10".to_string(),
+            "v1.11".to_string(),
+            "v1.12".to_string(),
+        ]),


To make this actually generic (usable outside of Synapse), we probably want to pass this in as well but I've left it for now ⏩

MadLittleMods · 2026-06-25T16:21:57Z

+    // ```
+    // db_pool.run_interaction("description", async move |txn| {
+    //     /* do stuff with txn */
+    // })
+    // ```


This kind of thing may be possible. It seems like sea-orm accomplished this somehow:

Feature Request: Support for async closures SeaQL/sea-orm#2749 (comment)

https://blog.weiznich.de/blog/diesel-async-0-9/

In any case, we can address this in a follow-up PR.

See #19879 See #19878 (comment)

…-db-pool2

erikjohnston

Architecturally this looks sensible. Other than deciding when it's appropriate to panic, the rest are either some simplifications, stylistic nits or comments.

erikjohnston · 2026-06-29T10:46:28Z

+            "v1.9".to_string(),
+            "v1.10".to_string(),
+            "v1.11".to_string(),
+            "v1.12".to_string(),


Would be easier if these were &'static str instead

erikjohnston · 2026-06-29T11:05:48Z

+        // The clone here isn't the best but better than manually composing things
+        ..global_unstable_feature_map.clone()


Fun little thing, the compiler can detect that all of the used values are Copy and so global_unstable_feature_map doesn't have anything moved out of it and thus doesn't get moved.

Suggested change

// The clone here isn't the best but better than manually composing things

..global_unstable_feature_map.clone()

..*global_unstable_feature_map

AIUI this is because .. is basically syntactic sugar for:

let unstable_feature_map = UnstableFeatureMap { msc3575: msc3575_enabled, msc3881: msc3881_enabled, msc2326: global_unstable_feature_map.msc2326, // etc. };

and so at no point is global_unstable_feature_map consumed.

c.f. https://doc.rust-lang.org/book/ch05-01-defining-structs.html#creating-instances-with-struct-update-syntax

erikjohnston · 2026-06-29T12:44:40Z

+}
+
+impl DatabaseEngine {
+    //[inline]


Suggested change

//[inline]

erikjohnston · 2026-06-29T12:54:09Z

+                        // Alternatively, we could just use `futures::executor::block_on`
+                        // which is probably cleaner but a single-shot poll is more
+                        // enforcing of the concept we want to represent.
+                        match poll_once(func(&mut txn)) {


I think we can use FutureExt::now_or_never for this? Rather than writing our own?

erikjohnston · 2026-06-29T12:55:03Z

+                        // enforcing of the concept we want to represent.
+                        match poll_once(func(&mut txn)) {
+                            Poll::Ready(Ok(value)) => {
+                                *callback_slot.lock().unwrap() = Some(Ok(value));


We should replace .unwrap() with something.

erikjohnston · 2026-06-29T13:24:37Z

+        .expect("`database_engine` type must have a name")
+        .to_str()
+        .expect("`database_engine` type name must be valid UTF-8")
+        .to_owned();


I'm not convinced we should be using panics outside of very specific cases, and can end up with much worse UX depending on the consumer if they're hit. E.g. in synapse-rust-apps this will cause us to abort the HTTP connection rather than return a 500-error I believe?

Plus in this instance we're making things much more verbose, compared with: let name = txn_py.getattr("database_engine")?.get_type().name()?;

I don't really see any upside in using relying on panic, given we're returning a PyErr throughout the call chain anyway.

If we wanted to better catch e.g. people renaming database_engine, then we should do this lookup during startup so that we catch it early (and can abort startup).

erikjohnston · 2026-06-29T13:27:32Z

+                        // Otherwise, we should only find a single row for this (user, feature)
+                        [row] => Some(row.try_get(0)?),
+                        _ => {
+                            panic!("Programming error (SQL query probably doesn't match our expectations)")


Again, not sure panic is the right thing here, especially as this is reliant on external components to be correct, and thus this isn't some local invariant being broken.

erikjohnston · 2026-06-29T13:28:58Z

+            None,
+            move |args, _kwargs| -> PyResult<Py<PyAny>> {
+                let value = args.get_item(0)?.unbind();
+                if let Some(tx) = success_sender.lock().unwrap().take() {


We should do something with this unwrap, either error or make an .except if we know it can't panic.

erikjohnston · 2026-06-29T13:37:21Z

+        channel = self.make_request(
+            "GET",
+            "/_matrix/client/versions",
+            content={},


Suggested change

content={},

GET requests don't have request bodies.

erikjohnston · 2026-06-29T13:40:39Z

+}
+
+#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
+pub enum RoomCreationPreset {


The rest of the file could all use some docstrings

erikjohnston · 2026-06-29T14:55:20Z

+struct VersionsResponse {
+    versions: Vec<String>,
+    /// as per MSC1497
+    unstable_features: std::collections::BTreeMap<String, bool>,


Suggested change

unstable_features: std::collections::BTreeMap<String, bool>,

unstable_features: UnstableFeatureMap,

should work fine, no?

erikjohnston · 2026-06-29T14:56:38Z

+    MSC3575,
+    #[serde(rename = "msc4222")]
+    MSC4222,
+}


Rather than using serde for this, feels cleaner to do:

impl PerUserExperimentalFeature { fn as_str(&self) -> &'static str { match self { Self::MSC3881 => "msc3881", Self::MSC3575 => "msc3575", Self::MSC4222 => "msc4222", } } }

especially since we don't Deserialize

MadLittleMods added 5 commits June 24, 2026 18:29

Add Rust database interface

9714a94

Add run_python_awaitable(...)

6da263b

Add Python implementation of the DatabasePool

cee06f9

Add Store with example usage of run_interaction(...)

9225c8e

Update /versions endpoint to use Rust

d978a66

`/versions` is a good candidate as its is the simplest endpoint that uses the database. We use `tests/rest/client/test_versions.py` as a sanity check that all of this new Rust database access works.

MadLittleMods added rust Z-Rust labels Jun 24, 2026

Add changelog

35163d8