Skip to content

Rust database access via Python database connection pool v2#19878

Open
MadLittleMods wants to merge 10 commits into
developfrom
madlittlemods/rust-db-access-using-python-db-pool2
Open

Rust database access via Python database connection pool v2#19878
MadLittleMods wants to merge 10 commits into
developfrom
madlittlemods/rust-db-access-using-python-db-pool2

Conversation

@MadLittleMods

@MadLittleMods MadLittleMods commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Based on the explorations done in #19824 and #19846,


Rust database access via Python database connection pool

This is a stepping stone before we can go full Rust everywhere. We're providing a generic interface as we want database access to work in Synapse and synapse-rust-apps. In synapse-rust-apps, we will use a tokio-postgres based database connection pool so it's full Rust.

We want to avoid the situation where we have two database connection pools (one for Python, one for Rust) as we've run into connection exhaustion problems on Matrix.org before.

Why runInteraction(...)?

Using the same runInteraction pattern that we already have in Synapse means we can port over existing Synapse code/endpoints without much thought. But this pattern also makes sense because we want1 transactions to have repeatable-read isolation (easy to think about, less foot-guns). Having everything thappen in a function callback means we can do retries for serialization/deadlock errors.

When an application receives this error message, it should abort the current transaction and retry the whole transaction from the beginning. The second time through, the transaction will see the previously-committed change as part of its initial view of the database, so there is no logical conflict in using the new version of the row as the starting point for the new transaction's update.

Note that only updating transactions might need to be retried; read-only transactions will never have serialization conflicts.

-- https://www.postgresql.org/docs/current/transaction-iso.html#XACT-REPEATABLE-READ

As a note, this strategy is less of an impedance mismatch (aligns more closely) with Synapse so the glue code for the python_db_pool should also be simpler.

How does this interact with logcontext (LoggingContext)?

See docs on log contexts for more background.

We already support normal logging from Rust -> Python with pyo3-log and log but as soon as we pass a thread boundary, everything is logged against the sentinel log context. Normally, we want logs and CPU/DB usage correlated with the request that spawned the work.

You can see how I took a stab at fixing this in #19846 by capturing the logcontext in a Tokio task local and re-activating as necessary. For example, in that PR, I reactivated the logcontext in run_python_awaitable(...) which we use to call runInteraction(...) from the Rust side which means all of the database usage is correlated with the request as expected. It also means any log:info!(...) done in run_interaction(...) is correlated correctly. But there needs to be a better story for when you want to log everywhere else.

I haven't explored tracking CPU usage on the Rust side.

I've left all of this out of this PR as I think it will be better to tackle this as a dedicated follow-up. For example, I'm thinking about instead creating a new LoggingContext with the parent_context set to the calling context and try to avoid needing to call set_current_context(...) on the Python side where possible (like tracking CPU).

Testing strategy

Added some tests that exercise some async Rust handlers for the /versions endpoint:

SYNAPSE_TEST_LOG_LEVEL=INFO poetry run trial tests.rest.client.test_versions.VersionsTestCase

Real-world:

  1. poetry run synapse_homeserver --config-path homeserver.yaml
  2. GET http://localhost:8008/_matrix/client/versions

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Footnotes

  1. To note: Ideally, we'd want the least isolation possible but the problem is that there is no tooling to yell at you when your queries/logic is wrong so repeatable-read isolation is a great balance.

`/versions` is a good candidate as its is the simplest endpoint that uses the database.

We use `tests/rest/client/test_versions.py` as a sanity check that all of this
new Rust database access works.
Comment thread changelog.d/19878.misc
@@ -0,0 +1 @@
Allow Rust code to have database access via Python database connection pool.

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest reviewing this commit by commit.

It first introduces the database interface structure, then Store usage, and then finally refactoring the /versions endpoint to be handled in Rust.

The /versions endpoint is the simplest endpoint I could find that still had some database access. Hopefully the refactor on /versions isn't that controversial as it's not really the point of this PR. We can always remove it from this PR but it's just here as a sanity check that all of this works.

I've looked over and refined all of this but feel free to call out whatever as I might be cargo-culting too much from the LLM splat that this is based on, #19846 (that PR is a combination of LLM splats and manual refinement)

Comment on lines +43 to +46
# XXX: We must create the Rust HTTP client before we call `reactor.run()` below.
# Twisted's `MemoryReactor` doesn't invoke `callWhenRunning` callbacks if it's
# already running and we rely on that to start the Tokio thread pool in Rust. In
# the future, this may not matter, see https://github.com/twisted/twisted/pull/12514

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an update, twisted/twisted#12514 is finally part of a Twisted release 26.4.0 (2026-05-11). If we updated our Twisted version, we could probably get rid of all of this ugliness.

But that may be a few years away given our deprecation policy considers a "no-brainer" upgrade once it's available in both the latest Debian Stable (currently Twisted 24.11.0) and Ubuntu LTS repositories (currently Twisted 25.5.0)

Comment thread rust/src/deferred.rs
Comment on lines +282 to +283
// We can't check this here because of circular import issues
// logging_context_module(py)?;

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this will get fixed by #19876 (comment)

Comment thread synapse/config/room.py
RoomCreationPreset.TRUSTED_PRIVATE_CHAT,
RoomCreationPreset.PUBLIC_CHAT,
]
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made these a set to better represent what it is and more efficient lookups. Touched this because I had to model it in SynapseHomeServerConfig on the Rust side


/// Experimental features the server supports
#[derive(Serialize, Debug, Clone)]
pub struct UnstableFeatureMap {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, this is a manual translation from synapse/rest/client/versions.py#L105-L213 (not a prompt and pray it's right). I also already asked an LLM to review and spot any mistakes.

@@ -0,0 +1,343 @@
/*

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, you can also see what a Rust tokio-postgres based DatabasePool looks like in #19846

It's completely unscrutinized but the point is just that it's possible and we can care about it once we start using it in synapse-rust-apps.

Comment thread rust/src/deferred.rs
Comment on lines +205 to +209
// We fire-and-forget using `run_in_background`. Re-using
// `run_in_background` also makes sure the awaitable gets run with the
// current logcontext while following the logcontext rules.
//
// FIXME: Currently runs in the sentinel logcontext because we don't manage it here

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see how handling logcontext might look like in #19846 but per my explanation in the PR description here, it's being left out in favor of a follow-up PR.

logger = logging.getLogger(__name__)


class VersionsTestCase(unittest.HomeserverTestCase):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm relying on these tests to sanity check that the whole Rust database access thing works.

Is this good enough? Should I be writing tests on the Rust side? Seems hard/weird as our only DatabasePool implementation requires Synapse to be running. How do we make that happen?

I wish I could write some specific Rust db_pool.run_interaction(...)/txn.query(...) scenarios and test results better but generally I guess this is good enough. We could expose that function and run it from here 🤷

Comment thread tests/server.py
Comment thread tests/server.py
@@ -301,15 +302,59 @@ def transport(self) -> "FakeChannel":
def await_result(self, timeout_ms: int = 1000) -> None:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are based on the same decisions being made in #19871

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to cause our trial tests not to finish in CI 🤔

Even tried these changes by themselves in #19879 which also reproduces.

I'll have to figure it out tomorrow but I think this PR is still good to review otherwise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split off to #19879

Comment thread rust/src/handlers/mod.rs
Comment on lines +31 to +33
struct RustHandlers {
versions: Py<versions::VersionsHandler>,
}

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opinions on self.rust_handlers.versions vs exposing handlers individually?

If this is the pattern we like, we should also move rust/src/rendezvous / rust/src/msc4388_rendezvous here as well (follow-up PR).


#[pyclass]
pub struct VersionsHandler {
pub global_unstable_feature_map: Arc<UnstableFeatureMap>,

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have the UnstableFeatureMap abstraction separate from SynapseHomeServerConfig as we potentially want to re-use these handlers as part of synapse-rust-apps, Rusty homeserver, etc. We don't want to bog it down with Synapse specific things.

"v1.10".to_string(),
"v1.11".to_string(),
"v1.12".to_string(),
]),

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this actually generic (usable outside of Synapse), we probably want to pass this in as well but I've left it for now ⏩

@MadLittleMods MadLittleMods marked this pull request as ready for review June 25, 2026 01:59
@MadLittleMods MadLittleMods requested a review from a team as a code owner June 25, 2026 01:59
@erikjohnston erikjohnston self-assigned this Jun 25, 2026
Comment on lines +98 to +102
// ```
// db_pool.run_interaction("description", async move |txn| {
// /* do stuff with txn */
// })
// ```

@MadLittleMods MadLittleMods Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of thing may be possible. It seems like sea-orm accomplished this somehow:

In any case, we can address this in a follow-up PR.

@erikjohnston erikjohnston left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Architecturally this looks sensible. Other than deciding when it's appropriate to panic, the rest are either some simplifications, stylistic nits or comments.

"v1.9".to_string(),
"v1.10".to_string(),
"v1.11".to_string(),
"v1.12".to_string(),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be easier if these were &'static str instead

Comment on lines +126 to +127
// The clone here isn't the best but better than manually composing things
..global_unstable_feature_map.clone()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun little thing, the compiler can detect that all of the used values are Copy and so global_unstable_feature_map doesn't have anything moved out of it and thus doesn't get moved.

Suggested change
// The clone here isn't the best but better than manually composing things
..global_unstable_feature_map.clone()
..*global_unstable_feature_map

AIUI this is because .. is basically syntactic sugar for:

let unstable_feature_map = UnstableFeatureMap {
      msc3575: msc3575_enabled,
      msc3881: msc3881_enabled,
     
      msc2326: global_unstable_feature_map.msc2326,
      // etc.
};

and so at no point is global_unstable_feature_map consumed.

c.f. https://doc.rust-lang.org/book/ch05-01-defining-structs.html#creating-instances-with-struct-update-syntax

}

impl DatabaseEngine {
//[inline]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//[inline]

// Alternatively, we could just use `futures::executor::block_on`
// which is probably cleaner but a single-shot poll is more
// enforcing of the concept we want to represent.
match poll_once(func(&mut txn)) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use FutureExt::now_or_never for this? Rather than writing our own?

// enforcing of the concept we want to represent.
match poll_once(func(&mut txn)) {
Poll::Ready(Ok(value)) => {
*callback_slot.lock().unwrap() = Some(Ok(value));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should replace .unwrap() with something.

.expect("`database_engine` type must have a name")
.to_str()
.expect("`database_engine` type name must be valid UTF-8")
.to_owned();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced we should be using panics outside of very specific cases, and can end up with much worse UX depending on the consumer if they're hit. E.g. in synapse-rust-apps this will cause us to abort the HTTP connection rather than return a 500-error I believe?

Plus in this instance we're making things much more verbose, compared with: let name = txn_py.getattr("database_engine")?.get_type().name()?;

I don't really see any upside in using relying on panic, given we're returning a PyErr throughout the call chain anyway.

If we wanted to better catch e.g. people renaming database_engine, then we should do this lookup during startup so that we catch it early (and can abort startup).

Comment thread rust/src/storage/store.rs
// Otherwise, we should only find a single row for this (user, feature)
[row] => Some(row.try_get(0)?),
_ => {
panic!("Programming error (SQL query probably doesn't match our expectations)")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, not sure panic is the right thing here, especially as this is reliant on external components to be correct, and thus this isn't some local invariant being broken.

Comment thread rust/src/deferred.rs
None,
move |args, _kwargs| -> PyResult<Py<PyAny>> {
let value = args.get_item(0)?.unbind();
if let Some(tx) = success_sender.lock().unwrap().take() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do something with this unwrap, either error or make an .except if we know it can't panic.

channel = self.make_request(
"GET",
"/_matrix/client/versions",
content={},

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
content={},

GET requests don't have request bodies.

Comment thread rust/src/config/mod.rs
}

#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum RoomCreationPreset {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rest of the file could all use some docstrings

struct VersionsResponse {
versions: Vec<String>,
/// as per MSC1497
unstable_features: std::collections::BTreeMap<String, bool>,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
unstable_features: std::collections::BTreeMap<String, bool>,
unstable_features: UnstableFeatureMap,

should work fine, no?

Comment thread rust/src/storage/store.rs
MSC3575,
#[serde(rename = "msc4222")]
MSC4222,
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than using serde for this, feels cleaner to do:

impl PerUserExperimentalFeature {
    fn as_str(&self) -> &'static str {
        match self {
            Self::MSC3881 => "msc3881",
            Self::MSC3575 => "msc3575",
            Self::MSC4222 => "msc4222",
        }
    }
}

especially since we don't Deserialize

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants