Rust database access via Python database connection pool v2#19878
Rust database access via Python database connection pool v2#19878MadLittleMods wants to merge 10 commits into
Conversation
`/versions` is a good candidate as its is the simplest endpoint that uses the database. We use `tests/rest/client/test_versions.py` as a sanity check that all of this new Rust database access works.
| @@ -0,0 +1 @@ | |||
| Allow Rust code to have database access via Python database connection pool. | |||
There was a problem hiding this comment.
I suggest reviewing this commit by commit.
It first introduces the database interface structure, then Store usage, and then finally refactoring the /versions endpoint to be handled in Rust.
The /versions endpoint is the simplest endpoint I could find that still had some database access. Hopefully the refactor on /versions isn't that controversial as it's not really the point of this PR. We can always remove it from this PR but it's just here as a sanity check that all of this works.
I've looked over and refined all of this but feel free to call out whatever as I might be cargo-culting too much from the LLM splat that this is based on, #19846 (that PR is a combination of LLM splats and manual refinement)
| # XXX: We must create the Rust HTTP client before we call `reactor.run()` below. | ||
| # Twisted's `MemoryReactor` doesn't invoke `callWhenRunning` callbacks if it's | ||
| # already running and we rely on that to start the Tokio thread pool in Rust. In | ||
| # the future, this may not matter, see https://github.com/twisted/twisted/pull/12514 |
There was a problem hiding this comment.
As an update, twisted/twisted#12514 is finally part of a Twisted release 26.4.0 (2026-05-11). If we updated our Twisted version, we could probably get rid of all of this ugliness.
But that may be a few years away given our deprecation policy considers a "no-brainer" upgrade once it's available in both the latest Debian Stable (currently Twisted 24.11.0) and Ubuntu LTS repositories (currently Twisted 25.5.0)
| // We can't check this here because of circular import issues | ||
| // logging_context_module(py)?; |
There was a problem hiding this comment.
Perhaps this will get fixed by #19876 (comment)
| RoomCreationPreset.TRUSTED_PRIVATE_CHAT, | ||
| RoomCreationPreset.PUBLIC_CHAT, | ||
| ] | ||
| } |
There was a problem hiding this comment.
Made these a set to better represent what it is and more efficient lookups. Touched this because I had to model it in SynapseHomeServerConfig on the Rust side
|
|
||
| /// Experimental features the server supports | ||
| #[derive(Serialize, Debug, Clone)] | ||
| pub struct UnstableFeatureMap { |
There was a problem hiding this comment.
For reference, this is a manual translation from synapse/rest/client/versions.py#L105-L213 (not a prompt and pray it's right). I also already asked an LLM to review and spot any mistakes.
| @@ -0,0 +1,343 @@ | |||
| /* | |||
There was a problem hiding this comment.
For reference, you can also see what a Rust tokio-postgres based DatabasePool looks like in #19846
It's completely unscrutinized but the point is just that it's possible and we can care about it once we start using it in synapse-rust-apps.
| // We fire-and-forget using `run_in_background`. Re-using | ||
| // `run_in_background` also makes sure the awaitable gets run with the | ||
| // current logcontext while following the logcontext rules. | ||
| // | ||
| // FIXME: Currently runs in the sentinel logcontext because we don't manage it here |
There was a problem hiding this comment.
You can see how handling logcontext might look like in #19846 but per my explanation in the PR description here, it's being left out in favor of a follow-up PR.
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| class VersionsTestCase(unittest.HomeserverTestCase): |
There was a problem hiding this comment.
I'm relying on these tests to sanity check that the whole Rust database access thing works.
Is this good enough? Should I be writing tests on the Rust side? Seems hard/weird as our only DatabasePool implementation requires Synapse to be running. How do we make that happen?
I wish I could write some specific Rust db_pool.run_interaction(...)/txn.query(...) scenarios and test results better but generally I guess this is good enough. We could expose that function and run it from here 🤷
| @@ -301,15 +302,59 @@ def transport(self) -> "FakeChannel": | |||
| def await_result(self, timeout_ms: int = 1000) -> None: | |||
There was a problem hiding this comment.
These changes are based on the same decisions being made in #19871
There was a problem hiding this comment.
This seems to cause our trial tests not to finish in CI 🤔
Even tried these changes by themselves in #19879 which also reproduces.
I'll have to figure it out tomorrow but I think this PR is still good to review otherwise.
| struct RustHandlers { | ||
| versions: Py<versions::VersionsHandler>, | ||
| } |
There was a problem hiding this comment.
Opinions on self.rust_handlers.versions vs exposing handlers individually?
If this is the pattern we like, we should also move rust/src/rendezvous / rust/src/msc4388_rendezvous here as well (follow-up PR).
|
|
||
| #[pyclass] | ||
| pub struct VersionsHandler { | ||
| pub global_unstable_feature_map: Arc<UnstableFeatureMap>, |
There was a problem hiding this comment.
We have the UnstableFeatureMap abstraction separate from SynapseHomeServerConfig as we potentially want to re-use these handlers as part of synapse-rust-apps, Rusty homeserver, etc. We don't want to bog it down with Synapse specific things.
| "v1.10".to_string(), | ||
| "v1.11".to_string(), | ||
| "v1.12".to_string(), | ||
| ]), |
There was a problem hiding this comment.
To make this actually generic (usable outside of Synapse), we probably want to pass this in as well but I've left it for now ⏩
| // ``` | ||
| // db_pool.run_interaction("description", async move |txn| { | ||
| // /* do stuff with txn */ | ||
| // }) | ||
| // ``` |
There was a problem hiding this comment.
This kind of thing may be possible. It seems like sea-orm accomplished this somehow:
- Feature Request: Support for async closures SeaQL/sea-orm#2749 (comment)
- https://blog.weiznich.de/blog/diesel-async-0-9/
In any case, we can address this in a follow-up PR.
| "v1.9".to_string(), | ||
| "v1.10".to_string(), | ||
| "v1.11".to_string(), | ||
| "v1.12".to_string(), |
There was a problem hiding this comment.
Would be easier if these were &'static str instead
| // The clone here isn't the best but better than manually composing things | ||
| ..global_unstable_feature_map.clone() |
There was a problem hiding this comment.
Fun little thing, the compiler can detect that all of the used values are Copy and so global_unstable_feature_map doesn't have anything moved out of it and thus doesn't get moved.
| // The clone here isn't the best but better than manually composing things | |
| ..global_unstable_feature_map.clone() | |
| ..*global_unstable_feature_map |
AIUI this is because .. is basically syntactic sugar for:
let unstable_feature_map = UnstableFeatureMap {
msc3575: msc3575_enabled,
msc3881: msc3881_enabled,
msc2326: global_unstable_feature_map.msc2326,
// etc.
};and so at no point is global_unstable_feature_map consumed.
| } | ||
|
|
||
| impl DatabaseEngine { | ||
| //[inline] |
There was a problem hiding this comment.
| //[inline] |
| // Alternatively, we could just use `futures::executor::block_on` | ||
| // which is probably cleaner but a single-shot poll is more | ||
| // enforcing of the concept we want to represent. | ||
| match poll_once(func(&mut txn)) { |
There was a problem hiding this comment.
I think we can use FutureExt::now_or_never for this? Rather than writing our own?
| // enforcing of the concept we want to represent. | ||
| match poll_once(func(&mut txn)) { | ||
| Poll::Ready(Ok(value)) => { | ||
| *callback_slot.lock().unwrap() = Some(Ok(value)); |
There was a problem hiding this comment.
We should replace .unwrap() with something.
| .expect("`database_engine` type must have a name") | ||
| .to_str() | ||
| .expect("`database_engine` type name must be valid UTF-8") | ||
| .to_owned(); |
There was a problem hiding this comment.
I'm not convinced we should be using panics outside of very specific cases, and can end up with much worse UX depending on the consumer if they're hit. E.g. in synapse-rust-apps this will cause us to abort the HTTP connection rather than return a 500-error I believe?
Plus in this instance we're making things much more verbose, compared with: let name = txn_py.getattr("database_engine")?.get_type().name()?;
I don't really see any upside in using relying on panic, given we're returning a PyErr throughout the call chain anyway.
If we wanted to better catch e.g. people renaming database_engine, then we should do this lookup during startup so that we catch it early (and can abort startup).
| // Otherwise, we should only find a single row for this (user, feature) | ||
| [row] => Some(row.try_get(0)?), | ||
| _ => { | ||
| panic!("Programming error (SQL query probably doesn't match our expectations)") |
There was a problem hiding this comment.
Again, not sure panic is the right thing here, especially as this is reliant on external components to be correct, and thus this isn't some local invariant being broken.
| None, | ||
| move |args, _kwargs| -> PyResult<Py<PyAny>> { | ||
| let value = args.get_item(0)?.unbind(); | ||
| if let Some(tx) = success_sender.lock().unwrap().take() { |
There was a problem hiding this comment.
We should do something with this unwrap, either error or make an .except if we know it can't panic.
| channel = self.make_request( | ||
| "GET", | ||
| "/_matrix/client/versions", | ||
| content={}, |
There was a problem hiding this comment.
| content={}, |
GET requests don't have request bodies.
| } | ||
|
|
||
| #[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)] | ||
| pub enum RoomCreationPreset { |
There was a problem hiding this comment.
The rest of the file could all use some docstrings
| struct VersionsResponse { | ||
| versions: Vec<String>, | ||
| /// as per MSC1497 | ||
| unstable_features: std::collections::BTreeMap<String, bool>, |
There was a problem hiding this comment.
| unstable_features: std::collections::BTreeMap<String, bool>, | |
| unstable_features: UnstableFeatureMap, |
should work fine, no?
| MSC3575, | ||
| #[serde(rename = "msc4222")] | ||
| MSC4222, | ||
| } |
There was a problem hiding this comment.
Rather than using serde for this, feels cleaner to do:
impl PerUserExperimentalFeature {
fn as_str(&self) -> &'static str {
match self {
Self::MSC3881 => "msc3881",
Self::MSC3575 => "msc3575",
Self::MSC4222 => "msc4222",
}
}
}especially since we don't Deserialize
Based on the explorations done in #19824 and #19846,
Rust database access via Python database connection pool
This is a stepping stone before we can go full Rust everywhere. We're providing a generic interface as we want database access to work in Synapse and
synapse-rust-apps. Insynapse-rust-apps, we will use atokio-postgresbased database connection pool so it's full Rust.We want to avoid the situation where we have two database connection pools (one for Python, one for Rust) as we've run into connection exhaustion problems on Matrix.org before.
Why
runInteraction(...)?Using the same
runInteractionpattern that we already have in Synapse means we can port over existing Synapse code/endpoints without much thought. But this pattern also makes sense because we want1 transactions to have repeatable-read isolation (easy to think about, less foot-guns). Having everything thappen in a function callback means we can do retries for serialization/deadlock errors.As a note, this strategy is less of an impedance mismatch (aligns more closely) with Synapse so the glue code for the
python_db_poolshould also be simpler.How does this interact with logcontext (
LoggingContext)?See docs on log contexts for more background.
We already support normal logging from Rust -> Python with
pyo3-logandlogbut as soon as we pass a thread boundary, everything is logged against thesentinellog context. Normally, we want logs and CPU/DB usage correlated with the request that spawned the work.You can see how I took a stab at fixing this in #19846 by capturing the logcontext in a Tokio task local and re-activating as necessary. For example, in that PR, I reactivated the logcontext in
run_python_awaitable(...)which we use to callrunInteraction(...)from the Rust side which means all of the database usage is correlated with the request as expected. It also means anylog:info!(...)done inrun_interaction(...)is correlated correctly. But there needs to be a better story for when you want to log everywhere else.I haven't explored tracking CPU usage on the Rust side.
I've left all of this out of this PR as I think it will be better to tackle this as a dedicated follow-up. For example, I'm thinking about instead creating a new
LoggingContextwith theparent_contextset to the calling context and try to avoid needing to callset_current_context(...)on the Python side where possible (like tracking CPU).Testing strategy
Added some tests that exercise some
asyncRust handlers for the/versionsendpoint:Real-world:
poetry run synapse_homeserver --config-path homeserver.yamlGET http://localhost:8008/_matrix/client/versionsPull Request Checklist
EventStoretoEventWorkerStore.".code blocks.Footnotes
To note: Ideally, we'd want the least isolation possible but the problem is that there is no tooling to yell at you when your queries/logic is wrong so repeatable-read isolation is a great balance. ↩