- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.5k
fix(mysql): RFC Handle missing server-side prepared statements #3995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Some cloud offerings like AWS Aurora allow for "zero-downtime" restarts for patches, which preserves existing TCP connections but wipes out a lot of server state, including the statement cache. In that scenario, trying to execute a previously prepared statement causes an error response with ``` HY000 Unknown prepared statement handler (<id>) given to mysql_stmt_precheck ``` which appears to be the only way to detect this scenario. To avoid subsequent errors for the same connection, we can clear the statement cache on the client side, causing all queries to get re-prepared. This is basically what ActiveRecord does,[0] which we can confirm is not vulnerable to the same issue. An even better solution would be to re-prepare and -try the query right there, like ActiveRecord does. [0]: https://github.com/rails/rails/blob/main/activerecord/lib/active_record/connection_adapters/mysql2/database_statements.rb#L66-L78
I think it's best to test this issue through integration tests, but there seems to be no way to clear the server-side prepared statements without access to the raw packet stream, as they're not named and thus cannot be deleted via DEALLOCATE PREPARED. To make the test work, additional interfaces have to be made available, which should not be shipped as part of the library for regular use.
5f8f07a    to
    f1fd80a      
    Compare
  
    | I'm not hostile to this change, but I would consider the wiping of prepared statements to be a huge bug on Aurora's part. We've run into issues with "compatible" providers not handling prepared statements correctly before, e.g. PgBouncer, and in many cases they've actually agreed it was a bug/misfeature and fixed it. We can merge a fix for this, but I'd like to see at least a little pressure to get this fixed upstream. | 
| I agree, it's very surprising behaviour, and there seems to be no way of opting out whatsoever if you're using Aurora. I'm happy to open a support ticket with AWS on our organisation's behalf/complain to our account manager, but I'm afraid we're not significant enough to be in a position to actually exert any meaningful pressure on AWS. Fwiw, in the meantime we're using a cheap query in  | 
| 
 That's fine, it generally takes many people complaining over time. As long as they're made aware of it. It's just become my pet peeve that all these third-party implementations advertise themselves as "compatible" with a database's wire protocol but then break spectacularly when you try to use any feature that isn't part of the basic query flow. | 
| .intersects(Status::SERVER_STATUS_IN_TRANS) | ||
| } | ||
|  | ||
| pub async fn nuke_cached_statements(&mut self) -> Result<(), Error> { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course, this shouldn't just be a public method on the connection. Users might confuse it for the method that they're supposed to use to clear the prepared statement cache.
The easiest route would be to just mark this #[doc(hidden)].
| // query response is a meta-packet which may be one of: | ||
| // Ok, Err, ResultSet, or (unhandled) LocalInfileRequest | ||
| let mut packet = self.inner.stream.recv_packet().await?; | ||
| let mut packet = self.inner.stream.recv_packet().await.inspect_err(|_| { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if you could find the exact error code that's returned, that would probably be better. This is a huge pessimization.
Keep in mind we also get errors here for "normal" errors like constraint violations.
This would leak any valid prepared statements on the server side.
| // Ok, Err, ResultSet, or (unhandled) LocalInfileRequest | ||
| let mut packet = self.inner.stream.recv_packet().await?; | ||
| let mut packet = self.inner.stream.recv_packet().await.inspect_err(|_| { | ||
| // if a prepared statement vanished on the server side, we get an error here | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to link to the specific issue for context, if you created one (I didn't look).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is also fine.
Does your PR solve an issue?
First of all, this is not merge-able as-is, it's a request for feedback/advice.
We're using sqlx with AWS Aurora, and have noticed an issue with Aurora's "zero-downtime" restarts, which wipe prepared statements but preserve TCP connections. In this scenario, all existing connections get effectively poisoned without notice, breaking queries until they age out eventually. This has forced us to disable query caching, incurring quite a big performance overhead.
I've had a look at how ActiveRecord, our other main stack handles this, and they wipe the client-side cache and retry if they get back an error. While retrying right away would be nice to rescue the query, getting the connection back into a functional state for future queries is the next best thing, and rather easy.
The main problem is testing the change. I've included a second commit with a working, but bad integration test setup. It's bad because it needs to expose additional public interfaces for the test to enable deleting the server-side prepared statements while leaving the client-side cache intact, something that the library API should not allow for consistency. I feel like an integration test is the better way to test this scenario, but neither sqlx nor MySQL expose a way to set up the required conditions. Specifically, statements prepared via the binary protocol are not named, and thus cannot be deallocated via
DEALLOCATE PREPARED.I would like to invite feedback & advice on:
If this or a similar approach gets accepted, the same change might also be needed for Postgres, I suspect it will exhibit the same issue.
Is this a breaking change?
Assuming all the issues get resolved, no, I don't think it would be breaking.
The worst case outcome I can think of is a false positive match on the error, which would wrongly throw away the statement cache and cause re-preparation. It might be nice to match more closely on the specific error, but given the state of this change, this is a functional PoC.