
Don't cache sanitization results for large sql statements #13353

Open · wants to merge 3 commits into main

Conversation

@laurit laurit (Contributor) commented Feb 19, 2025

Hopefully resolves #13180.

Since we keep the statement as the key in the sanitization cache, large statements can cause the cache to grow to several hundred MB. This PR disables caching for statements larger than 10 KB; there is no particular reason why 10 KB was chosen, so feel free to suggest a different size. Besides disabling the cache, this PR introduces a thread-local context for sharing computed values between the span name extractor and the attribute extractor for SQL client calls. This lets us sanitize each statement only once and reuse the result for both span name and attribute extraction.
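A minimal sketch of the thread-local sharing idea, in Java; the class and member names below (SqlRequestContext, sanitize, reset) are illustrative, not the identifiers used in the PR:

final class SqlRequestContext {
  // thread-local context shared by the span-name extractor and the attribute extractor
  private static final ThreadLocal<SqlRequestContext> CONTEXT = new ThreadLocal<>();

  private String statement;
  private String sanitized;

  static SqlRequestContext current() {
    SqlRequestContext context = CONTEXT.get();
    if (context == null) {
      context = new SqlRequestContext();
      CONTEXT.set(context);
    }
    return context;
  }

  // sanitize each statement only once; a repeated call with the same statement reuses the result
  String sanitize(String statement, java.util.function.Function<String, String> sanitizer) {
    if (!statement.equals(this.statement)) {
      this.statement = statement;
      this.sanitized = sanitizer.apply(statement);
    }
    return sanitized;
  }

  // clear the context at the end of the client call so large statements are not retained
  static void reset() {
    CONTEXT.remove();
  }
}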

@laurit laurit requested a review from a team as a code owner February 19, 2025 14:57
@@ -24,7 +24,9 @@ default String getDbSystem(REQUEST request) {

   @Deprecated
   @Nullable
-  String getUser(REQUEST request);
+  default String getUser(REQUEST request) {
laurit (Contributor, Author) commented on the diff:

Since these are removed in the stable semconv, we don't need to force users to implement them.
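For illustration only, the changed getter presumably ends up looking something like this; the interface name and the null-returning body are assumptions, since the diff above does not show the method body:

import javax.annotation.Nullable;

// sketch of a deprecated getter turned into a default method so existing
// implementations no longer have to override it
public interface DbClientAttributesGetterSketch<REQUEST> {

  @Deprecated
  @Nullable
  default String getUser(REQUEST request) {
    return null; // assumption: the deprecated attribute simply defaults to "no value"
  }
}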

// sanitization result will not be cached for statements larger than the threshold to avoid
// cache growing too large
// https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/13180
if (statement.length() > LARGE_STATEMENT_THRESHOLD) {
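A minimal sketch of how such a guard can sit in front of the cache; the class, field, and method names below are illustrative, not the PR's actual identifiers, and a plain ConcurrentHashMap stands in for the real bounded cache:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SanitizerSketch {
  private static final int LARGE_STATEMENT_THRESHOLD = 10 * 1024; // 10 KB

  // a simple map stands in for the real cache keyed by the statement text
  private final Map<String, String> cache = new ConcurrentHashMap<>();

  String sanitize(String statement) {
    // skip the cache for very large statements so the cache cannot grow by
    // hundreds of MB (see issue #13180)
    if (statement.length() > LARGE_STATEMENT_THRESHOLD) {
      return sanitizeImpl(statement);
    }
    return cache.computeIfAbsent(statement, this::sanitizeImpl);
  }

  private String sanitizeImpl(String statement) {
    // placeholder for the real sanitizer, which also limits the output size
    return statement.replaceAll("\\d+", "?");
  }
}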
Member commented:

I was thinking we could hash these larger statements instead of using the whole statement as the key, but that might be more computationally expensive, so this seems reasonable to me.

laurit (Contributor, Author) replied:

Actually, my first attempt was to use hashing. Computing a hash for a very large statement can be more expensive than applying the sanitizer, since the sanitizer also applies a size limit. My guess is that many of these super large statements are dynamically generated, so it is likely that such a statement is executed only once and would not benefit from caching anyway.
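For context, the hash-keyed alternative would have looked roughly like the sketch below (not what this PR does); hashing has to read every byte of the statement, whereas the sanitizer stops once its output size limit is reached:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

class HashKeySketch {
  // derive a fixed-size cache key from an arbitrarily large statement;
  // the full statement still has to be scanned to compute the digest
  static String cacheKey(String statement) throws NoSuchAlgorithmException {
    byte[] digest =
        MessageDigest.getInstance("SHA-256")
            .digest(statement.getBytes(StandardCharsets.UTF_8));
    return Base64.getEncoder().encodeToString(digest);
  }
}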

Successfully merging this pull request may close: DB statement sanitization causes memory leaks (#13180)