Don't cache sanitization results for large sql statements #13353
@@ -24,7 +24,9 @@ default String getDbSystem(REQUEST request) {

   @Deprecated
   @Nullable
-  String getUser(REQUEST request);
+  default String getUser(REQUEST request) {
Since these are removed in the stable semconv, we don't need to force users to implement them.
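To illustrate the point of this comment, here is a minimal sketch of how giving a deprecated interface method a `default` body frees implementers from overriding it. The names `ExampleAttributesGetter`, `ExampleGetter`, and `getSystem` are hypothetical, and returning `null` from the default body is an assumption, since the diff above does not show the body.

```java
// Hypothetical, simplified stand-in for the getter interface touched in this diff.
interface ExampleAttributesGetter<REQUEST> {
  String getSystem(REQUEST request);

  // Deprecated getter with a default body, so implementations may omit it.
  // Returning null here is an assumption; the actual body is not shown in the diff.
  @Deprecated
  default String getUser(REQUEST request) {
    return null;
  }
}

// An implementation that only supplies the non-deprecated method still compiles.
final class ExampleGetter implements ExampleAttributesGetter<String> {
  @Override
  public String getSystem(String request) {
    return "postgresql";
  }
}
```

Existing implementations that already override `getUser` keep working unchanged, which is what makes this a source-compatible change.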
// sanitization result will not be cached for statements larger than the threshold to avoid
// cache growing too large
// https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/13180
if (statement.length() > LARGE_STATEMENT_THRESHOLD) {
I was thinking we could hash these larger statements instead of using the whole statement as the key, but that might be more computationally expensive, so this seems reasonable to me.
Actually, my first attempt was to use hashing. Computing a hash for a very large statement can be more expensive than applying the sanitizer, since the sanitizer also applies a size limit. My guess is that many of these very large statements are dynamically generated, so each is likely executed only once and would not benefit from caching anyway.
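The approach discussed in this thread (skip the cache for large statements rather than hash them) could be sketched roughly as follows. This is not the PR's actual code: the class `ThresholdCache`, the unbounded `ConcurrentHashMap`, and the pluggable `sanitizer` function are assumptions for illustration, and the threshold value simply mirrors the 10kb mentioned in the description.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: a sanitization cache that bypasses caching for large statements.
final class ThresholdCache {
  // Assumed value, mirroring the 10kb mentioned in the PR description.
  static final int LARGE_STATEMENT_THRESHOLD = 10 * 1024;

  // The statement itself is the key, so large statements would bloat the map.
  private final Map<String, String> cache = new ConcurrentHashMap<>();
  private final Function<String, String> sanitizer;

  ThresholdCache(Function<String, String> sanitizer) {
    this.sanitizer = sanitizer;
  }

  String sanitize(String statement) {
    if (statement.length() > LARGE_STATEMENT_THRESHOLD) {
      // Large statements are sanitized every time and never stored.
      return sanitizer.apply(statement);
    }
    // Small statements are sanitized once and served from the cache afterwards.
    return cache.computeIfAbsent(statement, sanitizer);
  }

  int cachedEntries() {
    return cache.size();
  }
}
```

A real cache would also bound the number of entries; that part is left out here to keep the focus on the size check.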
Hopefully resolves #13180
Since we keep the statement as the key in the sanitization cache, large statements can cause the cache to grow to several hundred MB in size. This PR disables caching for statements larger than 10kb. There isn't any particular reason why 10kb was chosen, so feel free to suggest a different size. Besides disabling the cache, this PR introduces a thread-local context for sharing computed values between the span name extractor and the attribute extractor for SQL client calls. This allows us to sanitize each statement only once and reuse the result for both span name and attribute extraction.
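The thread-local sharing described above might look something like the sketch below. It is an illustration under assumptions, not the PR's implementation: `SharedSanitization`, `runWithContext`, and the per-thread `HashMap` are hypothetical names and choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: share sanitization results between two extractors on one thread.
final class SharedSanitization {
  // Holds results computed during a single instrumented call on the current thread.
  private static final ThreadLocal<Map<String, String>> CONTEXT = new ThreadLocal<>();

  private SharedSanitization() {}

  // Opens a context for the duration of the body and always clears it afterwards.
  static void runWithContext(Runnable body) {
    CONTEXT.set(new HashMap<>());
    try {
      body.run();
    } finally {
      CONTEXT.remove();
    }
  }

  // Reuses a result already computed in the current context, if there is one.
  static String sanitize(String statement, Function<String, String> sanitizer) {
    Map<String, String> shared = CONTEXT.get();
    if (shared == null) {
      // No active context: sanitize directly without sharing.
      return sanitizer.apply(statement);
    }
    return shared.computeIfAbsent(statement, sanitizer);
  }
}
```

With this shape, a span name extractor and an attribute extractor that both call `sanitize` inside the same `runWithContext` body would trigger only one sanitizer run per statement, even for statements too large for the global cache.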