You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
Currently, this receiver produces only instance level metrics, there are no query-level metrics for users to identify slow, resource-intensive database queries. Identifying such queries helps debug the application or database performance issues. If a user wants to investigate the slow query issue, instance-level metrics are not gonna help a lot, as it does not show how a certain query is processed by the server. In this case, query-level metrics for a query that runs slowly could bring more context and greatly help the process of debugging and optimizing.
For instance, we can obtain physical reads of a single query consumed. This metric indicates disk I/O activity, and having a high number on this may suggest the query consumed too much I/O that the instance can serve, so we know where the bottleneck is and how to fix it.
Describe the solution you'd like
Oracle DB provides insights into query performance through multiple stats table. We propose to use the Oracle table V$SQL to fetch the query-level data. We will make changes to the the Oracle receiver to fetch the data from this table and produce corresponding metrics. Along with the query-level metrics, we also intend to fetch the query text and query execution plan. Query text would be fetched from table V$SQLSTATS, and query execution plan from V$SQL.
Concerns
However, the length of the sanitized query text and the execution plan can be very long, which exceeds the length limit of metric storage. The execution plan has a high cardinality value that does not fit into the metric storage.
To solve this problem, we propose sending query texts and execution plans in logs. So, every query result fetched from the SQL server would populate a metric and a corresponding log.
Describe alternatives you've considered
None
Additional context
We (a Splunk team) are interested in implementing this feature. But, since using logs like this is not what a common receiver would do, we would like to know whether the community can accept this or not before we work on the implementation.
Hello @akshayAppd19, thanks for submitting this issue and for including so much information, it's really helpful!
This issue seems to have two main proposals, the first being adding metrics for SQL queries from the V$SQL and V$SQLSTATS table. I think this is a great idea, happy to have these added!
The second part of this request is adding logs support to help fill out query information that doesn't fit in the metrics. The receiver would be used in two instances in this case, so the logs receiver would be duplicating the queries made by the metrics receiver. One receiver instance can't create metrics and logs at the same time from the same data. My other main concern is that the query and query plan data doesn't change often, so my guess is that it would need to be scraped less frequently than metrics, but frequently enough that changes are caught in a timely manner.
Ideally, there'd be some unique value for a SQL query and plan that could be added to the metric which could be used to effectively communicate which query it is, and which plan. This would greatly simplify the logic and UX of components in the Collector.
This isn't a blocker for me, but I'd want to make sure there's no other column from either table that could be used for a query to effectively map the value to the query in Oracle DB.
Can you confirm there's no HASH or ID value in either table (V$SQL or V$SQLSTATS) that can be used instead of your proposed solution?
Hello @crobert-1 yes we do want to cache query test as it won't change frequently. However, in Oracle, query plan may change even if the text doesn't change. So, we are still exploring if caching query plan would be useful or not.
The SQL_ID attribute would be used to map the metric with a query. We would use SQL_ID to fetch the query text too.
Component(s)
receiver/oracledb
Is your feature request related to a problem? Please describe.
Currently, this receiver produces only instance level metrics, there are no query-level metrics for users to identify slow, resource-intensive database queries. Identifying such queries helps debug the application or database performance issues. If a user wants to investigate the slow query issue, instance-level metrics are not gonna help a lot, as it does not show how a certain query is processed by the server. In this case, query-level metrics for a query that runs slowly could bring more context and greatly help the process of debugging and optimizing.
For instance, we can obtain physical reads of a single query consumed. This metric indicates disk I/O activity, and having a high number on this may suggest the query consumed too much I/O that the instance can serve, so we know where the bottleneck is and how to fix it.
Describe the solution you'd like
Oracle DB provides insights into query performance through multiple stats table. We propose to use the Oracle table V$SQL to fetch the query-level data. We will make changes to the the Oracle receiver to fetch the data from this table and produce corresponding metrics. Along with the query-level metrics, we also intend to fetch the query text and query execution plan. Query text would be fetched from table V$SQLSTATS, and query execution plan from V$SQL.
Concerns
However, the length of the sanitized query text and the execution plan can be very long, which exceeds the length limit of metric storage. The execution plan has a high cardinality value that does not fit into the metric storage.
To solve this problem, we propose sending query texts and execution plans in logs. So, every query result fetched from the SQL server would populate a metric and a corresponding log.
Describe alternatives you've considered
None
Additional context
We (a Splunk team) are interested in implementing this feature. But, since using logs like this is not what a common receiver would do, we would like to know whether the community can accept this or not before we work on the implementation.
Other receivers that have a similar approach:
k8sclusterreceiver: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/40dbf0b5487fd20e2024b66d72cd9945b90eca98/receiver/k8sclusterreceiver
The text was updated successfully, but these errors were encountered: