Add SkyWalking Java Agent self observability dashboard #12622

Merged
merged 6 commits into from
Sep 16, 2024
Changes from 3 commits
1 change: 1 addition & 0 deletions docs/en/changes/changes.md
@@ -66,6 +66,7 @@
* Fix `findEndpoint` query require `keyword` when using BanyanDB.
* Support to analysis the ztunnel mapped IP address in eBPF Access Log Receiver.
* Adapt BanyanDB Java Client 0.7.0-rc3.
* Add SkyWalking Java Agent self observability dashboard.

#### UI

32 changes: 32 additions & 0 deletions docs/en/setup/backend/dashboards-so11y-java-agent.md
@@ -0,0 +1,32 @@
# Java Agent self observability dashboard

The SkyWalking Java agent reports its own metrics through the Meter APIs to measure tracing performance.
SkyWalking also provides a dashboard to visualize these agent metrics.

## Data flow
1. The SkyWalking Java agent reports metrics data internally and automatically.
2. The SkyWalking OAP Server accepts these meters through the native protocols.
3. The SkyWalking OAP Server parses the expressions with [MAL](../../concepts-and-designs/mal.md) to filter/calculate/aggregate and store the results.
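
The rules in `java-agent.yaml` apply `.increase('PT1M')` to the agent's cumulative counters. As a rough illustration of what a one-minute increase means, here is a minimal Java sketch. It is not the OAP server's MAL implementation: the class `CounterIncrease` and its method are hypothetical, and it assumes samples are sorted by time and the counter never resets.

```java
// Hypothetical illustration only; this is not the OAP server's MAL implementation.
final class CounterIncrease {

    /**
     * Returns how much a cumulative counter grew inside the window that ends at
     * the latest sample. Assumes timestamps are sorted ascending and the counter
     * never resets.
     */
    static double increase(long[] timestampsMillis, double[] values, long windowMillis) {
        if (timestampsMillis.length != values.length) {
            throw new IllegalArgumentException("timestamps and values must have the same length");
        }
        if (values.length == 0) {
            return 0.0;
        }
        int last = values.length - 1;
        long windowStart = timestampsMillis[last] - windowMillis;
        // Use the oldest sample that still falls inside the window as the baseline.
        int baseline = last;
        for (int i = 0; i <= last; i++) {
            if (timestampsMillis[i] >= windowStart) {
                baseline = i;
                break;
            }
        }
        return values[last] - values[baseline];
    }

    public static void main(String[] args) {
        long[] timestamps = {0L, 30_000L, 60_000L, 90_000L};
        double[] totals = {100.0, 130.0, 175.0, 190.0};
        // The window is [30000, 90000], so the baseline is the sample taken at 30000.
        System.out.println(increase(timestamps, totals, 60_000L)); // prints 60.0
    }
}
```

The sketch only shows why a cumulative counter has to be differenced before it can be plotted as a rate; the real rules additionally group by labels such as `service` and `instance` before computing the increase.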

## Set up
Java Agent so11y is a built-in feature; it reports meters automatically after boot.

## Self observability monitoring
Self observability monitoring covers the runtime performance of the Java agent itself. `agent.service_name` is a `Service` in Agent so11y and lands on the `Layer: SO11Y_JAVA_AGENT`.

### Self observability metrics

| Unit | Metric Name | Description | Data Source |
|-------------------|----------------------------------------------------------------|---------------------------------------------|-----------------------|
| Count Per Minute | meter_java_agent_created_tracing_context_count | Created Tracing Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_finished_tracing_context_count | Finished Tracing Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_created_ignored_context_count | Created Ignored Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_finished_ignored_context_count | Finished Ignored Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_possible_leaked_context_count | Possible Leaked Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_interceptor_error_count | Interceptor Error Count (Per Minute) | SkyWalking Java Agent |
| ns | meter_java_agent_tracing_context_execution_time_percentile | Tracing Context Execution Time (ns) | SkyWalking Java Agent |
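
Each name in the table is the `metricPrefix` (`meter_java_agent`) joined with a rule `name` from `java-agent.yaml`. The percentile metric is stored in nanoseconds, and the bundled instance dashboard divides it by 1000000 to show milliseconds. The Java sketch below illustrates both conventions; `MetricNaming` and its methods are hypothetical helpers, not part of SkyWalking.

```java
// Hypothetical helper for illustration; not part of the SkyWalking code base.
final class MetricNaming {

    /** Joins the metric prefix and a rule name with an underscore. */
    static String fullName(String metricPrefix, String ruleName) {
        return metricPrefix + "_" + ruleName;
    }

    /** Converts nanoseconds to milliseconds, mirroring the dashboard's "/1000000". */
    static double nanosToMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        // prints meter_java_agent_interceptor_error_count
        System.out.println(fullName("meter_java_agent", "interceptor_error_count"));
        System.out.println(nanosToMillis(2_500_000L)); // prints 2.5
    }
}
```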

## Customizations
You can customize your own metrics, expressions, and dashboard panels.
The metric definitions and expression rules are found in `/meter-analyzer-config/java-agent.yaml`.
The self observability dashboard panel configurations are found in `/config/ui-initialized-templates/so11y_java_agent`.
2 changes: 2 additions & 0 deletions docs/menu.yml
@@ -146,6 +146,8 @@ catalog:
path: "/en/setup/backend/dashboards-so11y"
- name: "Satellite self telemetry"
path: "/en/setup/backend/dashboards-so11y-satellite"
- name: "SkyWalking Java Agent self telemetry"
path: "/en/setup/backend/dashboards-so11y-java-agent"
- name: "Configuration Vocabulary"
path: "/en/setup/backend/configuration-vocabulary"
- name: "Advanced Setup"
@@ -234,7 +234,13 @@ public enum Layer {
* Cilium is open source software for providing and transparently securing network connectivity and load balancing
* between application workloads such as application containers or processes.
*/
CILIUM_SERVICE(38, true);
CILIUM_SERVICE(38, true),

/**
* The self observability of SkyWalking Java Agent,
* which provides the abilities to measure the tracing performance and error statistics of plugins.
*/
SO11Y_JAVA_AGENT(39, true);
Member

I think we need a UI PR for this?

Contributor Author

OK

Member

How does your preview work without a UI change? Is that possible?


private final int value;
/**
@@ -76,6 +76,7 @@ public class UITemplateInitializer {
Layer.CLICKHOUSE.name(),
Layer.ACTIVEMQ.name(),
Layer.CILIUM_SERVICE.name(),
Layer.SO11Y_JAVA_AGENT.name(),
"custom"
};
private final UITemplateManagementService uiTemplateManagementService;
@@ -269,7 +269,7 @@ agent-analyzer:
# Nginx and Envoy agents can't get the real remote address.
# Exit spans with the component in the list would not generate the client-side instance relation metrics.
noUpstreamRealAddressAgents: ${SW_NO_UPSTREAM_REAL_ADDRESS:6000,9000}
meterAnalyzerActiveFiles: ${SW_METER_ANALYZER_ACTIVE_FILES:datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling} # Which files could be meter analyzed, files split by ","
meterAnalyzerActiveFiles: ${SW_METER_ANALYZER_ACTIVE_FILES:datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling,java-agent} # Which files could be meter analyzed, files split by ","
slowCacheReadThreshold: ${SW_SLOW_CACHE_SLOW_READ_THRESHOLD:default:20,redis:10} # The slow cache read operation thresholds. Unit ms.
slowCacheWriteThreshold: ${SW_SLOW_CACHE_SLOW_WRITE_THRESHOLD:default:20,redis:10} # The slow cache write operation thresholds. Unit ms.

@@ -0,0 +1,32 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

expSuffix: instance(['service'], ['instance'], Layer.SO11Y_JAVA_AGENT)
metricPrefix: meter_java_agent
metricsRules:
  - name: created_tracing_context_count
    exp: created_tracing_context_counter.sum(['created_by', 'service', 'instance']).increase('PT1M')
  - name: finished_tracing_context_count
    exp: finished_tracing_context_counter.sum(['service', 'instance']).increase('PT1M')
  - name: created_ignored_context_count
    exp: created_ignored_context_counter.sum(['created_by', 'service', 'instance']).increase('PT1M')
  - name: finished_ignored_context_count
    exp: finished_ignored_context_counter.sum(['service', 'instance']).increase('PT1M')
  - name: possible_leaked_context_count
    exp: possible_leaked_context_counter.sum(['source', 'service', 'instance']).increase('PT1M')
  - name: interceptor_error_count
    exp: interceptor_error_counter.sum(['plugin_name', 'inter_type', 'service', 'instance']).increase('PT1M')
  - name: tracing_context_execution_time_percentile
    exp: tracing_context_performance.sum(['le', 'service', 'instance']).histogram().histogram_percentile([50,70,90,99])
@@ -247,3 +247,8 @@ menus:
description: "Satellite: an open-source agent designed for the cloud-native infrastructures, which provides a low-cost, high-efficient, and more secure way to collect telemetry data. It is the recommended load balancer for telemetry collecting."
documentLink: https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-load-balancer/
i18nKey: self_observability_satellite
- title: SkyWalking Java Agent
layer: SO11Y_JAVA_AGENT
description: The Java Agent for Apache SkyWalking, which provides the native tracing/metrics/logging/event/profiling abilities for Java projects.
documentLink: https://skywalking.apache.org/docs/main/next/en/setup/backend/dashboards-so11y-java-agent/
i18nKey: self_observability_java_agent
@@ -0,0 +1,202 @@
[
{
"id": "Self-Observability-Java-Agent-Instance",
"configuration": {
"children": [
{
"x": 0,
"y": 0,
"w": 6,
"h": 13,
"i": "14",
"type": "Widget",
"widget": {
"title": "Tracing Context Creation (Per Minute)",
"tips": "The number of created tracing contexts per minute, including a label created_by(value=sampler,propagated)."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"meter_java_agent_created_tracing_context_count"
]
},
{
"x": 6,
"y": 0,
"w": 6,
"h": 13,
"i": "6",
"type": "Widget",
"widget": {
"title": "Tracing Context Creation and Completion (Per Minute)",
"tips": "The number of created tracing contexts and finished tracing contexts per minute."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"metricConfig": [
{
"label": "Creation"
},
{
"label": "Completion"
}
],
"expressions": [
"aggregate_labels(meter_java_agent_created_tracing_context_count,sum)",
"meter_java_agent_finished_tracing_context_count"
]
},
{
"x": 12,
"y": 0,
"w": 6,
"h": 13,
"i": "1",
"type": "Widget",
"widget": {
"title": "Ignored Context Creation (Per Minute)",
"tips": "The number of created ignored contexts per minute, including a label created_by(value=sampler,propagated)."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"meter_java_agent_created_ignored_context_count"
]
},
{
"x": 18,
"y": 0,
"w": 6,
"h": 13,
"i": "2",
"type": "Widget",
"widget": {
"title": "Ignored Context Creation and Completion (Per Minute)",
"tips": "The number of created ignored contexts and finished ignored contexts per minute."
Member

I think this is not always per minute. When you pick a duration of several hours or days, the unit would be hour or day. You could just remove "per minute" from these tips.

},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"aggregate_labels(meter_java_agent_created_ignored_context_count,sum)",
"meter_java_agent_finished_ignored_context_count"
],
"metricConfig": [
{
"label": "Creation"
},
{
"label": "Completion"
}
]
},
{
"x": 0,
"y": 13,
"w": 6,
"h": 13,
"i": "11",
"type": "Widget",
"widget": {
"title": "Possible Leaked Context (Per Minute)",
"tips": "The number of detected leaked contexts per minute, including a label source(value=tracing, ignore)."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"meter_java_agent_possible_leaked_context_count"
],
"metricConfig": [
{
"label": "count"
}
]
},
{
"x": 12,
"y": 13,
"w": 12,
"h": 13,
"i": "8",
"type": "Widget",
"widget": {
"title": "Interceptor Error Count (Per Minute)",
"tips": "The number of errors that happened in the interceptor logic per minute, including the labels plugin_name and inter_type(constructor, inst, static)."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"meter_java_agent_interceptor_error_count"
],
"metricConfig": [
{
"label": "count"
}
]
},
{
"x": 6,
"y": 13,
"w": 6,
"h": 13,
"i": "15",
"type": "Widget",
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"widget": {
"title": "Tracing Context Execution Time (ms)",
"tips": "For each successfully finished tracing context, this measures the time cost of every interceptor."
},
"expressions": [
"relabels(meter_java_agent_tracing_context_execution_time_percentile,p='50,75,90,95,99',p='50,75,90,95,99')/1000000"
]
}
],
"layer": "SO11Y_JAVA_AGENT",
"entity": "ServiceInstance",
"name": "Self-Observability-Java-Agent-Instance",
"isRoot": false
}
}
]
@@ -0,0 +1,62 @@
[
{
"id": "Self-Observability-Java-Agent-Service",
"configuration": {
"children": [
{
"x": 0,
"y": 2,
"w": 24,
"h": 38,
"i": "0",
"type": "Widget",
"graph": {
"type": "InstanceList",
"dashboardName": "Self-Observability-Java-Agent-Instance",
"fontSize": 12
},
"metricConfig": [
{
"label": "Context Creation",
"detailLabel": "context_creation",
"unit": "Per Minute"
},
{
"label": "Context Completion",
"unit": "Per Minute",
"detailLabel": "context_completion"
}
],
"expressions": [
"avg(aggregate_labels(meter_java_agent_created_tracing_context_count,sum)+aggregate_labels(meter_java_agent_created_ignored_context_count,sum))",
"avg(meter_java_agent_finished_tracing_context_count+meter_java_agent_finished_ignored_context_count)"
],
"subExpressions": [
"aggregate_labels(meter_java_agent_created_tracing_context_count,sum)+aggregate_labels(meter_java_agent_created_ignored_context_count,sum)",
"meter_java_agent_finished_tracing_context_count+meter_java_agent_finished_ignored_context_count"
]
},
{
"x": 0,
"y": 0,
"w": 24,
"h": 2,
"i": "100",
"type": "Text",
"graph": {
"fontColor": "theme",
"backgroundColor": "theme",
"content": "The self observability of SkyWalking Java Agent, which provides the abilities to measure the tracing performance and error statistics of plugins.",
"fontSize": 14,
"textAlign": "left",
"url": "https://skywalking.apache.org/docs/main/next/en/setup/backend/dashboards-so11y-java-agent/"
}
}
],
"layer": "SO11Y_JAVA_AGENT",
"entity": "Service",
"name": "Self-Observability-Java-Agent-Service",
"isRoot": true
}
}
]