Feature pulsar monitoring #11339

Merged: 22 commits, Oct 31, 2023
Changes from 19 commits
2 changes: 2 additions & 0 deletions .github/workflows/skywalking.yaml
@@ -641,6 +641,8 @@ jobs:
config: test/e2e-v2/cases/kafka/kafka-monitoring/e2e.yaml
- name: MQE Service
config: test/e2e-v2/cases/mqe/e2e.yaml
- name: Pulsar and BookKeeper
config: test/e2e-v2/cases/pulsar/e2e.yaml

- name: UI Menu BanyanDB
config: test/e2e-v2/cases/menu/banyandb/e2e.yaml
1 change: 1 addition & 0 deletions docs/en/changes/changes.md
@@ -13,6 +13,7 @@
* ElasticSearchClient: Add `deleteById` API.
* Fix Custom alarm rules are overwritten by 'resource/alarm-settings.yml'
* Support Kafka Monitoring.
* Support Pulsar server and BookKeeper server Monitoring.
* [Breaking Change] Elasticsearch storage merge all management data indices into one index `management`,
including `ui_template,ui_menu,continuous_profiling_policy`.
* Add a release mechanism for alarm windows when it is expired in case of OOM.
61 changes: 61 additions & 0 deletions docs/en/setup/backend/backend-bookkeeper-monitoring.md
@@ -0,0 +1,61 @@
# BookKeeper monitoring

SkyWalking leverages OpenTelemetry Collector to collect metrics data from BookKeeper and transfer the metrics to the
[OpenTelemetry receiver](opentelemetry-receiver.md) and into the [Meter System](./../../concepts-and-designs/meter.md).
A BookKeeper cluster is represented as a `Service` entity in OAP, on the `Layer: BOOKKEEPER`.

## Data flow

1. BookKeeper exposes metrics through a Prometheus endpoint.
2. OpenTelemetry Collector fetches metrics from the BookKeeper cluster via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenTelemetry gRPC exporter.
3. The SkyWalking OAP Server parses the expressions with [MAL](../../concepts-and-designs/mal.md) to
filter/calculate/aggregate and store the results.

## Setup

1. Set up [BookKeeper Cluster](https://bookkeeper.apache.org/docs/deployment/manual).
2. Set up [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/#kubernetes). For an example
OpenTelemetry Collector configuration, refer
to [here](../../../../test/e2e-v2/cases/pulsar/otel-collector-config.yaml).
3. Configure the SkyWalking [OpenTelemetry receiver](opentelemetry-receiver.md).
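
As a rough illustration of step 2, a minimal collector configuration might look like the sketch below. The job name, the scrape target `bookie:8000`, the `cluster` label value, and the OAP address `oap:11800` are assumptions made for illustration only; the linked e2e configuration remains the authoritative example, and the job name and labels must match whatever the rule files under `otel-rules/bookkeeper` expect.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # Hypothetical job name; align it with the filter used by the OAP rule files.
        - job_name: "bookkeeper-monitoring"
          scrape_interval: 30s
          static_configs:
            # Assumed bookie host and Prometheus metrics port.
            - targets: ["bookie:8000"]
              labels:
                # Assumed label identifying the cluster (the Service in OAP).
                cluster: bookkeeper-cluster

processors:
  batch:

exporters:
  otlp:
    # Assumed address of the SkyWalking OAP gRPC endpoint.
    endpoint: oap:11800
    tls:
      insecure: true

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]
```

The pipeline mirrors the data flow described above: the Prometheus receiver scrapes the bookies, and the OTLP exporter pushes the metrics to OAP over gRPC.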

## BookKeeper Monitoring

BookKeeper monitoring provides multidimensional metrics monitoring of a BookKeeper cluster as a `Layer: BOOKKEEPER` `Service` in
the OAP. In each cluster, the nodes are represented as `Instance`.

### BookKeeper Cluster Supported Metrics

| Monitoring Panel | Metric Name | Description | Data Source |
|--------------------------------|------------------------------------------------------------------|---------------------------------------------------|---------------------|
| Bookie Ledgers Count | meter_bookkeeper_bookie_ledgers_count | The number of the bookie ledgers. | Bookkeeper Cluster |
| Bookie Ledger Writable Dirs | meter_bookkeeper_bookie_ledger_writable_dirs | The number of writable directories in the bookie. | Bookkeeper Cluster |
| Bookie Ledger Dir Usage | meter_bookkeeper_bookie_ledger_dir_data_bookkeeper_ledgers_usage | The usage of the bookie ledger directory. | Bookkeeper Cluster |
| Bookie Entries Count | meter_bookkeeper_bookie_entries_count | The number of the bookie write entries. | Bookkeeper Cluster |
| Bookie Write Cache Size | meter_bookkeeper_bookie_write_cache_size | The size of the bookie write cache. | Bookkeeper Cluster |
| Bookie Write Cache Entry Count | meter_bookkeeper_bookie_write_cache_count | The entry count in the bookie write cache. | Bookkeeper Cluster |
| Bookie Read Cache Size | meter_bookkeeper_bookie_read_cache_size | The size of the bookie read cache. | Bookkeeper Cluster |
| Bookie Read Cache Entry Count | meter_bookkeeper_bookie_read_cache_count | The entry count in the bookie read cache. | Bookkeeper Cluster |
| Bookie Read Rate | meter_bookkeeper_bookie_read_rate | The bookie read rate. | Bookkeeper Cluster |
| Bookie Write Rate | meter_bookkeeper_bookie_write_rate | The bookie write rate. | Bookkeeper Cluster |

Review comment (Member): Should the unit be byte/s? Please state the unit here and add it to the dashboard.


### BookKeeper Node Supported Metrics

| Monitoring Panel | Metric Name | Description | Data Source |
|-------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|--------------------|
| JVM Memory Pool Used | meter_bookkeeper_node_jvm_memory_pool_used | The usage of the bookie JVM memory pool. | Bookkeeper Bookie |
| JVM Memory | meter_bookkeeper_node_jvm_memory_used <br /> meter_bookkeeper_node_jvm_memory_committed <br /> meter_bookkeeper_node_jvm_memory_init | The usage of the bookie JVM memory. | Bookkeeper Bookie |
| JVM Threads | meter_bookkeeper_node_jvm_threads_current <br /> meter_bookkeeper_node_jvm_threads_daemon <br /> meter_bookkeeper_node_jvm_threads_peak <br /> meter_bookkeeper_node_jvm_threads_deadlocked | The count of the jvm threads. | Bookkeeper Bookie |
| GC Time | meter_bookkeeper_node_jvm_gc_collection_seconds_sum | Time spent in a given JVM garbage collector in seconds. | Bookkeeper Bookie |
| GC Count | meter_bookkeeper_node_jvm_gc_collection_seconds_count | The count of a given JVM garbage collector. | Bookkeeper Bookie |
| Thread Executor | meter_bookkeeper_node_thread_executor_completed | The count of the executor thread. | Bookkeeper Bookie |
| Thread Executor Tasks | meter_bookkeeper_node_thread_executor_tasks_completed <br /> meter_bookkeeper_node_thread_executor_tasks_rejected <br /> meter_bookkeeper_node_thread_executor_tasks_failed | The count of the executor tasks. | Bookkeeper Bookie |
| Pooled Threads | meter_bookkeeper_node_high_priority_threads <br /> meter_bookkeeper_node_read_thread_pool_threads | The count of the pooled thread. | Bookkeeper Bookie |
| Pooled Threads Max Queue Size | meter_bookkeeper_node_high_priority_thread_max_queue_size <br /> meter_bookkeeper_node_read_thread_pool_max_queue_size | The count of the pooled threads max queue size. | Bookkeeper Bookie |

## Customizations

You can customize your own metrics/expression/dashboard panel.
The metrics definition and expression rules are found
in `otel-rules/bookkeeper/bookkeeper-cluster.yaml` and `otel-rules/bookkeeper/bookkeeper-node.yaml`.
The BookKeeper dashboard panel configurations are found in `/config/ui-initialized-templates/bookkeeper`.
67 changes: 67 additions & 0 deletions docs/en/setup/backend/backend-pulsar-monitoring.md
@@ -0,0 +1,67 @@
# Pulsar monitoring

SkyWalking leverages OpenTelemetry Collector to collect metrics data in Prometheus format from Pulsar and transfer the metrics to the
[OpenTelemetry receiver](opentelemetry-receiver.md) and into the [Meter System](./../../concepts-and-designs/meter.md).
A Pulsar cluster is represented as a `Service` entity in OAP, on the `Layer: PULSAR`.

## Data flow

1. Pulsar exposes metrics through a Prometheus endpoint.
2. OpenTelemetry Collector fetches metrics from the Pulsar cluster via the Prometheus Receiver and pushes them to the SkyWalking OAP Server via the OpenTelemetry gRPC exporter.
3. The SkyWalking OAP Server parses the expressions with [MAL](../../concepts-and-designs/mal.md) to
filter/calculate/aggregate and store the results.

## Setup

1. Set up [Pulsar Cluster](https://pulsar.apache.org/docs/3.1.x/getting-started-docker-compose/). (A Pulsar cluster includes a Pulsar broker cluster and a BookKeeper bookie cluster.)
2. Set up [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/#kubernetes). For an example
OpenTelemetry Collector configuration, refer
to [here](../../../../test/e2e-v2/cases/pulsar/otel-collector-config.yaml).
3. Configure the SkyWalking [OpenTelemetry receiver](opentelemetry-receiver.md).
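
For step 3, the receiver has to load the `pulsar/*` rules (and `bookkeeper/*` if the bookies are monitored too). The fragment below is a sketch based on the `receiver-otel` block changed in this pull request, with the rule list shortened to the two new entries for readability. The shortened list is an assumption made for illustration, not a recommended default; in practice the new entries are appended to the existing list.

```yaml
receiver-otel:
  selector: ${SW_OTEL_RECEIVER:default}
  default:
    enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics,otlp-logs"}
    # Illustrative, shortened rule list: only the entries added by this change.
    enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"pulsar/*,bookkeeper/*"}
```

Because the value is read from `SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES`, the same list can be supplied through that environment variable instead of editing the file.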

## Pulsar Monitoring

Pulsar monitoring provides multidimensional metrics monitoring of a Pulsar cluster as a `Layer: PULSAR` `Service` in
the OAP. In each cluster, the nodes are represented as `Instance`.

### Pulsar Cluster Supported Metrics

| Monitoring Panel | Metric Name | Description | Data Source |
|----------------------|--------------------------------------------|--------------------------------------------------------------------------------------------------------|----------------|
| Total Topics | meter_pulsar_total_topics | The number of Pulsar topics in this cluster. | Pulsar Cluster |
| Total Subscriptions | meter_pulsar_total_subscriptions | The number of Pulsar subscriptions in this cluster. | Pulsar Cluster |
| Total Producers | meter_pulsar_total_producers | The number of active producers connected to this cluster. | Pulsar Cluster |
| Total Consumers | meter_pulsar_total_consumers | The number of active consumers connected to this cluster. | Pulsar Cluster |
| Message Rate In | meter_pulsar_message_rate_in | The total message rate coming into this cluster (message per second). | Pulsar Cluster |
| Message Rate Out | meter_pulsar_message_rate_out | The total message rate going out from this cluster (message per second). | Pulsar Cluster |
| Throughput In | meter_pulsar_throughput_in | The total throughput coming into this cluster (byte per second). | Pulsar Cluster |
| Throughput Out | meter_pulsar_throughput_out | The total throughput going out from this cluster (byte per second). | Pulsar Cluster |
| Storage Size | meter_pulsar_storage_size | The total storage size of all topics in this broker (in bytes). | Pulsar Cluster |
| Storage Logical Size | meter_pulsar_storage_logical_size | The storage size of all topics in this broker without replicas (in bytes). | Pulsar Cluster |
| Storage Write Rate | meter_pulsar_storage_write_rate | The total message batches (entries) written to the storage for this broker (message batch per second). | Pulsar Cluster |
| Storage Read Rate | meter_pulsar_storage_read_rate | The total message batches (entries) read from the storage for this broker (message batch per second). | Pulsar Cluster |

Review comment (Member): The units you added to the descriptions, such as (message per second), should be added to the dashboards too.



### Pulsar Node Supported Metrics


| Monitoring Panel | Metric Name | Description | Data Source |
|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|----------------|
| Active Connections | meter_pulsar_broker_active_connections | The number of active connections. | Pulsar Broker |
| Total Connections | meter_pulsar_broker_total_connections | The total number of connections. | Pulsar Broker |
| Connection Create Success Count | meter_pulsar_broker_connection_create_success_count | The number of successfully created connections. | Pulsar Broker |
| Connection Create Fail Count | meter_pulsar_broker_connection_create_fail_count | The number of failed connections. | Pulsar Broker |
| Connection Closed Total Count | meter_pulsar_broker_connection_closed_total_count | The total number of closed connections. | Pulsar Broker |
| JVM Buffer Pool Used | meter_pulsar_broker_jvm_buffer_pool_used_bytes | The usage of jvm buffer pool. | Pulsar Broker |
| JVM Memory Pool Used | meter_pulsar_broker_jvm_memory_pool_used | The usage of jvm memory pool. | Pulsar Broker |
| JVM Memory | meter_pulsar_broker_jvm_memory_init <br /> meter_pulsar_broker_jvm_memory_used <br /> meter_pulsar_broker_jvm_memory_committed | The usage of jvm memory. | Pulsar Broker |
| JVM Threads | meter_pulsar_broker_jvm_threads_current <br /> meter_pulsar_broker_jvm_threads_daemon <br /> meter_pulsar_broker_jvm_threads_peak <br /> meter_pulsar_broker_jvm_threads_deadlocked | The usage of jvm threads. | Pulsar Broker |
| GC Time | meter_pulsar_broker_jvm_gc_collection_seconds_sum | Time spent in a given JVM garbage collector in seconds. | Pulsar Broker |
| GC Count | meter_pulsar_broker_jvm_gc_collection_seconds_count | The count of a given JVM garbage collector. | Pulsar Broker |

## Customizations

You can customize your own metrics/expression/dashboard panel.
The metrics definition and expression rules are found
in `otel-rules/pulsar/pulsar-cluster.yaml` and `otel-rules/pulsar/pulsar-broker.yaml`.
The Pulsar dashboard panel configurations are found in `ui-initialized-templates/pulsar`.
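
To give a feel for what such a rule file contains, here is a hedged sketch of a single cluster-level rule. It is not copied from `pulsar-cluster.yaml`: the job name in `filter`, the `pulsar::` service prefix, the source metric `pulsar_broker_topics_count`, and the aggregation labels are all assumptions chosen for illustration, so check the shipped rule files before relying on any of them.

```yaml
# Illustrative only; names and expressions below are assumptions, not the shipped rules.
# Only process metrics scraped by the (assumed) collector job.
filter: "{ tags -> tags.job_name == 'pulsar-monitoring' }"
# Map the `cluster` label to a Service on the PULSAR layer.
expSuffix: tag({tags -> tags.cluster = 'pulsar::' + tags.cluster}).service(['cluster'], Layer.PULSAR)
# The prefix is joined with each rule name, e.g. meter_pulsar_total_topics.
metricPrefix: meter_pulsar
metricsRules:
  - name: total_topics
    exp: pulsar_broker_topics_count.sum(['cluster'])
```

A custom metric would typically be added as another entry under `metricsRules`, with a matching panel added to the dashboard templates.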
4 changes: 4 additions & 0 deletions docs/menu.yml
@@ -259,12 +259,16 @@ catalog:
path: "/en/setup/backend/backend-elasticsearch-monitoring"
- name: "MongoDB"
path: "/en/setup/backend/backend-mongodb-monitoring"
- name: "BookKeeper"
path: "/en/setup/backend/backend-bookkeeper-monitoring"
- name: "MQ Monitoring"
catalog:
- name: "RabbitMQ"
path: "/en/setup/backend/backend-rabbitmq-monitoring"
- name: "Kafka"
path: "/en/setup/backend/backend-kafka-monitoring"
- name: "Pulsar"
path: "/en/setup/backend/backend-pulsar-monitoring"
- name: "Self Observability"
catalog:
- name: "OAP self telemetry"
@@ -193,7 +193,19 @@ public enum Layer {
/**
* Kafka is a distributed streaming platform that is used publish and subscribe to streams of records.
*/
KAFKA(31, true);
KAFKA(31, true),

/**
* Pulsar is a distributed pub-sub messaging platform that provides high-performance, durable messaging.
* It is used to publish and subscribe to streams of records.
* Pulsar supports scalable and fault-tolerant messaging, making it suitable for use in distributed systems.
*/
PULSAR(32, true),

/**
* A scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads.
*/
BOOKKEEPER(33, true);

private final int value;
/**
@@ -70,6 +70,8 @@ public class UITemplateInitializer {
Layer.RABBITMQ.name(),
Layer.MONGODB.name(),
Layer.KAFKA.name(),
Layer.PULSAR.name(),
Layer.BOOKKEEPER.name(),
"custom"
};
private final UITemplateManagementService uiTemplateManagementService;
@@ -340,7 +340,7 @@ receiver-otel:
selector: ${SW_OTEL_RECEIVER:default}
default:
enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics,otlp-logs"}
enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*"}
enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*"}

receiver-zipkin:
selector: ${SW_RECEIVER_ZIPKIN:-}