Feature pulsar monitoring (#11339)
liangyepianzhou committed Oct 31, 2023
1 parent 92af797 commit ced3f22
Showing 29 changed files with 2,708 additions and 2 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/skywalking.yaml
@@ -641,6 +641,8 @@ jobs:
config: test/e2e-v2/cases/kafka/kafka-monitoring/e2e.yaml
- name: MQE Service
config: test/e2e-v2/cases/mqe/e2e.yaml
- name: Pulsar and BookKeeper
config: test/e2e-v2/cases/pulsar/e2e.yaml

- name: UI Menu BanyanDB
config: test/e2e-v2/cases/menu/banyandb/e2e.yaml
1 change: 1 addition & 0 deletions docs/en/changes/changes.md
@@ -13,6 +13,7 @@
* ElasticSearchClient: Add `deleteById` API.
* Fix Custom alarm rules are overwritten by 'resource/alarm-settings.yml'
* Support Kafka Monitoring.
* Support Pulsar server and BookKeeper server Monitoring.
* [Breaking Change] Elasticsearch storage merge all management data indices into one index `management`,
including `ui_template,ui_menu,continuous_profiling_policy`.
* Add a release mechanism for alarm windows when it is expired in case of OOM.
61 changes: 61 additions & 0 deletions docs/en/setup/backend/backend-bookkeeper-monitoring.md
@@ -0,0 +1,61 @@
# BookKeeper monitoring

SkyWalking leverages OpenTelemetry Collector to collect metrics data from BookKeeper and transfer the metrics to
[OpenTelemetry receiver](opentelemetry-receiver.md) and into the [Meter System](./../../concepts-and-designs/meter.md).
The BookKeeper entity is represented as a `Service` in OAP, on the `Layer: BOOKKEEPER`.

## Data flow

1. BookKeeper exposes metrics through a Prometheus endpoint.
2. OpenTelemetry Collector fetches metrics from the BookKeeper cluster via the Prometheus receiver and pushes the metrics to the SkyWalking OAP Server via the OpenTelemetry gRPC exporter.
3. The SkyWalking OAP Server parses the expression with [MAL](../../concepts-and-designs/mal.md) to
filter/calculate/aggregate and store the results.
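To make steps 1 and 2 concrete, the sketch below parses a single line of the Prometheus text exposition format, which is what the collector's Prometheus receiver scrapes from a bookie. The class name `PromLine` and the sample metric lines in the usage check are illustrative assumptions, not BookKeeper or SkyWalking code, and the parser assumes label values contain no commas, braces, or escaped quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: parses one line of the Prometheus text exposition format. */
final class PromLine {
    final String name;
    final Map<String, String> labels;
    final double value;

    private PromLine(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }

    /** Returns null for blank lines and comment lines such as "# HELP" and "# TYPE". */
    static PromLine parse(String line) {
        String s = line.trim();
        if (s.isEmpty() || s.startsWith("#")) {
            return null;
        }
        Map<String, String> labels = new LinkedHashMap<>();
        String name;
        String rest;
        int brace = s.indexOf('{');
        int space = s.indexOf(' ');
        if (brace >= 0 && (space < 0 || brace < space)) {
            int close = s.indexOf('}', brace);
            name = s.substring(0, brace);
            // Assumption: label values contain no commas, braces, or escaped quotes.
            for (String pair : s.substring(brace + 1, close).split(",")) {
                if (pair.isBlank()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String quoted = pair.substring(eq + 1).trim();
                labels.put(pair.substring(0, eq).trim(), quoted.substring(1, quoted.length() - 1));
            }
            rest = s.substring(close + 1).trim();
        } else {
            name = s.substring(0, space);
            rest = s.substring(space + 1).trim();
        }
        // The value may be followed by an optional millisecond timestamp; keep only the value.
        return new PromLine(name, labels, parseValue(rest.split("\\s+")[0]));
    }

    private static double parseValue(String token) {
        switch (token) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            default: return Double.parseDouble(token); // also accepts "NaN"
        }
    }
}
```

In the actual pipeline this parsing is performed by the OpenTelemetry Collector's Prometheus receiver; SkyWalking only receives the already-converted metrics over OTLP.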

## Setup

1. Set up a [BookKeeper cluster](https://bookkeeper.apache.org/docs/deployment/manual).
2. Set up [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/#kubernetes). For an example
OpenTelemetry Collector configuration, see the
[e2e test configuration](../../../../test/e2e-v2/cases/pulsar/otel-collector-config.yaml).
3. Configure the SkyWalking [OpenTelemetry receiver](opentelemetry-receiver.md).
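Step 3 includes making sure the BookKeeper rule files are enabled; the `receiver-otel` configuration change in this commit appends `pulsar/*` and `bookkeeper/*` to `enabledOtelMetricsRules`. The sketch below is a hypothetical, simplified model of how a comma-separated list with `dir/*` wildcards could select rule files. It is not the OAP's actual matching code, and the rule names in the usage check are derived from the file paths listed under Customizations.

```java
/** Hypothetical, simplified model of an enabledOtelMetricsRules-style selector. */
final class RuleSelector {
    /**
     * An entry such as "bookkeeper/*" enables every rule directly under "bookkeeper/";
     * any other entry must equal the rule name exactly.
     */
    static boolean isEnabled(String enabledRules, String ruleName) {
        for (String raw : enabledRules.split(",")) {
            String pattern = raw.trim();
            if (pattern.endsWith("/*")) {
                String dir = pattern.substring(0, pattern.length() - 1); // keeps the trailing '/'
                if (ruleName.startsWith(dir) && ruleName.indexOf('/', dir.length()) < 0) {
                    return true;
                }
            } else if (pattern.equals(ruleName)) {
                return true;
            }
        }
        return false;
    }
}
```

With this model, `RuleSelector.isEnabled("kafka/*,pulsar/*,bookkeeper/*", "bookkeeper/bookkeeper-node")` is `true`, while a list containing only `kafka/*` leaves the BookKeeper rules disabled.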

## BookKeeper Monitoring

BookKeeper monitoring provides multidimensional metrics monitoring of a BookKeeper cluster as a `Layer: BOOKKEEPER` `Service` in
the OAP. In each cluster, the nodes (bookies) are represented as `Instance`s.

### BookKeeper Cluster Supported Metrics

| Monitoring Panel | Metric Name | Description | Data Source |
|--------------------------------|------------------------------------------------------------------|---------------------------------------------------|---------------------|
| Bookie Ledgers Count | meter_bookkeeper_bookie_ledgers_count | The number of the bookie ledgers. | Bookkeeper Cluster |
| Bookie Ledger Writable Dirs | meter_bookkeeper_bookie_ledger_writable_dirs | The number of writable directories in the bookie. | Bookkeeper Cluster |
| Bookie Ledger Dir Usage | meter_bookkeeper_bookie_ledger_dir_data_bookkeeper_ledgers_usage | The disk usage of the bookie ledger directory. | Bookkeeper Cluster |
| Bookie Entries Count | meter_bookkeeper_bookie_entries_count | The number of the bookie write entries. | Bookkeeper Cluster |
| Bookie Write Cache Size | meter_bookkeeper_bookie_write_cache_size | The size of the bookie write cache (MB). | Bookkeeper Cluster |
| Bookie Write Cache Entry Count | meter_bookkeeper_bookie_write_cache_count | The entry count in the bookie write cache. | Bookkeeper Cluster |
| Bookie Read Cache Size | meter_bookkeeper_bookie_read_cache_size | The size of the bookie read cache (MB). | Bookkeeper Cluster |
| Bookie Read Cache Entry Count | meter_bookkeeper_bookie_read_cache_count | The entry count in the bookie read cache. | Bookkeeper Cluster |
| Bookie Read Rate | meter_bookkeeper_bookie_read_rate | The bookie read rate (bytes/s). | Bookkeeper Cluster |
| Bookie Write Rate | meter_bookkeeper_bookie_write_rate | The bookie write rate (bytes/s). | Bookkeeper Cluster |
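The Read Rate and Write Rate panels are expressed in bytes/s. Assuming they are derived from cumulative byte counters exposed by the bookie (the exact MAL expressions live in `otel-rules/bookkeeper/bookkeeper-cluster.yaml` and are not reproduced here), the underlying arithmetic is a per-second rate between two samples. The class below is a hypothetical illustration of that arithmetic, treating a decrease as a counter reset in the same way Prometheus-style rate functions do.

```java
/** Hypothetical helper: per-second rate between two samples of a monotonically increasing counter. */
final class CounterRate {
    static double perSecond(double previous, long previousMillis, double current, long currentMillis) {
        if (currentMillis <= previousMillis) {
            throw new IllegalArgumentException("samples must be strictly time-ordered");
        }
        double delta = current - previous;
        if (delta < 0) {
            // The counter went down, e.g. because the bookie restarted: count from zero.
            delta = current;
        }
        return delta / ((currentMillis - previousMillis) / 1000.0);
    }
}
```

For example, a counter moving from 1,000 to 7,000 bytes over 60 seconds yields 100 bytes/s.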

### BookKeeper Node Supported Metrics

| Monitoring Panel | Metric Name | Description | Data Source |
|-------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|--------------------|
| JVM Memory Pool Used | meter_bookkeeper_node_jvm_memory_pool_used | The usage of the bookie JVM memory pool. | Bookkeeper Bookie |
| JVM Memory | meter_bookkeeper_node_jvm_memory_used <br /> meter_bookkeeper_node_jvm_memory_committed <br /> meter_bookkeeper_node_jvm_memory_init | The usage of the bookie JVM memory. | Bookkeeper Bookie |
| JVM Threads | meter_bookkeeper_node_jvm_threads_current <br /> meter_bookkeeper_node_jvm_threads_daemon <br /> meter_bookkeeper_node_jvm_threads_peak <br /> meter_bookkeeper_node_jvm_threads_deadlocked | The count of the JVM threads. | Bookkeeper Bookie |
| GC Time | meter_bookkeeper_node_jvm_gc_collection_seconds_sum | Time spent in a given JVM garbage collector in seconds. | Bookkeeper Bookie |
| GC Count | meter_bookkeeper_node_jvm_gc_collection_seconds_count | The count of collections performed by a given JVM garbage collector. | Bookkeeper Bookie |
| Thread Executor Completed | meter_bookkeeper_node_thread_executor_completed | The count of the executor thread. | Bookkeeper Bookie |
| Thread Executor Tasks | meter_bookkeeper_node_thread_executor_tasks_completed <br /> meter_bookkeeper_node_thread_executor_tasks_rejected <br /> meter_bookkeeper_node_thread_executor_tasks_failed | The count of completed, rejected, and failed executor tasks. | Bookkeeper Bookie |
| Pooled Threads | meter_bookkeeper_node_high_priority_threads <br /> meter_bookkeeper_node_read_thread_pool_threads | The count of threads in the high-priority and read thread pools. | Bookkeeper Bookie |
| Pooled Threads Max Queue Size | meter_bookkeeper_node_high_priority_thread_max_queue_size <br /> meter_bookkeeper_node_read_thread_pool_max_queue_size | The maximum queue size of the high-priority and read thread pools. | Bookkeeper Bookie |
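GC Time and GC Count come from the `_sum` and `_count` series of the same JVM collector summary, so dividing their deltas over an interval gives the average pause per collection. The sketch below shows that derivation; the class and method names are hypothetical, and this ratio is not one of the panels listed above.

```java
/** Hypothetical helper: average GC pause from two scrapes of jvm_gc_collection_seconds_{sum,count}. */
final class GcStats {
    static double avgPauseMillis(double sumSecondsBefore, double countBefore,
                                 double sumSecondsAfter, double countAfter) {
        double collections = countAfter - countBefore;
        if (collections <= 0) {
            return 0.0; // no collections (or a counter reset) in the interval
        }
        // Convert the seconds delta to milliseconds, then average over the collections.
        return (sumSecondsAfter - sumSecondsBefore) * 1000.0 / collections;
    }
}
```

For instance, 0.5 s of additional GC time spread over 10 additional collections is an average pause of 50 ms.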

## Customizations

You can customize your own metrics/expression/dashboard panel.
The metrics definition and expression rules are found
in `otel-rules/bookkeeper/bookkeeper-cluster.yaml` and `otel-rules/bookkeeper/bookkeeper-node.yaml`.
The BookKeeper dashboard panel configurations are found in `/config/ui-initialized-templates/bookkeeper`.
67 changes: 67 additions & 0 deletions docs/en/setup/backend/backend-pulsar-monitoring.md
@@ -0,0 +1,67 @@
# Pulsar monitoring

SkyWalking leverages OpenTelemetry Collector to collect metrics data in Prometheus format from Pulsar and transfer the metrics to
[OpenTelemetry receiver](opentelemetry-receiver.md) and into the [Meter System](./../../concepts-and-designs/meter.md).
The Pulsar entity is represented as a `Service` in OAP, on the `Layer: PULSAR`.

## Data flow

1. Pulsar exposes metrics through a Prometheus endpoint.
2. OpenTelemetry Collector fetches metrics from the Pulsar cluster via the Prometheus receiver and pushes the metrics to the SkyWalking OAP Server via the OpenTelemetry gRPC exporter.
3. The SkyWalking OAP Server parses the expression with [MAL](../../concepts-and-designs/mal.md) to
filter/calculate/aggregate and store the results.
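Step 3 is where per-node samples become cluster-level values. As an assumed illustration (the real rules are in `otel-rules/pulsar/pulsar-cluster.yaml` and may aggregate differently), a cluster total such as Total Topics can be produced by summing the per-broker samples that share a cluster label. The `Sample` type and its fields below are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of a "sum by cluster" aggregation over per-broker samples. */
final class ClusterSum {
    static final class Sample {
        final String cluster;
        final String broker;
        final double value;

        Sample(String cluster, String broker, double value) {
            this.cluster = cluster;
            this.broker = broker;
            this.value = value;
        }
    }

    /** Drops the broker dimension and adds up the values for each cluster. */
    static Map<String, Double> sumByCluster(List<Sample> samples) {
        Map<String, Double> totals = new TreeMap<>();
        for (Sample s : samples) {
            totals.merge(s.cluster, s.value, Double::sum);
        }
        return totals;
    }
}
```

Two brokers reporting 10 and 5 topics in the same cluster therefore produce a cluster total of 15.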

## Setup

1. Set up a [Pulsar cluster](https://pulsar.apache.org/docs/3.1.x/getting-started-docker-compose/). A Pulsar cluster includes a Pulsar broker cluster and a BookKeeper bookie cluster.
2. Set up [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/getting-started/#kubernetes). For an example
OpenTelemetry Collector configuration, see the
[e2e test configuration](../../../../test/e2e-v2/cases/pulsar/otel-collector-config.yaml).
3. Configure the SkyWalking [OpenTelemetry receiver](opentelemetry-receiver.md).

## Pulsar Monitoring

Pulsar monitoring provides multidimensional metrics monitoring of a Pulsar cluster as a `Layer: PULSAR` `Service` in
the OAP. In each cluster, the nodes (brokers) are represented as `Instance`s.

### Pulsar Cluster Supported Metrics

| Monitoring Panel | Metric Name | Description | Data Source |
|----------------------|--------------------------------------------|--------------------------------------------------------------------------------------------------------|----------------|
| Total Topics | meter_pulsar_total_topics | The number of Pulsar topics in this cluster. | Pulsar Cluster |
| Total Subscriptions | meter_pulsar_total_subscriptions | The number of Pulsar subscriptions in this cluster. | Pulsar Cluster |
| Total Producers | meter_pulsar_total_producers | The number of active producers connected to this cluster. | Pulsar Cluster |
| Total Consumers | meter_pulsar_total_consumers | The number of active consumers connected to this cluster. | Pulsar Cluster |
| Message Rate In | meter_pulsar_message_rate_in | The total message rate coming into this cluster (messages per second). | Pulsar Cluster |
| Message Rate Out | meter_pulsar_message_rate_out | The total message rate going out from this cluster (messages per second). | Pulsar Cluster |
| Throughput In | meter_pulsar_throughput_in | The total throughput coming into this cluster (bytes per second). | Pulsar Cluster |
| Throughput Out | meter_pulsar_throughput_out | The total throughput going out from this cluster (bytes per second). | Pulsar Cluster |
| Storage Size | meter_pulsar_storage_size | The total storage size of all topics in this broker (in bytes). | Pulsar Cluster |
| Storage Logical Size | meter_pulsar_storage_logical_size | The storage size of all topics in this broker without replicas (in bytes). | Pulsar Cluster |
| Storage Write Rate | meter_pulsar_storage_write_rate | The total message batches (entries) written to the storage for this broker (message batches per second). | Pulsar Cluster |
| Storage Read Rate | meter_pulsar_storage_read_rate | The total message batches (entries) read from the storage for this broker (message batches per second). | Pulsar Cluster |
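Two of the panels above can be combined into a quick sanity check: dividing Throughput In (bytes/s) by Message Rate In (messages/s) gives the average inbound message size in bytes. This is a derived figure, not one of the listed metrics, and the helper below is a hypothetical sketch.

```java
/** Hypothetical helper deriving a figure from two Pulsar cluster panels. */
final class PulsarDerived {
    /** Average inbound message size in bytes: (bytes/s) / (messages/s). Returns 0 when idle. */
    static double avgMessageSizeBytes(double throughputInBytesPerSec, double messageRateInPerSec) {
        if (messageRateInPerSec <= 0) {
            return 0.0; // no inbound messages, so no meaningful average
        }
        return throughputInBytesPerSec / messageRateInPerSec;
    }
}
```

For example, 204,800 bytes/s at 200 messages/s corresponds to an average message size of 1,024 bytes.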


### Pulsar Node Supported Metrics


| Monitoring Panel | Metric Name | Description | Data Source |
|---------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------|----------------|
| Active Connections | meter_pulsar_broker_active_connections | The number of active connections. | Pulsar Broker |
| Total Connections | meter_pulsar_broker_total_connections | The total number of connections. | Pulsar Broker |
| Connection Create Success Count | meter_pulsar_broker_connection_create_success_count | The number of successfully created connections. | Pulsar Broker |
| Connection Create Fail Count | meter_pulsar_broker_connection_create_fail_count | The number of failed connections. | Pulsar Broker |
| Connection Closed Total Count | meter_pulsar_broker_connection_closed_total_count | The total number of closed connections. | Pulsar Broker |
| JVM Buffer Pool Used | meter_pulsar_broker_jvm_buffer_pool_used_bytes | The usage of the JVM buffer pool. | Pulsar Broker |
| JVM Memory Pool Used | meter_pulsar_broker_jvm_memory_pool_used | The usage of the JVM memory pool. | Pulsar Broker |
| JVM Memory | meter_pulsar_broker_jvm_memory_init <br /> meter_pulsar_broker_jvm_memory_used <br /> meter_pulsar_broker_jvm_memory_committed | The usage of the JVM memory. | Pulsar Broker |
| JVM Threads | meter_pulsar_broker_jvm_threads_current <br /> meter_pulsar_broker_jvm_threads_daemon <br /> meter_pulsar_broker_jvm_threads_peak <br /> meter_pulsar_broker_jvm_threads_deadlocked | The count of the JVM threads. | Pulsar Broker |
| GC Time | meter_pulsar_broker_jvm_gc_collection_seconds_sum | Time spent in a given JVM garbage collector in seconds. | Pulsar Broker |
| GC Count | meter_pulsar_broker_jvm_gc_collection_seconds_count | The count of a given JVM garbage collector. | Pulsar Broker |
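Similarly, the connection counters can be turned into a failure ratio for an interval. The sketch assumes Connection Create Success Count and Connection Create Fail Count behave as cumulative counters, which the table does not state; the class and method names are hypothetical.

```java
/** Hypothetical helper: fraction of failed connection attempts between two scrapes. */
final class ConnectionStats {
    static double failureRatio(long successBefore, long failBefore, long successAfter, long failAfter) {
        long succeeded = successAfter - successBefore;
        long failed = failAfter - failBefore;
        long attempts = succeeded + failed;
        // No attempts in the interval means there is nothing to report.
        return attempts <= 0 ? 0.0 : (double) failed / attempts;
    }
}
```

Between two scrapes where successes grow from 100 to 190 and failures from 0 to 10, the failure ratio is 10 / 100 = 0.1.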

## Customizations

You can customize your own metrics/expression/dashboard panel.
The metrics definition and expression rules are found
in `otel-rules/pulsar/pulsar-cluster.yaml` and `otel-rules/pulsar/pulsar-broker.yaml`.
The Pulsar dashboard panel configurations are found in `ui-initialized-templates/pulsar`.
4 changes: 4 additions & 0 deletions docs/menu.yml
@@ -259,12 +259,16 @@ catalog:
path: "/en/setup/backend/backend-elasticsearch-monitoring"
- name: "MongoDB"
path: "/en/setup/backend/backend-mongodb-monitoring"
- name: "BookKeeper"
path: "/en/setup/backend/backend-bookkeeper-monitoring"
- name: "MQ Monitoring"
catalog:
- name: "RabbitMQ"
path: "/en/setup/backend/backend-rabbitmq-monitoring"
- name: "Kafka"
path: "/en/setup/backend/backend-kafka-monitoring"
- name: "Pulsar"
path: "/en/setup/backend/backend-pulsar-monitoring"
- name: "Self Observability"
catalog:
- name: "OAP self telemetry"
@@ -193,7 +193,19 @@ public enum Layer {
/**
* Kafka is a distributed streaming platform that is used publish and subscribe to streams of records.
*/
KAFKA(31, true);
KAFKA(31, true),

/**
* Pulsar is a distributed pub-sub messaging platform that provides high-performance, durable messaging.
* It is used to publish and subscribe to streams of records.
* Pulsar supports scalable and fault-tolerant messaging, making it suitable for use in distributed systems.
*/
PULSAR(32, true),

/**
* A scalable, fault-tolerant, and low-latency storage service optimized for real-time workloads.
*/
BOOKKEEPER(33, true);

private final int value;
/**
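The new constants follow the existing pattern: each `Layer` gets a new numeric value (32 for `PULSAR`, 33 for `BOOKKEEPER`) plus a boolean second argument. The stand-alone enum below is a hypothetical mirror of just these three constants, showing why the numeric values must stay unique: a reverse lookup from value to layer only works if no two layers share a number. The `fromValue` method and the field name `normal` for the boolean are assumptions; the rest of the real enum is not shown in this hunk.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in mirroring only the constants shown in this hunk. */
enum LayerSketch {
    KAFKA(31, true),
    PULSAR(32, true),
    BOOKKEEPER(33, true);

    private static final Map<Integer, LayerSketch> BY_VALUE = new HashMap<>();

    static {
        // Build the reverse index once; a duplicate value would silently overwrite an entry.
        for (LayerSketch layer : values()) {
            BY_VALUE.put(layer.value, layer);
        }
    }

    private final int value;
    private final boolean normal; // assumed meaning of the second constructor argument

    LayerSketch(int value, boolean normal) {
        this.value = value;
        this.normal = normal;
    }

    int value() {
        return value;
    }

    boolean isNormal() {
        return normal;
    }

    static LayerSketch fromValue(int value) {
        LayerSketch layer = BY_VALUE.get(value);
        if (layer == null) {
            throw new IllegalArgumentException("Unknown layer value: " + value);
        }
        return layer;
    }
}
```

The constant's `name()` (for example `"PULSAR"`) is the same string that `UITemplateInitializer` collects through `Layer.PULSAR.name()` and `Layer.BOOKKEEPER.name()`.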
@@ -70,6 +70,8 @@ public class UITemplateInitializer {
Layer.RABBITMQ.name(),
Layer.MONGODB.name(),
Layer.KAFKA.name(),
Layer.PULSAR.name(),
Layer.BOOKKEEPER.name(),
"custom"
};
private final UITemplateManagementService uiTemplateManagementService;
@@ -340,7 +340,7 @@ receiver-otel:
selector: ${SW_OTEL_RECEIVER:default}
default:
enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics,otlp-logs"}
enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*"}
enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*,bookkeeper/*"}

receiver-zipkin:
selector: ${SW_RECEIVER_ZIPKIN:-}
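Each setting above uses the `${ENV_NAME:default}` form, so the new `pulsar/*` and `bookkeeper/*` rules are enabled by default, and anyone who already overrides `SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES` needs to append them to their own value. The sketch below is a hypothetical resolver for that syntax, written only to show the lookup order (environment value first, then the inline default). It is not SkyWalking's implementation, and it keeps the default text verbatim, so handling of surrounding quotes is out of scope.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical resolver for ${NAME:default} placeholders. */
final class PlaceholderResolver {
    // Group 1 is the variable name, group 2 the default (everything up to the closing brace).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+):([^}]*)\\}");

    static String resolve(String raw, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(raw);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Prefer the environment value; fall back to the inline default.
            String replacement = env.getOrDefault(m.group(1), m.group(2));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Passing `System.getenv()` as the map would resolve against the real process environment; the usage checks use fixed maps instead.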