Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature pulsar monitoring #11339

Merged
merged 22 commits into from
Oct 31, 2023
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/skywalking.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -641,6 +641,9 @@ jobs:
config: test/e2e-v2/cases/kafka/kafka-monitoring/e2e.yaml
- name: MQE Service
config: test/e2e-v2/cases/mqe/e2e.yaml
- name: Pulsar
// TODO: test/e2e-v2/cases/pulsar/e2e.yaml
liangyepianzhou marked this conversation as resolved.
Show resolved Hide resolved
config: test/e2e-v2/cases/pulsar/e2e.yaml

- name: UI Menu BanyanDB
config: test/e2e-v2/cases/menu/banyandb/e2e.yaml
Expand Down
1 change: 1 addition & 0 deletions docs/en/changes/changes.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
* ElasticSearchClient: Add `deleteById` API.
* Fix Custom alarm rules are overwritten by 'resource/alarm-settings.yml'
* Support Kafka Monitoring.
* Support Pulsar Monitoring.
wu-sheng marked this conversation as resolved.
Show resolved Hide resolved
* [Breaking Change] Elasticsearch storage merge all management data indices into one index `management`,
including `ui_template,ui_menu,continuous_profiling_policy`.

Expand Down
3 changes: 3 additions & 0 deletions docs/menu.yml
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,9 @@ catalog:
path: "/en/setup/backend/backend-rabbitmq-monitoring"
- name: "Kafka"
path: "/en/setup/backend/backend-kafka-monitoring"
// TODO: backend-pulsar-monitoring
- name: "Pulsar"
path: "/en/setup/backend/backend-pulsar-monitoring"
- name: "Self Observability"
liangyepianzhou marked this conversation as resolved.
Show resolved Hide resolved
catalog:
- name: "OAP self telemetry"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -193,7 +193,14 @@ public enum Layer {
/**
* Kafka is a distributed streaming platform that is used publish and subscribe to streams of records.
*/
KAFKA(31, true);
KAFKA(31, true),

/**
* Pulsar is a distributed pub-sub messaging platform that provides high-performance, durable messaging.
* It is used to publish and subscribe to streams of records.
* Pulsar supports scalable and fault-tolerant messaging, making it suitable for use in distributed systems.
*/
PULSAR(32, true);

private final int value;
/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@ public class UITemplateInitializer {
Layer.RABBITMQ.name(),
Layer.MONGODB.name(),
Layer.KAFKA.name(),
Layer.PULSAR.name(),
"custom"
};
private final UITemplateManagementService uiTemplateManagementService;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -341,7 +341,7 @@ receiver-otel:
selector: ${SW_OTEL_RECEIVER:default}
default:
enabledHandlers: ${SW_OTEL_RECEIVER_ENABLED_HANDLERS:"otlp-metrics,otlp-logs"}
enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*"}
enabledOtelMetricsRules: ${SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES:"apisix,k8s/*,istio-controlplane,vm,mysql/*,postgresql/*,oap,aws-eks/*,windows,aws-s3/*,aws-dynamodb/*,aws-gateway/*,redis/*,elasticsearch/*,rabbitmq/*,mongodb/*,kafka/*,pulsar/*"}

receiver-zipkin:
selector: ${SW_RECEIVER_ZIPKIN:-}
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
filter: "{ tags -> tags.job_name == 'bookkeeper-monitoring' }" # The OpenTelemetry job name
expSuffix: tag({tags -> tags.cluster = 'bookkeeper::' + tags.cluster}).service(['cluster'], Layer.BOOKKEEPER)
metricPrefix: meter_bookkeeper
metricsRules:

- name: server_status
exp: bookie_SERVER_STATUS.sum(['cluster','bookie'])

- name: add_entry_count
exp: bookkeeper_server_ADD_ENTRY_count.sum(['cluster','bookie'])

- name: read_entry_count
exp: bookkeeper_server_READ_ENTRY_count.sum(['cluster','bookie'])

- name: write_bytes
exp: bookie_WRITE_BYTES.sum(['cluster','bookie'])

- name: read_bytes
exp: bookie_READ_BYTES.sum(['cluster','bookie'])

- name: add_entry_request_latency
exp: bookkeeper_server_ADD_ENTRY_REQUEST.sum(['cluster','bookie'])

- name: read_entry_request_latency
exp: bookkeeper_server_READ_ENTRY_REQUEST.sum(['cluster','bookie'])

- name: bookie_read_threadpool_queue
exp: bookkeeper_server_BookieReadThreadPool_queue_{thread_id}.sum(['cluster','bookie'])

- name: bookie_read_threadpool_task_queued_time
exp: bookkeeper_server_BookieReadThreadPool_task_queued.sum(['cluster','bookie'])

- name: bookie_read_threadpool_task_execution_time
exp: bookkeeper_server_BookieReadThreadPool_task_execution.sum(['cluster','bookie'])

- name: journal_sync_count
exp: bookie_journal_JOURNAL_SYNC_count.sum(['cluster','bookie'])

- name: journal_queue_size
exp: bookie_journal_JOURNAL_QUEUE_SIZE.sum(['cluster','bookie'])

- name: journal_force_write_queue_size
exp: bookie_journal_JOURNAL_FORCE_WRITE_QUEUE_SIZE.sum(['cluster','bookie'])

- name: journal_cb_queue_size
exp: bookie_journal_JOURNAL_CB_QUEUE_SIZE.sum(['cluster','bookie'])

- name: journal_add_entry_latency
exp: bookie_journal_JOURNAL_ADD_ENTRY.sum(['cluster','bookie'])

- name: journal_sync_latency
exp: bookie_journal_JOURNAL_SYNC.sum(['cluster','bookie'])

- name: journal_creation_latency
exp: bookie_journal_JOURNAL_CREATION_LATENCY.sum(['cluster','bookie'])

- name: ledgers_count
exp: bookie_ledgers_count.sum(['cluster','bookie'])

- name: entries_count
exp: bookie_entries_count.sum(['cluster','bookie'])

- name: write_cache_size
exp: bookie_write_cache_size.sum(['cluster','bookie'])

- name: read_cache_size
exp: bookie_read_cache_size.sum(['cluster','bookie'])

- name: deleted_ledger_count
exp: bookie_DELETED_LEDGER_COUNT.sum(['cluster','bookie'])

- name: ledger_writable_dirs
exp: bookie_ledger_writable_dirs.sum(['cluster','bookie'])

- name: bookie_flush_latency
exp: bookie_flush.sum(['cluster','bookie'])

- name: throttled_write_requests
exp: bookie_throttled_write_requests.sum(['cluster','bookie'])
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@
filter: "{ tags -> tags.job_name == 'pulsar-monitoring' }" # The OpenTelemetry job name
expSuffix: tag({tags -> tags.cluster = 'pulsar::' + tags.cluster}).instance(['cluster'], ['broker'], Layer.PULSAR)
metricPrefix: meter_pulsar_broker

# Metrics Rules
metricsRules:
# Cache Metrics
- name: cache_evictions
exp: pulsar_ml_cache_evictions.sum(['cluster', 'broker'])
- name: cache_inserted_entries
exp: pulsar_ml_cache_inserted_entries_total.sum(['cluster', 'broker'])
- name: cache_evicted_entries
exp: pulsar_ml_cache_evicted_entries_total.sum(['cluster', 'broker'])
- name: cache_entries
exp: pulsar_ml_cache_entries.sum(['cluster', 'broker'])
- name: cache_hits_rate
exp: pulsar_ml_cache_hits_rate.sum(['cluster', 'broker'])
- name: cache_hits_throughput
exp: pulsar_ml_cache_hits_throughput.sum(['cluster', 'broker'])
- name: cache_misses_rate
exp: pulsar_ml_cache_misses_rate.sum(['cluster', 'broker'])
- name: cache_misses_throughput
exp: pulsar_ml_cache_misses_throughput.sum(['cluster', 'broker'])

# Connection Metrics
- name: active_connections
exp: pulsar_active_connections.sum(['cluster', 'broker'])
- name: total_connections
exp: pulsar_connection_created_total_count.sum(['cluster', 'broker'])

# Topic and Subscription Metrics
- name: total_topics
exp: pulsar_broker_topics_count.sum(['cluster', 'broker'])
- name: total_subscriptions
exp: pulsar_broker_subscriptions_count.sum(['cluster', 'broker'])

# Producer and Consumer Metrics
- name: total_producers
exp: pulsar_broker_producers_count.sum(['cluster', 'broker'])
- name: total_consumers
exp: pulsar_broker_consumers_count.sum(['cluster', 'broker'])

# Message Rate and Throughput Metrics
- name: message_rate_in
exp: pulsar_broker_rate_in.sum(['cluster', 'broker'])
- name: message_rate_out
exp: pulsar_broker_rate_out.sum(['cluster', 'broker'])
- name: throughput_in
exp: pulsar_broker_throughput_in.sum(['cluster', 'broker'])
- name: throughput_out
exp: pulsar_broker_throughput_out.sum(['cluster', 'broker'])

# JVM Metrics
- name: jvm_memory_used
exp: jvm_memory_used_bytes.sum(['cluster', 'broker'])
- name: jvm_memory_committed
exp: jvm_memory_committed_bytes.sum(['cluster', 'broker'])
- name: jvm_memory_max
exp: jvm_memory_max_bytes.sum(['cluster', 'broker'])
- name: jvm_gc_collection_seconds
exp: jvm_gc_collection_seconds.sum(['cluster', 'broker']).rate('PT1M')
- name: jvm_threads_current
exp: jvm_threads_current.sum(['cluster', 'broker'])
- name: jvm_threads_daemon
exp: jvm_threads_daemon.sum(['cluster', 'broker'])
- name: jvm_threads_peak
exp: jvm_threads_peak.sum(['cluster', 'broker'])
- name: jvm_threads_started_total
exp: jvm_threads_started_total.sum(['cluster', 'broker']).rate('PT1M')
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@

filter: "{ tags -> tags.job_name == 'pulsar-monitoring' }" # The OpenTelemetry job name
expSuffix: tag({tags -> tags.cluster = 'pulsar::' + tags.cluster}).service(['cluster'], Layer.PULSAR)
metricPrefix: meter_pulsar

# Metrics Rules
metricsRules:
# Connection Metrics
- name: active_connections
exp: pulsar_active_connections.sum(['cluster', 'broker'])
- name: total_connections
exp: pulsar_connection_created_total_count.sum(['cluster', 'broker'])
- name: connection_create_success_count
exp: pulsar_connection_create_success_count.sum(['cluster', 'broker'])
- name: connection_create_fail_count
exp: pulsar_connection_create_fail_count.sum(['cluster', 'broker'])
- name: connection_closed_total_count
exp: pulsar_connection_closed_total_count.sum(['cluster', 'broker'])
- name: throttled_connections
exp: pulsar_broker_throttled_connections.sum(['cluster', 'broker'])
- name: throttled_connections_global_limit
exp: pulsar_broker_throttled_connections_global_limit.sum(['cluster', 'broker'])

# Topic and Subscription Metrics
- name: total_topics
exp: pulsar_broker_topics_count.sum(['cluster', 'broker'])
- name: total_subscriptions
exp: pulsar_broker_subscriptions_count.sum(['cluster', 'broker'])

# Producer and Consumer Metrics
- name: total_producers
exp: pulsar_broker_producers_count.sum(['cluster', 'broker'])
- name: total_consumers
exp: pulsar_broker_consumers_count.sum(['cluster', 'broker'])

# Message Rate and Throughput Metrics
- name: message_rate_in
exp: pulsar_broker_rate_in.sum(['cluster', 'broker'])
- name: message_rate_out
exp: pulsar_broker_rate_out.sum(['cluster', 'broker'])
- name: throughput_in
exp: pulsar_broker_throughput_in.sum(['cluster', 'broker'])
- name: throughput_out
exp: pulsar_broker_throughput_out.sum(['cluster', 'broker'])
Loading
Loading