Google Cloud Platform monitoring with Zabbix
Many enterprises use Zabbix as their monitoring tool. When moving services to Google Cloud Platform (GCP), it is useful to bring some of the critical GCP metrics directly into Zabbix. This article describes how to achieve that.
Slides: http://www.slideshare.net/maxkuzkin/google-cloud-platform-monitoring-with-zabbix
Zabbix is an open-source distributed monitoring solution which, at a high level, can be visualized with the following diagram:
Where:
- Host is any monitored device on which an Agent is installed (or which is reachable through SNMP, JMX, or IPMI interfaces).
  - Examples: Linux/Windows server, router, etc.
- Item is a particular metric configured on a particular Host.
  - Examples: system.cpu.load[all,avg5], system.cpu.num[online], script[echo,hello,world], net.tcp.service[http] …
- Key is the type of Item that can be gathered from the Host.
  - Examples: system.cpu.load[,], system.cpu.num[], script[,,...], net.tcp.service[,,] …
The simplest way to extend Zabbix is called "External checks": basically, a script that Zabbix launches and whose value it expects to be returned on stdout. An example of an external check configuration is shown below:
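On the script side the contract is small: Zabbix passes the parameters from the item key as command-line arguments and stores whatever the script prints to stdout as the item value. A minimal sketch of such a script, using a hypothetical file name `hello.sh`:

```shell
#!/usr/bin/env sh
# Hypothetical external check "hello.sh".
# An item key such as hello.sh[hello,world] makes Zabbix run
#   hello.sh hello world
# and whatever is printed to stdout becomes the item value.
hello_check() {
  # Join all arguments with single spaces and print them on one line.
  printf '%s\n' "$*"
}

hello_check "$@"
```

Placed in the ExternalScripts folder and made executable, `./hello.sh hello world` prints `hello world`.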
Google Cloud Platform offers the Stackdriver monitoring service, which provides monitoring, logging, and diagnostics. For integration with Zabbix we are mostly interested in its API, as shown in the following diagram:
At its simplest, the integration of Zabbix with Google Cloud Platform for monitoring purposes can be visualized as:
Here, a Zabbix external check calls the gcpmetrics utility, which gathers data from the Google Cloud Monitoring API.
Follow these steps to track HTTP 5xx response statuses (server errors) of your Google Cloud Platform service in Zabbix through the Google Cloud Monitoring API.
By default, Zabbix terminates any external check that runs longer than 3 seconds. The Google Monitoring API may take 5-10+ seconds to respond, depending on the query, so we suggest setting the Zabbix Timeout to 30 seconds (the maximum allowed value) in the Zabbix server configuration:
```
### Option: Timeout
# Specifies how long we wait for agent,
# SNMP device or external check (in seconds).
#
# Mandatory: no
# Range: 1-30
# Default:
# Timeout=3
Timeout=30
```
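If you prefer to script this change, a sketch along these lines could work. It assumes GNU `sed` (for in-place `-i` editing) and that you pass the configuration path yourself; `set_zabbix_timeout` is a hypothetical helper name. The Zabbix server has to be restarted afterwards to pick up the new value.

```shell
#!/usr/bin/env sh
# Hypothetical helper: set Timeout=<seconds> in a Zabbix config file.
# Assumes GNU sed (-i edits the file in place).
set_zabbix_timeout() {
  conf="$1"
  seconds="$2"

  # The range shown in the config comment is 1-30 seconds.
  if [ "$seconds" -lt 1 ] || [ "$seconds" -gt 30 ]; then
    echo "Timeout must be between 1 and 30 seconds" >&2
    return 1
  fi

  if grep -q '^Timeout=' "$conf"; then
    # An active setting exists: rewrite it in place.
    sed -i "s/^Timeout=.*/Timeout=$seconds/" "$conf"
  else
    # Only the commented default (# Timeout=3) exists: append a setting.
    printf 'Timeout=%s\n' "$seconds" >> "$conf"
  fi
}
```

For example, `set_zabbix_timeout /etc/zabbix/zabbix_server.conf 30`, where the path is only a common package default and may differ on your system.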
Install gcpmetrics:
```
$ pip install --upgrade gcpmetrics
[...]
```
And verify that the installation was successful:
```
$ gcpmetrics --version
```
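The scripts created later call `gcpmetrics` by name, but the environment of the Zabbix server process may not have the same PATH as your shell. Referencing the tool by its absolute path can avoid surprises; a sketch with a hypothetical `resolve_cmd` helper that prints that path:

```shell
#!/usr/bin/env sh
# Hypothetical helper: print where a command resolves to, or fail.
# For external programs `command -v` prints an absolute path;
# for shell builtins it prints only the name.
resolve_cmd() {
  found=$(command -v "$1") || {
    echo "$1 not found on PATH" >&2
    return 1
  }
  printf '%s\n' "$found"
}
```

Running `resolve_cmd gcpmetrics` shows the path you could use in place of the bare `gcpmetrics` command.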
The location of the ExternalScripts folder is defined in the Zabbix server configuration file, for example:
```
### Option: ExternalScripts
# Full path to location of external scripts.
# Default depends on compilation options.
#
# Mandatory: no
# Default:
# ExternalScripts=${datadir}/zabbix/externalscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
```
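To look the folder up from a script instead of by eye, one option is a small hypothetical helper that prints the active (uncommented) `ExternalScripts` value. It is a sketch that ignores commented lines and, if the option appears more than once, simply prints the last occurrence:

```shell
#!/usr/bin/env sh
# Hypothetical helper: print the active ExternalScripts value.
# Lines starting with "#" do not match the pattern, so the commented
# default is skipped; tail keeps the last occurrence if there are several.
external_scripts_dir() {
  sed -n 's/^ExternalScripts=//p' "$1" | tail -n 1
}
```

For example, `cd "$(external_scripts_dir /etc/zabbix/zabbix_server.conf)"`, where the configuration path is an assumption about your installation.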
Switch to that location and initialize the gcpmetrics configuration:
```
$ cd /usr/lib/zabbix/externalscripts
$ gcpmetrics --init-config ./gcp
Creating folder: ./gcp
Creating configuration file: ./gcp/config.yaml
Creating key file: ./gcp/keyfile.json
Configuration created, use --config ./gcp/config.yaml to reference it.
```
In ./gcp/config.yaml, replace project and service with your Google Cloud Platform identifiers:
```
keyfile: ./keyfile.json
project: {my-unique-project-id} # REPLACE WITH YOUR ID
service: {default} # REPLACE WITH YOUR ID
[...]
```
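The same substitution can be scripted. This sketch assumes GNU `sed` and identifiers made only of letters, digits and hyphens (so they need no escaping); `set_gcp_ids` is a hypothetical helper, and it rewrites the whole `project:` and `service:` lines, placeholder comments included:

```shell
#!/usr/bin/env sh
# Hypothetical helper: fill in project and service in config.yaml.
# Assumes GNU sed (-i) and IDs without characters special to sed.
set_gcp_ids() {
  conf="$1"
  project="$2"
  service="$3"
  sed -i \
    -e "s/^project: .*/project: $project/" \
    -e "s/^service: .*/service: $service/" \
    "$conf"
}
```

For example, `set_gcp_ids ./gcp/config.yaml my-project-id default`, with both IDs standing in for your own.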
Download the JSON key file of your GCP service account, as shown below, and save it as ./gcp/keyfile.json in place of the file created by --init-config:
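The key file grants access to your project, so it may be worth making it readable only by the account that runs the checks. A sketch, assuming that account is named `zabbix` (a common default for package installs; pass another user if yours differs). Changing the owner to another user normally requires root:

```shell
#!/usr/bin/env sh
# Hypothetical helper: restrict a service account key file to one owner.
# The default owner "zabbix" is an assumption about your installation.
secure_keyfile() {
  keyfile="$1"
  owner="${2:-zabbix}"
  # Hand the file to the owner, then allow read/write for that owner only.
  chown "$owner" "$keyfile" && chmod 600 "$keyfile"
}
```

For example, as root: `secure_keyfile /usr/lib/zabbix/externalscripts/gcp/keyfile.json`.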
Create two scripts in the ExternalScripts folder and make them executable:
```
$ cd /usr/lib/zabbix/externalscripts
$ touch ./tm-http5xx-absolute.sh
$ touch ./tm-http5xx-relative.sh
$ chmod +x ./tm-http5xx-absolute.sh ./tm-http5xx-relative.sh
```
Fill both of them with the following content:
```sh
#!/usr/bin/env sh
# Query the Monitoring API through gcpmetrics using the http_response_5xx_sum preset.
gcpmetrics --config /usr/lib/zabbix/externalscripts/gcp/config.yaml --preset http_response_5xx_sum
```
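Before wiring the scripts into Zabbix, it can help to run one by hand and confirm that it prints a single number, which is what a numeric item expects. The sketch assumes gcpmetrics prints one numeric value for this preset; `check_numeric_output` is a hypothetical helper:

```shell
#!/usr/bin/env sh
# Hypothetical helper: run a command and succeed only if it exits 0 and
# prints exactly one line holding an integer or decimal number.
check_numeric_output() {
  out=$("$@") || return 1
  printf '%s\n' "$out" | awk '
    { n++; if ($0 !~ /^-?[0-9]+(\.[0-9]+)?$/) bad = 1 }
    END { exit !(n == 1 && !bad) }'
}
```

For example, `check_numeric_output ./tm-http5xx-absolute.sh && echo "looks usable"`.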
Note: two scripts are needed only for learning purposes, to show both the Absolute and the Delta behavior of Zabbix, because Zabbix does not allow two items on the same host to use the same key, and an external check's key is built from the script name.
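To illustrate the difference: an item that stores values as is keeps each reading unchanged, while a Delta item stores the change from the previous reading (Zabbix also offers a per-second variant). The awk sketch below only mimics the simple-change arithmetic on a series of readings; it does not model how Zabbix itself handles the first value or counter resets.

```shell
#!/usr/bin/env sh
# Illustration only: print the change between consecutive readings.
# The first reading has no predecessor, so no delta is printed for it.
simple_change() {
  awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}
```

For example, `printf '10\n12\n15\n' | simple_change` prints `2` and `3`.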
Now create two Zabbix items that correspond to the scripts we have just created:
With the items created, you can now visualize them by creating a new graph, as shown below.
After some time (provided your service actually returns 5xx statuses during that period), you will see something similar to this: