This repository has been archived by the owner on Dec 5, 2024. It is now read-only.

Google Cloud Platform monitoring with Zabbix

Max Kuzkin edited this page Sep 2, 2016 · 17 revisions

Overview

Many enterprises use Zabbix as a monitoring tool. When moving services to the Google Cloud Platform it is handy to integrate some of the critical metrics of GCP directly into Zabbix. This article describes how to achieve that.

Slides: http://www.slideshare.net/maxkuzkin/google-cloud-platform-monitoring-with-zabbix

Zabbix

Zabbix is an open source distributed monitoring solution. At a high level it can be visualized with the following diagram:

Where:

  1. Host is any monitored device on which an Agent (or an SNMP, JMX, or IPMI interface) is installed.
  • Examples: Linux/Windows server, router, etc.
  2. Item is a particular metric configured on a particular Host.
  • Examples: system.cpu.load[all,avg5], system.cpu.num[online], script[echo,hello,world], net.tcp.service[http] …
  3. Key is a type of Item that can be gathered from the Host.
  • Examples: system.cpu.load[,], system.cpu.num[], script[,,...], net.tcp.service[,,] …

Zabbix Extensibility

The simplest way to extend Zabbix is an "External Check": a script that Zabbix launches and that is expected to return a value on stdout. An example of an external check configuration is shown below:
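As a rough illustration of the mechanism described above, here is a minimal sketch of an external check script. The script and the count_lines function are hypothetical and not part of this guide's setup. It assumes Zabbix passes the parameters from the item key as positional arguments and reads the value from stdout.

```shell
#!/usr/bin/env sh
# Hypothetical external check: prints exactly one numeric value on stdout.
# Assumption: the parameters from the item key arrive as positional arguments.

# count_lines FILE: print the number of lines in FILE, or 0 if it is unreadable.
count_lines() {
    if [ -r "$1" ]; then
        # Strip the padding some wc implementations add around the number.
        wc -l < "$1" | tr -d ' '
    else
        echo 0
    fi
}

# The first positional argument is the file to inspect (empty if not given).
count_lines "${1:-}"
```

Saved under a hypothetical name such as count-lines.sh, it would be referenced from an item key of the form count-lines.sh[/path/to/file].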

Google Cloud Monitoring

Google Cloud Platform includes the Stackdriver monitoring service, which provides powerful monitoring, logging, and diagnostics. For the purposes of integration with Zabbix we're mostly interested in its API, as shown in the following diagram:

Zabbix + Google Cloud Platform

The simplest way to integrate Zabbix with Google Cloud Platform for monitoring purposes could be visualized as:

Here the Zabbix external check calls the gcpmetrics utility to gather data returned by the Google Cloud Monitoring API.

Configuration Guide

Follow these steps to track HTTP 5xx response statuses (server errors), as reported by the Google Cloud Monitoring API, in Zabbix.

1. Configure Timeouts of Zabbix

A default Zabbix installation terminates any script that runs longer than 3 seconds. The Google Monitoring API may take 5-10 seconds or more to respond, depending on the query. We suggest setting the Zabbix Timeout to 30 seconds, the maximum allowed:

### Option: Timeout
#       Specifies how long we wait for agent,
#       SNMP device or external check (in seconds).
#
# Mandatory: no
# Range: 1-30
# Default:
# Timeout=3

Timeout=30
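To double-check which value is actually set, a small helper like the one below can read it back from the configuration file. This is only a sketch. The function name effective_timeout is made up, the path in the usage comment is an assumption about a typical installation, and the helper simply reports the last uncommented Timeout line it finds.

```shell
# effective_timeout CONF: print the Timeout value set in CONF.
# Falls back to 3, the default stated in the configuration comments,
# when no uncommented Timeout line is present.
effective_timeout() {
    value=$(sed -n 's/^[[:space:]]*Timeout[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1)
    echo "${value:-3}"
}

# Example (the path is an assumption, adjust it to your installation):
# effective_timeout /etc/zabbix/zabbix_server.conf
```

Commented lines such as "# Timeout=3" are ignored because the pattern only matches lines that start with the option name.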

2. Install gcpmetrics

$ pip install --upgrade gcpmetrics
[...]

Then verify that the installation was successful:

$ gcpmetrics --version

3. Configure ExternalScripts

The location of the ExternalScripts folder is defined in the Zabbix configuration file, for example:

### Option: ExternalScripts
#       Full path to location of external scripts.
#       Default depends on compilation options.
#
# Mandatory: no
# Default:
# ExternalScripts=${datadir}/zabbix/externalscripts

ExternalScripts=/usr/lib/zabbix/externalscripts

Switch to that location and initialize the gcpmetrics configuration:

$ cd /usr/lib/zabbix/externalscripts
$ gcpmetrics --init-config ./gcp
Creating folder: ./gcp
Creating configuration file: ./gcp/config.yaml
Creating key file: ./gcp/keyfile.json
Configuration created, use --config ./gcp/config.yaml to reference it.

4. Edit config.yaml

Replace project and service with your Google Cloud Platform identifiers:

keyfile: ./keyfile.json
project: {my-unique-project-id} # REPLACE WITH YOUR ID
service: {default} # REPLACE WITH YOUR ID

[...]

5. Update keyfile.json

Download your GCP service account keyfile, as shown below, and use it to replace the generated ./gcp/keyfile.json:

6. Create scripts

Create two scripts in the ExternalScripts folder and make them executable:

$ cd /usr/lib/zabbix/externalscripts
$ touch ./tm-http5xx-absolute.sh
$ touch ./tm-http5xx-relative.sh
$ chmod +x ./tm-http5xx-absolute.sh ./tm-http5xx-relative.sh

Fill both of them with the following content:

#!/usr/bin/env sh
gcpmetrics --config /usr/lib/zabbix/externalscripts/gcp/config.yaml --preset http_response_5xx_sum
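Before wiring a script into Zabbix it can help to run it by hand and confirm that it prints a single number, since a numeric item cannot store anything else. The helpers below are a hedged sketch of such a check. The names is_number and check_script are made up for illustration and are not part of gcpmetrics or Zabbix.

```shell
#!/usr/bin/env sh
# is_number VALUE: succeed when VALUE is a single integer or decimal number.
is_number() {
    case "$1" in
        # Reject empty output and anything containing other characters
        # (spaces, newlines, letters).
        ''|*[!0-9.-]*) return 1 ;;
    esac
    printf '%s\n' "$1" | grep -Eq '^-?[0-9]+(\.[0-9]+)?$'
}

# check_script SCRIPT: run SCRIPT and report whether its output is numeric.
check_script() {
    output=$("$1" 2>/dev/null)
    if is_number "$output"; then
        echo "OK: $output"
    else
        echo "NOT NUMERIC: $output"
        return 1
    fi
}

# Example (path taken from the steps above):
# check_script /usr/lib/zabbix/externalscripts/tm-http5xx-absolute.sh
```

Running the check as the same user that the Zabbix server runs as is a closer match to what Zabbix will see, because file permissions and environment can differ between accounts.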

Note: two scripts are needed only for learning purposes, to show both the Absolute and the Delta behavior of Zabbix items. Zabbix doesn't allow two items on the same host to share a key, so two items cannot refer to the same script.
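The difference between the two behaviors can be illustrated with a little arithmetic: an absolute item stores each value exactly as the script returns it, while a delta item stores the change from the previous value. The sketch below uses made-up sample values and a hypothetical print_deltas helper. It mirrors a simple-change delta only, not a per-second rate.

```shell
# print_deltas V1 V2 ...: print the change between consecutive integer samples.
print_deltas() {
    prev=""
    for cur in "$@"; do
        if [ -n "$prev" ]; then
            echo $((cur - prev))
        fi
        prev=$cur
    done
}

# Absolute samples 10, 12, 12, 17 give the deltas 2, 0, 5 (one per line).
print_deltas 10 12 12 17
```

The first sample produces no delta because there is no earlier value to compare it with.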

7. Create items

Now create two Zabbix items, one for each of the scripts we've just created:

8. Create graph

With the items created, you can now visualize them by creating a new graph as shown below

and, after some time (provided your service actually generates 5xx statuses in that period), you will see something similar to this: