A highly scalable framework for the performance and energy monitoring of HPC servers
This setup will install all server-side components of the ExaMon framework:
- MQTT broker and Db connector
- Grafana
- KairosDB
- Cassandra
- Example plugins
This Examon installation includes the following plugins:
random_pub
Please note: the random_pub plugin is used to test the system and it will publish random metrics. It can be disabled as described in the Enable/disable plugins section.
Since Cassandra is the component that requires the majority of resources, you can find more details about the suggested hardware configuration of the system that will host the services here:
To install all the services needed by ExaMon we will use Docker and Docker Compose:
Install Docker and Docker Compose.
First you will need to clone the Git repository:
git clone https://github.com/ExamonHPC/examon.git
Once you have the above setup, you need to create the Docker services:
docker compose up -d
This will build the Docker images and fetch some prebuilt images and then start the services. You can refer to the docker-compose.yml
file to see the full configuration.
Log in to the Grafana server using your browser and the default credentials:
NOTE: This installation sets the default password to GF_SECURITY_ADMIN_PASSWORD
in the docker-compose.yml
file.
Follow the normal procedure for adding a new data source:
From the Grafana UI, add a new data source and select KairosDB
.
Fill out the form with the following settings:
- Name:
kairosdb
- Url: http://kairosdb:8083
- Access:
Server
To import the dashboards stored in the dashboards/
folder:
To test the installation, you can import the Examon Test - Random Sensor.json
dashboard.
Installing the ExaMon plugins requires the configuration of each individual component.
It is necessary to define all the properties of the .conf
configuration file of the plugins
with the appropriate values related to the server hosting the framework. In particular, it is necessary
to define the IP addresses and ports of the server where the KairosDB and/or MQTT broker services run,
as well as their credentials.
The configuration files to be edited are located in the respective plugin folders contained in the
following folders:
Plugin | Path |
---|---|
random_pub | /publishers/random_pub |
Please refer to the respective plugin readme file (Configuration section) for further details.
The plugins are managed by supervisord, which is the microservices manager for the examon container.
The majority of the commands follow the supervisorctl syntax:
supervisorctl <command> <plugin-name>
The most used commands are:
start
stop
restart
status
tail
to see the full list of commands, you can use the following command:
docker exec -it <examon-container-name> supervisorctl help
To start the plugins, you need to run the following command:
docker exec -it <examon-container-name> supervisorctl start <plugin-name>
Example:
docker exec -it examon supervisorctl start plugins:random_pub
Or, if you want to start all the plugins, you can use the following command:
docker exec -it <examon-container-name> supervisorctl start plugins:*
As an alternative, you can open the supervisor shell to manage the plugins and start/stop them individually:
docker exec -it <examon-container-name> supervisorctl
To check the logs of the plugins, you can use the following command:
docker exec -it <examon-container-name> supervisorctl tail [-f] <plugin-name>
Some plugins may be disabled by default and need to be started manually each time the examon container is started.
To enable and start the plugins automatically, you need to edit the supervisor configuration file for the examon service.
docker exec -it <examon-container-name> bash
vi /etc/supervisor/conf.d/supervisor.conf
Then, for each plugin, set the following parameters to true:
autostart=True
Restart the examon container to apply the changes:
docker restart <examon-container-name>
Please note that the supervisor configuration will be lost in case the container is recreated.
To make the settings persistent, you need to edit the supervisor configuration file in docker/examon/supervisor.conf
and rebuild.
The Examon server must be enabled in the supervisor configuration file and configured to use the Examon REST API.
Please refer to the README.rst
file in the web/examon-server
folder for more information.
NOTE: The Cassandra related settings must be the same as the ones used in the Slurm publisher in the Cassandra section.
During the installation, two Docker volumes are created, which are required for data persistence.
$ docker volume ls
DRIVER VOLUME NAME
local examon_cassandra_volume
local examon_grafana_volume
- The
examon_cassandra_volume
is used to store the collected metrics - The
examon_grafana_volume
is used to store Grafana:- users account data
- dashboards
To set a custom volume path, you can use the following settings in the docker-compose.yml
file:
volumes:
cassandra_volume:
driver: local
driver_opts:
type: none
device: /path/to/cassandra/volume
o: bind
grafana_volume:
driver: local
driver_opts:
type: none
device: /path/to/grafana/volume
o: bind