Monitoring and alerts #199

DanielABrennand · 2025-10-14T13:46:04Z

Adds two metric exporters (node_exporter for host hardware, cAdvisor for Docker containers) to monitor the VM. Prometheus then collates these metrics and passes them to Grafana, where they are graphed in dashboards. Grafana also has alerts set up to notify us if either the API container or pgAdmin goes down for more than 3 minutes, or if the VM uses more than 95% of its RAM or CPU for over 5 minutes.
Note if testing this locally: in the docker-compose file, the environment variable GF_SERVER_ROOT_URL for Grafana must be commented out.
Additionally, the provisioned contact point for alerting is a dummy URL and will need to be changed manually in production.
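
For reviewers unfamiliar with "for N minutes" alert conditions, here is a minimal Python sketch of the idea behind the RAM/CPU rule: the alert only fires once the value has stayed above the threshold for the whole hold period, and any dip resets the clock. This is an illustration only, not the Grafana rule provisioned in this PR; `SustainedThreshold`, `first_firing`, and the sample data are hypothetical names and values, and using `>=` for the hold period is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class SustainedThreshold:
    """Hypothetical helper: fires once a value has stayed above
    `threshold` for at least `hold_seconds` without dipping back."""
    threshold: float
    hold_seconds: float
    _breach_started: Optional[float] = None

    def update(self, timestamp: float, value: float) -> bool:
        # Any sample at or below the threshold resets the pending period.
        if value <= self.threshold:
            self._breach_started = None
            return False
        # Remember when the current breach began.
        if self._breach_started is None:
            self._breach_started = timestamp
        # Fire only after the breach has lasted the full hold period.
        return timestamp - self._breach_started >= self.hold_seconds


def first_firing(
    samples: Iterable[Tuple[float, float]], rule: SustainedThreshold
) -> Optional[float]:
    """Return the timestamp at which the rule first fires, or None."""
    for timestamp, value in samples:
        if rule.update(timestamp, value):
            return timestamp
    return None


# Made-up (seconds, percent used) samples: the dip at t=120 resets the clock.
samples = [(0, 96.0), (60, 97.0), (120, 90.0), (180, 96.0), (480, 98.0)]
ram_rule = SustainedThreshold(threshold=95.0, hold_seconds=5 * 60)
print(first_firing(samples, ram_rule))  # 480
```

The container-down alerts follow the same shape with a 3 minute hold period and the condition "container is not reporting as up" instead of a percentage threshold.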

DanielABrennand added 21 commits September 2, 2025 14:34

Added monitoring services, prometheus grafana and hardware monitor

b97f532

Added monitoring pathway

33363c8

Added prometheus config file

d93f416

Added grafana dashboard config file

28ee5d7

Added grafana template for hardware monitoring dashboard

f47bbaf

Added grafana datasource config file

f984a74

Added Cadvisor to compose and prometheus targets

df15ed8

Changed port for cadvisor

22c7746

Added cadvisor to internal docker network

b586832

Changed scraping config for cadvisor

f663ab5

Added provisioned container dashboard

61d7e13

Added uid for prometheus datasource

471b700

Added provisioned alerting rules

b85827e

Updated alerts with correct contact point

303ef87

Added template for contact point provisioning

5e431a4

Updated container rules with correct nodata behaviour

55394ee

Fixed dummy webhook syntax

4a08455

Added env variable controlled grafana username and password and defaults

b892b1a

Updated CD to now include grafana username and password from secrets

15a3113

Switched docker compose to production version (I.e. GF_SERVER_ROOT)

2e2bead

Merge remote-tracking branch 'origin/develop' into Monitoring

70142cb

DanielABrennand linked an issue Oct 14, 2025 that may be closed by this pull request

Monitoring and alerts #76

Open

NathanCummings requested review from NathanCummings and jameshod5 October 14, 2025 14:55

DanielABrennand added 2 commits November 27, 2025 14:44

Changed grafana provisioning to now be in its own folder under dev

706a683

Fixed Server root for deployment

85f0218

NathanCummings approved these changes Nov 27, 2025

NathanCummings merged commit 4b52e39 into develop Nov 27, 2025
2 checks passed

NathanCummings deleted the Monitoring branch November 27, 2025 14:54
