Implement comprehensive monitoring and observability features including Prometheus metrics, health check endpoints, structured logging, and audit logging to ensure operational visibility and system reliability.
Objective
Create a robust monitoring system that provides real-time insights into service health, performance metrics, and security events while maintaining detailed audit trails for compliance and troubleshooting.
Canonical Scope
This document is the canonical source for:
Structured logging approach and helpers
Request logging middleware behavior
Health and readiness endpoints, semantics, and status criteria
Metrics definitions and exposure
For audit storage and retention, see 05 Database Layer. For validation and error schema, see 07 Security & Validation.
Tasks
Prometheus Metrics Implementation
Set up Prometheus metrics in internal/monitoring/metrics.go
Implement request metrics:
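package monitoring

import "github.com/prometheus/client_golang/prometheus"

var (
	// RequestsTotal counts API requests by HTTP method and response status.
	RequestsTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "certificate_api_requests_total",
			Help: "Total number of API requests",
		},
		[]string{"method", "status"},
	)
	// RequestDuration observes request latency by HTTP method. Quantiles are
	// derived from the buckets at query time, so "quantile" is not a label.
	RequestDuration = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "certificate_api_request_duration_seconds",
			Help:    "Request duration in seconds",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"method"},
	)
)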
Implement certificate metrics:
var (
	// ActiveCertificates tracks the current number of active certificates per CA.
	ActiveCertificates = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "certificate_api_active_certificates",
			Help: "Number of active certificates",
		},
		[]string{"ca"},
	)
	// CertificatesExpiringSoon tracks certificates expiring within a window of days, per CA.
	CertificatesExpiringSoon = prometheus.NewGaugeVec(
		prometheus.GaugeOpts{
			Name: "certificate_api_certificates_expiring_soon",
			Help: "Number of certificates expiring soon",
		},
		[]string{"days", "ca"},
	)
	// CertificatesIssuedTotal counts issued certificates by profile and CA.
	CertificatesIssuedTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "certificate_api_certificates_issued_total",
			Help: "Total number of certificates issued",
		},
		[]string{"profile", "ca"},
	)
	// CertificatesRenewedTotal counts renewed certificates by profile and CA.
	CertificatesRenewedTotal = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "certificate_api_certificates_renewed_total",
			Help: "Total number of certificates renewed",
		},
		[]string{"profile", "ca"},
	)
)
Implement system health metrics:
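var (
	// ServiceUp reports whether the service considers itself available.
	ServiceUp = prometheus.NewGauge(
		prometheus.GaugeOpts{
			Name: "certificate_api_up",
			Help: "Service availability (1 = up, 0 = down)",
		},
	)
	// DatabaseConnectionsActive tracks the number of active database connections.
	DatabaseConnectionsActive = prometheus.NewGauge(
		prometheus.GaugeOpts{
			Name: "certificate_api_database_connections_active",
			Help: "Number of active database connections",
		},
	)
	// PCAAPICalls counts AWS PCA API calls by operation and outcome.
	PCAAPICalls = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "certificate_api_pca_api_calls_total",
			Help: "Total number of AWS PCA API calls",
		},
		[]string{"operation", "status"},
	)
	// PCAAPILatency observes AWS PCA call latency by operation. Quantiles are
	// derived from the buckets at query time, so "quantile" is not a label.
	PCAAPILatency = prometheus.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "certificate_api_pca_api_latency_seconds",
			Help:    "AWS PCA API call latency in seconds",
			Buckets: prometheus.DefBuckets,
		},
		[]string{"operation"},
	)
)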
Implement security metrics:
var (
	// AuthenticationFailures counts failed authentication attempts by reason.
	AuthenticationFailures = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "certificate_api_authentication_failures_total",
			Help: "Total number of authentication failures",
		},
		[]string{"reason"},
	)
)
Register all metrics with the Prometheus registry
Expose the metrics endpoint on port 9090, as specified
Health Check Implementation
Implement health check endpoint in internal/monitoring/health.go
Monitoring & Observability
Overview
Objective
Canonical Scope
Tasks
Prometheus Metrics Implementation (internal/monitoring/metrics.go)
Health Check Implementation (internal/monitoring/health.go; endpoints /health and /ready)
Structured Logging with slog (internal/logging/logger.go)
Audit Logging System (internal/monitoring/audit.go)
Metrics Collection Jobs
Performance Monitoring
Alerting Configuration
Gin Middleware Integration
Acceptance Criteria
Technical Considerations
Dependencies (slog for structured logging)
Testing Requirements
Definition of Done