Fortify is a production-grade resilience and fault-tolerance library for Go 1.23+. It provides a comprehensive suite of battle-tested patterns including circuit breakers, retries, rate limiting, timeouts, and bulkheads with zero external dependencies for core functionality.
- Type-Safe: Built with Go 1.23+ generics for compile-time safety
- High Performance: <1µs overhead with zero allocations in hot paths
- Zero Dependencies: Core patterns have no external dependencies
- Observable: Built-in support for structured logging (slog) and OpenTelemetry
- Prometheus Metrics: Export metrics for all resilience patterns
- Framework Integration: First-class support for HTTP and gRPC
- Composable: Fluent API for combining multiple patterns
- Well Tested: >95% test coverage with race detection
- Chaos Engineering: Built-in testing utilities for resilience validation
- Performance Testing: Automated regression detection and benchmarking
- Production Ready: Battle-tested patterns with comprehensive examples
go get github.com/felixgeelhaar/fortify
Requirements: Go 1.23 or higher
package main

import (
	"context"
	"log"
	"time"

	"github.com/felixgeelhaar/fortify/circuitbreaker"
	"github.com/felixgeelhaar/fortify/retry"
)

func main() {
	// Create a circuit breaker
	cb := circuitbreaker.New[string](circuitbreaker.Config{
		MaxRequests: 100,
		Interval:    time.Second * 10,
		ReadyToTrip: func(counts circuitbreaker.Counts) bool {
			return counts.ConsecutiveFailures >= 3
		},
	})

	// Create a retry strategy
	r := retry.New[string](retry.Config{
		MaxAttempts:   3,
		InitialDelay:  time.Millisecond * 100,
		BackoffPolicy: retry.BackoffExponential,
	})

	// Use them together: the retry runs inside the circuit breaker.
	// callExternalService stands in for your own function.
	result, err := cb.Execute(context.Background(), func(ctx context.Context) (string, error) {
		return r.Do(ctx, func(ctx context.Context) (string, error) {
			return callExternalService(ctx)
		})
	})
	if err != nil {
		log.Fatalf("call failed: %v", err)
	}
	log.Printf("result: %s", result)
}
The circuit breaker prevents cascading failures by temporarily stopping requests to failing services.
import "github.com/felixgeelhaar/fortify/circuitbreaker"

cb := circuitbreaker.New[Response](circuitbreaker.Config{
	MaxRequests: 100,
	Interval:    time.Second * 60,
	Timeout:     time.Second * 30, // Half-open timeout
	ReadyToTrip: func(counts circuitbreaker.Counts) bool {
		failureRatio := float64(counts.TotalFailures) / float64(counts.Requests)
		return counts.Requests >= 10 && failureRatio >= 0.5
	},
	OnStateChange: func(from, to circuitbreaker.State) {
		log.Printf("Circuit breaker: %s -> %s", from, to)
	},
})

result, err := cb.Execute(ctx, func(ctx context.Context) (Response, error) {
	return makeRequest(ctx)
})
States: Closed → Open → Half-Open → Closed
Use Cases:
- Protecting against cascading failures
- Preventing resource exhaustion
- Fast failure for unhealthy dependencies
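To make the state cycle concrete, here is a minimal standard-library sketch of a breaker that opens after a number of consecutive failures and lets calls through again once a cooldown has passed. It is not Fortify's implementation: `Breaker`, `NewBreaker`, `Call`, `Current`, `ErrOpen`, and `demoBreaker` are hypothetical names chosen for illustration, and the trip rule is simplified to consecutive failures only.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// State is the breaker state. Illustrative only, not Fortify's type.
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

// ErrOpen is returned when a call is rejected without being executed.
var ErrOpen = errors.New("circuit open")

// Breaker is a hypothetical minimal circuit breaker.
type Breaker struct {
	mu          sync.Mutex
	state       State
	failures    int // consecutive failures
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
	now         func() time.Time // injectable clock
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown, now: time.Now}
}

// Current returns the current state.
func (b *Breaker) Current() State {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.state
}

// Call runs fn unless the breaker is open and still cooling down.
func (b *Breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.state == Open {
		if b.now().Sub(b.openedAt) < b.cooldown {
			b.mu.Unlock()
			return ErrOpen // fail fast: fn is not called
		}
		b.state = HalfOpen // cooldown elapsed: let calls through as a trial
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.state, b.failures = Closed, 0
		return nil
	}
	b.failures++
	if b.state == HalfOpen || b.failures >= b.maxFailures {
		b.state, b.openedAt = Open, b.now()
	}
	return err
}

// demoBreaker walks through Closed -> Open -> (Half-Open) -> Closed.
func demoBreaker() (states []State, fastFail bool) {
	clock := time.Unix(0, 0)
	b := NewBreaker(2, time.Minute)
	b.now = func() time.Time { return clock }
	fail := func() error { return errors.New("boom") }

	_ = b.Call(fail)
	states = append(states, b.Current()) // still Closed after one failure
	_ = b.Call(fail)
	states = append(states, b.Current()) // second failure trips it: Open
	fastFail = errors.Is(b.Call(fail), ErrOpen)

	clock = clock.Add(2 * time.Minute) // cooldown passes
	_ = b.Call(func() error { return nil })
	states = append(states, b.Current()) // trial call succeeded: Closed
	return states, fastFail
}
```

The injectable clock keeps the demo deterministic; a real breaker would also limit how many trial calls run while half-open.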
The retry pattern automatically retries failed operations with configurable backoff strategies.
import "github.com/felixgeelhaar/fortify/retry"

r := retry.New[Response](retry.Config{
	MaxAttempts:   5,
	InitialDelay:  time.Millisecond * 100,
	MaxDelay:      time.Second * 10,
	BackoffPolicy: retry.BackoffExponential,
	Multiplier:    2.0,
	Jitter:        true,
	ShouldRetry: func(err error) bool {
		return isTransientError(err)
	},
})

result, err := r.Do(ctx, func(ctx context.Context) (Response, error) {
	return makeRequest(ctx)
})
Backoff Policies:
- BackoffConstant: Fixed delay between retries
- BackoffLinear: Linearly increasing delay
- BackoffExponential: Exponentially increasing delay
Use Cases:
- Handling transient network failures
- Dealing with rate-limited APIs
- Recovering from temporary service unavailability
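As a rough illustration of exponential backoff, the sketch below computes the delay before each retry as `initial * multiplier^(attempt-1)`, capped at a maximum, and optionally randomizes it. The formula and the jitter range are assumptions made for this example (Fortify's exact calculation is not documented here), and `backoffDelay` and `withJitter` are hypothetical helpers. Under this assumed formula, an initial delay of 100ms with a multiplier of 2.0 yields 100ms, 200ms, 400ms, 800ms, and so on until the cap.

```go
package main

import (
	"math"
	"math/rand"
	"time"
)

// backoffDelay returns the delay before retry number attempt (1-based):
// initial * multiplier^(attempt-1), capped at maxDelay.
func backoffDelay(attempt int, initial, maxDelay time.Duration, multiplier float64) time.Duration {
	d := float64(initial) * math.Pow(multiplier, float64(attempt-1))
	if d > float64(maxDelay) {
		return maxDelay
	}
	return time.Duration(d)
}

// withJitter returns a uniformly random delay in [d/2, d] so that many
// clients retrying at once do not all wake up at the same moment.
// The range is an assumption for this sketch.
func withJitter(d time.Duration) time.Duration {
	if d <= 0 {
		return 0
	}
	half := d / 2
	return half + time.Duration(rand.Int63n(int64(d-half)+1))
}
```

A retry loop would sleep for `withJitter(backoffDelay(n, ...))` between attempts and stop early if the context is cancelled.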
The rate limiter controls the rate of operations using a token bucket algorithm.
import "github.com/felixgeelhaar/fortify/ratelimit"

rl := ratelimit.New(ratelimit.Config{
	Rate:     100,         // 100 requests
	Burst:    200,         // burst of 200
	Interval: time.Second, // per second
})

// Non-blocking check
if rl.Allow(ctx, "user-123") {
	handleRequest()
}

// Blocking wait
if err := rl.Wait(ctx, "user-123"); err == nil {
	handleRequest()
}
Use Cases:
- Protecting APIs from abuse
- Ensuring fair resource usage
- Implementing user quotas
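For readers unfamiliar with the token bucket algorithm, the following standard-library sketch shows the core idea for a single key: the bucket starts full at the burst size, refills at a steady rate, and each allowed request consumes one token. `TokenBucket`, `NewTokenBucket`, `AllowAt`, and `demoTokenBucket` are hypothetical names, and the caller passes the time explicitly to keep the example deterministic; this is not how Fortify's limiter is implemented.

```go
package main

import (
	"sync"
	"time"
)

// TokenBucket is a hypothetical single-key token bucket.
type TokenBucket struct {
	mu     sync.Mutex
	tokens float64   // tokens currently available
	burst  float64   // bucket capacity
	rate   float64   // tokens added per second
	last   time.Time // time of the last refill
}

// NewTokenBucket returns a bucket that starts full.
func NewTokenBucket(ratePerSec, burst float64, now time.Time) *TokenBucket {
	return &TokenBucket{tokens: burst, burst: burst, rate: ratePerSec, last: now}
}

// AllowAt refills the bucket for the time elapsed since the last call,
// then consumes one token if one is available.
func (b *TokenBucket) AllowAt(now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// demoTokenBucket sends 250 requests at the same instant against a
// 100/s bucket with burst 200, then one more request 20ms later.
func demoTokenBucket() (burstAllowed int, afterRefill bool) {
	start := time.Unix(0, 0)
	b := NewTokenBucket(100, 200, start)
	for i := 0; i < 250; i++ {
		if b.AllowAt(start) {
			burstAllowed++
		}
	}
	// 20ms at 100 tokens/s refills about 2 tokens.
	afterRefill = b.AllowAt(start.Add(20 * time.Millisecond))
	return burstAllowed, afterRefill
}
```

With these numbers the burst admits 200 of the 250 simultaneous requests, and the request after the short pause is admitted again. A per-key limiter would keep one such bucket per key, for example in a map.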
For distributed systems, use the Redis-backed rate limiter to share rate limits across multiple application instances:
import (
	"github.com/redis/go-redis/v9"
	redisrl "github.com/felixgeelhaar/fortify/backends/redis"
)

// Create Redis client
client := redis.NewClient(&redis.Options{
	Addr: "localhost:6379",
})

// Create distributed rate limiter
rl, err := redisrl.New(redisrl.Config{
	Client:   client,
	Rate:     100,
	Burst:    200,
	Interval: time.Second,
})
if err != nil {
	log.Fatalf("create rate limiter: %v", err)
}

// Same interface as in-memory - works across all instances!
if rl.Allow(ctx, "user-123") {
	handleRequest()
}
Features:
- Atomic operations via Lua scripts (no race conditions)
- Supports Redis Cluster and Sentinel
- Same interface as in-memory limiter (drop-in replacement)
- Configurable fail-open or fail-closed behavior
Installation:
go get github.com/felixgeelhaar/fortify/backends/redis
See the Redis Backend Documentation and Migration Guide for details.
The timeout pattern enforces time limits on operations with context propagation.
import "github.com/felixgeelhaar/fortify/timeout"

tm := timeout.New[Response](timeout.Config{
	DefaultTimeout: time.Second * 30,
	OnTimeout: func(duration time.Duration) {
		log.Printf("Operation timed out after %v", duration)
	},
})

// Use specific timeout
result, err := tm.Execute(ctx, 5*time.Second, func(ctx context.Context) (Response, error) {
	return makeRequest(ctx)
})

// Use default timeout (result and err are already declared above)
result, err = tm.ExecuteWithDefault(ctx, func(ctx context.Context) (Response, error) {
	return makeRequest(ctx)
})
Use Cases:
- Enforcing SLA response times
- Preventing resource leaks
- Setting operation deadlines
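The underlying mechanism can be sketched with the standard library alone: derive a context with a deadline, run the operation in a goroutine, and return whichever comes first, the result or the deadline. `runWithTimeout`, `result`, and `demoTimeout` are hypothetical names for this example and do not describe Fortify's internals.

```go
package main

import (
	"context"
	"time"
)

// result carries the outcome of the operation across the channel.
type result[T any] struct {
	val T
	err error
}

// runWithTimeout runs fn with a deadline of d and returns as soon as fn
// finishes or the deadline passes, whichever happens first.
func runWithTimeout[T any](ctx context.Context, d time.Duration, fn func(context.Context) (T, error)) (T, error) {
	ctx, cancel := context.WithTimeout(ctx, d)
	defer cancel()

	// Buffered so the goroutine can always send and exit, even if we
	// have already returned because of the deadline.
	ch := make(chan result[T], 1)
	go func() {
		v, err := fn(ctx)
		ch <- result[T]{val: v, err: err}
	}()

	select {
	case r := <-ch:
		return r.val, r.err
	case <-ctx.Done():
		var zero T
		return zero, ctx.Err()
	}
}

// demoTimeout runs one fast call and one call that outlives its deadline.
func demoTimeout() (fast string, slowErr error) {
	fast, _ = runWithTimeout(context.Background(), time.Second,
		func(ctx context.Context) (string, error) {
			return "ok", nil
		})
	_, slowErr = runWithTimeout(context.Background(), 10*time.Millisecond,
		func(ctx context.Context) (string, error) {
			select {
			case <-time.After(time.Second):
				return "late", nil
			case <-ctx.Done(): // the operation observes cancellation
				return "", ctx.Err()
			}
		})
	return fast, slowErr
}
```

The deadline only frees the caller. The operation itself must watch `ctx.Done()` to stop its own work, which is why context propagation matters for preventing leaks.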
The bulkhead limits concurrent operations to prevent resource exhaustion.
import "github.com/felixgeelhaar/fortify/bulkhead"

bh := bulkhead.New[Response](bulkhead.Config{
	MaxConcurrent: 10,              // Max concurrent operations
	MaxQueue:      20,              // Max queued operations
	QueueTimeout:  time.Second * 5, // Queue wait timeout
	OnRejected: func() {
		log.Println("Request rejected: bulkhead full")
	},
})

result, err := bh.Execute(ctx, func(ctx context.Context) (Response, error) {
	return makeRequest(ctx)
})

// Get statistics
stats := bh.Stats()
log.Printf("Active: %d, Queued: %d, Rejected: %d",
	stats.ActiveRequests, stats.QueuedRequests, stats.RejectedRequests)
Use Cases:
- Preventing resource exhaustion
- Isolating critical operations
- Managing concurrent access
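A common way to build this kind of concurrency limit in Go is a buffered channel used as a semaphore. The sketch below shows only that core idea: it rejects immediately when all slots are taken and does not model the queue or queue timeout from the configuration above. `Bulkhead`, `NewBulkhead`, `Try`, `ErrBulkheadFull`, and `demoBulkhead` are hypothetical names, not Fortify's API.

```go
package main

import (
	"errors"
	"sync"
)

// ErrBulkheadFull is returned when no slot is free.
var ErrBulkheadFull = errors.New("bulkhead full")

// Bulkhead is a hypothetical limiter backed by a buffered channel:
// each value in the channel represents one occupied slot.
type Bulkhead struct {
	slots chan struct{}
}

func NewBulkhead(maxConcurrent int) *Bulkhead {
	return &Bulkhead{slots: make(chan struct{}, maxConcurrent)}
}

// Try runs fn if a slot is free and rejects immediately otherwise.
func (b *Bulkhead) Try(fn func() error) error {
	select {
	case b.slots <- struct{}{}: // acquire a slot
		defer func() { <-b.slots }() // release it when fn returns
		return fn()
	default:
		return ErrBulkheadFull
	}
}

// demoBulkhead fills both slots, shows a third call being rejected,
// then shows a call succeeding once the slots are released.
func demoBulkhead() (rejected, acceptedAfter bool) {
	b := NewBulkhead(2)
	started := make(chan struct{})
	release := make(chan struct{})

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = b.Try(func() error {
				started <- struct{}{} // signal that a slot is held
				<-release             // hold it until told to finish
				return nil
			})
		}()
	}
	<-started
	<-started // both slots are now occupied

	rejected = errors.Is(b.Try(func() error { return nil }), ErrBulkheadFull)

	close(release)
	wg.Wait() // both slots have been released
	acceptedAfter = b.Try(func() error { return nil }) == nil
	return rejected, acceptedAfter
}
```

Adding a queue would mean replacing the `default` branch with a wait on the slot channel bounded by a timer or the caller's context.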
The fallback pattern provides graceful degradation with automatic fallback on errors.
import "github.com/felixgeelhaar/fortify/fallback"

fb := fallback.New[Response](fallback.Config{
	Primary: func(ctx context.Context) (Response, error) {
		return primaryService.Call(ctx)
	},
	Fallback: func(ctx context.Context, err error) (Response, error) {
		log.Printf("Primary failed: %v, using fallback", err)
		return fallbackService.Call(ctx)
	},
	ShouldFallback: func(err error) bool {
		return isServiceError(err) // Only fallback on service errors
	},
	OnFallback: func(err error) {
		metrics.IncFallbackCount()
	},
})

result, err := fb.Execute(ctx)
Use Cases:
- Graceful service degradation
- Multi-tier service architectures
- Cache fallback strategies
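The control flow behind a fallback is small enough to sketch in one generic function: call the primary, and only if it fails with an error the predicate accepts, call the fallback with that error. `withFallback`, `demoFallback`, and the two sentinel errors are hypothetical names for this illustration; treating a nil predicate as "always fall back" is an assumption of the sketch.

```go
package main

import (
	"context"
	"errors"
)

var (
	errUnavailable = errors.New("service unavailable")
	errBadInput    = errors.New("bad input")
)

// withFallback calls primary and, if it fails with an error accepted by
// shouldFallback, calls fallback with that error. A nil shouldFallback
// means every error triggers the fallback.
func withFallback[T any](
	ctx context.Context,
	primary func(context.Context) (T, error),
	fallback func(context.Context, error) (T, error),
	shouldFallback func(error) bool,
) (T, error) {
	v, err := primary(ctx)
	if err == nil || (shouldFallback != nil && !shouldFallback(err)) {
		return v, err // success, or an error we must not mask
	}
	return fallback(ctx, err)
}

// demoFallback falls back on an outage but passes a caller error through.
func demoFallback() (string, error) {
	ctx := context.Background()
	onlyOutages := func(err error) bool { return errors.Is(err, errUnavailable) }
	cached := func(ctx context.Context, err error) (string, error) {
		return "cached", nil
	}

	got, _ := withFallback(ctx,
		func(ctx context.Context) (string, error) { return "", errUnavailable },
		cached, onlyOutages)

	_, err := withFallback(ctx,
		func(ctx context.Context) (string, error) { return "", errBadInput },
		cached, onlyOutages)

	return got, err
}
```

Restricting the fallback with a predicate keeps caller mistakes, such as invalid input, visible instead of hiding them behind a degraded response.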
Combine multiple patterns into a single execution chain:
import "github.com/felixgeelhaar/fortify/middleware"

chain := middleware.New[Response]().
	WithBulkhead(bh).
	WithRateLimit(rl, "user-key").
	WithTimeout(tm, 5*time.Second).
	WithCircuitBreaker(cb).
	WithRetry(r)

result, err := chain.Execute(ctx, func(ctx context.Context) (Response, error) {
	return makeRequest(ctx)
})
Order matters:
- Bulkhead - Limit concurrency first
- Rate Limit - Check quotas
- Timeout - Enforce time limits
- Circuit Breaker - Check service health
- Retry - Retry on failures
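One way to see why order matters is to model each pattern as a wrapper around the next one. The sketch below assumes the first pattern listed is the outermost layer, which matches the list above but is an assumption about how the chain nests, not a statement about Fortify's internals. `Op`, `Middleware`, `chain`, `trace`, and `demoChainOrder` are hypothetical names, and each layer only records its name instead of doing real work.

```go
package main

// Op is the operation being protected.
type Op func() (string, error)

// Middleware wraps an Op with extra behavior.
type Middleware func(Op) Op

// chain wraps op so that mws[0] is the outermost layer and the last
// middleware sits directly around op.
func chain(op Op, mws ...Middleware) Op {
	for i := len(mws) - 1; i >= 0; i-- {
		op = mws[i](op)
	}
	return op
}

// trace returns a middleware that records its name when it is entered.
func trace(name string, log *[]string) Middleware {
	return func(next Op) Op {
		return func() (string, error) {
			*log = append(*log, name)
			return next()
		}
	}
}

// demoChainOrder returns the order in which the layers are entered.
func demoChainOrder() []string {
	var log []string
	call := func() (string, error) {
		log = append(log, "call")
		return "ok", nil
	}
	op := chain(call,
		trace("bulkhead", &log),
		trace("ratelimit", &log),
		trace("timeout", &log),
		trace("circuitbreaker", &log),
		trace("retry", &log),
	)
	_, _ = op()
	return log
}
```

Under this nesting, the retry layer sits closest to the call, so all attempts happen inside a single circuit breaker execution and under a single timeout, and a rejected request never consumes a retry.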
Integrate resilience patterns with standard http.Handler:
import (
	"net/http"

	fortifyhttp "github.com/felixgeelhaar/fortify/http"
)

// Create patterns
cb := circuitbreaker.New[*http.Response](/* config */)
rl := ratelimit.New(/* config */)
tm := timeout.New[*http.Response](/* config */)

// Apply middleware
handler := fortifyhttp.RateLimit(rl, fortifyhttp.KeyFromIP)(
	fortifyhttp.Timeout(tm, 5*time.Second)(
		fortifyhttp.CircuitBreaker(cb)(
			http.HandlerFunc(myHandler),
		),
	),
)
http.Handle("/api", handler)
Key Extractors:
- KeyFromIP - Extract client IP
- KeyFromHeader(name) - Extract from HTTP header
Status Codes:
- 503 Service Unavailable - Circuit breaker open
- 429 Too Many Requests - Rate limit exceeded
- 504 Gateway Timeout - Request timeout
Integrate with gRPC services:
import (
	fortifygrpc "github.com/felixgeelhaar/fortify/grpc"
	"google.golang.org/grpc"
)

// Unary and stream interceptors
server := grpc.NewServer(
	grpc.UnaryInterceptor(
		fortifygrpc.UnaryCircuitBreakerInterceptor(cb),
	),
	grpc.StreamInterceptor(
		fortifygrpc.StreamRateLimitInterceptor(rl,
			fortifygrpc.StreamKeyFromMetadata("x-api-key")),
	),
)
Interceptors:
- UnaryCircuitBreakerInterceptor
- UnaryRateLimitInterceptor
- UnaryTimeoutInterceptor
- StreamCircuitBreakerInterceptor
- StreamRateLimitInterceptor
- StreamTimeoutInterceptor
Log pattern events with structured logging (slog):
import (
	"log/slog"
	"os"

	fortifyslog "github.com/felixgeelhaar/fortify/slog"
)

logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
fortifyslog.LogPatternEvent(logger, fortifyslog.PatternCircuitBreaker, "state_change",
	slog.String("from", "closed"),
	slog.String("to", "open"),
)
Trace pattern executions with OpenTelemetry:
import (
	fortifyotel "github.com/felixgeelhaar/fortify/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/sdk/trace"
)

provider := trace.NewTracerProvider(/* config */)
tracer := fortifyotel.NewTracer(provider, "my-service")
ctx, span := tracer.StartSpan(ctx, fortifyotel.PatternCircuitBreaker, "execute")
defer span.End()
tracer.SetAttributes(span,
	attribute.Int("requests", 100),
	attribute.String("state", "closed"),
)
Export detailed metrics for all resilience patterns:
import (
	"github.com/felixgeelhaar/fortify/metrics"
	"github.com/prometheus/client_golang/prometheus"
)

// Register Fortify metrics with Prometheus
metrics.MustRegister(prometheus.DefaultRegisterer)

// Use the default collector
collector := metrics.DefaultCollector()

// Record circuit breaker metrics
collector.RecordCircuitBreakerRequest("api-client", "closed")
collector.RecordCircuitBreakerSuccess("api-client")

// Record retry metrics
collector.RecordRetryAttempts("database-query", 2)
collector.RecordRetrySuccess("database-query")
Available Metrics:
- Circuit Breaker: state, requests, successes, failures, state changes
- Retry: attempts, duration, successes, failures
- Rate Limit: allowed/denied requests, wait times
- Timeout: executions, exceeded, durations
- Bulkhead: active/queued requests, rejections, durations
Fortify is designed for production use with minimal overhead:
| Pattern | Overhead | Allocations |
|---|---|---|
| Circuit Breaker | ~30ns | 0 |
| Retry | ~25ns | 0 |
| Rate Limiter | ~45ns | 0 |
| Timeout | ~50ns | 0 |
| Bulkhead | ~39ns | 0 |
Benchmarks on Apple M1, Go 1.23
Comprehensive examples are available in the examples/ directory:
- Basic: Individual pattern usage
- HTTP: Web server integration
- Composition: Advanced patterns
Run examples:
go run examples/basic/circuit_breaker.go
go run examples/http/server.go
go run examples/composition/chain.go
- Circuit Breaker: Use for external dependencies that can fail
- Retry: Use for transient failures (network issues, rate limits)
- Rate Limiter: Use to protect your API from overload
- Timeout: Use to enforce SLAs and prevent resource leaks
- Bulkhead: Use to isolate critical operations
- Circuit Breaker: Tune ReadyToTrip based on your error budget
- Retry: Use exponential backoff with jitter for distributed systems
- Rate Limiter: Set burst capacity for handling traffic spikes
- Timeout: Set timeouts based on p99 latency + buffer
- Bulkhead: Size based on available resources and expected load
Recommended order for combining patterns:
- Bulkhead - Limit concurrency to prevent resource exhaustion
- Rate Limit - Check quotas before processing
- Timeout - Set operation deadline
- Circuit Breaker - Check service health
- Retry - Handle transient failures
- Always configure OnStateChange, OnRetry, OnTimeout, and OnRejected callbacks
- Use structured logging for better debugging
- Integrate OpenTelemetry for distributed tracing
- Monitor pattern metrics in production
Run tests with race detection:
# All tests
go test -v -race ./...
# Specific package
go test -v -race ./circuitbreaker
# With coverage
go test -v -race -coverprofile=coverage.out ./...
go tool cover -html=coverage.out
Test resilience with built-in chaos utilities:
import fortifytesting "github.com/felixgeelhaar/fortify/testing"
// Inject errors with configurable probability
injector := fortifytesting.NewErrorInjector(0.3, errors.New("service unavailable"))
// Add network latency
latency := fortifytesting.NewLatencyInjector(10*time.Millisecond, 50*time.Millisecond)
// Simulate timeouts
timeout := fortifytesting.NewTimeoutSimulator(100*time.Millisecond, 0.5)
// Create flakey service combining all
service := fortifytesting.NewFlakeyService(0.3, 10*time.Millisecond, 30*time.Millisecond)
Chaos Utilities:
- ErrorInjector: Simulate failures with probability
- LatencyInjector: Add realistic network delays
- TimeoutSimulator: Create timeout scenarios
- FlakeyService: Combine errors, latency, and timeouts
Automated benchmark tracking and regression detection:
# Run benchmarks with automation
./scripts/benchmark.sh run
# Generate performance baseline
./scripts/benchmark.sh generate-baseline
# Check for regressions
./scripts/benchmark.sh check
# Complete workflow
./scripts/benchmark.sh all
Features:
- Automatic regression detection (time, allocations, memory)
- Configurable thresholds (10% time, 20% allocs, 15% memory)
- Historical tracking with JSON storage
- CI/CD integration with GitHub Actions
- Detailed performance reports
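To illustrate what a threshold-based regression check amounts to, the sketch below compares a current benchmark sample against a baseline using the thresholds listed above (10% time, 20% allocations, 15% memory). It is an assumption about the general technique, not a description of what `scripts/benchmark.sh` does; `Sample`, `exceeds`, and `regressions` are hypothetical names, and treating any increase from a zero baseline as a regression is a choice made for this example.

```go
package main

// Sample holds the per-operation numbers reported by `go test -bench -benchmem`.
type Sample struct {
	NsPerOp     float64
	AllocsPerOp float64
	BytesPerOp  float64
}

// exceeds reports whether current is more than thresholdPct percent
// above baseline. With a zero baseline (for example 0 allocs/op), any
// increase counts as a regression.
func exceeds(baseline, current, thresholdPct float64) bool {
	if baseline == 0 {
		return current > 0
	}
	return (current-baseline)/baseline*100 > thresholdPct
}

// regressions lists which metrics regressed beyond their thresholds.
func regressions(base, cur Sample) []string {
	var out []string
	if exceeds(base.NsPerOp, cur.NsPerOp, 10) {
		out = append(out, "time")
	}
	if exceeds(base.AllocsPerOp, cur.AllocsPerOp, 20) {
		out = append(out, "allocs")
	}
	if exceeds(base.BytesPerOp, cur.BytesPerOp, 15) {
		out = append(out, "memory")
	}
	return out
}
```

For example, a benchmark that moves from 30 ns/op to 34 ns/op is about 13% slower and would be flagged for time, while a move to 32 ns/op (about 7%) would pass.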
See Performance Testing Guide for details.
Run benchmarks:
go test -bench=. -benchmem ./...
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Write tests for new functionality
- Ensure all tests pass with race detection
- Submit a pull request
MIT License - see LICENSE file for details.
Fortify is inspired by resilience libraries from other ecosystems:
- Hystrix (Java/Netflix)
- resilience4j (Java)
- Polly (.NET)
- Documentation
- Issue Tracker
- Discussions
Built with ❤️ by Felix Geelhaar
