A PayPal/Stripe-style payment platform built to practice and showcase system design concepts from my internship experience. Money movement stays strongly consistent in PostgreSQL, while everything async (fraud, notifications, analytics) goes through Kafka so checkout stays fast.
Features
- One-time payments across currencies with live FX conversion
- Recurring billing (subscriptions) triggered by Kubernetes CronJob
- Refunds
- JWT auth for merchants and consumers
- Async fraud scoring with post-capture reversal
- Append-only payment status audit trail on every state change
Deferred (not implemented but mocked)
- KYC, RBAC, MFA, disputes, real Visa/Mastercard processors — hardcoded JWT roles and a mock gateway stand in for now
Realistic scale targets
- 1,000+ payments per second (designed for; load-tested locally at ~100–500 TPS with k6)
- Multi-currency — USD, EUR, etc. via fx-service + Redis cache
- 99.9%+ availability target via multi-AZ EKS, RDS, MSK in prod
| Layer | Tech |
|---|---|
| Frontend | React, Tailwind, Vite |
| Backend | Spring Boot 3, Maven, Hibernate / Spring Data JPA |
| Transactional DB | PostgreSQL (schema-per-service locally, Aurora in prod) |
| NoSQL | MongoDB (sessions, fraud signals, notification logs) |
| Cache | Redis (idempotency, FX rates, balance cache) |
| Queue / streams | Kafka + Kafka Streams (MSK in prod) |
| Batch | Spring Batch (FX rate import, reconciliation) |
| Orchestration | Kubernetes — Minikube locally, EKS in prod (Helm charts) |
| Backend patterns | Controller → Service → Repository; saga orchestrator in payment-service |
| Resilience | Resilience4j circuit breaker on mock gateway |
| Metrics / traces | Prometheus, Grafana, Jaeger (local); Datadog (prod) |
| Tests | JUnit 5, Cucumber BDD, Testcontainers |
| CI | GitHub Actions → ECR → EKS |
Browser → ingress (payment.local)
↓
user · ledger · payment · transaction · fx · fraud · gateway-mock
notification · recurring · stream-processing · batch-jobs
↓
PostgreSQL (payments, ledger, accounts, subscriptions — ACID)
MongoDB (sessions, fraud docs, notification log)
Redis (idempotency, FX cache, hot balances)
Kafka (payment.events, payment.captured, fraud.*, recurring.*)
↑
Docker Compose on host (Postgres, Mongo, Redis, Kafka)
Pods reach host via host.minikube.internal
Postgres, MongoDB, Redis, and Kafka run in Docker Compose on the host — not inside Minikube pods. Databases need stable disks; pods are ephemeral. Minikube runs the 11 Spring Boot services, ingress, billing CronJob, and observability stack. Same Helm chart deploys to EKS in prod (RDS, DocumentDB/Atlas, ElastiCache, MSK replace Compose).
Without K8s: scripts/start-local.sh runs JARs directly on localhost ports for fast iteration.
The hot path does not wait for fraud. Fraud runs async after capture; if it fails, we reverse.
- Client sends `POST /api/payments` with an `Idempotency-Key` header.
- payment-service checks idempotency (Redis lock + Postgres unique key) and creates `PENDING`.
- fx-service returns rate if settlement currency differs (cached in Redis).
- account-ledger-service reserves funds (`SELECT FOR UPDATE` on account row).
- payment-gateway-mock authorizes card (Resilience4j circuit breaker; `tok_decline` / `tok_timeout` for failure testing).
- Ledger captures: consumer pending → merchant available.
- Append row to `payment_status_history`; write outbox event in same DB transaction.
- Return `201 CAPTURED` to the client, so checkout stays fast.
- Outbox relay publishes `payment.captured` to Kafka.
- stream-processing-service counts velocity per user in a 5-min window → `fraud.enriched.events`.
- fraud-service scores rules (amount > 10k, velocity > 5). Publishes `payment.fraud_rejected` or `payment.fraud_cleared`.
- On reject → payment-service runs compensation saga (reverse ledger) → status `REVERSED`.
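The reserve → authorize → capture sequence with compensation can be sketched in plain Java. This is a minimal illustration, not the code in payment-service: the `Step` interface, the `step` helper, and the `FAILED` status name are assumptions made for the example, and the real steps would call the ledger and gateway services.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Minimal saga sketch: run steps in order, undo completed steps in reverse on failure. */
final class PaymentSagaSketch {

    /** One saga step with a forward action and a compensating action (hypothetical interface). */
    interface Step {
        void execute();     // throws RuntimeException to signal failure
        void compensate();  // undoes a previously successful execute()
    }

    /** Returns CAPTURED if every step succeeded, otherwise FAILED after compensating. */
    static String run(List<Step> steps, List<String> statusHistory) {
        Deque<Step> completed = new ArrayDeque<>();
        statusHistory.add("PENDING");
        for (Step step : steps) {
            try {
                step.execute();
                completed.push(step);
            } catch (RuntimeException e) {
                // Compensate in reverse order of completion.
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                statusHistory.add("FAILED"); // assumed status name, not taken from this README
                return "FAILED";
            }
        }
        statusHistory.add("CAPTURED");
        return "CAPTURED";
    }

    /** Builds a demo step that records what it did into a log. */
    static Step step(String name, boolean fails, List<String> log) {
        return new Step() {
            @Override public void execute() {
                if (fails) throw new IllegalStateException(name + " failed");
                log.add(name);
            }
            @Override public void compensate() {
                log.add("undo:" + name);
            }
        };
    }
}
```

Running the saga with a failing `authorize` step leaves the log as `reserve`, `undo:reserve`: the reservation is released and capture never runs. The fraud-reject path follows the same idea, except that the compensation is triggered later by a Kafka event rather than by an exception.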
Why async fraud?
Sync fraud adds 50–300 ms on every checkout. At high TPS that kills p99 latency. Trade-off: there's a brief window where a bad payment is captured before reversal — real systems mitigate with holds, amount limits, or sync fraud for high-risk merchants only.
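The two fraud rules from the flow above (amount > 10k, velocity > 5 in a 5-minute window) can be sketched without Kafka. This is a rough stand-in, assuming a per-user sliding window kept in memory, amounts in minor units (cents), and events arriving in timestamp order; the real service reads velocity from the Kafka Streams output, and the class and method names here are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Sketch of the fraud rules: amount > 10k, or more than 5 payments by one user in 5 minutes. */
final class FraudRulesSketch {
    static final long AMOUNT_LIMIT_MINOR = 1_000_000L;  // 10k in cents; the unit is an assumption
    static final int VELOCITY_LIMIT = 5;
    static final Duration WINDOW = Duration.ofMinutes(5);

    // Per-user timestamps of recent payments, oldest first.
    private final Map<String, Deque<Instant>> recent = new HashMap<>();

    /** Records the payment and returns how many payments the user made within the window. */
    int recordAndCount(String userId, Instant at) {
        Deque<Instant> times = recent.computeIfAbsent(userId, k -> new ArrayDeque<>());
        times.addLast(at);
        Instant cutoff = at.minus(WINDOW);
        // Drop entries at or before the cutoff; assumes events arrive in time order.
        while (!times.isEmpty() && !times.peekFirst().isAfter(cutoff)) {
            times.removeFirst();
        }
        return times.size();
    }

    /** Returns the topic the decision would be published to. */
    String score(String userId, long amountMinor, Instant at) {
        int velocity = recordAndCount(userId, at);
        boolean rejected = amountMinor > AMOUNT_LIMIT_MINOR || velocity > VELOCITY_LIMIT;
        return rejected ? "payment.fraud_rejected" : "payment.fraud_cleared";
    }
}
```

With these limits, a user's sixth payment inside five minutes is rejected even if each amount is small, and a single payment above the amount limit is rejected regardless of velocity.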
What's the Idempotency-Key header?
Without it, a double-submit or a network retry could charge the customer twice.
- First request with key `pay-001` → process payment, store key in Postgres (UNIQUE) + Redis.
- Same key again → return existing payment, don't charge again.
- Redis lock prevents two in-flight requests with the same key.
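The behaviour described above can be illustrated with an in-memory sketch. It assumes a `ConcurrentHashMap` standing in for both the Postgres unique key and the Redis lock; the class name and the `handle` signature are hypothetical, not the API in `payment-common`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** In-memory sketch of idempotent request handling. */
final class IdempotencySketch {
    // Stands in for the Postgres table with a UNIQUE constraint on the key.
    private final Map<String, String> paymentIdByKey = new ConcurrentHashMap<>();

    /**
     * Runs createPayment at most once per key and returns the stored payment id afterwards.
     * computeIfAbsent is atomic per key on ConcurrentHashMap, so two concurrent requests
     * with the same key cannot both create a payment (the role of the in-flight lock).
     */
    String handle(String idempotencyKey, Supplier<String> createPayment) {
        return paymentIdByKey.computeIfAbsent(idempotencyKey, k -> createPayment.get());
    }
}
```

Calling `handle("pay-001", ...)` twice invokes the supplier once and returns the same payment id both times; a different key creates a new payment.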
- `POST /api/payments/{id}/refund` only if status is `CAPTURED` or `FRAUD_CLEARED`.
- Ledger posts balanced debit/credit: merchant → consumer.
- Status → `REFUNDED`; outbox → Kafka for transaction-service and notifications.
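The refund precondition amounts to a small status machine. The sketch below uses only the statuses named in this README; the exact transition table (for example, that `FRAUD_CLEARED` cannot later be reversed) is an assumption for illustration.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Sketch of the payment status machine; the transition table is partly assumed. */
enum PaymentStatus {
    PENDING, CAPTURED, FRAUD_CLEARED, REVERSED, REFUNDED;

    // Allowed transitions; anything not listed is rejected.
    private static final Map<PaymentStatus, Set<PaymentStatus>> ALLOWED = Map.of(
        PENDING, EnumSet.of(CAPTURED),
        CAPTURED, EnumSet.of(FRAUD_CLEARED, REVERSED, REFUNDED),
        FRAUD_CLEARED, EnumSet.of(REFUNDED),
        REVERSED, EnumSet.noneOf(PaymentStatus.class),
        REFUNDED, EnumSet.noneOf(PaymentStatus.class));

    boolean canTransitionTo(PaymentStatus next) {
        return ALLOWED.get(this).contains(next);
    }

    /** Refunds are allowed only from CAPTURED or FRAUD_CLEARED. */
    boolean isRefundable() {
        return canTransitionTo(REFUNDED);
    }
}
```

A refund handler would check `isRefundable()` before posting the ledger entries, and each accepted transition would append one row to `payment_status_history`.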
- K8s CronJob (every 5 min locally, hourly in prod) POSTs `/internal/billing/run-due`.
- recurring-service queries Postgres for subscriptions where `next_billing_at <= now`.
- Publishes `payment.recurring.charge` per subscription (idempotency key = `subscriptionId + billingPeriod`).
- payment-service consumes and runs the same payment saga as one-time pay.
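A sketch of the due-subscription filter and the idempotency key for a billing cycle. The `Subscription` record is hypothetical, and treating `billingPeriod` as a calendar month in UTC is an assumption; the README only states that the key combines `subscriptionId` and `billingPeriod`.

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneOffset;
import java.util.List;

final class RecurringBillingSketch {
    /** Hypothetical subscription shape; nextBillingAt mirrors the next_billing_at column. */
    record Subscription(String id, Instant nextBillingAt) {}

    /** Same filter as the query: next_billing_at <= now. */
    static List<Subscription> due(List<Subscription> all, Instant now) {
        return all.stream().filter(s -> !s.nextBillingAt().isAfter(now)).toList();
    }

    /** Key = subscriptionId + billingPeriod; a monthly UTC period is assumed here. */
    static String idempotencyKey(Subscription s) {
        YearMonth period = YearMonth.from(s.nextBillingAt().atZone(ZoneOffset.UTC));
        return s.id() + ":" + period;
    }
}
```

Because the key is derived from the subscription and the period rather than from the CronJob run, a job that fires twice for the same cycle produces the same key and the second charge is deduplicated by payment-service.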
Spring Batch in batch-jobs-service handles bulk FX import (POST /api/batch/fx-import), not the hot payment path.
Every money movement = balanced debit + credit rows in ledger_entries. Account balances updated in the same transaction with pessimistic row locks. PostgreSQL is the only source of truth for money — never write balances to MongoDB.
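The double-entry rule can be shown with an in-memory ledger. This is a sketch under stated assumptions: `synchronized` stands in for the database transaction and row locks, amounts are in minor units, and the sign convention (negative = debit, positive = credit) and class names are chosen for the example rather than taken from the `ledger_entries` schema.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory double-entry sketch; synchronized stands in for the DB transaction and row locks. */
final class LedgerSketch {
    /** One ledger row; negative amountMinor is a debit, positive is a credit (assumed convention). */
    record Entry(String transferId, String accountId, long amountMinor) {}

    private final List<Entry> entries = new ArrayList<>();  // append-only
    private final Map<String, Long> balances = new HashMap<>();

    LedgerSketch(Map<String, Long> openingBalances) {
        balances.putAll(openingBalances);
    }

    /** Writes a balanced debit/credit pair and updates both balances in one atomic step. */
    synchronized void transfer(String transferId, String from, String to, long amountMinor) {
        if (amountMinor <= 0) throw new IllegalArgumentException("amount must be positive");
        long available = balances.getOrDefault(from, 0L);
        if (available < amountMinor) throw new IllegalStateException("insufficient funds");
        entries.add(new Entry(transferId, from, -amountMinor)); // debit
        entries.add(new Entry(transferId, to, amountMinor));    // credit
        balances.put(from, available - amountMinor);
        balances.merge(to, amountMinor, Long::sum);
    }

    synchronized long balance(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }

    /** Invariant: every transfer sums to zero, so the whole ledger does too. */
    synchronized long sumOfEntries() {
        return entries.stream().mapToLong(Entry::amountMinor).sum();
    }
}
```

A rejected transfer writes no rows and leaves both balances untouched, which is the property the pessimistic lock plus single transaction gives in Postgres.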
HTTP request
↓
Controller — routing, validation (@RestController)
↓
Service — saga steps, idempotency, status machine
↓
Repository — JPA for Postgres; MongoRepository for logs/sessions
↓
Postgres / MongoDB / Redis / Kafka
- Microservices: bounded contexts per service; shared `payment-common` for outbox, idempotency, events
- Kubernetes: 11 services as deployments; CronJob for billing; HPA-ready; ingress routes traffic. Data stays outside the cluster (Compose locally, managed AWS services in prod)
- Saga orchestration: payment-service coordinates reserve → gateway → capture; compensation on failure or fraud reject
- Event-driven audit: append-only `payment_status_history` + Kafka `payment.events` for replay
- Kafka Streams: velocity windows enrich fraud decisions without blocking checkout
- Circuit breaker: gateway-mock wraps external auth calls (simulates bank/card network flakiness)
- Idempotency: header + Redis + Postgres unique constraint
- Exponential backoff: failed calls are retried with exponentially increasing delays, behind the circuit breaker
- Observability: Prometheus/Grafana/Jaeger locally; Datadog APM + logs in prod
- BDD tests: Cucumber features for payment, refund, fraud reversal, recurring
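To make the exponential backoff item concrete, here is a plain-JDK sketch of the delay schedule and a retry loop. It is not the Resilience4j API the project uses; the class, method names, and the absence of jitter are simplifications for illustration.

```java
import java.time.Duration;
import java.util.function.Supplier;

final class BackoffSketch {
    /** Delay before retry number `attempt` (0-based): base * 2^attempt, capped at max. */
    static Duration delay(int attempt, Duration base, Duration max) {
        // Clamp the shift so the multiplication cannot overflow for sensible base values.
        long factor = 1L << Math.max(0, Math.min(attempt, 20));
        long millis = base.toMillis() * factor;
        return Duration.ofMillis(Math.min(millis, max.toMillis()));
    }

    /** Calls `call` up to maxAttempts times, sleeping with exponential backoff between failures. */
    static <T> T retry(Supplier<T> call, int maxAttempts, Duration base, Duration max) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt < maxAttempts - 1) {
                    try {
                        Thread.sleep(delay(attempt, base, max).toMillis());
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted during backoff", ie);
                    }
                }
            }
        }
        throw last; // every attempt failed
    }
}
```

With a 100 ms base and a 2 s cap the delays run 100, 200, 400, 800, 1600, 2000, 2000, and so on. Production setups usually add random jitter so that many clients do not retry in lockstep.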
| Topic | Purpose |
|---|---|
| `payment.events` | Domain events → transaction-service, notifications |
| `payment.captured` | Async fraud entry point |
| `payment.fraud_rejected` | Triggers reversal saga |
| `payment.fraud_cleared` | Fraud passed → notify |
| `fraud.enriched.events` | Velocity-enriched payload from Kafka Streams |
| `payment.recurring.charge` | Billing cycle charges |
| `payment.reversed` / `payment.refunded` | Compensation / refund events |
| `notification.commands` | Email/webhook mock |
| Key | Purpose |
|---|---|
| `idempotency:{key}` | Cached payment response (24h TTL) |
| `idempotency:{key}:lock` | In-flight request lock |
| `balance:{accountId}` | Cached available:pending balance |
| `fx:{from}:{to}` | FX rate cache |
| Service | Port |
|---|---|
| user-service | 8081 |
| account-ledger-service | 8082 |
| payment-service | 8083 |
| transaction-service | 8084 |
| fx-service | 8085 |
| stream-processing-service | 8086 |
| fraud-service | 8087 |
| payment-gateway-mock | 8088 |
| notification-service | 8089 |
| recurring-service | 8090 |
| batch-jobs-service | 8091 |
Local stack runs on Minikube + Compose. For production we deploy via Terraform (infra/terraform/) and the same Helm chart.
Users / Merchants
↓
CloudFront (React SPA)
↓
Route 53 → ALB + WAF
↓
┌─────────────────────────────────────────────────────────────┐
│ EKS (multi-AZ) │
│ user · ledger · payment · transaction · fx · fraud │
│ gateway-mock · notification · recurring · streams · batch │
│ HPA on payment-service / api paths │
│ CronJob → recurring billing │
└─────────────────────────────────────────────────────────────┘
↓ ↓ ↓ ↓
Aurora Postgres DocumentDB/ ElastiCache Amazon MSK
(multi-AZ, Atlas Redis (3 brokers)
read replicas) (MongoDB)
CI: GitHub Actions → ECR → Helm deploy to EKS
Secrets: AWS Secrets Manager
Observability: Datadog (APM, logs, infra metrics)
CloudWatch for raw AWS logs backup
Sharding path (not implemented locally): shard Postgres by merchant_id hash when single primary exceeds ~5–10k write TPS. Kafka partitions already keyed by payment/merchant ID.
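Routing by merchant_id hash could look like the sketch below. It is only an illustration of the idea, since sharding is not implemented: the class name is hypothetical and simple modulo hashing is an assumption.

```java
final class ShardRouterSketch {
    /** Maps a merchant id to a shard index in [0, shardCount). */
    static int shardFor(String merchantId, int shardCount) {
        if (shardCount < 1) throw new IllegalArgumentException("shardCount must be >= 1");
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(merchantId.hashCode(), shardCount);
    }
}
```

Modulo hashing remaps most merchants whenever the shard count changes, so a real rollout would more likely use consistent hashing or a lookup table of merchant-to-shard assignments.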
Temporal (future): right now the saga lives as Java code inside payment-service. That works for an MVP, but long-running flows (recurring billing + dunning + retry + partial refund + dispute) get messy because state has to survive crashes, retries, and days-long waits. Temporal provides durable workflows: each step is recorded, execution survives pod restarts, and retries and compensations are built in. Worth considering if the number or complexity of workflows grows.
Disputes / chargebacks — not built. Would be a separate lifecycle: DISPUTED → EVIDENCE → WON/LOST with ledger hold on disputed amount.
KYC / RBAC / MFA — JWT carries hardcoded ROLE_MERCHANT / ROLE_CONSUMER. Real system would integrate identity providers, step-up auth for large transfers, and block payouts until KYC APPROVED.
If Redis dies — idempotency falls back to Postgres unique constraint; balance cache misses go to Postgres. Hot path still works, just slower.
If Kafka dies — payments still commit (outbox rows queue up); fraud/notifications catch up when relay resumes. Ledger never depends on Kafka for correctness.
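Why payments keep committing while Kafka is down can be seen in a small outbox sketch. The lists stand in for Postgres tables and the `publish` predicate stands in for a Kafka producer call; all names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** In-memory transactional-outbox sketch; the lists stand in for Postgres tables. */
final class OutboxSketch {
    record OutboxRow(long id, String topic, String payload) {}

    private final List<String> payments = new ArrayList<>();
    private final List<OutboxRow> pending = new ArrayList<>();
    private long nextId = 1;

    /** "Same DB transaction": the payment and its outbox row are written together. */
    synchronized void capture(String paymentId) {
        payments.add(paymentId);
        pending.add(new OutboxRow(nextId++, "payment.captured", paymentId));
    }

    /**
     * Publishes pending rows in insertion order. Stops at the first failed publish and
     * leaves the remaining rows queued. Returns the number of rows published.
     */
    synchronized int relay(Predicate<OutboxRow> publish) {
        int published = 0;
        while (!pending.isEmpty() && publish.test(pending.get(0))) {
            pending.remove(0);
            published++;
        }
        return published;
    }

    synchronized int pendingCount() { return pending.size(); }
    synchronized int paymentCount() { return payments.size(); }
}
```

While `publish` keeps returning false (broker down), captures still succeed and rows accumulate; once it succeeds, the backlog drains in order. A relay that crashes after publishing but before deleting a row will publish it again, so delivery is at-least-once and consumers need to be idempotent.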
Prerequisites: Java 21, Maven, Docker, Minikube, Helm, Node 18+
export JAVA_HOME=/opt/homebrew/opt/openjdk@21/libexec/openjdk.jdk/Contents/Home
# 1. Data layer (host)
docker compose -f infra/docker/docker-compose.yml up -d
# 2. Kubernetes
./infra/k8s/minikube/setup.sh
eval $(minikube docker-env)
mvn package -DskipTests
# build images (example)
docker build -t payment-system/payment-service:latest services/payment-service
# ... repeat for other services, or use CI pipeline
helm upgrade --install payment-system infra/k8s/helm/payment-system \
--namespace payment-system --create-namespace
# 3. Ingress
echo "$(minikube ip) payment.local" | sudo tee -a /etc/hosts
minikube tunnel # separate terminal
# 4. Frontend
cd frontend && npm install && npm run dev
App: http://localhost:5173 (frontend) · API via ingress http://payment.local or port-forward individual services
Fast dev without K8s:
docker compose -f infra/docker/docker-compose.yml up -d
./scripts/start-local.sh
cd frontend && npm run dev
Tests:
mvn test
mvn test -pl services/payment-service -Dtest=RunCucumberTest -am # needs Docker
k6 run infra/loadtest/payment-load.js
Teardown:
helm uninstall payment-system -n payment-system
minikube stop
docker compose -f infra/docker/docker-compose.yml down