A social-media-style app built to practice and showcase the system design concepts I have learned throughout my internships. The focus is on scalability and availability, since the goal is a good user experience for up to a billion registered users across the world.
Features
- Image upload — support different formats and sizes
- Like and comment on posts; view likes and comments; get notifications when posts are liked or commented on
- Follow other users; get notified when someone follows you
- Analytics
Realistic Scale targets
- 100,000 concurrent uploads
- 1,000,000 concurrent likes and comments
- 1,000,000,000 registered users
| Layer | Tech |
|---|---|
| Frontend | React, Tailwind |
| API | NestJS monorepo api-gateway (HTTP) → gRPC microservices |
| Database | PostgreSQL |
| Cache | Redis |
| Queue | Kafka |
| Orchestration | Kubernetes (Minikube locally, Kustomize manifests) |
| Backend patterns | NestJS DI, Controller → Service → Repository (TypeORM) |
| Docs | Swagger at /api/docs |
| Metrics / traces | Prometheus, Grafana, Jaeger |
| CI | GitHub Actions |
Browser → ingress (/ → web, /api → gateway, /media → media-service)
↓
api-gateway (JWT, rate limit, Swagger)
↓ gRPC
auth · user · media · post · feed · like · comment · fanout · notification
↓
Postgres (users, posts, likes, comments, follows)
Redis (feeds, like/comment counts)
Kafka (new posts, likes, comments, notifications)
Postgres runs in Docker locally and on RDS in prod. The database is kept out of the cluster on purpose: it is stateful and pods are ephemeral, so running it inside K8s adds complexity that a managed service already handles.
When the feed is opened, the app does not scan Postgres for every follower/post combination.
- `feed-service` reads a Redis sorted set `feed:{userId}`; post IDs were added when I followed someone or when people I follow posted.
- For each post ID, `post-service` loads the post from Postgres (caption, media URL, username) and pulls like/comment counts from Redis.
Reads stay fast because the fanout work happened at write time.
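The read path above can be sketched roughly as follows. This is a minimal, synchronous illustration in which plain `Map`s stand in for Redis and Postgres; the names `feedSets`, `counters`, `postsTable`, `readFeedIds`, and `loadFeed` are assumptions made for the example, not the services' actual code.

```typescript
// In-memory stand-ins for Redis and Postgres; all names here are hypothetical.
type PostRow = { id: string; caption: string; mediaUrl: string; username: string };
type FeedItem = PostRow & { likeCount: number; commentCount: number };

// feed:{userId} modelled as [postId, score] pairs (score assumed to be publish time in ms).
const feedSets = new Map<string, Array<[string, number]>>();
const counters = new Map<string, number>();    // like_count:{postId}, comment_count:{postId}
const postsTable = new Map<string, PostRow>(); // stands in for the Postgres posts table

// Similar in spirit to reading a sorted set newest-first with a limit.
function readFeedIds(userId: string, limit: number): string[] {
  const entries = feedSets.get(`feed:${userId}`) ?? [];
  return [...entries]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([id]) => id);
}

// Hydrate each ID with the durable row plus the cached counters.
function loadFeed(userId: string, limit = 20): FeedItem[] {
  const items: FeedItem[] = [];
  for (const id of readFeedIds(userId, limit)) {
    const row = postsTable.get(id);
    if (!row) continue; // post no longer exists: skip the stale ID
    items.push({
      ...row,
      likeCount: counters.get(`like_count:${id}`) ?? 0,
      commentCount: counters.get(`comment_count:${id}`) ?? 0,
    });
  }
  return items;
}
```

The point of the sketch is that the read touches one sorted set and a bounded number of rows, no matter how many accounts the user follows.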
- Post saved to Postgres.
- Post ID pushed into the author's feed and followers' feeds in Redis — but only if the author has ≤10k followers (see below).
- `post.published` on Kafka; `fanout-service` does the same Redis push for anything the sync step missed.
Why skip fanout above 10k followers?
Two ways to build a feed:
- Push (what I use for normal accounts): when you post, copy the post ID into every follower's Redis feed up front. Opening the feed is a fast Redis read.
- Pull (for big accounts): don't write to millions of feeds on every post. When someone opens their feed, query recent posts from the big accounts they follow with a normal Postgres query. The threshold is around 10k followers for now. To keep that query from being too heavy on the server, it could be limited to the accounts the user interacts with most (a ranking algorithm similar to how the TikTok and Reels FYP work is one option), combined with pagination and lazy loading.
Imagine a celebrity with 1M followers: one post would mean 1M Redis writes before the upload even finishes. That blocks the request and hammers Redis for no good reason.
So the rule is: ≤10k followers → push fanout on write; >10k → skip the push and use the pull model on read (the follower loads posts from followed users via Postgres when they open the feed). The threshold is configurable via `FANOUT_FOLLOWER_THRESHOLD`.
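The threshold rule can be illustrated with a small sketch. A `Map` stands in for the Redis sorted sets, and `parseFanoutThreshold`, `addToFeed`, and `fanoutOnWrite` are hypothetical names. Writing the author's own feed even above the threshold is an assumption of this example.

```typescript
// Hypothetical names throughout; the Map stands in for Redis sorted sets.
const DEFAULT_FANOUT_THRESHOLD = 10000;

// Parse the FANOUT_FOLLOWER_THRESHOLD value, falling back to the default on bad input.
function parseFanoutThreshold(raw: string | undefined): number {
  const n = Number(raw);
  return raw !== undefined && Number.isInteger(n) && n > 0 ? n : DEFAULT_FANOUT_THRESHOLD;
}

// feed:{userId} -> (postId -> score)
const feedStore = new Map<string, Map<string, number>>();

function addToFeed(userId: string, postId: string, publishedAtMs: number): void {
  const key = `feed:${userId}`;
  const feed = feedStore.get(key) ?? new Map<string, number>();
  feed.set(postId, publishedAtMs); // re-adding the same ID just overwrites the score
  feedStore.set(key, feed);
}

// Returns how many feeds were written, so the caller can see which path was taken.
function fanoutOnWrite(
  authorId: string,
  followerIds: string[],
  postId: string,
  publishedAtMs: number,
  threshold: number,
): number {
  addToFeed(authorId, postId, publishedAtMs);   // assumption: author always sees own post
  if (followerIds.length > threshold) return 1; // big account: followers pull on read
  for (const followerId of followerIds) addToFeed(followerId, postId, publishedAtMs);
  return 1 + followerIds.length;
}
```

Because adding an ID that is already in a feed changes nothing, a later pass by `fanout-service` over the same post can repeat the push safely.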
Eventual consistency is chosen here to handle many concurrent writes. Strong consistency on every like would mean locking the same post row under heavy traffic, which is slower and makes for bad UX. The exact count being off by one for a second doesn't matter much at scale.
- Tap like → UI updates immediately (optimistic).
- `like-service` writes the like to Postgres, bumps Redis `like_count:{postId}`, returns the new count.
- Kafka event for notifications. If Kafka is down, the like still sticks; the HTTP response doesn't depend on it.
- Idempotency — see below.
Comments: same pattern (Postgres row + Redis counter + Kafka).
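A minimal sketch of that ordering follows: durable write, then counter bump, then a best-effort event. The `Set` and `Map` stand in for Postgres and Redis, `Publisher` stands in for a Kafka producer, and the event name `post.liked` is an assumption for illustration.

```typescript
// Hypothetical in-memory stand-ins: likeRows ~ Postgres likes table, likeCounts ~ Redis counters.
const likeRows = new Set<string>();           // one entry per (userId, postId)
const likeCounts = new Map<string, number>(); // like_count:{postId}

type LikeEvent = { type: "post.liked"; userId: string; postId: string };
type Publisher = (event: LikeEvent) => void;   // stands in for a Kafka producer

function likePost(userId: string, postId: string, publish: Publisher): number {
  const rowKey = `${userId}:${postId}`;
  const counterKey = `like_count:${postId}`;
  if (likeRows.has(rowKey)) return likeCounts.get(counterKey) ?? 0; // already liked

  likeRows.add(rowKey);                               // 1. durable write
  const next = (likeCounts.get(counterKey) ?? 0) + 1; // 2. bump the cached counter
  likeCounts.set(counterKey, next);

  try {
    publish({ type: "post.liked", userId, postId });  // 3. best-effort notification event
  } catch {
    // Broker unavailable: the like is already stored, so the response is unaffected.
  }
  return next;
}
```

In this sketch a failing publisher changes nothing about the returned count, which mirrors the claim that the HTTP response does not depend on Kafka.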
What's the Idempotency-Key header?
When you like a post, the frontend sends a random UUID in the Idempotency-Key header.
Problem it solves: you double-tap the heart, or the request times out and the browser retries — without this, the server might count two likes.
How it works:
- First request with key `abc-123` → process the like, store the response in Redis under `idempotency:{userId}:abc-123` (24h TTL).
- Same key again → return the cached response, don't increment again.
- Plus a `like_dedup` row in Postgres with a `(userId, postId)` unique constraint so the same user can't like twice even with different keys.
So it's retry-safe and double-tap-safe.
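The two layers described above (response cache keyed by the header, plus a uniqueness check) might look roughly like this. It is a sketch under stated assumptions: a `Map` with an explicit expiry replaces Redis, a `Set` replaces the Postgres unique constraint, and `handleLike` is a hypothetical function name.

```typescript
// Hypothetical sketch: Maps stand in for Redis (with TTL) and the Postgres unique constraint.
type LikeResponse = { postId: string; likeCount: number };
type CachedResponse = { response: LikeResponse; expiresAtMs: number };

const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // 24h TTL
const idempotencyCache = new Map<string, CachedResponse>();
const likeDedup = new Set<string>();            // unique (userId, postId)
const likeTotals = new Map<string, number>();

function handleLike(
  userId: string,
  postId: string,
  idempotencyKey: string,
  nowMs: number,
): LikeResponse {
  const cacheKey = `idempotency:${userId}:${idempotencyKey}`;
  const cached = idempotencyCache.get(cacheKey);
  if (cached && cached.expiresAtMs > nowMs) return cached.response; // retry: replay the response

  const dedupKey = `${userId}:${postId}`;
  let likeCount = likeTotals.get(postId) ?? 0;
  if (!likeDedup.has(dedupKey)) { // second line of defence: new key, same like
    likeDedup.add(dedupKey);
    likeCount += 1;
    likeTotals.set(postId, likeCount);
  }

  const response: LikeResponse = { postId, likeCount };
  idempotencyCache.set(cacheKey, { response, expiresAtMs: nowMs + IDEMPOTENCY_TTL_MS });
  return response;
}
```

A retry with the same key replays the stored response, while a double tap that generates a fresh key is still stopped by the dedup check.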
Saving the follow in Postgres isn't enough — I also copy the followee's recent posts into the follower's Redis feed so the timeline isn't empty.
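As a rough illustration of that backfill step, the sketch below copies a followee's most recent posts into the follower's feed. The in-memory `allPosts` array and `followerFeeds` map stand in for Postgres and Redis, and `BACKFILL_LIMIT` is an assumed value, not one taken from the project.

```typescript
// Hypothetical sketch; an array and a Map stand in for the posts table and Redis feeds.
type RecentPost = { id: string; authorId: string; createdAtMs: number };

const BACKFILL_LIMIT = 20; // assumed number of recent posts to copy
const allPosts: RecentPost[] = [];
const followerFeeds = new Map<string, Map<string, number>>(); // feed:{userId} -> postId -> score

// Copies the followee's newest posts into the follower's feed; returns how many were copied.
function backfillOnFollow(followerId: string, followeeId: string): number {
  const recent = allPosts
    .filter((p) => p.authorId === followeeId)      // filter() returns a copy, so sort is safe
    .sort((a, b) => b.createdAtMs - a.createdAtMs) // newest first
    .slice(0, BACKFILL_LIMIT);

  const key = `feed:${followerId}`;
  const feed = followerFeeds.get(key) ?? new Map<string, number>();
  for (const p of recent) feed.set(p.id, p.createdAtMs); // score by original publish time
  followerFeeds.set(key, feed);
  return recent.length;
}
```

Scoring by the original publish time keeps backfilled posts in their natural position instead of jumping to the top of the timeline.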
Each microservice follows a layered NestJS layout:
HTTP/gRPC request
↓
Controller — routing only (api-gateway HTTP controllers, or @GrpcMethod in services)
↓
Service — business logic (likePost, createPost, follow, idempotency checks)
↓
Repository — data access (TypeORM Repository<T> for Postgres reads/writes)
↓
Postgres / Redis / Kafka
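To make the layering concrete, here is a framework-free sketch of the same three layers. In the real services NestJS decorators and TypeORM would supply the wiring; the class names and the self-follow rule below are hypothetical and exist only to show where each kind of logic lives.

```typescript
// Plain-TypeScript sketch of Controller -> Service -> Repository; all names are hypothetical.
type Follow = { followerId: string; followeeId: string };

// Repository: data access only (an array stands in for the follows table).
class FollowRepository {
  private rows: Follow[] = [];
  exists(f: Follow): boolean {
    return this.rows.some((r) => r.followerId === f.followerId && r.followeeId === f.followeeId);
  }
  insert(f: Follow): void {
    this.rows.push(f);
  }
}

// Service: business rules, no transport concerns.
class FollowService {
  private readonly repo: FollowRepository;
  constructor(repo: FollowRepository) {
    this.repo = repo;
  }
  follow(followerId: string, followeeId: string): "created" | "exists" {
    if (followerId === followeeId) throw new Error("cannot follow yourself"); // illustrative rule
    const row: Follow = { followerId, followeeId };
    if (this.repo.exists(row)) return "exists";
    this.repo.insert(row);
    return "created";
  }
}

// Controller: maps a request to a service call and a response shape.
class FollowController {
  private readonly service: FollowService;
  constructor(service: FollowService) {
    this.service = service;
  }
  handle(req: { userId: string; targetId: string }): { status: number; body: string } {
    try {
      return { status: 200, body: this.service.follow(req.userId, req.targetId) };
    } catch (e) {
      return { status: 400, body: e instanceof Error ? e.message : "bad request" };
    }
  }
}

// Manual wiring; NestJS DI would normally construct this graph.
const followController = new FollowController(new FollowService(new FollowRepository()));
```

Keeping the controller free of rules means the same service could sit behind an HTTP handler or a gRPC method without changes.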
- Microservices + BFF — browser talks HTTP; services talk gRPC for faster internal calls
- Kubernetes — 10 services + web + Redis/Kafka/observability; K8s runs each as its own deployment, restarts crashed pods, routes traffic through ingress, and matches how this would run on EKS in prod. Postgres stays outside the cluster (docker-compose locally, RDS in prod) because databases need stable disks, not ephemeral pods
- Kafka — decouple likes/comments/notifications from the HTTP path so writes stay fast
- Eventual consistency — like/comment counts in Redis; notifications catch up via Kafka
- Strong consistency where it matters — auth, follows, ownership in Postgres transactions
- Cache for reads — feeds and counts in Redis; Postgres holds durable metadata
- Hybrid fanout — push to Redis when ≤10k followers; pull on read for bigger accounts
- Indexes — `(followeeId)` on follows for fanout queries, `(userId, createdAt)` on posts for profile/feed pulls, `(postId)` on likes/comments; engagement tables partitioned by time for the 1B-user / high-write story
- Idempotency — `Idempotency-Key` header + Redis cache + dedup tables
- Optimistic UI — React updates before the server responds
- State machines — media upload and post publish statuses
- Observability — Prometheus, Grafana, Jaeger
- Swagger — `/api/docs`
- CI/CD — GitHub Actions / Jenkins
- Test Cases - WIP
| Key | Purpose |
|---|---|
| `feed:{userId}` | Sorted set of post IDs for home feed |
| `like_count:{postId}` | Like count |
| `comment_count:{postId}` | Comment count |
| `user_likes:{userId}` | Set of posts this user liked |
Swagger UI: http://localhost:8080/api/docs
The local stack runs only on Minikube. Production would be deployed via infrastructure as code. Terraform sketches live in `infra/terraform/` and are a work in progress while I learn more about IaC.
Users
↓
CloudFront (static web assets + cached media)
↓
Route 53 → ALB (HTTPS, /api routing)
↓
┌─────────────────────────────────────────────────────────────┐
│ EKS (multi-AZ) │
│ │
│ web · api-gateway · auth · user · media · post · feed │
│ like · comment · fanout · notification │
│ │
│ HPA on api-gateway / web (CPU, RPS) │
│ KEDA on like / comment / fanout consumers (Kafka lag) │
└─────────────────────────────────────────────────────────────┘
↓ ↓ ↓ ↓
RDS Postgres ElastiCache Amazon MSK S3 (+ CloudFront)
(multi-AZ, Redis (3 brokers, pre-signed uploads,
read replicas) cluster mode) same topics) media storage
CI: GitHub Actions → ECR → Argo CD / Flux deploy to EKS
Secrets: AWS Secrets Manager → External Secrets Operator → K8s
Observability: OpenTelemetry → managed metrics/traces/logs
(e.g. Datadog)
Right now celebrity posts don't appear in a normal user's feed because they are never injected into Redis. A hybrid approach would fix this for every user, celebrity or not: take the pre-generated feed from Redis, combine it with recent posts from the user's followees whose follower count is above the threshold, and sort the result by time. Also, when a user has just followed someone, a backfill is needed so that person's recent posts show up in the feed.
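The merge step of that hybrid approach could be sketched as below. `mergeHybridFeed` and `FeedEntry` are hypothetical names. The pushed list is assumed to come from the Redis feed and the pulled list from a Postgres query over above-threshold followees.

```typescript
// Hypothetical sketch of merging the pushed (Redis) feed with pulled (Postgres) posts.
type FeedEntry = { postId: string; createdAtMs: number };

function mergeHybridFeed(pushed: FeedEntry[], pulled: FeedEntry[], limit: number): FeedEntry[] {
  const byId = new Map<string, FeedEntry>();
  for (const entry of [...pushed, ...pulled]) byId.set(entry.postId, entry); // dedupe by post ID
  return Array.from(byId.values())
    .sort((a, b) => b.createdAtMs - a.createdAtMs) // newest first
    .slice(0, limit);
}
```

Deduplicating by post ID matters when an account crosses the threshold, since some of its posts may exist in both sources for a while.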
Right now if Redis dies, the home feed dies, which is not what we want in an application centered on availability and scalability. The feed should fall back to Postgres when that happens. Redis should also be replicated, so that if the primary fails a replica is promoted and takes over. The cluster can be sharded by user ID with replicas for each shard, so that if one shard's primary goes down the other shards keep serving their users. Replication is asynchronous by default, so replicas are not strongly consistent with the primary; after a failover the promoted replica may serve slightly stale data while whatever is missing in that shard is rebuilt from Postgres.
Another option is lazy rebuilding on read: if the cache is empty, serve from Postgres and trigger a job that repopulates only that user's feed, so inactive users are never touched.
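A minimal sketch of the fallback and lazy-rebuild idea follows. `FeedCache`, `FeedSource`, and `readFeedWithFallback` are hypothetical names, and the repopulation happens inline here for simplicity where a real service would more likely hand it to a background job.

```typescript
// Hypothetical sketch: cache-aside read with a database fallback and lazy repopulation.
type FeedCache = {
  get(userId: string): string[] | undefined; // may throw if the cache is unreachable
  set(userId: string, postIds: string[]): void;
};
type FeedSource = (userId: string) => string[]; // e.g. a Postgres query over followees' posts

function readFeedWithFallback(
  userId: string,
  cache: FeedCache,
  loadFromDb: FeedSource,
): { postIds: string[]; source: "cache" | "db" } {
  try {
    const hit = cache.get(userId);
    if (hit && hit.length > 0) return { postIds: hit, source: "cache" };
  } catch {
    // Cache down: serve from the database instead of failing the request.
    return { postIds: loadFromDb(userId), source: "db" };
  }

  // Cache miss: serve from the database and rebuild only this user's feed.
  const postIds = loadFromDb(userId);
  try {
    cache.set(userId, postIds);
  } catch {
    // Cache write failed; the response is still valid.
  }
  return { postIds, source: "db" };
}
```

Only users who actually open their feed trigger a rebuild, so inactive users cost nothing after a cache loss.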
npm ci
make up
./scripts/obs-port-forward.sh # Grafana :3001, Prometheus :9090, Jaeger :16686
open http://localhost:8080/api/docs
make down # teardown

Without K8s: `make up-dev` then `make dev` → http://localhost:4200