YouTube Playlist Link: https://youtube.com/playlist?list=PLNzPfhXM567Rv1GAnHgVslVfTsjBstuph&si=R3TXcs0F_IW0vclS
Answer:
System Design is the process of defining the architecture, components, and data flow of a system to meet both functional and non-functional requirements (like scalability, reliability, and performance).
In simple terms, it’s about creating the blueprint of a large-scale system — how different parts (APIs, databases, services, caches, etc.) interact to serve millions of users efficiently.
Key Components:
- Functional Requirements – What the system should do.
- Non-Functional Requirements – How well it should perform (scalability, latency, fault tolerance).
- High-Level Design – System architecture, data flow, service interactions.
- Low-Level Design – Data models, algorithms, and detailed interactions between components.
Answer:
System Design is important because it ensures that a system can scale, perform, and remain reliable as it grows. It helps engineers make the right architectural decisions early, preventing costly rework later.
Key Reasons:
- Scalability – Handles increased user load efficiently.
- Reliability – Ensures the system stays available even if parts fail.
- Performance – Maintains low latency and fast response times.
- Maintainability – Makes the system easier to update and debug.
- Cost Efficiency – Optimizes infrastructure and operational costs.
Answer:
Networking is the backbone of distributed systems — every service-to-service call, API request, and database query relies on network communication. Understanding networking helps you design systems that are fast, reliable, and fault-tolerant.
Key Reasons:
- Service Communication – Enables interaction between microservices, databases, and clients.
- Performance Optimization – Helps reduce latency and bandwidth usage.
- Scalability – Supports horizontal scaling using load balancers and distributed nodes.
- Fault Tolerance – Proper network design avoids single points of failure.
- Security – Ensures safe data transfer using encryption and secure protocols (HTTPS, TLS).
Answer:
In large-scale systems, networking directly affects performance, scalability, and reliability. As the system grows, data must travel across multiple servers, regions, and services — making network efficiency critical.
Key Impacts:
- Latency – Each network call adds delay; optimizing connections reduces response time.
- Bandwidth Usage – Inefficient data transfer can slow down or overload the system.
- Scalability – Proper network design (load balancers, CDNs, replication) enables horizontal scaling.
- Fault Tolerance – Redundant paths and retries help handle node or region failures.
- Consistency – Network delays can cause data synchronization issues across distributed nodes.
- Cost – Cross-region data transfers and high bandwidth usage increase infrastructure costs.
Answer:
An IP (Internet Protocol) Address is a unique identifier assigned to every device connected to a network. It allows devices (like servers, clients, routers) to find and communicate with each other over the internet or a local network.
Key Points:
- Purpose – Identifies and locates devices in a network.
- Format – Can be IPv4 (e.g., 192.168.1.1) or IPv6 (e.g., 2001:0db8::1).
- Types:
  - Public IP: Used for communication over the internet.
  - Private IP: Used within internal/local networks.
- Static vs Dynamic:
  - Static IP – Manually assigned, doesn’t change.
  - Dynamic IP – Assigned automatically by DHCP, can change over time.
Answer:
IPv4 (Internet Protocol version 4) is the fourth version of the Internet Protocol and the most widely used system for identifying devices on a network. It uses a 32-bit address format, allowing around 4.3 billion unique addresses.
Key Points:
- Format: Four numbers separated by dots (e.g., 192.168.1.1).
- Address Range: 0.0.0.0 to 255.255.255.255.
- Address Space: ~4.3 billion addresses (2³²).
- Representation: Each part (octet) ranges from 0–255.
- Limitation: Due to internet growth, IPv4 addresses are running out — leading to IPv6 adoption.
Answer:
IPv6 (Internet Protocol version 6) is the successor to IPv4, developed to overcome IPv4’s address exhaustion. It uses a 128-bit address format, providing a practically unlimited number of unique IP addresses.
Key Points:
- Format: Eight groups of four hexadecimal digits separated by colons (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334).
- Address Space: 2¹²⁸ possible addresses — enough for every device globally.
- Simplified Configuration: Supports auto-configuration without DHCP.
- Security: Built-in support for IPsec (encryption and authentication).
- Efficiency: Improved routing and reduced network congestion compared to IPv4.
Answer:
Private IP addresses are used within local networks (like home, office, or internal company systems), while Public IP addresses are used to communicate over the internet.
| Aspect | Private IP | Public IP |
|---|---|---|
| Scope | Used within a local/private network | Used across the internet |
| Access | Not directly accessible from the internet | Accessible globally |
| Assigned By | Local network devices (e.g., router/DHCP) | Internet Service Provider (ISP) |
| Examples (IPv4) | 10.0.0.0 – 10.255.255.255, 172.16.0.0 – 172.31.255.255, 192.168.0.0 – 192.168.255.255 | Any address not in the private IP ranges |
| Security | More secure, isolated from external access | Exposed to external traffic, needs protection |
| Use Case | Home LANs, corporate intranets, internal servers | Web servers, public APIs, cloud services |
In short: Private IP = for internal communication. Public IP = for external (internet) communication.
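The private/public distinction is easy to check programmatically; a minimal sketch using Python's standard ipaddress module, which has the RFC 1918 private ranges built in (the sample addresses are arbitrary):

```python
import ipaddress

# The private ranges (10/8, 172.16/12, 192.168/16) are built into the module.
for addr in ["10.0.0.5", "172.16.8.1", "192.168.1.1", "8.8.8.8"]:
    ip = ipaddress.ip_address(addr)
    kind = "private" if ip.is_private else "public"
    print(f"{addr} is {kind}")
```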
Answer:
Private IPs are essential for enabling internal communication within local networks without consuming public IP addresses. They help build secure, isolated environments for devices to connect and share resources efficiently.
Key Reasons:
- Address Conservation: Reduces the need for public IPs, which are limited in IPv4.
- Internal Communication: Allows devices (like laptops, printers, and servers) to connect within the same network.
- Security: Keeps internal systems hidden from the public internet, reducing attack risks.
- Cost Efficiency: No need for every device to have its own public IP.
- Network Flexibility: Enables local routing, DHCP usage, and NAT (Network Address Translation).
In short: Private IPs make internal networking secure, scalable, and cost-effective without exposing internal systems to the public web.
Answer:
IP (Internet Protocol) plays a fundamental role in system design by enabling communication between different components of a distributed system — servers, databases, load balancers, and clients — across networks.
Key Roles:
- Identification: Each device or service in a network is uniquely identified by an IP address.
- Routing: IP ensures that data packets reach the correct destination across local or global networks.
- Scalability: Allows horizontal scaling by assigning unique IPs to new servers or microservices.
- Load Balancing: Load balancers use IPs to route traffic to multiple backend servers.
- Security & Isolation: Private IPs separate internal traffic from public exposure, improving security.
- High Availability: Enables failover and redundancy through multiple IP-based routes and replicas.
In short: IP addresses are the foundation of connectivity in any large-scale system — making communication, scalability, and reliability possible.
Answer:
DNS (Domain Name System) is the internet’s phonebook, translating human-readable domain names (like google.com) into IP addresses (like 142.250.190.78) that computers use to identify each other on a network.
Without DNS, users would have to remember complex IP addresses to access websites.
Key Points:
- Purpose: Converts domain names to IP addresses for easy access.
- Analogy: Just like saving a contact name instead of remembering a phone number.
- Function: When you enter a website URL, DNS finds the correct IP so your browser can connect to the right server.
- Hierarchy: DNS operates through a distributed hierarchy of servers — Root, TLD, and Authoritative servers.
- Performance: DNS caching helps speed up future lookups and reduce latency.
In short: DNS makes the internet human-friendly by bridging the gap between names and numerical IP addresses.
Answer:
DNS operates through a hierarchy of servers, each playing a specific role in resolving a domain name to its IP address. The resolution process usually involves multiple servers working together.
Main Types of DNS Servers:
- Recursive Resolver:
  - The first server contacted by your browser or device.
  - Responsible for querying other DNS servers to find the correct IP address.
  - Often managed by ISPs or public DNS providers (e.g., Google DNS 8.8.8.8).
- Root DNS Server:
  - The starting point of DNS resolution.
  - Directs queries to the correct Top-Level Domain (TLD) server (like .com, .org, .net).
- TLD (Top-Level Domain) Server:
  - Manages domain extensions (e.g., .com, .in, .org).
  - Points to the Authoritative DNS Server of the specific domain.
- Authoritative DNS Server:
  - The final source of truth for a domain.
  - Holds actual DNS records (like A, AAAA, MX, and CNAME) and provides the corresponding IP address.
In short: The DNS resolution chain works as: Client → Recursive Resolver → Root Server → TLD Server → Authoritative Server → IP Address Returned
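From application code, that whole chain is hidden behind a single call to the OS resolver; a minimal sketch using Python's standard socket module (requires network access, and example.com is just a placeholder domain):

```python
import socket

# getaddrinfo asks the OS stub resolver, which consults local caches and
# the configured recursive resolver, i.e. the full chain shown above.
for family, _, _, _, sockaddr in socket.getaddrinfo("example.com", 443):
    if family == socket.AF_INET:      # IPv4 A record
        print("IPv4:", sockaddr[0])
    elif family == socket.AF_INET6:   # IPv6 AAAA record
        print("IPv6:", sockaddr[0])
```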
Answer:
DNS caching is crucial because it reduces lookup time and improves performance by storing previously resolved domain-to-IP mappings. Instead of querying DNS servers repeatedly, the system can reuse cached results, leading to faster response times and lower network load.
Benefits:
- Reduces DNS query latency.
- Minimizes load on upstream DNS servers.
- Improves user experience with quicker website access.
- Decreases bandwidth and infrastructure costs.
Answer:
Caching can happen at multiple layers in the DNS resolution chain:
- Browser Cache: Stores recent DNS lookups for a short duration to speed up repeated visits.
- Operating System (OS) Cache: The OS maintains a local DNS cache shared across applications.
- Recursive Resolver Cache: ISPs or public resolvers (like Google DNS, Cloudflare DNS) cache responses to serve multiple users efficiently.
- CDN or Proxy Cache: Content Delivery Networks often cache DNS responses close to users to reduce lookup distance.
Answer:
TTL (Time To Live) defines how long a DNS record can be stored in a cache before it must be refreshed from the authoritative server. It’s measured in seconds and balances freshness with performance.
Example:
If a DNS record has a TTL of 3600, it means the cached record is valid for 1 hour.
Key Insight:
- Higher TTL: Better performance, slower updates.
- Lower TTL: Faster updates, more DNS lookups.
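To make the trade-off concrete, here is a minimal sketch of a TTL-aware cache in Python, the kind of bookkeeping resolvers and OSes do internally (the class name and the sample record are illustrative, not from any DNS library):

```python
import time

class TtlCache:
    """A tiny TTL-aware cache, like the ones resolvers and OSes keep."""

    def __init__(self):
        self._store = {}  # name -> (value, expiry_timestamp)

    def set(self, name, value, ttl_seconds):
        self._store[name] = (value, time.time() + ttl_seconds)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None                   # cache miss
        value, expiry = entry
        if time.time() >= expiry:
            del self._store[name]         # TTL expired; must re-resolve
            return None
        return value                      # cache hit

cache = TtlCache()
cache.set("example.com", "93.184.216.34", ttl_seconds=3600)
print(cache.get("example.com"))  # hit, since we are within the TTL window
```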
Answer:
1. User enters URL / app requests domain: The client (browser or app) starts by asking the OS to resolve the domain name.
2. Browser cache check: The browser checks its internal DNS cache. If found and not expired, use it and finish.
3. OS / local resolver cache: If the browser cache misses, the OS DNS cache (or local DNS resolver) is checked next.
4. Hosts file check: The OS checks the local hosts file for a static mapping. If present, use it.
5. Query to recursive resolver: If still unresolved, the OS sends a DNS query to the configured recursive resolver (usually the ISP or a public DNS like 8.8.8.8). The recursive resolver performs the full resolution on behalf of the client.
6. Recursive resolver cache: The resolver checks its cache. If a cached answer exists and is valid (TTL not expired), it returns the IP to the client.
7. Root server query (if no cache): If the resolver has no cached answer, it queries a Root DNS server. The Root responds with the location of the TLD server for the domain’s extension (e.g., .com).
8. TLD server query: The resolver queries the TLD server returned by the Root. The TLD responds with the authoritative name server(s) for the specific domain.
9. Authoritative server query: The resolver queries the domain’s authoritative DNS server, which returns the final DNS record (A, AAAA, CNAME, etc.) with its TTL.
10. Resolver returns result to client: The recursive resolver caches the answer (for TTL seconds) and returns the IP address to the client.
11. Client caches the result: The OS and browser cache the record according to its TTL for future requests.
12. Client connects to server: With the IP known, the client initiates a TCP/TLS connection (e.g., TCP handshake, TLS handshake) to the server and requests the resource (HTTP request).
13. Subsequent requests: Later lookups follow the same flow but often end at a cache layer (browser, OS, resolver) until the TTL expires.
Notes:
- Caching at multiple layers reduces lookup time and network load.
- Recursive resolvers can perform iterative queries: they query Root → TLD → Authoritative and assemble the answer.
- Modern variants include DNS over HTTPS (DoH) and DNS over TLS (DoT), which encrypt DNS queries between client and resolver.
Answer:
DNS plays a critical role in large-scale systems by ensuring smooth routing, scalability, and high availability across distributed infrastructure. It acts as the entry point for almost every user request to reach the right server or service.
Key Reasons:
- Traffic Routing: DNS directs user requests to appropriate servers, data centers, or regions based on geography, latency, or load.
- Scalability: Enables horizontal scaling by balancing traffic across multiple IPs or service instances.
- High Availability: Allows failover by rerouting traffic to healthy servers or backup regions when one fails.
- Load Balancing Integration: Works with DNS-based load balancing (like round-robin or geo-DNS) to distribute requests efficiently.
- Performance Optimization: DNS caching reduces resolution time, improving user experience for global systems.
- Service Discovery: Helps microservices find each other dynamically through DNS-based service registries.
- Security and Control: Supports protection via DNS filtering, DNSSEC, and traffic management policies.
In summary: DNS is not just a name-to-IP translator—it’s a strategic component in system design that ensures speed, resilience, and global scalability in modern distributed systems.
Answer:
The Client-Server Model is a network architecture where multiple clients (users or devices) request and receive services or data from a centralized server. The server hosts, manages, and delivers resources, while the client consumes them.
Key Points:
- Client: Sends requests (e.g., browser, mobile app, API consumer).
- Server: Processes requests and sends responses (e.g., web server, database server).
- Communication: Usually happens over a network using standard protocols like HTTP or TCP/IP.
- Scalability: Multiple clients can interact with one or more servers concurrently.
- Examples:
  - Web browsing (browser → web server)
  - Mobile apps using backend APIs
  - Database queries (application → DB server)
In short: The client-server model forms the foundation of modern internet architecture, enabling distributed computing and centralized control.
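A minimal sketch of the model using Python's standard socket module: one process plays the server, a background thread plays the client (the port number is arbitrary for the demo):

```python
import socket
import threading

HOST, PORT = "127.0.0.1", 5050  # arbitrary local address for the demo
srv = socket.create_server((HOST, PORT))  # bind + listen before the client starts

def serve_one():
    # Server side: accept one client and echo back whatever it sends.
    conn, _ = srv.accept()
    with conn:
        data = conn.recv(1024)           # the client's request
        conn.sendall(b"echo: " + data)   # the server's response

threading.Thread(target=serve_one, daemon=True).start()

# Client side: connect, send a request, read the response.
with socket.create_connection((HOST, PORT)) as client:
    client.sendall(b"hello server")
    print(client.recv(1024).decode())    # prints: echo: hello server
srv.close()
```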
Answer:
The Client-Server Model consists of several core components that work together to enable communication, data exchange, and service delivery between clients and servers.
Key Components:
- Client: The end-user device or application that sends requests to access data or services (e.g., web browser, mobile app, API consumer).
- Server: The centralized system that receives client requests, processes them, and returns responses (e.g., web server, database server, file server).
- Network: The medium that connects clients and servers, enabling data transmission over LAN, WAN, or the internet.
- Request: The message sent by the client to the server, specifying what data or operation is needed.
- Response: The server’s reply to the client, containing the requested data or the result of an operation.
- Protocols: The communication rules that define how clients and servers interact — commonly HTTP/HTTPS, TCP/IP, WebSocket, etc.
- Database (optional): A backend component where the server stores and retrieves persistent data.
In summary: These components together create a structured system where clients request, servers process, and networks connect, forming the basis of all web and distributed applications.
Answer:
Client and server communication happens through a network using well-defined protocols (like HTTP or TCP/IP). The client initiates a request, and the server processes it and sends back a response. This exchange enables everything from web browsing to API calls.
- Connection Establishment: The client establishes a connection with the server, typically over TCP/IP.
- Request Sending: The client sends a request (e.g., an HTTP GET or POST) to the server specifying what it needs.
- Processing by Server: The server receives the request, processes it (e.g., fetches data, runs logic), and prepares a response.
- Response Sending: The server sends the response (data or status) back to the client.
- Connection Termination (or Reuse): The connection is closed, or kept alive for further requests (using persistent connections).
- Synchronous Communication:
  - The client waits for the server to respond before proceeding.
  - Example: HTTP request from a browser to a web server.
  - Suitable for real-time, request-response systems.
- Asynchronous Communication:
  - The client doesn’t wait for an immediate response; communication happens in the background.
  - Example: Message queues (Kafka, RabbitMQ), WebSockets, event-driven systems.
  - Useful for high-performance, decoupled architectures.
In short: Clients and servers communicate by exchanging requests and responses over network protocols — either synchronously for direct interactions or asynchronously for scalable, event-driven systems.
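To contrast the two styles, a minimal asyncio sketch: the producer hands work to a queue and moves on (asynchronous), rather than blocking on each reply the way a synchronous call would. All names here are illustrative:

```python
import asyncio

async def producer(queue):
    # Asynchronous style: enqueue the work and keep going, instead of
    # blocking until each message has been processed.
    for i in range(3):
        await queue.put(f"order-{i}")
        print(f"enqueued order-{i}")
    await queue.put(None)  # sentinel: no more work

async def consumer(queue):
    # The consumer processes messages in the background.
    while (msg := await queue.get()) is not None:
        await asyncio.sleep(0.1)  # simulate slow processing
        print(f"processed {msg}")

async def main():
    queue = asyncio.Queue()
    await asyncio.gather(producer(queue), consumer(queue))

asyncio.run(main())
```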
Answer:
The HTTP request–response cycle is the standard flow where a client (browser or app) sends an HTTP request to a server, the server processes it, and returns an HTTP response. It includes optional TCP/TLS handshakes, routing via proxies/load balancers, and caching layers.
Basic Steps (short):
- DNS lookup — resolve domain to IP.
- TCP handshake — client and server establish a TCP connection (SYN, SYN-ACK, ACK).
- TLS handshake (optional) — negotiate encryption (if HTTPS).
- HTTP request — client sends request line, headers, optional body.
- Server processing — server (and backend services/databases) handle the request.
- HTTP response — server sends status line, headers, and body.
- Connection close / keep-alive — connection either closed or reused.
- Client processes response — render page, cache, or follow redirects.
```
Client
  |
  v
DNS lookup -> TCP handshake -> (TLS handshake if HTTPS)
  |
  v
Send HTTP Request (request-line + headers + body)
  |
  v
[Load Balancer / Reverse Proxy]
  |
  v
Web/App Server ---> Backend services / Database
  |
  v
HTTP Response (status + headers + body)
  |
  v
Client (render / cache / follow-up)
  |
  v
Connection closed or reused (keep-alive)
```
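The cycle is easy to observe with Python's standard http.client, which exposes the raw request and response (requires network access; example.com is a placeholder host):

```python
import http.client

# DNS lookup + TCP handshake + TLS handshake happen inside the connection setup.
conn = http.client.HTTPSConnection("example.com", timeout=5)

# Send the HTTP request: request line + headers (no body for GET).
conn.request("GET", "/", headers={"User-Agent": "demo-client"})

# Read the HTTP response: status line, headers, body.
resp = conn.getresponse()
print(resp.status, resp.reason)         # e.g. 200 OK
print(resp.getheader("Content-Type"))   # a response header
body = resp.read()                      # the response body
print(len(body), "bytes received")

conn.close()  # or keep the connection alive for more requests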
Answer:
In system design, synchronous and asynchronous communication models define how services exchange data and when they wait for responses. Choosing between them affects performance, scalability, and user experience.
Definition: In synchronous communication, the client waits for the server to respond before continuing its work. The interaction happens in real-time.
Example: A web browser sending an HTTP request and waiting for the server’s response.
Characteristics:
- Request and response occur in the same session.
- Simpler to implement and debug.
- Tight coupling between client and server availability.
- Higher latency if one service is slow.
Use Cases:
- API calls (REST, gRPC sync).
- Payment gateways.
- Real-time validation (login, form submission).
Definition: In asynchronous communication, the client doesn’t wait for an immediate response. Requests are queued or processed in the background, and the client is notified later.
Example: A message is sent to a queue (e.g., Kafka, RabbitMQ), and the server processes it later.
Characteristics:
- Non-blocking and decoupled communication.
- Higher scalability and fault tolerance.
- More complex to design (needs message queues, event handling).
- Delayed response or eventual consistency.
Use Cases:
- Background jobs (email, notifications).
- Event-driven systems (order processing, analytics).
- Microservice communication through queues or pub/sub.
In short:
- Synchronous: Real-time, blocking, simple.
- Asynchronous: Background, non-blocking, scalable.
In system design, servers are often categorized as stateless or stateful depending on whether they retain information (state) about client interactions between requests. Understanding this distinction is critical for designing scalable and reliable distributed systems.
Definition: A stateless server does not store any client-specific data between requests. Each request from the client contains all the necessary information for the server to process it.
Characteristics:
- Each request is independent.
- Easier to scale horizontally since any server can handle any request.
- Simpler to maintain and recover after failures.
- No session data stored on the server side.
Example: HTTP is inherently a stateless protocol. REST APIs are typically designed to be stateless, meaning every request must include authentication tokens and all required context.
Use Cases:
- Web APIs (RESTful services).
- Content delivery systems.
- Load-balanced environments.
Definition: A stateful server maintains information about the client’s session across multiple requests. The server “remembers” previous interactions to provide continuity.
Characteristics:
- Server stores client-specific session data.
- Requests depend on prior interactions.
- Harder to scale horizontally because sessions must be maintained or shared.
- Requires session management mechanisms (e.g., sticky sessions, distributed caches).
Example: Online banking applications, multiplayer games, or chat applications often require stateful servers to track user sessions and data.
Use Cases:
- Real-time chat systems.
- Multiplayer online games.
- E-commerce carts with server-side sessions.
| Aspect | Stateless Server | Stateful Server |
|---|---|---|
| Session Data | Not stored on the server | Stored on the server |
| Scalability | Highly scalable | Harder to scale |
| Fault Tolerance | Easier to recover | Session loss on failure |
| Complexity | Simpler | More complex |
| Example | REST API | Online banking system |
In summary, stateless servers are ideal for scalability and reliability, while stateful servers are necessary when maintaining session continuity or user context across interactions.
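A minimal sketch of the difference in Python: the stateful handler keeps a server-side session dict, while the stateless handler requires each request to carry its own context (all names are illustrative):

```python
# Stateful: the server remembers the client between requests.
sessions = {}  # session_id -> user data, lives on THIS server only

def stateful_handler(session_id, action):
    user = sessions.setdefault(session_id, {"cart": []})
    if action == "add_item":
        user["cart"].append("book")
    return user["cart"]

# Stateless: every request carries all the context the server needs,
# so ANY server behind a load balancer can handle it.
def stateless_handler(request):
    cart = request["cart"]          # state travels with the request
    if request["action"] == "add_item":
        cart = cart + ["book"]
    return cart                     # the client stores the updated state

print(stateful_handler("sess-42", "add_item"))                # ['book']
print(stateless_handler({"cart": [], "action": "add_item"}))  # ['book']
```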
A proxy is an intermediary server that sits between a client and a destination server, forwarding requests and responses between them. It acts as a gateway, managing communication, enhancing performance, improving security, and providing anonymity.
- When a client sends a request, it first goes to the proxy server.
- The proxy forwards the request to the destination server on behalf of the client.
- The response from the destination server returns to the proxy, which then sends it back to the client.
This setup allows the proxy to inspect, modify, cache, or filter traffic as needed.
- Security: Hides client IPs and filters malicious requests.
- Caching: Stores frequently accessed data to reduce latency and bandwidth usage.
- Load Management: Distributes traffic among multiple servers.
- Access Control: Restricts access to certain content or sites.
- Anonymity: Masks client identity for privacy.
- Forward Proxy: Acts on behalf of clients (used for caching or filtering outgoing requests).
- Reverse Proxy: Acts on behalf of servers (used for load balancing, SSL termination, and caching).
In short, a proxy is a middle layer that improves security, performance, and scalability in distributed systems.
A forward proxy is a server that sits between the client and the internet, acting on behalf of the client to send requests to external servers. It hides the client’s identity and can control or monitor outbound traffic.
- The client sends a request to the forward proxy.
- The proxy checks policies, caching, or filters before forwarding it to the target server.
- The server’s response returns to the proxy, which then forwards it back to the client.
- Access Control: Restricts user access to specific websites or resources.
- Caching: Stores frequently accessed data to reduce network load and latency.
- Anonymity: Masks client IP addresses for privacy.
- Monitoring: Logs or inspects outbound traffic for security or analytics.
In a corporate network, a forward proxy ensures all employee web requests go through it — enforcing policies and blocking unauthorized sites.
In essence, a forward proxy primarily represents the client, helping manage, filter, and optimize outgoing traffic to external servers.
A reverse proxy is a server that sits in front of one or more backend servers and acts on behalf of those servers to handle incoming client requests. It hides the details of the backend infrastructure and manages traffic efficiently.
- The client sends a request to the reverse proxy (instead of directly to the backend server).
- The reverse proxy determines which backend server should handle the request.
- It forwards the request, receives the response from the server, and sends it back to the client.
The client never directly communicates with the backend servers.
- Load Balancing: Distributes incoming traffic across multiple servers for better scalability.
- Security: Masks backend servers’ IP addresses and filters malicious requests.
- Caching: Stores static or frequently accessed responses to reduce server load.
- SSL Termination: Handles SSL/TLS encryption and decryption to offload backend servers.
- Compression & Optimization: Improves response speed and bandwidth efficiency.
A website like example.com may use NGINX as a reverse proxy to route traffic among multiple application servers, cache responses, and manage SSL certificates.
In summary, a reverse proxy acts on behalf of servers, improving performance, scalability, and security in large-scale systems.
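A minimal reverse-proxy sketch with Python's standard library: it accepts client requests on port 8080 and relays them to a single hypothetical backend on port 9000 (no error handling; real deployments would use NGINX or HAProxy instead):

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import urllib.request

BACKEND = "http://127.0.0.1:9000"  # hypothetical backend server

class ReverseProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward the client's request to the backend; the client
        # never talks to the backend directly.
        with urllib.request.urlopen(BACKEND + self.path) as upstream:
            body = upstream.read()
            status = upstream.status
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # relay the backend's response

# Clients connect here; the backend stays hidden behind the proxy.
ThreadingHTTPServer(("127.0.0.1", 8080), ReverseProxy).serve_forever()
```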
| Aspect | Forward Proxy | Reverse Proxy |
|---|---|---|
| Acts On Behalf Of | Client | Server |
| Primary Purpose | Controls and manages outbound traffic from clients | Manages and optimizes inbound traffic to servers |
| Visibility | The target server does not know the real client | The client does not know the real backend server |
| Typical Use Cases | Caching, access control, anonymity, monitoring | Load balancing, security, caching, SSL termination |
| Location in Network | Sits between client and the internet | Sits between internet and backend servers |
| Example Tools | Squid, Privoxy | NGINX, HAProxy, Apache HTTP Server (mod_proxy) |
| Used By | Clients (e.g., corporate users) | Servers (e.g., web applications) |
In short:
- A forward proxy protects and represents clients.
- A reverse proxy protects and represents servers.
Load balancing is essential for distributing incoming network or application traffic evenly across multiple servers. It ensures that no single server becomes a bottleneck, improving availability, performance, and scalability of the system.
- High Availability: If one server fails, traffic can be automatically redirected to healthy servers, preventing downtime.
- Scalability: Easily add or remove servers based on demand without affecting users.
- Optimized Resource Utilization: Balances requests to prevent overloading certain servers while others remain idle.
- Improved Performance: Reduces response time by routing requests to the least-loaded or nearest server.
- Fault Tolerance: Helps maintain smooth operations even when individual components fail.
- Maintenance Without Downtime: Servers can be updated or maintained without interrupting user access.
In short, load balancing enables systems to handle large-scale traffic efficiently, ensuring reliability and a seamless user experience.
Load balancers can be classified in two main ways — based on the OSI layer they operate on and based on how they are deployed.
| Type | Layer | Description | Example Use Case |
|---|---|---|---|
| Layer 4 Load Balancer (Transport Layer) | Operates at TCP/UDP level | Routes traffic based on network information such as IP address and port. Does not inspect application data. | Useful for simple, fast routing — e.g., balancing HTTP or database traffic. |
| Layer 7 Load Balancer (Application Layer) | Operates at HTTP/HTTPS level | Makes routing decisions based on content (URL path, headers, cookies, etc.). | Ideal for intelligent routing — e.g., sending image requests to a CDN and API requests to backend servers. |
| Type | Description | Examples |
|---|---|---|
| Hardware Load Balancer | Physical appliance designed for high-performance load balancing in enterprise data centers. | F5, Citrix ADC |
| Software Load Balancer | Runs on general-purpose servers; more flexible and cost-effective. | NGINX, HAProxy, Apache HTTP Server |
| Cloud/Managed Load Balancer | Provided as a managed service by cloud providers; automatically scales and manages routing. | AWS ELB, Google Cloud Load Balancing, Azure Load Balancer |
In summary:
- Layer-based load balancers decide how routing happens (network vs. application level).
- Deployment-based load balancers decide where and what kind of infrastructure handles the balancing.
Load balancing strategies define how traffic is distributed across servers. They can be broadly categorized into static and dynamic approaches.
In static load balancing, the traffic distribution is predefined and does not depend on real-time server load or performance. The load balancer follows fixed algorithms to decide which server handles each request.
Key Characteristics:
- Simpler to implement.
- No monitoring of server health or performance.
- Best suited for systems where server capacities are uniform and predictable.
Common Algorithms:
- Round Robin: Requests are distributed sequentially across servers.
- Weighted Round Robin: Servers with higher capacity get more requests.
- IP Hashing: Requests from the same client IP always go to the same server.
Example: A set of identical web servers handling equal-sized requests using Round Robin distribution.
In dynamic load balancing, the distribution adapts in real time based on the current load, health, and performance of servers.
Key Characteristics:
- Requires monitoring of CPU, memory, or response time.
- Automatically adjusts to handle changing workloads.
- Better suited for large-scale, high-traffic, or variable-load systems.
Common Algorithms:
- Least Connections: Routes requests to the server with the fewest active connections.
- Least Response Time: Sends requests to the server responding fastest.
- Resource-Based: Considers metrics like CPU or memory usage for routing.
Example: A dynamic web application routing traffic to the least busy or fastest-responding server to ensure consistent performance.
In summary:
- Static strategies are simple but inflexible.
- Dynamic strategies are intelligent and adaptive for modern, large-scale systems.
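A minimal sketch of one static and one dynamic algorithm in Python (the server names and connection counts are made up for illustration):

```python
import itertools

servers = ["app-1", "app-2", "app-3"]

# Static: Round Robin rotates through servers in a fixed order,
# ignoring how busy each one actually is.
round_robin = itertools.cycle(servers)
for _ in range(4):
    print("round robin ->", next(round_robin))

# Dynamic: Least Connections routes each request to the server
# with the fewest active connections right now.
active_connections = {"app-1": 12, "app-2": 3, "app-3": 7}

def least_connections():
    target = min(active_connections, key=active_connections.get)
    active_connections[target] += 1   # this request occupies a slot
    return target

print("least connections ->", least_connections())  # app-2
```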
Selecting the right load balancer depends on your system architecture, traffic patterns, scalability goals, and operational requirements. The goal is to balance performance, cost, and complexity while ensuring reliability.
- Traffic Type and Protocols:
  - For low-level TCP/UDP traffic → use a Layer 4 load balancer.
  - For HTTP/HTTPS or content-based routing → use a Layer 7 load balancer.
- Scalability Needs: If you expect traffic to grow dynamically, a cloud-based or software load balancer with auto-scaling support is ideal.
- Budget and Infrastructure:
  - Hardware load balancers offer performance but are expensive.
  - Software or cloud load balancers are cost-effective and easier to maintain.
- Health Monitoring and Failover: Choose a load balancer that supports real-time health checks and automatic rerouting on server failures.
- Security Requirements: For SSL termination, DDoS protection, or request filtering, go with a reverse proxy-based load balancer (e.g., NGINX, AWS ALB).
- Deployment Environment:
  - On-premises systems → hardware or software load balancer.
  - Cloud-native systems → managed services like AWS ELB, GCP Load Balancer, or Azure LB.
| Scenario | Recommended Load Balancer |
|---|---|
| Static website with predictable traffic | NGINX (software L4/L7) |
| Scalable microservice-based system | AWS Application Load Balancer |
| Enterprise data center | F5 Hardware Load Balancer |
| API gateway for large-scale apps | HAProxy or NGINX with Layer 7 routing |
In summary: Choose a load balancer that aligns with your traffic patterns, scaling requirements, and operational constraints while ensuring high availability and performance.
An API Gateway is a single entry point for all client requests in a distributed system, especially in microservices architectures. It acts as an intermediary between clients and backend services, managing, routing, and securing API calls.
- Request Routing: Directs incoming requests to the appropriate backend service.
- Authentication & Authorization: Verifies client identity and permissions before forwarding requests.
- Rate Limiting & Throttling: Controls traffic flow to prevent overloading backend services.
- Load Balancing: Distributes requests across multiple instances of a service.
- Caching: Stores frequently accessed responses to improve performance.
- Request Transformation: Modifies headers, parameters, or payloads before sending to backend.
- Monitoring & Logging: Tracks API usage, latency, and errors for observability.
- Simplifies client communication by providing a unified API endpoint.
- Improves security, performance, and manageability of microservices.
- Reduces client complexity by abstracting multiple backend calls into a single request.
In a microservices-based e-commerce app:
- The API Gateway routes /orders to the Order Service, /users to the User Service, and /payments to the Payment Service — all through one unified public endpoint.
In short, an API Gateway is the front door to your microservices ecosystem, handling routing, security, and performance optimization.
An API Gateway acts as a smart intermediary between clients and backend services, managing the flow of requests and responses efficiently. It centralizes common system concerns like authentication, routing, rate limiting, and monitoring.
- Client Request: A client (web, mobile, or IoT) sends an API request to the gateway instead of directly contacting backend services.
- Routing & Authentication: The gateway authenticates the request (using tokens, API keys, etc.) and determines which backend service should handle it.
- Request Transformation (Optional): The gateway may modify headers, payloads, or URLs to match the backend service’s requirements.
- Forwarding to Backend: The request is then forwarded to the appropriate backend microservice.
- Response Aggregation (if needed): If the request involves multiple services, the gateway aggregates responses into a single unified output.
- Response Back to Client: The gateway returns the final processed response to the client, possibly with caching, compression, or formatting applied.
```
Client → API Gateway → Authentication → Routing → Backend Services
              ↑
 Monitoring, Caching, Logging
```
A mobile app requests /user/profile:
- The API Gateway authenticates the request using a JWT.
- Routes it to the User Service.
- Caches the response for repeated calls.
- Sends the processed data back to the client.
In summary, the API Gateway simplifies communication between clients and microservices by handling routing, security, transformation, and aggregation at a centralized layer.
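A minimal sketch of the routing step in Python: a prefix table mapping paths to backend services, the way a gateway's route configuration works conceptually (the service names and ports are hypothetical):

```python
# Hypothetical route table: path prefix -> backend service address.
ROUTES = {
    "/orders":   "http://order-service:8001",
    "/users":    "http://user-service:8002",
    "/payments": "http://payment-service:8003",
}

def route(path):
    """Pick the backend responsible for this request path."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            return backend + path
    raise LookupError(f"no route for {path}")  # would become a 404

print(route("/users/42/profile"))  # -> http://user-service:8002/users/42/profile
```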
An API Gateway provides a centralized layer for managing and optimizing communication between clients and backend services. It simplifies system architecture and improves performance, security, and scalability.
- Centralized Entry Point:
  - Clients interact with a single endpoint instead of multiple microservices.
  - Simplifies client logic and reduces network complexity.
- Improved Security:
  - Handles authentication, authorization, and rate limiting at one layer.
  - Hides internal service details and prevents direct client access to microservices.
- Load Balancing and Traffic Management:
  - Distributes incoming traffic evenly across backend instances.
  - Supports throttling and circuit breaking to maintain stability under load.
- Request and Response Transformation: Modifies headers, formats, or payloads to maintain compatibility between clients and services.
- Caching and Performance Optimization: Stores frequently requested responses to reduce latency and backend load.
- Monitoring and Logging: Provides unified metrics, request tracing, and logging for better observability.
- Protocol Translation: Converts between protocols (e.g., HTTP to gRPC, REST to WebSocket) for interoperability.
- Response Aggregation: Combines data from multiple microservices into a single client response, reducing the number of network calls.
In short, an API Gateway enhances security, performance, and developer productivity by acting as a smart, centralized control point for all service communication.
An API Gateway serves as the first line of defense for backend services, protecting them from unauthorized access, malicious traffic, and misuse. It centralizes security enforcement across all APIs in the system.
- Authentication:
  - Verifies the identity of clients before allowing access.
  - Supports mechanisms like JWT (JSON Web Tokens), OAuth 2.0, API keys, or Basic Auth.
- Authorization:
  - Ensures that authenticated clients have permission to access specific resources or actions.
  - Implements role-based or policy-based access control.
- Rate Limiting and Throttling:
  - Limits the number of requests per client in a specific time window.
  - Prevents abuse and brute-force attacks, and protects backend resources from overload.
- IP Whitelisting and Blacklisting: Restricts access based on IP addresses, blocking malicious or unauthorized sources.
- Encryption (SSL/TLS Termination):
  - Ensures secure data transmission over HTTPS.
  - Offloads SSL decryption from backend servers to reduce their overhead.
- Request Validation and Filtering:
  - Validates incoming payloads, headers, and parameters to prevent malformed or malicious requests.
  - Blocks common attacks like SQL injection, XSS, or header tampering.
- API Key Management: Issues and manages unique API keys for tracking and controlling client usage.
- CORS (Cross-Origin Resource Sharing) Control: Defines which domains are allowed to access the APIs, preventing unauthorized cross-origin calls.
- Logging and Audit Trails: Captures detailed logs of requests, responses, and errors for auditing and incident investigation.
In summary, an API Gateway enforces consistent security policies across all microservices, ensuring authentication, authorization, encryption, and traffic control in one centralized layer.
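As one concrete security control, a minimal fixed-window rate limiter sketch in Python (the window size and limit are arbitrary; production gateways typically back this with a distributed store like Redis):

```python
import time
from collections import defaultdict

LIMIT = 5     # max requests per client...
WINDOW = 60   # ...per 60-second window (illustrative numbers)

windows = defaultdict(lambda: [0.0, 0])  # client_id -> [window_start, count]

def allow_request(client_id):
    """Return True if the client is under its rate limit."""
    window_start, count = windows[client_id]
    now = time.time()
    if now - window_start >= WINDOW:
        windows[client_id] = [now, 1]   # a new window begins
        return True
    if count < LIMIT:
        windows[client_id][1] += 1
        return True
    return False                        # would be rejected with 429

for i in range(7):
    print(i, allow_request("client-A"))  # the last two print False
```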
Caching reduces latency and backend load by storing frequently accessed data closer to the client or gateway. It improves response time, scalability, and overall system performance.
- Client-Side Caching: Data stored in the browser or app for quick reuse.
- API Gateway Caching: Gateway stores common API responses to avoid repeated backend calls.
- Server-Side Caching: Backend servers or databases use in-memory stores (e.g., Redis, Memcached) to serve repeated queries faster.
- CDN Caching: Static content cached at edge locations for global performance improvement.
API Composition is a technique where the API Gateway combines data from multiple microservices into a single unified response for the client.
- Reduces the number of client requests.
- Simplifies client logic.
- Common in microservices architectures where data resides across multiple services.
Example:
A /user/profile endpoint may fetch data from User, Order, and Payment services and return it as one combined response.
API Aggregation is a broader pattern where multiple backend API calls are aggregated, transformed, or filtered before sending the final result to the client.
- Useful when clients need composite data.
- Can include merging, filtering, or data transformation.
Example: An e-commerce API Gateway aggregates data from inventory, pricing, and reviews services to present a complete product detail page.
In short:
- Composition = Combining multiple API calls into one response.
- Aggregation = Enhancing or transforming combined results for optimized delivery.
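A minimal sketch of composition at the gateway: the per-service fetch functions below are stand-ins for real network calls to separate microservices:

```python
# Stand-ins for calls to separate microservices.
def fetch_user(user_id):
    return {"id": user_id, "name": "Vivek"}

def fetch_orders(user_id):
    return [{"order_id": 1, "total": 499}]

def fetch_payments(user_id):
    return [{"payment_id": "p-9", "status": "paid"}]

def user_profile(user_id):
    """Composition: one client call fans out to three services
    and comes back as a single combined response."""
    return {
        "user":     fetch_user(user_id),
        "orders":   fetch_orders(user_id),
        "payments": fetch_payments(user_id),
    }

print(user_profile(42))
```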
| Tool | Description |
|---|---|
| Kong | Lightweight, high-performance gateway built on NGINX; supports plugins for authentication, rate limiting, and logging. |
| NGINX | Widely used as a reverse proxy and API gateway; offers load balancing, caching, and SSL termination. |
| HAProxy | Reliable and fast TCP/HTTP load balancer that can be configured for gateway functionalities. |
| Traefik | Modern, cloud-native gateway with automatic service discovery and Let’s Encrypt integration. |
| Tyk | Full-featured open-source API gateway with analytics, authentication, and developer portal support. |
| Express Gateway | Built on Node.js, focuses on easy API management using JavaScript-based configuration. |
| Provider | Service | Description |
|---|---|---|
| Amazon Web Services (AWS) | API Gateway | Fully managed service supporting REST, WebSocket, and HTTP APIs with built-in monitoring and security. |
| Google Cloud Platform (GCP) | API Gateway / Apigee | Enterprise-grade API management with analytics, versioning, and policy enforcement. |
| Microsoft Azure | API Management (APIM) | Centralized gateway for securing, monitoring, and scaling APIs. |
| Kong Cloud | Kong Konnect | Managed version of Kong offering cloud scalability and integrated observability. |
| Cloudflare | API Gateway | Focused on edge-level protection, rate limiting, and bot mitigation. |
Summary:
- Open source gateways provide flexibility and control for custom setups.
- Cloud-based gateways offer managed scalability, security, and reduced operational overhead.
Use an API Gateway when you need centralized control, scalability, and simplified client interaction across multiple services.
Ideal Scenarios:
- Microservices Architecture – To route and manage traffic across many small services.
- Unified Entry Point – When clients (mobile/web) need a single endpoint instead of calling multiple services.
- Security Enforcement – To handle authentication, authorization, and rate limiting in one place.
- Traffic Management – For load balancing, caching, and request throttling.
- Cross-Cutting Concerns – When you need logging, monitoring, or analytics for all APIs.
- Response Aggregation – When client responses need to be composed from multiple services.
Avoid an API Gateway if it adds unnecessary complexity or latency for your use case.
Avoid in These Situations:
- Small or Monolithic Applications – A direct client-to-server model is simpler and faster.
- Low Traffic Systems – Gateway setup and maintenance may not justify the overhead.
- Latency-Sensitive Applications – Extra network hop can increase response time.
- Simple Internal APIs – If only internal services communicate, direct service-to-service calls may be better.
In summary:
- Use an API Gateway for scalability, security, and central management in distributed systems.
- Avoid it when simplicity and low latency are higher priorities than centralized control.
A Content Delivery Network (CDN) is a globally distributed network of servers that deliver web content (like images, videos, scripts, and static files) to users based on their geographic location.
A CDN is a system of edge servers placed across various regions that cache and serve content from the nearest server to the user, improving speed and reliability.
CDNs exist to reduce latency, improve content delivery speed, enhance availability, and reduce server load by bringing content physically closer to end-users.
In short: CDNs make websites faster, scalable, and more reliable worldwide.
CDNs are needed to enhance performance, scalability, and reliability of content delivery. They help deliver data faster to users regardless of their location by caching content on geographically distributed servers.
- High Latency: Users far from the origin server experience slow loading times.
- Server Overload: A single origin server must handle all requests, leading to slowdowns or crashes under high traffic.
- Poor Global Performance: Websites perform inconsistently across regions.
- Bandwidth Bottlenecks: Increased network congestion and higher data transfer costs.
- Low Availability: Outages at the origin server can make the entire site inaccessible.
In essence: Without a CDN, performance drops, costs rise, and reliability suffers.
A Content Delivery Network (CDN) is a globally distributed system of servers designed to deliver web content efficiently and reliably. Its architecture consists of several key components that work together to reduce latency and improve user experience.
- Origin Server: The main server where the original content (e.g., images, videos, HTML files) is stored.
- Edge Servers (PoPs – Points of Presence): Servers located in multiple geographic locations that cache and serve content closer to end users.
- CDN Management Layer: Handles routing, load balancing, and cache invalidation between origin and edge servers.
- DNS and Request Routing System: Directs user requests to the nearest or most optimal edge server using techniques like GeoDNS or Anycast.
- Caching Mechanism: Stores frequently requested content at the edge to minimize repeated requests to the origin.
- User requests content (e.g., a webpage or image).
- The DNS routes the request to the nearest edge server.
- If the edge has cached content, it’s delivered immediately.
- If not, the edge fetches it from the origin, caches it, and serves it to the user.
This distributed approach ensures low latency, high availability, and faster content delivery across the globe.
A CDN caches content by storing copies of static and dynamic assets on edge servers located close to users. This helps reduce latency and server load.
- User Request: When a user requests a resource (e.g., image, CSS, video), the request first goes to the nearest CDN edge server.
- Cache Lookup: The edge server checks if the requested content is already stored (cached) locally.
- Cache Hit: If found, the edge server directly serves the cached content to the user — fast and efficient.
- Cache Miss: If not found, the edge requests the content from the origin server, stores it temporarily (based on caching rules), and then serves it to the user.
- HTTP Headers: Cache-Control, Expires, and ETag headers define how long content stays cached.
- Time-to-Live (TTL): Specifies how long the cached item remains valid before revalidation.
- Cache Invalidation: Allows removing or updating outdated content manually or automatically.
By intelligently caching content, CDNs reduce latency, bandwidth costs, and load on origin servers.
CDNs distribute incoming user requests across multiple edge servers to avoid overloading any single node.
- Goal: Optimize performance and ensure efficient resource usage.
- Methods Used:
  - DNS-based load balancing – directs users to the nearest or least-loaded server.
  - Anycast routing – routes requests to the geographically closest server.
  - Health checks – ensure only active servers receive traffic.
If an edge server or data center goes down, the CDN automatically redirects traffic to the next healthy and available node.
- Mechanisms:
  - Continuous health monitoring of nodes.
  - Automatic rerouting during outages or degradation.
  - Multi-origin support for redundancy.
Determines which edge server handles a specific user request based on various factors:
- Geolocation: Route to nearest edge server to reduce latency.
- Server Load: Send requests to least-busy nodes.
- Network Conditions: Reroute dynamically based on congestion or outages.
Together, these mechanisms ensure high availability, minimal latency, and fault tolerance in a CDN-powered system.
Compression reduces the size of files transferred between servers and clients, improving load time and reducing bandwidth usage.
Common Techniques:
- Gzip / Brotli: Compress text-based assets like HTML, CSS, JS before transmission.
- Image Compression: Tools like WebP or AVIF reduce image size without major quality loss.
- Video Compression: Codecs like H.264, H.265 (HEVC) for efficient media delivery.
Benefits:
- Faster content delivery.
- Lower bandwidth consumption.
- Improved user experience.
Minification removes unnecessary characters (spaces, comments, line breaks) from code files without changing functionality.
Techniques & Tools:
- CSS/JS Minifiers: Tools like UglifyJS, Terser, CSSNano.
- HTML Minifiers: Reduce response size by stripping redundant markup.
Benefits:
- Smaller file size.
- Faster parsing and execution by browsers.
In CDN systems, compression and minification are key optimization steps before caching or serving content to enhance overall performance and scalability.
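A quick look at how much compression saves, using Python's built-in gzip module on a repetitive text payload (the sample string is arbitrary):

```python
import gzip

# Text assets (HTML/CSS/JS) are highly repetitive, so they compress well.
payload = ("<div class='item'>product card</div>\n" * 200).encode()

compressed = gzip.compress(payload)
print(f"original:   {len(payload)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {len(compressed) / len(payload):.1%}")

# The client (browser) transparently decompresses on receipt.
assert gzip.decompress(compressed) == payload
```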
- Static Content: Includes files that don’t change often — images, CSS, JS, fonts, videos.
  - Cached and served directly from CDN edge servers.
  - Example: Serving website assets or media files globally.
- Dynamic Content: Data generated in real time (e.g., personalized dashboards, API responses).
  - CDNs optimize routing and use TCP/UDP optimizations to accelerate delivery.
  - Can use edge caching for partial or short-term storage.
Key Point: Static content is heavily cached, while dynamic content relies on optimized routing and edge acceleration.
Modern CDNs are not just for websites — they also accelerate API traffic and enable edge computing.
- API Acceleration:
  - Reduces latency for REST and GraphQL APIs by caching responses close to users.
  - Uses smart routing, connection reuse, and HTTP/2 multiplexing.
  - Example: Speeding up e-commerce or authentication APIs.
- Edge Computing:
  - Executes lightweight logic (e.g., authentication, request validation, personalization) directly on CDN edge servers.
  - Reduces round trips to origin servers.
  - Platforms like Cloudflare Workers and Akamai EdgeWorkers enable this.
Result: APIs become faster, more reliable, and scalable — improving end-user experience and backend efficiency.
TCP (Transmission Control Protocol) is a connection-oriented protocol used to ensure reliable and ordered data transmission between systems over a network.
- Connection-Oriented: Establishes a connection using a three-way handshake before data transfer.
- Reliable: Guarantees delivery with acknowledgment and retransmission of lost packets.
- Ordered Delivery: Ensures packets arrive in the same sequence they were sent.
- Error Checking: Detects and corrects transmission errors using checksums.
- Flow & Congestion Control: Adjusts data transfer rate based on network conditions.
- Web browsing (HTTP/HTTPS)
- Email (SMTP, IMAP, POP3)
- File transfers (FTP)
In system design, TCP is ideal when data accuracy and reliability are more important than speed.
UDP (User Datagram Protocol) is a connectionless communication protocol that focuses on speed and low latency rather than reliability.
- Connectionless: No handshake; data is sent directly without establishing a connection.
- Unreliable Delivery: Packets may be lost, duplicated, or arrive out of order — no acknowledgment or retransmission.
- Lightweight & Fast: Minimal overhead, making it faster than TCP.
- No Flow Control: Sender transmits data without checking receiver’s capacity.
- Real-time streaming (video/audio)
- Online gaming
- VoIP (Voice over IP)
- DNS lookups
In system design, UDP is preferred when speed and real-time communication matter more than guaranteed delivery.
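A minimal sketch of UDP's connectionless style with Python's socket module: no handshake, no delivery guarantee, just a datagram fired at an address (the port is arbitrary, and both ends run in one script for the demo):

```python
import socket

ADDR = ("127.0.0.1", 5006)  # arbitrary local address for the demo

# Receiver: just binds a port. There is no accept(), no connection.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(ADDR)

# Sender: fires a datagram with no handshake and no acknowledgment.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"frame-001", ADDR)

data, source = receiver.recvfrom(1024)
print(f"got {data!r} from {source}")

sender.close()
receiver.close()
```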
HTTP (HyperText Transfer Protocol) is an application-layer protocol that defines how data is transmitted between a client (usually a browser) and a server over the web.
It is the foundation of communication for websites, APIs, and web services.
- Stateless: Each request is independent; the server doesn’t retain client context.
- Text-based & Simple: Easy to read and debug.
- Request–Response Model: The client sends a request, and the server responds with data (HTML, JSON, etc.).
- Extensible: Supports methods, headers, and status codes for flexible communication.
- Loading web pages.
- Communicating with REST APIs.
- Transferring resources like images, scripts, or files.
In system design, understanding HTTP is essential since almost every web-based system relies on it for data exchange and interoperability.
HTTP operates on a client–server model, where the client (e.g., browser, mobile app) sends a request to the server, and the server responds with the requested data. It typically runs over TCP (port 80) or HTTPS (port 443) for secure communication.
- Client Initiates Request: The client sends an HTTP request (e.g., GET, POST) to the server.
- Server Processes Request: The server interprets the request and fetches or generates the required resource.
- Server Sends Response: The server returns an HTTP response containing status, headers, and data.
- Connection Handling: Depending on configuration, the connection may close or stay open (persistent connections in HTTP/1.1+).
Structure of an HTTP Request:
- Request Line: Contains the HTTP method, target URL, and version. Example: GET /api/users HTTP/1.1
- Headers: Provide metadata like content type, user agent, and authorization info. Example: Content-Type: application/json, Authorization: Bearer <token>
- Body (Optional): Contains data sent to the server (mainly in POST and PUT requests). Example: { "name": "Vivek", "role": "Engineer" }
Structure of an HTTP Response:
- Status Line: Includes protocol version, status code, and status message. Example: HTTP/1.1 200 OK
- Headers: Provide information like content type, length, and caching rules. Example: Content-Type: application/json, Cache-Control: no-cache
- Body: Contains the actual response data — HTML, JSON, images, etc. Example: { "success": true, "message": "User created" }
In summary: HTTP enables structured, stateless communication between clients and servers using a simple request–response model fundamental to all web systems.
The HTTP request–response cycle defines how a client and server exchange data over the web. It follows a predictable sequence that ensures communication and data transfer.
- URL Resolution: The client (browser or app) converts the domain name to an IP address using DNS.
- Connection Establishment: The client opens a TCP (or TLS for HTTPS) connection with the server.
- HTTP Request Sent: The client sends an HTTP request with method, headers, and optional body. Example: GET /home HTTP/1.1, Host: example.com
- Server Processing: The server interprets the request and fetches or generates the necessary data or resource.
- HTTP Response Sent: The server replies with a status code, headers, and (optionally) a body. Example: HTTP/1.1 200 OK, Content-Type: text/html
- Client Renders Data: The client (browser/app) processes the response and renders it for the user.
- Connection Termination (or Keep-Alive): The TCP connection is closed or kept alive for reuse in subsequent requests.
```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server
    C->>S: DNS Lookup & TCP Handshake
    C->>S: HTTP Request (GET /home)
    S->>C: HTTP Response (200 OK)
    C->>C: Render Content
    Note over C,S: Connection closes or stays open (Keep-Alive)
```
Summary: The HTTP cycle is a request–response loop built on TCP, forming the backbone of web communication between clients and servers.
HTTP is a stateless protocol, meaning each request is independent — the server does not remember any previous interactions with the client. Every request must include all necessary information for the server to process it.
Example: If a user logs in and makes another request, the server doesn’t automatically know it’s the same user unless session data is provided again.
- No Session Memory: Server can’t recall user data between requests.
- Repeated Authentication: Each request must carry credentials or tokens.
- Complex State Management: Hard to maintain user sessions, carts, or preferences.
- Increased Data Transfer: Each request carries more metadata (like headers, tokens).
To overcome statelessness, systems use external mechanisms to maintain user or session data:
- Cookies: Stored in the browser and sent automatically with each request.
- Sessions: Server stores session data (identified by a session ID in cookies).
- Tokens (JWTs): Client includes tokens in headers for stateless authentication.
- Caching / Databases: Persistent stores used for user or application state.
In short: HTTP’s stateless nature simplifies scalability but requires external state management for personalized and consistent user experiences.
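A minimal sketch of the token idea using Python's standard hmac module: the server keeps no session state and just verifies the signature on whatever the client sends back (the secret key and payload format are illustrative, not a real JWT implementation):

```python
import hmac
import hashlib

SECRET = b"server-side-secret"  # illustrative key; keep real keys out of code

def issue_token(user_id):
    """Sign the user id so the client can carry its own state."""
    sig = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{sig}"

def verify_token(token):
    """Stateless check: recompute the signature, no session lookup."""
    user_id, _, sig = token.partition(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return user_id if hmac.compare_digest(sig, expected) else None

token = issue_token("user-42")      # sent to the client once, e.g. at login
print(verify_token(token))          # -> user-42 (any server can verify it)
print(verify_token("user-42.bad"))  # -> None (tampered token rejected)
```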
HTTP methods define the type of action the client wants the server to perform on a resource. They are also called HTTP verbs.
GET
- Purpose: Retrieve data from the server.
- Characteristics:
  - No request body.
  - Should not modify data (idempotent).
- Example: `GET /users/123`

POST
- Purpose: Send data to the server to create a new resource.
- Characteristics:
  - Includes a request body.
  - Non-idempotent (can create duplicates if repeated).
- Example: `POST /users` with body `{ "name": "Vivek" }`

PUT
- Purpose: Update or replace an existing resource completely.
- Characteristics:
  - Idempotent (same result on multiple calls).
  - Includes a request body.
- Example: `PUT /users/123` with body `{ "name": "Vivek Panchal" }`

PATCH
- Purpose: Partially update an existing resource.
- Characteristics:
  - Only modifies provided fields.
  - Not necessarily idempotent.
- Example: `PATCH /users/123` with body `{ "email": "vivek@example.com" }`

DELETE
- Purpose: Remove a resource from the server.
- Characteristics:
  - Idempotent.
  - Usually has no body.
- Example: `DELETE /users/123`

HEAD
- Purpose: Retrieve headers only (no body) for a resource.
- Use Case: Check if a resource exists or get metadata.

OPTIONS
- Purpose: Describe supported HTTP methods for a resource.
- Use Case: Used in CORS preflight requests.
Summary Table
| Method | Action | Idempotent | Request Body | Typical Use |
|---|---|---|---|---|
| GET | Read | ✅ | ❌ | Fetch data |
| POST | Create | ❌ | ✅ | Create new data |
| PUT | Replace | ✅ | ✅ | Full update |
| PATCH | Modify | ❌ | ✅ | Partial update |
| DELETE | Remove | ✅ | ❌ | Delete resource |
| HEAD | Headers only | ✅ | ❌ | Check existence |
| OPTIONS | Capabilities | ✅ | ❌ | CORS, introspection |
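A small sketch exercising these verbs with Python's requests library (the base URL and payloads are illustrative, mirroring the examples above):

```python
import requests

BASE = "https://api.example.com"   # hypothetical API

r = requests.get(f"{BASE}/users/123")                                   # read (idempotent, no body)
r = requests.post(f"{BASE}/users", json={"name": "Vivek"})              # create (non-idempotent)
r = requests.put(f"{BASE}/users/123", json={"name": "Vivek Panchal"})   # full replacement (idempotent)
r = requests.patch(f"{BASE}/users/123", json={"email": "vivek@example.com"})  # partial update
r = requests.delete(f"{BASE}/users/123")                                # remove (idempotent)
print(r.status_code)
```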
HTTP status codes are 3-digit numbers sent by the server to indicate the result of a client’s request. They are grouped into five main categories based on their first digit.
1xx – Informational: Indicate that the request is received and being processed.
- 100 Continue: Request headers are accepted; client can send body.
- 101 Switching Protocols: Server is switching protocols (e.g., HTTP to WebSocket).
2xx – Success: The request was successfully received, understood, and processed.
- 200 OK: Request succeeded (common for GET).
- 201 Created: New resource successfully created (for POST).
- 202 Accepted: Request accepted for processing, but not yet completed.
- 204 No Content: Request successful but no data to return (for DELETE).
3xx – Redirection: Client must take further action to complete the request.
- 301 Moved Permanently: Resource has a new permanent URL.
- 302 Found: Temporary redirection.
- 304 Not Modified: Resource not changed; use cached version.
4xx – Client Error: The request contains bad syntax or cannot be fulfilled.
- 400 Bad Request: Malformed request or invalid data.
- 401 Unauthorized: Authentication required or failed.
- 403 Forbidden: Client authenticated but not allowed access.
- 404 Not Found: Requested resource not found.
- 409 Conflict: Request conflicts with current server state.
- 429 Too Many Requests: Rate limit exceeded.
5xx – Server Error: Server failed to fulfill a valid request.
- 500 Internal Server Error: Generic server-side failure.
- 502 Bad Gateway: Invalid response from an upstream server.
- 503 Service Unavailable: Server temporarily overloaded or down.
- 504 Gateway Timeout: Upstream server didn’t respond in time.
In summary: HTTP status codes provide a standardized way for the server to communicate request outcomes — helping clients handle errors, retries, and user feedback effectively.
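One practical use of these categories is client-side retry logic: 4xx errors are the caller's fault and should fail fast, while 429 and 503 are often transient. A minimal sketch (the URL is hypothetical):

```python
import time
import requests

def fetch_with_retry(url: str, attempts: int = 3) -> requests.Response:
    for attempt in range(attempts):
        resp = requests.get(url)
        if resp.status_code in (429, 503):   # rate limited / temporarily unavailable
            time.sleep(2 ** attempt)         # exponential backoff, then retry
            continue
        resp.raise_for_status()              # raise on any other 4xx/5xx
        return resp
    raise RuntimeError("service still unavailable after retries")

# fetch_with_retry("https://api.example.com/users")
```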
HTTPS (HyperText Transfer Protocol Secure) is the secure version of HTTP, where all data exchanged between the client and server is encrypted using TLS (Transport Layer Security).
It ensures that communication over the web is private, authenticated, and tamper-proof.
1. TLS Handshake: Before any data exchange, the client and server establish a secure channel by:
   - Exchanging cryptographic keys.
   - Authenticating the server’s identity using an SSL/TLS certificate.
   - Agreeing on encryption algorithms.
2. Encrypted Communication: Once the handshake completes, all HTTP requests and responses are transmitted in encrypted form.
3. Data Integrity: Ensures data isn’t modified or intercepted during transfer (prevents man-in-the-middle attacks).
- Encryption: Protects sensitive data like passwords and API tokens.
- Authentication: Confirms the identity of the server (and optionally, the client).
- Integrity: Prevents data tampering during transmission.
- SEO & Trust: Browsers mark non-HTTPS sites as “Not Secure,” and search engines prefer HTTPS.
- HTTP: Port 80
- HTTPS: Port 443
In system design, HTTPS is a non-negotiable standard for all modern web systems — critical for security, compliance, and user trust.
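For a concrete view of the TLS layer, the stdlib sketch below opens a certificate-verified connection on port 443 and prints the negotiated TLS version (example.com is illustrative):

```python
import socket
import ssl

ctx = ssl.create_default_context()   # verifies server certificates by default
with socket.create_connection(("example.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
        print(tls.version())                     # e.g. 'TLSv1.3'
        print(tls.getpeercert()["subject"])      # identity proven by the certificate
```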
REST (Representational State Transfer) is an architectural style for designing networked applications, primarily web APIs. It defines a set of constraints that make systems scalable, stateless, and easy to maintain.
In REST, clients interact with resources (like users, products, or posts) through standard HTTP methods such as GET, POST, PUT, PATCH, and DELETE.
Each resource is identified by a unique URI (Uniform Resource Identifier).
Example:
GET /users/123
retrieves information about user 123.
- Client–Server Separation: Client and server are independent; clients handle UI, servers handle data.
- Statelessness: Each request contains all necessary information; the server doesn’t store client context.
- Uniform Interface: Standardized resource access using consistent HTTP methods and URIs.
- Resource Representation: Resources can be represented in multiple formats (usually JSON or XML).
- Cacheable: Responses can be cached to improve performance and scalability.
- Layered System: Intermediaries (like proxies, gateways, CDNs) can exist between client and server without affecting communication.
In essence: REST provides a simple, scalable, and standardized way to design APIs using existing web protocols — making it the foundation of most modern web services.
REST uses standard HTTP methods and status codes, making APIs easy to design, use, and debug. No additional protocol is required — it works seamlessly over the existing web infrastructure.
Because REST is stateless, servers don’t maintain client context. This allows easy horizontal scaling — multiple servers can handle requests independently.
Clients and servers are loosely coupled, so either side can evolve independently. REST APIs can be consumed by browsers, mobile apps, IoT devices, or microservices.
REST supports HTTP caching mechanisms (like ETag, Cache-Control) to reduce redundant requests and improve speed.
Most frameworks, libraries, and platforms have built-in support for REST — making development, monitoring, and integration straightforward.
REST aligns naturally with the HTTP ecosystem — URLs, headers, and responses — making it ideal for building modern distributed systems.
In summary: REST’s simplicity, scalability, and interoperability make it the default choice for web APIs and microservices in modern applications.
REST (Representational State Transfer) isn’t just about using HTTP — it’s defined by a set of architectural constraints. If an API follows these constraints, it is considered RESTful.
Here are the 6 core REST constraints 👇
1. Client–Server
- The client (frontend) and server (backend) are separate and independent.
- The client is responsible for the user interface and user experience.
- The server handles data storage, business logic, and processing.
- ✅ Benefit: Improves scalability, flexibility, and allows independent evolution of client and server.

2. Stateless
- Each HTTP request from the client to the server must contain all information needed to process it.
- The server does not store any session state about the client between requests.
- ✅ Benefit: Simpler design, easier scalability, and fault tolerance.
- ⚠️ Challenge: Client must manage state (e.g., authentication tokens, shopping carts).

3. Cacheable
- Responses from the server should explicitly define whether they are cacheable or not (using HTTP headers like `Cache-Control` or `ETag`).
- Proper caching can improve performance and reduce server load.
- ✅ Benefit: Faster response times and better efficiency.

4. Uniform Interface
This is the core idea that makes REST unique. It defines a standard way for clients and servers to communicate, regardless of implementation.
It includes four key rules:
- Resource Identification – Every resource (user, product, post, etc.) is identified by a URI (e.g., `/users/101`).
- Resource Manipulation via Representations – Clients interact with resources through representations (usually JSON or XML).
- Self-descriptive Messages – Each message includes enough information to describe how to process it (headers, content type, etc.).
- HATEOAS (Hypermedia as the Engine of Application State) – Clients should navigate the API dynamically using hyperlinks in responses (not always strictly followed in real-world APIs).
✅ Benefit: Decouples client and server — making APIs predictable and easy to understand.

5. Layered System
- REST APIs can have multiple layers (e.g., load balancers, caches, security gateways) between client and server.
- The client doesn’t need to know which server actually handles its request.
- ✅ Benefit: Increases scalability, security, and flexibility.

6. Code on Demand (Optional)
- Servers can temporarily extend or customize client functionality by transferring executable code (e.g., JavaScript).
- This is optional and rarely used in most REST APIs.
- ✅ Benefit: Adds flexibility.
| # | Constraint | Description | Benefit |
|---|---|---|---|
| 1 | Client–Server | Separation of UI and data | Scalability, flexibility |
| 2 | Stateless | No session on server | Easy scaling, simpler design |
| 3 | Cacheable | Define cache policies | Faster performance |
| 4 | Uniform Interface | Standardized communication | Predictable, decoupled |
| 5 | Layered System | Multi-tier architecture | Scalability, security |
| 6 | Code on Demand | Send executable code | Optional flexibility |
RESTful APIs follow a set of best practices that make them scalable, reliable, and easy to use. These principles ensure consistency, simplicity, and performance in web service communication.
1. Use Nouns for Resources
- Endpoints should represent resources (nouns), not actions.
- ✅ Example:
  `GET /users/123`
  `POST /users`
  `DELETE /users/123`
- ❌ Avoid:
  `GET /getUser`
  `POST /createUser`

2. Use HTTP Methods Correctly
- GET – Retrieve data
- POST – Create new resource
- PUT/PATCH – Update existing resource
- DELETE – Remove resource
Each method should serve its semantic purpose consistently.

3. Return Proper Status Codes
- 200 OK – Request successful
- 201 Created – Resource created
- 400 Bad Request – Invalid request
- 401 Unauthorized / 403 Forbidden – Access denied
- 404 Not Found – Resource doesn’t exist
- 500 Internal Server Error – Server-side issue

4. Keep Resource Hierarchy Logical and Shallow
- Example:
  `GET /users/123/orders`
  `GET /users/123/orders/456`
- Avoid deep nesting like:
  `/users/123/orders/456/items/789/payments/456`

5. Version Your APIs
- Maintain backward compatibility and smooth upgrades. Example:
  `/api/v1/users`
  `/api/v2/users`

6. Support Filtering, Sorting, and Pagination
- Help clients fetch only required data. Example:
  `GET /users?limit=10&page=2&sort=name&role=admin`

7. Use Consistent Naming Conventions
- Use lowercase and plural nouns for resources.
  Example: `/products`, `/users`, `/orders`

8. Keep Requests Stateless
- Each API call must contain all the data needed to process the request.
- Server does not store session information.

9. Use JSON as the Standard Response Format
- JSON is lightweight and widely supported.
- Response example:
  `{ "id": 123, "name": "Vivek", "role": "admin" }`

10. Return Meaningful Error Responses
- Include error code, message, and details for debugging.
  `{ "error": "InvalidRequest", "message": "Email field is required" }`

11. Secure Your APIs
- Use HTTPS for encryption.
- Implement authentication (JWT, OAuth2).
- Validate all inputs.

12. Use HATEOAS Where Helpful
- Include links to related actions or resources within responses.
  `{ "userId": 123, "name": "Vivek", "links": [ { "rel": "orders", "href": "/users/123/orders" } ] }`
Real-time communication (RTC) refers to the instant exchange of data between systems with minimal latency, enabling users or services to interact live without noticeable delay.
It allows continuous, immediate data flow between clients and servers — unlike traditional request-response models where communication happens only when initiated by the client.
In modern applications, speed and interactivity are crucial. Real-time systems make apps more engaging, responsive, and user-friendly by delivering updates instantly as they happen.
- Chat and messaging apps (WhatsApp, Slack)
- Live streaming and gaming
- Collaborative tools (Google Docs, Figma)
- Stock market dashboards and trading systems
- IoT and sensor data monitoring
- Low latency — data transfer happens within milliseconds.
- Continuous connection — unlike HTTP’s request-response model.
- Event-driven architecture — actions trigger immediate updates.
- Scalability and reliability — essential for handling large concurrent connections.
In essence, real-time communication bridges the gap between user action and system response, enabling seamless, live digital experiences.
Real-time communication protocols enable instant data exchange between clients and servers — essential for chat apps, gaming, live dashboards, collaborative tools, and financial systems.
1. WebSockets
- A full-duplex, persistent connection between client and server over a single TCP connection.
- After the initial HTTP handshake, communication happens in both directions simultaneously.
- Ideal for: Chat apps, live updates, multiplayer games, and trading platforms.
- Example Flow:
  Client → Server: WebSocket handshake (via HTTP)
  Server → Client: Connection upgrade (HTTP 101)
  Then → Real-time bidirectional data exchange

2. Server-Sent Events (SSE)
- A unidirectional channel where the server pushes data to the client over HTTP.
- Lightweight and efficient for continuous updates like stock prices or notifications.
- Unlike WebSockets, the client cannot send data back on the same connection.
- Example:
  `GET /events`
  `Content-Type: text/event-stream`

3. Long Polling
- A fallback mechanism where the client repeatedly sends requests to the server, keeping the connection open until new data arrives.
- Simulates real-time updates when WebSockets are unavailable.
- Common in legacy systems or where infrastructure doesn’t support persistent connections.

4. MQTT
- A lightweight publish-subscribe protocol designed for low-bandwidth and high-latency networks.
- Commonly used in IoT systems for sensor communication.
- Works on TCP and supports Quality of Service (QoS) levels for reliable delivery.

5. WebRTC
- A peer-to-peer communication protocol for audio, video, and data sharing directly between browsers.
- Eliminates the need for a central relay server (though signaling servers are used to establish connections).
- Used in: Zoom, Google Meet, and multiplayer browser-based apps.
| Protocol | Type | Direction | Ideal Use Case |
|---|---|---|---|
| WebSocket | Full-duplex | Client ↔ Server | Real-time chat, live data |
| SSE | One-way | Server → Client | Notifications, live feeds |
| Long Polling | Simulated real-time | Client → Server | Legacy support |
| MQTT | Pub/Sub | Client ↔ Broker | IoT communication |
| WebRTC | Peer-to-peer | Client ↔ Client | Audio/video streaming |
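As one concrete example from the table, here is a hedged SSE sketch in Flask: the server holds the response open and pushes one-way, newline-delimited events that a browser could consume with `new EventSource("/events")` (endpoint and payload are illustrative):

```python
import json
import time
from flask import Flask, Response

app = Flask(__name__)

@app.get("/events")
def events():
    def stream():
        for n in range(5):                   # a real feed would loop indefinitely
            payload = json.dumps({"tick": n})
            yield f"data: {payload}\n\n"     # each SSE frame ends with a blank line
            time.sleep(1)
    return Response(stream(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(threaded=True)
```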
As systems grow in complexity, REST alone may not efficiently handle high-performance, data-heavy, or microservice-based architectures. This led to the rise of modern API protocols like gRPC and GraphQL, which address REST’s limitations in flexibility, speed, and efficiency.
Definition: gRPC is a high-performance, open-source RPC framework developed by Google. It uses HTTP/2 for transport and Protocol Buffers (Protobuf) for data serialization.
Key Features:
- Binary data format (Protobuf) → smaller payloads, faster transmission.
- HTTP/2 multiplexing → multiple requests on one connection.
- Strongly typed contracts defined in `.proto` files.
- Bidirectional streaming supported (client ↔ server).
- Ideal for microservices and internal service-to-service communication.
Example Use Case:
- Communication between backend services in distributed systems (e.g., authentication service ↔ payment service).
Definition: GraphQL is a query language for APIs developed by Facebook. It allows clients to request exactly the data they need, reducing over-fetching and under-fetching common in REST.
Key Features:
- Single endpoint for all queries and mutations.
- Client-controlled data fetching — specify fields in query.
- Strongly typed schema for predictable responses.
- Supports real-time updates via subscriptions.
Example Use Case:
- Mobile and web apps needing optimized, flexible data fetching (e.g., social media feeds, dashboards).
| Feature | REST | gRPC | GraphQL |
|---|---|---|---|
| Transport Protocol | HTTP/1.1 | HTTP/2 | HTTP/1.1 or HTTP/2 |
| Data Format | JSON | Protobuf (binary) | JSON |
| Communication | Request–response | Unary, Streaming | Query–based |
| Performance | Moderate | Very high | High for selective data |
| Use Case | Public APIs | Microservices | Flexible client APIs |
In summary:
- Use gRPC when you need speed and type safety between services.
- Use GraphQL when clients need flexible, optimized data fetching.
- Both coexist with REST — chosen based on system goals and data needs.
While REST has been the backbone of web APIs for years, modern applications have evolved — demanding faster, more flexible, and efficient communication patterns. REST’s simplicity becomes a limitation at scale or in complex data scenarios.
- REST endpoints return fixed data structures.
- Clients often receive more data than needed (over-fetching) or less data (under-fetching), requiring multiple calls.
- Example: A mobile app may only need a user’s name but receives their entire profile.
➡️ GraphQL solves this by letting clients query exactly what they need.
- Modern architectures use dozens of services communicating internally.
- REST’s text-based JSON and multiple HTTP connections can create performance bottlenecks.
➡️ gRPC provides binary serialization (Protobuf) and HTTP/2 multiplexing, making it faster and more efficient for service-to-service calls.
- REST is request-response based and stateless.
- It doesn’t support real-time communication like streaming or live updates.
➡️ gRPC streaming and GraphQL subscriptions handle real-time data seamlessly.
- REST APIs don’t enforce strict type safety or schemas, leading to potential integration issues.
- Difficult to evolve without breaking clients.
➡️ gRPC and GraphQL enforce typed schemas, improving reliability and backward compatibility.
- Modern systems serve web, mobile, IoT, and edge clients — each with different data needs.
- A single REST response may not fit all.
➡️ GraphQL gives clients control over data shape, improving flexibility and efficiency.
In summary: We need gRPC and GraphQL because REST — though simple and reliable — struggles with performance, flexibility, and real-time communication in modern distributed systems.
gRPC (Google Remote Procedure Call) enables communication between services as if they were calling local functions, even though they run on different machines. It relies on Protocol Buffers (Protobuf) for message serialization and HTTP/2 for transport.
1. Define the Service (.proto file)
   - Developers define the service methods and message types in a `.proto` file. Example:

```proto
syntax = "proto3";

service UserService {
  rpc GetUser (UserRequest) returns (UserResponse);
}

message UserRequest {
  string user_id = 1;
}

message UserResponse {
  string name = 1;
  int32 age = 2;
}
```

2. Generate Code
   - The `.proto` file is compiled using the gRPC compiler (protoc).
   - It generates client and server stubs in multiple languages (Java, Go, Python, etc.).
   - These stubs handle network communication automatically.

3. Server Implementation
   - The server implements the defined methods. Example:

```python
class UserService(UserServiceServicer):
    def GetUser(self, request, context):
        return UserResponse(name="Vivek", age=22)
```

4. Client Calls
   - The client uses the generated stub to call the remote method as if it were local. Example:

```python
response = stub.GetUser(UserRequest(user_id="123"))
print(response.name)
```

5. Data Transmission
   - gRPC serializes the request using Protocol Buffers → sends over HTTP/2.
   - The server deserializes it, executes logic, and returns a serialized response.
- HTTP/2: Enables multiplexing, header compression, and streaming.
- Protobuf: Binary, compact, and faster than JSON.
- Streaming Support:
  - Unary RPC → Single request–response
  - Server streaming → One request, multiple responses
  - Client streaming → Multiple requests, one response
  - Bidirectional streaming → Continuous data flow both ways
In short: gRPC turns network calls into lightweight, fast, type-safe function calls — perfect for microservices and real-time systems.
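For step 2 above, the code-generation command looks roughly like this, assuming the definitions live in a file named `user.proto` and the `grpcio-tools` package is installed:

```
# Hypothetical file name; requires: pip install grpcio-tools
python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. user.proto
# Produces user_pb2.py (message classes) and user_pb2_grpc.py (client/server stubs).
```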
- gRPC is ideal for internal service-to-service communication in distributed systems.
- It provides low latency, type safety, and efficient serialization, making it faster than REST for backend communication.
- Example: Authentication Service ↔ Payment Service ↔ Order Service.
- gRPC supports bidirectional streaming, allowing continuous data flow.
- Useful for live chat, real-time analytics, IoT telemetry, or stock price updates.
- gRPC supports multiple programming languages (Go, Java, Python, C++, etc.).
- Perfect for teams building systems with mixed tech stacks that need a unified communication protocol.
- Uses binary Protocol Buffers, which are smaller and faster than JSON.
- Ideal for low-bandwidth networks, IoT, and high-performance computing systems.
- gRPC is not meant for browser-based clients (since browsers lack HTTP/2 full support for gRPC).
- Best suited for internal or backend APIs, not public-facing ones.
✅ When you need high performance and low latency.
✅ When building microservices with frequent internal communication.
✅ When your system requires streaming or real-time updates.
✅ When you need strong typing and contract-based APIs.
✅ When bandwidth efficiency is critical (IoT, mobile backends).
🚫 For public web APIs (REST/GraphQL are easier for browsers).
🚫 When debugging simplicity and human-readable payloads are priorities.
🚫 If your clients don’t support HTTP/2 or Protobuf.
In summary: Use gRPC for high-speed, internal, strongly typed, and streaming-based service communication — it excels where REST struggles in performance and efficiency.
GraphQL is a query language and runtime for APIs that lets clients request exactly the data they need — no more, no less. It replaces REST’s multiple endpoints with a single flexible endpoint, improving efficiency and client control.
1. Schema Definition
- The server defines a GraphQL schema describing data types and operations (queries, mutations, subscriptions). Example:

```graphql
type User {
  id: ID!
  name: String!
  age: Int
}

type Query {
  getUser(id: ID!): User
}
```

2. Single Endpoint
- Unlike REST (multiple endpoints like `/users`, `/posts`), GraphQL exposes one endpoint, e.g.: `POST /graphql`

3. Client-Defined Queries
- The client specifies exactly what fields it wants. Example:

```graphql
{
  getUser(id: "123") {
    name
    age
  }
}
```

4. Resolvers
- Each field in the schema has a resolver function that fetches data from a database, another API, or microservice. Example:

```javascript
const resolvers = {
  Query: {
    getUser: (_, { id }) => db.users.findById(id),
  },
};
```

5. Precise Responses
- The server responds only with the requested fields, reducing over-fetching.
  `{ "data": { "getUser": { "name": "Vivek", "age": 22 } } }`

6. Real-Time Updates
- Using GraphQL Subscriptions, clients can get live updates over WebSockets. Example:

```graphql
subscription {
  onUserUpdate {
    id
    name
  }
}
```
- Single endpoint simplifies API management.
- Client-driven queries eliminate over-fetching.
- Strongly typed schema improves consistency.
- Supports real-time updates via subscriptions.
In short: GraphQL works by letting the client define the data structure, while the server resolves only what’s requested — leading to more efficient, flexible, and maintainable APIs.
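Because everything goes through one endpoint, a GraphQL call is just an HTTP POST with a JSON body. A hedged sketch (the endpoint URL is illustrative; the query matches the schema above):

```python
import requests

query = """
{
  getUser(id: "123") {
    name
    age
  }
}
"""
resp = requests.post("https://api.example.com/graphql", json={"query": query})
print(resp.json())   # e.g. {"data": {"getUser": {"name": "Vivek", "age": 22}}}
```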
- Ideal for modern web and mobile apps where different screens or devices need different data structures.
- Example: A mobile app might need fewer fields than a desktop web app.
- GraphQL lets each client query only the fields it needs from the same endpoint.
- GraphQL can combine data from multiple APIs or databases into a single unified schema.
- Example: A dashboard app pulling data from users, orders, and payments services — all resolved in one query instead of multiple REST calls.
- REST APIs often return too much or too little data.
- GraphQL gives precise control over data shape, improving network efficiency and performance.
- When frontend teams frequently change UI and data needs, GraphQL allows them to adjust queries without backend changes — speeding up development.
- Using GraphQL Subscriptions, clients can receive real-time updates (e.g., chat apps, notifications, live feeds) without constant polling.
✅ When clients have diverse data needs (web, mobile, IoT).
✅ When aggregating data from multiple sources or microservices.
✅ When optimizing network performance by fetching only necessary data.
✅ When rapid frontend iteration is required.
✅ When supporting real-time updates with subscriptions.
🚫 When API traffic is simple and predictable (REST may be simpler).
🚫 When binary or large file uploads are frequent (REST/gRPC perform better).
🚫 When you lack tooling for caching and rate-limiting (harder in GraphQL).
🚫 When teams are small — REST is easier to set up and maintain.
In summary: Use GraphQL when you need flexibility, optimized data fetching, and real-time capabilities across multiple clients — especially in data-rich or fast-changing applications.
Software architecture defines the high-level structure of a system — how components are organized, interact, and communicate with each other. It provides a blueprint for building scalable, maintainable, and reliable software systems.
- Components: Independent modules or services that perform specific functions.
- Connectors: Define how components communicate — e.g., APIs, message queues, or function calls.
- Data Flow: How data moves between different layers or modules.
- Design Principles: Includes separation of concerns, modularity, and loose coupling.
- Ensures scalability as systems grow.
- Improves maintainability and ease of development.
- Helps manage complexity in large applications.
- Supports better fault isolation and independent deployments.
In short, software architecture is the foundation of system design, guiding how different parts of a system fit and work together to meet business and technical goals.
Monolithic architecture is a traditional software design pattern where the entire application is built as a single, unified unit. All modules — such as authentication, database access, business logic, and UI — are tightly coupled and run within the same process.
- Single codebase and deployment unit
- Shared memory and resources
- Tightly coupled components
- Centralized data management
A web application where login, product catalog, and order management all reside in one project and are deployed together as a single .jar, .war, or .exe file.
- Simple to develop and deploy — one build, one deployment pipeline.
- Easy to test — everything runs in a single environment.
- Good performance — internal calls are fast since they happen in-process.
- Easier debugging — logs and errors are centralized.
- Scalability limitations — cannot scale individual components.
- Hard to maintain — changes in one area can affect others.
- Slower development — larger codebase increases complexity.
- Deployment risk — a small change requires redeploying the entire app.
- Technology lock-in — hard to adopt new languages or frameworks.
✅ Small to medium-sized applications.
✅ When the team is small and deployment simplicity is important.
✅ When scalability and modularity are not immediate concerns.
In summary: Monolithic architecture is simple and fast to start, but less flexible and harder to scale as systems grow.
Layered architecture — also known as N-tier architecture — is one of the most common software design patterns where an application is divided into logical layers, each with a specific responsibility. Each layer communicates only with the layer directly below or above it, promoting separation of concerns.
1. Presentation Layer (UI Layer)
   - Handles the user interface and user interaction.
   - Sends user input to the business layer and displays output.
   - Example: HTML/CSS, React.js, Angular, Flutter.
2. Business Logic Layer (Service Layer)
   - Contains core logic and business rules.
   - Processes data, performs calculations, and coordinates between UI and Data layers.
   - Example: Java, Node.js, Spring Boot, Express.js.
3. Data Access Layer (Persistence Layer)
   - Manages data storage and retrieval.
   - Handles database connections, queries, and ORM (Object Relational Mapping).
   - Example: MySQL, MongoDB, PostgreSQL via Sequelize, Hibernate, etc.
- Integration Layer → For external API or third-party service communication.
- Cache Layer → For improving performance with Redis or Memcached.
- Security Layer → For authentication, authorization, and data protection.
+----------------------+
| Presentation Layer |
| (UI / API Gateway) |
+----------▲-----------+
|
+----------▼-----------+
| Business Logic Layer |
| (Services / Rules) |
+----------▲-----------+
|
+----------▼-----------+
| Data Access Layer |
| (Database / ORM) |
+----------------------+
✅ Separation of concerns — each layer has a clear responsibility.
✅ Easier to maintain — changes in one layer rarely affect others.
✅ Reusability — layers can be reused across multiple projects.
✅ Testability — each layer can be tested independently.
✅ Scalability — can scale specific layers as needed.
❌ Performance overhead — multiple layers add latency.
❌ Rigid dependencies — changes in one layer may require updates to interfaces above it.
❌ Complex deployment — in multi-tier systems, deployment and versioning can be tricky.
- When you need clear separation between UI, business logic, and data.
- When building enterprise applications or web services.
- When maintainability, testability, and scalability are important.
In summary: Layered architecture organizes code into structured tiers for maintainability and scalability, making it ideal for medium to large enterprise systems.
Microservices architecture is a modern architectural style where an application is broken down into a collection of small, independent services, each responsible for a specific business function and communicating through lightweight APIs (usually HTTP or message queues).
Microservices architecture structures an application as a set of loosely coupled, independently deployable services that work together to deliver business value.
Each service:
- Has its own codebase
- Can be developed, deployed, and scaled independently
- Often owns its own database (decentralized data management)
- Independence – Each service runs in its own process.
- Decentralization – No single shared database; each service manages its own data.
- Lightweight Communication – Services interact via HTTP (REST/gRPC) or message brokers (Kafka, RabbitMQ).
- Autonomous Teams – Different teams manage different services.
- Polyglot Freedom – Each service can use different languages, frameworks, or databases.
- Resilience – Failure in one service does not crash the entire system.
+-------------------+
| API Gateway |
+---------+----------+
|
------------------------------------------
| | | |
+----------+ +----------+ +----------+ +-----------+
| Auth | | Orders | | Users | | Payments |
| Service | | Service | | Service | | Service |
+----------+ +----------+ +----------+ +-----------+
| | | |
+------+ +------+ +------+ +------+
| DB A | | DB B | | DB C | | DB D |
+------+ +------+ +------+ +------+
✅ Independent Deployment – Each service can be deployed without affecting others.
✅ Scalability – Scale only the services that need it.
✅ Fault Isolation – One service failure won’t take down the entire system.
✅ Technology Flexibility – Use different stacks per service.
✅ Faster Development – Parallel development by multiple teams.
❌ Complexity – More services = more network calls, configs, monitoring, and debugging.
❌ Data Consistency – Managing distributed transactions is harder.
❌ Deployment & Ops Overhead – Requires DevOps maturity (Docker, Kubernetes, CI/CD).
❌ Latency – Network calls between services add overhead.
- Large applications needing high scalability and continuous delivery.
- Systems managed by multiple teams working on separate domains.
- Applications requiring rapid deployment cycles.
- Projects adopting cloud-native or Kubernetes infrastructure.
- Netflix (streaming & recommendations as separate services)
- Amazon (cart, payment, search — all independent)
- Uber (trip, driver, location services, etc.)
In summary: Microservices architecture enables flexibility, scalability, and independent deployment, making it ideal for large-scale, cloud-based, and fast-moving organizations — but it requires strong DevOps, monitoring, and orchestration practices.
Event-Driven Architecture (EDA) is a software design pattern where system components communicate through events — notifications that something has happened in the system. Instead of direct service-to-service calls, components emit, consume, and react to events asynchronously, enabling loose coupling, scalability, and real-time responsiveness.
Event-Driven Architecture is a pattern where components produce and consume events via a message broker or event bus, allowing asynchronous communication between independent services.
- Event: A record of something that has occurred (e.g., “OrderPlaced”, “UserSignedUp”).
- Producer: Component that emits events.
- Consumer: Component that listens for and reacts to events.
- Event Broker: Middleware (like Kafka, RabbitMQ) that routes events between producers and consumers.
+-------------+ +-------------------+
| Order | emits | Event Broker |
| Service +---------->+ (Kafka / Rabbit) |
+-------------+ +---------+---------+
|
+-----------------+-------------------+
| |
+-------------------+ +-------------------+
| Inventory | | Notification |
| Service | | Service |
+-------------------+ +-------------------+
consumes "OrderPlaced" consumes "OrderPlaced"
- Asynchronous communication – No waiting for direct responses.
- Loose coupling – Services don’t need to know about each other.
- Scalable & resilient – Event brokers handle spikes in workload.
- Reactive – System reacts instantly to changes or actions.
- Event persistence – Events can be stored for replay or audit.
✅ High decoupling – Each component evolves independently.
✅ Real-time responsiveness – Great for notification or streaming systems.
✅ Scalable – Events can be processed in parallel.
✅ Resilient – Failure in one consumer doesn’t affect others.
✅ Auditability – Events provide a clear log of system activity.
❌ Complex debugging – Tracing event flow across systems is hard.
❌ Event ordering issues – Ensuring correct processing order can be tricky.
❌ Duplicate handling – Consumers may receive the same event multiple times.
❌ Event schema evolution – Managing backward compatibility is challenging.
- Real-time systems (e.g., financial transactions, IoT, stock trading).
- Microservices communication where loose coupling is needed.
- Notification & streaming platforms (e.g., email alerts, analytics).
- Complex workflows triggered by multiple independent events.
- Event Brokers: Apache Kafka, RabbitMQ, Amazon SNS/SQS, Google Pub/Sub, Redis Streams.
- Event Storage: Kafka topics, event stores, DynamoDB Streams.
- Frameworks: Spring Cloud Stream, AWS Lambda (event triggers).
Example (e-commerce app):
- Order Service emits “OrderPlaced”
- Inventory Service consumes and updates stock
- Notification Service sends confirmation email
- Analytics Service tracks order metrics
In summary: Event-Driven Architecture enables asynchronous, real-time communication between independent services. It improves scalability and responsiveness but requires careful event management, monitoring, and consistency handling.
Choosing the right system architecture depends on business needs, team size, scalability goals, and system complexity. Below are key factors that influence your decision and guidance on when to use Monolithic, Layered, Microservices, or Event-Driven architectures.
- Small/simple applications: → Use Monolithic — easy to develop, deploy, and manage.
- Medium complexity: → Use Layered (N-Tier) — provides structure while keeping simplicity.
- Large-scale/distributed systems: → Use Microservices or Event-Driven — better for scalability and flexibility.
- Small teams: → Monolithic or Layered — less coordination needed, faster delivery.
- Large or specialized teams: → Microservices — teams can own individual services independently. → Event-Driven — for teams experienced in async and distributed systems.
- Single deployment preferred: → Monolithic or Layered architectures.
- Frequent independent deployments needed: → Microservices — allows independent service updates. → Event-Driven — decouples release cycles.
- Low to moderate scalability: → Layered — vertical scaling works fine.
- High scalability: → Microservices — horizontal scaling at service level. → Event-Driven — asynchronous load handling and auto-scaling.
- Simple recovery sufficient: → Monolithic or Layered.
- Need for high availability and isolation: → Microservices or Event-Driven — service failures are isolated.
- Low latency & real-time systems: → Event-Driven — supports async, near real-time communication.
- Low overhead & simplicity: → Monolithic — single process with minimal network hops.
- Simple maintenance goal: → Layered Architecture.
- Evolving product with frequent changes: → Microservices — easier to update, replace, or scale parts independently.
- Startups / MVPs: → Monolithic — fast to build and iterate.
- Growing businesses: → Layered → Microservices (gradual evolution).
- Large enterprises or data-intensive apps: → Event-Driven — supports scalability and async workflows.
| Use Case | Recommended Architecture | Reason |
|---|---|---|
| Small web app or MVP | Monolithic | Simple, fast to deploy |
| Traditional enterprise app | Layered | Structured and maintainable |
| Large distributed web app | Microservices | Independent scaling and deployments |
| Real-time streaming/IoT system | Event-Driven | Asynchronous, high throughput |
| E-commerce or fintech | Microservices + Event-Driven | Scalable, fault-tolerant, reactive |
- Monolithic: Best for small teams & simple apps.
- Layered: Structured, ideal for medium-sized applications.
- Microservices: Highly scalable and flexible, for complex systems.
- Event-Driven: Best for asynchronous, real-time, and reactive systems.
In short:
Choose architecture based on system scale, team maturity, performance goals, and scalability needs — not just technology trends.
Understanding web concepts is essential in system design because most modern applications — from social networks to fintech platforms — are built over the web infrastructure. A solid grasp of how the web works helps engineers design scalable, reliable, and high-performance systems.
1. Foundation of Internet-based Systems
   - Most distributed systems communicate over HTTP/HTTPS.
   - Knowing how requests, responses, and protocols work enables better architectural decisions.
2. Performance Optimization
   - Understanding caching, CDNs, and compression helps reduce latency and bandwidth usage.
   - Helps design systems that deliver content faster across the globe.
3. Security Awareness
   - Knowledge of SSL/TLS, authentication, and encryption helps build secure APIs and services.
4. Scalability and Load Management
   - Concepts like DNS, proxies, and load balancers are essential for handling millions of users efficiently.
5. Better API and Service Design
   - Understanding REST, GraphQL, and gRPC leads to cleaner, more reliable communication between services.
6. Effective Debugging and Monitoring
   - Knowing the full web request flow — from client to server — helps identify bottlenecks or failures quickly.
7. Interview Relevance
   - Web concepts are fundamental in system design interviews, forming the base for topics like caching, scaling, and data flow.
In summary: Learning web concepts builds the foundation for designing systems that are fast, scalable, secure, and maintainable — all critical for high-performance backend engineering.
The web is stateless by default, meaning each HTTP request is independent and doesn’t retain user information between interactions. However, real-world applications often need to remember user data (like login status, cart items, or preferences). This is where web sessions come in — they enable the server to maintain state across multiple client requests.
A session is a temporary interaction period between a client and a server. It stores user-specific data on the server (or a shared store) and identifies the user using a unique session ID.
- User logs in or performs an action → server creates a session.
- The session ID is stored in a cookie and sent to the client.
- For each subsequent request, the client sends back the session ID.
- The server retrieves user data using that session ID.
Client ---> Login ---> Server creates session ---> Sends Session ID
Client <--- Receives cookie (session ID)
Client ---> Makes request with cookie ---> Server identifies session
- In-memory (e.g., in process memory) – simple but not scalable.
- External session store – scalable and distributed options like:
  - Redis
  - Memcached
  - Database (less common for high-performance systems)
- Session ID: Unique token used to identify the session.
- Cookies: Store and send the session ID with each request.
- Session Expiry: Defines how long a session stays valid.
- Secure Sessions: Session data must be encrypted and transmitted over HTTPS.
- Scalability: In-memory sessions don’t work well with multiple servers (load-balanced systems).
- Security: Session hijacking or fixation attacks.
- Persistence: Sessions should expire or invalidate properly.
- Use sticky sessions (bind user to one server) — simple but less scalable.
- Use centralized session stores like Redis — scalable and distributed.
- Implement token-based authentication (JWT) — stateless alternative.
Web sessions enable state management in otherwise stateless HTTP communication, allowing personalized and secure user experiences. Modern scalable systems often use distributed session stores or stateless tokens (JWT) depending on system requirements.
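A hedged sketch of the centralized-store approach using Redis (key naming and TTL are illustrative): any app server behind the load balancer can resolve the session ID from the cookie, which is what makes this approach horizontally scalable:

```python
import json
import uuid
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def create_session(user_id: str, ttl_seconds: int = 3600) -> str:
    session_id = uuid.uuid4().hex                 # value placed in the client's cookie
    r.setex(f"session:{session_id}", ttl_seconds,
            json.dumps({"user_id": user_id}))     # expires automatically after the TTL
    return session_id

def get_session(session_id: str) -> dict | None:
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None       # None means expired or invalid
```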
- HTTP is stateless, meaning each request is independent and doesn’t remember previous interactions.
- Sessions bridge this gap by preserving user context between multiple requests (e.g., keeping a user logged in).
- Sessions store user identity and preferences after login.
- They enable personalized experiences — such as showing a user’s profile, cart, or dashboard data without repeated authentication.
- Sessions help implement secure access control, ensuring that only authenticated users access certain pages or APIs.
- They allow features like session expiration and logout handling to enhance security.
- Instead of revalidating user credentials or fetching data repeatedly, sessions store necessary info temporarily — reducing redundant computation and database queries.
- Sessions maintain stateful continuity, such as:
- Shopping carts in e-commerce sites
- Multi-step forms
- Preferences across navigation
- Even in distributed systems, sessions (or token-based equivalents like JWTs) ensure consistent identity tracking across multiple backend servers.
In summary: Web sessions are vital because they enable stateful, secure, and personalized interactions in an inherently stateless web environment, improving both user experience and system efficiency.
HTTP is a stateless protocol, meaning each request from a client to a server is independent — the server does not remember any previous requests from that client.
In simple terms, every HTTP request is treated as a new interaction, even if it comes from the same user.
When you log in to a website:
- Without state management — the server forgets you’re logged in after each request.
- With state management — the server “remembers” your session, so you stay logged in.
- It was designed for simplicity and scalability.
- Statelessness makes servers lighter, faster, and easier to scale horizontally because they don’t store user session data by default.
Since web apps need to remember user data (like login status, cart items, or preferences), developers use several techniques to simulate state in a stateless environment.
1. Cookies
- Small pieces of data stored in the client’s browser.
- Sent automatically with each request to the same domain.
- Commonly used to store session IDs, preferences, or auth tokens.
🟢 Example:
`Set-Cookie: session_id=abc123; HttpOnly; Secure`
2. Server-Side Sessions
- Server maintains user-specific data mapped to a session ID.
- The session ID is stored in a cookie on the client.
- Data can be stored in memory, Redis, or a database.
🟢 Best for: Secure or sensitive applications (e.g., banking, admin dashboards).
3. Tokens (JWT)
- Stateless method: all session info is encoded in the token itself.
- Server only needs to verify the token, not store session data.
- Common in modern microservice or API-based architectures.
🟢 Example:
`Authorization: Bearer <jwt_token>`
4. Hidden Form Fields / URL Parameters
- Passes state data (like user ID or cart info) through forms or URLs.
- Simple but not secure — rarely used in modern systems.
5. Client-Side Storage
- Uses LocalStorage or SessionStorage in browsers to persist small amounts of data.
- Good for caching UI preferences or JWTs.
🟢 Example:
`localStorage.setItem("auth_token", token)`
| Technique | State Stored | Scalability | Security | Common Use |
|---|---|---|---|---|
| Cookies | Client | High | Medium | Preferences, session ID |
| Server-Side Session | Server | Medium | High | Authenticated sessions |
| JWT Token | Client | Very High | High | APIs, mobile apps |
| LocalStorage | Client | High | Low | UI settings, tokens |
| Hidden Fields / URL | Client | High | Low | Temporary data passing |
In summary: HTTP is stateless by design, but web apps use cookies, sessions, or tokens to maintain state across requests — enabling secure, consistent, and personalized user experiences.
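To make the JWT approach concrete, here is a minimal sketch with the PyJWT library (the secret and claims are illustrative): the server keeps no session state and simply verifies the signature on each request:

```python
import datetime
import jwt  # PyJWT: pip install pyjwt

SECRET = "change-me"   # illustrative; use a real secret from config in practice

def issue_token(user_id: str) -> str:
    claims = {
        "sub": user_id,
        "exp": datetime.datetime.now(datetime.timezone.utc)
               + datetime.timedelta(hours=1),   # built-in expiry claim
    }
    return jwt.encode(claims, SECRET, algorithm="HS256")

def verify_token(token: str) -> dict:
    # Raises jwt.ExpiredSignatureError / jwt.InvalidTokenError on failure.
    return jwt.decode(token, SECRET, algorithms=["HS256"])
```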
Serialization is the process of converting an in-memory object (like a Python, Java, or Go object) into a byte stream or text format that can be:
- Stored (in a file, cache, or database)
- Transmitted (over a network between client and server)
The reverse process is called Deserialization.
Serialization is critical in distributed systems and web applications because it enables:
- 🔄 Data exchange between different services (e.g., client ↔ server)
- 🗃️ Persistent storage (saving structured data)
- 🌍 Cross-language communication (e.g., Java backend ↔ JS frontend)
- ⚙️ API performance optimization — choosing the right format can reduce latency and bandwidth usage.
| Format | Type | Human-readable | Typical Use |
|---|---|---|---|
| JSON | Text | ✅ Yes | Web APIs, config files |
| XML | Text | ✅ Yes (but verbose) | Legacy systems, SOAP APIs |
| Protobuf (Protocol Buffers) | Binary | ❌ No | High-performance microservices, gRPC |
| Aspect | JSON | XML | Protobuf |
|---|---|---|---|
| Readability | ✅ Easy to read | ✅ Readable but verbose | ❌ Not human-readable |
| Efficiency (Size & Speed) | ❌ Slow & large | ❌ Slowest & largest | ✅ Fast & compact |
| Compatibility (Cross-language) | ✅ Widely supported | ✅ Widely supported | ✅ Supported in many languages, but needs schema |
| Schema Requirement | ❌ None | ❌ Optional (XSD/DTD) | ✅ Requires .proto file |
| Best Use Case | REST APIs, config, web apps | Legacy, enterprise apps | gRPC, microservices, low-latency systems |
Example:
Let’s say we have a simple User object:
```json
{
  "id": 101,
  "name": "Vivek",
  "email": "vivek@xyz.com"
}
```
- JSON → Used by frontend & REST APIs (easy to read and debug)
- XML → Used in older enterprise APIs (`<user><id>101</id>...</user>`)
- Protobuf → Compact binary form transmitted in gRPC call
| Metric | JSON | XML | Protobuf |
|---|---|---|---|
| Encoding/Decoding Speed | ⚙️ Medium | 🐢 Slow | ⚡ Fast |
| Payload Size | 📦 Medium | 📦📦 Large | 📦 Small |
| CPU Usage | Moderate | High | Low |
| Network Efficiency | Medium | Poor | Excellent |
| Scenario | Recommended Format |
|---|---|
| Web or mobile APIs | JSON |
| Enterprise or legacy integration | XML |
| High-performance internal microservices | Protobuf |
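A small sketch of why binary formats win on size: the same User record encoded as JSON text versus a hand-packed binary layout (the struct layout below is only a stand-in for what a schema-based format like Protobuf produces):

```python
import json
import struct

user = {"id": 101, "name": "Vivek", "email": "vivek@xyz.com"}

as_json = json.dumps(user).encode()

# Binary: 4-byte id, then length-prefixed UTF-8 strings (hypothetical layout).
name, email = user["name"].encode(), user["email"].encode()
as_binary = struct.pack(f"!iB{len(name)}sB{len(email)}s",
                        user["id"], len(name), name, len(email), email)

print(len(as_json), len(as_binary))   # the binary payload is noticeably smaller
```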
Web browsers enforce the Same-Origin Policy for security reasons.
It prevents JavaScript on one domain (e.g., https://a.com) from making requests to another domain (e.g., https://b.com) unless explicitly allowed.
✅ Goal: Prevent malicious websites from reading sensitive data from other origins (like cookies, tokens, etc.).
- Your frontend app: `https://frontend.com`
- Your backend API: `https://api.server.com`
Without CORS, the browser blocks this cross-origin request.
CORS allows the server to specify which origins are allowed to access its resources using specific HTTP headers.
So, if https://api.server.com includes:
Access-Control-Allow-Origin: https://frontend.com
the browser will permit that request.
CORS is browser-enforced and works through HTTP headers during requests & responses.
1. Browser sends request to a different origin (cross-domain).
2. Browser automatically adds the Origin header in the request:
   `Origin: https://frontend.com`
3. Server responds with CORS headers to allow/deny:
   `Access-Control-Allow-Origin: https://frontend.com`
4. Browser checks if the origin is allowed → grants or blocks access.
When the request is non-simple (like using PUT, DELETE, custom headers, or JSON body),
the browser first sends a preflight request — an OPTIONS request — to check if the actual request is allowed.
Preflight (OPTIONS):
OPTIONS /api/data HTTP/1.1
Origin: https://frontend.com
Access-Control-Request-Method: POST
Access-Control-Request-Headers: Content-Type
Server Response:
Access-Control-Allow-Origin: https://frontend.com
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization
Access-Control-Max-Age: 86400
If the response is valid → browser proceeds with actual request.
| Header | Description |
|---|---|
| Access-Control-Allow-Origin | Specifies which origins can access (e.g., * or a specific domain). |
| Access-Control-Allow-Methods | Lists allowed HTTP methods (e.g., GET, POST, PUT). |
| Access-Control-Allow-Headers | Lists allowed custom headers (e.g., Content-Type, Authorization). |
| Access-Control-Allow-Credentials | Allows cookies or authorization headers if true. |
| Access-Control-Max-Age | Time (in seconds) that preflight can be cached. |
| Misconfiguration | Risk |
|---|---|
| `Access-Control-Allow-Origin: *` with credentials | ❌ Major risk — exposes sensitive cookies/tokens. |
| Allowing too many origins | 🔓 Broad access — makes API vulnerable. |
| Forgetting preflight headers | ❌ Causes browser to block requests silently. |
| Reflecting `Origin` header blindly | ❌ Effectively allows any origin, including malicious ones. |
✅ Best Practices:
- Always whitelist specific origins.
- Never combine `*` with `Access-Control-Allow-Credentials: true`.
- Validate the `Origin` header server-side before responding.
In REST APIs:
- Configure on the backend via framework (e.g., Express.js, Spring Boot, Django).
- Example (Node.js + Express):

```javascript
app.use(cors({
  origin: "https://frontend.com",
  methods: ["GET", "POST"],
  credentials: true
}));
```

In GraphQL:
- The same CORS rules apply.
- GraphQL endpoints are usually single `/graphql` routes — just ensure preflight and headers are properly handled.
- Example:

```javascript
app.use('/graphql',
  cors({ origin: 'https://frontend.com' }),
  graphqlHTTP({ schema, rootValue }));
```
Instead of exposing APIs directly from another domain, use a gateway or proxy to make it appear as same-origin.
🧩 Example:
Frontend → /api → Gateway → https://backend.com/api
- Browser thinks request is to same origin → no CORS needed.
- Common in NGINX, AWS API Gateway, Kong, etc.
Move the request from frontend → backend (server-to-server), then forward results. Since servers don’t enforce CORS, this avoids browser restrictions.
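A hedged Flask sketch of that proxy idea (the backend URL is illustrative): the browser only ever talks to its own origin, and the server forwards the call, so no CORS headers are needed:

```python
import requests
from flask import Flask, Response, request

app = Flask(__name__)
BACKEND = "https://backend.com/api"   # illustrative upstream API

@app.route("/api/<path:path>", methods=["GET", "POST"])
def proxy(path):
    # Server-to-server call: CORS is a browser mechanism and does not apply here.
    upstream = requests.request(
        method=request.method,
        url=f"{BACKEND}/{path}",
        json=request.get_json(silent=True),
        params=request.args,
    )
    return Response(upstream.content,
                    status=upstream.status_code,
                    content_type=upstream.headers.get("Content-Type"))
```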
JSONP: Used before CORS — only for GET requests, now obsolete.
| Concept | Description |
|---|---|
| Problem | Same-Origin Policy blocks cross-domain requests. |
| Solution | CORS lets servers declare trusted origins. |
| Preflight | Browser checks permissions via OPTIONS before sending real request. |
| Risk | Misconfigurations can expose sensitive data. |
| Best Practice | Allow specific origins, use gateway for multi-origin access. |
| Alternative | Reverse proxy / API Gateway to unify origins. |
In system design, scalability refers to a system’s ability to handle increased load or demand gracefully by adding more resources — without compromising performance, availability, or reliability.
It ensures that as users, data, or requests grow, the system continues to work efficiently.
Scalability is the capacity of a system to grow in size, workload, or traffic while maintaining performance levels.
A scalable system can:
- Handle more requests per second
- Process larger datasets
- Serve more users simultaneously
✅ Example: An e-commerce website that performs equally well on Black Friday (high load) and normal days is a scalable system.
- User Growth: As the user base expands, more requests hit servers.
- Data Growth: More users generate more data — storage and processing must scale.
- Performance: Maintain fast response times even under heavy load.
- Business Continuity: Prevent downtime during peak usage.
- Cost Efficiency: Scale resources up or down based on demand.
📈 Scaling ensures your system remains reliable, efficient, and user-friendly as it grows.
1. Vertical Scaling (Scale Up)
- Add more power (CPU, RAM, storage) to an existing server.
- Simple to implement but limited by hardware capacity.
Example: Upgrading a server from 8GB RAM to 32GB.
✅ Easy to manage
❌ Has a physical/hardware limit

2. Horizontal Scaling (Scale Out)
- Add more machines/servers to distribute the load.
- Requires load balancers and distributed systems.
Example: Adding more application servers behind a load balancer.
✅ Highly scalable
❌ More complex to manage

3. Diagonal Scaling (Hybrid)
- Combine vertical + horizontal scaling — start with scaling up, then scale out as needed.
- Common in modern cloud systems.
| Challenge | Description |
|---|---|
| Data Consistency | Maintaining accurate data across distributed servers. |
| Load Distribution | Efficiently balancing traffic to avoid bottlenecks. |
| Latency | Increased network hops can slow down responses. |
| Cost Management | Scaling resources adds cost — needs to be optimized. |
| Fault Tolerance | Ensuring one node failure doesn’t crash the entire system. |
| Complexity | Distributed systems are harder to design, deploy, and debug. |
| Concept | Key Point |
|---|---|
| Scalability | System’s ability to handle growing load efficiently |
| Vertical Scaling | Add power to a single machine |
| Horizontal Scaling | Add more machines |
| Diagonal Scaling | Combine both |
| Goal | Maintain performance, reliability, and cost efficiency as system grows |
| Challenge | Data consistency, latency, load balancing, fault tolerance |
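To make the horizontal-scaling idea concrete, here is a toy round-robin load balancer sketch (server names are illustrative; real balancers add health checks and weighting):

```python
import itertools

# Interchangeable app servers; adding capacity means adding entries here.
servers = ["app-1:8000", "app-2:8000", "app-3:8000"]
round_robin = itertools.cycle(servers)

def route(request_id: int) -> str:
    target = next(round_robin)   # each server receives every 3rd request
    return f"request {request_id} -> {target}"

for i in range(6):
    print(route(i))
```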
Scalability can be achieved through three main strategies — each with unique benefits, limitations, and trade-offs.
1. Vertical Scaling (Scale Up)
Definition: Increasing the capacity of a single machine — by adding more CPU, RAM, or storage.
How it works: Upgrade the same server → better specs → more workload handled.
Example:
- Upgrading a database server from 8GB → 64GB RAM.
- Moving from a single-core → 16-core processor.
Pros:
- Simple to implement.
- No code changes required.
- Easier maintenance and debugging.
Cons:
- Limited by hardware capacity.
- Downtime may be required during upgrades.
- Becomes expensive at high scale.
Best for: Early-stage startups or small-scale systems where simplicity matters more than scalability.
2. Horizontal Scaling (Scale Out)
Definition: Adding more servers/machines to distribute traffic and workload across multiple nodes.
How it works: Instead of one powerful server, use multiple commodity servers behind a load balancer.
Example:
- Adding more web servers to handle increased traffic.
- Using distributed databases (like MongoDB, Cassandra).
Pros:
- Practically infinite scalability.
- Fault tolerance (failure of one node doesn’t stop the system).
- Zero downtime scaling.
Cons:
- Complex setup and monitoring.
- Data consistency and synchronization challenges.
- Requires load balancing and distributed system design.
Best for: High-traffic systems (e.g., YouTube, Amazon, Netflix) where performance and availability are critical.
3. Diagonal Scaling (Hybrid)
Definition: Combines both vertical and horizontal scaling — scale up first, then scale out as demand grows.
How it works:
- Start with powerful machines (vertical scaling).
- Add more machines (horizontal scaling) when vertical limits are reached.
Example: Start with a powerful database instance → later shard or replicate when traffic increases.
Pros:
- Balanced performance and cost.
- Gradual and flexible scaling path.
- Ideal for growing systems.
Cons:
- Still inherits partial complexity from horizontal scaling.
- Requires good architecture planning.
Best for: Growing startups and mid-sized companies scaling from few users to millions.
| Strategy | Cost | Complexity | Performance | Example |
|---|---|---|---|---|
| Vertical Scaling | 💰 | 🟢 Low | ⚙️ High (initially) | Single-node PostgreSQL |
| Horizontal Scaling | 💰💰💰 | 🔴 High | ⚡ Very High | Distributed web servers, sharded DB |
| Diagonal Scaling | 💰💰 | 🟡 Medium | ⚙️⚡ Balanced | Modern SaaS & cloud systems |
| Use Case | Scaling Strategy | Example |
|---|---|---|
| Startup MVP / Small App | Vertical | Early-stage web app with limited users |
| High-traffic Web App | Horizontal | Netflix, Facebook, Amazon |
| Growing SaaS Product | Diagonal | Slack, Zoom, Shopify |
| Database-intensive System | Start vertical, later horizontal | PostgreSQL → Sharded setup |
| Compute-heavy workloads | Horizontal | AI/ML distributed training clusters |
| Scaling Type | Description | When to Use |
|---|---|---|
| Vertical (Scale Up) | Add more power to a single node | When simplicity > scale |
| Horizontal (Scale Out) | Add more servers to handle load | When reliability & scale are key |
| Diagonal (Hybrid) | Combine both for flexibility | When system grows over time |
Autoscaling is the cloud’s ability to automatically adjust computing resources — such as servers, containers, or instances — based on real-time demand.
Instead of manually adding or removing capacity, autoscaling dynamically scales up (add resources) when load increases and scales down (remove resources) when demand drops.
✅ Goal: Maintain performance, availability, and cost efficiency automatically.
Example: In AWS EC2, autoscaling can increase the number of instances during traffic spikes and reduce them during off-peak hours.
Autoscaling systems rely on metrics and rules to decide when to scale resources.
Key Components:
- Metrics: CPU usage, memory, request rate, latency, etc.
- Thresholds/Policies: Define upper and lower limits (e.g., scale out if CPU > 70%).
- Scaling Actions: Add or remove compute instances.
- Load Balancer: Distributes traffic among instances dynamically.
- Health Checks: Detects and replaces unhealthy instances.
Flow Example:
- Traffic increases → CPU hits 80%.
- Autoscaler triggers a scale-out event.
- New instance starts → load is distributed.
- Traffic decreases → autoscaler scales in by removing idle instances.
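The threshold logic in this flow is easy to sketch. Below is a minimal, hypothetical reactive autoscaler loop in Python — `get_avg_cpu` and the instance counts stand in for a real metrics source and cloud API (e.g., AWS Auto Scaling), so treat it as an illustration of the policy, not a production controller:

```python
# Minimal sketch of a reactive autoscaling policy. The helpers are
# hypothetical placeholders for a real monitoring + cloud API.
import random
import time

MIN_INSTANCES, MAX_INSTANCES = 2, 10
SCALE_OUT_CPU, SCALE_IN_CPU = 70.0, 30.0   # thresholds in percent

def autoscale(instances: int, get_avg_cpu) -> int:
    """Return the new instance count based on current average CPU."""
    cpu = get_avg_cpu()
    if cpu > SCALE_OUT_CPU and instances < MAX_INSTANCES:
        instances += 1          # scale out: add one instance
    elif cpu < SCALE_IN_CPU and instances > MIN_INSTANCES:
        instances -= 1          # scale in: remove an idle instance
    return instances

if __name__ == "__main__":
    count = MIN_INSTANCES
    for _ in range(5):
        count = autoscale(count, lambda: random.uniform(10, 90))
        print("instances:", count)
        time.sleep(0.1)         # a real loop would poll every N seconds
```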
Monitoring Tools:
- AWS CloudWatch
- Azure Monitor
- Google Cloud Operations Suite
- Prometheus + Grafana (for Kubernetes)
Approaches:
- Reactive Scaling: Responds after metrics cross a threshold (e.g., CPU > 80%).
- Proactive Scaling: Uses predictive models and traffic patterns to scale before spikes occur (e.g., scale out before Black Friday).
✅ Best Practice: Combine both reactive + proactive scaling for optimal performance.
- Right-sizing Instances: Use instance types that match workload needs.
- Dynamic Scaling Policies: Scale gradually instead of adding many instances at once.
- Use Spot or Preemptible Instances: For non-critical workloads.
- Set Minimum and Maximum Limits: Prevent over-scaling and unnecessary cost.
- Leverage Serverless Architectures: Pay only for actual usage (e.g., AWS Lambda).
- Monitor Usage Trends: Regularly analyze metrics and adjust policies.
- Use Auto-scaling Groups: To manage multiple instances collectively.
| Concept | Description | Benefit |
|---|---|---|
| Autoscaling | Automatic adjustment of resources based on demand | Performance + Cost efficiency |
| Scaling Triggers | Metrics like CPU, request rate, latency | Dynamic response to load |
| Proactive Scaling | Predictive scaling before traffic spikes | Stability |
| Cost Optimization | Smart scaling + right-sizing | Lower cloud bills |
In system design, storage refers to how data is persisted, organized, and accessed across different components of a system. It’s the backbone of every scalable application — from storing user profiles to managing massive logs, transactions, or files.
Storage decisions directly affect performance, reliability, scalability, and cost of a system.
- Data Durability: Ensures information isn’t lost even after failures.
- Performance: Determines read/write speed and system responsiveness.
- Scalability: Supports increasing data volume and concurrent users.
- Cost Efficiency: Optimizes resource usage for both hot and cold data.
- Data Integrity & Availability: Maintains consistency and uptime for critical services.
📈 In short — storage impacts every aspect of system behavior, from latency to user experience.
| Type | Description | Examples | Common Storage |
|---|---|---|---|
| Structured Data | Organized, well-defined schema (rows/columns) | User tables, transactions | SQL Databases (MySQL, PostgreSQL) |
| Unstructured Data | No fixed schema or structure | Images, videos, logs, emails | NoSQL Databases, Object Storage (S3) |
👉 Semi-structured data like JSON or XML sits between these two, used in document stores (e.g., MongoDB).
| Category | Description | Examples |
|---|---|---|
| Block Storage | Raw volumes used by servers to store data as blocks | AWS EBS, SAN |
| File Storage | Hierarchical structure (files & folders) | Network File System (NFS), SMB |
| Object Storage | Data stored as objects with metadata | Amazon S3, Google Cloud Storage |
| Database Storage | Data stored in structured or semi-structured form | MySQL, MongoDB, Cassandra, DynamoDB |
| In-memory Storage | Fast, temporary data storage in RAM | Redis, Memcached |
When designing data systems, consider the following core properties:
- Durability – Data persists even after failures.
- Availability – Data can be accessed when needed.
- Consistency – All users see the same version of data.
- Latency – Time taken to read/write data.
- Scalability – Ability to handle more data and traffic.
- Fault Tolerance – System continues working during hardware/network failures.
No storage system is perfect — you must balance between:
- ⚡ Performance (Speed)
- 🧩 Consistency
- 🌍 Availability
- 💰 Cost
Example trade-offs:
- Caching improves speed but may reduce consistency.
- Replication increases availability but raises cost and complexity.
👉 The right choice depends on business needs — e.g., a banking app values consistency, while a video platform values availability.
The CAP Theorem (by Eric Brewer) states that a distributed system cannot simultaneously guarantee all three of the following:
| Property | Description |
|---|---|
| Consistency (C) | Every read gets the latest write or an error. |
| Availability (A) | Every request receives a non-error response. |
| Partition Tolerance (P) | System continues to work despite network failures. |
📘 A distributed system can guarantee at most two of the three (C, A, P). Since network partitions are unavoidable in practice, the real choice during a partition is between Consistency and Availability.
| Type | Prioritizes | Example Systems | Use Case |
|---|---|---|---|
| CP (Consistency + Partition Tolerance) | Consistency over availability | HBase, MongoDB (configured), Zookeeper | Banking, financial systems |
| AP (Availability + Partition Tolerance) | Availability over consistency | Cassandra, DynamoDB, CouchDB | Social media, messaging apps |
| CA (Consistency + Availability) | Works only when no partition exists (non-distributed) | Traditional RDBMS (MySQL, PostgreSQL) | Single-node, local systems |
| Concept | Description |
|---|---|
| Storage | Manages how data is persisted and accessed |
| Structured Data | Fixed schema (tables, rows) |
| Unstructured Data | No schema (media, logs) |
| CAP Theorem | Choose 2 of: Consistency, Availability, Partition Tolerance |
| Trade-off Goal | Balance performance, cost, and reliability |
In modern system design, databases are the backbone of any application. They store, organize, and manage data — ensuring it can be efficiently retrieved, updated, or deleted. Choosing the right data model is crucial because it directly affects scalability, performance, and maintainability of your system.
A database is an organized collection of structured information, or data, typically stored electronically in a computer system. Databases allow applications to store, manage, and query data efficiently using a Database Management System (DBMS).
A Relational Database stores data in tables — rows and columns — much like a spreadsheet. Each row represents a record, and each column represents an attribute of that record.
Relational databases follow a structured schema, meaning data types, relationships, and constraints are predefined.
Examples: MySQL, PostgreSQL, Oracle, Microsoft SQL Server
- Tables → Organize data into rows (records) and columns (fields)
- Primary Key → Uniquely identifies each record in a table
- Foreign Key → Defines relationships between tables
- Joins → Combine data from multiple tables
- ACID Properties:
- Atomicity → All or nothing transactions
- Consistency → Data remains valid after any operation
- Isolation → Transactions do not interfere
- Durability → Data is permanently saved even after a crash
Limitations of relational databases:
- Rigid Schema – Changing structure (adding columns, new relations) can be hard
- Horizontal Scaling is Difficult – Scaling across multiple servers is complex
- Performance Bottlenecks – Joins and transactions can slow down large-scale apps
- Unstructured Data – Not suitable for JSON, images, or social feeds
NoSQL (Not Only SQL) databases were designed to overcome the scalability and flexibility limits of traditional relational databases.
They offer schema-less, distributed, and high-performance data storage — ideal for big data, real-time analytics, and microservices.
Examples: MongoDB, Cassandra, DynamoDB, Redis, Neo4j
| Type | Description | Example Use Case | Examples |
|---|---|---|---|
| Document Store | Stores data in JSON-like documents | E-commerce product catalogs | MongoDB, CouchDB |
| Key-Value Store | Simple key-value pairs for ultra-fast lookups | Caching, session management | Redis, DynamoDB |
| Column-Family Store | Stores data in columns (optimized for queries on large datasets) | Analytics, event tracking | Cassandra, HBase |
| Graph Database | Stores data as nodes and relationships | Social networks, recommendation systems | Neo4j, ArangoDB |
NoSQL systems usually relax ACID guarantees and follow BASE principles:
- Basically Available → System guarantees availability
- Soft State → Data may change over time (eventual consistency)
- Eventual Consistency → Data will become consistent after some delay
The CAP Theorem states that a distributed system can only guarantee two of the following three:
- Consistency (C): All clients see the same data at the same time
- Availability (A): Every request receives a response (success/failure)
- Partition Tolerance (P): System continues to operate despite network failures
| System Type | Focus | Example |
|---|---|---|
| CA | Consistency + Availability | Traditional RDBMS (Single node) |
| CP | Consistency + Partition tolerance | MongoDB, HBase |
| AP | Availability + Partition tolerance | Cassandra, DynamoDB |
| Scenario | Use SQL | Use NoSQL |
|---|---|---|
| Complex relationships & transactions | ✅ | ❌ |
| Fixed schema & structured data | ✅ | ❌ |
| Unstructured or semi-structured data | ❌ | ✅ |
| High read/write throughput | ❌ | ✅ |
| Real-time analytics or caching | ❌ | ✅ |
| Strong consistency required | ✅ | ❌ |
| Horizontal scalability priority | ❌ | ✅ |
- SQL Example: Banking systems, ERP, or inventory management — where accuracy and consistency are critical.
- NoSQL Example: Social media apps, e-commerce recommendations, or IoT — where flexibility and scale matter more.
| Feature | SQL | NoSQL |
|---|---|---|
| Schema | Fixed | Dynamic |
| Scalability | Vertical | Horizontal |
| Consistency | Strong (ACID) | Eventual (BASE) |
| Query Language | SQL | Varies (JSON, APIs) |
| Best For | Structured data, relationships | Big data, flexibility, scalability |
As applications grow, databases must handle more users, more data, and faster requests. Scaling ensures the system can handle this growth without performance degradation.
There are two main strategies for scaling databases:
Definition: Adding more power (CPU, RAM, SSDs) to a single database server.
Example: Upgrading from a 4-core, 16GB RAM machine to an 8-core, 64GB RAM one.
✅ Advantages:
- Simple to implement (no changes in app logic)
- Good for small to medium workloads
- Maintains strong consistency (single-node system)
❌ Disadvantages:
- Hardware limits (can’t scale infinitely)
- Cost increases exponentially
- Single point of failure (if the machine goes down)
Common in: Traditional SQL databases like MySQL or PostgreSQL.
Definition: Adding more servers and distributing the data among them.
Example: Instead of one big MySQL instance, have multiple smaller ones handling different parts of the data.
✅ Advantages:
- Virtually infinite scalability
- Better fault tolerance
- Improved read/write throughput
❌ Disadvantages:
- Complex to manage and maintain
- Requires data partitioning or sharding
- Consistency challenges in distributed systems
Common in: Modern NoSQL databases like MongoDB, Cassandra, or DynamoDB (though SQL systems can also scale horizontally with sharding or replication).
| Feature | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Method | Add resources to one server | Add more servers |
| Complexity | Low | High |
| Cost | Increases quickly | Scales gradually |
| Performance | Limited by one node | Distributed |
| Fault tolerance | Low | High |
| Common in | SQL | NoSQL |
Replication means copying data from one database server to another to ensure availability, redundancy, and performance.
It allows multiple database copies to exist — usually one leader (primary) and multiple followers (replicas).
How it works:
- All writes go to the leader (primary) database.
- The leader replicates data changes to one or more followers (replicas).
- Followers can serve read-only queries, reducing load on the leader.
✅ Benefits:
- Improved read scalability
- High availability and fault tolerance
- Disaster recovery (if leader fails, a replica can take over)
❌ Drawbacks:
- Replication lag (followers may be slightly behind)
- More complex failover management
Used in: MySQL, PostgreSQL, MongoDB, Cassandra, DynamoDB
Definition: A read-only copy of your main database that handles read traffic to improve performance.
Example Use Case: In an e-commerce app — product details and listings can be served from replicas, while checkout operations still write to the main leader.
✅ Advantages:
- Balances load
- Reduces latency
- Supports analytics queries without affecting the main DB
❌ Disadvantages:
- Not suitable for write-heavy workloads
- Slight data staleness due to replication lag
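In application code, a read replica setup usually needs only a thin routing layer. Here is a minimal sketch (the connection objects are hypothetical stand-ins for real DB clients) that sends writes to the leader and round-robins reads across replicas:

```python
# Minimal sketch of leader/replica query routing: writes go to the leader,
# reads rotate across replicas (which may lag slightly behind).
import itertools

class ReplicatedDB:
    def __init__(self, leader, replicas):
        self.leader = leader
        self._replicas = itertools.cycle(replicas)   # round-robin reads

    def write(self, sql, params=()):
        return self.leader.execute(sql, params)      # all writes hit the leader

    def read(self, sql, params=()):
        return next(self._replicas).execute(sql, params)

# Demo with fake connections standing in for real clients:
class FakeConn:
    def __init__(self, name): self.name = name
    def execute(self, sql, params=()): return f"{self.name}: {sql}"

db = ReplicatedDB(FakeConn("leader"), [FakeConn("replica1"), FakeConn("replica2")])
print(db.write("INSERT INTO orders ..."))  # leader
print(db.read("SELECT * FROM products"))   # replica1
print(db.read("SELECT * FROM products"))   # replica2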
Sharding is the process of splitting a large database into smaller, faster, more manageable parts called shards.
Each shard holds a subset of data, and together they form the full dataset.
Example: If you have 10 million users, you can store:
- Shard 1 → Users 1–2 million
- Shard 2 → Users 2–4 million
- …and so on.
| Type | Description | Example |
|---|---|---|
| Horizontal Sharding | Distribute rows across shards | Users A–M in one shard, N–Z in another |
| Vertical Sharding | Split tables/columns across databases | User info in one DB, transactions in another |
- Key/Hash-Based Sharding
  - Use a hash function on a shard key (e.g., user_id % 4; see the sketch below)
  - Evenly distributes data but hard to re-shard later
- Range-Based Sharding
  - Store data in a range (e.g., user_id 1–10000 in shard 1)
  - Easy to query by range, but uneven data growth can cause hotspots
- Directory/Lookup Sharding
  - Maintain a central directory that maps each key to its shard
  - Flexible, but introduces a single point of failure
- Geo-Sharding
  - Data is partitioned by geographical region
  - Reduces latency for users in specific areas
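As referenced above, here is a minimal sketch of key/hash-based shard routing. The in-memory dicts are hypothetical stand-ins for separate database instances; the point is the routing function and the re-sharding caveat:

```python
# Minimal key/hash-based shard routing (user_id % N). Note the re-sharding
# pain: changing NUM_SHARDS remaps almost every existing key.
NUM_SHARDS = 4
shards = {i: {} for i in range(NUM_SHARDS)}   # stand-ins for real databases

def shard_for(user_id: int) -> int:
    return user_id % NUM_SHARDS

def put_user(user_id: int, data: dict) -> None:
    shards[shard_for(user_id)][user_id] = data

def get_user(user_id: int):
    return shards[shard_for(user_id)].get(user_id)

put_user(42, {"name": "Asha"})
print(shard_for(42), get_user(42))   # user 42 lands on shard 2
```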
| Pros | Cons |
|---|---|
| Infinite scalability | Complex data management |
| Faster queries | Difficult joins across shards |
| Improved availability | Rebalancing overhead |
| Localized failures | Application-level routing needed |
Definition: Polyglot persistence means using different types of databases for different components of the same system — based on the strengths of each.
Rationale: No single database fits all use cases.
Example in an E-commerce system:
| Component | Best Fit | Database |
|---|---|---|
| User Accounts | Strong consistency | PostgreSQL |
| Product Catalog | Flexible schema | MongoDB |
| Shopping Cart | Fast access | Redis |
| Analytics | High throughput | Cassandra or BigQuery |
✅ Benefits:
- Optimized for specific workloads
- Better performance and scalability
- Technology flexibility
❌ Challenges:
- More complex data architecture
- Data consistency across systems
| Concept | Description | Best For |
|---|---|---|
| Vertical Scaling | Add more power to one machine | Small systems |
| Horizontal Scaling | Add more servers | Large, distributed apps |
| Replication | Copy data across multiple servers | Read-heavy workloads |
| Sharding | Split large datasets | High data volume |
| Polyglot Persistence | Use multiple databases | Complex, modular systems |
Object Storage is a modern data storage architecture that stores data as objects, rather than files or blocks. Each object contains:
- The data itself (e.g., image, video, backup)
- Metadata (descriptive information about the data)
- A unique identifier (key)
Unlike traditional file systems (which use folders) or block storage (which uses sectors), object storage places everything in a flat, scalable storage pool accessed via APIs (usually HTTP-based).
| Concept | Description |
|---|---|
| Object | The fundamental unit — contains data, metadata, and an ID. |
| Bucket / Container | Logical grouping of objects (similar to a folder but flat). |
| Metadata | Custom data describing the object (e.g., file type, owner, creation date). |
| Object ID / Key | A unique key to retrieve the object (like a URL). |
| API Access | Objects are accessed via RESTful APIs — typically PUT, GET, DELETE. |
| Flat Namespace | No directory hierarchy — simplifies scaling and searching. |
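Object access really is just PUT/GET/DELETE over HTTP. A small sketch using boto3's S3 client — it assumes AWS credentials are configured, and the bucket name is a hypothetical example:

```python
# Basic object-storage access via the S3 API using boto3.
# Assumes configured AWS credentials; "my-app-assets" is a made-up bucket.
import boto3

s3 = boto3.client("s3")
bucket, key = "my-app-assets", "images/logo.png"

# PUT: upload an object (data + key; metadata is optional)
s3.put_object(Bucket=bucket, Key=key, Body=b"...binary data...",
              Metadata={"owner": "web-team"})

# GET: retrieve the object by its unique key
body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

# DELETE: remove the object
s3.delete_object(Bucket=bucket, Key=key)
```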
| Platform | Provider | Highlights |
|---|---|---|
| Amazon S3 (Simple Storage Service) | AWS | Industry standard, supports versioning, lifecycle policies |
| Google Cloud Storage (GCS) | Google Cloud | Multi-regional replication, strong consistency |
| Azure Blob Storage | Microsoft Azure | Tight integration with Azure ecosystem |
| MinIO | Open Source | S3-compatible, lightweight, deployable anywhere |
| Ceph Object Gateway | Open Source | Scalable and fault-tolerant for on-prem setups |
| DigitalOcean Spaces / Backblaze B2 | Cloud Providers | Cost-effective and S3-compatible |
| Use Case | Description |
|---|---|
| Backup & Archival | Durable, low-cost storage for backups and historical data. |
| Media Storage | Store large, unstructured assets like images, videos, audio files. |
| Big Data & Analytics | Store raw data for processing with tools like Spark or Presto. |
| Static Website Hosting | Serve HTML, CSS, JS directly via public URLs (e.g., AWS S3 static hosting). |
| Machine Learning Data Lakes | Centralized object storage for training data and model artifacts. |
| Application Data | Store logs, user-generated content, and configuration data. |
| Factor | Advantages | Trade-offs |
|---|---|---|
| Scalability | Infinitely scalable with flat namespace | Slower than local/block storage |
| Durability | Data replicated across regions | Higher latency for frequent small reads |
| Cost | Low cost per GB | Egress (data out) costs can add up |
| Access Pattern | Ideal for large, infrequent reads/writes | Not suitable for high IOPS workloads (e.g., databases) |
| Consistency | Most systems now offer strong consistency | May vary between providers |
| Integration | Easy API-based access | Limited local filesystem compatibility |
| Feature | Object Storage |
|---|---|
| Data Structure | Objects with metadata and unique IDs |
| Access Method | RESTful APIs (HTTP/S3-compatible) |
| Scalability | Virtually unlimited |
| Best For | Unstructured data (media, backups, analytics) |
| Example Platforms | AWS S3, GCS, Azure Blob, MinIO |
| Main Trade-off | High scalability vs higher latency |
A File System is the method and structure an operating system uses to store, organize, and manage files on storage devices (like HDDs, SSDs). It defines how data is named, stored, retrieved, and organized into files and directories.
Common examples: NTFS, ext4, HFS+, FAT32.
| Feature | Description |
|---|---|
| Hierarchical Structure | Organizes files in directories and subdirectories. |
| Metadata Management | Stores attributes like size, permissions, timestamps. |
| Access Control | Supports permissions (read/write/execute) for users and groups. |
| Mounting | Each file system is mounted on a local device or partition. |
| Consistency | Uses journaling or logs to maintain integrity after crashes. |
| Performance | Optimized for single-node storage access. |
✅ Traditional file systems are ideal for local storage and single-server workloads.
A Distributed File System (DFS) allows files to be stored across multiple servers or nodes but appear to users as a single unified file system. It enables data sharing, scalability, and fault tolerance across large-scale systems.
Examples:
- Google File System (GFS)
- Hadoop Distributed File System (HDFS)
- CephFS
- GlusterFS
Core Components:
- Name Node / Metadata Server – Tracks file locations, directories, and metadata.
- Data Nodes / Storage Nodes – Store actual file blocks or chunks.
- Client – Interacts with the DFS, reading/writing files using DFS APIs.
Workflow Example (HDFS-like):
- A file is split into chunks (e.g., 128MB).
- Each chunk is replicated (usually 3 copies) across different nodes.
- The Name Node maps files → chunks → node locations.
- When a node fails, replicas are used to reconstruct the missing data.
Replication Benefits:
- Ensures fault tolerance
- Enables load balancing for reads
- Improves data locality (clients can read from the nearest node)
| Property | Description |
|---|---|
| Horizontal Scalability | Add more nodes to increase capacity and throughput. |
| Data Replication | Redundant copies maintain availability despite node failures. |
| Automatic Recovery | Failed nodes or lost chunks are detected and rebuilt automatically. |
| High Throughput | Parallel access to distributed blocks boosts read/write speed. |
| Fault Isolation | Node-level failures don’t impact the entire file system. |
Trade-off: DFS introduces complexity in metadata management and network latency, especially during coordination between nodes.
| Concept | Traditional File System | Distributed File System |
|---|---|---|
| Scope | Single machine | Multiple machines/nodes |
| Scalability | Limited | High (horizontal scaling) |
| Fault Tolerance | Low | High (replication, recovery) |
| Examples | NTFS, ext4 | HDFS, GFS, CephFS |
| Access | Local | Network-based |
| Use Case | Personal systems, small apps | Big Data, Cloud Storage, Large-scale apps |
Modern systems rely on different types of storage architectures — Block, File, and Object — each optimized for specific workloads, scalability levels, and access patterns. Understanding their differences helps in choosing the right storage type for databases, applications, and large-scale systems.
Definition: Block storage splits data into fixed-size blocks and stores them separately, each with a unique identifier. The OS assembles these blocks when reading or writing data.
Common Use: Databases, virtual machines, high-performance workloads.
Examples: Amazon EBS, iSCSI, SAN, NVMe storage.
Characteristics:
- Acts like a raw disk to the OS.
- High IOPS and low latency.
- Managed at the block level by the system or application.
Definition: File storage organizes data into files and directories in a hierarchical structure. It’s the most traditional and user-friendly storage model.
Common Use: Shared drives, user directories, content management.
Examples: NFS, SMB, ext4, NTFS.
Characteristics:
- Accessed via file paths (e.g., /home/data/report.pdf).
- Simple to manage but limited in scalability.
- Good for small to medium-sized systems.
Definition: Object storage manages data as objects with metadata and unique IDs, in a flat namespace. It’s ideal for massive unstructured data and accessed via APIs (HTTP/S3).
Common Use: Cloud storage, backups, media, analytics.
Examples: AWS S3, Google Cloud Storage, Azure Blob, MinIO.
Characteristics:
- Highly scalable and durable.
- Optimized for sequential reads/writes.
- API-based access (not a file system).
| Feature | Block Storage | File Storage | Object Storage |
|---|---|---|---|
| Data Structure | Blocks | Files & Folders | Objects with metadata |
| Access Method | Low-level, via OS | File path | REST APIs (HTTP/S3) |
| Performance | Very high | Moderate | High for large objects |
| Scalability | Limited | Medium | Virtually unlimited |
| Use Case | Databases, VMs | File sharing, small apps | Backups, media, data lakes |
| Consistency | Strong | Strong | Eventually / Strong (depends) |
| Cost | High | Medium | Low per GB |
| Latency | Low | Medium | Higher (network overhead) |
| Metadata | Minimal | File attributes | Rich, customizable |
| Examples | AWS EBS, SAN | NFS, SMB | AWS S3, GCS |
| Scenario | Best Storage Type | Reason |
|---|---|---|
| Database (SQL, NoSQL) | Block | Low-latency, high IOPS required |
| Application File Sharing | File | Simple file/directory structure |
| Backup / Archival / Logs | Object | Cost-effective, durable |
| Big Data / Analytics | Object | Scalable, accessible via APIs |
| Virtual Machine Storage | Block | Direct disk-level access |
| Content Delivery (Media, Assets) | Object | Ideal for large unstructured data |
- Block Storage: High performance, low latency, ideal for structured, transactional workloads.
- File Storage: Simple and familiar for users, good for collaborative or mid-scale apps.
- Object Storage: Highly scalable, cost-efficient, best for unstructured or cloud-native workloads.
Big Data refers to data sets that are too large, fast, or complex for traditional data processing systems to handle efficiently. It involves storing, processing, and analyzing vast amounts of structured, semi-structured, and unstructured data to extract insights and drive decisions.
Examples:
- Social media feeds
- IoT sensor data
- Financial transactions
- Clickstream logs
Big Data is not just about size — it’s about volume, velocity, and variety of data, and how we manage it effectively.
| V | Meaning | Description |
|---|---|---|
| Volume | Amount of data | Terabytes to petabytes generated daily. |
| Velocity | Speed of data generation | Real-time streams from devices, sensors, apps. |
| Variety | Types of data | Structured (SQL), semi-structured (JSON), unstructured (images, videos). |
| Veracity | Data quality | Ensuring accuracy and reliability of massive, noisy data. |
| Value | Business relevance | Extracting actionable insights from raw data. |
| Variability | Data inconsistency | Fluctuating data rates and formats across sources. |
Traditional relational databases (RDBMS) and file systems struggle because:
- Vertical scaling limits — adding CPU/RAM is costly and finite.
- Rigid schema — can’t handle unstructured or evolving data formats.
- Performance degradation — queries slow down as data grows.
- Single-node design — can’t distribute processing efficiently.
- Storage bottlenecks — limited by local disks and file system hierarchy.
✅ Hence, Big Data systems use distributed storage (like HDFS, S3) and parallel processing frameworks (like Spark, Hadoop).
| Workload | Description | Tools |
|---|---|---|
| Data Ingestion | Collecting data from various sources | Kafka, Flume, NiFi |
| Batch Processing | Processing large datasets at intervals | Hadoop MapReduce, Spark |
| Stream Processing | Real-time event handling | Kafka Streams, Flink, Spark Streaming |
| Data Storage | Distributed and scalable storage | HDFS, S3, Cassandra |
| Data Querying | Query large datasets interactively | Presto, Hive, BigQuery |
| Analytics & ML | Insights, dashboards, model training | Spark MLlib, TensorFlow |
| Feature | Batch Processing | Stream Processing |
|---|---|---|
| Data Type | Historical / accumulated | Real-time / continuous |
| Latency | Minutes to hours | Milliseconds to seconds |
| Tools | Hadoop, Spark | Kafka Streams, Flink |
| Use Case | Monthly sales report | Fraud detection, live analytics |
| Processing Mode | Process data in chunks | Process data event-by-event |
✅ Hybrid systems (like Spark Structured Streaming) combine both — offering real-time insights with historical context.
Big Data represents the shift from single-server systems to distributed architectures, built for scalability, speed, and flexibility. It powers analytics, AI, and decision-making in modern large-scale applications.
System performance refers to how efficiently a system handles requests, processes data, and responds to users under different workloads. In system design, performance ensures that applications remain fast, stable, and reliable as they scale.
Performance measures how well a system meets expected speed, efficiency, and reliability goals. Key areas include:
- Response speed
- Ability to handle load
- Resource utilization
- Consistency under peak traffic
- Latency: Time taken to process a single request (speed per request).
- Throughput: Number of requests the system can process per second (overall capacity).
High throughput does not always mean low latency.
- Scalability: Ability of a system to handle increased load by adding more resources.
- Responsiveness: How quickly a system reacts to user requests.
A system can scale well but still be slow if poorly designed; both must be balanced.
- SLA (Service Level Agreement): a formal contract defining expected performance guarantees (e.g., 99.9% uptime).
- SLO (Service Level Objective): an internal target or goal that the team aims to meet (e.g., 200ms response time).
- SLI (Service Level Indicator): an actual measured metric from the system (e.g., current availability = 99.93%).
Percentiles show how performance varies across many requests. Common examples:
- P50: Median response time
- P90 / P95: 90% / 95% of requests complete faster than this value
- P99: Worst 1% — critical for user experience
Percentiles reveal tail latency, not visible in averages.
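A quick way to see why percentiles matter is to compute them from raw latencies with the standard library — one slow outlier barely moves the average but dominates P99:

```python
# P50/P95/P99 from raw request latencies: tail latency that averages hide.
import statistics

latencies_ms = [12, 15, 11, 14, 13, 18, 16, 250, 12, 17]  # one slow outlier
cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"avg={statistics.mean(latencies_ms):.1f}ms "
      f"P50={p50:.1f}ms P95={p95:.1f}ms P99={p99:.1f}ms")
```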
Why performance matters:
- Users expect instant responses.
- Directly affects engagement, retention, conversions.
- Essential for real-time systems like trading, gaming, and messaging.
- Impacts cloud cost and infrastructure planning.
- Determines reliability during peak traffic.
Types of performance tests include:
- Load testing — normal expected load
- Stress testing — extreme load
- Spike testing — sudden traffic bursts
- Endurance testing — long-running workloads
Goal: identify bottlenecks before production.
Performance monitoring helps track system health in real time using metrics such as:
- CPU, memory, disk I/O
- Request latency
- Error rates
- Throughput
- Network usage
Tools: Prometheus, Grafana, Datadog, New Relic.
Monitoring ensures early detection of issues and stable system behavior.
Caching improves performance by storing frequently accessed data closer to the user or application, reducing:
- Latency
- Load on databases
- Network round trips
- Overall infrastructure cost
It helps systems scale efficiently while delivering faster responses.
- Client-side caching (browser, mobile app)
- CDN caching (edge caching of static assets)
- Application-level caching (in-memory like Redis, Memcached)
- Database caching (query caching, materialized views)
- Distributed caching (shared cache across multiple servers)
- Read-through: cache sits in front of the DB; data is fetched and stored automatically.
- Write-through: writes go to cache and DB simultaneously.
- Write-back / Write-behind: writes go to cache first; the DB is updated asynchronously.
- Cache-aside (lazy loading): the application checks the cache first; on a miss it fetches from the DB and stores the result (see the sketch below).
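As referenced above, a minimal cache-aside sketch with redis-py — it assumes a local Redis server, and fetch_user_from_db is a hypothetical database call:

```python
# Cache-aside (lazy loading): check cache first; on a miss, read from the
# DB and populate the cache with a TTL. Assumes a local Redis server.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def fetch_user_from_db(user_id: int) -> dict:
    return {"id": user_id, "name": "Asha"}    # placeholder for a real query

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:                    # cache hit
        return json.loads(cached)
    user = fetch_user_from_db(user_id)        # cache miss → go to the DB
    r.setex(key, 300, json.dumps(user))       # store with a 5-minute TTL
    return user
```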
Eviction policies decide what to remove when the cache is full:
- LRU (Least Recently Used)
- LFU (Least Frequently Used)
- FIFO (First In First Out)
- Random eviction
- TTL-based eviction (expire after time)
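LRU, the most common of these policies, fits in a few lines — Python's OrderedDict keeps keys in access order, so eviction is just popping the oldest entry:

```python
# Tiny LRU cache: the least recently used key is evicted when full.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```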
Redis is an in-memory, distributed key-value store known for high speed. Key features:
- Millisecond-level latency
- Supports data structures (strings, lists, sets, hashes, sorted sets)
- Pub/sub messaging
- Persistence modes (RDB, AOF)
- TTL support for key expiration
- Widely used for caching, rate limiting, session storage, leaderboards
- Session caching (login sessions)
- Feed caching (social media timelines)
- Product catalog caching (e-commerce)
- API response caching
- Database query caching
- Leaderboard & ranking caching (gaming)
- Rate limiting counters
- Geolocation & configuration data caching
Asynchronous messaging helps systems communicate without waiting for immediate responses. It provides:
- Decoupling between services (producer doesn’t need to know consumer details)
- Improved performance by offloading heavy tasks
- Better reliability via retry and persistent queues
- Smoother traffic handling during spikes
- Scalability by processing messages at controlled pace
- Producer: Sends messages to a queue or topic.
- Consumer: Reads and processes messages.
- Queue: Stores messages in order until consumed.
- Topic: Broadcasts messages to multiple subscribers (pub/sub).
- Message: Unit of data transmitted between services.
- Broker: Middleware that routes, stores, and delivers messages.
- Ack / Nack: Consumers acknowledge or reject messages.
- Offsets: Position markers in message streams (Kafka-style).
Client → Producer → Message Broker → Consumer → Database/Service
Key property: Producers and consumers scale independently, improving resilience and throughput.
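This flow can be demonstrated with nothing more than a thread-safe queue. A minimal sketch — a real system would use a broker like RabbitMQ or Kafka instead of an in-process queue, but the decoupling is the same: the producer enqueues and moves on without waiting:

```python
# Producer/consumer decoupling with a thread-safe in-process queue.
import queue
import threading

tasks: queue.Queue = queue.Queue()

def consumer():
    while True:
        msg = tasks.get()          # blocks until a message arrives
        if msg is None:            # sentinel → shut down
            break
        print("processing", msg)
        tasks.task_done()

worker = threading.Thread(target=consumer, daemon=True)
worker.start()

for i in range(3):                 # producer: enqueue and move on
    tasks.put({"order_id": i})
tasks.put(None)                    # tell the worker to stop
worker.join()
```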
Use queues when you need:
- To process tasks asynchronously
- To handle burst traffic smoothly
- To perform long-running or heavy operations
- To decouple microservices
- To ensure reliable delivery
- To enable event-driven architectures
- To avoid blocking APIs (e.g., uploading video, generating reports)
RabbitMQ:
- Type: Message Queue (AMQP)
- Focus: Reliable message delivery
- Strengths:
  - Complex routing (exchanges)
  - Suitable for job queues and task distribution
  - Good for small to medium throughput
- Typical Use Cases:
  - Background jobs
  - Notification services
  - Task workers
Kafka:
- Type: Distributed log + event streaming platform
- Focus: High throughput + real-time streaming
- Strengths:
  - Persistent message log
  - Horizontal scalability
  - Millions of messages per second
- Typical Use Cases:
  - Event-driven systems
  - Log aggregation
  - Real-time analytics
  - Stream processing
- At most once: message is delivered at most once; it may be lost.
- At least once: message is retried until acknowledged; it may be processed twice.
- Exactly once: no duplicates; hard to achieve and requires idempotent consumers.
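Because at-least-once is the common default, consumers are usually made idempotent. A minimal sketch — in production the processed-ID set would live in Redis or a database, not in memory:

```python
# Idempotent consumer for at-least-once delivery: a processed-ID set
# makes redelivered duplicates harmless.
processed_ids: set[str] = set()

def handle(message: dict) -> None:
    msg_id = message["id"]
    if msg_id in processed_ids:        # duplicate redelivery → skip
        return
    print("charging order", message["order_id"])
    processed_ids.add(msg_id)          # record only after successful work

handle({"id": "m-1", "order_id": 7})
handle({"id": "m-1", "order_id": 7})   # retried message, processed once
```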
Common messaging use cases:
- Order processing pipelines
- Email/SMS notification systems
- Payment processing
- Log aggregation and analytics
- Video/image processing workflows
- IoT event streams
- Microservices communication
- Chat systems
- Real-time monitoring & metrics pipelines
Messaging best practices:
- Keep messages small and self-contained
- Ensure idempotency in consumers
- Use dead-letter queues (DLQ) for failed messages
- Set proper retry policies
- Monitor queue length, lag, and consumer health
- Partition data wisely (Kafka) for parallelism
- Avoid putting extremely large payloads directly in queues
- Secure brokers with authentication and encryption
- Tune prefetch/consumer concurrency based on workload
Concurrency is the ability of a system to handle multiple tasks at the same time by switching between them. Tasks appear to run simultaneously even if executed on a single CPU core. It’s about managing multiple tasks, not executing them at the same moment.
Parallelism means executing multiple tasks simultaneously using multiple CPU cores. It’s about actual simultaneous execution, not interleaving.
In short:
- Concurrency → dealing with many tasks
- Parallelism → doing many tasks at the same time
Process:
- Independent execution unit
- Has its own memory space
- Heavyweight
Thread:
- Lightweight execution unit inside a process
- Shares memory with other threads
- Communication is easier but needs synchronization
Instead of creating a thread per request, systems use a thread pool:
- Pre-created threads wait for tasks
- Tasks are distributed to available workers
- Improves performance, avoids overhead, prevents crashing under load
Used in: Node.js worker threads, Java ExecutorService, Python concurrent.futures, Nginx worker model.
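Since Python's concurrent.futures is named above, here is a small runnable example: three pooled workers drain five tasks instead of a thread being spawned per task:

```python
# Thread pool with concurrent.futures: a fixed set of pre-created workers
# drains a backlog of I/O-bound tasks.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url: str) -> int:
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status

urls = ["https://example.com"] * 5
with ThreadPoolExecutor(max_workers=3) as pool:   # 3 workers, 5 tasks
    for status in pool.map(fetch, urls):          # extra tasks queue up
        print(status)
```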
Async systems free the thread while waiting for I/O operations like:
- Network calls
- Database operations
- File reads
This improves scalability because threads do not block unnecessarily.
Examples:
- Node.js async I/O
- Python async/await
- Java CompletableFuture
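A small asyncio sketch of the same idea: five simulated I/O calls overlap on one thread, so the total time is roughly one call, not five:

```python
# Async I/O: while one "request" awaits its (simulated) network call,
# the event loop runs the others — no thread sits blocked.
import asyncio

async def handle_request(n: int) -> str:
    await asyncio.sleep(0.1)          # stands in for a DB or network call
    return f"response {n}"

async def main():
    results = await asyncio.gather(*(handle_request(i) for i in range(5)))
    print(results)                    # ~0.1s total, not 0.5s

asyncio.run(main())
```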
Different web servers handle concurrency differently:
- Thread-per-request model (Tomcat, Spring): each request gets its own thread.
- Event-driven model (Node.js, Nginx): one thread handles many requests using async callbacks.
- Hybrid worker model (FastAPI + Uvicorn, Go net/http): mixes concurrency primitives like goroutines, green threads, or event loops.
Common concurrency challenges:
- Race conditions (multiple threads modifying shared data)
- Deadlocks (threads waiting on each other forever)
- Starvation (a task never gets resources)
- Shared memory corruption
- Too many threads causing context-switch overhead
- Improper locking reducing performance
Concurrency best practices (a race-condition sketch follows this list):
- Use locks only when necessary
- Prefer immutable data structures
- Use thread pools, not unlimited threads
- Keep tasks small and stateless
- Avoid sharing data unnecessarily
- Use idempotent consumers in distributed systems
- Apply timeouts and circuit breakers
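As mentioned before the list, here is the classic race condition and its fix: four threads increment a shared counter, and the lock is what makes the final total correct:

```python
# Race condition demo: without the lock, two threads can read the same
# counter value and lose an increment.
import threading

counter = 0
lock = threading.Lock()

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        with lock:                 # remove this and the total may come up short
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                     # always 400000 with the lock held
```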
Real-world examples:
- Web servers handling thousands of concurrent users
- Workers processing tasks from message queues
- Database connection pools shared across threads
- Kafka consumers running in parallel partitions
- Go goroutines serving microservices at scale
Replication means copying data across multiple database servers.
- Improves read performance (read from replicas)
- Provides high availability
- Enables failover during primary DB failure
- Leader–Follower (Master–Slave)
- Multi-leader
- Leaderless (like DynamoDB)
Sharding splits large datasets across multiple servers to improve scalability.
- Range-based: Based on value ranges (e.g., dates, IDs)
- Hash-based: Hash function distributes keys uniformly
- Geo/Zone-based: Partition by region
- Directory-based: Metadata service decides shard placement
- Distributes load
- Reduces hot spots
- Allows horizontal scaling
In distributed systems, you can choose only two at a time:
- Consistency: All nodes see the same data
- Availability: System responds even if some nodes fail
- Partition Tolerance: System continues during network splits
- CP: Prioritize consistency (e.g., HBase)
- AP: Prioritize availability (e.g., Cassandra)
- CA: Rare in distributed systems
Indexes improve query speed by avoiding full table scans.
- B-Tree Index: General queries, range queries
- Hash Index: Fast equality lookups
- Full-Text Index: Searching text fields
- Composite Index: Multi-column queries
- Inverted Index: Used in search engines
- WHERE conditions
- JOIN performance
- Sorting (ORDER BY)
- Avoid full scans on large tables
Normalization: organizing data into relational tables to reduce redundancy. Use When:
- Data integrity is critical
- Write-heavy systems
- Avoid anomalies (update, insert, delete)
Denormalization: combining tables for faster reads. Use When:
- Read-heavy workloads
- Analytics, dashboards
- Need fewer JOINs in high-traffic systems
- Caching aggregated data
Connection Pooling: reuses existing DB connections instead of opening new ones. Benefits:
- Reduces connection overhead
- Improves throughput
- Prevents DB overload
Query optimization tips:
- Avoid SELECT *
- Use proper indexes
- Rewrite inefficient JOINs
- Limit result size
- Analyze slow queries (EXPLAIN, query planner)
Materialized Views: precomputed, stored query results. Use Case:
- Analytics and dashboards
- Expensive aggregations (SUM, COUNT, GROUP BY)
- Reducing load in read-heavy systems
Batch Operations: send multiple operations in one request (insert/update). Benefits:
- Fewer network calls
- Better throughput
Use Case:
- Logging events
- Bulk inserts
Pagination: load results in chunks. Benefits:
- Avoids loading millions of rows
- Reduces memory usage
Techniques:
- OFFSET/LIMIT
- Keyset pagination (better for large datasets)
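A minimal sqlite3 sketch contrasting the two: keyset pagination filters on the last seen id, so the database never re-scans the skipped rows the way OFFSET does:

```python
# Keyset pagination: each page starts after the last id seen, instead of
# OFFSET re-scanning skipped rows.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO users (name) VALUES (?)",
               [(f"user{i}",) for i in range(1, 26)])

def page(after_id: int = 0, size: int = 10):
    return db.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, size),
    ).fetchall()

first = page()                        # rows 1–10
second = page(after_id=first[-1][0])  # rows 11–20, no OFFSET scan
print(second[0], second[-1])
```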
Reliability determines whether a system consistently works as users expect. In modern systems:
- Users demand near-zero downtime.
- Businesses lose money for every second of failure (Amazon reportedly loses millions per minute of downtime).
- Reputation, trust, and customer retention heavily depend on reliability.
- Many systems (payments, healthcare, transportation) are mission critical.
High reliability = better user experience, lower maintenance cost, and stable operations.
System reliability is the ability of a system to function correctly over time, without failures.
It focuses on:
- Availability (Is the system up and reachable?)
- Durability (Does the system protect data?)
- Fault tolerance (Can it survive failures?)
- Recoverability (How fast can it come back?)
In short:
👉 A reliable system keeps running even when things break.
Reliability is measured by two important metrics:
MTBF (Mean Time Between Failures):
- Measures time between two failures.
- Higher MTBF → More reliable system.
Example: MTBF = 1000 hours → on average, the system fails once every 1000 hours.
MTTR (Mean Time To Recover):
- Measures how quickly a system recovers after failure.
- Lower MTTR → Faster recovery → Higher availability.
Example: MTTR = 10 minutes → the system comes back online 10 minutes after a failure.
SLA is a promise made by the service provider about its reliability.
Typical SLA Guarantees:
- Uptime/Availability (ex: 99.9%, 99.99%, 99.999%)
- Performance (latency/response times)
- Support response times
- Data durability
Availability levels:
| SLA | Downtime Allowed per Year |
|---|---|
| 99% | ~3.65 days |
| 99.9% | ~8.7 hours |
| 99.99% | ~52 minutes |
| 99.999% | ~5 minutes |
These two terms are often confused:
Availability: the system is accessible and working.
Example: If your database node crashes but a replica takes over immediately → availability is maintained.
Durability: data remains safe and is never lost.
Example: Data is written to 3 replicas → even if one fails, the data is still safe.
| Availability | Durability |
|---|---|
| System uptime | Data permanence |
| Achieved using redundancy, load balancing | Achieved using replication, backups |
| Temporary downtime allowed | Data loss NOT allowed |
Reliability influences major design choices:
- Multi-region architectures
- Replication strategies
- Load balancing
- Failover mechanisms
- Eventual consistency models
- Monitoring + Alerts
- Retry logic, idempotency
A reliable system requires more complexity, cost, and infrastructure.
Distributed systems face unique reliability issues:
- Network failures
- Partial system failures
- Node crashes
- Data inconsistencies (due to CAP theorem)
- Clock skew & distributed time
- Split-brain scenarios
- Difficult debugging
| Challenge | Solution |
|---|---|
| Node failures | Replication, auto-healing |
| Traffic spikes | Autoscaling |
| Inconsistency | Quorums, consensus (Raft, Paxos) |
| Slow components | Timeouts, circuit breakers |
| Routing failures | Load balancers, retries, idempotency |
| Data loss | Write-ahead logs, backups, multi-region writes |
Cloud-native systems rely heavily on:
- Microservices
- Containers (Docker)
- Orchestration (Kubernetes)
- Service meshes
- Autoscaling
- Self-healing infrastructure
Cloud providers offer built-in features:
- Multi-zone deployment
- Auto-restarts
- ReplicaSets & StatefulSets
- Managed databases with failover
- Observability stacks (logs, metrics, traces)
Cloud-native reliability = Automation + Redundancy + Monitoring
| Topic | Key Idea |
|---|---|
| Why reliability matters | Prevent failures, protect business |
| What is system reliability | Consistent correct behavior |
| MTBF | Time between failures |
| MTTR | Time to recover |
| SLA | Promised uptime/performance |
| Availability vs Durability | Uptime vs Data safety |
| Distributed reliability challenges | Network, partial failure, consistency |
| Cloud-native reliability | Self-healing + autoscaling + redundancy |
High Availability means your system stays up and accessible almost all the time—even during failures.
HA is measured using availability %, like:
- 99% → 3.65 days downtime/year
- 99.9% → 8.7 hours downtime
- 99.99% → 52 minutes
- 99.999% → 5 minutes
Goal: Ensure services continue running despite failures by adding redundancy + failover + monitoring.
Fault tolerance means your system keeps working even if a component completely fails.
- A fault-tolerant system is designed so failure of one part does NOT affect the whole system.
- Usually achieved using redundant components, e.g., multiple replicas, multi-AZ, RAID, replication, clustering.
Fault tolerance = Zero downtime during a failure.
Failover is the automatic switching of traffic from a failed component to a healthy component.
Examples:
- If primary DB crashes → read replica becomes primary automatically.
- If a server fails → load balancer sends traffic to other servers.
Failover can be:
- Automatic (most common)
- Manual (rare, used in critical financial systems)
Redundancy = having extra components so that if one fails, others take over.
Types of redundancy in system design:
N+1 Redundancy: You have N working components + 1 spare.
Example:
- 3 servers handle load
- 1 standby server
- If one fails → spare takes over
Used in: web servers, load balancers, power supplies.
Active-Active: All replicas run simultaneously and handle traffic.
- Load balancer distributes load across all nodes.
- If one node fails, remaining nodes automatically take traffic.
Pros: High throughput, continuous availability
Cons: Harder consistency; conflict resolution needed (especially in databases)
Used in:
- Multi-region services
- Redis Cluster
- Cassandra, DynamoDB
- Global load-balanced apps
Active-Passive: One node handles traffic; the other stays idle but ready.
Flow:
- Active node serves all requests
- Passive node replicates data
- If active fails → passive becomes active
Pros: Simple, predictable
Cons: Passive node is underutilized
Example: Primary–replica DB setup
When the system is overloaded or partially failing, it continues working with reduced functionality instead of crashing.
Examples:
- Instagram disables “Explore” feed during heavy load but lets users still scroll home feed.
- Netflix reduces video quality (from 4K → HD → SD).
- A ride-sharing app disables price history or analytics during peak traffic spikes.
Goal: Fail partially → NOT fully.
Here are the most common HA architecture patterns used in big tech:
- Run app in multiple availability zones.
- If AZ goes down → traffic automatically routes to another.
Used by: AWS RDS Multi-AZ, Kubernetes clusters, Netflix.
Client → Load Balancer → Server1, Server2, Server3
If Server2 crashes, LB routes traffic to 1 and 3.
- Primary handles writes
- Multiple replicas handle reads
- If primary fails → auto failover
Used by: MySQL, PostgreSQL, MongoDB replica sets.
Replicate data across:
- multiple nodes
- multiple racks
- multiple data centers
Used by: HDFS, Cassandra, S3.
If a downstream service fails:
- stop sending requests
- return fallback results
- avoid cascading failures
Used by: Netflix Hystrix, Spring Cloud.
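A minimal circuit-breaker sketch — a simplified take on what libraries like Hystrix provide, with example thresholds: after repeated failures the circuit "opens" and calls fail fast with a fallback until a cool-down expires:

```python
# Minimal circuit breaker: trip after N consecutive failures, fail fast
# with a fallback while open, retry after a cool-down (half-open).
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                return fallback()            # open: no request is sent
            self.opened_at = None            # half-open: try one call
        try:
            result = fn()
            self.failures = 0                # success closes the circuit
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time() # trip the breaker
            return fallback()

# Usage (call_downstream is a hypothetical remote call):
# breaker = CircuitBreaker()
# data = breaker.call(call_downstream, fallback=lambda: {"cached": True})
```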
To design a reliable, redundant system:
- Add redundancy at every layer: servers, databases, caches, message brokers.
- Eliminate single points of failure: one failing component should not take down the entire system.
- Load balance at every level: global (GSLB) + local (internal LB).
- Replicate data: in DBs, storage, caches.
- Deploy across multiple regions: to survive entire data center failures.
- Automate failover: automated promotion, heartbeat checks, retries.
A system must detect, recover, and prevent failures automatically.
Tools & techniques:
- Heartbeat signals
- Health check endpoints (/health, /live, /ready)
- CloudWatch, Prometheus, Grafana
- Alerting on latency, CPU, memory, error rates
A self-healing system can fix itself:
- Kubernetes restarts crashed containers automatically.
- Auto-scaling groups replace unhealthy EC2 instances.
- Load balancer removes unhealthy nodes automatically.
- Redis cluster automatically moves shards when nodes die.
| Concept | Meaning |
|---|---|
| High Availability | System stays up most of the time |
| Fault Tolerance | System continues working even after failures |
| Failover | Automatic switch to healthy component |
| N+1 | Extra standby unit |
| Active-Active | All nodes active |
| Active-Passive | One active, one standby |
| Graceful Degradation | Reduced functionality during failures |
| HA Patterns | LB, Multi-AZ, Replication, Circuit breakers |
| Self-Healing | System auto-recovers from failure |
A backup is a copy of your data stored in a separate location so it can be restored if the original is lost or corrupted.
Recovery is the process of restoring data from a backup after failure, corruption, or disaster.
Backups protect against:
- Hardware failures
- Human mistakes (accidental deletion)
- Cyber attacks (ransomware, malware)
- Natural disasters
- Data corruption
- Software bugs
- Cloud region failures
Without backups, data loss = business loss.
Full Backup:
- Complete copy of all data.
- Slow & storage-heavy.
- Best for weekly/monthly backups.
Incremental Backup:
- Backs up only data changed since the last backup.
- Faster, lower storage.
- Recovery is slower: you need all incrementals + the last full backup.
Differential Backup:
- Backs up data changed since the last full backup.
- Larger than incremental but faster recovery.
Continuous Backup:
- Real-time or near-real-time backup.
- Used for financial and trading systems.
- File-level recovery: restore individual files or folders.
- System-level recovery: restore entire servers, OS images, configuration.
- Application-level recovery: restore DBs, queues, caches, or application-level state.
- Disaster recovery: restore the full system in another region or data center.
Disaster recovery often includes:
- Hot site (active-active)
- Warm site (active-passive)
- Cold site (infrastructure available but powered off)
RTO (Recovery Time Objective): “How fast should we recover after failure?”
Examples:
- RTO = 0 sec → Active-active systems
- RTO = 15 min → Critical apps
- RTO = 4–8 hours → Internal tools
Lower RTO = more cost.
RPO (Recovery Point Objective): “How much data loss is acceptable?”
Examples:
- RPO = 0 → Real-time replication
- RPO = 1 hour → Hourly backup
- RPO = 1 day → Daily backup
Lower RPO = more frequent backups = higher cost.
| Strategy | Pros | Cons |
|---|---|---|
| Full | Simple to restore | Expensive, slow |
| Incremental | Fast backup, cheap | Slow restore (many files) |
| Differential | Faster restore | Larger daily backups |
| Continuous | Almost no data loss | High cost, complex |
Key trade-offs:
- Cost vs Frequency
- Backup speed vs Recovery speed
- Storage cost vs RPO/RTO
- Operational complexity vs reliability
Backup best practices:
Follow the 3-2-1 rule:
- 3 copies of data
- 2 different storage types
- 1 copy offsite (e.g., cloud)
Use geo-redundancy: avoid a single point of regional failure.
Test restores regularly: a backup is useless if it cannot be restored.
Automate backups: no manual steps → fewer human errors.
Encrypt backups: both at rest and in transit.
Keep immutable copies: protect against ransomware and accidental overwrites.
Use snapshots:
- EBS snapshots
- RDS snapshots
- VM/Container snapshots
Monitor backup jobs: alerts ensure backup failures are caught early.
| Concept | Meaning |
|---|---|
| Backup | Copy of data stored elsewhere |
| Recovery | Restoring data after failure |
| RTO | How fast to recover |
| RPO | How much data loss acceptable |
| Full/Incremental/Differential | Core backup strategies |
| DR | System-wide recovery plan |
| Best Practices | 3-2-1 rule, automation, encryption, testing |
Disaster Recovery (DR) ensures a system can quickly recover after catastrophic failures such as:
- Data center outages
- Natural disasters
- Cloud region failures
- Cyberattacks or ransomware
- Human operational mistakes
Without DR, systems face extended downtime, data loss, and business failure.
Mission-critical systems (banking, payments, healthcare, e-commerce) require:
- Minimal downtime (low RTO)
- Minimal data loss (low RPO)
- Multi-region architectures
- Continuous replication
- Automated failover
These systems must remain operational even during regional failures.
- Backups protect data (restore capability).
- Failover provides continuous service availability.
Both are required because:
- Backups alone → slow recovery, possible data loss
- Failover alone → corrupted/compromised data may replicate instantly
A resilient system uses:
- Real-time replication (for quick failover)
- Periodic backups (for long-term recovery)
Disaster recovery is reliable only when regularly validated:
- Automate failover/failback scripts
- Run Chaos/DR drills (e.g., shutting down an entire region)
- Test backup restore procedures
- Validate RTO/RPO targets
- Continuously audit recovery readiness
Automated DR reduces human error and speeds up recovery.
Systems spanning multiple regions or continents face:
- High network latency
- Data consistency issues
- Split-brain scenarios
- Complex replication logic
- Different regulatory requirements
- Higher operational cost
Achieving strong consistency across regions is hard due to network unpredictability.
Deploying services/data across multiple regions ensures:
- Region-level fault tolerance
- Availability during disasters
- Faster local access for users
Common patterns:
- Active-active multi-region
- Active-passive warm standby
- Active-cold DR region
Used to maintain consistency in distributed systems:
- A write succeeds only if a majority (quorum) of nodes acknowledge it.
- Prevents split-brain and ensures durability.
- Used by systems like Cassandra, MongoDB, Consul, Zookeeper.
Quorums help balance:
- Consistency (enough nodes must agree)
- Availability (system stays online despite failures)
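The quorum rule is usually written as W + R > N: with N replicas, a write quorum W and read quorum R that sum past N must overlap on at least one node. A tiny sketch of the arithmetic:

```python
# Quorum arithmetic: W + R > N guarantees read/write quorums overlap,
# so a read always sees at least one node with the latest write.
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    return w + r > n          # at least one node is in both quorums

N = 5
print(is_strongly_consistent(N, w=3, r=3))  # True: 3 + 3 > 5
print(is_strongly_consistent(N, w=1, r=1))  # False: fast, but reads may be stale
```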
Security is a non-functional requirement but one of the most critical aspects of scalable system design. A system that scales but is not secure is useless.
Security in system design ensures that applications, servers, networks, and data are protected from adversaries, misuse, and failures.
In distributed systems (microservices, cloud apps), multiple components interact across networks — making the attack surface larger. Security must be integrated from day 1 of the architecture, not patched later.
Security is important because:
- Sensitive data must be protected: user data, financial data, PII, tokens, passwords.
- Attacks are expensive: DDoS, data breaches, and fraud can cost millions.
- Trust is at stake: users expect their data to be safe.
- Availability is at risk: attacks like DDoS and ransomware can shut down systems.
- Compliance is mandatory: GDPR, HIPAA, SOC2, PCI-DSS rules must be followed.
A distributed system has:
- Multiple nodes
- Multiple communication channels
- Multiple storage layers
- Shared responsibilities
This makes it vulnerable to:
- Network attacks
- Service impersonation
- Traffic interception
- Data corruption
- Unauthorized access
Security in distributed systems = securing every layer:
- Client layer
- API layer
- Network layer
- Microservices
- Databases
- Caches
- Queues
- Storage
- Infrastructure
Security is built on the CIA Triad:
- Confidentiality: ensure data is accessible only to authorized users. Techniques: encryption, access control, TLS.
- Integrity: ensure data isn’t tampered with. Techniques: hashing, checksums, digital signatures.
- Availability: ensure the system is always up. Techniques: redundancy, rate limiting, DDoS protection.
Together, these form the foundation of secure architecture.
Threat modeling helps predict:
- Who can attack?
- Why will they attack?
- How will they attack?
Use the STRIDE model:
| Threat Type | Stands For | Example |
|---|---|---|
| S | Spoofing | Fake login, impersonation |
| T | Tampering | Modify API request |
| R | Repudiation | Deny performing an action |
| I | Information Disclosure | Unauthorized access |
| D | Denial of Service | DDoS attack |
| E | Elevation of Privilege | Normal user acting as admin |
This helps design defenses early.
Attack vectors = ways attackers enter.
- Weak authentication
- Exposed APIs
- Insecure cookies
- Public S3 buckets
- Misconfigured CORS
- Outdated libraries
- SQL injection points
- No rate-limiting
- Public admin panels
- DDoS: goal is to exhaust your server resources. Solution: rate limiting, CDN, WAF, load balancer.
- Man-in-the-Middle (MITM): attacker intercepts data in transit. Solution: HTTPS + TLS.
- Injection (SQL/XSS): attacker injects malicious code. Solution: validation, ORM, sanitization.
- Spoofing: fake identity, fake IP, fake tokens. Solution: strong auth, signatures.
Security should be part of the dev cycle:
- Requirements: define security needs
- Design: threat modeling, review
- Implementation: secure coding
- Testing: penetration tests, static analysis
- Deployment: infra hardening
- Monitoring: alerts, logs
Tools: SonarQube, Snyk, OWASP ZAP.
- Use HTTPS everywhere
- Validate every input
- Enforce least privilege
- Never store passwords in plain text
- Rotate keys/tokens
- Use secure headers (CSP, HSTS)
- Enable MFA for admin access
Auth is core to system security.
- Authentication = "Who are you?"
- Authorization = "What can you do?"
- Username + Password
- OTP / MFA
- OAuth 2.0
- JWT
- API Keys
- Refresh Tokens
- Session-based auth: stored in server memory/Redis. Great for traditional web apps.
- Token-based auth (JWT): stored on the client side, stateless. Great for microservices & mobile apps.
- RBAC – Role Based
- ABAC – Attribute Based
- PBAC – Policy Based
- MAC/DAC – Mandatory/Discretionary control
- SSO = One login for many services
- SAML, OAuth2, OpenID Connect
Used by Google Login, GitHub Login.
Data security aims to prevent:
- Data breaches
- Unauthorized access
- Insider threats
- Fraud
Two types:
- Symmetric (AES)
- Asymmetric (RSA)
- At rest = Encrypt databases, files
- In transit = HTTPS, TLS
Never store raw passwords.
Use:
- bcrypt
- Argon2
- PBKDF2
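A minimal sketch with the bcrypt package (pip install bcrypt): the salt is generated and embedded in the hash automatically, and verification recomputes the hash from the submitted password:

```python
# Password hashing with bcrypt: store only the salted hash, never the
# raw password.
import bcrypt

password = b"s3cret-password"
hashed = bcrypt.hashpw(password, bcrypt.gensalt())   # salt embedded in hash

# At login time, compare the submitted password against the stored hash:
print(bcrypt.checkpw(b"s3cret-password", hashed))    # True
print(bcrypt.checkpw(b"wrong-guess", hashed))        # False
```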
PKI (Public Key Infrastructure): allows secure communication using digital certificates.
Used by HTTPS, API signing, JWT validation.
API security best practices:
- Sign requests (HMAC); see the sketch after this list
- Require tokens
- Validate input/output
- CORS restrictions
- Use API gateways
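As referenced in the list above, a request-signing sketch using only the standard library. The shared secret is a hypothetical example; in practice it is distributed out of band and never sent on the wire:

```python
# HMAC request signing: client and server share a secret; the server
# recomputes the signature over the body to detect tampering.
import hashlib
import hmac

SECRET = b"shared-secret-key"      # example only; distribute out of band

def sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

body = b'{"amount": 100, "to": "acct-42"}'
signature = sign(body)             # client sends body + signature header

# Server side: constant-time comparison prevents timing attacks
print(hmac.compare_digest(signature, sign(body)))                 # True
print(hmac.compare_digest(signature, sign(b'{"amount": 9999}')))  # False
```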
A system is only as strong as its weakest network point.
- Firewall: Blocks suspicious requests
- Reverse Proxy: Protects backend servers (e.g., Nginx, Envoy)
Rate limiting is used to prevent:
- DDoS
- Abuse
- Bots
- API overload
Tools: NGINX, API Gateway, Cloudflare.
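A minimal fixed-window rate limiter sketch with redis-py — it assumes a local Redis, and the limit and window size are example values (production systems often prefer sliding-window or token-bucket variants):

```python
# Fixed-window rate limiting: INCR a per-client counter per time window
# and reject once it passes the limit. Assumes a local Redis server.
import time
import redis

r = redis.Redis(host="localhost", port=6379)
LIMIT = 100                                    # requests per window
WINDOW = 60                                    # window size in seconds

def allow(client_id: str) -> bool:
    key = f"rate:{client_id}:{int(time.time() // WINDOW)}"
    count = r.incr(key)                        # atomic per-window counter
    if count == 1:
        r.expire(key, WINDOW)                  # window cleans itself up
    return count <= LIMIT
```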
Network segmentation separates network layers:
- Public → Load Balancer → App → DB
- Private VPC
- Subnets
- No DB exposure to internet
"Trust nothing, verify everything."
- Every request must be authenticated
- Every device verified
- No implicit trust inside network
Cloud security building blocks:
- IAM roles
- VPC security groups
- Secret managers
- Monitoring (CloudWatch, Datadog)
Container security:
- Use minimal images
- No root user
- Scan images
Serverless security:
- Limit execution role
- Validate input
- Set timeout
- Use VPC
Key challenges:
- Many services → many attack points
- Message queues
- Internal APIs
Solutions:
- Mutual TLS
- API Gateway
- Service mesh (Istio/Linkerd)
- Isolation per service
OWASP Top 10 risks include:
- Injection
- Broken Auth
- Sensitive Data Exposure
- XXE
- Access Control issues
- Security misconfig
- XSS
- Insecure Deserialization
- Using vulnerable libraries
- Insufficient logging/monitoring
Security is not a feature — it’s a mindset. Design secure systems by integrating security:
- At architecture level
- At code level
- At infra level
- During operations