vivek-panchal/System-Design-LLD-HLD
Fundamentals of System Design

YouTube Playlist Link: https://youtube.com/playlist?list=PLNzPfhXM567Rv1GAnHgVslVfTsjBstuph&si=R3TXcs0F_IW0vclS


What is System Design?

Answer:

System Design is the process of defining the architecture, components, and data flow of a system to meet both functional and non-functional requirements (like scalability, reliability, and performance).

In simple terms, it’s about creating the blueprint of a large-scale system — how different parts (APIs, databases, services, caches, etc.) interact to serve millions of users efficiently.

Key Components:

  1. Functional Requirements – What the system should do.
  2. Non-Functional Requirements – How well it should perform (scalability, latency, fault tolerance).
  3. High-Level Design – System architecture, data flow, service interactions.
  4. Low-Level Design – Data models, algorithms, and detailed interactions between components.

Why is System Design Important?

Answer:

System Design is important because it ensures that a system can scale, perform, and remain reliable as it grows. It helps engineers make the right architectural decisions early, preventing costly rework later.

Key Reasons:

  1. Scalability – Handles increased user load efficiently.
  2. Reliability – Ensures the system stays available even if parts fail.
  3. Performance – Maintains low latency and fast response times.
  4. Maintainability – Makes the system easier to update and debug.
  5. Cost Efficiency – Optimizes infrastructure and operational costs.

1. Network & Communication

Why Does Networking Matter in System Design?

Answer:

Networking is the backbone of distributed systems — every service-to-service call, API request, and database query relies on network communication. Understanding networking helps you design systems that are fast, reliable, and fault-tolerant.

Key Reasons:

  1. Service Communication – Enables interaction between microservices, databases, and clients.
  2. Performance Optimization – Helps reduce latency and bandwidth usage.
  3. Scalability – Supports horizontal scaling using load balancers and distributed nodes.
  4. Fault Tolerance – Proper network design avoids single points of failure.
  5. Security – Ensures safe data transfer using encryption and secure protocols (HTTPS, TLS).

How Does Networking Impact Large-Scale Systems?

Answer:

In large-scale systems, networking directly affects performance, scalability, and reliability. As the system grows, data must travel across multiple servers, regions, and services — making network efficiency critical.

Key Impacts:

  1. Latency – Each network call adds delay; optimizing connections reduces response time.
  2. Bandwidth Usage – Inefficient data transfer can slow down or overload the system.
  3. Scalability – Proper network design (load balancers, CDNs, replication) enables horizontal scaling.
  4. Fault Tolerance – Redundant paths and retries help handle node or region failures.
  5. Consistency – Network delays can cause data synchronization issues across distributed nodes.
  6. Cost – Cross-region data transfers and high bandwidth usage increase infrastructure costs.

🌐 Understanding IP Addresses


Introduction to IP Address

Answer:

An IP (Internet Protocol) Address is a unique identifier assigned to every device connected to a network. It allows devices (like servers, clients, routers) to find and communicate with each other over the internet or a local network.

Key Points:

  1. Purpose – Identifies and locates devices in a network.

  2. Format – Can be IPv4 (e.g., 192.168.1.1) or IPv6 (e.g., 2001:0db8::1).

  3. Types:

    • Public IP: Used for communication over the internet.
    • Private IP: Used within internal/local networks.
  4. Static vs Dynamic:

    • Static IP – Manually assigned, doesn’t change.
    • Dynamic IP – Assigned automatically by DHCP, can change over time.

What is IPv4?

Answer:

IPv4 (Internet Protocol version 4) is the fourth version of the Internet Protocol and the most widely used system for identifying devices on a network. It uses a 32-bit address format, allowing around 4.3 billion unique addresses.

Key Points:

  1. Format: Four numbers separated by dots (e.g., 192.168.1.1).
  2. Address Range: 0.0.0.0 to 255.255.255.255.
  3. Address Space: ~4.3 billion addresses (2³²).
  4. Representation: Each part (octet) ranges from 0–255.
  5. Limitation: Due to internet growth, IPv4 addresses are running out — leading to IPv6 adoption.

What is IPv6?

Answer:

IPv6 (Internet Protocol version 6) is the successor to IPv4, developed to overcome IPv4’s address exhaustion. It uses a 128-bit address format, providing an almost unlimited number of unique IP addresses.

Key Points:

  1. Format: Eight groups of four hexadecimal digits separated by colons (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334).
  2. Address Space: 2¹²⁸ possible addresses — enough for every device globally.
  3. Simplified Configuration: Supports auto-configuration without DHCP.
  4. Security: Built-in support for IPsec (encryption and authentication).
  5. Efficiency: Improved routing and reduced network congestion compared to IPv4.

Private vs Public IP Address

Answer:

Private IP addresses are used within local networks (like home, office, or internal company systems), while Public IP addresses are used to communicate over the internet.

| Aspect | Private IP | Public IP |
|---|---|---|
| Scope | Used within a local/private network | Used across the internet |
| Access | Not directly accessible from the internet | Accessible globally |
| Assigned By | Local network devices (e.g., router/DHCP) | Internet Service Provider (ISP) |
| Examples (IPv4) | 10.0.0.0 – 10.255.255.255, 172.16.0.0 – 172.31.255.255, 192.168.0.0 – 192.168.255.255 | Any address not in the private IP ranges |
| Security | More secure, isolated from external access | Exposed to external traffic, needs protection |
| Use Case | Home LANs, corporate intranets, internal servers | Web servers, public APIs, cloud services |

In short: Private IP = for internal communication. Public IP = for external (internet) communication.
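Python's standard `ipaddress` module encodes these reserved ranges, so the private/public distinction can be checked programmatically. A minimal sketch:

```python
import ipaddress

def classify(addr: str) -> str:
    """Return 'private' or 'public' for an IPv4/IPv6 address string."""
    ip = ipaddress.ip_address(addr)
    return "private" if ip.is_private else "public"

print(classify("192.168.1.10"))    # private (RFC 1918 range)
print(classify("142.250.190.78"))  # public
```

`is_private` covers the three IPv4 ranges listed above, as well as private IPv6 space such as `fc00::/7`.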


Why Do We Need Private IPs?

Answer:

Private IPs are essential for enabling internal communication within local networks without consuming public IP addresses. They help build secure, isolated environments for devices to connect and share resources efficiently.

Key Reasons:

  1. Address Conservation: Reduces the need for public IPs, which are limited in IPv4.
  2. Internal Communication: Allows devices (like laptops, printers, and servers) to connect within the same network.
  3. Security: Keeps internal systems hidden from the public internet, reducing attack risks.
  4. Cost Efficiency: No need for every device to have its own public IP.
  5. Network Flexibility: Enables local routing, DHCP usage, and NAT (Network Address Translation).

In short: Private IPs make internal networking secure, scalable, and cost-effective without exposing internal systems to the public web.


The Role of IP in System Design

Answer:

IP (Internet Protocol) plays a fundamental role in system design by enabling communication between different components of a distributed system — servers, databases, load balancers, and clients — across networks.

Key Roles:

  1. Identification: Each device or service in a network is uniquely identified by an IP address.
  2. Routing: IP ensures that data packets reach the correct destination across local or global networks.
  3. Scalability: Allows horizontal scaling by assigning unique IPs to new servers or microservices.
  4. Load Balancing: Load balancers use IPs to route traffic to multiple backend servers.
  5. Security & Isolation: Private IPs separate internal traffic from public exposure, improving security.
  6. High Availability: Enables failover and redundancy through multiple IP-based routes and replicas.

In short: IP addresses are the foundation of connectivity in any large-scale system — making communication, scalability, and reliability possible.


🌐 How DNS Works


Introduction to DNS

Answer:

DNS (Domain Name System) is the internet’s phonebook, translating human-readable domain names (like google.com) into IP addresses (like 142.250.190.78) that computers use to identify each other on a network.

Without DNS, users would have to remember complex IP addresses to access websites.

Key Points:

  1. Purpose: Converts domain names to IP addresses for easy access.
  2. Analogy: Just like saving a contact name instead of remembering a phone number.
  3. Function: When you enter a website URL, DNS finds the correct IP so your browser can connect to the right server.
  4. Hierarchy: DNS operates through a distributed hierarchy of servers — Root, TLD, and Authoritative servers.
  5. Performance: DNS caching helps speed up future lookups and reduce latency.

In short: DNS makes the internet human-friendly by bridging the gap between names and numerical IP addresses.


Types of DNS Servers

Answer:

DNS operates through a hierarchy of servers, each playing a specific role in resolving a domain name to its IP address. The resolution process usually involves multiple servers working together.

Main Types of DNS Servers:

  1. Recursive Resolver:

    • The first server contacted by your browser or device.
    • Responsible for querying other DNS servers to find the correct IP address.
    • Often managed by ISPs or public DNS providers (e.g., Google DNS 8.8.8.8).
  2. Root DNS Server:

    • The starting point of DNS resolution.
    • Directs queries to the correct Top-Level Domain (TLD) server (like .com, .org, .net).
  3. TLD (Top-Level Domain) Server:

    • Manages domain extensions (e.g., .com, .in, .org).
    • Points to the Authoritative DNS Server of the specific domain.
  4. Authoritative DNS Server:

    • The final source of truth for a domain.
    • Holds actual DNS records (like A, AAAA, MX, and CNAME) and provides the corresponding IP address.

In short: The DNS resolution chain works as: Client → Recursive Resolver → Root Server → TLD Server → Authoritative Server → IP Address Returned


DNS Caching and Performance Optimization


Why Caching Matters

Answer:

DNS caching is crucial because it reduces lookup time and improves performance by storing previously resolved domain-to-IP mappings. Instead of querying DNS servers repeatedly, the system can reuse cached results, leading to faster response times and lower network load.

Benefits:

  • Reduces DNS query latency.
  • Minimizes load on upstream DNS servers.
  • Improves user experience with quicker website access.
  • Decreases bandwidth and infrastructure costs.

Where Caching Occurs

Answer:

Caching can happen at multiple layers in the DNS resolution chain:

  1. Browser Cache: Stores recent DNS lookups for a short duration to speed up repeated visits.

  2. Operating System (OS) Cache: The OS maintains a local DNS cache shared across applications.

  3. Recursive Resolver Cache: ISPs or public resolvers (like Google DNS, Cloudflare DNS) cache responses to serve multiple users efficiently.

  4. CDN or Proxy Cache: Content Delivery Networks often cache DNS responses close to users to reduce lookup distance.


What is TTL (Time To Live)

Answer:

TTL (Time To Live) defines how long a DNS record can be stored in a cache before it must be refreshed from the authoritative server. It’s measured in seconds and balances freshness with performance.

Example: If a DNS record has a TTL of 3600, it means the cached record is valid for 1 hour.

Key Insight:

  • Higher TTL: Better performance, slower updates.
  • Lower TTL: Faster updates, more DNS lookups.

The Domain Name Resolution Process — Step-by-step

Answer:

  1. User enters URL / app requests domain: The client (browser or app) starts by asking the OS to resolve the domain name.

  2. Browser cache check: The browser checks its internal DNS cache. If found and not expired, use it and finish.

  3. OS / local resolver cache: If the browser cache misses, the OS DNS cache (or local DNS resolver) is checked next.

  4. Hosts file check: The OS checks the local hosts file for a static mapping. If present, use it.

  5. Query to recursive resolver: If still unresolved, the OS sends a DNS query to the configured recursive resolver (usually ISP or public DNS like 8.8.8.8). The recursive resolver performs the full resolution on behalf of the client.

  6. Recursive resolver cache: The resolver checks its cache. If a cached answer exists and is valid (TTL not expired), it returns the IP to the client.

  7. Root server query (if no cache): If the resolver has no cached answer, it queries a Root DNS server. The Root responds with the location of the TLD server for the domain’s extension (e.g., .com).

  8. TLD server query: The resolver queries the TLD server returned by the Root. The TLD responds with the authoritative name server(s) for the specific domain.

  9. Authoritative server query: The resolver queries the domain’s authoritative DNS server, which returns the final DNS record (A, AAAA, CNAME, etc.) with its TTL.

  10. Resolver returns result to client: The recursive resolver caches the answer (for TTL seconds) and returns the IP address to the client.

  11. Client caches the result: The OS and browser cache the record according to TTL for future requests.

  12. Client connects to server: With the IP known, the client initiates a TCP/TLS connection (e.g., TCP handshake, TLS handshake) to the server and requests the resource (HTTP request).

  13. Subsequent requests: Subsequent lookups follow the same flow but often end at cache layers (browser, OS, resolver) until TTL expiry.

Notes:

  • Caching at multiple layers reduces lookup time and network load.
  • Recursive resolvers can perform iterative queries: they query Root → TLD → Authoritative and assemble the answer.
  • Modern variants include DNS over HTTPS (DoH) and DNS over TLS (DoT), which encrypt DNS queries between client and resolver.
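The Root → TLD → Authoritative chain can be simulated with plain dictionaries standing in for each server level. All server names and the IP below are made up for illustration:

```python
# Toy iterative resolution: each "server" is just a dict lookup.
ROOT = {"com": "tld-com-server"}                              # root zone
TLD = {"tld-com-server": {"example.com": "ns1.example.com"}}  # .com zone
AUTHORITATIVE = {"ns1.example.com": {"example.com": "93.184.216.34"}}

def iterative_resolve(domain: str) -> str:
    tld_label = domain.rsplit(".", 1)[-1]  # e.g. "com"
    tld_server = ROOT[tld_label]           # root points to the TLD server
    ns = TLD[tld_server][domain]           # TLD points to the authoritative NS
    return AUTHORITATIVE[ns][domain]       # authoritative returns the A record

print(iterative_resolve("example.com"))  # 93.184.216.34
```

A real recursive resolver does the same walk over the network, caching each intermediate answer along the way.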

Importance of DNS in Large-Scale Systems

Answer:

DNS plays a critical role in large-scale systems by ensuring smooth routing, scalability, and high availability across distributed infrastructure. It acts as the entry point for almost every user request to reach the right server or service.

Key Reasons:

  1. Traffic Routing: DNS directs user requests to appropriate servers, data centers, or regions based on geography, latency, or load.

  2. Scalability: Enables horizontal scaling by balancing traffic across multiple IPs or service instances.

  3. High Availability: Allows failover by rerouting traffic to healthy servers or backup regions when one fails.

  4. Load Balancing Integration: Works with DNS-based load balancing (like round-robin or geo-DNS) to distribute requests efficiently.

  5. Performance Optimization: DNS caching reduces resolution time, improving user experience for global systems.

  6. Service Discovery: Helps microservices find each other dynamically through DNS-based service registries.

  7. Security and Control: Supports protection via DNS filtering, DNSSEC, and traffic management policies.

In summary: DNS is not just a name-to-IP translator—it’s a strategic component in system design that ensures speed, resilience, and global scalability in modern distributed systems.


🖥️ Client-Server Model


What is the Client-Server Model?

Answer:

The Client-Server Model is a network architecture where multiple clients (users or devices) request and receive services or data from a centralized server. The server hosts, manages, and delivers resources, while the client consumes them.

Key Points:

  1. Client: Sends requests (e.g., browser, mobile app, API consumer).

  2. Server: Processes requests and sends responses (e.g., web server, database server).

  3. Communication: Usually happens over a network using standard protocols like HTTP or TCP/IP.

  4. Scalability: Multiple clients can interact with one or more servers concurrently.

  5. Examples:

    • Web browsing (browser → web server)
    • Mobile apps using backend APIs
    • Database queries (application → DB server)

In short: The client-server model forms the foundation of modern internet architecture, enabling distributed computing and centralized control.


Key Components of the Client-Server Model

Answer:

The Client-Server Model consists of several core components that work together to enable communication, data exchange, and service delivery between clients and servers.

Key Components:

  1. Client: The end-user device or application that sends requests to access data or services (e.g., web browser, mobile app, API consumer).

  2. Server: The centralized system that receives client requests, processes them, and returns responses (e.g., web server, database server, file server).

  3. Network: The medium that connects clients and servers, enabling data transmission over LAN, WAN, or the internet.

  4. Request: The message sent by the client to the server, specifying what data or operation is needed.

  5. Response: The server’s reply to the client, containing the requested data or the result of an operation.

  6. Protocols: The communication rules that define how clients and servers interact — commonly HTTP/HTTPS, TCP/IP, WebSocket, etc.

  7. Database (optional): A backend component where the server stores and retrieves persistent data.

In summary: These components together create a structured system where clients request, servers process, and networks connect, forming the basis of all web and distributed applications.


How Do Client and Server Communicate?

Answer:

Client and server communication happens through a network using well-defined protocols (like HTTP or TCP/IP). The client initiates a request, and the server processes it and sends back a response. This exchange enables everything from web browsing to API calls.


Basic Steps of Communication

  1. Connection Establishment: The client establishes a connection with the server, typically over TCP/IP.

  2. Request Sending: The client sends a request (e.g., an HTTP GET or POST) to the server specifying what it needs.

  3. Processing by Server: The server receives the request, processes it (e.g., fetches data, runs logic), and prepares a response.

  4. Response Sending: The server sends the response (data or status) back to the client.

  5. Connection Termination (or Reuse): The connection is closed, or kept alive for further requests (using persistent connections).
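The five steps above can be sketched with Python's standard `socket` module. The echo logic and port handling here are illustrative, not a production server:

```python
import socket
import threading

server_port = []           # filled in by the server thread
ready = threading.Event()  # signals that the server is listening

def run_server() -> None:
    """Minimal single-request server: accept, read, respond, close."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    srv.listen(1)
    server_port.append(srv.getsockname()[1])
    ready.set()
    conn, _ = srv.accept()             # 1. connection established
    request = conn.recv(1024)          # 2. request received
    conn.sendall(b"echo: " + request)  # 3./4. process and respond
    conn.close()                       # 5. connection terminated
    srv.close()

threading.Thread(target=run_server, daemon=True).start()
ready.wait()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", server_port[0]))  # client initiates the connection
cli.sendall(b"hello")                       # client sends its request
reply = cli.recv(1024)                      # client blocks for the response
cli.close()
print(reply)  # b'echo: hello'
```

A persistent (keep-alive) connection would simply loop over the send/receive steps instead of closing after one exchange.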


Types of Client-Server Communication

  1. Synchronous Communication:

    • The client waits for the server to respond before proceeding.
    • Example: HTTP request from a browser to a web server.
    • Suitable for real-time, request-response systems.
  2. Asynchronous Communication:

    • The client doesn’t wait for an immediate response; communication happens in the background.
    • Example: Message queues (Kafka, RabbitMQ), WebSockets, event-driven systems.
    • Useful for high-performance, decoupled architectures.

In short: Clients and servers communicate by exchanging requests and responses over network protocols — either synchronously for direct interactions or asynchronously for scalable, event-driven systems.


The HTTP Request–Response Cycle

Answer:

The HTTP request–response cycle is the standard flow where a client (browser or app) sends an HTTP request to a server, the server processes it, and returns an HTTP response. It includes optional TCP/TLS handshakes, routing via proxies/load balancers, and caching layers.

Basic Steps (short):

  1. DNS lookup — resolve domain to IP.
  2. TCP handshake — client and server establish a TCP connection (SYN, SYN-ACK, ACK).
  3. TLS handshake (optional) — negotiate encryption (if HTTPS).
  4. HTTP request — client sends request line, headers, optional body.
  5. Server processing — server (and backend services/databases) handle the request.
  6. HTTP response — server sends status line, headers, and body.
  7. Connection close / keep-alive — connection either closed or reused.
  8. Client processes response — render page, cache, or follow redirects.

ASCII Flow

Client
  |
  v
DNS lookup -> TCP handshake -> (TLS handshake if HTTPS)
  |
  v
Send HTTP Request (request-line + headers + body)
  |
  v
[Load Balancer / Reverse Proxy]
  |
  v
Web/App Server ---> Backend services / Database
  |
  v
HTTP Response (status + headers + body)
  |
  v
Client (render / cache / follow-up)
  |
  v
Connection closed or reused (keep-alive)
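The HTTP messages exchanged in this cycle are plain text. The sketch below assembles a minimal GET request and parses a canned response; the sample response bytes are made up for illustration:

```python
def build_request(host: str, path: str) -> bytes:
    """Assemble a minimal HTTP/1.1 GET request (request line + headers)."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n").encode()

def parse_response(raw: bytes) -> tuple[int, dict, bytes]:
    """Split a raw HTTP response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    lines = head.decode().split("\r\n")
    status = int(lines[0].split(" ")[1])        # "HTTP/1.1 200 OK" -> 200
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return status, headers, body

sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<h1>Hi</h1>"
status, headers, body = parse_response(sample)
print(status, headers["Content-Type"], body)  # 200 text/html b'<h1>Hi</h1>'
```

In practice these bytes travel over the TCP (and optionally TLS) connection established in steps 2–3 of the cycle.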

Synchronous and Asynchronous Communication Model

Answer:

In system design, synchronous and asynchronous communication models define how services exchange data and when they wait for responses. Choosing between them affects performance, scalability, and user experience.


1. Synchronous Communication

Definition: In synchronous communication, the client waits for the server to respond before continuing its work. The interaction happens in real-time.

Example: A web browser sending an HTTP request and waiting for the server’s response.

Characteristics:

  • Request and response occur in the same session.
  • Simpler to implement and debug.
  • Tight coupling between client and server availability.
  • Higher latency if one service is slow.

Use Cases:

  • API calls (REST, gRPC sync).
  • Payment gateways.
  • Real-time validation (login, form submission).

2. Asynchronous Communication

Definition: In asynchronous communication, the client doesn’t wait for an immediate response. Requests are queued or processed in the background, and the client is notified later.

Example: A message is sent to a queue (e.g., Kafka, RabbitMQ), and the server processes it later.

Characteristics:

  • Non-blocking and decoupled communication.
  • Higher scalability and fault tolerance.
  • More complex to design (needs message queues, event handling).
  • Delayed response or eventual consistency.

Use Cases:

  • Background jobs (email, notifications).
  • Event-driven systems (order processing, analytics).
  • Microservice communication through queues or pub/sub.

In short:

  • Synchronous: Real-time, blocking, simple.
  • Asynchronous: Background, non-blocking, scalable.
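The contrast can be sketched as a blocking call versus a background worker fed by a queue. `send_email_sync` and the email strings below are illustrative stand-ins for real work:

```python
import queue
import threading
import time

jobs: "queue.Queue[str]" = queue.Queue()
results = []

def worker() -> None:
    """Background consumer: processes jobs whenever they arrive."""
    while True:
        job = jobs.get()
        results.append(f"sent email: {job}")  # stand-in for real work
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# Synchronous style: the caller blocks until the work is done.
def send_email_sync(to: str) -> str:
    time.sleep(0.01)  # simulate waiting on the server
    return f"sent email: {to}"

print(send_email_sync("a@example.com"))  # caller waited for this result

# Asynchronous style: enqueue the job and move on immediately.
jobs.put("b@example.com")
print("request accepted")  # returned before the email was processed
jobs.join()                # demo only: wait for the worker to drain the queue
```

In a distributed system the in-process `queue.Queue` would be replaced by a broker such as Kafka or RabbitMQ, but the decoupling idea is the same.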

Stateless and Stateful Servers

In system design, servers are often categorized as stateless or stateful depending on whether they retain information (state) about client interactions between requests. Understanding this distinction is critical for designing scalable and reliable distributed systems.


1. Stateless Server

Definition: A stateless server does not store any client-specific data between requests. Each request from the client contains all the necessary information for the server to process it.

Characteristics:

  • Each request is independent.
  • Easier to scale horizontally since any server can handle any request.
  • Simpler to maintain and recover after failures.
  • No session data stored on the server side.

Example: HTTP is inherently a stateless protocol. REST APIs are typically designed to be stateless, meaning every request must include authentication tokens and all required context.

Use Cases:

  • Web APIs (RESTful services).
  • Content delivery systems.
  • Load-balanced environments.

2. Stateful Server

Definition: A stateful server maintains information about the client’s session across multiple requests. The server “remembers” previous interactions to provide continuity.

Characteristics:

  • Server stores client-specific session data.
  • Requests depend on prior interactions.
  • Harder to scale horizontally because sessions must be maintained or shared.
  • Requires session management mechanisms (e.g., sticky sessions, distributed caches).

Example: Online banking applications, multiplayer games, or chat applications often require stateful servers to track user sessions and data.

Use Cases:

  • Real-time chat systems.
  • Multiplayer online games.
  • E-commerce carts with server-side sessions.

Key Differences

| Aspect | Stateless Server | Stateful Server |
|---|---|---|
| Session Data | Not stored on the server | Stored on the server |
| Scalability | Highly scalable | Harder to scale |
| Fault Tolerance | Easier to recover | Session loss on failure |
| Complexity | Simpler | More complex |
| Example | REST API | Online banking system |

In summary, stateless servers are ideal for scalability and reliability, while stateful servers are necessary when maintaining session continuity or user context across interactions.
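A sketch of the difference, using hypothetical handlers (`verify_token` is a placeholder for real authentication): the stateless handler gets all its context from the request itself, while the stateful handler keeps a server-side session dict between requests:

```python
# Stateless handler: everything needed (user, cart) arrives in the request.
def verify_token(token: str) -> str:
    """Hypothetical token check; a real system would validate a signed JWT."""
    return token.removeprefix("user:")

def handle_stateless(request: dict) -> str:
    user = verify_token(request["token"])  # context travels with the request
    return f"cart for {user}: {request['cart']}"

# Stateful handler: the server remembers the session across requests.
sessions: dict[str, dict] = {}  # session_id -> server-side state

def handle_stateful(session_id: str, item: str) -> str:
    state = sessions.setdefault(session_id, {"cart": []})
    state["cart"].append(item)  # result depends on earlier requests
    return f"cart: {state['cart']}"

print(handle_stateless({"token": "user:alice", "cart": ["book"]}))
print(handle_stateful("s1", "book"))
print(handle_stateful("s1", "pen"))  # same session: the cart grows
```

Because the stateless handler holds nothing between calls, any replica can serve any request; the stateful handler requires the same server (or a shared session store) for every request in the session.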


Proxy

A proxy is an intermediary server that sits between a client and a destination server, forwarding requests and responses between them. It acts as a gateway, managing communication, enhancing performance, improving security, and providing anonymity.


How It Works

When a client sends a request, it first goes to the proxy server.

  • The proxy forwards the request to the destination server on behalf of the client.
  • The response from the destination server returns to the proxy, which then sends it back to the client.

This setup allows the proxy to inspect, modify, cache, or filter traffic as needed.

Key Benefits

  • Security: Hides client IPs and filters malicious requests.
  • Caching: Stores frequently accessed data to reduce latency and bandwidth usage.
  • Load Management: Distributes traffic among multiple servers.
  • Access Control: Restricts access to certain content or sites.
  • Anonymity: Masks client identity for privacy.

Common Types of Proxies

  • Forward Proxy: Acts on behalf of clients (used for caching or filtering outgoing requests).
  • Reverse Proxy: Acts on behalf of servers (used for load balancing, SSL termination, and caching).

In short, a proxy is a middle layer that improves security, performance, and scalability in distributed systems.


Forward Proxy

A forward proxy is a server that sits between the client and the internet, acting on behalf of the client to send requests to external servers. It hides the client’s identity and can control or monitor outbound traffic.

How It Works

  1. The client sends a request to the forward proxy.
  2. The proxy checks policies, caching, or filters before forwarding it to the target server.
  3. The server’s response returns to the proxy, which then forwards it back to the client.

Key Use Cases

  • Access Control: Restricts user access to specific websites or resources.
  • Caching: Stores frequently accessed data to reduce network load and latency.
  • Anonymity: Masks client IP addresses for privacy.
  • Monitoring: Logs or inspects outbound traffic for security or analytics.

Example

In a corporate network, a forward proxy ensures all employee web requests go through it — enforcing policies and blocking unauthorized sites.

In essence, a forward proxy primarily represents the client, helping manage, filter, and optimize outgoing traffic to external servers.



Reverse Proxy

A reverse proxy is a server that sits in front of one or more backend servers and acts on behalf of those servers to handle incoming client requests. It hides the details of the backend infrastructure and manages traffic efficiently.

How It Works

  1. The client sends a request to the reverse proxy (instead of directly to the backend server).
  2. The reverse proxy determines which backend server should handle the request.
  3. It forwards the request, receives the response from the server, and sends it back to the client.

The client never directly communicates with the backend servers.

Key Use Cases

  • Load Balancing: Distributes incoming traffic across multiple servers for better scalability.
  • Security: Masks backend servers’ IP addresses and filters malicious requests.
  • Caching: Stores static or frequently accessed responses to reduce server load.
  • SSL Termination: Handles SSL/TLS encryption and decryption to offload backend servers.
  • Compression & Optimization: Improves response speed and bandwidth efficiency.

Example

A website like example.com may use NGINX as a reverse proxy to route traffic among multiple application servers, cache responses, and manage SSL certificates.

In summary, a reverse proxy acts on behalf of servers, improving performance, scalability, and security in large-scale systems.


Difference Between Forward and Reverse Proxy

| Aspect | Forward Proxy | Reverse Proxy |
|---|---|---|
| Acts On Behalf Of | Client | Server |
| Primary Purpose | Controls and manages outbound traffic from clients | Manages and optimizes inbound traffic to servers |
| Visibility | The target server does not know the real client | The client does not know the real backend server |
| Typical Use Cases | Caching, access control, anonymity, monitoring | Load balancing, security, caching, SSL termination |
| Location in Network | Sits between the client and the internet | Sits between the internet and backend servers |
| Example Tools | Squid, Privoxy | NGINX, HAProxy, Apache HTTP Server (mod_proxy) |
| Used By | Clients (e.g., corporate users) | Servers (e.g., web applications) |

In short:

  • A forward proxy protects and represents clients.
  • A reverse proxy protects and represents servers.

Load Balancing

Why Load Balancing is Needed

Load balancing is essential for distributing incoming network or application traffic evenly across multiple servers. It ensures that no single server becomes a bottleneck, improving availability, performance, and scalability of the system.


Key Reasons for Using Load Balancing

  1. High Availability: If one server fails, traffic can be automatically redirected to healthy servers, preventing downtime.

  2. Scalability: Easily add or remove servers based on demand without affecting users.

  3. Optimized Resource Utilization: Balances requests to prevent overloading certain servers while others remain idle.

  4. Improved Performance: Reduces response time by routing requests to the least-loaded or nearest server.

  5. Fault Tolerance: Helps maintain smooth operations even when individual components fail.

  6. Maintenance Without Downtime: Servers can be updated or maintained without interrupting user access.

In short, load balancing enables systems to handle large-scale traffic efficiently, ensuring reliability and a seamless user experience.


Types of Load Balancers

Load balancers can be classified in two main ways — based on the OSI layer they operate on and based on how they are deployed.


1. Based on Layer

| Type | Layer | Description | Example Use Case |
|---|---|---|---|
| Layer 4 Load Balancer | Transport Layer (TCP/UDP) | Routes traffic based on network information such as IP address and port. Does not inspect application data. | Simple, fast routing, e.g., balancing HTTP or database traffic. |
| Layer 7 Load Balancer | Application Layer (HTTP/HTTPS) | Makes routing decisions based on content (URL path, headers, cookies, etc.). | Intelligent routing, e.g., sending image requests to a CDN and API requests to backend servers. |

2. Based on Deployment

| Type | Description | Examples |
|---|---|---|
| Hardware Load Balancer | Physical appliance designed for high-performance load balancing in enterprise data centers. | F5, Citrix ADC |
| Software Load Balancer | Runs on general-purpose servers; more flexible and cost-effective. | NGINX, HAProxy, Apache HTTP Server |
| Cloud/Managed Load Balancer | Provided as a managed service by cloud providers; automatically scales and manages routing. | AWS ELB, Google Cloud Load Balancing, Azure Load Balancer |

In summary:

  • Layer-based load balancers decide how routing happens (network vs. application level).
  • Deployment-based load balancers decide where and what kind of infrastructure handles the balancing.

Load Balancing Strategies

Load balancing strategies define how traffic is distributed across servers. They can be broadly categorized into static and dynamic approaches.


1. Static Load Balancing

In static load balancing, the traffic distribution is predefined and does not depend on real-time server load or performance. The load balancer follows fixed algorithms to decide which server handles each request.

Key Characteristics:

  • Simpler to implement.
  • No monitoring of server health or performance.
  • Best suited for systems where server capacities are uniform and predictable.

Common Algorithms:

  • Round Robin: Requests are distributed sequentially across servers.
  • Weighted Round Robin: Servers with higher capacity get more requests.
  • IP Hashing: Requests from the same client IP always go to the same server.

Example: A set of identical web servers handling equal-sized requests using Round Robin distribution.
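The three static algorithms above can be sketched in a few lines of Python. This is a toy illustration with made-up server IPs, not a production balancer:

```python
import hashlib
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round Robin: hand out servers in a fixed rotation.
rr = cycle(servers)
def round_robin():
    return next(rr)

# Weighted Round Robin: higher-capacity servers appear more often
# in the rotation, so they receive proportionally more requests.
weights = {"10.0.0.1": 3, "10.0.0.2": 1, "10.0.0.3": 1}
wrr = cycle([s for s, w in weights.items() for _ in range(w)])
def weighted_round_robin():
    return next(wrr)

# IP Hashing: hashing the client IP pins each client to one server.
def ip_hash(client_ip):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

Note that none of these consult server health or load; that is exactly what makes them static.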


2. Dynamic Load Balancing

In dynamic load balancing, the distribution adapts in real time based on the current load, health, and performance of servers.

Key Characteristics:

  • Requires monitoring of CPU, memory, or response time.
  • Automatically adjusts to handle changing workloads.
  • Better suited for large-scale, high-traffic, or variable-load systems.

Common Algorithms:

  • Least Connections: Routes requests to the server with the fewest active connections.
  • Least Response Time: Sends requests to the server responding fastest.
  • Resource-Based: Considers metrics like CPU or memory usage for routing.

Example: A dynamic web application routing traffic to the least busy or fastest-responding server to ensure consistent performance.
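The Least Connections algorithm can be sketched as a small class that tracks active connections per server. This is a minimal single-threaded sketch; a real balancer would also run health checks and handle concurrency:

```python
class LeastConnectionsBalancer:
    """Route each request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        # Pick the least-busy server and count the new connection.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Called when a request finishes, freeing capacity.
        self.active[server] -= 1
```

Because the routing decision reads live state (`self.active`), the distribution adapts automatically as connections open and close.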


In summary:

  • Static strategies are simple but inflexible.
  • Dynamic strategies are intelligent and adaptive for modern, large-scale systems.

Choosing the Right Load Balancer

Selecting the right load balancer depends on your system architecture, traffic patterns, scalability goals, and operational requirements. The goal is to balance performance, cost, and complexity while ensuring reliability.

Key Factors to Consider

  1. Traffic Type and Protocols

    • For low-level TCP/UDP traffic → use a Layer 4 load balancer.
    • For HTTP/HTTPS or content-based routing → use a Layer 7 load balancer.
  2. Scalability Needs

    • If you expect traffic to grow dynamically, a cloud-based or software load balancer with auto-scaling support is ideal.
  3. Budget and Infrastructure

    • Hardware load balancers offer performance but are expensive.
    • Software or cloud load balancers are cost-effective and easier to maintain.
  4. Health Monitoring and Failover

    • Choose a load balancer that supports real-time health checks and automatic rerouting on server failures.
  5. Security Requirements

    • For SSL termination, DDoS protection, or request filtering, go with a reverse proxy-based load balancer (e.g., NGINX, AWS ALB).
  6. Deployment Environment

    • On-premises systems → Hardware or software load balancer.
    • Cloud-native systems → Managed services like AWS ELB, GCP Load Balancer, or Azure LB.

Examples

| Scenario | Recommended Load Balancer |
| --- | --- |
| Static website with predictable traffic | NGINX (software L4/L7) |
| Scalable microservice-based system | AWS Application Load Balancer |
| Enterprise data center | F5 Hardware Load Balancer |
| API gateway for large-scale apps | HAProxy or NGINX with Layer 7 routing |

In summary: Choose a load balancer that aligns with your traffic patterns, scaling requirements, and operational constraints while ensuring high availability and performance.


API Gateway

Introduction to API Gateway

An API Gateway is a single entry point for all client requests in a distributed system, especially in microservices architectures. It acts as an intermediary between clients and backend services, managing, routing, and securing API calls.

Key Functions

  • Request Routing: Directs incoming requests to the appropriate backend service.
  • Authentication & Authorization: Verifies client identity and permissions before forwarding requests.
  • Rate Limiting & Throttling: Controls traffic flow to prevent overloading backend services.
  • Load Balancing: Distributes requests across multiple instances of a service.
  • Caching: Stores frequently accessed responses to improve performance.
  • Request Transformation: Modifies headers, parameters, or payloads before sending to backend.
  • Monitoring & Logging: Tracks API usage, latency, and errors for observability.

Why It’s Important

  • Simplifies client communication by providing a unified API endpoint.
  • Improves security, performance, and manageability of microservices.
  • Reduces client complexity by abstracting multiple backend calls into a single request.

Example

In a microservices-based e-commerce app:

  • The API Gateway routes /orders to the Order Service, /users to the User Service, and /payments to the Payment Service — all through one unified public endpoint.

In short, an API Gateway is the front door to your microservices ecosystem, handling routing, security, and performance optimization.


How API Gateways Work

An API Gateway acts as a smart intermediary between clients and backend services, managing the flow of requests and responses efficiently. It centralizes common system concerns like authentication, routing, rate limiting, and monitoring.


Step-by-Step Flow

  1. Client Request: A client (web, mobile, or IoT) sends an API request to the gateway instead of directly contacting backend services.

  2. Routing & Authentication: The gateway authenticates the request (using tokens, API keys, etc.) and determines which backend service should handle it.

  3. Request Transformation (Optional): The gateway may modify headers, payloads, or URLs to match the backend service’s requirements.

  4. Forwarding to Backend: The request is then forwarded to the appropriate backend microservice.

  5. Response Aggregation (if needed): If the request involves multiple services, the gateway aggregates responses into a single unified output.

  6. Response Back to Client: The gateway returns the final processed response to the client, possibly with caching, compression, or formatting applied.


Illustration of Flow

Client → API Gateway → Authentication → Routing → Backend Services  
                                     ↑
                          Monitoring, Caching, Logging

Example

A mobile app requests /user/profile:

  • The API Gateway authenticates the request using a JWT.
  • Routes it to the User Service.
  • Caches the response for repeated calls.
  • Sends the processed data back to the client.
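The flow above can be condensed into an in-process sketch. Everything here is hypothetical (the token set, the route table, the service stubs); real gateways verify signed JWTs and forward over the network:

```python
# Hypothetical route table: path prefix -> backend service stub.
ROUTES = {
    "/users": lambda path: {"service": "user", "path": path},
    "/orders": lambda path: {"service": "order", "path": path},
}
VALID_TOKENS = {"secret-token"}   # stand-in for real JWT verification
cache = {}

def gateway(path, token):
    if token not in VALID_TOKENS:              # 1. authenticate
        return {"status": 401}
    if path in cache:                          # 2. serve cached response
        return cache[path]
    for prefix, service in ROUTES.items():     # 3. route by path prefix
        if path.startswith(prefix):
            response = {"status": 200, "body": service(path)}
            cache[path] = response             # 4. cache for repeat calls
            return response
    return {"status": 404}
```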

In summary, the API Gateway simplifies communication between clients and microservices by handling routing, security, transformation, and aggregation at a centralized layer.


Benefits of Using an API Gateway

An API Gateway provides a centralized layer for managing and optimizing communication between clients and backend services. It simplifies system architecture and improves performance, security, and scalability.

Key Benefits

  1. Centralized Entry Point

    • Clients interact with a single endpoint instead of multiple microservices.
    • Simplifies client logic and reduces network complexity.
  2. Improved Security

    • Handles authentication, authorization, and rate limiting at one layer.
    • Hides internal service details and prevents direct client access to microservices.
  3. Load Balancing and Traffic Management

    • Distributes incoming traffic evenly across backend instances.
    • Supports throttling and circuit breaking to maintain stability under load.
  4. Request and Response Transformation

    • Modifies headers, formats, or payloads to maintain compatibility between clients and services.
  5. Caching and Performance Optimization

    • Stores frequently requested responses to reduce latency and backend load.
  6. Monitoring and Logging

    • Provides unified metrics, request tracing, and logging for better observability.
  7. Protocol Translation

    • Converts between protocols (e.g., HTTP to gRPC, REST to WebSocket) for interoperability.
  8. Response Aggregation

    • Combines data from multiple microservices into a single client response, reducing the number of network calls.

In short, an API Gateway enhances security, performance, and developer productivity by acting as a smart, centralized control point for all service communication.


Security Features in API Gateways

An API Gateway serves as the first line of defense for backend services, protecting them from unauthorized access, malicious traffic, and misuse. It centralizes security enforcement across all APIs in the system.

Key Security Features

  1. Authentication

    • Verifies the identity of clients before allowing access.
    • Supports mechanisms like JWT (JSON Web Tokens), OAuth 2.0, API keys, or Basic Auth.
  2. Authorization

    • Ensures that authenticated clients have permission to access specific resources or actions.
    • Implements role-based or policy-based access control.
  3. Rate Limiting and Throttling

    • Limits the number of requests per client in a specific time window.
    • Prevents abuse, brute-force attacks, and protects backend resources from overload.
  4. IP Whitelisting and Blacklisting

    • Restricts access based on IP addresses, blocking malicious or unauthorized sources.
  5. Encryption (SSL/TLS Termination)

    • Ensures secure data transmission over HTTPS.
    • Offloads SSL decryption from backend servers to reduce their overhead.
  6. Request Validation and Filtering

    • Validates incoming payloads, headers, and parameters to prevent malformed or malicious requests.
    • Blocks common attacks like SQL injection, XSS, or header tampering.
  7. API Key Management

    • Issues and manages unique API keys for tracking and controlling client usage.
  8. CORS (Cross-Origin Resource Sharing) Control

    • Defines which domains are allowed to access the APIs, preventing unauthorized cross-origin calls.
  9. Logging and Audit Trails

    • Captures detailed logs of requests, responses, and errors for auditing and incident investigation.
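Rate limiting and throttling (feature 3 above) are commonly implemented with a token bucket: each client gets a bucket that refills at a steady rate and allows short bursts up to its capacity. A minimal sketch:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `rate` tokens/sec."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        # Refill based on elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1      # spend one token for this request
            return True
        return False              # over the limit: respond with 429
```

A gateway would keep one bucket per API key or client IP and reject requests when `allow()` returns `False`.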

In summary, an API Gateway enforces consistent security policies across all microservices, ensuring authentication, authorization, encryption, and traffic control in one centralized layer.


Caching for Performance Optimization

Why Caching

Caching reduces latency and backend load by storing frequently accessed data closer to the client or gateway. It improves response time, scalability, and overall system performance.


Types of Caching

  1. Client-Side Caching: Data stored in the browser or app for quick reuse.
  2. API Gateway Caching: Gateway stores common API responses to avoid repeated backend calls.
  3. Server-Side Caching: Backend servers or databases use in-memory stores (e.g., Redis, Memcached) to serve repeated queries faster.
  4. CDN Caching: Static content cached at edge locations for global performance improvement.

API Composition and Aggregation

API Composition

API Composition is a technique where the API Gateway combines data from multiple microservices into a single unified response for the client.

  • Reduces the number of client requests.
  • Simplifies client logic.
  • Common in microservices architectures where data resides across multiple services.

Example: A /user/profile endpoint may fetch data from User, Order, and Payment services and return it as one combined response.

API Aggregation

API Aggregation is a broader pattern where multiple backend API calls are aggregated, transformed, or filtered before sending the final result to the client.

  • Useful when clients need composite data.
  • Can include merging, filtering, or data transformation.

Example: An e-commerce API Gateway aggregates data from inventory, pricing, and reviews services to present a complete product detail page.

In short:

  • Composition = Combining multiple API calls into one response.
  • Aggregation = Enhancing or transforming combined results for optimized delivery.
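The `/user/profile` composition example can be sketched as follows. The three fetch functions are hypothetical stand-ins; in production each would be an HTTP call to the User, Order, and Payment services:

```python
# Hypothetical backend service stubs.
def fetch_user(uid):
    return {"id": uid, "name": "Vivek"}

def fetch_orders(uid):
    return [{"order_id": 1, "total": 250}]

def fetch_payments(uid):
    return [{"payment_id": "p1", "status": "paid"}]

def user_profile(uid):
    """Compose one /user/profile response from three backend calls."""
    return {
        "user": fetch_user(uid),
        "orders": fetch_orders(uid),
        "payments": fetch_payments(uid),
    }
```

The client makes one request and receives one combined payload instead of issuing three separate network calls.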

Popular API Gateway Implementations

1. Open Source Solutions

| Tool | Description |
| --- | --- |
| Kong | Lightweight, high-performance gateway built on NGINX; supports plugins for authentication, rate limiting, and logging. |
| NGINX | Widely used as a reverse proxy and API gateway; offers load balancing, caching, and SSL termination. |
| HAProxy | Reliable and fast TCP/HTTP load balancer that can be configured for gateway functionalities. |
| Traefik | Modern, cloud-native gateway with automatic service discovery and Let’s Encrypt integration. |
| Tyk | Full-featured open-source API gateway with analytics, authentication, and developer portal support. |
| Express Gateway | Built on Node.js; focuses on easy API management using JavaScript-based configuration. |

2. Cloud-Based Solutions

| Provider | Service | Description |
| --- | --- | --- |
| Amazon Web Services (AWS) | API Gateway | Fully managed service supporting REST, WebSocket, and HTTP APIs with built-in monitoring and security. |
| Google Cloud Platform (GCP) | API Gateway / Apigee | Enterprise-grade API management with analytics, versioning, and policy enforcement. |
| Microsoft Azure | API Management (APIM) | Centralized gateway for securing, monitoring, and scaling APIs. |
| Kong | Kong Konnect | Managed version of Kong offering cloud scalability and integrated observability. |
| Cloudflare | API Gateway | Focused on edge-level protection, rate limiting, and bot mitigation. |

Summary:

  • Open source gateways provide flexibility and control for custom setups.
  • Cloud-based gateways offer managed scalability, security, and reduced operational overhead.

When to Use an API Gateway

Use an API Gateway when you need centralized control, scalability, and simplified client interaction across multiple services.

Ideal Scenarios:

  1. Microservices Architecture – To route and manage traffic across many small services.
  2. Unified Entry Point – When clients (mobile/web) need a single endpoint instead of calling multiple services.
  3. Security Enforcement – To handle authentication, authorization, and rate limiting in one place.
  4. Traffic Management – For load balancing, caching, and request throttling.
  5. Cross-Cutting Concerns – When you need logging, monitoring, or analytics for all APIs.
  6. Response Aggregation – When client responses need to be composed from multiple services.

When to Avoid an API Gateway

Avoid an API Gateway if it adds unnecessary complexity or latency for your use case.

Avoid in These Situations:

  1. Small or Monolithic Applications – A direct client-to-server model is simpler and faster.
  2. Low Traffic Systems – Gateway setup and maintenance may not justify the overhead.
  3. Latency-Sensitive Applications – Extra network hop can increase response time.
  4. Simple Internal APIs – If only internal services communicate, direct service-to-service calls may be better.

In summary:

  • Use an API Gateway for scalability, security, and central management in distributed systems.
  • Avoid it when simplicity and low latency are higher priorities than centralized control.

Content Delivery Networks (CDN) in System Design

Introduction to CDN

A Content Delivery Network (CDN) is a globally distributed network of servers that deliver web content (like images, videos, scripts, and static files) to users based on their geographic location.

Definition

A CDN is a system of edge servers placed across various regions that cache and serve content from the nearest server to the user, improving speed and reliability.

Why CDN Exists

CDNs exist to reduce latency, improve content delivery speed, enhance availability, and reduce server load by bringing content physically closer to end-users.

In short: CDNs make websites faster, scalable, and more reliable worldwide.


Why CDN is Needed

CDNs are needed to enhance performance, scalability, and reliability of content delivery. They help deliver data faster to users regardless of their location by caching content on geographically distributed servers.


Problems Without CDN

  1. High Latency: Users far from the origin server experience slow loading times.
  2. Server Overload: A single origin server must handle all requests, leading to slowdowns or crashes under high traffic.
  3. Poor Global Performance: Websites perform inconsistently across regions.
  4. Bandwidth Bottlenecks: Increased network congestion and higher data transfer costs.
  5. Low Availability: Outages at the origin server can make the entire site inaccessible.

In essence: Without a CDN, performance drops, costs rise, and reliability suffers.


CDN Architecture Overview

A Content Delivery Network (CDN) is a globally distributed system of servers designed to deliver web content efficiently and reliably. Its architecture consists of several key components that work together to reduce latency and improve user experience.

Key Components:

  1. Origin Server: The main server where the original content (e.g., images, videos, HTML files) is stored.

  2. Edge Servers (PoPs – Points of Presence): Servers located in multiple geographic locations that cache and serve content closer to end users.

  3. CDN Management Layer: Handles routing, load balancing, and cache invalidation between origin and edge servers.

  4. DNS and Request Routing System: Directs user requests to the nearest or most optimal edge server using techniques like GeoDNS or Anycast.

  5. Caching Mechanism: Stores frequently requested content at the edge to minimize repeated requests to the origin.

How It Works:

  1. User requests content (e.g., a webpage or image).
  2. The DNS routes the request to the nearest edge server.
  3. If the edge has cached content, it’s delivered immediately.
  4. If not, the edge fetches it from the origin, caches it, and serves it to the user.

This distributed approach ensures low latency, high availability, and faster content delivery across the globe.


How CDNs Cache Content

A CDN caches content by storing copies of static and dynamic assets on edge servers located close to users. This helps reduce latency and server load.

Caching Process:

  1. User Request: When a user requests a resource (e.g., image, CSS, video), the request first goes to the nearest CDN edge server.

  2. Cache Lookup: The edge server checks if the requested content is already stored (cached) locally.

  3. Cache Hit: If found, the edge server directly serves the cached content to the user — fast and efficient.

  4. Cache Miss: If not found, the edge requests the content from the origin server, stores it temporarily (based on caching rules), and then serves it to the user.

Caching Control Mechanisms:

  • HTTP Headers: Cache-Control, Expires, and ETag headers define how long content stays cached.
  • Time-to-Live (TTL): Specifies how long the cached item remains valid before revalidation.
  • Cache Invalidation: Allows removing or updating outdated content manually or automatically.

By intelligently caching content, CDNs reduce latency, bandwidth costs, and load on origin servers.
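The TTL behavior described above can be sketched as a small in-memory cache. This is a simplified model of an edge cache; real CDNs also honor `Cache-Control` headers and support explicit invalidation:

```python
import time

class TTLCache:
    """Edge-style cache: entries expire after `ttl` seconds. On a miss,
    the edge would fetch from the origin server and re-cache."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.store = {}

    def set(self, key, value):
        # Record the value together with its expiry timestamp.
        self.store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None              # cache miss: never stored
        value, expires = entry
        if time.monotonic() > expires:
            del self.store[key]      # TTL elapsed: invalidate, treat as miss
            return None
        return value                 # cache hit: served from the edge
```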


Load Balancing, Failover Handling, and Request Routing in CDN

1. Load Balancing

CDNs distribute incoming user requests across multiple edge servers to avoid overloading any single node.

  • Goal: Optimize performance and ensure efficient resource usage.

  • Methods Used:

    • DNS-based load balancing – directs users to the nearest or least-loaded server.
    • Anycast routing – routes requests to the geographically closest server.
    • Health checks – ensure only active servers receive traffic.

2. Failover Handling

If an edge server or data center goes down, the CDN automatically redirects traffic to the next healthy and available node.

  • Mechanisms:

    • Continuous health monitoring of nodes.
    • Automatic rerouting during outages or degradation.
    • Multi-origin support for redundancy.

3. Request Routing

Determines which edge server handles a specific user request based on various factors:

  • Geolocation: Route to nearest edge server to reduce latency.
  • Server Load: Send requests to least-busy nodes.
  • Network Conditions: Reroute dynamically based on congestion or outages.

Together, these mechanisms ensure high availability, minimal latency, and fault tolerance in a CDN-powered system.
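Geolocation-based request routing can be approximated in a few lines. The PoP coordinates below are hypothetical, and straight-line distance is a crude stand-in for what real CDNs do with Anycast or GeoDNS plus live latency measurements:

```python
import math

# Hypothetical edge locations as (latitude, longitude).
EDGES = {
    "frankfurt": (50.1, 8.7),
    "mumbai": (19.1, 72.9),
    "virginia": (38.9, -77.0),
}

def nearest_edge(user_lat, user_lon):
    """Pick the edge server geographically closest to the user."""
    return min(EDGES, key=lambda e: math.dist(EDGES[e], (user_lat, user_lon)))
```

A production router would also skip edges that fail health checks, which is where failover handling plugs into request routing.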


Compression & Minification

Compression

Compression reduces the size of files transferred between servers and clients, improving load time and reducing bandwidth usage.

Common Techniques:

  • Gzip / Brotli: Compress text-based assets like HTML, CSS, JS before transmission.
  • Image Compression: Tools like WebP or AVIF reduce image size without major quality loss.
  • Video Compression: Codecs like H.264, H.265 (HEVC) for efficient media delivery.

Benefits:

  • Faster content delivery.
  • Lower bandwidth consumption.
  • Improved user experience.
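The effect of compression on text-based assets is easy to demonstrate with Python's built-in `gzip` module (repetitive markup is an ideal case, so real-world ratios vary):

```python
import gzip

# A deliberately repetitive HTML payload, the kind gzip handles best.
html = b"<html>" + b"<p>hello world</p>" * 500 + b"</html>"
compressed = gzip.compress(html)

# Clients decompress transparently when the response carries
# the `Content-Encoding: gzip` header.
assert gzip.decompress(compressed) == html
ratio = len(compressed) / len(html)
```

On payloads like this the compressed size is a small fraction of the original, which is bandwidth saved on every request.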

Minification

Minification removes unnecessary characters (spaces, comments, line breaks) from code files without changing functionality.

Techniques & Tools:

  • CSS/JS Minifiers: Tools like UglifyJS, Terser, CSSNano.
  • HTML Minifiers: Reduce response size by stripping redundant markup.

Benefits:

  • Smaller file size.
  • Faster parsing and execution by browsers.

In CDN systems, compression and minification are key optimization steps before caching or serving content to enhance overall performance and scalability.


Use Cases of CDN

1. Static vs Dynamic Content Delivery

  • Static Content: Includes files that don’t change often — images, CSS, JS, fonts, videos.

    • Cached and served directly from CDN edge servers.
    • Example: Serving website assets or media files globally.
  • Dynamic Content: Data generated in real-time (e.g., personalized dashboards, API responses).

    • CDNs optimize routing and use TCP/UDP optimizations to accelerate delivery.
    • Can use edge caching for partial or short-term storage.

Key Point: Static content is heavily cached, while dynamic content relies on optimized routing and edge acceleration.

2. API Acceleration & Edge Computing (CDNs for APIs)

Modern CDNs are not just for websites — they also accelerate API traffic and enable edge computing.

  • API Acceleration:

    • Reduces latency for REST and GraphQL APIs by caching responses close to users.
    • Uses smart routing, connection reuse, and HTTP/2 multiplexing.
    • Example: Speeding up e-commerce or authentication APIs.
  • Edge Computing:

    • Executes lightweight logic (e.g., authentication, request validation, personalization) directly on CDN edge servers.
    • Reduces round trips to origin servers.
    • Platforms like Cloudflare Workers and Akamai EdgeWorkers enable this.

Result: APIs become faster, more reliable, and scalable — improving end-user experience and backend efficiency.


2. Protocols

TCP & UDP

What is TCP?

TCP (Transmission Control Protocol) is a connection-oriented protocol used to ensure reliable and ordered data transmission between systems over a network.

Key Characteristics:

  • Connection-Oriented: Establishes a connection using a three-way handshake before data transfer.
  • Reliable: Guarantees delivery with acknowledgment and retransmission of lost packets.
  • Ordered Delivery: Ensures packets arrive in the same sequence they were sent.
  • Error Checking: Detects and corrects transmission errors using checksums.
  • Flow & Congestion Control: Adjusts data transfer rate based on network conditions.

Use Cases:

  • Web browsing (HTTP/HTTPS)
  • Email (SMTP, IMAP, POP3)
  • File transfers (FTP)

In system design, TCP is ideal when data accuracy and reliability are more important than speed.


What is UDP?

UDP (User Datagram Protocol) is a connectionless communication protocol that focuses on speed and low latency rather than reliability.

Key Characteristics:

  • Connectionless: No handshake; data is sent directly without establishing a connection.
  • Unreliable Delivery: Packets may be lost, duplicated, or arrive out of order — no acknowledgment or retransmission.
  • Lightweight & Fast: Minimal overhead, making it faster than TCP.
  • No Flow Control: Sender transmits data without checking receiver’s capacity.

Use Cases:

  • Real-time streaming (video/audio)
  • Online gaming
  • VoIP (Voice over IP)
  • DNS lookups

In system design, UDP is preferred when speed and real-time communication matter more than guaranteed delivery.
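The contrast is visible at the socket level. The sketch below sends a single UDP datagram over loopback; note the absence of any handshake before data moves:

```python
import socket

# UDP is connectionless: no handshake, no ACKs, no ordering guarantees.
# (TCP would instead use SOCK_STREAM plus connect()/accept(), which
# performs the three-way handshake before any data is exchanged.)
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # let the OS pick a free port
receiver.settimeout(2)
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", addr)         # fire-and-forget datagram

data, _ = receiver.recvfrom(1024)    # arrives reliably here only because
sender.close()                       # this is the loopback interface
receiver.close()
```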


HTTP – The Backbone of the Web

Introduction to HTTP

HTTP (HyperText Transfer Protocol) is an application-layer protocol that defines how data is transmitted between a client (usually a browser) and a server over the web.

It is the foundation of communication for websites, APIs, and web services.

Key Characteristics:

  • Stateless: Each request is independent; the server doesn’t retain client context.
  • Text-based & Simple: Easy to read and debug.
  • Request–Response Model: The client sends a request, and the server responds with data (HTML, JSON, etc.).
  • Extensible: Supports methods, headers, and status codes for flexible communication.

Common Use Cases:

  • Loading web pages.
  • Communicating with REST APIs.
  • Transferring resources like images, scripts, or files.

In system design, understanding HTTP is essential since almost every web-based system relies on it for data exchange and interoperability.


How HTTP Works

HTTP operates on a client–server model, where the client (e.g., browser, mobile app) sends a request to the server, and the server responds with the requested data. It typically runs over TCP (port 80) or HTTPS (port 443) for secure communication.

Client–Server Model

  1. Client Initiates Request: The client sends an HTTP request (e.g., GET, POST) to the server.

  2. Server Processes Request: The server interprets the request, fetches or generates the required resource.

  3. Server Sends Response: The server returns an HTTP response containing status, headers, and data.

  4. Connection Handling: Depending on configuration, the connection may close or stay open (persistent connections in HTTP/1.1+).

Components of an HTTP Request

  1. Request Line: Contains the HTTP method, target URL, and version. Example:

    GET /api/users HTTP/1.1
    
  2. Headers: Provide metadata like content type, user agent, and authorization info. Example:

    Content-Type: application/json
    Authorization: Bearer <token>
    
  3. Body (Optional): Contains data sent to the server (mainly in POST, PUT requests). Example:

    { "name": "Vivek", "role": "Engineer" }

Components of an HTTP Response

  1. Status Line: Includes protocol version, status code, and status message. Example:

    HTTP/1.1 200 OK
    
  2. Headers: Provide information like content type, length, and caching rules. Example:

    Content-Type: application/json
    Cache-Control: no-cache
    
  3. Body: Contains the actual response data — HTML, JSON, images, etc. Example:

    { "success": true, "message": "User created" }

In summary: HTTP enables structured, stateless communication between clients and servers using a simple request–response model fundamental to all web systems.
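The three request components (request line, headers, body) can be made concrete by parsing a raw request string. This is a simplified parser for illustration; real servers handle folding, encodings, and many edge cases:

```python
def parse_request(raw: str):
    """Split a raw HTTP request into request line, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")      # blank line ends headers
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ")    # the request line
    headers = dict(line.split(": ", 1) for line in lines[1:])
    return {"method": method, "path": path, "version": version,
            "headers": headers, "body": body}

raw = ("POST /users HTTP/1.1\r\n"
       "Host: example.com\r\n"
       "Content-Type: application/json\r\n"
       "\r\n"
       '{"name": "Vivek"}')
req = parse_request(raw)
```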


HTTP Request–Response Cycle

The HTTP request–response cycle defines how a client and server exchange data over the web. It follows a predictable sequence that ensures communication and data transfer.

Step-by-Step Flow:

  1. URL Resolution: The client (browser or app) converts the domain name to an IP address using DNS.

  2. Connection Establishment: The client opens a TCP (or TLS for HTTPS) connection with the server.

  3. HTTP Request Sent: The client sends an HTTP request with method, headers, and optional body. Example:

    GET /home HTTP/1.1
    Host: example.com
    
  4. Server Processing: The server interprets the request, fetches or generates the necessary data or resource.

  5. HTTP Response Sent: The server replies with a status code, headers, and (optionally) a body. Example:

    HTTP/1.1 200 OK
    Content-Type: text/html
    
  6. Client Renders Data: The client (browser/app) processes the response and renders it for the user.

  7. Connection Termination (or Keep-Alive): The TCP connection is closed or kept alive for reuse in subsequent requests.


Simplified Flow Diagram (Mermaid)

```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server
    C->>S: DNS Lookup & TCP Handshake
    C->>S: HTTP Request (GET /home)
    S->>C: HTTP Response (200 OK)
    C->>C: Render Content
    Note over C,S: Connection closes or stays open (Keep-Alive)
```

Summary: The HTTP cycle is a request–response loop built on TCP, forming the backbone of web communication between clients and servers.


Stateless Nature of HTTP

What Does “Stateless” Mean?

HTTP is a stateless protocol, meaning each request is independent — the server does not remember any previous interactions with the client. Every request must include all necessary information for the server to process it.

Example: If a user logs in and makes another request, the server doesn’t automatically know it’s the same user unless session data is provided again.

Challenges of Statelessness

  1. No Session Memory: Server can’t recall user data between requests.
  2. Repeated Authentication: Each request must carry credentials or tokens.
  3. Complex State Management: Hard to maintain user sessions, carts, or preferences.
  4. Increased Data Transfer: Each request carries more metadata (like headers, tokens).

How Do We Handle State?

To overcome statelessness, systems use external mechanisms to maintain user or session data:

  • Cookies: Stored in the browser and sent automatically with each request.
  • Sessions: Server stores session data (identified by a session ID in cookies).
  • Tokens (JWTs): Client includes tokens in headers for stateless authentication.
  • Caching / Databases: Persistent stores used for user or application state.

In short: HTTP’s stateless nature simplifies scalability but requires external state management for personalized and consistent user experiences.
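The token approach can be sketched with a signed payload, in the spirit of a JWT but stripped to its essence. The secret key here is a hypothetical placeholder, and real systems should use a vetted JWT library rather than hand-rolled signing:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side-secret"   # hypothetical signing key

def issue_token(payload: dict) -> str:
    """Sign the payload so the server can later verify it
    without storing any session state."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()

def verify_token(token: str):
    body, _, sig = token.encode().partition(b".")
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(sig, expected):
        return None              # tampered or forged token: reject
    return json.loads(base64.urlsafe_b64decode(body))
```

Because the token itself carries the verified state, any server holding the secret can authenticate the request, which is what makes stateless horizontal scaling straightforward.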


HTTP Methods

HTTP methods define the type of action the client wants the server to perform on a resource. They are also called HTTP verbs.

1. GET

  • Purpose: Retrieve data from the server.

  • Characteristics:

    • No request body.
    • Safe and idempotent: should not modify server data.
  • Example:

    GET /users/123
    

2. POST

  • Purpose: Send data to the server to create a new resource.

  • Characteristics:

    • Includes a request body.
    • Non-idempotent (can create duplicates if repeated).
  • Example:

    POST /users
    Body: { "name": "Vivek" }
    

3. PUT

  • Purpose: Update or replace an existing resource completely.

  • Characteristics:

    • Idempotent (same result on multiple calls).
    • Includes a request body.
  • Example:

    PUT /users/123
    Body: { "name": "Vivek Panchal" }
    

4. PATCH

  • Purpose: Partially update an existing resource.

  • Characteristics:

    • Only modifies provided fields.
    • Not necessarily idempotent.
  • Example:

    PATCH /users/123
    Body: { "email": "vivek@example.com" }
    

5. DELETE

  • Purpose: Remove a resource from the server.

  • Characteristics:

    • Idempotent.
    • Usually has no body.
  • Example:

    DELETE /users/123
    

6. HEAD

  • Purpose: Retrieve headers only (no body) for a resource.
  • Use Case: Check if a resource exists or get metadata.

7. OPTIONS

  • Purpose: Describe supported HTTP methods for a resource.
  • Use Case: Used in CORS preflight requests.

Summary Table

| Method | Action | Idempotent | Request Body | Typical Use |
| --- | --- | --- | --- | --- |
| GET | Read | Yes | No | Fetch data |
| POST | Create | No | Yes | Create new data |
| PUT | Replace | Yes | Yes | Full update |
| PATCH | Modify | Not guaranteed | Yes | Partial update |
| DELETE | Remove | Yes | Usually no | Delete resource |
| HEAD | Headers only | Yes | No | Check existence |
| OPTIONS | Capabilities | Yes | No | CORS, introspection |

HTTP Status Codes

HTTP status codes are 3-digit numbers sent by the server to indicate the result of a client’s request. They are grouped into five main categories based on their first digit.

1. 1xx – Informational

Indicate that the request is received and being processed.

  • 100 Continue: Request headers are accepted; client can send body.
  • 101 Switching Protocols: Server is switching protocols (e.g., HTTP to WebSocket).

2. 2xx – Success

The request was successfully received, understood, and processed.

  • 200 OK: Request succeeded (common for GET).
  • 201 Created: New resource successfully created (for POST).
  • 202 Accepted: Request accepted for processing, but not yet completed.
  • 204 No Content: Request successful but no data to return (for DELETE).

3. 3xx – Redirection

Client must take further action to complete the request.

  • 301 Moved Permanently: Resource has a new permanent URL.
  • 302 Found: Temporary redirection.
  • 304 Not Modified: Resource not changed; use cached version.

4. 4xx – Client Errors

The request contains bad syntax or cannot be fulfilled.

  • 400 Bad Request: Malformed request or invalid data.
  • 401 Unauthorized: Authentication required or failed.
  • 403 Forbidden: Client authenticated but not allowed access.
  • 404 Not Found: Requested resource not found.
  • 409 Conflict: Request conflicts with current server state.
  • 429 Too Many Requests: Rate limit exceeded.

5. 5xx – Server Errors

Server failed to fulfill a valid request.

  • 500 Internal Server Error: Generic server-side failure.
  • 502 Bad Gateway: Invalid response from an upstream server.
  • 503 Service Unavailable: Server temporarily overloaded or down.
  • 504 Gateway Timeout: Upstream server didn’t respond in time.

In summary: HTTP status codes provide a standardized way for the server to communicate request outcomes — helping clients handle errors, retries, and user feedback effectively.
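As a sketch of how a client might act on these categories, the helpers below classify a code by its first digit and flag the transient codes (429 and most 5xx) as retry candidates. This is a simplified policy, and whether a retry is actually safe also depends on the method's idempotency.

```python
def classify(status: int) -> str:
    """Map a status code to its category by its first digit."""
    return {1: "informational", 2: "success", 3: "redirection",
            4: "client error", 5: "server error"}[status // 100]

def should_retry(status: int) -> bool:
    """Retry candidates: rate limiting and transient server/gateway failures."""
    return status == 429 or status in (500, 502, 503, 504)
```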


What is HTTPS?

HTTPS (HyperText Transfer Protocol Secure) is the secure version of HTTP, where all data exchanged between the client and server is encrypted using TLS (Transport Layer Security).

It ensures that communication over the web is private, authenticated, and tamper-proof.

How HTTPS Works

  1. TLS Handshake: Before any data exchange, the client and server establish a secure channel by:

    • Exchanging cryptographic keys.
    • Authenticating the server’s identity using an SSL/TLS certificate.
    • Agreeing on encryption algorithms.
  2. Encrypted Communication: Once the handshake completes, all HTTP requests and responses are transmitted in encrypted form.

  3. Data Integrity: Ensures data isn’t modified or intercepted during transfer (prevents man-in-the-middle attacks).

Key Benefits

  • Encryption: Protects sensitive data like passwords and API tokens.
  • Authentication: Confirms the identity of the server (and optionally, the client).
  • Integrity: Prevents data tampering during transmission.
  • SEO & Trust: Browsers mark non-HTTPS sites as “Not Secure,” and search engines prefer HTTPS.

Ports Used

  • HTTP: Port 80
  • HTTPS: Port 443

In system design, HTTPS is a non-negotiable standard for all modern web systems — critical for security, compliance, and user trust.
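A minimal look at the client side of this using Python's standard `ssl` module: the default context already enforces the certificate verification and hostname check that give HTTPS its authentication guarantee.

```python
import ssl

# A default client-side TLS context: it verifies the server's certificate
# chain and checks the hostname, which is what provides HTTPS's
# authentication and integrity guarantees.
ctx = ssl.create_default_context()
print(ctx.verify_mode)     # CERT_REQUIRED: reject servers without a valid cert
print(ctx.check_hostname)  # True: the cert must match the requested hostname

# Such a context is what underlies an HTTPS connection on port 443, e.g.:
#   http.client.HTTPSConnection("example.com", 443, context=ctx)
```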


REST & RESTfulness - API Design Principle

What is REST?

REST (Representational State Transfer) is an architectural style for designing networked applications, primarily web APIs. It defines a set of constraints that make systems scalable, stateless, and easy to maintain.

Core Concept

In REST, clients interact with resources (like users, products, or posts) through standard HTTP methods such as GET, POST, PUT, PATCH, and DELETE. Each resource is identified by a unique URI (Uniform Resource Identifier).

Example:

GET /users/123

retrieves information about user 123.

Key Principles of REST

  1. Client–Server Separation: Client and server are independent; clients handle UI, servers handle data.

  2. Statelessness: Each request contains all necessary information; the server doesn’t store client context.

  3. Uniform Interface: Standardized resource access using consistent HTTP methods and URIs.

  4. Resource Representation: Resources can be represented in multiple formats (usually JSON or XML).

  5. Cacheable: Responses can be cached to improve performance and scalability.

  6. Layered System: Intermediaries (like proxies, gateways, CDNs) can exist between client and server without affecting communication.

In essence: REST provides a simple, scalable, and standardized way to design APIs using existing web protocols — making it the foundation of most modern web services.
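The resource-plus-method idea can be sketched as a tiny dispatch table, where each (HTTP method, URI pattern) pair maps to one operation on a resource (the handler functions here are hypothetical, purely illustrative):

```python
# In-memory resource collection, keyed by ID
users = {"123": {"name": "Vivek"}}

def get_user(user_id):
    return users.get(user_id)

def delete_user(user_id):
    return users.pop(user_id, None)

# Uniform interface: one table from (method, URI pattern) to handler
routes = {
    ("GET", "/users/{id}"): get_user,
    ("DELETE", "/users/{id}"): delete_user,
}

def dispatch(method, user_id):
    return routes[(method, "/users/{id}")](user_id)
```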


Why REST Matters and Why It’s Widely Used in Modern Web Applications

1. Simplicity and Standardization

REST uses standard HTTP methods and status codes, making APIs easy to design, use, and debug. No additional protocol is required — it works seamlessly over the existing web infrastructure.

2. Scalability

Because REST is stateless, servers don’t maintain client context. This allows easy horizontal scaling — multiple servers can handle requests independently.

3. Flexibility and Portability

Clients and servers are loosely coupled, so either side can evolve independently. REST APIs can be consumed by browsers, mobile apps, IoT devices, or microservices.

4. Performance Through Caching

REST supports HTTP caching mechanisms (like ETag, Cache-Control) to reduce redundant requests and improve speed.

5. Wide Adoption and Tooling

Most frameworks, libraries, and platforms have built-in support for REST — making development, monitoring, and integration straightforward.

6. Compatibility with Web Architecture

REST aligns naturally with the HTTP ecosystem — URLs, headers, and responses — making it ideal for building modern distributed systems.

In summary: REST’s simplicity, scalability, and interoperability make it the default choice for web APIs and microservices in modern applications.


REST Constraints (Core Principles of REST Architecture)

REST (Representational State Transfer) isn’t just about using HTTP — it’s defined by a set of architectural constraints. If an API follows these constraints, it is considered RESTful.

Here are the 6 core REST constraints 👇

1. Client–Server Architecture

  • The client (frontend) and server (backend) are separate and independent.
  • The client is responsible for the user interface and user experience.
  • The server handles data storage, business logic, and processing.
  • Benefit: Improves scalability, flexibility, and allows independent evolution of client and server.

2. Statelessness

  • Each HTTP request from the client to the server must contain all information needed to process it.
  • The server does not store any session state about the client between requests.
  • Benefit: Simpler design, easier scalability, and fault tolerance.
  • ⚠️ Challenge: Client must manage state (e.g., authentication tokens, shopping carts).
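A minimal sketch of statelessness, assuming a shared signing key: every request carries a self-contained token, so any server instance can authenticate it without session storage.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # assumption: a shared signing key, for the sketch only

def issue_token(user_id: str) -> str:
    """Sign the user id so it can be verified later without server-side state."""
    sig = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{sig}"

def handle_request(token: str) -> str:
    """Stateless handler: everything needed to authenticate is in the request."""
    user_id, sig = token.split(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid token")
    return f"hello {user_id}"
```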

3. Cacheability

  • Responses from the server should explicitly define whether they are cacheable or not (using HTTP headers like Cache-Control or ETag).
  • Proper caching can improve performance and reduce server load.
  • Benefit: Faster response times and better efficiency.

4. Uniform Interface

This is the core idea that makes REST unique. It defines a standard way for clients and servers to communicate, regardless of implementation.

It includes four key rules:

  1. Resource Identification – Every resource (user, product, post, etc.) is identified by a URI (e.g., /users/101).
  2. Resource Manipulation via Representations – Clients interact with resources through representations (usually JSON or XML).
  3. Self-descriptive Messages – Each message includes enough information to describe how to process it (headers, content type, etc.).
  4. HATEOAS (Hypermedia as the Engine of Application State) – Clients should navigate the API dynamically using hyperlinks in responses (not always strictly followed in real-world APIs).

Benefit: Decouples client and server — making APIs predictable and easy to understand.

5. Layered System

  • REST APIs can have multiple layers (e.g., load balancers, caches, security gateways) between client and server.
  • The client doesn’t need to know which server actually handles its request.
  • Benefit: Increases scalability, security, and flexibility.

6. Code on Demand (Optional)

  • Servers can temporarily extend or customize client functionality by transferring executable code (e.g., JavaScript).
  • This is optional and rarely used in most REST APIs.

Benefit: Adds flexibility. ⚠️ Trade-off: Can reduce visibility and debugging simplicity.

Summary Table

# | Constraint | Description | Benefit
1 | Client–Server | Separation of UI and data | Scalability, flexibility
2 | Stateless | No session on server | Easy scaling, simpler design
3 | Cacheable | Define cache policies | Faster performance
4 | Uniform Interface | Standardized communication | Predictable, decoupled
5 | Layered System | Multi-tier architecture | Scalability, security
6 | Code on Demand | Send executable code | Optional flexibility

RESTful API Design Principles

RESTful APIs follow a set of best practices that make them scalable, reliable, and easy to use. These principles ensure consistency, simplicity, and performance in web service communication.

1. Use Nouns, Not Verbs, in URLs

  • Endpoints should represent resources (nouns), not actions.

  • ✅ Example:

    GET /users/123
    POST /users
    DELETE /users/123
    
  • ❌ Avoid:

    GET /getUser
    POST /createUser
    

2. Use HTTP Methods Correctly

  • GET – Retrieve data
  • POST – Create new resource
  • PUT/PATCH – Update existing resource
  • DELETE – Remove resource

Each method should serve its semantic purpose consistently.

3. Use Proper HTTP Status Codes

  • 200 OK – Request successful
  • 201 Created – Resource created
  • 400 Bad Request – Invalid request
  • 401 Unauthorized / 403 Forbidden – Access denied
  • 404 Not Found – Resource doesn’t exist
  • 500 Internal Server Error – Server-side issue

4. Use Resource Nesting Wisely

  • Keep resource hierarchy logical and shallow. Example:

    GET /users/123/orders
    GET /users/123/orders/456
    

Avoid deep nesting like:

/users/123/orders/456/items/789/payments/456

5. Version Your API

  • Maintain backward compatibility and smooth upgrades. Example:

    /api/v1/users
    /api/v2/users
    

6. Provide Filtering, Sorting, and Pagination

  • Help clients fetch only required data. Example:

    GET /users?limit=10&page=2&sort=name&role=admin
    

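Principle 6 can be sketched as a plain function that applies the same query parameters (`limit`, `page`, `sort`, plus field filters) to an in-memory list; a real API would translate the query string into these arguments.

```python
def list_users(users, limit=10, page=1, sort=None, **filters):
    """Apply filtering, sorting, and pagination like ?limit=&page=&sort=&role=."""
    # Filtering: keep only records matching every filter field exactly
    result = [u for u in users
              if all(u.get(k) == v for k, v in filters.items())]
    # Sorting: order by the requested field, if any
    if sort:
        result.sort(key=lambda u: u[sort])
    # Pagination: slice out the requested page
    start = (page - 1) * limit
    return result[start:start + limit]
```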
7. Use Consistent Naming Conventions

  • Use lowercase and plural nouns for resources. Example: /products, /users, /orders

8. Statelessness

  • Each API call must contain all the data needed to process the request.
  • Server does not store session information.

9. Use JSON (or similar) for Data Exchange

  • JSON is lightweight and widely supported.

  • Response example:

    {
      "id": 123,
      "name": "Vivek",
      "role": "admin"
    }

10. Provide Meaningful Error Messages

  • Include error code, message, and details for debugging.

    {
      "error": "InvalidRequest",
      "message": "Email field is required"
    }

11. Security Best Practices

  • Use HTTPS for encryption.
  • Implement authentication (JWT, OAuth2).
  • Validate all inputs.

12. HATEOAS (Optional Advanced Principle)

  • Include links to related actions or resources within responses.

    {
      "userId": 123,
      "name": "Vivek",
      "links": [
        { "rel": "orders", "href": "/users/123/orders" }
      ]
    }

Real-Time Communication Protocol

Introduction to Real-Time Communication

Real-time communication (RTC) refers to the instant exchange of data between systems with minimal latency, enabling users or services to interact live without noticeable delay.

It allows continuous, immediate data flow between clients and servers — unlike traditional request-response models where communication happens only when initiated by the client.

Why It Matters

In modern applications, speed and interactivity are crucial. Real-time systems make apps more engaging, responsive, and user-friendly by delivering updates instantly as they happen.

Common Use Cases

  • Chat and messaging apps (WhatsApp, Slack)
  • Live streaming and gaming
  • Collaborative tools (Google Docs, Figma)
  • Stock market dashboards and trading systems
  • IoT and sensor data monitoring

Key Characteristics

  1. Low latency — data transfer happens within milliseconds.
  2. Continuous connection — unlike HTTP’s request-response model.
  3. Event-driven architecture — actions trigger immediate updates.
  4. Scalability and reliability — essential for handling large concurrent connections.

In essence, real-time communication bridges the gap between user action and system response, enabling seamless, live digital experiences.


Real-Time Communication Protocols

Real-time communication protocols enable instant data exchange between clients and servers — essential for chat apps, gaming, live dashboards, collaborative tools, and financial systems.

1. WebSocket

  • A full-duplex, persistent connection between client and server over a single TCP connection.

  • After the initial HTTP handshake, communication happens in both directions simultaneously.

  • Ideal for: Chat apps, live updates, multiplayer games, and trading platforms.

  • Example Flow:

    Client → Server: WebSocket handshake (via HTTP)
    Server → Client: Connection upgrade (HTTP 101)
    Then → Real-time bidirectional data exchange
    

2. Server-Sent Events (SSE)

  • A unidirectional channel where the server pushes data to the client over HTTP.

  • Lightweight and efficient for continuous updates like stock prices or notifications.

  • Unlike WebSockets, the client cannot send data back on the same connection.

  • Example:

    GET /events
    Content-Type: text/event-stream
    
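A `text/event-stream` body consists of optional `event:` lines and one or more `data:` lines, each event terminated by a blank line. A small formatter for that framing (a sketch of the wire format, not a full SSE server):

```python
def sse_event(data, event=None):
    """Format one server-sent event using text/event-stream framing:
    an optional 'event:' line, 'data:' lines, and a blank-line terminator."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for chunk in data.splitlines() or [""]:
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"
```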

3. Long Polling

  • A fallback mechanism where the client sends a request and the server holds the connection open until new data arrives (or a timeout expires); the client then immediately sends the next request.
  • Simulates real-time updates when WebSockets are unavailable.
  • Common in legacy systems or where infrastructure doesn’t support persistent connections.
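The server side of one long-poll cycle can be sketched with a blocking queue: the handler waits until data arrives or a timeout expires, then responds, and the client immediately re-polls. (A toy model; a real server would do this per connection.)

```python
import queue

def long_poll(q, timeout=30):
    """Handle one long-poll request: block until data arrives or time out."""
    try:
        return {"status": 200, "data": q.get(timeout=timeout)}
    except queue.Empty:
        # No new data within the window; client should simply poll again
        return {"status": 204, "data": None}
```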

4. MQTT (Message Queuing Telemetry Transport)

  • A lightweight publish-subscribe protocol designed for low-bandwidth and high-latency networks.
  • Commonly used in IoT systems for sensor communication.
  • Works on TCP and supports Quality of Service (QoS) levels for reliable delivery.

5. WebRTC (Web Real-Time Communication)

  • A peer-to-peer communication protocol for audio, video, and data sharing directly between browsers.
  • Media and data flow directly between peers, avoiding a central relay in most cases (signaling servers are still used to establish connections, and TURN relays serve as a fallback).
  • Used in: Zoom, Google Meet, and multiplayer browser-based apps.

Comparison Summary

Protocol | Type | Direction | Ideal Use Case
WebSocket | Full-duplex | Client ↔ Server | Real-time chat, live data
SSE | One-way | Server → Client | Notifications, live feeds
Long Polling | Simulated real-time | Client → Server | Legacy support
MQTT | Pub/Sub | Client ↔ Broker | IoT communication
WebRTC | Peer-to-peer | Client ↔ Client | Audio/video streaming

Modern API Protocols – Beyond REST (gRPC, GraphQL)

As systems grow in complexity, REST alone may not efficiently handle high-performance, data-heavy, or microservice-based architectures. This led to the rise of modern API protocols like gRPC and GraphQL, which address REST’s limitations in flexibility, speed, and efficiency.


1. gRPC (Google Remote Procedure Call)

Definition: gRPC is a high-performance, open-source RPC framework developed by Google. It uses HTTP/2 for transport and Protocol Buffers (Protobuf) for data serialization.

Key Features:

  • Binary data format (Protobuf) → smaller payloads, faster transmission.
  • HTTP/2 multiplexing → multiple requests on one connection.
  • Strongly typed contracts defined in .proto files.
  • Bidirectional streaming supported (client ↔ server).
  • Ideal for microservices and internal service-to-service communication.

Example Use Case:

  • Communication between backend services in distributed systems (e.g., authentication service ↔ payment service).

2. GraphQL

Definition: GraphQL is a query language for APIs developed by Facebook. It allows clients to request exactly the data they need, reducing over-fetching and under-fetching common in REST.

Key Features:

  • Single endpoint for all queries and mutations.
  • Client-controlled data fetching — specify fields in query.
  • Strongly typed schema for predictable responses.
  • Supports real-time updates via subscriptions.

Example Use Case:

  • Mobile and web apps needing optimized, flexible data fetching (e.g., social media feeds, dashboards).

Comparison Table

Feature | REST | gRPC | GraphQL
Transport Protocol | HTTP/1.1 | HTTP/2 | HTTP/1.1 or HTTP/2
Data Format | JSON | Protobuf (binary) | JSON
Communication | Request–response | Unary, streaming | Query-based
Performance | Moderate | Very high | High for selective data
Use Case | Public APIs | Microservices | Flexible client APIs

In summary:

  • Use gRPC when you need speed and type safety between services.
  • Use GraphQL when clients need flexible, optimized data fetching.
  • Both coexist with REST — chosen based on system goals and data needs.

Why Do We Need More Than REST?

While REST has been the backbone of web APIs for years, modern applications have evolved — demanding faster, more flexible, and efficient communication patterns. REST’s simplicity becomes a limitation at scale or in complex data scenarios.

1. Over-Fetching and Under-Fetching

  • REST endpoints return fixed data structures.
  • Clients often receive more data than needed (over-fetching) or less data (under-fetching), requiring multiple calls.
  • Example: A mobile app may only need a user’s name but receives their entire profile.

➡️ GraphQL solves this by letting clients query exactly what they need.

2. Inefficient for Microservices

  • Modern architectures use dozens of services communicating internally.
  • REST’s text-based JSON and multiple HTTP connections can create performance bottlenecks.

➡️ gRPC provides binary serialization (Protobuf) and HTTP/2 multiplexing, making it faster and more efficient for service-to-service calls.

3. Lack of Real-Time Support

  • REST is request-response based and stateless.
  • It doesn’t support real-time communication like streaming or live updates.

➡️ gRPC streaming and GraphQL subscriptions handle real-time data seamlessly.

4. Schema Evolution and Strong Typing

  • REST APIs don’t enforce strict type safety or schemas, leading to potential integration issues.
  • Difficult to evolve without breaking clients.

➡️ gRPC and GraphQL enforce typed schemas, improving reliability and backward compatibility.

5. Multiple Client Platforms

  • Modern systems serve web, mobile, IoT, and edge clients — each with different data needs.
  • A single REST response may not fit all.

➡️ GraphQL gives clients control over data shape, improving flexibility and efficiency.

In summary: We need gRPC and GraphQL because REST — though simple and reliable — struggles with performance, flexibility, and real-time communication in modern distributed systems.


How gRPC Works

gRPC (Google Remote Procedure Call) enables communication between services as if they were calling local functions, even though they run on different machines. It relies on Protocol Buffers (Protobuf) for message serialization and HTTP/2 for transport.

Step-by-Step Workflow

  1. Define the Service (.proto file)

    • Developers define the service methods and message types in a .proto file. Example:
    syntax = "proto3";
    
    service UserService {
      rpc GetUser (UserRequest) returns (UserResponse);
    }
    
    message UserRequest {
      string user_id = 1;
    }
    
    message UserResponse {
      string name = 1;
      int32 age = 2;
    }
  2. Generate Code

    • The .proto file is compiled using the gRPC compiler (protoc).
    • It generates client and server stubs in multiple languages (Java, Go, Python, etc.).
    • These stubs handle network communication automatically.
  3. Server Implementation

    • The server implements the defined methods.

    • Example:

      # UserServiceServicer and UserResponse come from the protoc-generated stubs
      class UserService(UserServiceServicer):
          def GetUser(self, request, context):
              return UserResponse(name="Vivek", age=22)
  4. Client Calls

    • The client uses the generated stub to call the remote method as if it were local.

    • Example:

      response = stub.GetUser(UserRequest(user_id="123"))
      print(response.name)
  5. Data Transmission

    • gRPC serializes the request using Protocol Buffers → sends over HTTP/2.
    • The server deserializes it, executes logic, and returns a serialized response.

Key Features in Action

  • HTTP/2: Enables multiplexing, header compression, and streaming.

  • Protobuf: Binary, compact, and faster than JSON.

  • Streaming Support:

    • Unary RPC → Single request–response
    • Server streaming → One request, multiple responses
    • Client streaming → Multiple requests, one response
    • Bidirectional streaming → Continuous data flow both ways

In short: gRPC turns network calls into lightweight, fast, type-safe function calls — perfect for microservices and real-time systems.


gRPC – Use Cases and When to Use

1. Microservices Communication

  • gRPC is ideal for internal service-to-service communication in distributed systems.
  • It provides low latency, type safety, and efficient serialization, making it faster than REST for backend communication.
  • Example: Authentication Service ↔ Payment Service ↔ Order Service.

2. Real-Time Streaming

  • gRPC supports bidirectional streaming, allowing continuous data flow.
  • Useful for live chat, real-time analytics, IoT telemetry, or stock price updates.

3. Polyglot Environments

  • gRPC supports multiple programming languages (Go, Java, Python, C++, etc.).
  • Perfect for teams building systems with mixed tech stacks that need a unified communication protocol.

4. Low Bandwidth or High-Performance Systems

  • Uses binary Protocol Buffers, which are smaller and faster than JSON.
  • Ideal for low-bandwidth networks, IoT, and high-performance computing systems.

5. Internal APIs (Not Public)

  • gRPC is not well suited to browser-based clients (browsers cannot control HTTP/2 framing directly, so native gRPC requires a proxy such as gRPC-Web).
  • Best suited for internal or backend APIs, not public-facing ones.

When to Use gRPC

✅ When you need high performance and low latency.
✅ When building microservices with frequent internal communication.
✅ When your system requires streaming or real-time updates.
✅ When you need strong typing and contract-based APIs.
✅ When bandwidth efficiency is critical (IoT, mobile backends).

When to Avoid gRPC

🚫 For public web APIs (REST/GraphQL are easier for browsers).
🚫 When debugging simplicity and human-readable payloads are priorities.
🚫 If your clients don’t support HTTP/2 or Protobuf.


In summary: Use gRPC for high-speed, internal, strongly typed, and streaming-based service communication — it excels where REST struggles in performance and efficiency.


How GraphQL Works

GraphQL is a query language and runtime for APIs that lets clients request exactly the data they need — no more, no less. It replaces REST’s multiple endpoints with a single flexible endpoint, improving efficiency and client control.

1. Define the Schema

  • The server defines a GraphQL schema describing data types and operations (queries, mutations, subscriptions).

  • Example:

    type User {
      id: ID!
      name: String!
      age: Int
    }
    
    type Query {
      getUser(id: ID!): User
    }

2. Single Endpoint

  • Unlike REST (multiple endpoints like /users, /posts), GraphQL exposes one endpoint, e.g.:

    POST /graphql
    

3. Client Sends a Query

  • The client specifies exactly what fields it wants.

  • Example:

    {
      getUser(id: "123") {
        name
        age
      }
    }

4. Server Executes Resolvers

  • Each field in the schema has a resolver function that fetches data from a database, another API, or microservice.

  • Example:

    const resolvers = {
      Query: {
        getUser: (_, { id }) => db.users.findById(id),
      },
    };

5. Return Exactly Requested Data

  • The server responds only with the requested fields, reducing over-fetching.

    {
      "data": {
        "getUser": {
          "name": "Vivek",
          "age": 22
        }
      }
    }
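The field-selection step can be mimicked in plain Python (a toy resolver over an in-memory `db`, not a real GraphQL runtime): the resolver fetches the whole record but returns only the fields named in the query.

```python
# In-memory data source standing in for a database
db = {"123": {"id": "123", "name": "Vivek", "age": 22, "email": "v@example.com"}}

def get_user(user_id, fields):
    """Resolve a user but include only the fields the client asked for."""
    user = db[user_id]
    return {f: user[f] for f in fields}

# A query asking for name and age returns just those fields:
result = {"data": {"getUser": get_user("123", ["name", "age"])}}
```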

6. Real-Time Updates (Optional)

  • Using GraphQL Subscriptions, clients can get live updates over WebSockets.

  • Example:

    subscription {
      onUserUpdate {
        id
        name
      }
    }

Key Advantages

  • Single endpoint simplifies API management.
  • Client-driven queries eliminate over-fetching.
  • Strongly typed schema improves consistency.
  • Supports real-time updates via subscriptions.

In short: GraphQL works by letting the client define the data structure, while the server resolves only what’s requested — leading to more efficient, flexible, and maintainable APIs.


GraphQL – Use Cases and When to Use

1. Complex or Evolving Frontends

  • Ideal for modern web and mobile apps where different screens or devices need different data structures.
  • Example: A mobile app might need fewer fields than a desktop web app.
  • GraphQL lets each client query only the fields it needs from the same endpoint.

2. Aggregating Multiple Data Sources

  • GraphQL can combine data from multiple APIs or databases into a single unified schema.
  • Example: A dashboard app pulling data from users, orders, and payments services — all resolved in one query instead of multiple REST calls.

3. Reducing Over-Fetching and Under-Fetching

  • REST APIs often return too much or too little data.
  • GraphQL gives precise control over data shape, improving network efficiency and performance.

4. Rapid Product Iteration

  • When frontend teams frequently change UI and data needs, GraphQL allows them to adjust queries without backend changes — speeding up development.

5. Real-Time Applications

  • Using GraphQL Subscriptions, clients can receive real-time updates (e.g., chat apps, notifications, live feeds) without constant polling.

When to Use GraphQL

✅ When clients have diverse data needs (web, mobile, IoT).
✅ When aggregating data from multiple sources or microservices.
✅ When optimizing network performance by fetching only necessary data.
✅ When rapid frontend iteration is required.
✅ When supporting real-time updates with subscriptions.

When to Avoid GraphQL

🚫 When API traffic is simple and predictable (REST may be simpler).
🚫 When binary or large file uploads are frequent (REST/gRPC perform better).
🚫 When you lack tooling for caching and rate-limiting (harder in GraphQL).
🚫 When teams are small — REST is easier to set up and maintain.

In summary: Use GraphQL when you need flexibility, optimized data fetching, and real-time capabilities across multiple clients — especially in data-rich or fast-changing applications.


Architectural Pattern

What is Software Architecture?

Software architecture defines the high-level structure of a system — how components are organized, interact, and communicate with each other. It provides a blueprint for building scalable, maintainable, and reliable software systems.

Key Elements of Software Architecture

  1. Components: Independent modules or services that perform specific functions.

  2. Connectors: Define how components communicate — e.g., APIs, message queues, or function calls.

  3. Data Flow: How data moves between different layers or modules.

  4. Design Principles: Includes separation of concerns, modularity, and loose coupling.

Why It Matters

  • Ensures scalability as systems grow.
  • Improves maintainability and ease of development.
  • Helps manage complexity in large applications.
  • Supports better fault isolation and independent deployments.

In short, software architecture is the foundation of system design, guiding how different parts of a system fit and work together to meet business and technical goals.


Monolithic Architecture

Monolithic architecture is a traditional software design pattern where the entire application is built as a single, unified unit. All modules — such as authentication, database access, business logic, and UI — are tightly coupled and run within the same process.

Key Characteristics

  • Single codebase and deployment unit
  • Shared memory and resources
  • Tightly coupled components
  • Centralized data management

Example

A web application where login, product catalog, and order management all reside in one project and are deployed together as a single .jar, .war, or .exe file.

Advantages

  1. Simple to develop and deploy — one build, one deployment pipeline.
  2. Easy to test — everything runs in a single environment.
  3. Good performance — internal calls are fast since they happen in-process.
  4. Easier debugging — logs and errors are centralized.

Disadvantages

  1. Scalability limitations — cannot scale individual components.
  2. Hard to maintain — changes in one area can affect others.
  3. Slower development — larger codebase increases complexity.
  4. Deployment risk — a small change requires redeploying the entire app.
  5. Technology lock-in — hard to adopt new languages or frameworks.

When to Use

✅ Small to medium-sized applications.
✅ When the team is small and deployment simplicity is important.
✅ When scalability and modularity are not immediate concerns.

In summary: Monolithic architecture is simple and fast to start, but less flexible and harder to scale as systems grow.


Layered (N-Tier) Architecture

Layered architecture — also known as N-tier architecture — is one of the most common software design patterns where an application is divided into logical layers, each with a specific responsibility. Each layer communicates only with the layer directly below or above it, promoting separation of concerns.

Typical Layers in a 3-Tier Architecture

  1. Presentation Layer (UI Layer)

    • Handles the user interface and user interaction.
    • Sends user input to the business layer and displays output.
    • Example: HTML/CSS, React.js, Angular, Flutter.
  2. Business Logic Layer (Service Layer)

    • Contains core logic and business rules.
    • Processes data, performs calculations, and coordinates between UI and Data layers.
    • Example: Java, Node.js, Spring Boot, Express.js.
  3. Data Access Layer (Persistence Layer)

    • Manages data storage and retrieval.
    • Handles database connections, queries, and ORM (Object Relational Mapping).
    • Example: MySQL, MongoDB, PostgreSQL via Sequelize, Hibernate, etc.

Optional Additional Layers

  • Integration Layer → For external API or third-party service communication.
  • Cache Layer → For improving performance with Redis or Memcached.
  • Security Layer → For authentication, authorization, and data protection.

Diagram

+----------------------+
|   Presentation Layer |
| (UI / API Gateway)   |
+----------▲-----------+
           |
+----------▼-----------+
| Business Logic Layer |
| (Services / Rules)   |
+----------▲-----------+
           |
+----------▼-----------+
|  Data Access Layer   |
| (Database / ORM)     |
+----------------------+
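The three layers in the diagram can be sketched as follows, with each layer talking only to the one directly below it (the class and function names here are hypothetical, for illustration):

```python
# Data Access Layer: owns storage details
class UserRepository:
    def __init__(self):
        self._rows = {1: "Vivek"}  # stand-in for a database table

    def find_name(self, user_id):
        return self._rows.get(user_id)

# Business Logic Layer: applies rules, talks only to the data layer
class UserService:
    def __init__(self, repo):
        self._repo = repo

    def greeting(self, user_id):
        name = self._repo.find_name(user_id)
        if name is None:
            raise LookupError("unknown user")
        return f"Hello, {name}!"

# Presentation Layer: formats output, talks only to the service layer
def render_profile(service, user_id):
    try:
        return service.greeting(user_id)
    except LookupError:
        return "404 Not Found"
```

Because each layer depends only on the one below, the repository could be swapped for a real database client without touching the presentation code.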

Advantages

✅ Separation of concerns — each layer has a clear responsibility.
✅ Easier to maintain — changes in one layer rarely affect others.
✅ Reusability — layers can be reused across multiple projects.
✅ Testability — each layer can be tested independently.
✅ Scalability — can scale specific layers as needed.

Disadvantages

❌ Performance overhead — multiple layers add latency.
❌ Rigid dependencies — changes in one layer may require updates to interfaces above it.
❌ Complex deployment — in multi-tier systems, deployment and versioning can be tricky.

When to Use

  • When you need clear separation between UI, business logic, and data.
  • When building enterprise applications or web services.
  • When maintainability, testability, and scalability are important.

In summary: Layered architecture organizes code into structured tiers for maintainability and scalability, making it ideal for medium to large enterprise systems.


🧩 Microservices Architecture

Microservices architecture is a modern architectural style where an application is broken down into a collection of small, independent services, each responsible for a specific business function and communicating through lightweight APIs (usually HTTP or message queues).

⚙️ Definition

Microservices architecture structures an application as a set of loosely coupled, independently deployable services that work together to deliver business value.

Each service:

  • Has its own codebase
  • Can be developed, deployed, and scaled independently
  • Often owns its own database (decentralized data management)

🧱 Core Characteristics

  1. Independence – Each service runs in its own process.
  2. Decentralization – No single shared database; each service manages its own data.
  3. Lightweight Communication – Services interact via HTTP (REST/gRPC) or message brokers (Kafka, RabbitMQ).
  4. Autonomous Teams – Different teams manage different services.
  5. Polyglot Freedom – Each service can use different languages, frameworks, or databases.
  6. Resilience – Failure in one service does not crash the entire system.

🧠 Architecture Diagram

           +--------------------+
           |   API Gateway      |
           +---------+----------+
                     |
     ------------------------------------------
     |           |            |               |
+----------+ +----------+ +----------+ +-----------+
|  Auth    | |  Orders  | |  Users   | |  Payments |
| Service  | | Service  | | Service  | |  Service  |
+----------+ +----------+ +----------+ +-----------+
     |           |            |               |
  +------+    +------+     +------+       +------+
  | DB A |    | DB B |     | DB C |       | DB D |
  +------+    +------+     +------+       +------+

🚀 Advantages

✅ Independent Deployment – Each service can be deployed without affecting others.
✅ Scalability – Scale only the services that need it.
✅ Fault Isolation – One service failure won’t take down the entire system.
✅ Technology Flexibility – Use different stacks per service.
✅ Faster Development – Parallel development by multiple teams.

⚠️ Disadvantages

❌ Complexity – More services = more network calls, configs, monitoring, and debugging.
❌ Data Consistency – Managing distributed transactions is harder.
❌ Deployment & Ops Overhead – Requires DevOps maturity (Docker, Kubernetes, CI/CD).
❌ Latency – Network calls between services add overhead.

🧩 When to Use

  • Large applications needing high scalability and continuous delivery.
  • Systems managed by multiple teams working on separate domains.
  • Applications requiring rapid deployment cycles.
  • Projects adopting cloud-native or Kubernetes infrastructure.

💬 Examples

  • Netflix (streaming & recommendations as separate services)
  • Amazon (cart, payment, search — all independent)
  • Uber (trip, driver, location services, etc.)

In summary: Microservices architecture enables flexibility, scalability, and independent deployment, making it ideal for large-scale, cloud-based, and fast-moving organizations — but it requires strong DevOps, monitoring, and orchestration practices.


Event-Driven Architecture (EDA)

Event-Driven Architecture (EDA) is a software design pattern where system components communicate through events — notifications that something has happened in the system. Instead of direct service-to-service calls, components emit, consume, and react to events asynchronously, enabling loose coupling, scalability, and real-time responsiveness.

🧠 Definition

Event-Driven Architecture is a pattern where components produce and consume events via a message broker or event bus, allowing asynchronous communication between independent services.

🔄 Core Concept

  • Event: A record of something that has occurred (e.g., “OrderPlaced”, “UserSignedUp”).
  • Producer: Component that emits events.
  • Consumer: Component that listens for and reacts to events.
  • Event Broker: Middleware (like Kafka, RabbitMQ) that routes events between producers and consumers.

🏗️ Architecture Diagram

   +-------------+           +-------------------+
   |  Order      |  emits    |  Event Broker     |
   |  Service    +---------->+  (Kafka / Rabbit) |
   +-------------+           +---------+---------+
                                      |
                    +-----------------+-------------------+
                    |                                     |
          +-------------------+               +-------------------+
          |  Inventory        |               |  Notification     |
          |  Service          |               |  Service          |
          +-------------------+               +-------------------+
           consumes "OrderPlaced"              consumes "OrderPlaced"

⚙️ Key Characteristics

  1. Asynchronous communication – No waiting for direct responses.
  2. Loose coupling – Services don’t need to know about each other.
  3. Scalable & resilient – Event brokers handle spikes in workload.
  4. Reactive – System reacts instantly to changes or actions.
  5. Event persistence – Events can be stored for replay or audit.

🚀 Advantages

✅ High decoupling – Each component evolves independently.
✅ Real-time responsiveness – Great for notification or streaming systems.
✅ Scalable – Events can be processed in parallel.
✅ Resilient – Failure in one consumer doesn’t affect others.
✅ Auditability – Events provide a clear log of system activity.

⚠️ Disadvantages

❌ Complex debugging – Tracing event flow across systems is hard.
❌ Event ordering issues – Ensuring correct processing order can be tricky.
❌ Duplicate handling – Consumers may receive the same event multiple times.
❌ Event schema evolution – Managing backward compatibility is challenging.

🧩 When to Use

  • Real-time systems (e.g., financial transactions, IoT, stock trading).
  • Microservices communication where loose coupling is needed.
  • Notification & streaming platforms (e.g., email alerts, analytics).
  • Complex workflows triggered by multiple independent events.

🧰 Common Technologies

  • Event Brokers: Apache Kafka, RabbitMQ, Amazon SNS/SQS, Google Pub/Sub, Redis Streams.
  • Event Storage: Kafka topics, event stores, DynamoDB Streams.
  • Frameworks: Spring Cloud Stream, AWS Lambda (event triggers).

💬 Example

  • E-commerce app:

    • Order Service emits “OrderPlaced”
    • Inventory Service consumes and updates stock
    • Notification Service sends confirmation email
    • Analytics Service tracks order metrics

In summary: Event-Driven Architecture enables asynchronous, real-time communication between independent services. It improves scalability and responsiveness but requires careful event management, monitoring, and consistency handling.


Factors Influencing Architecture Selection — When to Use Which One

Choosing the right system architecture depends on business needs, team size, scalability goals, and system complexity. Below are key factors that influence your decision and guidance on when to use Monolithic, Layered, Microservices, or Event-Driven architectures.

1. System Size and Complexity

  • Small/simple applications: → Use Monolithic — easy to develop, deploy, and manage.
  • Medium complexity: → Use Layered (N-Tier) — provides structure while keeping simplicity.
  • Large-scale/distributed systems: → Use Microservices or Event-Driven — better for scalability and flexibility.

2. Team Size and Skillset

  • Small teams:Monolithic or Layered — less coordination needed, faster delivery.
  • Large or specialized teams:Microservices — teams can own individual services independently. → Event-Driven — for teams experienced in async and distributed systems.

3. Deployment and Release Strategy

  • Single deployment preferred:Monolithic or Layered architectures.
  • Frequent independent deployments needed:Microservices — allows independent service updates. → Event-Driven — decouples release cycles.

4. Scalability Requirements

  • Low to moderate scalability:Layered — vertical scaling works fine.
  • High scalability:Microservices — horizontal scaling at service level. → Event-Driven — asynchronous load handling and auto-scaling.

5. Fault Tolerance and Resilience

  • Simple recovery sufficient:Monolithic or Layered.
  • Need for high availability and isolation:Microservices or Event-Driven — service failures are isolated.

6. Performance and Latency

  • Low latency & real-time systems:Event-Driven — supports async, near real-time communication.
  • Low overhead & simplicity:Monolithic — single process with minimal network hops.

7. Maintainability and Modularity

  • Simple maintenance goal:Layered Architecture.
  • Evolving product with frequent changes:Microservices — easier to update, replace, or scale parts independently.

8. Business and Organizational Needs

  • Startups / MVPs:Monolithic — fast to build and iterate.
  • Growing businesses:Layered → Microservices (gradual evolution).
  • Large enterprises or data-intensive apps:Event-Driven — supports scalability and async workflows.

9. Example Mapping

| Use Case | Recommended Architecture | Reason |
|---|---|---|
| Small web app or MVP | Monolithic | Simple, fast to deploy |
| Traditional enterprise app | Layered | Structured and maintainable |
| Large distributed web app | Microservices | Independent scaling and deployments |
| Real-time streaming/IoT system | Event-Driven | Asynchronous, high throughput |
| E-commerce or fintech | Microservices + Event-Driven | Scalable, fault-tolerant, reactive |

Summary

  • Monolithic: Best for small teams & simple apps.
  • Layered: Structured, ideal for medium-sized applications.
  • Microservices: Highly scalable and flexible, for complex systems.
  • Event-Driven: Best for asynchronous, real-time, and reactive systems.

In short:

Choose architecture based on system scale, team maturity, performance goals, and scalability needs — not just technology trends.


Web Concepts in System Design

Why Learn Web Concepts?

Understanding web concepts is essential in system design because most modern applications — from social networks to fintech platforms — are built over the web infrastructure. A solid grasp of how the web works helps engineers design scalable, reliable, and high-performance systems.

Key Reasons

  1. Foundation of Internet-based Systems

    • Most distributed systems communicate over HTTP/HTTPS.
    • Knowing how requests, responses, and protocols work enables better architectural decisions.
  2. Performance Optimization

    • Understanding caching, CDNs, and compression helps reduce latency and bandwidth usage.
    • Helps design systems that deliver content faster across the globe.
  3. Security Awareness

    • Knowledge of SSL/TLS, authentication, and encryption helps build secure APIs and services.
  4. Scalability and Load Management

    • Concepts like DNS, proxies, and load balancers are essential for handling millions of users efficiently.
  5. Better API and Service Design

    • Understanding REST, GraphQL, and gRPC leads to cleaner, more reliable communication between services.
  6. Effective Debugging and Monitoring

    • Knowing the full web request flow — from client to server — helps identify bottlenecks or failures quickly.
  7. Interview Relevance

    • Web concepts are fundamental in system design interviews, forming the base for topics like caching, scaling, and data flow.

In summary: Learning web concepts builds the foundation for designing systems that are fast, scalable, secure, and maintainable — all critical for high-performance backend engineering.



Web Sessions: Managing State in Web Applications

Introduction

The web is stateless by default, meaning each HTTP request is independent and doesn’t retain user information between interactions. However, real-world applications often need to remember user data (like login status, cart items, or preferences). This is where web sessions come in — they enable the server to maintain state across multiple client requests.

What is a Web Session?

A session is a temporary interaction period between a client and a server. It stores user-specific data on the server (or a shared store) and identifies the user using a unique session ID.

How It Works

  1. User logs in or performs an action → server creates a session.
  2. The session ID is stored in a cookie and sent to the client.
  3. For each subsequent request, the client sends back the session ID.
  4. The server retrieves user data using that session ID.
Client ---> Login ---> Server creates session ---> Sends Session ID
Client <--- Receives cookie (session ID)
Client ---> Makes request with cookie ---> Server identifies session

Session Storage Options

  1. In-memory (stored in the application server’s own process memory) – simple, but not scalable across multiple servers.

  2. External session store – scalable and distributed options like:

    • Redis
    • Memcached
    • Database (less common for high performance systems)

Key Concepts

  • Session ID: Unique token used to identify the session.
  • Cookies: Store and send the session ID with each request.
  • Session Expiry: Defines how long a session stays valid.
  • Secure Sessions: Session data must be encrypted and transmitted over HTTPS.

Challenges

  • Scalability: In-memory sessions don’t work well with multiple servers (load-balanced systems).
  • Security: Session hijacking or fixation attacks.
  • Persistence: Sessions should expire or invalidate properly.

Solutions

  • Use sticky sessions (bind user to one server) — simple but less scalable.
  • Use centralized session stores like Redis — scalable and distributed.
  • Implement token-based authentication (JWT) — stateless alternative.

In Summary

Web sessions enable state management in otherwise stateless HTTP communication, allowing personalized and secure user experiences. Modern scalable systems often use distributed session stores or stateless tokens (JWT) depending on system requirements.


Why Web Sessions Matter

1. Overcoming Statelessness of HTTP

  • HTTP is stateless, meaning each request is independent and doesn’t remember previous interactions.
  • Sessions bridge this gap by preserving user context between multiple requests (e.g., keeping a user logged in).

2. User Authentication & Personalization

  • Sessions store user identity and preferences after login.
  • They enable personalized experiences — such as showing a user’s profile, cart, or dashboard data without repeated authentication.

3. Security Management

  • Sessions help implement secure access control, ensuring that only authenticated users access certain pages or APIs.
  • They allow features like session expiration and logout handling to enhance security.

4. Performance Optimization

  • Instead of revalidating user credentials or fetching data repeatedly, sessions store necessary info temporarily — reducing redundant computation and database queries.

5. Consistent User Experience

  • Sessions maintain stateful continuity, such as:

    • Shopping carts in e-commerce sites
    • Multi-step forms
    • Preferences across navigation

6. Integration with Scalable Architectures

  • Even in distributed systems, sessions (or token-based equivalents like JWTs) ensure consistent identity tracking across multiple backend servers.

In summary: Web sessions are vital because they enable stateful, secure, and personalized interactions in an inherently stateless web environment, improving both user experience and system efficiency.


Understanding Statelessness in HTTP

What Does “Stateless” Mean?

HTTP is a stateless protocol, meaning each request from a client to a server is independent — the server does not remember any previous requests from that client.

In simple terms, every HTTP request is treated as a new interaction, even if it comes from the same user.

Example

When you log in to a website:

  • Without state management — the server forgets you’re logged in after each request.
  • With state management — the server “remembers” your session, so you stay logged in.

Why HTTP is Stateless

  • It was designed for simplicity and scalability.
  • Statelessness makes servers lighter, faster, and easier to scale horizontally because they don’t store user session data by default.

Techniques for Maintaining State

Since web apps need to remember user data (like login status, cart items, or preferences), developers use several techniques to simulate state in a stateless environment.

1. Cookies

  • Small pieces of data stored in the client’s browser.
  • Sent automatically with each request to the same domain.
  • Commonly used to store session IDs, preferences, or auth tokens.

🟢 Example: Set-Cookie: session_id=abc123; HttpOnly; Secure

2. Server-Side Sessions

  • Server maintains user-specific data mapped to a session ID.
  • The session ID is stored in a cookie on the client.
  • Data can be stored in memory, Redis, or a database.

🟢 Best for: Secure or sensitive applications (e.g., banking, admin dashboards).

3. Tokens (JWT – JSON Web Tokens)

  • Stateless method: all session info is encoded in the token itself.
  • Server only needs to verify the token, not store session data.
  • Common in modern microservice or API-based architectures.

🟢 Example: Authorization: Bearer <jwt_token>

4. Hidden Form Fields / URL Parameters

  • Passes state data (like user ID or cart info) through forms or URLs.
  • Simple but not secure — rarely used in modern systems.

5. Client-Side Storage

  • Uses LocalStorage or SessionStorage in browsers to persist small amounts of data.
  • Good for caching UI preferences or JWTs.

🟢 Example: localStorage.setItem("auth_token", token)

Summary Table

| Technique | State Stored | Scalability | Security | Common Use |
|---|---|---|---|---|
| Cookies | Client | High | Medium | Preferences, session ID |
| Server-Side Session | Server | Medium | High | Authenticated sessions |
| JWT Token | Client | Very High | High | APIs, mobile apps |
| LocalStorage | Client | High | Low | UI settings, tokens |
| Hidden Fields / URL | Client | High | Low | Temporary data passing |

In summary: HTTP is stateless by design, but web apps use cookies, sessions, or tokens to maintain state across requests — enabling secure, consistent, and personalized user experiences.


🧩 Serialization: Data Exchange & Storage Formats

🔹 What is Serialization?

Serialization is the process of converting an in-memory object (like a Python, Java, or Go object) into a byte stream or text format that can be:

  • Stored (in a file, cache, or database)
  • Transmitted (over a network between client and server)

The reverse process is called Deserialization.

🔹 Why Serialization Matters?

Serialization is critical in distributed systems and web applications because it enables:

  • 🔄 Data exchange between different services (e.g., client ↔ server)
  • 🗃️ Persistent storage (saving structured data)
  • 🌍 Cross-language communication (e.g., Java backend ↔ JS frontend)
  • ⚙️ API performance optimization — choosing the right format can reduce latency and bandwidth usage.

🔹 Common Serialization Formats

| Format | Type | Human-readable | Typical Use |
|---|---|---|---|
| JSON | Text | ✅ Yes | Web APIs, config files |
| XML | Text | ✅ Yes (but verbose) | Legacy systems, SOAP APIs |
| Protobuf (Protocol Buffers) | Binary | ❌ No | High-performance microservices, gRPC |

🔹 Trade-offs: Readability vs Efficiency vs Compatibility

| Aspect | JSON | XML | Protobuf |
|---|---|---|---|
| Readability | ✅ Easy to read | ⚠️ Verbose | ❌ Not human-readable |
| Efficiency (Size & Speed) | ⚠️ Moderate | ❌ Slow & large | ✅ Fast & compact |
| Compatibility (Cross-language) | ✅ Widely supported | ✅ Widely supported | ✅ Supported in many languages, but needs schema |
| Schema Requirement | ❌ None | ⚠️ Optional (XSD) | ✅ Requires .proto file |
| Best Use Case | REST APIs, config, web apps | Legacy, enterprise apps | gRPC, microservices, low-latency systems |

🔹 Serialization in Action

Example: Let’s say we have a simple User object:

{
  "id": 101,
  "name": "Vivek",
  "email": "vivek@xyz.com"
}
  • JSON → Used by frontend & REST APIs (easy to read and debug)
  • XML → Used in older enterprise APIs (<user><id>101</id>...</user>)
  • Protobuf → Compact binary form transmitted in gRPC call
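
The JSON round-trip for this object can be shown directly in Node.js; measuring the payload size hints at why compact binary formats like Protobuf matter for high-volume traffic:

```javascript
const user = { id: 101, name: "Vivek", email: "vivek@xyz.com" };

const wire = JSON.stringify(user);             // serialize: object -> text
const bytes = Buffer.byteLength(wire, "utf8"); // size sent over the network
const restored = JSON.parse(wire);             // deserialize: text -> object

console.log(wire);
console.log(bytes);
console.log(restored.id === user.id); // true — round-trip preserves the data
```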

🔹 Performance Considerations

Metric JSON XML Protobuf
Encoding/Decoding Speed ⚙️ Medium 🐢 Slow ⚡ Fast
Payload Size 📦 Medium 📦📦 Large 📦 Small
CPU Usage Moderate High Low
Network Efficiency Medium Poor Excellent

⚖️ Summary — Choosing the Right Format

| Scenario | Recommended Format |
|---|---|
| Web or mobile APIs | JSON |
| Enterprise or legacy integration | XML |
| High-performance internal microservices | Protobuf |



🌍 CORS – Cross-Origin Resource Sharing & Web Security

🔹 Why CORS Matters?

🧩 The Problem: Same-Origin Policy (SOP)

Web browsers enforce the Same-Origin Policy for security reasons. It prevents JavaScript on one domain (e.g., https://a.com) from making requests to another domain (e.g., https://b.com) unless explicitly allowed.

Goal: Prevent malicious websites from reading sensitive data from other origins (like cookies, tokens, etc.).

⚠️ Example problem:

  • Your frontend app: https://frontend.com
  • Your backend API: https://api.server.com

Without CORS, the browser blocks this cross-origin request.

💡 The Solution: CORS (Cross-Origin Resource Sharing)

CORS allows the server to specify which origins are allowed to access its resources using specific HTTP headers.

So, if https://api.server.com includes:

Access-Control-Allow-Origin: https://frontend.com

the browser will permit that request.

🔹 How CORS Works: Requests & Responses

CORS is browser-enforced and works through HTTP headers during requests & responses.

🧭 Step-by-step flow:

  1. Browser sends request to a different origin (cross-domain).

  2. Browser automatically adds the Origin header in request:

    Origin: https://frontend.com
    
  3. Server responds with CORS headers to allow/deny:

    Access-Control-Allow-Origin: https://frontend.com
    
  4. Browser checks if origin is allowed → grants or blocks access.

🔹 Preflight Requests & CORS Headers

When the request is non-simple (like using PUT, DELETE, custom headers, or JSON body), the browser first sends a preflight request — an OPTIONS request — to check if the actual request is allowed.

🛫 Example Flow:

Preflight (OPTIONS):

OPTIONS /api/data HTTP/1.1
Origin: https://frontend.com
Access-Control-Request-Method: POST
Access-Control-Request-Headers: Content-Type

Server Response:

Access-Control-Allow-Origin: https://frontend.com
Access-Control-Allow-Methods: GET, POST, OPTIONS
Access-Control-Allow-Headers: Content-Type, Authorization
Access-Control-Max-Age: 86400

If the response is valid → browser proceeds with actual request.
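
A minimal sketch of the server-side decision, assuming a single whitelisted origin (the domain and header values mirror the example above; a real server would attach these via `res.setHeader`):

```javascript
const ALLOWED_ORIGIN = "https://frontend.com"; // illustrative whitelist

// Returns the CORS response for a request, or null if the origin is not
// allowed (the browser will then block the response on the client side).
function corsHeaders(origin, method) {
  if (origin !== ALLOWED_ORIGIN) return null;
  const headers = {
    "Access-Control-Allow-Origin": ALLOWED_ORIGIN,
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization",
    "Access-Control-Max-Age": "86400", // let the browser cache the preflight
  };
  // A preflight (OPTIONS) gets a 204 with headers only; real requests proceed.
  return { status: method === "OPTIONS" ? 204 : 200, headers };
}

console.log(corsHeaders("https://frontend.com", "OPTIONS").status); // 204
console.log(corsHeaders("https://evil.com", "GET"));                // null
```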

🔹 Common CORS Headers

| Header | Description |
|---|---|
| Access-Control-Allow-Origin | Specifies which origins can access (e.g., * or a specific domain). |
| Access-Control-Allow-Methods | Lists allowed HTTP methods (e.g., GET, POST, PUT). |
| Access-Control-Allow-Headers | Lists allowed custom headers (e.g., Content-Type, Authorization). |
| Access-Control-Allow-Credentials | Allows cookies or authorization headers if true. |
| Access-Control-Max-Age | Time (in seconds) that preflight can be cached. |

🔹 Security Risks & Common Misconfigurations

| Misconfiguration | Risk |
|---|---|
| Access-Control-Allow-Origin: * with credentials | ❌ Major risk — exposes sensitive cookies/tokens. |
| Allowing too many origins | 🔓 Broad access — makes API vulnerable. |
| Forgetting preflight headers | ❌ Causes browser to block requests silently. |
| Reflecting Origin header blindly | ⚠️ Can lead to CORS bypass attacks. |

Best Practices:

  • Always whitelist specific origins.
  • Never combine * with Access-Control-Allow-Credentials: true.
  • Validate the Origin header server-side before responding.

🔹 Handling CORS in APIs

🧱 In REST APIs

  • Configure on backend via framework (e.g., Express.js, Spring Boot, Django).

  • Example (Node.js + Express):

    app.use(cors({
      origin: "https://frontend.com",
      methods: ["GET", "POST"],
      credentials: true
    }));

🧭 In GraphQL APIs

  • Same CORS rules apply.

  • GraphQL endpoints are usually single /graphql routes — just ensure preflight and headers are properly handled.

  • Example:

    app.use('/graphql', cors({ origin: 'https://frontend.com' }), graphqlHTTP({ schema, rootValue }));

🔹 Alternatives to CORS & Role of API Gateways

1. Reverse Proxy / API Gateway

Instead of exposing APIs directly from another domain, use a gateway or proxy to make it appear as same-origin.

🧩 Example: Frontend → /api → Gateway → https://backend.com/api

  • Browser thinks request is to same origin → no CORS needed.
  • Common in NGINX, AWS API Gateway, Kong, etc.

2. Server-side Communication

Move the request from frontend → backend (server-to-server), then forward results. Since servers don’t enforce CORS, this avoids browser restrictions.

3. JSONP (Legacy)

Used before CORS — only for GET requests, now obsolete.

🔹 Summary Table

| Concept | Description |
|---|---|
| Problem | Same-Origin Policy blocks cross-domain requests. |
| Solution | CORS lets servers declare trusted origins. |
| Preflight | Browser checks permissions via OPTIONS before sending real request. |
| Risk | Misconfigurations can expose sensitive data. |
| Best Practice | Allow specific origins, use gateway for multi-origin access. |
| Alternative | Reverse proxy / API Gateway to unify origins. |

⚙️ Scalability in System Design

🔹 Introduction to Scalability

In system design, scalability refers to a system’s ability to handle increased load or demand gracefully by adding more resources — without compromising performance, availability, or reliability.

It ensures that as users, data, or requests grow, the system continues to work efficiently.

🔹 What is Scalability?

Scalability is the capacity of a system to grow in size, workload, or traffic while maintaining performance levels.

A scalable system can:

  • Handle more requests per second
  • Process larger datasets
  • Serve more users simultaneously

Example: An e-commerce website that performs equally well on Black Friday (high load) and normal days is a scalable system.

🔹 Why Do Systems Need to Scale?

  1. User Growth: As the user base expands, more requests hit servers.
  2. Data Growth: More users generate more data — storage and processing must scale.
  3. Performance: Maintain fast response times even under heavy load.
  4. Business Continuity: Prevent downtime during peak usage.
  5. Cost Efficiency: Scale resources up or down based on demand.

📈 Scaling ensures your system remains reliable, efficient, and user-friendly as it grows.

🔹 Types of Scalability

1. Vertical Scalability (Scaling Up)

  • Add more power (CPU, RAM, storage) to an existing server.
  • Simple to implement but limited by hardware capacity.

Example: Upgrading a server from 8GB RAM to 32GB.

✅ Easy to manage
❌ Has a physical/hardware limit

2. Horizontal Scalability (Scaling Out)

  • Add more machines/servers to distribute the load.
  • Requires load balancers and distributed systems.

Example: Adding more application servers behind a load balancer.

✅ Highly scalable
❌ More complex to manage

3. Diagonal Scalability

  • Combine vertical + horizontal scaling — start with scaling up, then scale out as needed.
  • Common in modern cloud systems.

🔹 Common Challenges in Scaling

| Challenge | Description |
|---|---|
| Data Consistency | Maintaining accurate data across distributed servers. |
| Load Distribution | Efficiently balancing traffic to avoid bottlenecks. |
| Latency | Increased network hops can slow down responses. |
| Cost Management | Scaling resources adds cost — needs to be optimized. |
| Fault Tolerance | Ensuring one node failure doesn’t crash the entire system. |
| Complexity | Distributed systems are harder to design, deploy, and debug. |

🧠 Summary

| Concept | Key Point |
|---|---|
| Scalability | System’s ability to handle growing load efficiently |
| Vertical Scaling | Add power to a single machine |
| Horizontal Scaling | Add more machines |
| Diagonal Scaling | Combine both |
| Goal | Maintain performance, reliability, and cost efficiency as system grows |
| Challenge | Data consistency, latency, load balancing, fault tolerance |

⚙️ Scaling Strategies: Horizontal, Vertical & Diagonal

🔹 Types of Scalability — Deep Dive

Scalability can be achieved through three main strategies — each with unique benefits, limitations, and trade-offs.

🧱 1. Vertical Scaling (Scale Up)

Definition: Increasing the capacity of a single machine — by adding more CPU, RAM, or storage.

How it works: Upgrade the same server → better specs → more workload handled.

Example:

  • Upgrading a database server from 8GB → 64GB RAM.
  • Moving from a single-core → 16-core processor.

Pros:

  • Simple to implement.
  • No code changes required.
  • Easier maintenance and debugging.

Cons:

  • Limited by hardware capacity.
  • Downtime may be required during upgrades.
  • Becomes expensive at high scale.

Best for: Early-stage startups or small-scale systems where simplicity matters more than scalability.

🌐 2. Horizontal Scaling (Scale Out)

Definition: Adding more servers/machines to distribute traffic and workload across multiple nodes.

How it works: Instead of one powerful server, use multiple commodity servers behind a load balancer.

Example:

  • Adding more web servers to handle increased traffic.
  • Using distributed databases (like MongoDB, Cassandra).

Pros:

  • Practically infinite scalability.
  • Fault tolerance (failure of one node doesn’t stop the system).
  • Zero downtime scaling.

Cons:

  • Complex setup and monitoring.
  • Data consistency and synchronization challenges.
  • Requires load balancing and distributed system design.

Best for: High-traffic systems (e.g., YouTube, Amazon, Netflix) where performance and availability are critical.
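
The load balancer at the heart of horizontal scaling can be reduced to a toy round-robin dispatcher (server names here are illustrative; real balancers also weigh health checks and server load):

```javascript
// Round-robin: cycle through interchangeable servers so each
// receives an equal share of incoming requests.
function makeBalancer(servers) {
  let next = 0;
  return () => servers[next++ % servers.length];
}

const pick = makeBalancer(["web-1", "web-2", "web-3"]);
const assignments = Array.from({ length: 6 }, () => pick());
console.log(assignments);
// [ 'web-1', 'web-2', 'web-3', 'web-1', 'web-2', 'web-3' ]
```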

⚖️ 3. Diagonal Scaling (Hybrid Approach)

Definition: Combines both vertical and horizontal scaling — scale up first, then scale out as demand grows.

How it works:

  1. Start with powerful machines (vertical scaling).
  2. Add more machines (horizontal scaling) when vertical limits are reached.

Example: Start with a powerful database instance → later shard or replicate when traffic increases.

Pros:

  • Balanced performance and cost.
  • Gradual and flexible scaling path.
  • Ideal for growing systems.

Cons:

  • Still inherits partial complexity from horizontal scaling.
  • Requires good architecture planning.

Best for: Growing startups and mid-sized companies scaling from few users to millions.

💰 Trade-offs: Cost vs Complexity vs Performance

| Strategy | Cost | Complexity | Performance | Example |
|---|---|---|---|---|
| Vertical Scaling | 💰 | 🟢 Low | ⚙️ High (initially) | Single-node PostgreSQL |
| Horizontal Scaling | 💰💰💰 | 🔴 High | ⚡ Very High | Distributed web servers, sharded DB |
| Diagonal Scaling | 💰💰 | 🟡 Medium | ⚙️⚡ Balanced | Modern SaaS & cloud systems |

🌍 Real-world Examples and When to Choose What

| Use Case | Scaling Strategy | Example |
|---|---|---|
| Startup MVP / Small App | Vertical | Early-stage web app with limited users |
| High-traffic Web App | Horizontal | Netflix, Facebook, Amazon |
| Growing SaaS Product | Diagonal | Slack, Zoom, Shopify |
| Database-intensive System | Start vertical, later horizontal | PostgreSQL → Sharded setup |
| Compute-heavy workloads | Horizontal | AI/ML distributed training clusters |

🧠 Summary

| Scaling Type | Description | When to Use |
|---|---|---|
| Vertical (Scale Up) | Add more power to a single node | When simplicity > scale |
| Horizontal (Scale Out) | Add more servers to handle load | When reliability & scale are key |
| Diagonal (Hybrid) | Combine both for flexibility | When system grows over time |

☁️ Autoscaling & Best Practices in Cloud Environments

🔹 What is Autoscaling?

Autoscaling is the cloud’s ability to automatically adjust computing resources — such as servers, containers, or instances — based on real-time demand.

Instead of manually adding or removing capacity, autoscaling dynamically scales up (add resources) when load increases and scales down (remove resources) when demand drops.

Goal: Maintain performance, availability, and cost efficiency automatically.

Example: In AWS EC2, autoscaling can increase the number of instances during traffic spikes and reduce them during off-peak hours.

🔹 How Autoscaling Works

Autoscaling systems rely on metrics and rules to decide when to scale resources.

Key Components:

  1. Metrics: CPU usage, memory, request rate, latency, etc.
  2. Thresholds/Policies: Define upper and lower limits (e.g., scale out if CPU > 70%).
  3. Scaling Actions: Add or remove compute instances.
  4. Load Balancer: Distributes traffic among instances dynamically.
  5. Health Checks: Detects and replaces unhealthy instances.

Flow Example:

  1. Traffic increases → CPU hits 80%.
  2. Autoscaler triggers a scale-out event.
  3. New instance starts → load is distributed.
  4. Traffic decreases → autoscaler scales in by removing idle instances.
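The threshold-based flow above can be sketched as a simple scaling policy. This is a minimal illustration, not a real cloud provider API; the thresholds and instance limits are assumed values chosen for the example.

```python
def scaling_decision(cpu_percent, current_instances,
                     scale_out_at=70, scale_in_at=30,
                     min_instances=2, max_instances=10):
    """Return the new instance count for a simple threshold policy.

    Thresholds and limits here are illustrative, not provider defaults.
    """
    if cpu_percent > scale_out_at and current_instances < max_instances:
        return current_instances + 1      # scale out: add one instance
    if cpu_percent < scale_in_at and current_instances > min_instances:
        return current_instances - 1      # scale in: remove one instance
    return current_instances              # within the healthy band

print(scaling_decision(80, 4))  # CPU above 70% → scale out to 5
print(scaling_decision(20, 4))  # CPU below 30% → scale in to 3
```

Real autoscalers add cooldown periods between actions to avoid "flapping" (rapid scale-out/scale-in cycles), which this sketch omits.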

🔹 Monitoring and Proactive Scaling

Monitoring Tools:

  • AWS CloudWatch
  • Azure Monitor
  • Google Cloud Operations Suite
  • Prometheus + Grafana (for Kubernetes)

Approaches:

  • Reactive Scaling: Responds after metrics cross a threshold (e.g., CPU > 80%).
  • Proactive Scaling: Uses predictive models and traffic patterns to scale before spikes occur (e.g., scale out before Black Friday).

Best Practice: Combine both reactive + proactive scaling for optimal performance.

🔹 Cost Optimization Strategies

  1. Right-sizing Instances: Use instance types that match workload needs.
  2. Dynamic Scaling Policies: Scale gradually instead of adding many instances at once.
  3. Use Spot or Preemptible Instances: For non-critical workloads.
  4. Set Minimum and Maximum Limits: Prevent over-scaling and unnecessary cost.
  5. Leverage Serverless Architectures: Pay only for actual usage (e.g., AWS Lambda).
  6. Monitor Usage Trends: Regularly analyze metrics and adjust policies.
  7. Use Auto-scaling Groups: To manage multiple instances collectively.

🧠 Summary

| Concept | Description | Benefit |
|---|---|---|
| Autoscaling | Automatic adjustment of resources based on demand | Performance + cost efficiency |
| Scaling Triggers | Metrics like CPU, request rate, latency | Dynamic response to load |
| Proactive Scaling | Predictive scaling before traffic spikes | Stability |
| Cost Optimization | Smart scaling + right-sizing | Lower cloud bills |

💾 Database and Storage in System Design

🔹 Introduction to Storage in System Design

In system design, storage refers to how data is persisted, organized, and accessed across different components of a system. It’s the backbone of every scalable application — from storing user profiles to managing massive logs, transactions, or files.

Storage decisions directly affect performance, reliability, scalability, and cost of a system.

🔹 Why Storage Matters in System Design

  1. Data Durability: Ensures information isn’t lost even after failures.
  2. Performance: Determines read/write speed and system responsiveness.
  3. Scalability: Supports increasing data volume and concurrent users.
  4. Cost Efficiency: Optimizes resource usage for both hot and cold data.
  5. Data Integrity & Availability: Maintains consistency and uptime for critical services.

📈 In short — storage impacts every aspect of system behavior, from latency to user experience.

🔹 Structured vs Unstructured Data

| Type | Description | Examples | Common Storage |
|---|---|---|---|
| Structured Data | Organized, well-defined schema (rows/columns) | User tables, transactions | SQL databases (MySQL, PostgreSQL) |
| Unstructured Data | No fixed schema or structure | Images, videos, logs, emails | NoSQL databases, object storage (S3) |

👉 Semi-structured data like JSON or XML sits between these two, used in document stores (e.g., MongoDB).

🔹 Categories of Storage

| Category | Description | Examples |
|---|---|---|
| Block Storage | Raw volumes used by servers to store data as blocks | AWS EBS, SAN |
| File Storage | Hierarchical structure (files & folders) | Network File System (NFS), SMB |
| Object Storage | Data stored as objects with metadata | Amazon S3, Google Cloud Storage |
| Database Storage | Data stored in structured or semi-structured form | MySQL, MongoDB, Cassandra, DynamoDB |
| In-memory Storage | Fast, temporary data storage in RAM | Redis, Memcached |

🔹 Storage Properties

When designing data systems, consider the following core properties:

  1. Durability – Data persists even after failures.
  2. Availability – Data can be accessed when needed.
  3. Consistency – All users see the same version of data.
  4. Latency – Time taken to read/write data.
  5. Scalability – Ability to handle more data and traffic.
  6. Fault Tolerance – System continues working during hardware/network failures.

🔹 The Trade-offs in Storage Design

No storage system is perfect — you must balance between:

  • Performance (Speed)
  • 🧩 Consistency
  • 🌍 Availability
  • 💰 Cost

Example trade-offs:

  • Caching improves speed but may reduce consistency.
  • Replication increases availability but raises cost and complexity.

👉 The right choice depends on business needs — e.g., a banking app values consistency, while a video platform values availability.

🔹 The CAP Theorem

The CAP Theorem (by Eric Brewer) states that a distributed system cannot simultaneously guarantee all three of the following:

| Property | Description |
|---|---|
| Consistency (C) | Every read gets the latest write or an error. |
| Availability (A) | Every request receives a non-error response. |
| Partition Tolerance (P) | System continues to work despite network failures. |

📘 A distributed system can guarantee at most two of the three. In practice, network partitions are unavoidable in distributed systems, so the real choice is between consistency (CP) and availability (AP) when a partition occurs.

🔹 Types of Systems Based on CAP Trade-offs

| Type | Prioritizes | Example Systems | Use Case |
|---|---|---|---|
| CP (Consistency + Partition Tolerance) | Consistency over availability | HBase, MongoDB (configured), Zookeeper | Banking, financial systems |
| AP (Availability + Partition Tolerance) | Availability over consistency | Cassandra, DynamoDB, CouchDB | Social media, messaging apps |
| CA (Consistency + Availability) | Works only when no partition exists (non-distributed) | Traditional RDBMS (MySQL, PostgreSQL) | Single-node, local systems |

🧠 Summary

| Concept | Description |
|---|---|
| Storage | Manages how data is persisted and accessed |
| Structured Data | Fixed schema (tables, rows) |
| Unstructured Data | No schema (media, logs) |
| CAP Theorem | Choose 2 of: Consistency, Availability, Partition Tolerance |
| Trade-off Goal | Balance performance, cost, and reliability |



🗄️ Understanding Data Models: SQL vs NoSQL

🌟 Introduction

In modern system design, databases are the backbone of any application. They store, organize, and manage data — ensuring it can be efficiently retrieved, updated, or deleted. Choosing the right data model is crucial because it directly affects scalability, performance, and maintainability of your system.

💾 What is a Database?

A database is an organized collection of structured information, or data, typically stored electronically in a computer system. Databases allow applications to store, manage, and query data efficiently using a Database Management System (DBMS).

🧱 Relational Databases (SQL)

🔹 Introduction

A Relational Database stores data in tables — rows and columns — much like a spreadsheet. Each row represents a record, and each column represents an attribute of that record.

Relational databases follow a structured schema, meaning data types, relationships, and constraints are predefined.

Examples: MySQL, PostgreSQL, Oracle, Microsoft SQL Server

🔹 Core Concepts

  1. Tables → Organize data into rows (records) and columns (fields)

  2. Primary Key → Uniquely identifies each record in a table

  3. Foreign Key → Defines relationships between tables

  4. Joins → Combine data from multiple tables

  5. ACID Properties:

    • Atomicity → All or nothing transactions
    • Consistency → Data remains valid after any operation
    • Isolation → Transactions do not interfere
    • Durability → Data is permanently saved even after a crash

🔹 Limitations of Relational Databases

  • Rigid Schema – Changing structure (adding columns, new relations) can be hard
  • Horizontal Scaling is Difficult – Scaling across multiple servers is complex
  • Performance Bottlenecks – Joins and transactions can slow down large-scale apps
  • Unstructured Data – Not suitable for JSON, images, or social feeds

⚙️ NoSQL Databases

🔹 Introduction

NoSQL (Not Only SQL) databases were designed to overcome the scalability and flexibility limits of traditional relational databases.

They offer schema-less, distributed, and high-performance data storage — ideal for big data, real-time analytics, and microservices.

Examples: MongoDB, Cassandra, DynamoDB, Redis, Neo4j

🔹 Four Main Types of NoSQL Databases

| Type | Description | Example Use Case | Examples |
|---|---|---|---|
| Document Store | Stores data in JSON-like documents | E-commerce product catalogs | MongoDB, CouchDB |
| Key-Value Store | Simple key-value pairs for ultra-fast lookups | Caching, session management | Redis, DynamoDB |
| Column-Family Store | Stores data in columns (optimized for queries on large datasets) | Analytics, event tracking | Cassandra, HBase |
| Graph Database | Stores data as nodes and relationships | Social networks, recommendation systems | Neo4j, ArangoDB |

🔹 BASE Properties in NoSQL

NoSQL systems usually relax ACID guarantees and follow BASE principles:

  • Basically Available → System guarantees availability
  • Soft State → Data may change over time (eventual consistency)
  • Eventual Consistency → Data will become consistent after some delay

🔹 CAP Theorem (Revisited)

The CAP Theorem states that a distributed system can only guarantee two of the following three:

  • Consistency (C): All clients see the same data at the same time
  • Availability (A): Every request receives a response (success/failure)
  • Partition Tolerance (P): System continues to operate despite network failures

| System Type | Focus | Example |
|---|---|---|
| CA | Consistency + Availability | Traditional RDBMS (single node) |
| CP | Consistency + Partition tolerance | MongoDB, HBase |
| AP | Availability + Partition tolerance | Cassandra, DynamoDB |

⚖️ When to Use What?

| Scenario | Use SQL | Use NoSQL |
|---|---|---|
| Complex relationships & transactions | ✅ | ❌ |
| Fixed schema & structured data | ✅ | ❌ |
| Unstructured or semi-structured data | ❌ | ✅ |
| High read/write throughput | ❌ | ✅ |
| Real-time analytics or caching | ❌ | ✅ |
| Strong consistency required | ✅ | ⚠️ |
| Horizontal scalability priority | ⚠️ | ✅ |

🧩 Real-world Example

  • SQL Example: Banking systems, ERP, or inventory management — where accuracy and consistency are critical.
  • NoSQL Example: Social media apps, e-commerce recommendations, or IoT — where flexibility and scale matter more.

💡 Summary

| Feature | SQL | NoSQL |
|---|---|---|
| Schema | Fixed | Dynamic |
| Scalability | Vertical | Horizontal |
| Consistency | Strong (ACID) | Eventual (BASE) |
| Query Language | SQL | Varies (JSON, APIs) |
| Best For | Structured data, relationships | Big data, flexibility, scalability |

⚙️ Advanced Database Topics

🚀 Scaling Strategies — SQL vs NoSQL

As applications grow, databases must handle more users, more data, and faster requests. Scaling ensures the system can handle this growth without performance degradation.

There are two main strategies for scaling databases:

⚡ 1. Vertical Scaling (Scale-Up)

Definition: Adding more power (CPU, RAM, SSDs) to a single database server.

Example: Upgrading from a 4-core, 16GB RAM machine to an 8-core, 64GB RAM one.

✅ Advantages:

  • Simple to implement (no changes in app logic)
  • Good for small to medium workloads
  • Maintains strong consistency (single-node system)

❌ Disadvantages:

  • Hardware limits (can’t scale infinitely)
  • Cost increases exponentially
  • Single point of failure (if the machine goes down)

Common in: Traditional SQL databases like MySQL or PostgreSQL.

🌐 2. Horizontal Scaling (Scale-Out)

Definition: Adding more servers and distributing the data among them.

Example: Instead of one big MySQL instance, have multiple smaller ones handling different parts of the data.

✅ Advantages:

  • Virtually infinite scalability
  • Better fault tolerance
  • Improved read/write throughput

❌ Disadvantages:

  • Complex to manage and maintain
  • Requires data partitioning or sharding
  • Consistency challenges in distributed systems

Common in: Modern NoSQL databases like MongoDB, Cassandra, or DynamoDB (though SQL systems can also scale horizontally with sharding or replication).

💡 Summary Table

| Feature | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Method | Add resources to one server | Add more servers |
| Complexity | Low | High |
| Cost | Increases quickly | Scales gradually |
| Performance | Limited by one node | Distributed |
| Fault tolerance | Low | High |
| Common in | SQL | NoSQL |

🔁 What is Replication?

Replication means copying data from one database server to another to ensure availability, redundancy, and performance.

It allows multiple database copies to exist — usually one leader (primary) and multiple followers (replicas).

👑 Leader–Follower Replication

How it works:

  • All writes go to the leader (primary) database.
  • The leader replicates data changes to one or more followers (replicas).
  • Followers can serve read-only queries, reducing load on the leader.
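The leader–follower flow above can be sketched with a toy in-memory store. This is a hedged illustration only: replication here is synchronous and instantaneous, whereas real systems usually replicate asynchronously, which is exactly what introduces the replication lag discussed below.

```python
class ReplicatedStore:
    """Toy leader-follower replication: all writes go to the leader and
    are copied to each follower; reads can be served by any follower."""

    def __init__(self, n_followers=2):
        self.leader = {}
        self.followers = [{} for _ in range(n_followers)]

    def write(self, key, value):
        self.leader[key] = value
        for follower in self.followers:   # replicate the change
            follower[key] = value

    def read(self, key, replica=0):
        # Read-only queries hit a follower, offloading the leader
        return self.followers[replica].get(key)

store = ReplicatedStore()
store.write("user:1", "Alice")
print(store.read("user:1", replica=1))  # Alice
```

In a real deployment the `write` loop would be an asynchronous replication stream, so a follower read immediately after a write could still return stale data.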

✅ Benefits:

  • Improved read scalability
  • High availability and fault tolerance
  • Disaster recovery (if leader fails, a replica can take over)

❌ Drawbacks:

  • Replication lag (followers may be slightly behind)
  • More complex failover management

Used in: MySQL, PostgreSQL, MongoDB, Cassandra, DynamoDB

📚 Read Replicas

Definition: A read-only copy of your main database that handles read traffic to improve performance.

Example Use Case: In an e-commerce app — product details and listings can be served from replicas, while checkout operations still write to the main leader.

✅ Advantages:

  • Balances load
  • Reduces latency
  • Supports analytics queries without affecting the main DB

❌ Disadvantages:

  • Not suitable for write-heavy workloads
  • Slight data staleness due to replication lag

🧩 What is Sharding?

Sharding is the process of splitting a large database into smaller, faster, more manageable parts called shards.

Each shard holds a subset of data, and together they form the full dataset.

Example: If you have 10 million users, you can store:

  • Shard 1 → Users 1–2 million
  • Shard 2 → Users 2–4 million
  • …and so on.

🧱 Types of Sharding

| Type | Description | Example |
|---|---|---|
| Horizontal Sharding | Distribute rows across shards | Users A–M in one shard, N–Z in another |
| Vertical Sharding | Split tables/columns across databases | User info in one DB, transactions in another |

🎯 Sharding Strategies

  1. Key/Hash-Based Sharding

    • Use a hash function on a shard key (e.g., user_id % 4)
    • Evenly distributes data but hard to re-shard later
  2. Range-Based Sharding

    • Store data in a range (e.g., user_id 1–10000 in shard 1)
    • Easy to query by range but uneven data growth can cause hotspots
  3. Directory/Lookup Sharding

    • Maintain a central directory that maps each key to its shard
    • Flexible but introduces a single point of failure
  4. Geo-Sharding

    • Data is partitioned by geographical region
    • Reduces latency for users in specific areas
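Strategy 1 (key/hash-based sharding) can be sketched in a few lines. The shard count and key format are assumptions for illustration; note the use of a stable hash rather than Python's built-in `hash()`, which is randomized per process and would break deterministic routing.

```python
import hashlib

NUM_SHARDS = 4  # illustrative shard count

def shard_for(user_id: str) -> int:
    """Key/hash-based sharding: hash the shard key, then take it
    modulo the shard count to pick a shard deterministically."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# The same key always routes to the same shard
print(shard_for("user-12345"))
```

The "hard to re-shard later" drawback shows up directly here: changing `NUM_SHARDS` remaps almost every key, which is why production systems often use consistent hashing instead of plain modulo.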

⚖️ Sharding Trade-offs

| Pros | Cons |
|---|---|
| Infinite scalability | Complex data management |
| Faster queries | Difficult joins across shards |
| Improved availability | Rebalancing overhead |
| Localized failures | Application-level routing needed |

🧬 Polyglot Persistence

Definition: Polyglot persistence means using different types of databases for different components of the same system — based on the strengths of each.

Rationale: No single database fits all use cases.

Example in an E-commerce system:

| Component | Requirement | Best-Fit Database |
|---|---|---|
| User Accounts | Strong consistency | PostgreSQL |
| Product Catalog | Flexible schema | MongoDB |
| Shopping Cart | Fast access | Redis |
| Analytics | High throughput | Cassandra or BigQuery |

✅ Benefits:

  • Optimized for specific workloads
  • Better performance and scalability
  • Technology flexibility

❌ Challenges:

  • More complex data architecture
  • Data consistency across systems

🧠 Summary Table

| Concept | Description | Best For |
|---|---|---|
| Vertical Scaling | Add more power to one machine | Small systems |
| Horizontal Scaling | Add more servers | Large, distributed apps |
| Replication | Copy data across multiple servers | Read-heavy workloads |
| Sharding | Split large datasets | High data volume |
| Polyglot Persistence | Use multiple databases | Complex, modular systems |

🗃️ Object Storage in Modern Systems

🧠 What is Object Storage?

Object Storage is a modern data storage architecture that stores data as objects, rather than files or blocks. Each object contains:

  • The data itself (e.g., image, video, backup)
  • Metadata (descriptive information about the data)
  • A unique identifier (key)

Unlike traditional file systems (which use folders) or block storage (which uses sectors), object storage places everything in a flat, scalable storage pool accessed via APIs (usually HTTP-based).

🔑 Key Concepts in Object Storage

| Concept | Description |
|---|---|
| Object | The fundamental unit — contains data, metadata, and an ID. |
| Bucket / Container | Logical grouping of objects (similar to a folder but flat). |
| Metadata | Custom data describing the object (e.g., file type, owner, creation date). |
| Object ID / Key | A unique key to retrieve the object (like a URL). |
| API Access | Objects are accessed via RESTful APIs — typically PUT, GET, DELETE. |
| Flat Namespace | No directory hierarchy — simplifies scaling and searching. |
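The concepts above can be made concrete with a toy in-memory object store. This mimics the S3-style PUT/GET/DELETE model under a flat namespace; it is a sketch, not a real storage client, and the bucket/key names are invented for the example.

```python
class ObjectStore:
    """Toy object store: each bucket maps keys directly to
    (data, metadata) pairs — a flat namespace, not a directory tree.
    'videos/intro.mp4' below is one opaque key; the slash has no
    hierarchical meaning to the store."""

    def __init__(self):
        self.buckets = {}

    def put(self, bucket, key, data, metadata=None):
        self.buckets.setdefault(bucket, {})[key] = (data, metadata or {})

    def get(self, bucket, key):
        return self.buckets[bucket][key]  # (data, metadata)

    def delete(self, bucket, key):
        del self.buckets[bucket][key]

store = ObjectStore()
store.put("media", "videos/intro.mp4", b"<bytes>",
          {"content-type": "video/mp4", "owner": "vivek"})
data, meta = store.get("media", "videos/intro.mp4")
print(meta["content-type"])  # video/mp4
```

A real S3-compatible client exposes the same three verbs over HTTP, plus listing, versioning, and lifecycle operations layered on top of this key-to-object model.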

☁️ Popular Object Storage Platforms

| Platform | Provider | Highlights |
|---|---|---|
| Amazon S3 (Simple Storage Service) | AWS | Industry standard, supports versioning, lifecycle policies |
| Google Cloud Storage (GCS) | Google Cloud | Multi-regional replication, strong consistency |
| Azure Blob Storage | Microsoft Azure | Tight integration with Azure ecosystem |
| MinIO | Open Source | S3-compatible, lightweight, deployable anywhere |
| Ceph Object Gateway | Open Source | Scalable and fault-tolerant for on-prem setups |
| DigitalOcean Spaces / Backblaze B2 | Cloud Providers | Cost-effective and S3-compatible |

💼 Common Use Cases

| Use Case | Description |
|---|---|
| Backup & Archival | Durable, low-cost storage for backups and historical data. |
| Media Storage | Store large, unstructured assets like images, videos, audio files. |
| Big Data & Analytics | Store raw data for processing with tools like Spark or Presto. |
| Static Website Hosting | Serve HTML, CSS, JS directly via public URLs (e.g., AWS S3 static hosting). |
| Machine Learning Data Lakes | Centralized object storage for training data and model artifacts. |
| Application Data | Store logs, user-generated content, and configuration data. |

⚖️ Important Considerations & Trade-offs

| Factor | Advantages | Trade-offs |
|---|---|---|
| Scalability | Infinitely scalable with flat namespace | Slower than local/block storage |
| Durability | Data replicated across regions | Higher latency for frequent small reads |
| Cost | Low cost per GB | Egress (data out) costs can add up |
| Access Pattern | Ideal for large, infrequent reads/writes | Not suitable for high IOPS workloads (e.g., databases) |
| Consistency | Most systems now offer strong consistency | May vary between providers |
| Integration | Easy API-based access | Limited local filesystem compatibility |

🧩 Summary

| Feature | Object Storage |
|---|---|
| Data Structure | Objects with metadata and unique IDs |
| Access Method | RESTful APIs (HTTP/S3-compatible) |
| Scalability | Virtually unlimited |
| Best For | Unstructured data (media, backups, analytics) |
| Example Platforms | AWS S3, GCS, Azure Blob, MinIO |
| Main Trade-off | High scalability vs higher latency |

📂 File System and Distributed Storage

🧠 What is a File System?

A File System is the method and structure an operating system uses to store, organize, and manage files on storage devices (like HDDs, SSDs). It defines how data is named, stored, retrieved, and organized into files and directories.

Common examples: NTFS, ext4, HFS+, FAT32.

🔑 Key Characteristics of Traditional File Systems

| Feature | Description |
|---|---|
| Hierarchical Structure | Organizes files in directories and subdirectories. |
| Metadata Management | Stores attributes like size, permissions, timestamps. |
| Access Control | Supports permissions (read/write/execute) for users and groups. |
| Mounting | Each file system is mounted on a local device or partition. |
| Consistency | Uses journaling or logs to maintain integrity after crashes. |
| Performance | Optimized for single-node storage access. |

✅ Traditional file systems are ideal for local storage and single-server workloads.

🌐 What is a Distributed File System (DFS)?

A Distributed File System (DFS) allows files to be stored across multiple servers or nodes but appear to users as a single unified file system. It enables data sharing, scalability, and fault tolerance across large-scale systems.

Examples:

  • Google File System (GFS)
  • Hadoop Distributed File System (HDFS)
  • CephFS
  • GlusterFS

⚙️ DFS Architecture and How Replication Works

Core Components:

  1. Name Node / Metadata Server – Tracks file locations, directories, and metadata.
  2. Data Nodes / Storage Nodes – Store actual file blocks or chunks.
  3. Client – Interacts with the DFS, reading/writing files using DFS APIs.

Workflow Example (HDFS-like):

  1. A file is split into chunks (e.g., 128MB).
  2. Each chunk is replicated (usually 3 copies) across different nodes.
  3. The Name Node maps files → chunks → node locations.
  4. When a node fails, replicas are used to reconstruct the missing data.
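The chunking-and-replication workflow above can be sketched as a placement function. This is a simplified, assumed model: real name nodes also weigh rack awareness, node load, and data locality, and the round-robin assignment here is purely illustrative.

```python
def place_chunks(file_size_mb, chunk_size_mb=128,
                 nodes=("node1", "node2", "node3", "node4"), replicas=3):
    """HDFS-style sketch: split a file into fixed-size chunks and
    assign each chunk to `replicas` distinct nodes round-robin.
    Requires replicas <= len(nodes) so copies land on distinct nodes."""
    n_chunks = -(-file_size_mb // chunk_size_mb)   # ceiling division
    placement = {}
    for chunk in range(n_chunks):
        placement[chunk] = [nodes[(chunk + r) % len(nodes)]
                            for r in range(replicas)]
    return placement

# A 300 MB file → 3 chunks of up to 128 MB, each replicated 3 times
print(place_chunks(300))
```

If any one node fails, every chunk still has two surviving replicas, which is exactly what allows step 4 (reconstruction from replicas) to work.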

Replication Benefits:

  • Ensures fault tolerance
  • Enables load balancing for reads
  • Improves data locality (clients can read from the nearest node)

📈 Scalability and Fault Tolerance

| Property | Description |
|---|---|
| Horizontal Scalability | Add more nodes to increase capacity and throughput. |
| Data Replication | Redundant copies maintain availability despite node failures. |
| Automatic Recovery | Failed nodes or lost chunks are detected and rebuilt automatically. |
| High Throughput | Parallel access to distributed blocks boosts read/write speed. |
| Fault Isolation | Node-level failures don’t impact the entire file system. |

Trade-off: DFS introduces complexity in metadata management and network latency, especially during coordination between nodes.

🧩 Summary

| Concept | Traditional File System | Distributed File System |
|---|---|---|
| Scope | Single machine | Multiple machines/nodes |
| Scalability | Limited | High (horizontal scaling) |
| Fault Tolerance | Low | High (replication, recovery) |
| Examples | NTFS, ext4 | HDFS, GFS, CephFS |
| Access | Local | Network-based |
| Use Case | Personal systems, small apps | Big Data, cloud storage, large-scale apps |

🧱 Block vs File vs Object Storage – Comparison and When to Use Each

🧠 Introduction

Modern systems rely on different types of storage architectures — Block, File, and Object — each optimized for specific workloads, scalability levels, and access patterns. Understanding their differences helps in choosing the right storage type for databases, applications, and large-scale systems.

🔹 Block Storage

Definition: Block storage splits data into fixed-size blocks and stores them separately, each with a unique identifier. The OS assembles these blocks when reading or writing data.

Common Use: Databases, virtual machines, high-performance workloads.

Examples: Amazon EBS, iSCSI, SAN, NVMe storage.

Characteristics:

  • Acts like a raw disk to the OS.
  • High IOPS and low latency.
  • Managed at the block level by the system or application.

🔹 File Storage

Definition: File storage organizes data into files and directories in a hierarchical structure. It’s the most traditional and user-friendly storage model.

Common Use: Shared drives, user directories, content management.

Examples: NFS, SMB, ext4, NTFS.

Characteristics:

  • Accessed via file paths (/home/data/report.pdf).
  • Simple to manage but limited in scalability.
  • Good for small to medium-sized systems.

🔹 Object Storage

Definition: Object storage manages data as objects with metadata and unique IDs, in a flat namespace. It’s ideal for massive unstructured data and accessed via APIs (HTTP/S3).

Common Use: Cloud storage, backups, media, analytics.

Examples: AWS S3, Google Cloud Storage, Azure Blob, MinIO.

Characteristics:

  • Highly scalable and durable.
  • Optimized for sequential reads/writes.
  • API-based access (not a file system).

⚖️ Comparison Table

| Feature | Block Storage | File Storage | Object Storage |
|---|---|---|---|
| Data Structure | Blocks | Files & folders | Objects with metadata |
| Access Method | Low-level, via OS | File path | REST APIs (HTTP/S3) |
| Performance | Very high | Moderate | High for large objects |
| Scalability | Limited | Medium | Virtually unlimited |
| Use Case | Databases, VMs | File sharing, small apps | Backups, media, data lakes |
| Consistency | Strong | Strong | Eventual / strong (depends) |
| Cost | High | Medium | Low per GB |
| Latency | Low | Medium | Higher (network overhead) |
| Metadata | Minimal | File attributes | Rich, customizable |
| Examples | AWS EBS, SAN | NFS, SMB | AWS S3, GCS |

🧩 When to Use Each

| Scenario | Best Storage Type | Reason |
|---|---|---|
| Database (SQL, NoSQL) | Block | Low-latency, high IOPS required |
| Application File Sharing | File | Simple file/directory structure |
| Backup / Archival / Logs | Object | Cost-effective, durable |
| Big Data / Analytics | Object | Scalable, accessible via APIs |
| Virtual Machine Storage | Block | Direct disk-level access |
| Content Delivery (Media, Assets) | Object | Ideal for large unstructured data |

🧠 Summary

  • Block Storage: High performance, low latency, ideal for structured, transactional workloads.
  • File Storage: Simple and familiar for users, good for collaborative or mid-scale apps.
  • Object Storage: Highly scalable, cost-efficient, best for unstructured or cloud-native workloads.

🧩 Big Data Fundamentals – 30,000 Feet Overview

🧠 What is Big Data?

Big Data refers to data sets that are too large, fast, or complex for traditional data processing systems to handle efficiently. It involves storing, processing, and analyzing vast amounts of structured, semi-structured, and unstructured data to extract insights and drive decisions.

Examples:

  • Social media feeds
  • IoT sensor data
  • Financial transactions
  • Clickstream logs

Big Data is not just about size — it’s about volume, velocity, and variety of data, and how we manage it effectively.

🔹 The 6Vs of Big Data

| V | Meaning | Description |
|---|---|---|
| Volume | Amount of data | Terabytes to petabytes generated daily. |
| Velocity | Speed of data generation | Real-time streams from devices, sensors, apps. |
| Variety | Types of data | Structured (SQL), semi-structured (JSON), unstructured (images, videos). |
| Veracity | Data quality | Ensuring accuracy and reliability of massive, noisy data. |
| Value | Business relevance | Extracting actionable insights from raw data. |
| Variability | Data inconsistency | Fluctuating data rates and formats across sources. |

⚠️ Why Traditional Storage Fails at Scale

Traditional relational databases (RDBMS) and file systems struggle because:

  1. Vertical scaling limits — adding CPU/RAM is costly and finite.
  2. Rigid schema — can’t handle unstructured or evolving data formats.
  3. Performance degradation — queries slow down as data grows.
  4. Single-node design — can’t distribute processing efficiently.
  5. Storage bottlenecks — limited by local disks and file system hierarchy.

✅ Hence, Big Data systems use distributed storage (like HDFS, S3) and parallel processing frameworks (like Spark, Hadoop).

⚙️ Common Big Data Workloads

| Workload | Description | Tools |
|---|---|---|
| Data Ingestion | Collecting data from various sources | Kafka, Flume, NiFi |
| Batch Processing | Processing large datasets at intervals | Hadoop MapReduce, Spark |
| Stream Processing | Real-time event handling | Kafka Streams, Flink, Spark Streaming |
| Data Storage | Distributed and scalable storage | HDFS, S3, Cassandra |
| Data Querying | Query large datasets interactively | Presto, Hive, BigQuery |
| Analytics & ML | Insights, dashboards, model training | Spark MLlib, TensorFlow |

🔄 Batch vs Stream Processing

| Feature | Batch Processing | Stream Processing |
|---|---|---|
| Data Type | Historical / accumulated | Real-time / continuous |
| Latency | Minutes to hours | Milliseconds to seconds |
| Tools | Hadoop, Spark | Kafka Streams, Flink |
| Use Case | Monthly sales report | Fraud detection, live analytics |
| Processing Mode | Process data in chunks | Process data event-by-event |

Hybrid systems (like Spark Structured Streaming) combine both — offering real-time insights with historical context.

🧠 Summary

Big Data represents the shift from single-server systems to distributed architectures, built for scalability, speed, and flexibility. It powers analytics, AI, and decision-making in modern large-scale applications.

System Performance

Introduction to System Performance

System performance refers to how efficiently a system handles requests, processes data, and responds to users under different workloads. In system design, performance ensures that applications remain fast, stable, and reliable as they scale.

What is Performance in System Design?

Performance measures how well a system meets expected speed, efficiency, and reliability goals. Key areas include:

  • Response speed
  • Ability to handle load
  • Resource utilization
  • Consistency under peak traffic

Latency vs Throughput

  • Latency: Time taken to process a single request (speed per request).
  • Throughput: Number of requests the system can process per second (overall capacity).

High throughput does not always mean low latency.

Scalability vs Responsiveness

  • Scalability: Ability of a system to handle increased load by adding more resources.
  • Responsiveness: How quickly a system reacts to user requests.

A system can scale well but still be slow if poorly designed; both must be balanced.

Measuring Performance (SLA, SLO, SLI)

Service-Level Agreement (SLA)

A formal contract defining expected performance guarantees (e.g., 99.9% uptime).

Service-Level Objective (SLO)

Internal target or goal that the team aims to meet (e.g., 200ms response time).

Service-Level Indicator (SLI)

Actual measured metrics from the system (e.g., current availability = 99.93%).

Understanding Percentiles

Percentiles show how performance varies across many requests. Common examples:

  • P50: Median response time — half of all requests are faster than this.
  • P90 / P95: 90% / 95% of requests are faster than this value; the remaining 10% / 5% form the slow tail.
  • P99: All but the slowest 1% of requests — critical for user experience at scale.

Percentiles reveal tail latency, which averages hide.
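A small worked example makes the "averages hide the tail" point concrete. This sketch uses the nearest-rank method on an invented latency sample; production systems usually compute percentiles from streaming histograms instead of sorting raw samples.

```python
def percentile(latencies_ms, p):
    """Return the latency below which p% of requests fall
    (nearest-rank method on a raw sample)."""
    s = sorted(latencies_ms)
    k = max(0, round(p / 100 * len(s)) - 1)
    return s[k]

# Nine fast requests and one very slow one (all in ms)
latencies = [12, 15, 11, 250, 14, 13, 16, 12, 900, 15]

print(sum(latencies) / len(latencies))  # mean ≈ 124.8 ms — misleading
print(percentile(latencies, 50))        # P50 = 14 ms
print(percentile(latencies, 90))        # P90 = 250 ms
print(percentile(latencies, 99))        # P99 = 900 ms — the real tail
```

The mean suggests "~125 ms" service, yet the median user sees 14 ms while 1 in 10 requests takes 250 ms or worse — exactly the distinction P50 vs P99 exposes.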

Why Performance Matters in Modern Applications?

  • Users expect instant responses.
  • Directly affects engagement, retention, conversions.
  • Essential for real-time systems like trading, gaming, and messaging.
  • Impacts cloud cost and infrastructure planning.
  • Determines reliability during peak traffic.

Performance Testing Overview

Types of performance tests include:

  • Load testing — normal expected load
  • Stress testing — extreme load
  • Spike testing — sudden traffic bursts
  • Endurance testing — long-running workloads

Goal: identify bottlenecks before production.

Introduction to Performance Monitoring

Performance monitoring helps track system health in real time using metrics such as:

  • CPU, memory, disk I/O
  • Request latency
  • Error rates
  • Throughput
  • Network usage

Tools: Prometheus, Grafana, Datadog, New Relic.

Monitoring ensures early detection of issues and stable system behavior.


Caching for Speed Optimization

Why Caching Matters

Caching improves performance by storing frequently accessed data closer to the user or application, reducing:

  • Latency
  • Load on databases
  • Network round trips
  • Overall infrastructure cost

It helps systems scale efficiently while delivering faster responses.

Types of Caching

  • Client-side caching (browser, mobile app)
  • CDN caching (edge caching of static assets)
  • Application-level caching (in-memory like Redis, Memcached)
  • Database caching (query caching, materialized views)
  • Distributed caching (shared cache across multiple servers)

Caching Strategies

  • Read-through: The cache sits in front of the DB; on a miss, data is fetched and stored automatically.
  • Write-through: Writes go to the cache and DB simultaneously.
  • Write-back / Write-behind: Writes go to the cache first; the DB is updated asynchronously.
  • Cache-aside (Lazy loading): The application checks the cache first; on a miss it fetches from the DB and stores the result.
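Cache-aside is the most common of these strategies, so here is a minimal sketch of it. Plain dicts stand in for Redis and the primary database, and the key names and TTL are assumptions for the example.

```python
import time

cache = {}                       # stand-in for Redis: key -> (value, expiry)
database = {"user:1": "Alice"}   # stand-in for the primary DB
TTL_SECONDS = 60

def get_user(key):
    """Cache-aside (lazy loading): check the cache first; on a miss,
    read from the DB and populate the cache with a TTL."""
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                              # cache hit
    value = database.get(key)                        # miss → go to the DB
    if value is not None:
        cache[key] = (value, time.time() + TTL_SECONDS)
    return value

print(get_user("user:1"))  # first call: miss, loads from the DB
print(get_user("user:1"))  # second call: hit, served from the cache
```

Note the trade-off mentioned earlier: if the DB row changes before the TTL expires, the cache serves stale data until the entry ages out or is explicitly invalidated.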

Caching Eviction Policies

Used when cache is full and data must be removed:

  • LRU (Least Recently Used)
  • LFU (Least Frequently Used)
  • FIFO (First In First Out)
  • Random eviction
  • TTL-based eviction (expire after time)
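LRU is the most widely used of these policies, so here is a minimal sketch of it built on `collections.OrderedDict` (a common illustrative approach; real caches like Redis use approximated LRU for efficiency). The capacity of 2 is chosen only to make eviction visible.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU eviction: keys are kept in access order, and when
    capacity is exceeded the least recently used key is dropped."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the LRU entry

c = LRUCache(capacity=2)
c.put("a", 1)
c.put("b", 2)
c.get("a")            # "a" is now the most recently used
c.put("c", 3)         # cache is full → evicts "b", not "a"
print(c.get("b"))     # None
```

Swapping the recency bookkeeping for a hit counter would turn this same skeleton into LFU.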

Redis Overview

Redis is an in-memory key-value store known for high speed (it can also run distributed via Redis Cluster). Key features:

  • Millisecond-level latency
  • Supports data structures (strings, lists, sets, hashes, sorted sets)
  • Pub/sub messaging
  • Persistence modes (RDB, AOF)
  • TTL support for key expiration
  • Widely used for caching, rate limiting, session storage, leaderboards

Real-World Caching Applications

  • Session caching (login sessions)
  • Feed caching (social media timelines)
  • Product catalog caching (e-commerce)
  • API response caching
  • Database query caching
  • Leaderboard & ranking caching (gaming)
  • Rate limiting counters
  • Geolocation & configuration data caching

Messaging & Queues for Decoupling

Why Use Asynchronous Messaging?

Asynchronous messaging helps systems communicate without waiting for immediate responses. It provides:

  • Decoupling between services (producer doesn’t need to know consumer details)
  • Improved performance by offloading heavy tasks
  • Better reliability via retry and persistent queues
  • Smoother traffic handling during spikes
  • Scalability by processing messages at controlled pace

Key Concepts of a Messaging System

  • Producer: Sends messages to a queue or topic.
  • Consumer: Reads and processes messages.
  • Queue: Stores messages in order until consumed.
  • Topic: Broadcasts messages to multiple subscribers (pub/sub).
  • Message: Unit of data transmitted between services.
  • Broker: Middleware that routes, stores, and delivers messages.
  • Ack / Nack: Consumers acknowledge or reject messages.
  • Offsets: Position markers in message streams (Kafka-style).
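
A minimal sketch of the producer/consumer relationship, using Python's `queue.Queue` as a stand-in broker (`task_done` plays the role of an ack; a `None` sentinel is an illustrative shutdown signal):

```python
import queue
import threading

# Producer/consumer sketch: queue.Queue stands in for a message broker.
broker = queue.Queue()
processed = []

def producer():
    for i in range(5):
        broker.put({"id": i, "payload": f"msg-{i}"})  # send message

def consumer():
    while True:
        msg = broker.get()       # read next message
        if msg is None:          # sentinel -> stop consuming
            break
        processed.append(msg["payload"])
        broker.task_done()       # "ack": mark the message as handled

t = threading.Thread(target=consumer)
t.start()
producer()
broker.put(None)  # signal shutdown
t.join()
```

A real broker adds persistence, redelivery on missed acks, and independent scaling of producers and consumers.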

Visualizing a Decoupled Architecture

Client → Producer → Message Broker → Consumer → Database/Service

Key property: Producers and consumers scale independently, improving resilience and throughput.

When to Use Queues in Architecture

Use queues when you need:

  • To process tasks asynchronously
  • To handle burst traffic smoothly
  • To perform long-running or heavy operations
  • To decouple microservices
  • To ensure reliable delivery
  • To enable event-driven architectures
  • To avoid blocking APIs (e.g., uploading video, generating reports)

Popular Message Brokers: RabbitMQ vs Kafka

RabbitMQ

  • Type: Message Queue (AMQP)

  • Focus: Reliable message delivery

  • Strengths:

    • Complex routing (exchanges)
    • Suitable for job queues and task distribution
    • Good for small to medium throughput
  • Typical Use Cases:

    • Background jobs
    • Notification services
    • Task workers

Kafka

  • Type: Distributed log + event streaming platform

  • Focus: High throughput + real-time streaming

  • Strengths:

    • Persistent message log
    • Horizontal scalability
    • Millions of messages per second
  • Typical Use Cases:

    • Event-driven systems
    • Log aggregation
    • Real-time analytics
    • Stream processing

Delivery Guarantees

  • At most once: The message is delivered once at most; it may be lost.
  • At least once: The message is retried until acknowledged; it may be processed twice.
  • Exactly once: No duplicates and no loss; hard to achieve, and typically requires idempotent consumers.
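
An idempotent consumer is the usual way to make at-least-once delivery safe: duplicates may arrive, so the consumer deduplicates by message ID. A sketch (the payment scenario and field names are illustrative):

```python
# At-least-once delivery means duplicates can arrive, so the consumer
# deduplicates by message ID to get effectively-once processing.
seen_ids = set()
balance = 0

def handle_payment(msg):
    global balance
    if msg["id"] in seen_ids:   # duplicate redelivery -> ignore
        return
    seen_ids.add(msg["id"])
    balance += msg["amount"]

# The broker redelivers message 1 because the first ack was lost.
deliveries = [
    {"id": 1, "amount": 100},
    {"id": 2, "amount": 50},
    {"id": 1, "amount": 100},  # duplicate
]
for m in deliveries:
    handle_payment(m)
```

In production the `seen_ids` set would live in a durable store (e.g., a DB table or Redis) so deduplication survives consumer restarts.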

Common Use Cases of Messaging Queues

  • Order processing pipelines
  • Email/SMS notification systems
  • Payment processing
  • Log aggregation and analytics
  • Video/image processing workflows
  • IoT event streams
  • Microservices communication
  • Chat systems
  • Real-time monitoring & metrics pipelines

Best Practices for Using Messaging Queues

  • Keep messages small and self-contained
  • Ensure idempotency in consumers
  • Use dead-letter queues (DLQ) for failed messages
  • Set proper retry policies
  • Monitor queue length, lag, and consumer health
  • Partition data wisely (Kafka) for parallelism
  • Avoid putting extremely large payloads directly in queues
  • Secure brokers with authentication and encryption
  • Tune prefetch/consumer concurrency based on workload

Concurrency and Parallelism

What Is Concurrency?

Concurrency is the ability of a system to make progress on multiple tasks in overlapping time periods by switching between them. Tasks appear to run simultaneously even if executed on a single CPU core. It’s about managing multiple tasks, not executing them at the same instant.

What Is Parallelism?

Parallelism means executing multiple tasks simultaneously using multiple CPU cores. It’s about actual simultaneous execution, not interleaving.

In short:

  • Concurrency → dealing with many tasks
  • Parallelism → doing many tasks at the same time

Processes and Threads

Process

  • Independent execution unit
  • Has its own memory space
  • Heavyweight

Thread

  • Lightweight execution unit inside a process
  • Shares memory with other threads
  • Communication is easier but needs synchronization

Thread Pool & Worker Model

Instead of creating a thread per request, systems use a thread pool:

  • Pre-created threads wait for tasks
  • Tasks are distributed to available workers
  • Improves performance, avoids overhead, prevents crashing under load

Used in: Node.js worker threads, Java ExecutorService, Python concurrent.futures, Nginx worker model.
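
A minimal thread-pool sketch using Python's `concurrent.futures` (the `handle_request` function is an illustrative stand-in for real request work):

```python
from concurrent.futures import ThreadPoolExecutor

# Thread-pool sketch: a fixed set of workers processes many tasks,
# instead of spawning one thread per task.
def handle_request(n):
    return n * n  # stand-in for real request-handling work

with ThreadPoolExecutor(max_workers=4) as pool:
    # 10 tasks are distributed across 4 pre-created worker threads.
    results = list(pool.map(handle_request, range(10)))
```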

Asynchronous Processing

Async systems free the thread while waiting for I/O operations like:

  • Network calls
  • Database operations
  • File reads

This improves scalability because threads do not block unnecessarily.

Examples:

  • Node.js async I/O
  • Python async/await
  • Java CompletableFuture
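
The async/await idea can be sketched with Python's `asyncio`: while one simulated I/O call awaits, the event loop runs the others, so total time approaches the slowest call rather than the sum (the `fetch` coroutine and delays are illustrative):

```python
import asyncio

# Async sketch: one thread, three concurrent "I/O calls".
async def fetch(name, delay):
    await asyncio.sleep(delay)  # stand-in for a network/DB call
    return name

async def main():
    # gather() runs all three concurrently and preserves argument order.
    return await asyncio.gather(
        fetch("users", 0.02),
        fetch("orders", 0.01),
        fetch("items", 0.015),
    )

results = asyncio.run(main())
```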

Concurrency in Web Servers

Different web servers handle concurrency differently:

  • Thread-per-request model (Tomcat, Spring): each request gets its own thread.
  • Event-driven model (Node.js, Nginx): one thread handles many requests using async callbacks.
  • Hybrid worker model (FastAPI + Uvicorn, Go net/http): a mix of concurrency primitives such as goroutines, green threads, or event loops.

Common Pitfalls

  • Race conditions (multiple threads modifying shared data)
  • Deadlocks (threads waiting on each other forever)
  • Starvation (a task never gets resources)
  • Shared memory corruption
  • Too many threads causing context-switch overhead
  • Improper locking reducing performance
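
Race conditions are the most common pitfall: `counter += 1` is a read-modify-write, not an atomic operation. A sketch of the standard fix, guarding the shared counter with a lock:

```python
import threading

# Without the lock, concurrent read-modify-write increments can be lost.
counter = 0
lock = threading.Lock()

def worker():
    global counter
    for _ in range(10_000):
        with lock:        # critical section: one thread at a time
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock, all 4 * 10,000 increments are counted.
```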

Best Practices & Real World Examples

Best Practices

  • Use locks only when necessary
  • Prefer immutable data structures
  • Use thread pools, not unlimited threads
  • Keep tasks small and stateless
  • Avoid sharing data unnecessarily
  • Use idempotent consumers in distributed systems
  • Apply timeouts and circuit breakers

Real-World Examples

  • Web servers handling thousands of concurrent users
  • Workers processing tasks from message queues
  • Database connection pools shared across threads
  • Kafka consumers running in parallel partitions
  • Go goroutines serving microservices at scale

Database Performance Optimization Techniques

Replication

Replication means copying data across multiple database servers.

Benefits

  • Improves read performance (read from replicas)
  • Provides high availability
  • Enables failover during primary DB failure

Types

  • Leader–Follower (Master–Slave)
  • Multi-leader
  • Leaderless (like DynamoDB)

Sharding and Partitioning Strategies

Sharding splits large datasets across multiple servers to improve scalability.

Common Strategies

  • Range-based: Based on value ranges (e.g., dates, IDs)
  • Hash-based: Hash function distributes keys uniformly
  • Geo/Zone-based: Partition by region
  • Directory-based: Metadata service decides shard placement

Benefits

  • Distributes load
  • Reduces hot spots
  • Allows horizontal scaling
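
Hash-based sharding can be sketched in a few lines: a stable hash of the key picks the shard, spreading keys roughly uniformly (MD5 and the shard count here are illustrative choices; real systems often use consistent hashing to reduce data movement when shards are added):

```python
import hashlib

# Hash-based sharding sketch: a deterministic hash of the key selects
# the shard, so the same key always routes to the same server.
NUM_SHARDS = 4

def shard_for(key):
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

placement = {k: shard_for(k) for k in ["user:1", "user:2", "user:3"]}
```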

CAP Theorem

In distributed systems, you can choose only two at a time:

  • Consistency: All nodes see the same data
  • Availability: System responds even if some nodes fail
  • Partition Tolerance: System continues during network splits

System Types

  • CP: Prioritize consistency (e.g., HBase)
  • AP: Prioritize availability (e.g., Cassandra)
  • CA: Only achievable if network partitions never happen — effectively impossible in real distributed systems

Indexes: Types & Use Cases

Indexes improve query speed by avoiding full table scans.

Types

  • B-Tree Index: General queries, range queries
  • Hash Index: Fast equality lookups
  • Full-Text Index: Searching text fields
  • Composite Index: Multi-column queries
  • Inverted Index: Used in search engines

Use Cases

  • WHERE conditions
  • JOIN performance
  • Sorting (ORDER BY)
  • Avoid full scans on large tables

Normalization vs Denormalization — When to Use Each

Normalization

Organizing data into relational tables to reduce redundancy. Use When:

  • Data integrity is critical
  • Write-heavy systems
  • Avoid anomalies (update, insert, delete)

Denormalization

Combining tables for faster reads. Use When:

  • Read-heavy workloads
  • Analytics, dashboards
  • Need fewer JOINs in high-traffic systems
  • Caching aggregated data

Additional Techniques

1. Connection Pooling

Reuses existing DB connections instead of opening new ones. Benefits:

  • Reduces connection overhead
  • Improves throughput
  • Prevents DB overload

2. Query Optimization

  • Avoid SELECT *
  • Use proper indexes
  • Rewrite inefficient JOINs
  • Limit result size
  • Analyze slow queries (EXPLAIN, query planner)

3. Materialized Views

Precomputed, stored query results. Use Case:

  • Analytics and dashboards
  • Expensive aggregations (SUM, COUNT, GROUP BY)
  • Reducing load in read-heavy systems

4. Batching

Send multiple operations in one request (insert/update). Benefits:

  • Fewer network calls
  • Better throughput

Use Cases:

  • Logging events
  • Bulk inserts

5. Pagination

Load results in chunks. Benefits:

  • Avoids loading millions of rows
  • Reduces memory usage

Techniques:

  • OFFSET/LIMIT
  • Keyset pagination (better for large datasets)

🛡️ Introduction to System Reliability

Why Reliability Matters?

Reliability determines whether a system consistently works as users expect. In modern systems:

  • Users demand near-zero downtime.
  • Businesses lose money for every second of downtime (for Amazon, estimated at millions of dollars per minute).
  • Reputation, trust, and customer retention heavily depend on reliability.
  • Many systems (payments, healthcare, transportation) are mission critical.

High reliability = better user experience, lower maintenance cost, and stable operations.

🧩 What is System Reliability?

System reliability is the ability of a system to function correctly over time, without failures.

It focuses on:

  • Availability (Is the system up and reachable?)
  • Durability (Does the system protect data?)
  • Fault tolerance (Can it survive failures?)
  • Recoverability (How fast can it come back?)

In short:

👉 A reliable system keeps running even when things break.

📏 Key Metrics – MTBF & MTTR

Reliability is measured by two important metrics:

🕒 MTBF – Mean Time Between Failures

  • Measures time between two failures.
  • Higher MTBF → More reliable system.

Example: MTBF = 1000 hours → on average, system fails once every 1000 hours.

🔧 MTTR – Mean Time To Repair

  • Measures how quickly a system recovers after failure.
  • Lower MTTR → Faster recovery → Higher availability.

Example: MTTR = 10 minutes → system comes back online in 10 minutes after a failure.

📄 What Are SLAs? (Service Level Agreements)

SLA is a promise made by the service provider about its reliability.

Typical SLA Guarantees:

  • Uptime/Availability (ex: 99.9%, 99.99%, 99.999%)
  • Performance (latency/response times)
  • Support response times
  • Data durability

Availability levels:

| SLA     | Downtime Allowed per Year |
|---------|---------------------------|
| 99%     | ~3.65 days                |
| 99.9%   | ~8.7 hours                |
| 99.99%  | ~52 minutes               |
| 99.999% | ~5 minutes                |
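
The downtime numbers above come from a simple calculation — allowed downtime is the unavailable fraction of a year:

```python
# Convert an SLA percentage into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(sla_percent):
    return (1 - sla_percent / 100) * MINUTES_PER_YEAR

three_nines = downtime_minutes_per_year(99.9)   # ~525.6 min ≈ 8.7 hours
four_nines = downtime_minutes_per_year(99.99)   # ~52.6 min
```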

🧍‍♂️ Availability vs Durability

These two terms are often confused:

🔓 Availability

System is accessible and working.

Example: If your database node crashes but a replica takes over immediately → Availability maintained.

🔐 Durability

Data remains safe and never lost.

Example: Data is written to 3 replicas → even if one fails, data is still safe.

Quick Difference:

| Availability | Durability |
|---|---|
| System uptime | Data permanence |
| Achieved using redundancy, load balancing | Achieved using replication, backups |
| Temporary downtime allowed | Data loss NOT allowed |

🌎 Impact of Reliability on System Design

Reliability influences major design choices:

  • Multi-region architectures
  • Replication strategies
  • Load balancing
  • Failover mechanisms
  • Eventual consistency models
  • Monitoring + Alerts
  • Retry logic, idempotency

A reliable system requires more complexity, cost, and infrastructure.

🌐 Reliability in Distributed Systems

Challenges & Solutions

Distributed systems face unique reliability issues:

Challenges

  • Network failures
  • Partial system failures
  • Node crashes
  • Data inconsistencies (due to CAP theorem)
  • Clock skew & distributed time
  • Split-brain scenarios
  • Difficult debugging

Solutions

| Challenge | Solution |
|---|---|
| Node failures | Replication, auto-healing |
| Traffic spikes | Autoscaling |
| Inconsistency | Quorums, consensus (Raft, Paxos) |
| Slow components | Timeouts, circuit breakers |
| Routing failures | Load balancers, retries, idempotency |
| Data loss | Write-ahead logs, backups, multi-region writes |

☁️ Reliability in Cloud-Native Systems

Cloud-native systems rely heavily on:

  • Microservices
  • Containers (Docker)
  • Orchestration (Kubernetes)
  • Service meshes
  • Autoscaling
  • Self-healing infrastructure

Cloud providers offer built-in features:

  • Multi-zone deployment
  • Auto-restarts
  • ReplicaSets & StatefulSets
  • Managed databases with failover
  • Observability stacks (logs, metrics, traces)

Cloud-native reliability = Automation + Redundancy + Monitoring

🎯 Summary

| Topic | Key Idea |
|---|---|
| Why reliability matters | Prevent failures, protect business |
| What is system reliability | Consistent correct behavior |
| MTBF | Time between failures |
| MTTR | Time to recover |
| SLA | Promised uptime/performance |
| Availability vs Durability | Uptime vs data safety |
| Distributed reliability challenges | Network, partial failure, consistency |
| Cloud-native reliability | Self-healing + autoscaling + redundancy |

🟩 High Availability, Fault Tolerance & Failover — System Design Notes

1️⃣ High Availability (HA)

High Availability means your system stays up and accessible almost all the time—even during failures.

HA is measured using availability %, like:

  • 99% → 3.65 days downtime/year
  • 99.9% → 8.7 hours downtime
  • 99.99% → 52 minutes
  • 99.999% → 5 minutes

Goal: Ensure services continue running despite failures by adding redundancy + failover + monitoring.

2️⃣ Fault Tolerance

Fault tolerance means your system keeps working even if a component completely fails.

  • A fault-tolerant system is designed so failure of one part does NOT affect the whole system.
  • Usually achieved using redundant components, e.g., multiple replicas, multi-AZ, RAID, replication, clustering.

Fault tolerance = Zero downtime during a failure.

3️⃣ Failover

Failover is the automatic switching of traffic from a failed component to a healthy component.

Examples:

  • If primary DB crashes → read replica becomes primary automatically.
  • If a server fails → load balancer sends traffic to other servers.

Failover can be:

  • Automatic (most common)
  • Manual (rare, used in critical financial systems)

4️⃣ Redundancy and Redundancy Strategies

Redundancy = having extra components so that if one fails, others take over.

Types of redundancy in system design:

🔵 N + 1 Redundancy

You have N working components + 1 spare.

Example:

  • 3 servers handle load
  • 1 standby server
  • If one fails → spare takes over

Used in: web servers, load balancers, power supplies.

🟢 Active–Active

All replicas are running simultaneously and handling traffic.

  • Load balancer distributes load across all nodes.
  • If one node fails, remaining nodes automatically take traffic.

Pros: High throughput, continuous availability.
Cons: Harder consistency; conflict resolution needed (especially for DBs).

Used in:

  • Multi-region services
  • Redis Cluster
  • Cassandra, DynamoDB
  • Global load-balanced apps

🟡 Active–Passive (Hot Standby)

One node handles traffic, the other stays idle but ready.

Flow:

  1. Active node serves all requests
  2. Passive node replicates data
  3. If active fails → passive becomes active

Pros: Simple, predictable.
Cons: Passive node is underutilized.
Example: Primary–replica DB setup.

5️⃣ Graceful Degradation

When the system is overloaded or partially failing, it continues working with reduced functionality instead of crashing.

Examples:

  • Instagram disables “Explore” feed during heavy load but lets users still scroll home feed.
  • Netflix reduces video quality (from 4K → HD → SD).
  • A ride-sharing app disables price history or analytics during peak traffic spikes.

Goal: Fail partially → NOT fully.

6️⃣ High Availability Patterns in Real-World Systems

Here are the most common HA architecture patterns used in big tech:

🟦 Pattern 1: Multi-AZ Deployment (Cloud)

  • Run app in multiple availability zones.
  • If AZ goes down → traffic automatically routes to another.

Used by: AWS RDS Multi-AZ, Kubernetes clusters, Netflix.

🟩 Pattern 2: Load Balancer + Multiple App Servers

Client → Load Balancer → Server1, Server2, Server3

If Server2 crashes, LB routes traffic to 1 and 3.

🟨 Pattern 3: Primary–Replica (Read Replicas)

  • Primary handles writes
  • Multiple replicas handle reads
  • If primary fails → auto failover

Used by: MySQL, PostgreSQL, MongoDB replica sets.

🟧 Pattern 4: Distributed Storage with Replication

Replicate data across:

  • multiple nodes
  • multiple racks
  • multiple data centers

Used by: HDFS, Cassandra, S3.

🟥 Pattern 5: Circuit Breakers & Timeouts

If a downstream service fails:

  • stop sending requests
  • return fallback results
  • avoid cascading failures

Used by: Netflix Hystrix, Spring Cloud.
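
The circuit-breaker pattern can be sketched in a few lines of Python (the threshold, timeout, and `flaky` service are illustrative; libraries like Hystrix add half-open probing, metrics, and thread isolation on top of this core idea):

```python
import time

# Circuit-breaker sketch: after N consecutive failures the breaker
# "opens" and calls fail fast with a fallback instead of hammering
# the broken downstream service.
class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback      # open: fail fast, no downstream call
            self.opened_at = None    # half-open: allow one trial call
        try:
            result = fn()
            self.failures = 0        # success resets the failure count
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback

def flaky():
    raise ConnectionError("downstream is down")

breaker = CircuitBreaker(failure_threshold=3)
results = [breaker.call(flaky, fallback="cached-response") for _ in range(5)]
```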

7️⃣ Designing for Redundancy

To design a reliable, redundant system:

✔ Add multiple instances of every component

Servers, databases, caches, message brokers.

✔ Avoid single points of failure

One failing component should not take down the entire system.

✔ Distribute traffic using load balancers

Global (GSLB) + Local (internal LB).

✔ Replicate data

In DBs, storage, caches.

✔ Multi-zone and multi-region deployment

To survive entire data center failures.

✔ Add failover logic

Automated promotion, heartbeat checks, retries.

8️⃣ Health Monitoring and Self-Healing Systems

A system must detect, recover, and prevent failures automatically.

🌐 Health Monitoring

Tools & techniques:

  • Heartbeat signals
  • Health check endpoints (/health, /live, /ready)
  • CloudWatch, Prometheus, Grafana
  • Alerting on latency, CPU, memory, error rates

🛠️ Self-Healing

A self-healing system can fix itself:

🌀 Examples:

  • Kubernetes restarts crashed containers automatically.
  • Auto-scaling groups replace unhealthy EC2 instances.
  • Load balancer removes unhealthy nodes automatically.
  • Redis cluster automatically moves shards when nodes die.

🎯 Summary Table

| Concept | Meaning |
|---|---|
| High Availability | System stays up most of the time |
| Fault Tolerance | System continues working even after failures |
| Failover | Automatic switch to a healthy component |
| N+1 | Extra standby unit |
| Active-Active | All nodes active |
| Active-Passive | One active, one standby |
| Graceful Degradation | Reduced functionality during failures |
| HA Patterns | LB, Multi-AZ, replication, circuit breakers |
| Self-Healing | System auto-recovers from failure |

🟦 Backup & Recovery Strategies (System Design Notes)

1️⃣ What is Backup & Recovery?

Backup

A backup is a copy of your data stored in a separate location so it can be restored if the original is lost or corrupted.

Recovery

Recovery is the process of restoring data from a backup after failure, corruption, or disaster.

2️⃣ Why is Backup Important?

Backups protect against:

  • Hardware failures
  • Human mistakes (accidental deletion)
  • Cyber attacks (ransomware, malware)
  • Natural disasters
  • Data corruption
  • Software bugs
  • Cloud region failures

Without backups, data loss = business loss.

3️⃣ Types of Backup

🟢 1. Full Backup

  • Complete copy of all data.
  • Slow & storage-heavy.
  • Best for weekly/monthly backups.

🔵 2. Incremental Backup

  • Backs up only data changed since the last backup.
  • Faster, lower storage.
  • Recovery is slower: you need all incrementals + last full backup.

🟡 3. Differential Backup

  • Backs up data changed since last full backup.
  • Larger than incremental but faster recovery.

🔴 4. Continuous Backup (CDP)

  • Real-time or near–real-time backup.
  • Used for financial systems, trading systems.

4️⃣ Recovery Types

🟢 1. File-Level Recovery

Restore individual files or folders.

🔵 2. System-Level Recovery

Restore entire servers, OS images, configuration.

🟡 3. Application-Level Recovery

Restore DBs, queues, caches, or application-level state.

🔴 4. Disaster Recovery (DR)

Restore full system in another region or data center.

Disaster recovery often includes:

  • Hot site (active-active)
  • Warm site (active-passive)
  • Cold site (infrastructure available but powered off)

5️⃣ Understanding RTO & RPO

🟦 RTO — Recovery Time Objective

“How fast should we recover after failure?”

Examples:

  • RTO = 0 sec → Active-active systems
  • RTO = 15 min → Critical apps
  • RTO = 4–8 hours → Internal tools

Lower RTO = more cost.

🟥 RPO — Recovery Point Objective

“How much data loss is acceptable?”

Examples:

  • RPO = 0 → Real-time replication
  • RPO = 1 hour → Hourly backup
  • RPO = 1 day → Daily backup

Lower RPO = more frequent backups = higher cost.

6️⃣ Trade-offs in Backup Strategies

| Strategy | Pros | Cons |
|---|---|---|
| Full | Simple to restore | Expensive, slow |
| Incremental | Fast backup, cheap | Slow restore (many files) |
| Differential | Faster restore | Larger daily backups |
| Continuous | Almost no data loss | High cost, complex |

Key trade-offs:

  • Cost vs Frequency
  • Backup speed vs Recovery speed
  • Storage cost vs RPO/RTO
  • Operational complexity vs reliability

7️⃣ Best Practices for Backup & Recovery

✅ 1. Follow the 3-2-1 Backup Rule

  • 3 copies of data
  • 2 different storage types
  • 1 copy offsite (e.g., cloud)

✅ 2. Use multi-region / multi-zone backups

Avoid single point of regional failure.

✅ 3. Test your backups regularly

A backup is useless if it cannot be restored.

✅ 4. Automate backups

No manual steps → fewer human errors.

✅ 5. Encrypt backups

Both at rest and in transit.

✅ 6. Version your backups

Protect against ransomware and accidental overwrites.

✅ 7. Use snapshots for fast recovery

  • EBS snapshots
  • RDS snapshots
  • VM/Container snapshots

✅ 8. Monitor backup success/failure

Alerts ensure backup failures are caught early.

🎯 Summary

| Concept | Meaning |
|---|---|
| Backup | Copy of data stored elsewhere |
| Recovery | Restoring data after failure |
| RTO | How fast to recover |
| RPO | How much data loss is acceptable |
| Full/Incremental/Differential | Core backup strategies |
| DR | System-wide recovery plan |
| Best Practices | 3-2-1 rule, automation, encryption, testing |

Disaster Recovery in Practice

Why Disaster Recovery Matters

Disaster Recovery (DR) ensures a system can quickly recover after catastrophic failures such as:

  • Data center outages
  • Natural disasters
  • Cloud region failures
  • Cyberattacks or ransomware
  • Human operational mistakes

Without DR, systems face extended downtime, data loss, and business failure.

DR for Mission-Critical Applications

Mission-critical systems (banking, payments, healthcare, e-commerce) require:

  • Minimal downtime (low RTO)
  • Minimal data loss (low RPO)
  • Multi-region architectures
  • Continuous replication
  • Automated failover

These systems must remain operational even during regional failures.

Failover + Backup = True Resilience

  • Backups protect data (restore capability).
  • Failover provides continuous service availability.

Both are required because:

  • Backups alone → slow recovery, possible data loss
  • Failover alone → corrupted/compromised data may replicate instantly

A resilient system uses:

  • Real-time replication (for quick failover)
  • Periodic backups (for long-term recovery)

Testing and Automation

Disaster recovery is reliable only when regularly validated:

  • Automate failover/failback scripts
  • Run Chaos/DR drills (e.g., shutting down an entire region)
  • Test backup restore procedures
  • Validate RTO/RPO targets
  • Continuously audit recovery readiness

Automated DR reduces human error and speeds up recovery.

Challenges in Geo-Distributed Systems

Systems spanning multiple regions or continents face:

  • High network latency
  • Data consistency issues
  • Split-brain scenarios
  • Complex replication logic
  • Different regulatory requirements
  • Higher operational cost

Achieving strong consistency across regions is hard due to network unpredictability.

Geo-Redundancy & Quorum-Based Design

Geo-Redundancy

Deploying services/data across multiple regions ensures:

  • Region-level fault tolerance
  • Availability during disasters
  • Faster local access for users

Common patterns:

  • Active-active multi-region
  • Active-passive warm standby
  • Active-cold DR region

Quorum-Based Design

Used to maintain consistency in distributed systems:

  • A write succeeds only if a majority (quorum) of nodes acknowledge it.
  • Prevents split-brain and ensures durability.
  • Used by systems like Cassandra, MongoDB, Consul, Zookeeper.

Quorums help balance:

  • Consistency (enough nodes must agree)
  • Availability (system stays online despite failures)
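
The classic quorum rule is W + R > N: with N replicas, a write acknowledged by W of them and a read consulting R of them are guaranteed to overlap in at least one replica, so the reader always sees the freshest version. A minimal sketch (replica selection and versioning here are simplified for illustration):

```python
# Quorum sketch: N replicas, writes go to W of them, reads consult R.
# With W + R > N, the read set always overlaps the write set.
N, W, R = 3, 2, 2
replicas = [{} for _ in range(N)]

def quorum_write(key, value, version):
    for replica in replicas[:W]:          # only W replicas ack the write
        replica[key] = (version, value)

def quorum_read(key):
    seen = [r[key] for r in replicas[-R:] if key in r]  # ask R replicas
    return max(seen)[1] if seen else None  # highest version wins

quorum_write("x", "v1", version=1)
quorum_write("x", "v2", version=2)
value = quorum_read("x")  # overlap guarantees the read sees version 2
```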

🔐 Security in System Design

Security is a non-functional requirement but one of the most critical aspects of scalable system design. A system that scales but is not secure is useless.

1. Introduction to Security in System Design

Security in system design ensures that applications, servers, networks, and data are protected from adversaries, misuse, and failures.

In distributed systems (microservices, cloud apps), multiple components interact across networks — making the attack surface larger. Security must be integrated from day 1 of the architecture, not patched later.

2. Why Security Matters in System Design

Security is important because:

Protects sensitive data

User data, financial data, PII, tokens, passwords.

Prevents financial loss

DDoS, data breaches, fraud can cost millions.

Builds user trust

Users expect their data to be safe.

Ensures uptime

Attacks like DDoS and ransomware can shut down systems.

Compliance

GDPR, HIPAA, SOC2, PCI-DSS rules must be followed.

3. What is Security in Distributed Systems?

A distributed system has:

  • Multiple nodes
  • Multiple communication channels
  • Multiple storage layers
  • Shared responsibilities

This makes it vulnerable to:

  • Network attacks
  • Service impersonation
  • Traffic interception
  • Data corruption
  • Unauthorized access

Security in distributed systems = securing every layer:

  • Client layer
  • API layer
  • Network layer
  • Microservices
  • Databases
  • Caches
  • Queues
  • Storage
  • Infrastructure

4. The CIA Triad: The Core of System Security

Security is built on the CIA Triad:

🔒 C – Confidentiality

Ensure data is accessible only to authorized users. Techniques: Encryption, Access Control, TLS.

🔄 I – Integrity

Ensure data isn’t tampered with. Techniques: Hashing, Checksums, Digital signatures.

📦 A – Availability

Ensure the system is always up. Techniques: Redundancy, Rate-limiting, DDoS protection.

Together, these form the foundation of secure architecture.

5. Threat Modeling – Understanding Your Adversary

Threat modeling helps predict:

  • Who can attack?
  • Why will they attack?
  • How will they attack?

Use the STRIDE model:

| Threat Type | Stands For | Example |
|---|---|---|
| S | Spoofing | Fake login, impersonation |
| T | Tampering | Modified API request |
| R | Repudiation | Denying an action was performed |
| I | Information Disclosure | Unauthorized data access |
| D | Denial of Service | DDoS attack |
| E | Elevation of Privilege | Normal user acting as admin |

This helps design defenses early.

6. Common Attack Vectors in Systems

Attack vectors = ways attackers enter.

  • Weak authentication
  • Exposed APIs
  • Insecure cookies
  • Public S3 buckets
  • Misconfigured CORS
  • Outdated libraries
  • SQL injection points
  • No rate-limiting
  • Public admin panels

7. Common Attacks (Interview Essentials)

1. DDoS Attack

Goal: Exhaust your server resources. Solution: Rate limiting, CDN, WAF, load balancer.

🔌 2. MITM (Man-in-the-Middle)

Attacker intercepts data in transit. Solution: HTTPS + TLS.

💉 3. Injection Attacks (SQL/XSS/Command)

Attacker injects malicious code. Solution: Validation, ORM, sanitization.

🎭 4. Spoofing

Fake identity, fake IP, fake tokens. Solution: Strong auth, signatures.

8. Security in SDLC (Secure Development Life Cycle)

Security should be part of the dev cycle:

  1. Requirements: define security needs
  2. Design: threat modeling, review
  3. Implementation: secure coding
  4. Testing: penetration tests, static analysis
  5. Deployment: infra hardening
  6. Monitoring: alerts, logs

Tools: SonarQube, Snyk, OWASP ZAP.

9. Security Best Practices (Essential)

  • Use HTTPS everywhere
  • Validate every input
  • Enforce least privilege
  • Never store passwords in plain text
  • Rotate keys/tokens
  • Use secure headers (CSP, HSTS)
  • Enable MFA for admin access

🔐 Authentication & Authorization

10. Introduction

Auth is core to system security.

  • Authentication = "Who are you?"
  • Authorization = "What can you do?"

11. Common Authentication Methods

  • Username + Password
  • OTP / MFA
  • OAuth 2.0
  • JWT
  • API Keys
  • Refresh Tokens

12. Session-Based vs Token-Based Authentication

🔵 Session-Based

Stored on server memory/Redis. Great for traditional web apps.

🟠 Token-Based (JWT)

Stored on client side, stateless. Great for microservices & mobile apps.

13. Access Control Models

  1. RBAC – Role Based
  2. ABAC – Attribute Based
  3. PBAC – Policy Based
  4. MAC/DAC – Mandatory/Discretionary control

14. SSO & Identity Federation

  • SSO = One login for many services
  • SAML, OAuth2, OpenID Connect

Used by Google Login, GitHub Login.

🔐 Data Protection & Secure Communication

15. Why Data Protection Matters

To prevent:

  • Data breaches
  • Unauthorized access
  • Insider threats
  • Fraud

16. Encryption

Two types:

  • Symmetric (AES)
  • Asymmetric (RSA)

17. Encryption at Rest vs. Transit

  • At rest = Encrypt databases, files
  • In transit = HTTPS, TLS

18. Hashing & Salting Passwords

Never store raw passwords.

Use:

  • bcrypt
  • Argon2
  • PBKDF2
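
A salted-hash sketch using stdlib PBKDF2 via `hashlib.pbkdf2_hmac` (the iteration count is an illustrative choice; bcrypt and Argon2 follow the same salt-and-stretch idea with better resistance to GPU attacks):

```python
import hashlib
import hmac
import os

# PBKDF2 sketch: a random salt defeats rainbow tables, and many
# iterations make brute-forcing stolen hashes expensive.
def hash_password(password, salt=None, iterations=200_000):
    salt = salt or os.urandom(16)  # 16 random bytes per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password, salt, expected, iterations=200_000):
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)  # constant-time compare

salt, stored = hash_password("s3cret!")  # store (salt, stored), never the password
```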

19. PKI (Public Key Infrastructure)

Allows secure communication using digital certificates.

Used by HTTPS, API signing, JWT validation.

20. Secure API Communication

  • Sign requests (HMAC)
  • Require tokens
  • Validate input/output
  • CORS restrictions
  • Use API gateways
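
Request signing with HMAC can be sketched with the stdlib `hmac` module: client and server share a secret, the server recomputes the signature over the body, and tampered requests fail verification (the secret and body here are illustrative):

```python
import hashlib
import hmac

# HMAC request-signing sketch: a tampered body no longer matches
# the signature computed with the shared secret.
SECRET = b"shared-secret-key"

def sign(body: bytes) -> str:
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()

def verify(body: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(body), signature)  # constant-time

body = b'{"amount": 100}'
sig = sign(body)                              # sent alongside the request
tampered_ok = verify(b'{"amount": 9999}', sig)  # attacker-modified body
```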

🔐 Network & Infrastructure Security

21. Why Network Security Matters

A system is only as strong as its weakest network point.

22. Firewalls & Reverse Proxy

  • Firewall: Blocks suspicious requests
  • Reverse Proxy: Protects backend servers (E.g., Nginx, Envoy)

23. Rate Limiting, Throttling, IP Filtering

Used to prevent:

  • DDoS
  • Abuse
  • Bots
  • API overload

Tools: NGINX, API Gateway, Cloudflare.

24. Network Segmentation & Isolation

Separate network layers:

  • Public → Load Balancer → App → DB
  • Private VPC
  • Subnets
  • No DB exposure to internet

25. Zero Trust Security Model

"Trust nothing, verify everything."

  • Every request must be authenticated
  • Every device verified
  • No implicit trust inside network

26. Securing Cloud Environments

  • IAM roles
  • VPC security groups
  • Secret managers
  • Monitoring (CloudWatch, Datadog)

27. Securing Serverless & Containers

Containers:

  • Use minimal images
  • No root user
  • Scan images

Serverless:

  • Limit execution role
  • Validate input
  • Set timeout
  • Use VPC

28. Security in Microservices

Key challenges:

  • Many services → many attack points
  • Message queues
  • Internal APIs

Solutions:

  • Mutual TLS
  • API Gateway
  • Service mesh (Istio/Linkerd)
  • Isolation per service

29. OWASP Top 10 (Must Know)

  • Injection
  • Broken Auth
  • Sensitive Data Exposure
  • XXE
  • Access Control issues
  • Security misconfig
  • XSS
  • Insecure Deserialization
  • Using vulnerable libraries
  • Insufficient logging/monitoring

30. Conclusion

Security is not a feature — it’s a mindset. Design secure systems by integrating security:

  • At architecture level
  • At code level
  • At infra level
  • During operations

About

System Design, LLD, HLD, and Interview Preparation in Java, Python, Golang, C++, and JavaScript – with concepts, patterns, and case studies.
