An in-depth guide to system design concepts, going beyond basic definitions to cover architectural patterns, trade-offs, and practical implementations.
The ability of a system to handle growing amounts of work by adding resources.
- Vertical Scaling (Scaling Up): Adding more power (CPU, RAM) to an existing machine.
- Pros: Simple, no code changes.
- Cons: Hardware limits, single point of failure (SPOF), expensive at high end.
- Horizontal Scaling (Scaling Out): Adding more machines to a pool of resources.
- Pros: Limitless theoretical scale, redundancy.
- Cons: Complexity (data consistency, distributed transactions).
The probability a system performs its required function correctly for a specified period of time.
- Resiliency: The ability to recover from failures.
- Redundancy: Eliminating single points of failure by having backups.
The proportion of time a system is functional and accessible.
- Calculated as: `Uptime / (Uptime + Downtime)`
- "Nines" of Availability:
- 99% (2 nines): ~3.65 days downtime/year
- 99.99% (4 nines): ~52 mins downtime/year
- 99.999% (5 nines): ~5 mins downtime/year
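The downtime budgets above follow directly from the availability formula; a minimal sketch of the conversion (helper name is my own):

```python
# Convert an availability percentage ("nines") into an allowed
# downtime budget per year.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_minutes_per_year(availability_pct: float) -> float:
    """Yearly downtime budget implied by a given availability %."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

print(round(downtime_minutes_per_year(99.99), 2))  # 52.56 -> "~52 mins/year"
print(round(downtime_minutes_per_year(99.0) / (60 * 24), 2))  # 3.65 days/year
```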
- Latency: Time taken to process a single request.
- Throughput: Number of requests processed per unit of time.
- Goal: High throughput and low latency.
In a distributed data store, you can only guarantee two of the three:
- Consistency (C): Every read receives the most recent write or an error.
- Availability (A): Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition Tolerance (P): The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.
Reality: In distributed systems, P is unavoidable. You choose between CP (consistency over availability during partitions) or AP (availability over consistency).
An extension of CAP.
- If there is a Partition (P), how does the system trade off Availability and Consistency (A vs C)?
- Else (E) (no partition), how does the system trade off Latency and Consistency (L vs C)?
- Example: DynamoDB offers tunable consistency. You can choose stronger consistency (higher latency) or eventual consistency (lower latency).
- HTTP/HTTPS: Stateless, text-based (HTTP/1.1) or binary (HTTP/2, HTTP/3). Standard for web.
- TCP: Reliable, ordered delivery. 3-way handshake.
- UDP: Unreliable, unordered, faster. Used for streaming, gaming.
- REST: Resource-based, stateless, uses standard HTTP verbs. Caching-friendly.
- GraphQL: Client specifies exactly what data it needs. Solves over-fetching/under-fetching.
- gRPC: Built on HTTP/2, uses Protocol Buffers (binary). High performance, strongly typed, great for internal microservices.
- Short Polling: Client requests data frequently. Wasteful.
- Long Polling: Server holds the request open until data is available. Better than short polling, but each update still requires a new request/connection.
- WebSockets: Full-duplex, persistent connection over TCP. Best for chat, real-time gaming.
- Server-Sent Events (SSE): Unidirectional (Server -> Client) over HTTP. Good for news feeds, stock tickers.
Distributes incoming traffic across multiple servers.
- Layer 4 (Transport): Routes based on logic like IP + Port (TCP/UDP level). Very fast.
- Layer 7 (Application): Routes based on content (URL path, headers, cookies). Smarter but more CPU intensive.
Algorithms:
- Round Robin: Sequential.
- Weighted Round Robin: Sequential with capacity awareness.
- Least Connections: Sends to server with fewest active connections.
- Consistent Hashing: Used for cache-aware load balancing, to minimize key redistribution when servers are added or removed.
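Consistent hashing can be sketched as a sorted ring of node hashes, where each key is routed to the first node clockwise from its own hash. This is a minimal illustration (no virtual nodes, which real implementations add for balance); the node names are invented:

```python
import hashlib
from bisect import bisect

class ConsistentHashRing:
    """Minimal consistent-hash ring for illustration (no virtual nodes)."""

    def __init__(self, nodes):
        # Place each node on the ring at the position given by its hash.
        self._ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise: first node hash >= key hash, wrapping around.
        hashes = [h for h, _ in self._ring]
        idx = bisect(hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:42"))  # deterministic: same key, same node
```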
- Forward Proxy: Sits between client and internet. Acts on behalf of client (e.g., VPN, bypassing firewalls).
- Reverse Proxy: Sits between internet and backend servers. Acts on behalf of server.
- Uses: SSL Termination, Load Balancing, Caching, Compression, Security.
| Feature | SQL (e.g., PostgreSQL, MySQL) | NoSQL (e.g., MongoDB, Cassandra, Redis) |
|---|---|---|
| Schema | Rigid, structured | Flexible, schemaless |
| Scaling | Vertical (mostly) | Horizontal (native) |
| Transactions | ACID compliant | Typically BASE (often per-document ACID) |
| Use Case | Complex queries, financial data | High throughput, unstructured data, analytics |
- ACID (Standard for SQL):
- Atomicity: All or nothing.
- Consistency: DB moves from one valid state to another.
- Isolation: Concurrent transactions don't interfere.
- Durability: Committed data is saved permanently.
- BASE (Standard for NoSQL):
- Basically Available: System guarantees availability.
- Soft state: State may change over time (even without input).
- Eventual consistency: System will become consistent over time.
Speed up reads at the cost of slower writes and storage space.
- B-Tree: Standard for relational DBs. Good for range queries and equality.
- LSM Tree (Log-Structured Merge-Tree): Optimized for write-heavy workloads (used in Cassandra, RocksDB, LevelDB).
- Bloom Filters: Probabilistic data structure to quickly test if an element is definitely not in a set. Used to avoid expensive disk lookups for non-existent keys.
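A toy Bloom filter makes the "definitely not in the set" guarantee concrete: if any of the k bit positions for an item is unset, the item was never added. Parameters and hashing scheme here are illustrative, not tuned:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-bit array."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = [False] * m

    def _positions(self, item: str):
        # Derive k positions by salting one hash function with an index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # False -> definitely absent; True -> possibly present
        # (false positives possible, false negatives impossible).
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("user:1")
print(bf.might_contain("user:1"))  # True
```

A database can consult such a filter before touching disk: a `False` answer means the key cannot exist, so the lookup is skipped entirely.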
Splitting data across machines.
- Horizontal: Rows distributed across nodes.
- Key-based: `hash(id) % N`. Simple, but hard to rebalance when N changes.
- Consistent Hashing: Maps keys and nodes to a ring. Minimizes data movement when nodes join or leave.
- Directory-based: A lookup service determines each key's location.
- Vertical: Columns distributed (e.g., storing images in blob storage, metadata in DB).
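Why `hash(id) % N` is hard to rebalance can be shown directly: changing the shard count remaps most keys. A quick sketch with plain modulo sharding (integer keys for simplicity):

```python
def shard(key: int, n_shards: int) -> int:
    # Modulo sharding: deterministic, but rebalance-hostile.
    return key % n_shards

keys = range(10_000)
# Count how many keys land on a different shard after scaling 4 -> 5.
moved = sum(1 for k in keys if shard(k, 4) != shard(k, 5))
print(f"{moved / 10_000:.0%} of keys move when going from 4 to 5 shards")
```

Here 80% of keys must migrate for a single added shard; consistent hashing reduces this to roughly 1/N of the keys.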
Temporary storage for frequently accessed data.
- Locations: Browser, CDN, Load Balancer, Application (local), Database (internal caches/buffers).
- Cache-Aside (Lazy Loading): App checks cache. If miss, app fetches from DB and updates cache.
- Pros: Only requests data. Resilience (if cache fails, DB takes load).
- Cons: Stale data gap. Initial latency on miss.
- Write-Through: App writes to cache; cache writes to DB synchronously.
- Pros: Data consistency, cache always matches DB.
- Cons: Higher write latency.
- Write-Back (Write-Behind): App writes to cache; cache asynchronously writes to DB.
- Pros: Fast writes.
- Cons: Data loss risk if cache crashes before sync.
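The cache-aside flow above (check cache, fall back to DB, populate on miss) can be sketched with an in-memory dict standing in for Redis and a stub standing in for the database; both are illustrative only:

```python
cache = {}

def fetch_from_db(user_id: int) -> dict:
    # Placeholder for a real database query.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id: int) -> dict:
    if user_id in cache:              # 1. check the cache
        return cache[user_id]
    record = fetch_from_db(user_id)   # 2. on a miss, hit the DB
    cache[user_id] = record           # 3. populate the cache for next time
    return record

get_user(7)          # miss: goes to the DB
print(get_user(7))   # hit: served from the cache
```

In practice the cached entry would also carry a TTL to bound the stale-data gap mentioned above.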
- LRU (Least Recently Used): Discard items not used for the longest time. Most common.
- LFU (Least Frequently Used): Discard items with lowest frequency count.
- TTL (Time to Live): Automatic expiration.
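LRU eviction is commonly sketched with an ordered map: touched keys move to one end, and eviction pops from the other. A minimal version using Python's `OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """LRU eviction: most recently used keys sit at the end of the dict."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)        # mark as recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes the most recently used
cache.put("c", 3)      # evicts "b", the least recently used
print(cache.get("b"))  # None
```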
Decouple producers and consumers. Buffer bursts of traffic.
- Point-to-Point: One consumer gets the message (e.g., RabbitMQ work queues).
- Pub-Sub: Many consumers get the message (e.g., Kafka topics, SNS).
- Decoupling: Services scale independently.
- Backpressure: Consumers process at their own pace.
- Durability: Messages persist until processed.
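The pub-sub fan-out can be sketched in a few lines with an in-memory topic registry; real brokers add persistence, acknowledgements, and backpressure on top of this shape (topic and consumer names are invented):

```python
from collections import defaultdict

# topic -> list of subscriber callbacks
subscribers = defaultdict(list)

def subscribe(topic: str, handler):
    subscribers[topic].append(handler)

def publish(topic: str, message: dict):
    # Pub-sub: every subscriber of the topic receives the message.
    for handler in subscribers[topic]:
        handler(message)

received = []
subscribe("orders", lambda msg: received.append(("billing", msg)))
subscribe("orders", lambda msg: received.append(("shipping", msg)))
publish("orders", {"order_id": 1})
print(received)  # both consumers saw the same message
```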
In a cluster, one node is designated as Leader to handle writes/coordination.
- Algorithms: Raft (most common in modern systems like etcd, Consul), Paxos (foundational but complex), ZAB (ZooKeeper Atomic Broadcast).
- Split Brain: Network partition creates two partial clusters, both electing a leader. Solved by Quorum (majority vote required - need >50% of nodes to agree).
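The quorum rule prevents split brain because two sides of a partition cannot both hold a strict majority. A one-function sketch:

```python
def has_quorum(votes: int, cluster_size: int) -> bool:
    # Strict majority: more than half the nodes must agree.
    return votes > cluster_size // 2

# A 5-node cluster partitioned 3/2: only the 3-node side can elect a leader.
print(has_quorum(3, 5), has_quorum(2, 5))  # True False
```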
- 2-Phase Commit (2PC): Coordinator asks "Can you commit?" -> Participants say "Yes" -> Coordinator says "Commit". Blocking, single point of failure.
- Saga Pattern: Sequence of local transactions. If one fails, execute compensating transactions (undo steps) in reverse order. Better for microservices.
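The saga's "compensate in reverse order" behavior can be sketched as a list of (action, compensation) pairs; the step names below are invented for illustration:

```python
def run_saga(steps) -> bool:
    """steps: list of (action, compensation) callables.
    Runs actions in order; on failure, runs the compensations
    for already-completed steps in reverse order."""
    done = []
    for action, compensation in steps:
        try:
            action()
            done.append(compensation)
        except Exception:
            for undo in reversed(done):   # compensate newest-first
                undo()
            return False
    return True

log = []

def fail():
    raise RuntimeError("shipping failed")

steps = [
    (lambda: log.append("reserve inventory"), lambda: log.append("release inventory")),
    (lambda: log.append("charge card"),       lambda: log.append("refund card")),
    (fail,                                    lambda: None),
]
print(run_saga(steps))  # False
print(log)              # actions, then compensations in reverse order
```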
- The property that an operation can be applied multiple times without changing the result.
- Crucial for retries.
- Implementation: Use a unique `idempotency-key` in requests. The server checks whether the key has already been processed.
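The server-side check can be sketched as a lookup table keyed by the idempotency key, so a retried request replays the stored result instead of executing twice (handler and key names are illustrative):

```python
processed = {}  # idempotency_key -> stored result

def handle_payment(idempotency_key: str, amount: int) -> dict:
    if idempotency_key in processed:
        # Retry: return the stored result instead of charging again.
        return processed[idempotency_key]
    result = {"status": "charged", "amount": amount}  # do the work once
    processed[idempotency_key] = result
    return result

first = handle_payment("key-123", 50)
retry = handle_payment("key-123", 50)  # safe retry: no double charge
print(first is retry)  # True: the same stored result is returned
```

In production the `processed` map would live in shared storage (e.g., a database or Redis) with an expiry, not in process memory.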
- Push: Nodes send "I'm alive" signals (heartbeats) to a central registry.
- Pull: Central registry pings nodes.
How services find each other's IP addresses in dynamic environments (e.g., Kubernetes).
- Client-Side: Client queries service registry (e.g., Eureka), then calls service.
- Server-Side: Client calls LB, LB queries registry and forwards.
Protecting resources from abuse.
- Token Bucket: Tokens are added at rate `r`; each request consumes a token. Allows bursts.
- Leaky Bucket: Requests enter a queue that is processed at a constant rate. Smooths traffic.
- Fixed Window: Counters reset every minute. Can allow up to 2x limit at window boundaries.
- Sliding Window Log: Track timestamps. Accurate but expensive.
- Sliding Window Counter: Hybrid approximation.
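A token bucket is small enough to sketch fully: refill proportionally to elapsed time, cap at capacity, and spend one token per request. Parameter values below are illustrative:

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, holds up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity          # start full: bursts allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # spend one token for this request
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
burst = [bucket.allow() for _ in range(12)]
print(burst.count(True))  # 10: the full burst passes, then requests are rejected
```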
If a service fails repeatedly, stop calling it ("open" the circuit) to prevent cascading failures and allow it to recover.
- States: Closed (Normal), Open (Fail immediately), Half-Open (Test if recovered).
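The three states map naturally onto a small wrapper class: a failure counter trips the breaker open, and a timer governs the half-open trial call. This is a minimal single-threaded sketch (thresholds and names are my own; production breakers add locking and metrics):

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; half-opens
    (allows one trial call) after `reset_after` seconds."""

    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold, self.reset_after = threshold, reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # Half-open: fall through and let one trial request proceed.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()  # trip open
            raise
        self.failures, self.opened_at = 0, None    # success closes the circuit
        return result

breaker = CircuitBreaker(threshold=2, reset_after=30)

def flaky():
    raise ConnectionError("backend down")

for _ in range(2):            # two consecutive failures trip the breaker
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

try:
    breaker.call(flaky)       # now fails fast without touching the backend
except RuntimeError as e:
    print(e)                  # circuit open: failing fast
```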
Isolate resources into pools. If one subsystem fails (e.g., Image Processing), it doesn't exhaust connections for others (e.g., User Auth).
- Block Storage: Raw blocks (like a hard drive). High performance. (e.g., AWS EBS).
- File Storage: Hierarchical files/folders. Shared access. (e.g., AWS EFS, NAS).
- Object Storage: Flat structure with unique ID. Metadata heavy. Unlimited scale. Immutable objects. (e.g., AWS S3, Google Cloud Storage). Best for static assets, backups.