[Cost] Is there anything we can simplify to decrease cost?
1.2. Making decisions
All decisions, at a high level, are an optimisation for
Functionality (Profit)
Cost (Loss)
1.3. CAP Theorem
Characteristic
Description
Use Case
Consistency
All nodes in the system see the same data at the same time
Usually preferred for financial systems
Availability
System remains operational even if some nodes fail
Usually preferred for social media / streaming apps
Partition Tolerance
System remains operational even if network communication with some nodes fails
Non-optional because networks are not reliable, so the tradeoff is usually between C and A.
1.4. Tenancy
A tenant is a customer/organisation space with its own users, data, config
Single-tenant
Multi-tenant
Definition
One tenant per isolated stack
Multiple tenants per stack
Isolation
Strong
Weak
Per-tenant customisation
Easy
Harder
OpEx
Higher
Lower
Scale
Worse (under-utilised)
Better (pooling)
Compliance / Data residency
Easier
Harder (needs partitioning)
Onboarding Speed
Slower
Faster
1.5. Types of Development
Web
Frontend
Backend
Mobile
Game
Desktop
Embedded
DevOps
Data
ML / AI
Security
1.6. Infrastructure as a Service (IaaS) vs Platform as a Service (PaaS)
Approach
Use Case
Adv.
Disadv.
IaaS
Large-scale / custom apps
Flexibility, Pay-as-you-go
Setup & maintenance, steeper learning curve
PaaS
MVPs
Faster dev + CI/CD + easy deployment & scaling + security out of the box
Vendor lock-in, less flexibility
1.7. Compliance
1.7.1. PCI DSS
PCI DSS (Payment Card Industry Data Security Standard) is a security compliance standard governed by major card brands (Visa, Mastercard, Amex) relevant to credit/debit card data
Prefer using payment providers (e.g. Stripe) to avoid handling card data
If unavoidable:
Never store sensitive authentication data (CVV, PIN, track data)
Isolate the Card Data Environment (CDE) via network segmentation
Encrypt cardholder data in transit and at rest
Strict access control + audit logging for in-scope systems
1.8. Best Practices to Scale
Scaling Best Practices
Description
Reason
Exception
Stateless Compute
Keep biz logic compute stateless
Any instance can serve any request, add more instances to scale, replace instance in failure, easy load balancing
WS on the edge, HTTP/RPC at the core
HTTP/RPC are stateless, i.e. providing easier retries + load balancing + observability + timeouts
Idempotency
Repeating an operation has the same effect as doing it once
Pull-based backpressure is typically more forgiving than push-based
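A minimal sketch of the idempotency idea using a client-supplied idempotency key; `handle_payment` and the in-memory store are illustrative, not a real payment API:

```python
# Idempotency sketch: repeating a call with the same key has the same
# effect as making it once. The dict stands in for a durable key store.
processed = {}

def handle_payment(idempotency_key, amount):
    # A retry with the same key returns the original result
    # instead of charging twice.
    if idempotency_key in processed:
        return processed[idempotency_key]
    result = {"charged": amount}  # stand-in for the real side effect
    processed[idempotency_key] = result
    return result
```

In practice the key store must be durable and shared (e.g. a database unique constraint), otherwise a crash between the side effect and the record loses the guarantee.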
Duplexity: Initiation of communication + Sending of data + Concurrency
Duplexity
Who can initiate communication?
Who can send data?
Can both send at the same time?
Example
Use Case
Simplex
One side only
One side only
N/A
Webhooks
Event notifications
Half-duplex
Typically one side at a time (often client-first)
Both sides
No
HTTP
APIs
Full-duplex
Both sides
Both sides
Yes
WebSocket
Chat, collaboration
1.9. Message Distribution / Fanout Patterns
Key question: For one event,
who should receive it,
this determines the fanout approach
how many recipients at peak
if >1000, avoid broadcast and look into group fanout strategies
Dynamic Scheduling
Requests are routed based on a load balancer or scheduler
Workloads are heterogeneous, resource usage unpredictable, fine-grained control over task placement
Assign request based on compute needs + Easy to add/remove nodes + Supports complex scheduling policies
Orchestrator / scheduler is SPOF + can be bottleneck
Static Partitioning
Requests are routed based on predefined ranges or affinity rules, e.g. ID range, location
Tasks are grouped logically
Low latency as no lookup is needed
Hotspots + manual rebalancing + difficult to add/remove nodes
Consistent Hashing
Requests are routed based on hash of request key
Stateless workloads, e.g. microservices, serverless, API gateways
Automatic load balancing + no lookup table needed
Range based tasks difficult + rebalancing required when nodes are added/removed
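The consistent-hashing row above can be sketched as a hash ring with virtual nodes (the vnode count and hash choice are arbitrary; this is a toy, not production code):

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        entries = []
        for node in nodes:
            for i in range(vnodes):
                entries.append((self._hash(f"{node}#{i}"), node))
        entries.sort()
        self._hashes = [h for h, _ in entries]
        self._nodes = [n for _, n in entries]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, key):
        # first virtual node clockwise from the key's hash owns the key
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]
```

Virtual nodes smooth out the load per physical node; when a node joins or leaves, only the keys between adjacent ring positions move, which is the "rebalancing required" disadvantage in the table.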
3.6. Logging
Avoid auto logging POST bodies and GET parameters
If the auto logging runs on auth endpoints, passwords could be written in plaintext to logs
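A hedged sketch of scrubbing sensitive fields before a request body reaches the logs (the field names are assumptions; production systems often prefer an allowlist of loggable fields instead of a denylist):

```python
# Denylist-based redaction before logging. Field names are illustrative.
SENSITIVE = {"password", "cvv", "pin", "token"}

def redact(body: dict) -> dict:
    # Replace sensitive values so the log line is safe to persist.
    return {
        k: ("[REDACTED]" if k.lower() in SENSITIVE else v)
        for k, v in body.items()
    }
```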
3.7. Websockets
Single-Node
At a high level, a single-node WebSocket system can often handle up to ~10k concurrent connections, but to maintain a margin of safety, it’s reasonable to start thinking about distributed WebSocket systems above ~1k connections. At that point, distributed systems also bring benefits like better fault tolerance and operational robustness.
At a lower level, websocket soak test tools can be used to validate these assumptions by observing system behaviour over time (CPU/memory usage, message latency, connection health (success/lifetime/dropped), network egress), identifying which part of the system becomes a bottleneck and needs to be scaled. The goal at this stage is typically to meet some kind of SLO, e.g.:
99.9% of WebSocket messages delivered within 200ms
99% of API requests complete under 500ms
< 0.1% connection drops per hour
Distributed
Functionality: How do we ensure that messages get to the correct client?
Strategy
Description
Advantages
Disadvantages
Typical Use Case
Pub/Sub broadcast
Any instance publishes to a broker which broadcasts to all instances, instance holding the WS delivers, others drop
Simple, resilient to instance churn
Wasteful fan-out, message loss if nobody is listening
Small–medium clusters, low message volume
Connection registry + direct routing
Instances add {clientId → instance} in registry, sender looks up owner and forwards via RPC
Precise delivery, scales well
Registry correctness complexity, e.g. flapping ownership, more failure cases to handle
Large clusters, high throughput, real-time messaging
WebSocket gateway layer
Dedicated gateway owns all WS connections, compute instances send messages to gateway
Compute stateless, clean separation of concerns, simple delivery semantics
Stateful gateway tier, extra hop
High-scale systems, many short-lived compute instances
Robustness: How do we ensure reliable message processing and delivery over time?
Strategy
Description
Advantages
Disadvantages
Typical Use Case
Queue / Stream keyed by client
Messages placed in per-client or keyed queue, instance owning WS consumes and delivers
Durable, pull-based backpressure, supports retries/replays/offline delivery (because messages stay in log while client is offline)
Higher latency enqueueing/dequeueing than simple push with pubsub, ownership/rebalancing complexity, not true push (message is delivered when consumer polls, not at production time)
Systems needing durability, offline delivery, or replay
Client pull / reconnect catch-up
Client fetches pending messages from shared store on poll or reconnect
Extremely resilient; minimal server coupling
Higher latency; weaker real-time guarantees
Notifications, feeds, async workflows
How do we route clients to the same instance to reduce coordination?
Strategy
Description
Advantages
Disadvantages
Typical Use Case
Sticky sessions
Load balancer routes client to same instance based on hash/cookie
Very simple, reduces cross-instance routing
Breaks on instance failure, doesn’t guarantee ownership
Low churn systems, cost-sensitive setups
Consistent hashing ownership
All instances compute owner for clientId using membership + hash ring
No registry lookup needed; ownership is deterministic
Connection migration required when nodes join/leave the ring
Large clusters that want registry-free routing
Transient pub/sub
Message is pushed to all subscribers with no persistence
Extremely fast; simple fan-out
No durability; subscribers must be online
Realtime websocket fan-out, multiplayer state updates
Durable pub/sub
Messages are persisted until acknowledged by subscribers
Survives subscriber crashes
Higher latency; storage cost
Critical event distribution, audit logs
3.8. Caching
Cache Read Strategies
Description
Adv.
Disadv.
Use Case
Read-thru
App reads cache -> on miss, cache reads from DB
Simplifies app logic
Stampede risk on hot keys + Tight coupling between cache and data store + Limited flexibility for custom fetch logic
Simple KV access
Cache Write Strategies
Description
Adv.
Disadv.
Use Case
Write-thru
App writes to cache -> cache writes to DB sync
Cache is consistent + Reads are fast after writes
Higher write latency + Cache outage blocks writes
Strong consistency / configuration data
Write-behind / back
App writes to cache -> cache writes to DB async
Very fast writes
Risk of data loss without durable buffering (queue / WAL required) + eventual consistency
High-throughput / analytics / logging / non-critical data
Cache Read/Write Strategies
Description
Adv.
Disadv.
Use Case
Cache-aside
App checks cache -> on miss, app reads from DB -> app writes to cache
Simple + cache only stores what is used
Stampede risk on hot keys + Harder to guarantee consistency under concurrent writes
Default choice for most BE systems / Read-heavy systems / microservices / web APIs
Cache-thru
Read-thru + Write-thru
Centralised data access
Cache is SPOF + Reduced observability + Harder debugging
Rare / legacy / strict data access boundaries
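Cache-aside from the table above, as a minimal sketch with dicts standing in for the cache and the DB:

```python
# Cache-aside: the app owns both the cache check and the DB read.
cache = {}
db = {"user:1": {"name": "Ada"}}

def get(key):
    if key in cache:           # 1. check cache
        return cache[key]
    value = db.get(key)        # 2. on miss, read from DB
    if value is not None:
        cache[key] = value     # 3. populate cache for next time
    return value
```

Note how only keys that are actually read end up cached, which is the "cache only stores what is used" advantage; the stampede risk comes from many concurrent misses on the same key all hitting step 2 at once.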
Cache Invalidation Strategies
Description
Adv.
Disadv.
Use Case
TTL-based
Cached entries expire after a set time
Simple invalidation + Prevents stale data buildup
Stampede risk on expiry + Hard to pick optimal TTL
Often combined with cache-aside / CDN caching
Event-based
Cache entries invalidated on data change events
Very fresh data + No guess work with TTL
Event loss or ordering issues can cause permanently stale cache + More moving parts
Event-driven / CQRS systems
Cache Placement Strategies
Description
Adv.
Disadv.
Use Case
Client / Browser
HTTP cache in browser
Zero latency and cost
Invalidation complexity
Static assets
CDN / Edge
Cache at edge e.g. CloudFront
Very fast + offloads backend
Auth + Invalidation complexity
Public content
In-app cache (L1)
In-process cache
Ultra-fast
Memory-bound, per-instance
Hot keys
Remote cache (L2)
Redis / Memcached
Shared across services
Network latency
Shared state
In data layer cache
DB buffer / query cache
Transparent
Limited control
Read-heavy DBs
Cache Distribution Strategies
Description
Adv.
Disadv.
Use Case
Single-node (L1)
Cache local to one instance
Simple
No sharing
Small apps
Distributed (L2)
Multiple caches, e.g. redis
Scales horizontally
Network latency + Op overhead
Microservices
Multi-level (L1/L2)
Local + Distributed
Best latency + scale
Complexity
High-scale systems
Cache Stampede Management Strategies
Description
Adv.
Disadv.
Use Case
Warmup
Prefill cache with known-hot keys before traffic hits
No cold-start misses on hot keys
Requires knowing hot keys in advance
Predictable hot data, post-deploy cache refills
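A warmup/prefill sketch (the hot-key list is assumed to be known ahead of time, e.g. from access logs):

```python
# Prefill the cache with known-hot keys before serving traffic,
# so the first wave of requests cannot stampede the DB.
def warm_cache(cache, db, hot_keys):
    for key in hot_keys:
        if key not in cache and key in db:
            cache[key] = db[key]
    return cache
```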
4. C4: Code Design
4.1. Encoding
Encoding is used to serialise user facing data (text/image/audio/video) for storage / transport over the network.
Type
Description
Use Case
E.g.
Base32
32-character set encoding (A-Z, 2-7)
QR codes, OTP secrets
JBSWY3DPEBLW64TMMQ======
Base64
Represents binary data in ASCII
Images, API keys, JWT segments
SGVsbG8gd29ybGQ=
Base85
Represents binary data in ASCII
PDF
<~87cURD_*#TDfTZ)+T~>
URL
Makes data safe for URLs
URLs
%20 -> spaces
Hex
Represents binary as hex strings
0x12ab
ASCII / UTF-8
Maps chars as numeric codes
Text
65 -> "A"
Unicode (UTF-16, UTF-32)
Maps characters to numeric codes
Text (International)
U+4F60 -> "你"
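The table's examples can be reproduced with Python's stdlib (input strings chosen to match the sample outputs above):

```python
import base64
from urllib.parse import quote

b64 = base64.b64encode(b"Hello world").decode()   # SGVsbG8gd29ybGQ=
b32 = base64.b32encode(b"Hello World").decode()   # JBSWY3DPEBLW64TMMQ======
url = quote("a b")                                # a%20b
hx = b"Hello".hex()                               # hex string of the bytes
```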
4.2. Choosing a language for mobile app development
4.3. Choosing a language for frontend web development
Language
Use Case
Adv.
Disadv.
JS
Default
Natively supported - browsers come with JS engine
Single-threaded by default
Dart (compiled to JS)
Cross-platform
No UI interactivity
C/C++/Rust (through WASM)
3D graphics, gaming, video editing (e.g. Figma, Canva, AutoCAD Web)
High performance
No UI interactivity
Python (through WASM)
AI/ML in the browser
High performance, mature AI/ML ecosystem library
No UI interactivity
C# (through Blazor WASM)
Existing .NET implementation
UI interactivity
Young ecosystem, large initial payload (downloads 6MB .NET runtime)
JS is the default choice as it is the only language that has direct access to the DOM to render UI.
4.4. Choosing a language / framework for backend web development
The choice of language for backend web development is tightly coupled to the language's runtime, libraries and frameworks as they provide key tradeoffs.
Language
Use Case
Adv.
Disadv.
Javascript
Real-time apps, typically preferred over php these days
Mature ecosystem, same language for FE and BE, great for concurrency (<10k users)
Not typed
PHP
Wordpress, CMS, e-commerce
Huge CMS ecosystem, powers wordpress
Process-per-request model limits real-time apps without extra tooling, js is typically preferred
1. Recursively
- Adv.: Clean and intuitive
- Disadv.: Limited by recursion depth, stack overflow risk

def recursive(root):
    iot(root)
    preOT(root)
    postOT(root)

def iot(node):
    if node is None:
        return
    iot(node.left)
    process(node)
    iot(node.right)

def preOT(node):
    if node is None:
        return
    process(node)
    preOT(node.left)
    preOT(node.right)

def postOT(node):
    if node is None:
        return
    postOT(node.left)
    postOT(node.right)
    process(node)

2. Iteratively
- Adv.: Robust for large or unbounded inputs
- Disadv.: Less intuitive and readable

def iot(root):
    if root is None:
        return
    stack = []
    node = root
    while stack or node:
        # go left as far as possible
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        process(node)
        node = node.right

def preOT(root):
    if root is None:
        return
    stack = [root]  # switching this to a queue changes the DFS to BFS
    while stack:
        node = stack.pop()
        process(node)
        # push right first so left is processed first
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)

def postOT(root):
    if root is None:
        return
    stack = []
    lastNode = None
    node = root
    while stack or node:
        # go left as far as possible
        if node:
            stack.append(node)
            node = node.left
            continue
        # at leftmost node; if candidate has a right child that is not
        # the last visited node, check right subtree first
        candidateNode = stack[-1]
        if candidateNode.right and lastNode != candidateNode.right:
            node = candidateNode.right
            continue
        node = stack.pop()
        process(node)
        lastNode = node
        node = None  # do not process node again
BFS
There are two ways to perform traversal:
Flat Traversal (FT)
Level-Order Traversal (LOT)
BFS is primarily done iteratively - it can be implemented recursively but there is no practical benefit.
from collections import deque

def ft(root):
    if root is None:
        return
    queue = deque([root])
    while queue:
        node = queue.popleft()
        process(node)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)

def lot(root):
    if root is None:
        return
    queue = deque([root])
    while queue:
        # for LOT, we just need to wrap the flat traversal logic
        # in a for loop with levelSize iterations
        levelSize = len(queue)
        for _ in range(levelSize):
            # same as flat traversal
            node = queue.popleft()
            process(node)
            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)
Note:
You can also add metadata for each node by appending tuples (node, metadata) to the queue instead of just nodes
4.5.5. Array
How many times can I slide a window over an array?
Intuition
Start from the base case - window size 1
How many times can you slide it?
Increase window size
Formula
len(array) - windowSize + 1
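The formula can be sanity-checked by actually sliding the window:

```python
def count_windows(arr, k):
    # closed-form: number of valid start indices 0 .. len(arr) - k
    return len(arr) - k + 1

def enumerate_windows(arr, k):
    # brute force: materialise every window of size k
    return [arr[i:i + k] for i in range(len(arr) - k + 1)]
```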
4.5.6. Bitwise Operations
Operation
Application
Example
AND &
Get carry for binary addition of two numbers
1 & 1 = 1
AND &
Get last bit
10 & 1 = 0, 11 & 1 = 1
XOR ^
Get sum without carry for binary addition of two numbers
1 ^ 1 = 0
0 ^ 1 = 1
1 ^ 0 = 1
XOR ^
Find differences between two bit patterns
0110 ^ 1010 = 1100, i.e. different in first two bits
Bit Shift
Multiply/divide by 2
x = 2, x << 1 = 4, x >> 1 = 1
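The AND-for-carry / XOR-for-sum rows combine into binary addition (sketch for non-negative ints; Python's unbounded negatives would loop forever here):

```python
def add(a, b):
    # XOR gives the sum without carries; AND shifted left gives the carries.
    # Repeat until no carries remain.
    while b:
        carry = (a & b) << 1
        a = a ^ b
        b = carry
    return a
```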
4.5.7. Dynamic Programming
Caching results for fibonacci-style recurrence
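A minimal sketch of the caching idea via memoisation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # caching collapses the O(2^n) recurrence to O(n) distinct calls
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```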
4.5.8. Binomial Theorem
Theory
The Binomial Theorem describes how to expand binomial expressions without brute force
Binomial Expression:
An expression formed from two terms,
e.g. (a+b)
Binomial Theorem Formula:
(x + y)^n = Σ_{k=0}^{n} C(n, k) · x^(n−k) · y^k
where C(n, k) ≡ nCk is the binomial coefficient, a.k.a. combinations
Applications
The binomial coefficient can be used to describe symmetric number sequences, e.g. 1 4 6 4 1
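Both the coefficients and the expansion can be checked with `math.comb`:

```python
import math

# the symmetric sequence 1 4 6 4 1 is row n=4 of Pascal's triangle
row = [math.comb(4, k) for k in range(5)]

def expand(x, y, n):
    # right-hand side of the binomial theorem
    return sum(math.comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))
```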
4.5.9. Describing Symmetry
Linear Symmetry
Combinations / Binomial Coefficient
Modulus
Even Functions
Cosine
Rotational Symmetry
Odd Functions
Sine
5. Concrete Knowledge
BFF: Backend for Frontend
GET /dashboard instead of GET /users + GET /orders + GET /recommendations
5.1. JavaScript
Engines
V8 (Chrome)
SpiderMonkey (Firefox)
JavaScriptCore (Safari)
Hermes (React Native)
Runtimes
Node
V8 engine
Adv.
Mature ecosystem
Safest bet
Disadv.
Slower
Security via containers/OS policies
Deno
V8 engine
Adv.
Like node but faster
Disadv.
Mostly compatible with node modules
Security via containers/OS policies
Bun
JavaScriptCore engine
Adv.
Sandboxed
Disadv.
Least compatibility with node modules
5.2. CPU Optimisations
Branch Prediction
Variable reassignment
CPU Pipelining
CPU Preloading
CPU Prefetching
Cache Locality
Memory Access Patterns
5.3. Language Optimisations
Peephole Optimisations
Inline
Unroll
5.4. Operating Systems
Stack Size
Linux: 8MB
macOS: 8MB
Windows: 1MB
5.5. Recursion Depth Limits
C++: 100,000
Depends on frame size + OS stack size
Dart: 10,000
Set by default
JS: ~10,000 (V8 engine / Chrome)
Depends on frame size + engine stack size
Java: 1,000
Depends on frame size + OS stack size
Python: 1,000
Set by default
Typically uses a container or VM
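Python's limit can be inspected and probed directly (the measured depth lands a little below the configured limit because frames already exist on the stack before the probe starts):

```python
import sys

def max_depth():
    # recurse until RecursionError to measure actual achievable depth
    def go(n):
        try:
            return go(n + 1)
        except RecursionError:
            return n
    return go(0)
```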
5.5.1. Data Types
Objects are immutable, files are not
Files are accessed via filepath, objects are accessed via API+key
Can span multiple connections, until either peer terminates the session
Signaling: Session Management
Signaling is the process of setting up, managing, and tearing down a communication session before real-time data flows. Signaling encompasses multiple processes:
Session Setup
Codec Negotiation
Process where two peers agree on a common codec for audio/video during signaling
NAT Traversal
Techniques + Protocols that allow devices behind NAT to communicate directly
There are three main techniques
Session Traversal Utilities for NAT (STUN)
Device asks STUN server "What's my public IP:port?"
Device shares info with other peer (P2P)
Works only if NAT keeps mappings stable
Traversal Using Relays around NAT (TURN)
Both devices send media to a TURN server
Used as fallback if direct P2P fails
Higher latency + server bandwidth cost
Interactive Connectivity Establishment (ICE)
Gathers candidates
Private IP:port
Public IP:port from STUN
Relay addresses from TURN
Tries all possible paths
Picks the fastest, lowest-latency route
Encryption keys exchange
Exchange session metadata
5.9. HTTP
There are 3 main versions of HTTP being used
Version
Description
Adv
Disadv
Use Case
1.1
Most widely supported
Simple, easy to debug, universally compatible
One request per connection -> head-of-line blocking -> higher latency, more open connections = higher infra cost
Legacy, IoT
2
Multiplexed streams over one TCP connection
Big improvements in latency and throughput over HTTP/1, fewer connections per client, required for gRPC
Head-of-line blocking if packet loss occurs, more complex load balancing
gRPC
3
Runs over QUIC (UDP)
Lowest latency
Less mature, harder debugging, firewalls may block UDP
Mobile / unstable networks
Modern clients auto-negotiate best protocol via Application Layer Protocol Negotiation (ALPN)
client says “I support h2, http/1.1, h3”, server picks one
5.10. Transmission Control Protocol (TCP)
Lossless
5.11. User Datagram Protocol (UDP)
Lossy
5.12. Quick UDP Internet Connections (QUIC)
UDP at Transport Layer + Reliability at App Layer
5.13. Which transport protocol
5.14. Distributed Websocket Approaches
Distributed Websocket Approach
Description
Adv
Disadv
Use Case
Sticky Sessions
GWLB pins client to specific gateway instance, e.g.
Simple
Poor rebalancing
Small systems
Connection Registry + Targeted Routing
clientId -> gatewayId KV lookup
Efficient 1:1
Chat, notifications
Pubsub
Broker fans out messages
Feeds, one event must notify many recipients immediately
Queue
Mechanism
What it stores
Strength
Weakness
Use Case
Connection registry
connectionId → gatewayId
Precise routing
Needs cleanup
1:1 messaging
Group registry
groupId → [connectionId]
Controlled fanout
Large groups expensive
Chat rooms
Pub/sub broker
topic → subscribers
Massive fanout
Coarse routing
Broadcast feeds
Identifier
Description
Advantages
Disadvantages
Typical Use Case
Connection ID
Server-generated unique ID for each WebSocket connection; changes on every reconnect
Precise 1:1 mapping to an actual socket; ideal for ownership, fencing, and liveness
Ephemeral; not useful for user-level routing or grouping
Message delivery, connection ownership, detecting stale connections
Client ID (User ID)
Logical identifier for a user or client across devices/sessions
Stable identity; good for authorization and grouping
Too coarse: one client can have many connections; unsafe for delivery
Send to all user devices, auth checks, user-level fan-out
Session ID
Identifier for a login session or browser/app context
Helps replace old connections; supports “last session wins” semantics
Still not 1:1 with sockets; session handling adds complexity
Enforcing single active session, reconnect fencing
Channel / Topic / Room ID
Logical grouping that connections subscribe to
Clean abstraction for broadcast and fan-out; decouples sender from connections
Requires subscription management; not tied to identity
Chat rooms, game lobbies, collaborative documents
Device ID
Stable identifier per physical device
Useful for presence, multi-device sync, fallback delivery
Privacy concerns; not always available or reliable
Push notification routing, device-specific state
Instance ID
Identifier of the compute instance holding the connection
Useful for routing and debugging; enables direct forwarding
Changes with churn; not meaningful at business level
Client reconnect typically handled by client ws library
Stale connection registry
TTL + Heartbeat allows stale data to be cleared from the registry
Message loss
Memory leaks
Slow clients
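The TTL + heartbeat idea for the stale connection registry, as a sketch with an injectable clock so expiry is testable (names and the TTL value are illustrative):

```python
import time

class Registry:
    """connectionId -> gatewayId entries that expire without heartbeats."""

    def __init__(self, ttl=30, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # conn_id -> (gateway_id, last_heartbeat)

    def heartbeat(self, conn_id, gateway_id):
        # gateway refreshes the entry while the socket is alive
        self._entries[conn_id] = (gateway_id, self._clock())

    def lookup(self, conn_id):
        entry = self._entries.get(conn_id)
        if entry is None:
            return None
        gateway_id, last = entry
        if self._clock() - last > self._ttl:
            del self._entries[conn_id]  # stale: clear it lazily
            return None
        return gateway_id
```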
5.15. WebRTC
Frameworks
Web Real-Time Connection (WebRTC)
Open source framework for P2P RTC
Components
Signaling
Media Capture
Media Transport
Encryption
NAT Traversal
Adaptive Quality
Data Channels
Signaling Protocols
Session Initiation Protocol (SIP)
Set up, modify, tear down real-time sessions for voice/video/messaging
Monitoring Protocols
Real-time Transport Control Protocol (RTCP)
Measures network performance metrics for RTP
Security Protocols
Transport Layer Security (TLS)
Secures TCP
Datagram Transport Layer Security (DTLS)
Secures UDP
i.e. TLS for UDP
Transport Protocols
Real-time Transport Protocol (RTP)
Transports real-time media (audio/video)
Rides on UDP, sometimes TCP
Secure Real-time Transport Protocol (SRTP)
Encrypted RTP
Uses DTLS for key exchange
RTCP
Network Address Translation (NAT)
NAT Devices
Home Routers
Corporate Firewalls
Vanilla NAT
1:1 mapping between private IPs to public IPs (e.g. 192.168.0.1 (private) : 203.0.113.1 (public))
Provides control over private IP ranges
Single source of truth for configuring public/private IP mappings (e.g. ISP changes IP allocations)
Port Address Translation (PAT) a.k.a NAT Overload
1:many mapping between private IPs to public IPs by using ports as well
e.g.
192.168.0.10:52301 -> 203.0.113.7:40001
192.168.0.11:52301 -> 203.0.113.7:40002
Workaround to IPv4's small address space, not needed in IPv6 where 1:1 mappings are encouraged
Firewall
Decides which packets are allowed/blocked
Lives between private network and public internet
Typically blocks incoming connections, not outgoing
Corporates typically block UDP entirely because the lack of handshakes makes it hard for firewalls to understand the session state
If asked: “How would you design WhatsApp voice calls?” • Signaling: WebSockets (or SIP for enterprise VoIP). • Transport: RTP/SRTP for media. • NAT traversal: STUN + TURN fallback. • Encryption: SRTP end-to-end. • QoS handling: Adaptive bitrate, jitter buffer.
If asked: “How does WebRTC work?” • WebRTC = framework, uses: • Signaling (custom, often WebSocket) • RTP/SRTP for audio/video streams • STUN/TURN for NAT traversal • DTLS/SRTP for security • Adaptive bitrate + codec negotiation.
If asked: “How does VoLTE differ from WhatsApp?” • VoLTE → Managed SIP + RTP inside carrier network, guaranteed QoS, low jitter. • WhatsApp → WebRTC over the public Internet, no QoS guarantees.