[Cost] Is there anything we can simplify to decrease cost?
1.2. Making decisions
All decisions, at a high level, are an optimisation for
Functionality (Profit)
Cost (Loss)
1.3. CAP Theorem
Characteristic
Description
Use Case
Consistency
All nodes in the system see the same data at the same time
Usually preferred for financial systems
Availability
System remains operational even if some nodes fail
Usually preferred for social media / streaming apps
Partition Tolerance
System remains operational even if network communication with some nodes fails
Non-optional because networks are not reliable, so the tradeoff is usually between C and A.
1.4. Tenancy
A tenant is a customer/organisation space with its own users, data, config
Single-tenant
Multi-tenant
Definition
One tenant per isolated stack
Multiple tenants per stack
Isolation
Strong
Weak
Per-tenant customisation
Easy
Harder
OpEx
Higher
Lower
Scale
Worse (under-utilised)
Better (pooling)
Compliance / Data residency
Easier
Harder (needs partitioning)
Onboarding Speed
Slower
Faster
1.5. Types of Development
Web
Frontend
Backend
Mobile
Game
Desktop
Embedded
DevOps
Data
ML / AI
Security
1.6. Infrastructure as a Service (IaaS) vs Platform as a Service (PaaS)
Approach
Use Case
Adv.
Disadv.
IaaS
Large-scale / custom apps
Flexibility, Pay-as-you-go
Setup & maintenance, steeper learning curve
PaaS
MVPs
Faster dev + CI/CD + easy deployment & scaling + security out of the box
Vendor lock-in, less flexibility
1.7. Compliance
1.7.1. PCI DSS
PCI DSS (Payment Card Industry Data Security Standard) is a security compliance standard governed by major card brands (Visa, Mastercard, Amex) relevant to credit/debit card data
Prefer using payment providers (e.g. Stripe) to avoid handling card data
If unavoidable:
Never store sensitive authentication data (CVV, PIN, track data)
Isolate the Card Data Environment (CDE) via network segmentation
Encrypt cardholder data in transit and at rest
Strict access control + audit logging for in-scope systems
1.8. Best Practices to Scale
Scaling Best Practices
Description
Reason
Exception
Stateless Compute
Keep biz logic compute stateless
Any instance can serve any request, add more instances to scale, replace instance in failure, easy load balancing
WS on the edge, HTTP/RPC at the core
HTTP/RPC are stateless, i.e. providing easier retries + load balancing + observability + timeouts
Idempotency
Repeating an operation has the same effect as doing it once
Pull-based backpressure is typically more forgiving than push-based
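A minimal sketch of the idempotency idea using a client-supplied idempotency key; `handle_payment` and the in-memory store are illustrative, not a real payment API:

```python
# Idempotency sketch: repeating a call with the same key has the same
# effect as making it once. The dict stands in for a durable key store.
processed = {}

def handle_payment(idempotency_key, amount):
    # A retry with the same key returns the original result
    # instead of charging twice.
    if idempotency_key in processed:
        return processed[idempotency_key]
    result = {"charged": amount}  # stand-in for the real side effect
    processed[idempotency_key] = result
    return result
```

In practice the key store must be durable and shared (e.g. a database unique constraint), otherwise a crash between the side effect and the record loses the guarantee.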
Duplexity: Initiation of communication + Sending of data + Concurrency
Duplexity
Who can initiate communication?
Who can send data?
Can both send at the same time?
Example
Use Case
Simplex
One side only
One side only
N/A
Webhooks
Event notifications
Half-duplex
Typically one side at a time (often client-first)
Both sides
No
HTTP
APIs
Full-duplex
Both sides
Both sides
Yes
WebSocket
Chat, collaboration
1.9. Message Distribution / Fanout Patterns
Key question: For one event,
who should receive it,
this determines the fanout approach
how many recipients at peak
if >1000, avoid broadcast and look into group fanout strategies
Dynamic Scheduling
Requests are routed based on a load balancer or scheduler
Workloads are heterogeneous, resource usage unpredictable, fine-grained control over task placement
Assign request based on compute needs + Easy to add/remove nodes + Supports complex scheduling policies
Orchestrator / scheduler is SPOF + can be bottleneck
Static Partitioning
Requests are routed based on predefined ranges or affinity rules, e.g. ID range, location
Tasks are grouped logically
Low latency as no lookup is needed
Hotspots + manual rebalancing + difficult to add/remove nodes
Consistent Hashing
Requests are routed based on hash of request key
Stateless workloads, e.g. microservices, serverless, API gateways
Automatic load balancing + no lookup table needed
Range based tasks difficult + rebalancing required when nodes are added/removed
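The consistent-hashing row above can be sketched as a hash ring with virtual nodes (the vnode count and hash choice are arbitrary; this is a toy, not production code):

```python
import bisect
import hashlib

class Ring:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        entries = []
        for node in nodes:
            for i in range(vnodes):
                entries.append((self._hash(f"{node}#{i}"), node))
        entries.sort()
        self._hashes = [h for h, _ in entries]
        self._nodes = [n for _, n in entries]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def owner(self, key):
        # first virtual node clockwise from the key's hash owns the key
        idx = bisect.bisect(self._hashes, self._hash(key)) % len(self._hashes)
        return self._nodes[idx]
```

Virtual nodes smooth out the load per physical node; when a node joins or leaves, only the keys between adjacent ring positions move, which is the "rebalancing required" disadvantage in the table.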
3.6. Logging
Avoid auto logging POST bodies and GET parameters
If the auto logging runs on auth endpoints, passwords could be written in plaintext to logs
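A hedged sketch of scrubbing sensitive fields before a request body reaches the logs (the field names are assumptions; production systems often prefer an allowlist of loggable fields instead of a denylist):

```python
# Denylist-based redaction before logging. Field names are illustrative.
SENSITIVE = {"password", "cvv", "pin", "token"}

def redact(body: dict) -> dict:
    # Replace sensitive values so the log line is safe to persist.
    return {
        k: ("[REDACTED]" if k.lower() in SENSITIVE else v)
        for k, v in body.items()
    }
```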
3.7. Websockets
Single-Node
At a high level, a single-node WebSocket system can often handle up to ~10k concurrent connections, but to maintain a margin of safety, it’s reasonable to start thinking about distributed WebSocket systems above ~1k connections. At that point, distributed systems also bring benefits like better fault tolerance and operational robustness.
At a lower level, websocket soak test tools can be used to validate these assumptions by observing system behaviour over time (CPU/memory usage, message latency, connection health (success/lifetime/dropped), network egress), identifying which part of the system becomes a bottleneck and needs to be scaled. The goal at this stage is typically to meet some kind of SLO, e.g.:
99.9% of WebSocket messages delivered within 200ms
99% of API requests complete under 500ms
< 0.1% connection drops per hour
Distributed
Functionality: How do we ensure that messages get to the correct client?
Strategy
Description
Advantages
Disadvantages
Typical Use Case
Pub/Sub broadcast
Any instance publishes to a broker which broadcasts to all instances, instance holding the WS delivers, others drop
Simple, resilient to instance churn
Wasteful fan-out, message loss if nobody is listening
Small–medium clusters, low message volume
Connection registry + direct routing
Instances add {clientId → instance} in registry, sender looks up owner and forwards via RPC
Precise delivery, scales well
Registry correctness complexity, e.g. flapping ownership, more failure cases to handle
Large clusters, high throughput, real-time messaging
WebSocket gateway layer
Dedicated gateway owns all WS connections, compute instances send messages to gateway
Compute stateless, clean separation of concerns, simple delivery semantics
Stateful gateway tier, extra hop
High-scale systems, many short-lived compute instances
Robustness: How do we ensure reliable message processing and delivery over time?
Strategy
Description
Advantages
Disadvantages
Typical Use Case
Queue / Stream keyed by client
Messages placed in per-client or keyed queue, instance owning WS consumes and delivers
Durable, pull-based backpressure, supports retries/replays/offline delivery (because messages stay in log while client is offline)
Higher latency enqueueing/dequeueing than simple push with pubsub, ownership/rebalancing complexity, not true push (message is delivered when consumer polls, not at production time)
Systems needing durability, offline delivery, or replay
Client pull / reconnect catch-up
Client fetches pending messages from shared store on poll or reconnect
Extremely resilient; minimal server coupling
Higher latency; weaker real-time guarantees
Notifications, feeds, async workflows
How do we route clients to the same instance to reduce coordination?
Strategy
Description
Advantages
Disadvantages
Typical Use Case
Sticky sessions
Load balancer routes client to same instance based on hash/cookie
Very simple, reduces cross-instance routing
Breaks on instance failure, doesn’t guarantee ownership
Low churn systems, cost-sensitive setups
Consistent hashing ownership
All instances compute owner for clientId using membership + hash ring
No registry lookup needed; ownership is deterministic
Connection migration required when nodes join/leave the ring
Large clusters that want registry-free routing
Transient pub/sub
Message is pushed to all subscribers with no persistence
Extremely fast; simple fan-out
No durability; subscribers must be online
Realtime websocket fan-out, multiplayer state updates
Durable pub/sub
Messages are persisted until acknowledged by subscribers
Survives subscriber crashes
Higher latency; storage cost
Critical event distribution, audit logs
3.8. Caching
Cache Read Strategies
Description
Adv.
Disadv.
Use Case
Read-thru
App reads cache -> on miss, cache reads from DB
Simplifies app logic
Stampede risk on hot keys + Tight coupling between cache and data store + Limited flexibility for custom fetch logic
Simple KV access
Cache Write Strategies
Description
Adv.
Disadv.
Use Case
Write-thru
App writes to cache -> cache writes to DB sync
Cache is consistent + Reads are fast after writes
Higher write latency + Cache outage blocks writes
Strong consistency / configuration data
Write-behind / back
App writes to cache -> cache writes to DB async
Very fast writes
Risk of data loss without durable buffering (queue / WAL required) + eventual consistency
High-throughput / analytics / logging / non-critical data
Cache Read/Write Strategies
Description
Adv.
Disadv.
Use Case
Cache-aside
App checks cache -> on miss, app reads from DB -> app writes to cache
Simple + cache only stores what is used
Stampede risk on hot keys + Harder to guarantee consistency under concurrent writes
Default choice for most BE systems / Read-heavy systems / microservices / web APIs
Cache-thru
Read-thru + Write-thru
Centralised data access
Cache is SPOF + Reduced observability + Harder debugging
Rare / legacy / strict data access boundaries
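Cache-aside from the table above, as a minimal sketch with dicts standing in for the cache and the DB:

```python
# Cache-aside: the app owns both the cache check and the DB read.
cache = {}
db = {"user:1": {"name": "Ada"}}

def get(key):
    if key in cache:           # 1. check cache
        return cache[key]
    value = db.get(key)        # 2. on miss, read from DB
    if value is not None:
        cache[key] = value     # 3. populate cache for next time
    return value
```

Note how only keys that are actually read end up cached, which is the "cache only stores what is used" advantage; the stampede risk comes from many concurrent misses on the same key all hitting step 2 at once.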
Cache Invalidation Strategies
Description
Adv.
Disadv.
Use Case
TTL-based
Cached entries expire after a set time
Simple invalidation + Prevents stale data buildup
Stampede risk on expiry + Hard to pick optimal TTL
Often combined with cache-aside / CDN caching
Event-based
Cache entries invalidated on data change events
Very fresh data + No guess work with TTL
Event loss or ordering issues can cause permanently stale cache + More moving parts
Event-driven / CQRS systems
Cache Placement Strategies
Description
Adv.
Disadv.
Use Case
Client / Browser
HTTP cache in browser
Zero latency and cost
Invalidation complexity
Static assets
CDN / Edge
Cache at edge e.g. CloudFront
Very fast + offloads backend
Auth + Invalidation complexity
Public content
In-app cache (L1)
In-process cache
Ultra-fast
Memory-bound, per-instance
Hot keys
Remote cache (L2)
Redis / Memcached
Shared across services
Network latency
Shared state
In data layer cache
DB buffer / query cache
Transparent
Limited control
Read-heavy DBs
Cache Distribution Strategies
Description
Adv.
Disadv.
Use Case
Single-node (L1)
Cache local to one instance
Simple
No sharing
Small apps
Distributed (L2)
Multiple caches, e.g. redis
Scales horizontally
Network latency + Op overhead
Microservices
Multi-level (L1/L2)
Local + Distributed
Best latency + scale
Complexity
High-scale systems
Cache Stampede Management Strategies
Description
Adv.
Disadv.
Use Case
Warmup
Prefill cache with known-hot keys before traffic hits
No cold-start misses on hot keys
Requires knowing hot keys in advance
Predictable hot data, post-deploy cache refills
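A warmup/prefill sketch (the hot-key list is assumed to be known ahead of time, e.g. from access logs):

```python
# Prefill the cache with known-hot keys before serving traffic,
# so the first wave of requests cannot stampede the DB.
def warm_cache(cache, db, hot_keys):
    for key in hot_keys:
        if key not in cache and key in db:
            cache[key] = db[key]
    return cache
```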
4. C4: Code Design
4.1. Encoding
Encoding is used to serialise user facing data (text/image/audio/video) for storage / transport over the network.
Type
Description
Use Case
E.g.
Base32
32-character set encoding (A-Z, 2-7)
QR codes, OTP secrets
JBSWY3DPEBLW64TMMQ======
Base64
Represents binary data in ASCII
Images, API keys, JWT segments
SGVsbG8gd29ybGQ=
Base85
Represents binary data in ASCII
PDF
<~87cURD_*#TDfTZ)+T~>
URL
Makes data safe for URLs
URLs
%20 -> spaces
Hex
Represents binary as hex strings
0x12ab
ASCII / UTF-8
Maps chars as numeric codes
Text
65 -> "A"
Unicode (UTF-16, UTF-32)
Maps characters to numeric codes
Text (International)
U+4F60 -> "你"
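The table's examples can be reproduced with Python's stdlib (input strings chosen to match the sample outputs above):

```python
import base64
from urllib.parse import quote

b64 = base64.b64encode(b"Hello world").decode()   # SGVsbG8gd29ybGQ=
b32 = base64.b32encode(b"Hello World").decode()   # JBSWY3DPEBLW64TMMQ======
url = quote("a b")                                # a%20b
hx = b"Hello".hex()                               # hex string of the bytes
```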
4.2. Choosing a language for mobile app development
4.3. Choosing a language for frontend web development
Language
Use Case
Adv.
Disadv.
JS
Default
Natively supported - browsers come with JS engine
Single-threaded by default
Dart (compiled to JS)
Cross-platform
No UI interactivity
C/C++/Rust (through WASM)
3D graphics, gaming, video editing (e.g. Figma, Canva, AutoCAD Web)
High performance
No UI interactivity
Python (through WASM)
AI/ML in the browser
High performance, mature AI/ML ecosystem library
No UI interactivity
C# (through Blazor WASM)
Existing .NET implementation
UI interactivity
Young ecosystem, large initial payload (downloads 6MB .NET runtime)
JS is the default choice as it is the only language that has direct access to the DOM to render UI.
4.4. Choosing a language / framework for backend web development
The choice of language for backend web development is tightly coupled to the language's runtime, libraries and frameworks as they provide key tradeoffs.
Language
Use Case
Adv.
Disadv.
Javascript
Real-time apps, typically preferred over php these days
Mature ecosystem, same language for FE and BE, great for concurrency (<10k users)
Not typed
PHP
Wordpress, CMS, e-commerce
Huge CMS ecosystem, powers wordpress
Process-per-request model limits real-time apps without extra tooling, js is typically preferred
1. Recursively
- Adv.: Clean and intuitive
- Disadv.: Limited by recursion depth, stack overflow risk

def recursive(root):
    iot(root)
    preOT(root)
    postOT(root)

def iot(node):
    if node is None:
        return
    iot(node.left)
    process(node)
    iot(node.right)

def preOT(node):
    if node is None:
        return
    process(node)
    preOT(node.left)
    preOT(node.right)

def postOT(node):
    if node is None:
        return
    postOT(node.left)
    postOT(node.right)
    process(node)

2. Iteratively
- Adv.: Robust for large or unbounded inputs
- Disadv.: Less intuitive and readable

def iot(root):
    if root is None:
        return
    stack = []
    node = root
    while stack or node:
        # go left as far as possible
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        process(node)
        node = node.right

def preOT(root):
    if root is None:
        return
    stack = [root]  # switching this to a queue changes the DFS to BFS
    while stack:
        node = stack.pop()
        process(node)
        # push right first so left is processed first
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)

def postOT(root):
    if root is None:
        return
    stack = []
    lastNode = None
    node = root
    while stack or node:
        # go left as far as possible
        if node:
            stack.append(node)
            node = node.left
            continue
        # at leftmost node; if candidate has a right child that is not
        # the last visited node, check right subtree first
        candidateNode = stack[-1]
        if candidateNode.right and lastNode != candidateNode.right:
            node = candidateNode.right
            continue
        node = stack.pop()
        process(node)
        lastNode = node
        node = None  # do not process node again
BFS
There are two ways to perform traversal:
Flat Traversal (FT)
Level-Order Traversal (LOT)
BFS is primarily done iteratively - it can be implemented recursively but there is no practical benefit.
from collections import deque

def ft(root):
    if root is None:
        return
    queue = deque([root])
    while queue:
        node = queue.popleft()
        process(node)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)

def lot(root):
    if root is None:
        return
    queue = deque([root])
    while queue:
        # for LOT, we just need to wrap the flat traversal logic
        # in a for loop with levelSize iterations
        levelSize = len(queue)
        for _ in range(levelSize):
            # same as flat traversal
            node = queue.popleft()
            process(node)
            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)
Note:
You can also add metadata for each node by appending tuples (node, metadata) to the queue instead of just nodes
4.5.5. Array
How many times can I slide a window over an array?
Intuition
Start from the base case - window size 1
How many times can you slide it?
Increase window size
Formula
len(array) - windowSize + 1
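The formula can be sanity-checked by actually sliding the window:

```python
def count_windows(arr, k):
    # closed-form: number of valid start indices 0 .. len(arr) - k
    return len(arr) - k + 1

def enumerate_windows(arr, k):
    # brute force: materialise every window of size k
    return [arr[i:i + k] for i in range(len(arr) - k + 1)]
```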
4.5.6. Bitwise Operations
Operation
Application
Example
AND &
Get carry for binary addition of two numbers
1 & 1 = 1
AND &
Get last bit
10 & 1 = 0, 11 & 1 = 1
XOR ^
Get sum without carry for binary addition of two numbers
1 ^ 1 = 0
0 ^ 1 = 1
1 ^ 0 = 1
XOR ^
Find differences between two bit patterns
0110 ^ 1010 = 1100, i.e. different in first two bits
Bit Shift
Multiply/divide by 2
x = 2, x << 1 = 4, x >> 1 = 1
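The AND-for-carry / XOR-for-sum rows combine into binary addition (sketch for non-negative ints; Python's unbounded negatives would loop forever here):

```python
def add(a, b):
    # XOR gives the sum without carries; AND shifted left gives the carries.
    # Repeat until no carries remain.
    while b:
        carry = (a & b) << 1
        a = a ^ b
        b = carry
    return a
```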
4.5.7. Dynamic Programming
Caching results for fibonacci-style recurrence
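A minimal sketch of the caching idea via memoisation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # caching collapses the O(2^n) recurrence to O(n) distinct calls
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```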
4.5.8. Binomial Theorem
Theory
The Binomial Theorem describes how to expand binomial expressions without brute force
Binomial Expression:
An expression formed from two terms,
e.g. (a+b)
Binomial Theorem Formula:
(x + y)^n = Σ_{k=0}^{n} C(n, k) · x^(n−k) · y^k
where C(n, k) ≡ nCk is the binomial coefficient, a.k.a. combinations
Applications
The binomial coefficient can be used to describe symmetric number sequences, e.g. 1 4 6 4 1
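Both the coefficients and the expansion can be checked with `math.comb`:

```python
import math

# the symmetric sequence 1 4 6 4 1 is row n=4 of Pascal's triangle
row = [math.comb(4, k) for k in range(5)]

def expand(x, y, n):
    # right-hand side of the binomial theorem
    return sum(math.comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))
```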
4.5.9. Describing Symmetry
Linear Symmetry
Combinations / Binomial Coefficient
Modulus
Even Functions
Cosine
Rotational Symmetry
Odd Functions
Sine
5. Concrete Knowledge
BFF: Backend for Frontend
GET /dashboard instead of GET /users + GET /orders + GET /recommendations
5.1. JavaScript
Engines
V8 (Chrome)
SpiderMonkey (Firefox)
JavaScriptCore (Safari)
Hermes (React Native)
Runtimes
Node
V8 engine
Adv.
Mature ecosystem
Safest bet
Disadv.
Slower
Security via containers/OS policies
Deno
V8 engine
Adv.
Like node but faster
Disadv.
Mostly compatible with node modules
Security via containers/OS policies
Bun
JavaScriptCore engine
Adv.
Sandboxed
Disadv.
Least compatibility with node modules
5.2. CPU Optimisations
Branch Prediction
Variable reassignment
CPU Pipelining
CPU Preloading
CPU Prefetching
Cache Locality
Memory Access Patterns
5.3. Language Optimisations
Peephole Optimisations
Inline
Unroll
5.4. Operating Systems
Stack Size
Linux: 8MB
macOS: 8MB
Windows: 1MB
5.5. Recursion Depth Limits
C++: 100,000
Depends on frame size + OS stack size
Dart: 10,000
Set by default
JS: ~10,000 (V8 engine / Chrome)
Depends on frame size + engine stack size
Java: 1,000
Depends on frame size + OS stack size
Python: 1,000
Set by default
Typically uses a container or VM
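Python's limit can be inspected and probed directly (the measured depth lands a little below the configured limit because frames already exist on the stack before the probe starts):

```python
import sys

def max_depth():
    # recurse until RecursionError to measure actual achievable depth
    def go(n):
        try:
            return go(n + 1)
        except RecursionError:
            return n
    return go(0)
```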
5.5.1. Data Types
Objects are immutable, files are not
Files are accessed via filepath, objects are accessed via API+key
Can span multiple connections, until either peer terminates the session
Signaling: Session Management
Signaling is the process of setting up, managing, and tearing down a communication session before real-time data flows. Signaling encompasses multiple processes:
Session Setup
Codec Negotiation
Process where two peers agree on a common codec for audio/video during signaling
NAT Traversal
Techniques + Protocols that allow devices behind NAT to communicate directly
There are three main techniques
Session Traversal Utilities for NAT (STUN)
Device asks STUN server "What's my public IP:port?"
Device shares info with other peer (P2P)
Works only if NAT keeps mappings stable
Traversal Using Relays around NAT (TURN)
Both devices send media to a TURN server
Used as fallback if direct P2P fails
Higher latency + server bandwidth cost
Interactive Connectivity Establishment (ICE)
Gathers candidates
Private IP:port
Public IP:port from STUN
Relay addresses from TURN
Tries all possible paths
Picks the fastest, lowest-latency route
Encryption keys exchange
Exchange session metadata
5.9. HTTP
There are 3 main versions of HTTP being used
Version
Description
Adv
Disadv
Use Case
1.1
Most widely supported
Simple, easy to debug, universally compatible
One request per connection -> head-of-line blocking -> higher latency, more open connections = higher infra cost
Legacy, IoT
2
Multiplexed streams over one TCP connection
Big improvements in latency and throughput over HTTP/1, fewer connections per client, required for gRPC
Head-of-line blocking if packet loss occurs, more complex load balancing
gRPC
3
Runs over QUIC (UDP)
Lowest latency
Less mature, harder debugging, firewalls may block UDP
Mobile / unstable networks
Modern clients auto-negotiate best protocol via Application Layer Protocol Negotiation (ALPN)
client says “I support h2, http/1.1, h3”, server picks one
5.10. Transmission Control Protocol (TCP)
Lossless
5.11. User Datagram Protocol (UDP)
Lossy
5.12. Quick UDP Internet Connections (QUIC)
UDP at Transport Layer + Reliability at App Layer
5.13. Which transport protocol
5.14. Distributed Websocket Approaches
Distributed Websocket Approach
Description
Adv
Disadv
Use Case
Sticky Sessions
GWLB pins client to specific gateway instance, e.g.
Simple
Poor rebalancing
Small systems
Connection Registry + Targeted Routing
clientId -> gatewayId KV lookup
Efficient 1:1
Chat, notifications
Pubsub
Broker fans out messages
Feeds, one event must notify many recipients immediately
Queue
Mechanism
What it stores
Strength
Weakness
Use Case
Connection registry
connectionId → gatewayId
Precise routing
Needs cleanup
1:1 messaging
Group registry
groupId → [connectionId]
Controlled fanout
Large groups expensive
Chat rooms
Pub/sub broker
topic → subscribers
Massive fanout
Coarse routing
Broadcast feeds
Identifier
Description
Advantages
Disadvantages
Typical Use Case
Connection ID
Server-generated unique ID for each WebSocket connection; changes on every reconnect
Precise 1:1 mapping to an actual socket; ideal for ownership, fencing, and liveness
Ephemeral; not useful for user-level routing or grouping
Message delivery, connection ownership, detecting stale connections
Client ID (User ID)
Logical identifier for a user or client across devices/sessions
Stable identity; good for authorization and grouping
Too coarse: one client can have many connections; unsafe for delivery
Send to all user devices, auth checks, user-level fan-out
Session ID
Identifier for a login session or browser/app context
Helps replace old connections; supports “last session wins” semantics
Still not 1:1 with sockets; session handling adds complexity
Enforcing single active session, reconnect fencing
Channel / Topic / Room ID
Logical grouping that connections subscribe to
Clean abstraction for broadcast and fan-out; decouples sender from connections
Requires subscription management; not tied to identity
Chat rooms, game lobbies, collaborative documents
Device ID
Stable identifier per physical device
Useful for presence, multi-device sync, fallback delivery
Privacy concerns; not always available or reliable
Push notification routing, device-specific state
Instance ID
Identifier of the compute instance holding the connection
Useful for routing and debugging; enables direct forwarding
Changes with churn; not meaningful at business level
Client reconnect typically handled by client ws library
Stale connection registry
TTL + Heartbeat allows stale data to be cleared from the registry
Message loss
Memory leaks
Slow clients
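The TTL + heartbeat idea for the stale connection registry, as a sketch with an injectable clock so expiry is testable (names and the TTL value are illustrative):

```python
import time

class Registry:
    """connectionId -> gatewayId entries that expire without heartbeats."""

    def __init__(self, ttl=30, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # conn_id -> (gateway_id, last_heartbeat)

    def heartbeat(self, conn_id, gateway_id):
        # gateway refreshes the entry while the socket is alive
        self._entries[conn_id] = (gateway_id, self._clock())

    def lookup(self, conn_id):
        entry = self._entries.get(conn_id)
        if entry is None:
            return None
        gateway_id, last = entry
        if self._clock() - last > self._ttl:
            del self._entries[conn_id]  # stale: clear it lazily
            return None
        return gateway_id
```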
5.15. WebRTC
Frameworks
Web Real-Time Connection (WebRTC)
Open source framework for P2P RTC
Components
Signaling
Media Capture
Media Transport
Encryption
NAT Traversal
Adaptive Quality
Data Channels
Signaling Protocols
Session Initiation Protocol (SIP)
Set up, modify, tear down real-time sessions for voice/video/messaging
Monitoring Protocols
Real-time Transport Control Protocol (RTCP)
Measures network performance metrics for RTP
Security Protocols
Transport Layer Security (TLS)
Secures TCP
Datagram Transport Layer Security (DTLS)
Secures UDP
i.e. TLS for UDP
Transport Protocols
Real-time Transport Protocol (RTP)
Transports real-time media (audio/video)
Rides on UDP, sometimes TCP
Secure Real-time Transport Protocol (SRTP)
Encrypted RTP
Uses DTLS for key exchange
RTCP
Network Address Translation (NAT)
NAT Devices
Home Routers
Corporate Firewalls
Vanilla NAT
1:1 mapping between private IPs to public IPs (e.g. 192.168.0.1 (private) : 203.0.113.1 (public))
Provides control over private IP ranges
Single source of truth for configuring public/private IP mappings (e.g. ISP changes IP allocations)
Port Address Translation (PAT) a.k.a NAT Overload
1:many mapping between private IPs to public IPs by using ports as well
e.g.
192.168.0.10:52301 -> 203.0.113.7:40001
192.168.0.11:52301 -> 203.0.113.7:40002
Workaround to IPv4's small address space, not needed in IPv6 where 1:1 mappings are encouraged
Firewall
Decides which packets are allowed/blocked
Lives between private network and public internet
Typically blocks incoming connections, not outgoing
Corporates typically block UDP entirely because the lack of handshakes makes it hard for firewalls to understand the session state
If asked: “How would you design WhatsApp voice calls?” • Signaling: WebSockets (or SIP for enterprise VoIP). • Transport: RTP/SRTP for media. • NAT traversal: STUN + TURN fallback. • Encryption: SRTP end-to-end. • QoS handling: Adaptive bitrate, jitter buffer.
If asked: “How does WebRTC work?” • WebRTC = framework, uses: • Signaling (custom, often WebSocket) • RTP/SRTP for audio/video streams • STUN/TURN for NAT traversal • DTLS/SRTP for security • Adaptive bitrate + codec negotiation.
If asked: “How does VoLTE differ from WhatsApp?” • VoLTE → Managed SIP + RTP inside carrier network, guaranteed QoS, low jitter. • WhatsApp → WebRTC over the public Internet, no QoS guarantees.