Software Development Engineering Cheat Sheet



1. C1: System Context Design

1.1. Key Questions

  1. [Functionality] Does the system work?
  2. [Security] Is the system secure?
  3. [Robustness] How does the system handle failures?
  4. [Scalability] Does the system scale?
  5. [Speed] What are the bottlenecks?
  6. [Observability] Is the system observable?
  7. [Cost] Is there anything we can simplify to decrease cost?

1.2. Making decisions

  • All decisions, at a high level, are an optimisation for
    • Functionality (Profit)
    • Cost (Loss)

1.3. CAP Theorem

Characteristic | Description | Use Case
Consistency | All nodes in the system see the same data at the same time | Usually preferred for financial systems
Availability | System remains operational even if some nodes fail | Usually preferred for social media / streaming apps
Partition Tolerance | System remains operational even if network communication with some nodes fails | Non-optional because networks are not reliable, so the tradeoff is usually between C and A.

1.4. Tenancy

A tenant is a customer/organisation space with its own users, data, config

Aspect | Single-tenant | Multi-tenant
Definition | One tenant per isolated stack | Multiple tenants per stack
Isolation | Strong | Weak
Per-tenant customisation | Easy | Harder
OpEx | Higher | Lower
Scale | Worse (under-utilised) | Better (pooling)
Compliance / Data residency | Easier | Harder (needs partitioning)
Onboarding Speed | Slower | Faster

1.5. Types of Development

  • Web
    • Frontend
    • Backend
  • Mobile
  • Game
  • Desktop
  • Embedded
  • DevOps
  • Data
  • ML / AI
  • Security

1.6. Infrastructure as a Service (IaaS) vs Platform as a Service (PaaS)

Approach | Use Case | Adv. | Disadv.
IaaS | Large-scale / custom apps | Flexibility, Pay-as-you-go | Setup & maintenance, steeper learning curve
PaaS | MVPs | Faster dev + CI/CD + easy deployment & scaling + security out of the box | Vendor lock-in, less flexibility

1.7. Compliance

1.7.1. PCI DSS

PCI DSS (Payment Card Industry Data Security Standard) is a security compliance standard governed by major card brands (Visa, Mastercard, Amex) relevant to credit/debit card data

  1. Prefer using payment providers (e.g. Stripe) to avoid handling card data
  2. If unavoidable:
    1. Never store sensitive authentication data (CVV, PIN, track data)
    2. Isolate the Card Data Environment (CDE) via network segmentation
    3. Encrypt cardholder data in transit and at rest
    4. Strict access control + audit logging for in-scope systems

1.8. Best Practices to Scale

Scaling Best PracticesDescriptionReasonException
Stateless ComputeKeep biz logic compute statelessAny instance can serve any request, add more instances to scale, replace instance in failure, easy load balancing
WS on the edge, HTTP/RPC at the coreHTTP/RPC are stateless, i.e. providing easier retries + load balancing + observability + timeouts
IdempotencyRepeating an operation has the same effect as doing it once
Pull-based backpressure is typically more forgiving than push-based
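
The idempotency row above can be illustrated with a minimal sketch. It assumes a client-supplied idempotency key and an in-memory result store; `IdempotentProcessor`, `charge_alice` and the `"req-1"` key are hypothetical names for illustration, not part of any specific library.

```python
class IdempotentProcessor:
    """Runs an operation at most once per idempotency key (in-memory sketch)."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored result

    def run(self, key, operation):
        # A retry with the same key returns the stored result instead of re-running.
        if key not in self._results:
            self._results[key] = operation()
        return self._results[key]


balance = {"alice": 100}  # hypothetical state mutated by the operation


def charge_alice():
    balance["alice"] -= 30
    return balance["alice"]


processor = IdempotentProcessor()
first = processor.run("req-1", charge_alice)
retry = processor.run("req-1", charge_alice)  # duplicate delivery of the same request
```

In a multi-instance deployment the result store would typically need to be shared (e.g. a database or cache) so that a retry landing on another instance still sees the earlier result.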

Duplexity: Initiation of communication + Sending of data + Concurrency

Duplexity | Who can initiate communication? | Who can send data? | Can both send at the same time? | Example | Use Case
Simplex | One side only | One side only | N/A | Webhooks | Event notifications
Half-duplex | Typically one side at a time (often client-first) | Both sides | No | HTTP | APIs
Full-duplex | Both sides | Both sides | Yes | WebSocket | Chat, collaboration

1.9. Message Distribution / Fanout Patterns

Key question: For one event,

  • who should receive it,
    • this determines the fanout approach
  • how many recipients at peak
    • if >1000, avoid broadcast and look into group fanout strategies

Fanout Patterns | Description | Use Case | Typical Implementation | Example Schema
1:1 (simple fanout/targeted routing) | One sender → one recipient | Notifications, replies | Connection registry + routing |
1:many (group/complex) | One sender → defined subset | Chat rooms, teams | Group registry or pub/sub topics |
1:all (simple fanout/broadcast) | One sender → everyone | Live feeds, system events | Pub/sub broadcast |

Group Fanout Patterns | Description | Example KV Shape | Use Case | Typical Implementation | Registry Shape
Conditional | Send to subset matching a condition | All users with role=admin | Admin alerts | Attribute filters | attr → [connectionId]
Hierarchical | Topics arranged in a tree | region.eu.fr.paris | Geo updates | Topic hierarchy | topicPath → [connectionId]
Partitioned | Split audience for scale/performance | Shard users across partitions | Massive scale | Hashing / partitions | shardId → [connectionId]

Fanout Implementation | Advantages | Disadvantages | Use Case
In-memory | Fast, simple | Single node only | MVPs, single-node WS
Registry | Scales, targeted delivery | Registry complexity | Chat, notifications
Pubsub Broker | Massive fanout, decoupled | Wasteful for 1:1, infra cost | Feeds, live updates
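
As a rough sketch of the registry approach, the three fanout patterns can be expressed as three lookups over an in-memory registry. `ConnectionRegistry`, its method names and the connection ids below are assumptions made for illustration only.

```python
from collections import defaultdict


class ConnectionRegistry:
    """Hypothetical in-memory registry mapping users and groups to connection ids."""

    def __init__(self):
        self.by_user = {}                 # userId -> connectionId
        self.by_group = defaultdict(set)  # groupId -> {connectionId}

    def connect(self, user_id, connection_id, groups=()):
        self.by_user[user_id] = connection_id
        for group in groups:
            self.by_group[group].add(connection_id)

    def targets_one(self, user_id):
        # 1:1 targeted routing
        conn = self.by_user.get(user_id)
        return {conn} if conn is not None else set()

    def targets_group(self, group_id):
        # 1:many group fanout
        return set(self.by_group.get(group_id, set()))

    def targets_all(self):
        # 1:all broadcast
        return set(self.by_user.values())


registry = ConnectionRegistry()
registry.connect("alice", "c1", groups=["room-1"])
registry.connect("bob", "c2", groups=["room-1"])
registry.connect("carol", "c3")
```

This version only works on a single node; a shared store would be needed once connections are spread across instances.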

2. C2: Container Design

2.1. Typical Cloud Infrastructure

LayerComponentDescriptionUse CaseE.g.
EdgeDNSResolves domain nameAWS Route53, GCP DNS
CDNCaches static content for low-latencyAWS CloudFront
WAF / DDoS ProtectionProtect from malicious actsAWS WAF
Edge FunctionsExecute code at edge on incoming HTTP requestsAuth, redirects, request shaping, thin APIs, caching logicAWS CloudFront Functions, AWS Lambda@Edge, Cloudflare Functions
Edge GatewayRouting/security/transform at the edgeAPI gateway-lite, auth, rate limiting, header rewritesAWS API Gateway + Lambdas, Cloudflare Workers as gateway
GatewayGatewayRouting to different services, securityTypically used to route to serverless, not usually needed for servers + ALBAWS API Gateway
Protocol Translation (HTTP to gRPC, REST to GraphQL)
Aggregation (Compose multiple backend calls into one)
ComputeServersPhysical servers (bare metal)Ultra-low latency, specialized hardware (GPU/FPGA), compliance, predictable performanceAWS Bare Metal (EC2 metal), on-prem
VMsLong-running VM that provides environment configurationSteady high throughput, long-lived connections, heavy local state, custom networking, predictable workloads, higher memory/CPU/GPU, strict latency floorsAWS EC2
Serverless (Container Runtimes)Fully managed container environmentsLong-running microservices, multi-language, complex dependenciesAWS Fargate
Serverless FunctionsEvent-driven functionsSpiky demand, MVP: Pay-per-use makes it more suitable, short, event-driven logicAWS Lambdas
OrchestrationServer OrchestrationScales VMsAWS Auto Scaling + EC2 ASG
Container OrchestrationScales containersECS/EKS
Data ProxyDatabase ProxyManages a pool of persistent connections to the DB>10k client connectionsAWS RDS Proxy
Relational Data StoreRelational DBsScale-up: Easier to manage schema as compared to Document DBsAWS RDS, AWS Aurora (Serverless RDS)
Not Only Relational Data StoreDocument DBsMVP: Pay-per-useDynamoDB
KV DBs / CachesAWS ElastiCache (Redis)
Object Data StoreObject StorageStoring large immutable objectsAWS S3
File Data StoreFile StorageShare mutable files within private networkAWS EFS
DSQLDistributed SQL Query EngineQuery large-scale data across object storage/ data lake with SQLAWS Athena
Data LakeCentralised storage for raw dataanalytics, ML workloads, batch processingAWS Lake Formation, Iceberg on S3
MessagingPubSub BrokerFanout messages to subscribersLive updates, eventsSNS, Redis PubSub
Message Broker / Message QueueOne message → one consumerAsync jobs, retriesSQS, RabbitMQ
Stream / Event QueueDurable ordered event logEvent sourcing, analyticsKafka, Kinesis
Load balancingApplication Load Balancer (ALB)Distributes traffic to apps using HTTP infoAWS ALB
Network Load Balancer (NLB)Distributes traffic to apps using TCP/UDP infoAWS NLB
Gateway Load Balancer (GWLB)Distributes traffic to third party security/network appliances using TCP/UDP infoAWS GWLB
Global Load Balancer (GLB)Distributes traffic geographicallyAWS ELB
NetworkingVPCIsolated virtual network for cloud resourcesDefine public/private subnets, control routing, isolation, multi-tier deploymentsAWS VPC
SubnetsSegments inside a VPCControls traffic flow and exposure of resources (e.g. public ALB, private DB)AWS Subnets
Security GroupsVirtual firewalls attached to resourcesControl traffic at instance/service levelAWS Security Groups
ObservabilityLoggingCollect, aggregate and index logs from all servicesAWS CloudWatch
Monitoring / MetricsMonitor resource usage, uptime, etc.AWS CloudWatch
TracingTraces request flow across different servicesAWS X-Ray
DevOpsCI/CDAWS CodeBuild, GitHub Actions
ArtifactContainer RegistryStores, versions and distributes container imagesAWS ECR
AWS CodeArtifact
IntegrationsEmail Delivery (ESP)Sends emails via API/SMTPPassword resets/receipts, notificationsAWS SES, SendGrid, Mailgun, Postmark

to be integrated into table above

Services that support websocketsDescriptionAdv.Disadv.Use Case
API Gateway WSManaged WS endpointSimpleManual fanoutServerless + Serverless Functions
AppSyncGraphQL subscriptionsGraphQL only
IoT Core

2.1.1. What is a load balancer?

2.1.2. What is a gateway?

A gateway is a specialised, stateful compute optimised for:

  • connection handling
  • routing
  • auth
  • fanout

2.2. Databases

ParadigmExamplesUse CaseAdvDisadv
SQLPostgreSQL, MySQL, MSSQLStructured relationships + strong consistency e.g. financial dataPowerful Querying + ACIDSlower writes due to B-Trees, slower reads/writes due to stronger consistency/locks
Key-ValueRocksDB, DynamoDB, CassandraHigh-throughput writes, cachingExtremely fast writes + BASESlower reads due to LSMT
DocumentMongoDB, FirestoreSemi-structured JSON-like data, e.g. mobile/web appsFlexible schema + BASESlower writes due to LSMT
ColumnarCassandraTime series data, e.g. analytics, event loggingFast on columnar queries, aggregationsSlower reads due to LSMT
GraphNeo4jSocial graphs, recommendation enginesOptimised for graph traversal and relationship modelingLimited for heavy aggregations
TypeDBComplex knowledge graphs, strongly typed and structured relationshipsSmall eco system

2.3. Rendering Strategy

Rendering Strategy | Description | Use Case | Adv | Disadv
Client-Side Rendering (CSR) with Single-Page Apps (SPAs) | Client downloads minimal HTML shell with JS, JS renders everything else | Internal tools, dashboards | Cheap hosting | Slow first paint, poor SEO
Server-Side Rendering (SSR) | Server renders full HTML for each request, client JS then hydrates it | SEO critical (e-commerce, social) | Fast first paint | Higher infra cost, complex
Static Site Generation (SSG) | HTML rendered at build time | Blogs, docs, marketing sites | Faster first paint | Static content only, rebuild for updates
Incremental Static Regeneration (ISR) | SSG with on-demand/timer based regeneration | Catalogs, listings | SSG with refresh | Cache stale window, build limits
Islands Architecture / Partial Hydration | Page is delivered as static HTML, only the interactive components (islands) are hydrated | Sites with selective interactivity | Faster first paint | Complex
Multi-page Apps (MPAs) | Client downloads new HTML for every page | Traditional sites, simple apps | Simple model, good SEO | Full page reloads, less dynamic UX
Edge-Side Rendering (ESR) | SSR but running on edge functions | Global apps | Fastest first paint | Limited runtime, cold start issues

2.4. Strategies to serve SPAs

Hosting Strategy | Description | Use Case | Adv | Disadv
Blob Storage Service (e.g. S3) | SPA is stored in blob storage and exposed via a public URL | Low-traffic / internal | Simple and cheap | No edge caching and higher global latency
Blob Storage Service + Content Delivery Network (e.g. S3 + CloudFront) | SPA is stored in blob storage, CDN caches assets at edge locations | Public production SPAs | Fast global delivery | Extra setup
Virtual Machine Hosting (e.g. EC2 + nginx) | SPA is stored in a virtual machine and exposed via a port | Existing monolith | Flexible configuration | VM management

2.5. API Architectural Styles

What API architectural style is optimal for functionality (speed) and cost (DevX, maintenance, opex)?

Style | Description | Use Case | Adv | Disadv
REST | Perform HTTP verbs on resources. Entity based, e.g. POST /users | Most common | Universally understood + docgen tools e.g. Swagger, OpenAPI | Slowest - One request for each entity unlike GraphQL + less space efficient than RPC
GraphQL | Query or mutate entities. Entity based, e.g. mutation CreateUser() {...} | APIs for FE | Faster - One request for multiple entities | More setup e.g. defining the schema, resolvers + less standardised docgen e.g. GraphiQL
RPC | Call functions remotely. Action based, e.g. await client.createUser() | Internal APIs | Fastest and most space efficient because it uses binary instead of text payloads | Only for internal use, requires HTTP/2
tRPC | Type-safe RPC framework that auto-generates client and server | TypeScript apps | Can run on HTTP/1 because of text payload | Tied to TypeScript ecosystem + limited language interoperability + difficult to debug

2.6. Transport Protocols

What transport protocol is optimal for functionality (user experience) and cost (DevX, maintenance, opex)?

Transport ProtocolsAdvDisadvUse Case
HTTP
gRPC
WS

What common combinations are there?

EdgeCoreUse Case
HTTPHTTPTraditional API
WSHTTP
WSRPC
WSMQTT
MQTTMQTT

3. C3: Component Design

Queue TypeDescriptionUse Case
Simple Queue
Durable Queue
Dead-Letter Queue (DLQ)

3.1. Encryption / Decryption with Keys

There are two types of encryption/decryption patterns

Key Type | Description | E.g. | Adv | Disadv | Use Case
Symmetric | A single secret key is shared, i.e. one key for both encryption and decryption | AES | Computationally faster | Hard to distribute | Bulk data encryption (disks, HTTPS session data, VPNs)
Asymmetric | A public/private key pair is set up, i.e. two keys | RSA, ECDSA | Easier to distribute | Computationally slower | Key exchange, digital signatures, SSL/TLS handshake, email encryption

Public and private keys are used for two main purposes:

Key Use Case | Private Key | Public Key / Shared Private Key
Message Authentication and Integrity (Digital Signatures) | Sign message | Verify message came from sender (authentication) + Ensure message wasn't modified in transit (integrity)
Message Confidentiality | Decrypt message | Encrypt message
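
The message authentication row can be sketched for the shared-key case with Python's standard `hmac` module. The standard library has no asymmetric signatures, so this shows only the shared-key variant, and the key and message values are made up for illustration.

```python
import hashlib
import hmac


def sign(shared_key: bytes, message: bytes) -> str:
    # The sender computes a tag over the message with the shared key.
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


def verify(shared_key: bytes, message: bytes, signature: str) -> bool:
    # The receiver recomputes the tag and compares in constant time.
    expected = sign(shared_key, message)
    return hmac.compare_digest(expected, signature)


key = b"example-shared-key"  # assumption: illustrative value only
msg = b"transfer 10 to bob"
tag = sign(key, msg)
```

A modified message or a different key yields a different tag, which is what gives both authentication and integrity.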

3.2. Authentication

  • There is a trade-off between safety and convenience
  • Best practice is to use a pre-built library, but understanding the principles is helpful in system design
  • Authentication: verifying identity
  • Authorisation: checking permissions

3.2.1. Authentication

Transporting Passwords

  • Use HTTPS for password submissions
  • Avoid logging raw credentials

3.2.2. Authentication Methods

Method | Use Case
Username + Password |
Username + Password + 2FA |
SSO |
Custom-built SSO |

Securing Passwords

  • Hashing
    • Passwords should be stored as irreversible cryptographic hashes
  • Salting
    • A random, unique per-user value (salt) is added to the plain-text password before hashing; the salt itself is stored in plaintext in the database alongside the hash
    • Prevents
      • two users with the same passwords from getting the same hash
      • hackers using rainbow tables (precomputed mappings of common passwords -> hashes)
  • Peppering
    • A random, global value (pepper) is added to the plain-text password before hashing; the pepper is kept outside the database, e.g. as an env variable on the server
    • An additional layer of security on top of salting
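
A minimal sketch of hashing with a salt and pepper, using PBKDF2 from Python's standard library. The pepper value, iteration count and function names are assumptions for illustration; as noted earlier, a pre-built library is the safer choice in practice.

```python
import hashlib
import hmac
import os

PEPPER = b"example-pepper"  # assumption: would be loaded from server config, not hard-coded
ITERATIONS = 100_000        # assumption: illustrative work factor


def hash_password(password, salt=None):
    # A fresh random salt per user means identical passwords produce different hashes.
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode() + PEPPER, salt, ITERATIONS)
    return salt, digest  # both are stored; the pepper is not


def verify_password(password, salt, expected_digest):
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


salt_a, hash_a = hash_password("hunter2")
salt_b, hash_b = hash_password("hunter2")  # same password, different salt
```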

3.2.3. Proof of Authentication a.k.a. access tokens

  • After a user is authenticated, a token needs to be stored on the client
  • There are two main types of tokens used: session tokens and JWTs

Aspect | Session Token | JWT
Structure | Random opaque string, e.g. b8c9d7f1e6a24f38b1d80b7d849d3e4e | Three base64url-encoded segments, e.g. <header>.<payload>.<signature>, where header and payload are JSON objects
Data access | Client cannot read it, server must retrieve data for client | Client can decode payload easily, e.g. { "email" : "...", "iat": 1665385660, "roles": ["admin"] }
Where data lives | In the backend (server/db/cache) alongside the token | Inside the token
Generation | Server uses a cryptographically secure RNG | Server builds JSON payload and signs it
Verification | Server checks that the client token string matches a stored token | Server verifies signature with the public key (or the shared secret for symmetric algorithms)
Revocation | Easy - Delete from backend (server/db/cache) | Hard - Blacklist / short expiry
Transport | Authorization header, or cookie with HttpOnly + Secure + SameSite=Strict | Authorization header, or cookie with HttpOnly + Secure + SameSite=Strict
Client-side Storage | Cookies | Cookies
Server-side Storage | In the backend (server/db/cache) | n.a.
Use Case | Monolithic app | Distributed services, OAuth

3.2.4. Refresh Tokens

  • Clients can be provided with a refresh token that is used to refresh access tokens
  • Access tokens should be short-lived (minutes)
  • Refresh tokens can be long-lived (hours/days/weeks)
  • Adv
    • Reduced exposure
    • Centralised control if using JWT access tokens and session refresh tokens

3.3. Authorisation

Access Control Approach | Principle | Use Case
Role-Based (RBAC) | Users -> Roles -> Permissions | Easiest to implement / reason about
Attribute-Based (ABAC) | Permission based on user attributes, e.g. user.department == doc.department and time < 18:00 | Highly customisable
Relationship-Based (ReBAC) | Permissions via graph relations, e.g. editor of project X | Collaboration apps
Scope-Based (SBAC) | Users -> Scope -> Permissions, e.g. contacts.read | OAuth
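
As a small illustration of the RBAC chain (Users -> Roles -> Permissions), assuming made-up users, roles and permission names:

```python
ROLE_PERMISSIONS = {  # assumption: example roles and permissions
    "admin": {"contacts.read", "contacts.write"},
    "viewer": {"contacts.read"},
}
USER_ROLES = {"alice": {"admin"}, "bob": {"viewer"}}  # assumption: example users


def is_allowed(user, permission):
    # Resolve the user's roles, then check whether any role grants the permission.
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Unknown users and unknown roles resolve to empty sets, so the check denies by default.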

3.3.1. Which cloud layer to authenticate / authorise in

Auth Location | Adv | Disadv | Use Case
CDN / Edge | Lower latency, offloads traffic from downstream, cache authenticated responses | Complex cache invalidation, limited auth logic | Global low-latency requirements, e.g. public content with lightweight auth
Load Balancer | Centralised | Limited to basic checks (signature, expiration) | Basic authentication before API Gateway / compute
API Gateway | Simple and centralised, offloads traffic from downstream | Potential bottleneck | Most modern serverless architectures, coarse grained auth / role-based route gating
Compute | Complex authentication logic, integrate with external auth services | Higher latency | Fine grained auth / resource-level gating

3.4. Sandboxing

3.5. Scaling

Type | Principle | Use Case | Adv | Disadv
Vertical | Upgrading CPU/RAM/Storage | Small to medium apps, monolithic systems, startups | No code change + lower latency | Limited by hardware ceilings + expensive at scale + SPOF
Horizontal | Adding more servers | Distributed systems | Fault tolerance via redundancy + near-unlimited scalability | Network latency + Higher complexity

Types of horizontal scaling:

  1. Database Horizontal Scaling
  2. Compute Horizontal Scaling

Database Horizontal Scaling, i.e. sharding

Type | Principle | Use Case | Adv | Disadv
Directory/Lookup-based | The shard for a key is looked up in a manually maintained directory | Frequently changing shards / manual control | Easy to add / remove shards | Directory is a SPOF, lookup adds latency
Range-based | The shard for a key is the one owning the contiguous key range it falls into (e.g. A-F, G-L, ...) | Time-series data, ordered data, range queries | Efficient for range queries + simple to implement | Data skew possible, hotspots risk
Hash-based | The shard for a key is derived from the hash of the key | High-write, evenly distributed workloads | Good load balancing, no need to manage ranges | Range queries inefficient, rebalancing expensive
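
The range-based and hash-based rows can be sketched as two small routing functions. The range boundaries, the four-shard layout and the key format are assumptions for illustration, and keys are assumed to start with a letter.

```python
import bisect
import hashlib

# Assumption: shard i holds keys whose first letter is <= RANGE_UPPER_BOUNDS[i]
RANGE_UPPER_BOUNDS = ["F", "L", "R", "Z"]


def range_shard(key: str) -> int:
    # Binary search for the first range whose upper bound covers the key.
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, key[0].upper())


def hash_shard(key: str, num_shards: int) -> int:
    # A stable hash (not Python's per-process randomised hash()) so every node agrees.
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With `hash_shard`, changing `num_shards` remaps most keys, which is why rebalancing is listed as expensive for plain hash-based sharding.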

Compute Horizontal Scaling

Type | Principle | Use Case | Adv | Disadv
Centralised Load Balancing / Orchestrator-based Scheduling | Requests are routed by a load balancer or scheduler | Workloads are heterogeneous, resource usage unpredictable, fine-grained control over task placement | Assign request based on compute needs + Easy to add/remove nodes + Supports complex scheduling policies | Orchestrator / scheduler is SPOF + can be bottleneck
Static Partitioning | Requests are routed based on predefined ranges or affinity rules, e.g. ID range, location | Tasks are grouped logically | Low latency as no lookup is needed | Hotspots + manual rebalancing + difficult to add/remove nodes
Consistent Hashing | Requests are routed based on hash of request key | Stateless workloads, e.g. microservices, serverless, API gateways | Automatic load distribution + no central load balancer needed | Range based tasks difficult + some keys are remapped when nodes are added/removed
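
A minimal consistent-hashing sketch, assuming a hash ring with virtual nodes. `HashRing`, the node names and the virtual-node count are hypothetical choices for illustration.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring many times to even out the distribution.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        index = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b"])  # node-c removed
```

When `node-c` is removed, only keys that were owned by `node-c` move; keys owned by the other nodes keep their owner.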

3.6. Logging

  • Avoid auto logging POST bodies and GET parameters
    • If the auto logging runs on auth endpoints, passwords could be written in plaintext to logs
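
If request bodies do need to be logged, one option is to mask sensitive fields first. This is a sketch only; the field names in `SENSITIVE_KEYS` and the `redact` helper are assumptions, not a complete list or a library API.

```python
SENSITIVE_KEYS = {"password", "token", "authorization"}  # assumption: example field names


def redact(payload: dict) -> dict:
    """Returns a copy that is safer to log, masking sensitive fields recursively."""
    safe = {}
    for key, value in payload.items():
        if key.lower() in SENSITIVE_KEYS:
            safe[key] = "[REDACTED]"
        elif isinstance(value, dict):
            safe[key] = redact(value)
        else:
            safe[key] = value
    return safe


body = {"email": "a@example.com", "password": "hunter2", "meta": {"Token": "abc"}}
loggable = redact(body)
```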

3.7. Websockets

Single-Node

At high level design, a single-node WebSocket system can often handle up to ~10k concurrent connections, but to maintain a margin of safety, it’s reasonable to start thinking about distributed WebSocket systems above ~1k connections. At that point, distributed systems also bring benefits like better fault tolerance and operational robustness. When calculating costs

At a lower level, websocket soak test tools can be used to validate these assumptions by observing system behaviour over time (CPU/memory usage, message latency, connection health (success/lifetime/dropped), network egress), identifying which part of the system becomes a bottleneck and needs to be scaled. The goal at this stage is typically to meet some kind of SLO, e.g.:

  • 99.9% of WebSocket messages delivered within 200ms
  • 99% of API requests complete under 500ms
  • < 0.1% connection drops per hour

Distributed

Functionality: How do we ensure that messages get to the correct client?

StrategyDescriptionAdvantagesDisadvantagesTypical Use Case
Pub/Sub broadcastAny instance publishes to a broker which broadcasts to all instances, instance holding the WS delivers, others dropSimple, resilient to instance churnWasteful fan-out, message loss if nobody is listeningSmall–medium clusters, low message volume
Connection registry + direct routingInstances add {clientId → instance} in registry, sender looks up owner and forwards via RPCPrecise delivery, scales wellRegistry correctness complexity, e.g. flapping ownership, more failure cases to handleLarge clusters, high throughput, real-time messaging
WebSocket gateway layerDedicated gateway owns all WS connections, compute instances send messages to gatewayCompute stateless, clean separation of concerns, simple delivery semanticsStateful gateway tier, extra hopHigh-scale systems, many short-lived compute instances

Robustness: How do we ensure reliable message processing and delivery over time?

StrategyDescriptionAdvantagesDisadvantagesTypical Use Case
Queue / Stream keyed by clientMessages placed in per-client or keyed queue, instance owning WS consumes and deliversDurable, pull-based backpressure, supports retries/replays/offline delivery (because messages stay in log while client is offline)Higher latency enqueueing/dequeueing than simple push with pubsub, ownership/rebalancing complexity, not true push (message is delivered when consumer polls, not at production time)Systems needing durability, offline delivery, or replay
Client pull / reconnect catch-upClient fetches pending messages from shared store on poll or reconnectExtremely resilient; minimal server couplingHigher latency; weaker real-time guaranteesNotifications, feeds, async workflows

How do we route clients to the same instance to reduce coordination?

StrategyDescriptionAdvantagesDisadvantagesTypical Use Case
Sticky sessionsLoad balancer routes client to same instance based on hash/cookieVery simple, reduces cross-instance routingBreaks on instance failure, doesn’t guarantee ownershipLow churn systems, cost-sensitive setups
Consistent hashing ownershipAll instances compute owner for clientId using membership + hash ringNo central registry; predictable routingComplex failure handling; membership convergence issuesAdvanced infra teams, custom routing layers

Issues and Mitigations

IssueScenarioMitigation
Reconnect stormsMany clients reconnect after outageExponential backoff + jitter
Gateway crashAll connections on node dropClient reconnect + registry TTL
Stale registryRegistry points to dead gatewayHeartbeats + expiry
Message duplicationRetries cause duplicatesIdempotency
BackpressureGateways send messages faster than client can read, causing buffers/memory to explodeStop sending temporarily / drop messages / disconnect clients
Connection exhaustionToo many open socketsConnection limits
Message lossCrash during sendACKs, retries, durable queues
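
The reconnect storm mitigation (exponential backoff + jitter) can be sketched as below. This uses the "full jitter" variant, and the base and cap values are illustrative assumptions.

```python
import random


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay in seconds before a 0-based reconnect attempt."""
    # The ceiling doubles with every attempt until it reaches the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Picking a random point below the ceiling spreads clients out over time.
    return random.uniform(0, ceiling)
```

Without the random component, all clients that dropped at the same moment would retry at the same moments too.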

Pubsub

What types of pubsub brokers are there?

Pub/Sub Message Delivery CharacteristicDescriptionAdvantagesDisadvantagesTypical Use Case
At-most-onceMessage is delivered zero or one time; no retriesVery low latency; simple; minimal overheadMessages can be lost silentlyMetrics, logs, realtime notifications where loss is acceptable
At-least-onceMessage is delivered one or more times; retries on failureHigher reliability; simple retry modelDuplicate messages; consumers must be idempotentEvent propagation, cache invalidation, background jobs
Exactly-once (logical)System ensures message effects occur exactly once (often via deduplication)Strong correctness guaranteesHigh complexity; coordination overheadFinancial transactions, billing, inventory updates
Best-effort broadcastMessage is pushed to all subscribers with no persistenceExtremely fast; simple fan-outNo durability; subscribers must be onlineRealtime websocket fan-out, multiplayer state updates
Durable pub/subMessages are persisted until acknowledged by subscribersSurvives subscriber crashesHigher latency; storage costCritical event distribution, audit logs

3.8. Caching

Cache Read StrategiesDescriptionAdv.Disadv.Use Case
Read-thruApp reads cache -> on miss, cache reads from DBSimplifies app logicStampede risk on hot keys + Tight coupling between cache and data store + Limited flexibility for custom fetch logicSimple KV access
Cache Write StrategiesDescriptionAdv.Disadv.Use Case
Write-thruApp writes to cache -> cache writes to DB syncCache is consistent + Reads are fast after writesHigher write latency + Cache outage blocks writesStrong consistency / configuration data
Write-behind / backApp writes to cache -> cache writes to DB asyncVery fast writesRisk of data loss without durable buffering (queue / WAL required) + eventual consistencyHigh-throughput / analytics / logging / non-critical data
Cache Read/Write StrategiesDescriptionAdv.Disadv.Use Case
Cache-asideApp checks cache -> on miss, app reads from DB -> app writes to cacheSimple + cache only stores what is usedStampede risk on hot keys + Harder to guarantee consistency under concurrent writesDefault choice for most BE systems / Read-heavy systems / microservices / web APIs
Cache-thruRead-thru + Write-thruCentralised data accessCache is SPOF + Reduced observability + Harder debuggingRare / legacy / strict data access boundaries
Cache Invalidation StrategiesDescriptionAdv.Disadv.Use Case
TTL-basedCached entries expire after a set timeSimple invalidation + Prevents stale data buildupStampede risk on expiry + Hard to pick optimal TTLOften combined with cache-aside / CDN caching
Event-basedCache entries invalidated on data change eventsVery fresh data + No guess work with TTLEvent loss or ordering issues can cause permanently stale cache + More moving partsEvent-driven / CQRS systems
Cache Placement StrategiesDescriptionAdv.Disadv.Use Case
Client / BrowserHTTP cache in browserZero latency and costInvalidation complexityStatic assets
CDN / EdgeCache at edge e.g. CloudFrontVery fast + offloads backendAuth + Invalidation complexityPublic content
In-app cache (L1)In-process cacheUltra-fastMemory-bound, per-instanceHot keys
Remote cache (L2)Redis / MemcachedShared across servicesNetwork latencyShared state
In data layer cacheDB buffer / query cacheTransparentLimited controlRead-heavy DBs
Cache Distribution StrategiesDescriptionAdv.Disadv.Use Case
Single-node (L1)Cache local to one instanceSimpleNo sharingSmall apps
Distributed (L2)Multiple caches, e.g. redisScales horizontallyNetwork latency + Op overheadMicroservices
Multi-level (L1/L2)Local + DistributedBest latency + scaleComplexityHigh-scale systems
Cache Stampede Management StrategiesDescriptionAdv.Disadv.Use Case
Warmup
Prefill
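
A minimal sketch combining cache-aside reads with TTL-based and event-based invalidation. `CacheAside` and `load_from_db` are hypothetical names, the store is a plain in-process dict, and stampede protection is not included.

```python
import time


class CacheAside:
    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db  # hypothetical function that reads from the DB
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]       # cache hit
        value = self._load(key)   # miss or expired: the app reads from the DB
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        # Event-based invalidation: call this when the underlying data changes.
        self._entries.pop(key, None)
```

The `clock` parameter exists only so that expiry can be exercised without waiting in real time.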

4. C4: Code Design

4.1. Encoding

Encoding is used to serialise user facing data (text/image/audio/video) for storage / transport over the network.

TypeDescriptionUse CaseE.g.
Base3232-character set encoding (A-Z, 2-7)QR codes, OTP secretsJBSWY3DPEBLW64TMMQ======
Base64Represents binary data in ASCIIImages, API keys, JWT segmentsSGVsbG8gd29ybGQ=
Base85Represents binary data in ASCIIPDF<~87cURD_*#TDfTZ)+T~>
URLMakes data safe for URLsURLs%20 -> spaces
HexRepresents binary as hex strings0x12ab
ASCII / UTF-8Maps chars as numeric codesText65 -> "A"
Unicode (UTF-16, UTF-32)Maps characters to numeric codesText (International)U+4F60 -> "你"
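
Most of these encodings are available in Python's standard library. The snippet below is a short sketch with made-up input values.

```python
import base64
from urllib.parse import quote, unquote

raw = "Hello World".encode("utf-8")  # text -> bytes via UTF-8

b32 = base64.b32encode(raw).decode()  # Base32
b64 = base64.b64encode(raw).decode()  # Base64
hex_string = raw.hex()                # Hex
url_safe = quote("a b&c")             # URL (percent) encoding
code_point = ord("你")                # Unicode code point of a character
```

All of these are reversible: `base64.b64decode`, `bytes.fromhex` and `unquote` recover the original data, which is what distinguishes encoding from hashing.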

4.2. Choosing a language for mobile app development

4.3. Choosing a language for frontend web development

LanguageUse CaseAdv.Disadv.
JSDefaultNatively supported - browsers come with JS engineSingle-threaded by default
Dart (compiled to JS)Cross-platformNo UI interactivity
C/C++/Rust (through WASM)3D graphics, gaming, video editing (e.g. Figma, Canva, AutoCAD Web)High performanceNo UI interactivity
Python (through WASM)AI/ML in the browserHigh performance, mature AI/ML ecosystem libraryNo UI interactivity
C# (through Blazor WASM)Existing .NET implementationUI interactivityYoung ecosystem, large initial payload (downloads 6MB .NET runtime)

JS is the default choice as it is the only language that has direct access to the DOM to render UI.

4.4. Choosing a language / framework for backend web development

The choice of language for backend web development is tightly coupled to the language's runtime, libraries and frameworks as they provide key tradeoffs.

Language | Use Case | Adv. | Disadv.
JavaScript | Real-time apps, typically preferred over PHP these days | Mature ecosystem, same language for FE and BE, great for concurrency (<10k users) | Not statically typed
PHP | WordPress, CMS, e-commerce | Huge CMS ecosystem, powers WordPress | Process-per-request model limits real-time apps without extra tooling, JS is typically preferred
Python | ML / AI | Huge AI/ML ecosystem |
Java | Enterprise, finance | Strict typing, battle tested | Heavier setup
C# | Enterprise with Microsoft ecosystem | Great integrations with Microsoft / Azure | Tied to Microsoft ecosystem
Go | Microservices, cloud-native, high-concurrency APIs | Extremely fast, great concurrency with goroutines | Less suited for CMS, e-commerce
Rust | High-performance APIs | |
Ruby | Replaced by JS | - | Declining in popularity due to memory usage, scaling, and struggling with concurrency

4.5. Data Structures & Algorithms

How to solve problems with code.

4.5.1. Methods to Reinterpret Problems

  • Create formula and see if shifting variables around can simplify solution

4.5.2. Modulo

Application | Modulo by | Example
Get n trailing digits | 10^n | 1234 % 100 = 34
Check even/odd | 2 | isEven = x % 2 == 0
Get value of bit after addition | 2 | (1 + 1) % 2 = 0; (0 + 1) % 2 = 1; (0 + 0) % 2 = 0
Check divisible by n | n | isXDivisibleByN = x % n == 0

4.5.3. Floor Division

Application | Denominator | Example
Remove n trailing digits | 10^n | 12345 // 100 = 123
Get carry over bit after addition | 2 | (1 + 1) // 2 = 1; (0 + 1) // 2 = 0; (0 + 0) // 2 = 0
Get midpoint of any array ([0,1,2] [0,1,2,3]) | 2 | midpoint = len(arr) // 2
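
Modulo and floor division pair up naturally when adding binary numbers digit by digit: `% 2` gives the bit to write and `// 2` gives the carry. A short sketch, with `add_binary` as an illustrative function name:

```python
def add_binary(a: str, b: str) -> str:
    """Adds two binary strings, e.g. "11" + "1"."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1

    # Walk both strings from the least significant bit.
    while i >= 0 or j >= 0 or carry:
        total = carry
        if i >= 0:
            total += int(a[i])
            i -= 1
        if j >= 0:
            total += int(b[j])
            j -= 1

        result.append(str(total % 2))  # value of the bit after addition
        carry = total // 2             # carry-over bit

    return "".join(reversed(result)) or "0"
```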

4.5.4. Binary Trees

Sizes

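  • no. of nodes: $n$
  • height of tree: $\log_x n$,
    • where $x$ is the arity, i.e. for an $x$-ary tree (assuming the tree is balanced)
  • width of tree: $2^x$
    • where $x$ is the level of the tree for which you want the width (maximum width, for a binary tree with the root at level 0)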

How to navigate a Tree

There are two methods of navigating a tree: Depth-First Search (DFS) and Breadth-First Search (BFS)

DFS

There are three ways to perform traversal:

  1. In-Order Traversal (IOT) -> left, node, right
  2. Pre-Order Traversal (PreOT) -> node, left, right
  3. Post-Order Traversal (PostOT) -> left, right, node

There are two ways to implement DFS:

'''
1. Recursively
    - Adv.: Clean and intuitive
    - Disadv.: Limited by recursion depth, stack overflow risk
'''

def recursive(root):
    iot(root)
    preOT(root)
    postOT(root)

def iot(node):
    if node is None:
        return

    iot(node.left)
    process(node)
    iot(node.right)

def preOT(node):
    if node is None:
        return

    process(node)
    preOT(node.left)
    preOT(node.right)

def postOT(node):
    if node is None:
        return

    postOT(node.left)
    postOT(node.right)
    process(node)

'''
2. Iteratively
    - Adv.: Robust for large or unbounded inputs
    - Disadv.: Less intuitive and readable
'''

def iot(root):
    if root is None:
        return
        
    stack = []
    node = root

    while stack or node:
        # go left as far as possible
        while node:
            stack.append(node)
            node = node.left

        node = stack.pop()
        process(node)
        # move to the right subtree (may be None, in which case we pop again)
        node = node.right

def preOT(root):
    if root is None:
        return 
    
    stack = [root]  # switching this to a queue changes the DFS to BFS
    while stack:
        node = stack.pop()
        
        process(node)

        # push right first so left is processed first
        if node.right:
            stack.append(node.right)
        if node.left:
            stack.append(node.left)


def postOT(root):
    if root is None:
        return
    
    stack = []
    lastNode = None
    node = root

    while stack or node:
        # go left as far as possible
        if node:
            stack.append(node)
            node = node.left
            continue

        # at the leftmost node: if the candidate has a right subtree that
        # was not the last one visited, traverse that subtree first
        candidateNode = stack[-1]
        if candidateNode.right and lastNode is not candidateNode.right:
            node = candidateNode.right
            continue

        node = stack.pop()
        process(node)
        lastNode = node
        node = None  # do not descend into this node again

BFS

There are two ways to perform traversal:

  1. Flat Traversal (FT)
  2. Level-Order Traversal (LOT)

BFS is primarily done iteratively - it can be implemented recursively but there is no practical benefit.


from collections import deque

def ft(root):
    if root is None:
        return
    
    queue = deque([root])

    while queue:
        node = queue.popleft()

        process(node)

        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)

def lot(root):
    if root is None:
        return

    queue = deque([root])

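    while queue:
        # for LOT, we just need to wrap the flat traversal logic in a for loop with levelSize iterations
        levelSize = len(queue)
        for _ in range(levelSize):
            # same as flat traversal
            node = queue.popleft()

            process(node)

            if node.left is not None:
                queue.append(node.left)
            if node.right is not None:
                queue.append(node.right)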

Note:

  • You can also add metadata for each node by appending tuples (node, metadata) to the queue instead of just nodes
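
As a rough illustration of the note above, here is a minimal sketch that carries each node's depth through the queue as a tuple. The `Node` class and the function name `values_with_depth` are assumptions made for this example, not part of the original snippets.

```python
from collections import deque


class Node:
    """Hypothetical binary tree node used only for this example."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def values_with_depth(root):
    """Flat BFS that records (value, depth) for every node."""
    if root is None:
        return []

    result = []
    queue = deque([(root, 0)])  # each entry is (node, metadata)

    while queue:
        node, depth = queue.popleft()
        result.append((node.value, depth))

        if node.left is not None:
            queue.append((node.left, depth + 1))
        if node.right is not None:
            queue.append((node.right, depth + 1))

    return result


tree = Node(1, Node(2, Node(4)), Node(3))
print(values_with_depth(tree))  # [(1, 0), (2, 1), (3, 1), (4, 2)]
```

Carrying the depth in the tuple gives level information without the nested loop that level-order traversal needs.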

4.5.5. Array

How many times can I slide a window over an array?

  • Intuition
    • Start from the base case - window size 1
      • How many times can you slide it?
    • Increase window size
  • Formula
    • len(array) - windowSize + 1
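
A small sketch to sanity-check the formula above. The function names `window_count` and `windows` are made up for this example, and a window larger than the array is assumed to fit zero times.

```python
def window_count(array, window_size):
    """Number of positions a window of window_size can take over array."""
    if window_size <= 0 or window_size > len(array):
        return 0
    return len(array) - window_size + 1


def windows(array, window_size):
    """Materialise every window so the count can be checked by eye."""
    return [array[i:i + window_size] for i in range(window_count(array, window_size))]


print(window_count([1, 2, 3, 4, 5], 1))  # 5 (base case: one window per element)
print(windows([1, 2, 3, 4, 5], 3))       # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```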

4.5.6. Bitwise Operations

| Operation | Application | Example |
| --- | --- | --- |
| AND & | Get carry for binary addition of two numbers | 1 & 1 = 1 |
| AND & | Get last bit | 10 & 1 = 0, 11 & 1 = 1 |
| XOR ^ | Get sum without carry for binary addition of two numbers | 1 ^ 1 = 0, 0 ^ 1 = 1, 1 ^ 0 = 1 |
| XOR ^ | Find differences between two bit patterns | 0110 ^ 1010 = 1100, i.e. different in first two bits |
| Bit Shift | Multiply/divide by 2 | x = 2, x << 1 = 4, x >> 1 = 1 |
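
The AND, XOR and shift rows combine into addition without `+`. This is a sketch that assumes non-negative integers (Python integers are unbounded, so negative inputs would need a fixed-width mask). The function name `add` is illustrative.

```python
def add(a, b):
    """Add two non-negative integers using only bitwise operations."""
    while b != 0:
        carry = (a & b) << 1  # AND finds the carry bits, shift moves them one column left
        a = a ^ b             # XOR is the sum of each column without the carry
        b = carry             # repeat until there is nothing left to carry
    return a


print(add(5, 7))  # 12
```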

4.5.7. Dynamic Programming

  • Caching results for fibonacci-style recurrence
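
A minimal sketch of the caching idea, shown two ways: top-down memoisation with `functools.lru_cache`, and a bottom-up loop that only keeps the last two values. The function names are illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Top-down: each fib(k) is computed once and then served from the cache."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def fib_bottom_up(n):
    """Bottom-up: build from the base cases, keeping only the last two results."""
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


print(fib(10))            # 55
print(fib_bottom_up(10))  # 55
```

Without the cache the recursive version recomputes the same subproblems exponentially many times. With it, the work is linear in `n`.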

4.5.8. Binomial Theorem

Theory

  • The Binomial Theorem describes how to expand binomial expressions without brute force
    • Binomial Expression:
      • An expression formed from two terms,
      • e.g. $(a + b)$
    • Binomial Theorem Formula:
      • $(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^{k}$
        • where $\binom{n}{k} \equiv {}^{n}C_k$ is the binomial coefficient a.k.a. combinations

Applications

  • The binomial coefficient can be used to describe symmetric number sequences, e.g. 1 4 6 4 1
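
The symmetric sequence above is the row of binomial coefficients for n = 4. A short sketch using `math.comb` (Python 3.8+). The helper names `binomial_row` and `expand` are made up for this example.

```python
from math import comb


def binomial_row(n):
    """Coefficients of (x + y)^n, i.e. row n of Pascal's triangle."""
    return [comb(n, k) for k in range(n + 1)]


def expand(x, y, n):
    """Evaluate (x + y)^n term by term using the binomial theorem."""
    return sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))


print(binomial_row(4))  # [1, 4, 6, 4, 1]
print(expand(2, 3, 4))  # 625, the same as (2 + 3) ** 4
```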

4.5.9. Describing Symmetry

  • Linear Symmetry
    • Combinations / Binomial Coefficient
    • Modulus
    • Even Functions
    • Cosine
  • Rotational Symmetry
    • Odd Functions
    • Sine

5. Concrete Knowledge

  • BFF: Backend for Frontend
    • GET /dashboard instead of GET /users + GET /orders + GET /recommendations

5.1. JavaScript

Engines

  • V8 (Chrome)
  • SpiderMonkey (Firefox)
  • JavaScriptCore (Safari)
  • Hermes (React Native)

Runtimes

  • Node
    • V8 engine
    • Adv.
      • Mature ecosystem
      • Safest bet
    • Disadv.
      • Slower
      • Security via containers/OS policies
  • Deno
    • V8 engine
    • Adv.
      • Like Node but sandboxed (permissions are denied by default)
    • Disadv.
      • Only mostly compatible with Node modules
  • Bun
    • JavaScriptCore engine
    • Adv.
      • Like Node but faster
    • Disadv.
      • Least compatibility with Node modules
      • Security via containers/OS policies

5.2. CPU Optimisations

  • Branch Prediction
  • Variable reassignment
  • CPU Pipelining
  • CPU Preloading
  • CPU Prefetching
  • Cache Locality
  • Memory Access Patterns

5.3. Language Optimisations

  • Peephole Optimisations
  • Inline
  • Unroll

5.4. Operating Systems

  • Stack Size
    • Linux: 8MB
    • macOS: 8MB
    • Windows: 1MB

5.5. Recursion Depth Limits

  • C++: 100,000
    • Depends on frame size + OS stack size
  • Dart: 10,000
    • Set by default
  • JS: 10,000
    • (V8 engine/chrome)
    • Depends on
  • Java: 1,000
    • Depends on frame size + OS stack size
  • Python: 1,000
    • Set by default
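
For Python specifically, the limit can be inspected at runtime. A minimal sketch follows. The helpers `depth` and `exceeds_limit` are hypothetical names, and the printed limit depends on the interpreter and on anything that has already called `sys.setrecursionlimit`.

```python
import sys


def depth(n):
    """Recurse n levels deep and return how deep we got."""
    return 0 if n == 0 else 1 + depth(n - 1)


def exceeds_limit(n):
    """Return True if recursing n levels raises RecursionError."""
    try:
        depth(n)
        return False
    except RecursionError:
        return True


print(sys.getrecursionlimit())                       # interpreter's current limit
print(exceeds_limit(10))                             # False
print(exceeds_limit(sys.getrecursionlimit() + 100))  # True
```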

Typically uses a container or VM

5.5.1. Data Types

  • Objects are immutable, files are not
  • Files are accessed via filepath, objects are accessed via API+key
  • Fileshare scales vertically, objects scale horizontally

5.6. Container Orchestration

5.6.1. Orchestration Framework

Container Orchestration FrameworkDescriptionUse Case
KubernetesOpen-source container orchestration framework
OpenShiftKubernetes with batteries included
ECS / ...

5.6.2. Platform

Container Orchestration PlatformKubernetes?DescriptionUse Case
ECSAWS managed orchestration serviceMinimal setup
EKSAWS hosted open-source orchestration frameworkFlexibility
OpenShift

5.7. Networking Model

There are two main models that are used in the industry today:

  1. Open Systems Interconnection (OSI) model
    1. Abstract: Typically used to discuss concepts
  2. TCP/IP model
    1. Concrete: This is what is used in the internet today

| OSI Layer | Name | Purpose | TCP/IP Layer | Data Unit | Examples |
| --- | --- | --- | --- | --- | --- |
| 7 | Application | User Apps | Application | Data | Zoom, WhatsApp, Teams |
| 7 | Application | App Protocols | Application | Data | HTTP, WebSockets, WebRTC (API, signaling), SIP, DNS, gRPC, RTP/SRTP |
| 6 | Presentation | Data formatting | Application | Data | JSON, XML, Protobuf |
| 6 | Presentation | Encoding & Compression | Application | Data | JPEG, MP3, H.264, gzip |
| 6 | Presentation | Encryption | Application | Data | TLS, DTLS, SSL, SRTP |
| 5 | Session | Manage session lifecycle | Application | Data | NetBIOS, RPC, WebRTC session setup |
| 4 | Transport | Reliable/unreliable delivery, multiplexing, manage connections | Transport | Segment (TCP) / Datagram (UDP) | TCP, UDP, QUIC |
| 3 | Network | Routing, addressing | Internet | Packet | IP, ICMP, BGP |
| 2 | Data Link | Framing, error detection | Link | Frame | Ethernet, Wi-Fi MAC, PPP, 5G NR |
| 1 | Physical | Raw bits over a medium | Link | Bits | Fiber, RF, copper, modulation |

5.8. Sessions and connections

Definition

| | Connection | Session |
| --- | --- | --- |
| Layer | Transport | Application |
| Definition | A channel between two peers | A context between two peers |
| Lifespan | Exists only while data flows on the transport | Can span multiple connections, until either peer terminates the session |

Signaling: Session Management

Signaling is the process of setting up, managing, and tearing down a communication session before real-time data flows. Signaling encompasses multiple processes:

  • Session Setup
  • Codec Negotiation
    • Process where two peers agree on a common codec for audio/video during signaling
  • NAT Traversal
    • Techniques + protocols that allow devices behind NAT to communicate directly
    • There are three main techniques
      1. Session Traversal Utilities for NAT (STUN)
        • Device asks STUN server "What's my public IP:port?"
        • Device shares info with other peer (P2P)
        • Works only if NAT keeps mappings stable
      2. Traversal Using Relays around NAT (TURN)
        • Both devices send media to a TURN server
        • Used as fallback if direct P2P fails
        • Higher latency + server bandwidth cost
      3. Interactive Connectivity Establishment (ICE)
        • Gathers candidates
          • Private IP:port
          • Public IP:port from STUN
          • Relay addresses from TURN
        • Tries all possible paths
        • Picks the best working path (direct paths are preferred over relays)
  • Encryption key exchange
  • Session metadata exchange

5.9. HTTP

There are 3 main versions of HTTP being used

| Version | Description | Adv | Disadv | Use Case |
| --- | --- | --- | --- | --- |
| 1.1 | Most widely supported | Simple, easy to debug, universally compatible | One in-flight request per connection -> head-of-line blocking -> higher latency; more open connections = higher infra cost | Legacy, IoT |
| 2 | Multiplexed streams over one TCP connection | Big improvements in latency and throughput over HTTP/1.1, fewer connections per client, required for gRPC | TCP-level head-of-line blocking if packet loss occurs, more complex load balancing | gRPC |
| 3 | Runs over QUIC (UDP) | Lowest latency | Less mature, harder debugging, firewalls may block UDP | Mobile / unstable networks |

  • Modern clients auto-negotiate the best protocol via Application-Layer Protocol Negotiation (ALPN) during the TLS handshake
    • client says “I support h2, http/1.1”, server picks one
    • HTTP/3 support is typically advertised separately (e.g. via the Alt-Svc response header), after which the client connects over QUIC

5.10. Transmission Control Protocol (TCP)

  • Lossless

5.11. User Datagram Protocol (UDP)

  • Lossy

5.12. Quick UDP Internet Connections (QUIC)

  • UDP at Transport Layer + Reliability at App Layer

5.13. Which transport protocol

5.14. Distributed Websocket Approaches

Distributed Websocket ApproachDescriptionAdvDisadvUse Case
Sticky SessionsGWLB pins client to specific gateway instance, e.g.SimplePoor rebalancingSmall systems
Connection Registry + Targeted RoutingclientId -> gatewayId KV lookupEfficient 1:1Chat, notifications
PubsubBroker fans out messagesFeeds, one event must notify many recipients immediately
Queue

| Mechanism | What it stores | Strength | Weakness | Use Case |
| --- | --- | --- | --- | --- |
| Connection registry | connectionId → gatewayId | Precise routing | Needs cleanup | 1:1 messaging |
| Group registry | groupId → [connectionId] | Controlled fanout | Large groups expensive | Chat rooms |
| Pub/sub broker | topic → subscribers | Massive fanout | Coarse routing | Broadcast feeds |

IdentifierDescriptionAdvantagesDisadvantagesTypical Use Case
Connection IDServer-generated unique ID for each WebSocket connection; changes on every reconnectPrecise 1:1 mapping to an actual socket; ideal for ownership, fencing, and livenessEphemeral; not useful for user-level routing or groupingMessage delivery, connection ownership, detecting stale connections
Client ID (User ID)Logical identifier for a user or client across devices/sessionsStable identity; good for authorization and groupingToo coarse: one client can have many connections; unsafe for deliverySend to all user devices, auth checks, user-level fan-out
Session IDIdentifier for a login session or browser/app contextHelps replace old connections; supports “last session wins” semanticsStill not 1:1 with sockets; session handling adds complexityEnforcing single active session, reconnect fencing
Channel / Topic / Room IDLogical grouping that connections subscribe toClean abstraction for broadcast and fan-out; decouples sender from connectionsRequires subscription management; not tied to identityChat rooms, game lobbies, collaborative documents
Device IDStable identifier per physical deviceUseful for presence, multi-device sync, fallback deliveryPrivacy concerns; not always available or reliablePush notification routing, device-specific state
Instance IDIdentifier of the compute instance holding the connectionUseful for routing and debugging; enables direct forwardingChanges with churn; not meaningful at business levelInternal routing, connection registries, observability
WebSocket IssuesScenarioMitigation
Reconnect StormsBackoff + jitter in the client
Gateway crashClient reconnect typically handled by client ws library
Stale connection registryTTL + Heartbeat allows stale data to be cleared from the registry
Message loss
Memory leaks
Slow clients
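
The "backoff + jitter" mitigation for reconnect storms can be sketched as follows. This is one common variant (exponential backoff with "full jitter"), not necessarily what any particular WebSocket client library does. The function name `backoff_delays` and the `base` / `cap` defaults are assumptions chosen for illustration.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=random):
    """Return one randomised delay (in seconds) per reconnect attempt.

    The ceiling doubles on every attempt up to `cap`, and the actual delay is
    drawn uniformly from [0, ceiling] so that clients which disconnected at the
    same moment do not all reconnect at the same moment.
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays


# A client would sleep for each delay before its next reconnect attempt.
for attempt, delay in enumerate(backoff_delays(5)):
    print(f"attempt {attempt}: wait {delay:.2f}s")
```

The randomisation is the important part: backoff alone only delays the storm, while jitter spreads it out.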

5.15. WebRTC

  • Frameworks

    • Web Real-Time Communication (WebRTC)
      • Open source framework for P2P RTC
      • Components
        • Signaling
        • Media Capture
        • Media Transport
        • Encryption
        • NAT Traversal
        • Adaptive Quality
        • Data Channels
  • Signaling Protocols

    • Session Initiation Protocol (SIP)
      • Set up, modify, tear down real-time sessions for voice/video/messaging
  • Monitoring Protocols

    • Real-time Transport Control Protocol (RTCP)
      • Measures network performance metrics for RTP
  • Security Protocols

    • Transport Layer Security (TLS)
      • Secures TCP
    • Datagram Transport Layer Security (DTLS)
      • Secures UDP
      • i.e. TLS for UDP
  • Transport Protocols

    • Real-time Transport Protocol (RTP)
      • Transports real-time media (audio/video)
      • Rides on UDP, sometimes TCP
    • Secure Real-time Transport Protocol (SRTP)
      • Encrypted RTP
      • Uses DTLS for key exchange
    • RTCP
  • Network Address Translation (NAT)

    • NAT Devices
      • Home Routers
      • Corporate Firewalls
    • Vanilla NAT
      • 1:1 mapping between private IPs to public IPs (e.g. 192.168.0.1 (private) : 203.0.113.1 (public))
      • Provides control over private IP ranges
      • Single source of truth for configuring public/private IP mappings (e.g. ISP changes IP allocations)
    • Port Address Translation (PAT) a.k.a NAT Overload
      • many:1 mapping from private IPs to a single public IP, using ports to tell connections apart
        • e.g.
          • 192.168.0.10:52301 -> 203.0.113.7:40001
          • 192.168.0.11:52301 -> 203.0.113.7:40002
      • Workaround to IPv4's small address space, not needed in IPv6 where 1:1 mappings are encouraged
  • Firewall

    • Decides which packets are allowed/blocked
    • Lives between private network and public internet
    • Typically blocks incoming connections, not outgoing
    • Corporates typically block UDP entirely because the lack of handshakes makes it hard for firewalls to understand the session state

If asked: “How would you design WhatsApp voice calls?”

  • Signaling: WebSockets (or SIP for enterprise VoIP)
  • Transport: RTP/SRTP for media
  • NAT traversal: STUN + TURN fallback
  • Encryption: SRTP end-to-end
  • QoS handling: Adaptive bitrate, jitter buffer

If asked: “How does WebRTC work?”

  • WebRTC = framework, uses:
    • Signaling (custom, often WebSocket)
    • RTP/SRTP for audio/video streams
    • STUN/TURN for NAT traversal
    • DTLS/SRTP for security
    • Adaptive bitrate + codec negotiation

If asked: “How does VoLTE differ from WhatsApp?”

  • VoLTE → Managed SIP + RTP inside carrier network, guaranteed QoS, low jitter
  • WhatsApp → WebRTC over the public Internet, no QoS guarantees

5.16. Performance Metrics

| Metric | Description | Layer | Units | E.g. |
| --- | --- | --- | --- | --- |
| Bitrate | Rate at which app encodes and sends data | Application | bits/s | Voice: 10 kbps 2G, 64 kbps 3G, 64 kbps LTE, 12-64 kbps VoLTE, 128 kbps Vo5G; Video: 1 Mbps (360p), 2 Mbps (720p), 5 Mbps (1080p), 15 Mbps (4K) |
| Throughput | Rate at which data is sent over the network | Network | bits/s | Zoom bitrate 2 Mbps, network throughput only 1.5 Mbps due to packet loss |
| Available Bandwidth | Rate at which a network link can support data transfer | Network | bits/s | Wi-Fi: 5 Mbps |
| Latency / Round Trip Time (RTT) | Time taken for packet to go to peer and back | | ms | <150 ms before humans detect delay |
| Packet Loss | % of dropped packets between nodes in one direction | | % | <1% before choppy/freezing video/audio |
| Jitter | Variability in packet arrival time in one direction | | ms | <30 ms before video stutters / audio crackles |

5.17. Adaptive Performance Strategies

StrategyDescriptionLayerUse Cases
Jitter BufferTemporary storage in receiver's app that smooths out variations in packet arrival times before playbackApplicationJitter
BitrateBitrate Reduction + ...
Bitrate ReductionReducing the encoding and sending of dataApplicationPacket Loss

5.18. Network Protocols

Application Layer

Signaling Layer

  • Voice over Public Switched Telephone Network (PSTN)
    • Dedicated E2E path between landlines/mobile phones using circuit switching
    • Transmits uncompressed voice using Pulse Code Modulation (PCM) at 64 kbps per call
    • Used in landlines and mobile phones when on connections of < 4G
    • >4G and above
    • Carrier provides QoS guarantees
  • Voice over IP
    • Transmits voice using IP
    • No QoS guarantees, call quality depends on network connection
  • Video over IP
    • Transmits video using IP

5.19. Wireless Systems

  • Application
  • Transport / IP
  • Radio Resource Control (RRC): Manages radio resources and connection states between base station and user device
    • Types of radio resources:
      • Time
      • Frequency
      • Power
      • Modulation & Coding
      • Bearer
      • Control
      • Random access
      • Beamforming
    • Types of connection states:
      • RRC_IDLE
      • RRC_INACTIVE (5G)
      • RRC_CONNECTED
  • PDCP
  • RLC
  • Medium Access Control (MAC) Layer: Decides who gets to transmit, when, and how much bandwidth
  • Physical (PHY) Layer: Deals with actual signal transmission over radio waves (modulation, power levels etc.)

5.20. Telco 101:

Rendering 3D models to 2D assets
  • Cell Tower
    • Software Components i.e. Base Station Software Stack
      • Radio Access Network (RAN) Software
        • Handles communication between mobile devices and cell tower, e.g.
          • Handover Control: Deciding when phone switches from one tower to another
          • Radio Resource Control (RRC): managing spectrum and assigning frequencies to devices
          • MAC & PHY Scheduling: Deciding which user gets how much bandwidth every millisecond
          • Security & Authentication: Encrypting radio traffic before it hits the core
          • Quality of Service: Prioritising latency-sensitive traffic like voice and video
      • Cell Tower OS
        • Manages hardware scheduling, memory and task prioritisation
      • Management Software
        • For engineers to monitor and configure the cell tower
    • Hardware Components
      • Antennas: Send/receive radio signals
      • Remote Radio Unit (RRU): Converts radio waves to/from digital data
      • Baseband Unit (BBU): Runs the base station software stack
        • In 5G, BBUs are
          • centralised in regional data centers
          • serve dozens of towers
          • do not exist on the cell tower
      • Backhaul: Connection to core network via
        • Fiber (Most common)
        • Microwave (rural areas)
        • Satellite (remote locations)

5.21. Scheduler

Scheduler

  • Does
    • Assign task to node
  • Does not
    • Start or manage the workload

Orchestrator

  • Does
    • Scheduler
    • Provisioning and starting workloads on nodes
    • Scaling workloads up/down based on demand
    • Health monitoring and self-healing
    • Rolling updates and rollback management
    • Managing networking, storage and service discovery

5.22. Firewalls

| Type | E.g. | Layer | Found In | Checks | Use Case |
| --- | --- | --- | --- | --- | --- |
| Web-Application (WAF) | AWS WAF | Application | CDNs, gateways, load balancers | Examines HTTP payload for attack detection | Web app / API protection against SQLi, XSS, bots, malicious patterns |
| Proxy | Nginx reverse proxy | Application | Proxy servers, gateways | Examines payload for access control and anonymisation | |
| Packet Filtering | iptables (basic rules) | Network & Transport | Routers | Examines packets based on source/destination IP, port, protocol | Simple allow/deny rules, port blocking |
| Host-Based | Windows Firewall, iptables | Network & Transport | Individual servers / VMs | Examines traffic per host | Protects single servers, last line of defense |

5.23. Optimising for reads/writes

Read Optimisation Strategy
CDN caching

The disadvantages in general are:

  1. Higher storage
  2. Stale data
  3. Additional complexity with invalidation strategy
Write Optimisation Strategy

The disadvantages in general are:

  1. More complex read paths
  2. Additional complexity with background preprocessors
Balanced Approach
CQRS + messaging
per-endpoint SLAs with targeted caching
tiered storage (hot cache -> primary DB -> datalake)

6. Maintainability

How to deliver value to users with minimal waste using code.

  • Single Layer of Abstraction Principle (SLAP)
  • Dependency Injection
  • Clean Conditionals
  • Conventional Commits
  • Early Returns / Continues
  • Prefer for loops over while

7. Testing

  • E2E
    • Main user stories, happy paths
  • Integration
    • Edge cases not caught by E2E
  • Unit
    • Small functions

7.1. Testing Frameworks

Frontend

  • Web
    • Playwright (purpose built from the ground up)
    • Cypress (multiple packages patched together)
  • Cross Platform
    • integration_test (flutter)
  • Mobile
    • Maestro (js)
      • Supports OS level interaction, e.g. going to system settings

8. Concrete Knowledge

8.1. Choosing an Infrastructure as Code (IaC) framework for cloud

FrameworkDescriptionUse CaseAdv.Disadv.
AWS
SST (Serverless Stack)Third party abstraction on top of CDKSmall projectsUltra-fast local lambdas with hot reload, DevXLess flexible than CDK, third party solution, risky with breaking changes
CDKAWS high-level code-first framework built on CloudFormationBest all-round choice for AWSCommon programming languages supportedSteep learning curve, no local emulators for lambdas and API gateways
SAM (Serverless Application Model)AWS high-level serverless-first legacy framework built on CloudFormationPrefer CDKDevX with emulators for local lambdas/API gatewaysYAML config, serverless projects only
CloudFormationAWS low-level frameworkLow-level controlAccess to L1 constructs for high customisabilityJSON/YAML config, verbose
Azure
Bicep
ARM TemplatesJSON Config
GCP
Deployment ManagerYAML
Multi-vendor
Terraform
Pulumi
Serverless FrameworkLegacy vendor agnostic frameworkDo not use, it is deadSupports AWS, Azure, GCPYAML config, mocking AWS locally required

8.2. Choosing a library for local dev of cloud resources

AWS

Library / ToolDescriptionUse CaseAdv.Disadv.
LocalStackFull AWS service emulator in DockerBest library to start with before using other libraries for specific functionalityBroad AWS coverage, runs in one containerSlower than service-specific emulators, partial coverage of some services
MinIOS3 compatible object storeLocal S3FastS3 only, some S3 features differ
ElasticMQSQS emulatorLocal SQSFastSQS Only
DynamoDB LocalDynamoDB emulatorLocal KVFastDynamoDB only
SAM CLILambdas / API Gateway emulatorLocal lambdas / API GatewayFastServerless services only
SSTLambda emulator with hot reloadExtremely fast local lambda devExtremely fastNeed to use SST

8.3. Browser Storage

Storage TypeDescriptionSet byAccess viaLifetimeAccess scopeCapacityUse CasesSecurity Notes
CookiesKV pairsResponses (Set-Cookie) + JS (document.cookie)Requests (auto-sent) + JSConfigurable to clear after session / expiry datetimeBrowser + domain4KB each, 50 per domainAuth, prefsUse HttpOnly, Secure, SameSite flags
Session StorageKV pairsJSJSCleared on tab closeTab / Session5MBTemporary UI state, multi-tab separationAccessible to JS -> XSS risk
Local StorageKV pairsJSJSPersistent until clearedBrowser + Origin10MBApp state, non-sensitive prefsAccessible to JS -> XSS risk
Extension Storage???JS (Extensions only)JS (Extensions only)Persistent until clearedExtension5MB (sync), 10MB (local)Extension settings, sync across devices
IndexedDBNoSQL DBJSJSPersistent until clearedBrowser + OriginxGB, depending on disk spacePWAs, offline apps, large structured dataOrigin-scoped, but XSS risk

8.4. Request/Response Flags

| Flag | Purpose | Use Case |
| --- | --- | --- |
| HttpOnly | Prevents JS from reading cookies | Protect tokens from XSS |
| Secure | Cookie only sent over HTTPS | Protect plaintext cookies from being leaked |
| SameSite | Controls if cookies are sent on cross-site requests (Strict/Lax/None) | CSRF protection / cross-site marketing |
| Cache-Control | Controls caching of response data (no-store, max-age etc.) | Ensure sensitive data isn't cached |
| CORS headers | Control which domains can make cross-origin requests | APIs that need controlled access |
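
To make the cookie flags concrete, here is a sketch that builds a `Set-Cookie` header with Python's standard `http.cookies` module (the `samesite` attribute needs Python 3.8+). The cookie name `session_id`, its value, and the helper name `session_cookie_header` are placeholders for this example.

```python
from http.cookies import SimpleCookie


def session_cookie_header(value):
    """Build a Set-Cookie header value with the three cookie flags set."""
    cookie = SimpleCookie()
    cookie["session_id"] = value
    cookie["session_id"]["httponly"] = True      # not readable from document.cookie
    cookie["session_id"]["secure"] = True        # only sent over HTTPS
    cookie["session_id"]["samesite"] = "Strict"  # never sent on cross-site requests
    return cookie["session_id"].OutputString()


print("Set-Cookie:", session_cookie_header("abc123"))
```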

8.5. Response Codes

CodeMeaningWhen to useBenefit of using
InformationalRequest received, continuing processRare in practice, mostly for protocol-level interactions
100ContinueClient should continue sending request body (after headers OK)Saves bandwidth if request is rejected early
101Switching ProtocolsUsed for HTTP to WebSocket upgrade or HTTP/1 to HTTP/2 switchNecessary to start persistent connections
SuccessRequest succeeded
200OKStandard response for successful request (e.g. GET, POST when no resource creation)
201CreatedNew resource created successfully (e.g. POST /users)
202AcceptedRequest accepted for async processing but is not done yet
204No ContentSuccess, but no response body (e.g. DELETE)
RedirectionFurther action needed
301Moved PermanentlyResource permanently movedTells crawlers to update their search index, better SEO
302Found (Moved Temporarily)Temporary redirect (historically used like 303)
303See OtherRedirect after POST -> GET (common for web forms), e.g. ???
304Not ModifiedUsed with cachingClient can use cached response, lowering latency and bandwidth since it does not need to wait for a body to arrive
Client ErrorProblem with request
400Bad RequestMalformed syntax, invalid patterns
401UnauthorizedMissing/invalid authentication
403ForbiddenAuthenticated but not authorised
404Not FoundResource doesn't exist, or if you don't want malicious actors to know your API endpoints if they are not authenticated/authorisedSecurity through obscurity + clear feedback
409ConflictResource conflict (e.g. duplicate unique field)
429Too Many RequestsRate limiting / throttling
Server ErrorProblem on server side
500Internal Server ErrorGeneric server crash/error
502Bad GatewayUpstream server error (e.g. reverse proxy can't reach backend)
503Service UnavailableServer overloaded, down for maintenance
504Gateway TimeoutUpstream service didn't respond in time

8.6. Web Identifiers

TermDefinitionE.g.
TCP connectionSource IP : Source Port -> Destination IP : Destination Port192.168.1.10 : 52341 → 34.120.10.5 : 443
SocketOS-managed object that includes TCP connection + send/receive buffer
DomainRegistrable name of a website / portion of hostexample.com
HostNetwork address (domain name / IP) in a requestexample.com, shop.example.com
Ephemeral Port52341
Scheme???http://, ws://
Port???443
OriginScheme + Host + Porthttps://example.com:443
Fragment???#reviews
Uniform Resource Name (URN)Name of a resource, not how to locate iturn:isbn:0451450523 (book ISBN), urn:uuid:6fa459ea-ee8a-3ca4-894e-db77e160355e (UUID)
Uniform Resource Locator (URL)How to locate a resourcehttps://shop.example.com:443/products?id=10#reviews
Uniform Resource Identifier (URI)URL / URN-
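
The identifiers in the table can be pulled out of a URL with the standard library. A rough sketch: the `describe` helper and the `DEFAULT_PORTS` mapping are assumptions for this example (`urlsplit` itself returns `None` for the port when the URL omits it).

```python
from urllib.parse import urlsplit

# Assumed defaults, used only when the URL does not spell out a port.
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def describe(url):
    """Split a URL into the identifiers listed in the table above."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "origin": f"{parts.scheme}://{parts.hostname}:{port}",
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


info = describe("https://shop.example.com:443/products?id=10#reviews")
print(info["origin"])    # https://shop.example.com:443
print(info["fragment"])  # reviews
```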

8.7. Python

  • Celery
    • Distributed task queue
      • e.g.

8.8. Git

  • BFG Repo-Cleaner
    • CLI for cleaning up git repos
      • e.g. committed large files / sensitive data

8.9. S3

URL Types

URL TypeDescriptionAdvDisadvUse Case
UnsignedPublic URLHosting public assets, e.g. website images, JS/CSS, downloads
SignedURL signed with S3 access keys,Uploading images, private file sharing
Pre-signedAllows users who do not have AWS credentials to access S3Uploading images, private file sharing
Cloudfront SignedURL signed with CloudFront key pairsMedia streaming, CDNs, large-scale distribution

8.10. Websockets

Connection TypeTimeout
Client x API Gateway Websocket Connection2hrs
API Gateway x Lambda Integration29s

8.11. Virtual Private Clouds (VPCs)

8.11.1. Defining IP address ranges

  • Classless Inter-Domain Routing (CIDR) blocks
    • Notation for defining a contiguous range of valid IP addresses
    • Notation: <ip address>/<prefix length>
      • Prefix length determines how many leading bits in the address are fixed
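
A short sketch of how the prefix length translates into address counts, using Python's standard `ipaddress` module. The `10.0.0.0/16` range and the /24 subnet size are example values, not recommendations.

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")

# 16 fixed bits leave 32 - 16 = 16 free bits, so 2**16 addresses.
print(vpc.num_addresses)  # 65536

# Carving the VPC range into /24 subnets (8 free bits each, 256 addresses).
subnets = list(vpc.subnets(new_prefix=24))
print(len(subnets))              # 256
print(subnets[0], subnets[1])    # 10.0.0.0/24 10.0.1.0/24

# Membership test: which subnet does an address fall into?
print(ipaddress.ip_address("10.0.1.25") in subnets[1])  # True
```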

8.11.2. Subnets

Subnet TypeDescriptionAdvDisadvUse Case
PublicHas a route to an Internet Gateway (IGW)Simpler setup + troubleshootingLess safeLoad balancers, bastion hosts, public APIs
PrivateHas no direct route to IGW. Outbound internet traffic goes via NAT Gateway / egress-only IGW / VPN / Direct ConnectSaferHigher cost (needs NAT + proxy, trickier troubleshooting)Databases, app servers, microservices, caches/queues, internal ALB/NLB targets, analytics workers

8.11.3. Connections

Connection TypeDescriptionAdvDisadvUse Case
VPC EndpointPrivately connect VPC to AWS Services without traversing internetLower latency, higher security, lower data transfer costs
VPC Endpoint TypeDescriptionAdvDisadvUse Case
InterfaceCreates an Elastic Network Interface (ENI) in your subnet with a private IPSSM, Secrets Manager, CloudWatch
GatewayRoute table entries that direct traffic to S3 / DynamoDB

8.11.4. Routing Rules

8.11.5. Security Boundaries

8.12. React

  • Avoid useEffect if there are no external deps (source)

9. Relational Databases

9.1. Database Terminology

  • Clause
    • A single keyword-led part of a statement
    • e.g. SELECT, UPDATE, FROM, WHERE
  • Read / Query / Data Query Language (DQL)
    • A complete statement built from clauses
    • Ends with a semicolon
    • e.g. SELECT * FROM fooTable;
  • Write / Update / Data Manipulation Language (DML)
    • A complete statement built from clauses
    • Ends with a semicolon
    • e.g. UPDATE fooTable SET colName = x;
  • Read Result Set
    • Data returned from a query
  • Update Acknowledgement
    • Confirmation returned from a write
    • e.g. x rows inserted
  • Transaction
    • A group of queries executed as a single unit
    • e.g. BEGIN / START TRANSACTION -> COMMIT / ROLLBACK
  • Session
    • A client's connection to the DB
  • Database Object
    • Anything defined in a DB
    • e.g. Tables, Views, Indices, Stored Procedures, Triggers, Functions
  • Schema
    • Logical grouping of DB objects
  • Execution Plan
    • The strategy the DB optimiser chooses to run your query
    • e.g. index scan vs full scan, hash join
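
To illustrate the transaction entry (a group of writes that either all commit or all roll back), here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, its CHECK constraint, and the `transfer` helper are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one unit; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


print(transfer(conn, 1, 2, 500))  # False: the debit violates the CHECK constraint
print(balances(conn))             # [(1, 100), (2, 50)], the credit was rolled back
print(transfer(conn, 1, 2, 30))   # True
print(balances(conn))             # [(1, 70), (2, 80)]
```

The credit is deliberately applied before the debit so that the failed transfer has something to roll back.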

9.2. Database Data Persistence

Data in Tables (Persistent)

  1. Base/Regular Table
    • Data stored in disk
    • Data is persistent across sessions
  2. Temporary Table
    • Data stored in disk
    • Data exists only in session
    • Data can exist across sessions if cached

Data in Queries (In Memory)

  1. Result Set
    • Data stored in memory
    • Data exists onl
  2. Derived / Subquery e.g. FROM
    • Data stored in memory
    • Data exists only in query
  3. Common Table Expression (CTEs) e.g. WITH
    • Same as subquery, but provides syntactic alias for reusing subqueries

Named Queries

  1. View/Virtual
    • Query definition stored in disk
    • Data only stored
  2. Materialised View
    • Data stored in disk
    • Manual/scheduled refresh
  3. Stored Procedure
    • Data stored in disk
    • ???

9.3. Database Isolation Levels

| Isolation Level | Dirty Reads |

10. Mobile

10.1. Cold/Warm/Hot Starts on Mobile

  1. Cold Start
    • binary not in memory
    • e.g. launching app after killing it
  2. Warm Start
    • binary in memory, app process in background
    • e.g. when switching between apps
  3. Hot Start
    • binary in memory, app process in foreground
    • e.g. when locking and unlocking the screen momentarily, or switching between apps briefly
      • This occurs because Android and iOS give apps a grace period (~2s) before backgrounding
    • App still has GPU and CPU priority

10.2. Splash Screen

Splash screens are only shown for cold start

PhaseNative iOSNative AndroidReact NativeFlutter
Process StartupOS launches app processSameSameSame
Show OS-level SplashLaunch splashSameSameSame
Runtime Init + Framework BootstrapInitializes iOS runtime + UIKit, sets up main run loop, prepares initial UIViewControllerInit Android Runtime + base Activity, inflates first layoutNative layer starts JS engine, loads JS bundle, sets up React tree & JS x native bridgeNative layer starts Flutter engine, loads Dart VM, initializes widget tree & Skia renderer
App InitSet up SDKs, DB, config etc.SameSameSame
Remove SplashOS removes splash once first UIViewController is readyOS removes splash once Activity content is readyNative splash removed after JS bundle + RN root view are mountedNative splash removed after Flutter engine renders first frame
First Frame RenderedFirst frame is renderedSameSameSame

11. Resources