All fifteen stages. The complete arc: what HTTP is doing, how to pick a database, what makes distributed systems hard, and how to walk into a design interview and not freeze. Start here if you're not sure. Each topic links to a Semicolony deep dive, simulator, or handbook entry where one exists, and to a curated external resource where it doesn't. Follow the arc in order, or jump to wherever you're stuck.
Core plus the recommended layer. The optional stops stay hidden — they pay off after you've shipped a couple of production services.
What actually happens when you type a URL.
Most backend confusion comes from missing this layer. Before you can debug a 502 in production or pick sensibly between gRPC and REST, you need a real picture of what a packet does between your machine and the origin server. DNS, TCP, TLS, then HTTP, and the response coming back the same way.
What a packet actually does between your laptop and a Cloudflare datacenter. Usually four or five autonomous-system hops, sometimes ten when routing gets weird.
Networking curriculum
Names become IP addresses through a chain of cached lookups. Runs before every other network thing on the request path, and gets blamed for half the outages it didn't cause.
How DNS works
IPv4 nearly out of space, IPv6 still half-deployed, CIDR notation everyone fumbles in interviews. The addressing scheme that decides where your packets are even allowed to go.
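A minimal sketch of what CIDR notation encodes, using Python's standard `ipaddress` module. The block `10.20.0.0/22` is an arbitrary example chosen for illustration, not something taken from the curriculum.

```python
import ipaddress

# Hypothetical example block; any CIDR string works the same way.
net = ipaddress.ip_network("10.20.0.0/22")

# A /22 fixes the first 22 bits, leaving 32 - 22 = 10 host bits.
print(net.num_addresses)        # 1024
print(net.netmask)              # 255.255.252.0
print(net.broadcast_address)    # 10.20.3.255

# Membership is a comparison on the fixed prefix bits.
print(ipaddress.ip_address("10.20.3.7") in net)   # True
print(ipaddress.ip_address("10.20.4.1") in net)   # False

# Carving the block into four /24 subnets.
subnets = [str(s) for s in net.subnets(new_prefix=24)]
print(subnets)
```

The number after the slash is the count of fixed prefix bits, so each step down (from /22 to /23) halves the block.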
IP deep dive
The protocol every backend service speaks all day. Knowing methods, status codes, and headers cold saves real time when production breaks at midnight.
How HTTP works
Many concurrent streams over a single TCP connection. Mostly solves head-of-line blocking, except when TCP itself blocks all of them at once. Still default on most big sites.
Sim HTTP/2 streams simulator
Same idea as HTTP/2 but over UDP, with 0-RTT resumption and no TCP-level head-of-line blocking. Already serving most of YouTube and Facebook.
Sim HTTP/3 + QUIC simulator
The handshake that gives you a shared secret, the certificate chain that proves who the server actually is, and the reason "encrypted" without identity verification would buy you almost nothing.
How HTTPS works
A cache and compute layer at hundreds of points of presence near your users. Usually the first thing in front of any modern origin, and the layer that turns 200ms into 20ms for repeat visitors.
How CDNs work
Where requests turn into syscalls.
Processes, threads, file descriptors, memory pages. When a service is misbehaving at two in the morning, the engineers who can fix it usually share the same superpower: they know what the kernel is actually doing underneath the runtime.
The kernel's basic unit of isolation. Each one gets its own address space, file descriptors, and a slot in the scheduler. The shape of every server you'll ever run.
Processes deep dive
Same address space as the parent process, separate stacks. Cheaper to spin up than processes; the source of the hardest bugs you'll ever debug, since memory is shared by default.
Threads deep dive
How the kernel decides who runs next on a busy CPU. Linux's CFS (replaced as the default by EEVDF in kernel 6.6) optimises for fairness; alternative schedulers trade fairness for throughput or latency. The choice starts mattering once you hit core saturation.
Scheduling internals
Virtual memory, paging, the allocator. The difference between fixing a memory leak in an afternoon and chasing it through three rewrites is usually how well someone knows this layer.
Memory management
Three flavours of I/O and the kernel APIs underneath: epoll on Linux, kqueue on BSD, io_uring as the new shiny. Your runtime made one of these choices for you.
I/O internals
Inodes, directories, journaling, fsync. The layer a database's claim to durability actually has to pass through. Read the Postgres fsync-gate story once and it'll stay with you.
File systems
How two processes on the same machine actually talk to each other. The mechanics that sit underneath every RPC call your service has ever made.
IPC deep dive
Locks, semaphores, atomics, RCU. Each one is a small mistake away from an eight-week debugging session.
Synchronization
The boundary between your program and the kernel. Every read, write, epoll_wait crosses it. Running strace shows you the whole conversation in real time.
System calls
TCP buys you ordered, reliable bytes at the cost of a three-way handshake and head-of-line blocking. UDP just sends. Knowing when each fits, and what each costs, matters: this is the layer where most network bugs happen.
How TCP works
Berkeley sockets, signals, file descriptors, chmod. The shell-level OS toolkit.
Sockets
How code moves through time and across people.
Git, in practice. The basic commands are an afternoon. The mental model of the commit DAG, plus the difference between rebase, cherry-pick, and revert, is what you actually need the day a deploy breaks and you have to rewind a release branch with the team watching.
Commits, branches, merges, rebase. The mental model of the DAG.
How Git works
Trunk-based vs Git Flow vs GitHub Flow. The cultural choice that shapes your CI.
External trunkbaseddevelopment.com
Linear history vs preserved-context. The team's aesthetic choice that's actually about reviewability.
External rebase vs merge (Atlassian)
Three-way merges, rerere, the calm 30 seconds before reaching for git reset.
External Git rerere docs
What good review looks like. Tone, scope, what to leave for a follow-up.
External code review guide (Google)
A vocabulary every CI tool can read. Cheap to adopt, useful forever.
External Conventional Commits
Depth in one beats breadth in five.
Backend roles ask for fluency in one of these, not all five. Go is the pragmatic default. Rust if you want the safest concurrency story and can afford the steeper curve. Node when sharing code with the web tier matters; Java, Python, and C# still run most of the enterprise stack. Pick the one you can defend in an interview. The others can come later.
The pragmatic choice for backend services. Concurrency, tooling, deploys: all simple.
Go curriculum
Node services, the event loop, async/await. The lingua franca of the web tier.
JavaScript curriculum
Ownership, borrowing, fearless concurrency. Steepest curve, biggest payoff for systems work.
Rust curriculum
Glue language of the industry. Data, scripts, services with FastAPI or Django.
External Python tutorial: official
The most mature backend runtime alive. Spring still runs half the enterprise.
External Java: Oracle tutorials
How to make code that doesn't rot.
The shared vocabulary every code review runs on. SOLID isn't about reciting five letters. It's about hearing "this violates Open-Closed" in a pull request and seeing what the reviewer means without breaking stride. Same with KISS, YAGNI, DRY, the twelve-factor app, and the patterns book on every senior shelf.
Single responsibility, Open-closed, Liskov, Interface segregation, Dependency inversion.
External SOLID (Wikipedia)
Three short slogans that prevent more bad code than every senior review combined.
External KISS (Wikipedia)
Factory, Strategy, Observer, Adapter. The pattern catalog every senior engineer carries.
External patterns (Refactoring Guru)
Small, behaviour-preserving changes that make the next change easy.
External refactoring catalog (Martin Fowler)
Push the framework to the edges. Test the core without standing up a database.
External clean architecture (Uncle Bob)
Bounded contexts, ubiquitous language, aggregates. The vocabulary for any non-trivial domain.
External DDD reference (Evans, free PDF)
The cleanest statement of "what makes a service deployable." Still load-bearing in 2026.
External 12factor.net
Layered, event-driven, microkernel, CQRS. The catalog above the design-pattern layer.
External Mark Richards: architecture styles (free O'Reilly)
How services talk to each other.
REST is the default; every modern stack also has at least one of gRPC, GraphQL, or WebSockets in it. Each one's shaped for a different problem. Knowing what each costs in latency, bytes on the wire, and operational surface, plus when to reach for it, is the difference between an architecture that ages well and one that bends under its own weight at year three.
Resources, verbs, status codes. The default for public APIs.
REST deep dive
The default wire format. Know the spec, the gotchas, the alternative encodings.
JSON deep dive
Schema-first, HTTP/2 transport, streaming. The default for internal service-to-service.
gRPC deep dive
Schema language for gRPC. Binary, fast, evolvable. The alternative to JSON for internal traffic.
Protobuf deep dive
One endpoint, query language. Perfect for client-driven over-fetch problems.
GraphQL deep dive
Persistent bidirectional connections. For chat, presence, anything pushed from server.
How WebSockets work
One-way streaming over HTTP. Often the right answer when WebSockets are overkill.
SSE deep dive
Schema definition for REST APIs. Generates clients, mocks, docs. Adopt it on day one.
External OpenAPI Initiative
Path vs header vs date. The choice that bites once your API has external consumers.
API versioning
When you need the server to call you back. Stripe, GitHub, Slack all built on this.
Webhooks
API keys, OAuth tokens, mTLS, signed requests. The first thing every API gateway terminates.
Auth in API design
Idempotency keys, exactly-once-from-the-client. Stripe-style retry safety.
Idempotence in distributed systems
Pagination, filtering, error envelopes, rate-limit headers. The patterns that age well.
Best practices
Where state lives, and what makes it hard.
The choice that's hardest to undo. Know the spectrum from a single Postgres on RDS to a sharded distributed store with secondary indexes. More importantly, know the signals that tell you it's time to move up that spectrum. Migrate too early and you've added complexity for nothing. Migrate too late and you're looking at a six-month outage backlog.
Schemas, keys, joins, normalisation. The model that runs most of the internet, mostly on Postgres and MySQL, and probably still will in ten years.
Databases curriculum
INNER, LEFT, RIGHT, FULL, CROSS. The difference between them is the difference between answering a question correctly and answering a different question that sounded the same.
Sim SQL JOIN simulator
B-trees, covering indexes, partial indexes. EXPLAIN ANALYZE is the most under-used command in the toolbox; the gap between engineers who run it and engineers who don't is usually a factor of ten in production query times.
Database indexing
Atomicity, consistency, isolation, durability. None of them are free; each one costs latency or throughput. Knowing what each costs is what makes the trade-off conversations productive.
Sim ACID simulator
Read-uncommitted through serializable. Each level prevents a specific class of anomaly and admits another. Most databases default to a level that's weaker than you probably want.
Sim Isolation levels simulator
How Postgres lets readers and writers not block each other. Every senior database interview probes it, and most engineers can describe the mechanism without grasping what it costs at vacuum time.
MVCC deep dive
Write the change to a log, fsync, then update the table. The single mechanism every durable database has agreed on, more or less unchanged since the eighties.
WAL deep dive
B-tree wins reads; LSM wins writes. The layer beneath your SQL plan, and the choice that decides whether your database is fast at the workload you actually have.
Sim Storage engine simulator
Where the OS, the DB, and your hot data argue about who owns memory.
Page cache deep dive
EXPLAIN, EXPLAIN ANALYZE. Cost-based vs rule-based. Why your index is being ignored.
Query planner
Four shapes: key-value, document, wide-column, graph. Each fits one access pattern very well and the others badly. The rule for picking is access pattern first, feature checklist last.
NoSQL databases
Spanner, CockroachDB, YugabyteDB, TiDB. ACID transactions across many nodes, paid for with higher write latency than a single-box Postgres can give you. Usually worth it once you can't fit on one box.
Distributed SQL
Inverted indexes, TF-IDF, BM25. Elasticsearch, OpenSearch, Solr, Meilisearch, Typesense.
External Elasticsearch: guide
From a hashmap to a global edge fabric.
Every fast system caches at four or five layers. Picking which layer to cache at, deciding what your TTL actually means, and recovering when a popular key expires and fifty thousand requests hit the origin in the same second: those are the operational skills that turn a snappy product into one that survives a launch.
Cache-aside, read-through, write-through, write-behind. Pick a pattern; defend it.
Caching strategies
LRU, LFU, ARC, TinyLFU, W-TinyLFU. Each one has a workload it wins on.
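As a reference point for the simplest policy in that list, here is a rough LRU sketch built on the standard library's `OrderedDict`. The `LRUCache` class and its method names are illustrative assumptions, not the simulator's implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: evicts the key that has gone untouched longest."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # drop the least recently used entry

    def keys(self):
        return list(self._data)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touching "a" makes "b" the eviction candidate
cache.put("c", 3)     # over capacity, so "b" is evicted
print(cache.keys())   # ['a', 'c']
```

LRU tends to do well when recent access predicts future access; a one-off scan over many cold keys is the classic workload where frequency-aware policies such as LFU or TinyLFU hold up better.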
Sim Cache eviction simulator
The de-facto in-memory store. Strings, sets, sorted sets, streams. And why "single-threaded" is fine.
How Redis works
The cache at the edge. PoPs, cache headers, invalidation lag.
How CDNs work
Cache-Control, ETag, Vary. The contract between origin and every cache in front of it.
External HTTP caching (MDN)
Coalescing, jittered TTLs, negative caches. The two failure modes behind most incidents.
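A small sketch of the jittered-TTL idea: instead of giving every key the same expiry, scale the base TTL by a random factor so keys written together do not all expire in the same second. The function name `jittered_ttl` and the 10% spread are assumptions picked for illustration.

```python
import random

def jittered_ttl(base_seconds: float, spread: float = 0.1, rng=random) -> float:
    """Return base_seconds scaled by a random factor in [1 - spread, 1 + spread]."""
    return base_seconds * rng.uniform(1.0 - spread, 1.0 + spread)

# Seeded generator so the run is repeatable; production code would not seed.
rng = random.Random(42)
ttls = [jittered_ttl(300, spread=0.1, rng=rng) for _ in range(1000)]

# Expiries now spread over roughly a 60-second window around 300 s.
print(round(min(ttls)), round(max(ttls)))
```

Jitter only spreads expiries out. It does not help when a single very hot key expires, which is the case request coalescing is meant for.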
Advanced caching
The threats most backend incidents come from.
You won't become a security expert from this page. Security is its own seven-year apprenticeship. The goal here is more modest: know enough of OWASP, OAuth, CORS, JWT, and the basics of crypto to not be the engineer who shipped the bug that landed the company in the press.
The canonical list of web vulnerabilities, refreshed every few years.
External OWASP Top 10: official
bcrypt, scrypt, Argon2. Never roll your own. Salt, iteration cost, the lot.
External password storage cheat sheet (OWASP)
AES, RSA, ECC. What you encrypt with vs what you sign with.
External what is encryption (Cloudflare)
Cipher suites, certificate validation, mutual TLS, TLS 1.3.
TLS deep dive
The framework every "login with X" rides on. The flows, the tokens, the traps.
How OAuth works
Identity layer on top of OAuth. ID tokens, userinfo, the standard "sign in with Google" flow.
How OIDC works
Stateless tokens with claims. Useful, footgunny. The lifecycle is the part to know.
Sim JWT lifecycle simulator
Why your fetch() sometimes 403s and sometimes works.
Sim CORS preflight simulator
The three classic web vulns. Each has a one-page mitigation that most teams skip.
External CSRF cheat sheet (OWASP)
CSP, HSTS, X-Frame-Options, Permissions-Policy. The free defence-in-depth layer.
External security headers (MDN)
Mutual TLS. Both client and server prove identity. The internal-service standard.
External what is mTLS (Cloudflare)
Token bucket, leaky bucket, fixed/sliding window. The first thing in front of a public API.
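To make the first of those algorithms concrete, here is a minimal token-bucket sketch. The `TokenBucket` class is a hypothetical illustration; it takes the current time as an argument so the example is deterministic, where a real limiter would more likely read a monotonic clock.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity    # start full
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=2.0, capacity=3.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]
print(burst)                    # [True, True, True, False]
print(bucket.allow(now=0.5))    # True: 0.5 s at 2 tokens/s refilled one token
print(bucket.allow(now=0.5))    # False: the bucket is empty again
```

The two parameters map onto two separate promises: `capacity` bounds the burst and `rate` bounds the sustained throughput.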
Sim Rate limiter simulator
The tests that actually catch bugs.
The test framework matters less than the habits. Two rules go a long way: write the test before the fix, and weight integration tests heavier than unit tests. Unit tests catch the bugs you imagined; integration tests catch the bugs your customers actually find.
Unit > integration > E2E. The shape that catches the most bugs per second of CI time.
External Vocke (Practical Test Pyramid)
Test one thing. Run them in milliseconds. The bedrock.
Testing in Go (example)
Real database, real network, real environment. The tests that catch the real bugs.
External integration test (Martin Fowler)
Slowest, flakiest, most expensive. Necessary for the critical paths.
External official docs (Playwright)
Stubs, spies, mocks, fakes. What each one is for and when each is wrong.
External test doubles (Martin Fowler)
Generate the test cases. Catch bugs your hand-written tests never thought of.
External Hypothesis (Python)
Pact, Spring Cloud Contract. Catch API breakages before integration tests run.
External official docs (Pact)
k6, Vegeta, Locust. Find the cliff before production does.
Load testing deep dive
The shape every production deploy ends up in.
Docker is universal; Kubernetes is what most production stacks run on. You can be productive with both at a surface level in a few weeks. Knowing what's happening underneath (namespaces, cgroups, the kubelet, the scheduler) is what saves you the day a pod won't start and the logs aren't helpful.
Namespaces, cgroups, layered filesystems. Why Docker is "not a VM."
How containers work
Layer caching, multi-stage builds, distroless. The 10-line file that determines a 300 MB image.
External Docker: Dockerfile best practices
Docker Hub, GHCR, ECR, Artifact Registry. Anonymous pull rate limits are the most common surprise.
External Docker Hub: docs
Pods, deployments, services, ingress. The minimum to be productive.
Kubernetes curriculum
API server, etcd, controllers, scheduler, kubelet. The five pieces that keep your declared state real.
Controllers
Services, NetworkPolicy, Ingress, Gateway API. CNI plugins do the heavy lifting.
K8s networking
Templating vs overlays. Two ways to keep YAML from becoming a 12-thousand-line copy-paste.
External Helm: official
GitHub Actions, GitLab CI, ArgoCD, Flux. The pipeline from commit to production.
External GitHub Actions: docs
Terraform, Pulumi, OpenTofu. The cluster you can recreate from a file.
External Terraform: docs
When one box stops being enough.
The handful of techniques that turn "works on my laptop" into "works at a million requests per second." Load balancing, sharding, replication, autoscaling, edge. Each one solves a real problem and adds three new ones. Knowing the trade-offs is what separates engineers who scale systems from engineers who just add more boxes.
Vertical vs horizontal. When to add a box vs a bigger box.
Scaling out
L4 vs L7, round-robin vs least-connections, sticky sessions, health checks.
How load balancing works
Nginx, Apache, Caddy. The 30-year-old layer that still serves 80% of traffic.
External Nginx: official docs
The traffic-shaping layer in front of your services.
API gateway
Splitting a database when one node stops fitting. Hardest to undo.
Sharding deep dive
Read replicas, primary-replica, multi-leader. Each flavour has its own cost profile.
Replication deep dive
Reactive vs predictive, scale-out vs scale-in, cold starts.
How autoscaling works
How services find each other in a dynamic fleet. DNS-based, registry-based, mesh-driven.
Service discovery
Little's Law, queueing primer, back-of-envelope. Turn a request rate into a count of cores.
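A back-of-envelope sketch of that conversion. Little's Law says the average number of requests in flight equals arrival rate times average time in the system, L = λW. The 60% utilisation target and all the traffic figures below are made-up assumptions for illustration.

```python
import math

def in_flight(requests_per_s: float, latency_ms: float) -> float:
    """Little's Law, L = lambda * W, with W given in milliseconds."""
    return requests_per_s * latency_ms / 1000

def cores_needed(requests_per_s: float, cpu_ms_per_request: float,
                 target_utilisation: float = 0.6) -> int:
    """CPU-seconds demanded per second, divided by the utilisation headroom."""
    busy_cores = requests_per_s * cpu_ms_per_request / 1000
    return math.ceil(busy_cores / target_utilisation)

# Hypothetical service: 2000 req/s, 250 ms end to end, 4 ms of CPU per request.
print(in_flight(2000, 250))     # 500.0 requests in flight on average
print(cores_needed(2000, 4))    # 14 cores at a 60% utilisation target
```

The gap between the two numbers is the point: most of the 250 ms is spent waiting on I/O, so the in-flight count sizes connection pools and memory while CPU time per request sizes cores.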
Capacity planning
When your single service becomes a cluster.
The point where backend engineering stops being mostly about correctness and starts being about correctness under failure. Consensus, replication, ordering, idempotence. Every senior loop reaches for at least three of these. The return on investment for time spent here is enormous.
When the network partitions, you have to pick between consistency and availability. There is no third option. PACELC extends the trade-off to the no-partition case, where the choice is latency vs consistency.
CAP & PACELC deep dive
How a cluster of replicas agrees on the next entry in a shared log. Paxos was the original; Raft made the same idea legible enough you can implement it from the paper in a weekend.
Consensus deep dive
The R + W > N rule. Why a write acknowledged by two of three replicas is enough to guarantee that a later read of any two replicas sees it. Shows up in Dynamo, Cassandra, etcd, and roughly every distributed store ever built.
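The rule can be checked by brute force: enumerate every possible write set and read set and confirm that they share at least one replica. A minimal sketch, with `quorums_always_overlap` as a name invented for this example:

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares a replica with every read set of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )

print(quorums_always_overlap(n=3, w=2, r=2))   # True:  2 + 2 > 3
print(quorums_always_overlap(n=3, w=2, r=1))   # False: 2 + 1 == 3, a read can miss the write
print(quorums_always_overlap(n=5, w=3, r=3))   # True:  3 + 3 > 5
```

This is the pigeonhole argument in code: if R + W exceeds N, the two sets cannot both fit among N replicas without sharing one, and that shared replica holds the latest write.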
Sim Quorum simulator
Two-phase commit gives you ACID across services, at the price of blocking when the coordinator fails. Sagas give up the ACID and substitute compensating transactions. Most modern systems use sagas.
2PC & sagas
The property that makes a retry safe to send a second time. Every reliable API enforces it through idempotency keys; Stripe's design doc on this is the canonical reference.
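A toy sketch of how an idempotency key makes a retry safe: the server stores the first response under the key and replays it on any repeat. `PaymentService`, `charge`, and the key format are hypothetical names for illustration and do not reflect Stripe's actual API. A real service would persist the key and the side effect atomically rather than in an in-memory dict.

```python
class PaymentService:
    """Toy handler that replays the stored response for a repeated key."""

    def __init__(self) -> None:
        self._responses = {}      # idempotency key -> first response
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        if idempotency_key in self._responses:
            # Retry of a request already processed: no second charge.
            return self._responses[idempotency_key]
        self.charges_made += 1
        response = {"charge_id": self.charges_made, "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response

service = PaymentService()
first = service.charge("key-123", 5000)
retry = service.charge("key-123", 5000)   # e.g. the client timed out and resent
print(first == retry, service.charges_made)   # True 1
```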
Idempotence
Why "now" is hard across machines. Lamport gave us logical clocks; vector clocks extend that to detect concurrent events; Google's TrueTime sidesteps the problem with atomic clocks and GPS in every datacenter.
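What "detect concurrent events" means in practice can be sketched as a comparison of two vector clocks. The dict representation and the `compare` function are assumptions made for this example; each clock maps a node name to that node's event counter.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks given as {node: counter} dicts."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # a happened before b
    if b_le_a:
        return "after"        # b happened before a
    return "concurrent"       # neither dominates: no causal order exists

print(compare({"x": 1}, {"x": 2, "y": 1}))            # before
print(compare({"x": 2, "y": 0}, {"x": 1, "y": 1}))    # concurrent
```

A single Lamport counter cannot produce the last answer. It always yields some ordering, which is why it is consistent with causality but cannot reveal when two events were independent.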
Time & clocks
Epidemic-style information spread. The substrate of Cassandra, Consul, SWIM.
Gossip protocols
How a cluster figures out who's alive. Phi-accrual, the eventually strong (◇S) detector result.
Failure detectors
Data types that converge regardless of order. Real-time collab without coordination.
Paper CRDT paper: Shapiro et al.
The control-loop discipline that prevents one slow consumer from taking down the system.
Backpressure & retries
Kafka for the durable log shape, RabbitMQ for routing patterns, SQS for managed simplicity, NATS for low latency, Pulsar for multi-tenancy. The async layer behind nearly every reliable service.
How message queues work
Saga, outbox, CQRS, event sourcing, service mesh. The shape of every modern stack.
External microservices.io: patterns
Events, queues, idempotence, at-least-once delivery. The four building blocks of async messaging and what each costs.
Async architecture
OpenTelemetry, Jaeger, Zipkin. Following a request across services.
External OpenTelemetry: official
Knowing what your system is doing, in production.
Three signals (logs, metrics, traces) plus two methodologies (USE for resource saturation, RED for request health). Together they cover most of what an on-call rotation actually needs. The harder skill isn't the tooling; it's instrumenting your service ahead of the incident so the data is already there when you go looking.
Logs (what happened), metrics (how much), traces (how it flowed).
External Achieving Observability (free book) (Honeycomb)
Rate, Errors, Duration. The three numbers every request-driven service tracks.
RED method deep dive
Utilisation, Saturation, Errors. The resource-side complement to RED.
USE method deep dive
The numerical contract with your users, and with yourself.
External Google SRE book: SLOs
The de-facto OSS metrics stack. PromQL, exporters, recording rules.
External Prometheus: official
The vendor-neutral standard for traces, metrics, and logs. Auto-instrumentation.
External OpenTelemetry: official
ELK, Loki, OpenSearch. Centralised logs are the cheapest debugging upgrade you can buy.
External Elastic Stack: guide
Datadog, New Relic, Sentry. The hosted layer that often saves you a Friday night.
External Sentry: official docs
pprof, perf, flame graphs. Finding the hot path is half the work.
Profiling deep dive
p50 vs p99, the math that turns "make it faster" into a concrete number.
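A short sketch of why the two percentiles tell different stories, using the nearest-rank definition. The latency sample is invented for illustration, and monitoring systems often use interpolated or histogram-based estimates instead of this exact method.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical latencies in ms: 97 fast requests and 3 slow ones.
latencies_ms = [20] * 97 + [900] * 3

print(percentile(latencies_ms, 50))            # 20
print(percentile(latencies_ms, 99))            # 900
print(sum(latencies_ms) / len(latencies_ms))   # 46.4
```

The mean of 46.4 ms describes no request that actually happened. The p99 is the number a latency budget gets written against, because it is what the slowest few percent of requests experience.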
Latency budgets
The interview, and the day job.
Reading about components and designing with them are different skills. You build the second one the same way you'd learn chess: by working through the canonical problems out loud (chat, feed, URL shortener, object storage) and defending every choice. The book to read first is Designing Data-Intensive Applications.
Six steps in order: scope, estimate, API, data, high-level design, deepen. The repeatable forty-five-minute pass you can do in your sleep after enough practice. That fluency is the goal before the interview, not during.
Design framework
Little's Law, queueing primer, back-of-envelope. Turn a request rate into a count of cores.
Capacity planning
Designing a Discord-shape system. WebSocket fleet for connections, message store for history, presence, delivery semantics, group fan-out. Every interesting distributed-systems trade-off shows up at least once.
Playbook Chat playbook
Fan-out on write is fast to read but expensive to write; fan-out on read is the opposite; the celebrity problem breaks the naive version of either. Most real systems use the hybrid Twitter pioneered around 2013.
Playbook News feed playbook
The Hello World of distributed systems. Base62 keys, collision resistance, a CDN in front, the redirect path under ten milliseconds. Looks simple; gets interesting once the traffic is real.
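A minimal sketch of the Base62 step, assuming short keys are derived from a numeric ID such as a database sequence. The alphabet ordering (digits, lowercase, uppercase) is one common convention and an assumption here, not something the playbook specifies.

```python
import string

# 10 digits + 26 lowercase + 26 uppercase = 62 symbols.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode(s: str) -> int:
    """Inverse of encode."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(125))                          # 21  (2 * 62 + 1)
print(decode(encode(10**9)) == 10**9)       # True
print(62 ** 7)                              # 3521614606208 distinct 7-character keys
```

Encoding a unique counter avoids collisions by construction. Hashing the URL and truncating instead is where the collision-resistance question comes in.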
Playbook URL shortener playbook
What it takes to deliver eleven nines of durability at exabyte scale. Erasure coding instead of replication, metadata and data planes split, multipart upload for big objects, repair scanners running constantly in the background.
Playbook Object storage playbook
KV store, rate limiter, notifications, web crawler, typeahead, top-K, distributed scheduler, ride matching, pastebin, event ingestion.
Playbook Playbook hub
The dozen or so papers every senior engineer should have read once. Dynamo, Spanner, MapReduce, Bigtable, GFS, Lamport on time and clocks. Each introduced an idea that became the production default a decade later.
Paper Annotated papers
The case studies that drove every architectural pattern. Free.
External case studies (High Scalability)
If you buy one technical book this year, make it Designing Data-Intensive Applications. Kleppmann distilled twenty years of distributed-systems research into something a working engineer can read on a long-haul flight.
External DDIA (Martin Kleppmann)
Reading is not the same as defending. Pair up; rotate; speak aloud.
External Pramp: free mock interviews
19 system-design walkthroughs: chat, feed, URL shortener, Twitter, Instagram, Netflix, object storage, rate limiter, more.
Open the playbook
49 interactive simulators (Raft, CAP, sharding, caching, sorting, container layers), all in the browser.
Browse simulators
Twelve decision rules: when to shard, when to introduce a queue, how to estimate cost, what to cache.
Read the handbook
Time-boxed practice rounds: a 45-minute system-design simulator and a hundred concept flashcards across six categories. The endpoint of the roadmap.
Start a round
A concept index. Pick a topic, see every page that covers it. Useful when you want to drill on one concept across guides, simulators, and papers.
Browse topics