Multi-page · for backend engineers
AI systems

The AI stack, from the engineer's seat.

Not how to train a model. How the model you call runs in production: how a prompt becomes tokens, how embeddings turn meaning into coordinates, why serving is a memory problem, and how retrieval and agents bolt real systems onto a next-token predictor. It works at the same level as the rest of the codex: what the system is actually doing, and where the costs hide.

Three sub-pages are live, with two more in flight. Each links to its plain-English ELI5 front door and the matching simulator where one exists.


Live deep dives

Start here.

Planned deep dives

Two more, in flight.

The retrieval and agent layers are the two patterns most teams are actually shipping in 2026. Here they are, in the order they make sense to learn:

  1. 04 · Retrieval-augmented generation
    Give the model an open-book exam. The ingestion and retrieval halves, chunking trade-offs, hybrid search, re-ranking, and how to tell whether a wrong answer came from retrieval or from generation.
    chunking · hybrid search · re-ranking · evaluation · hallucination
  2. 05 · Agents & tool use
    What turns a chat model into something that takes actions. Tool calling, the plan-act-observe loop, memory, MCP, and the guardrails that keep an autonomous loop from doing real damage.
    tool calling · ReAct · memory · MCP · guardrails