context deadline exceeded
A context’s timer ran out before the operation finished. The message tells you a budget was blown, not where — the work is finding which hop spent the time, and whether the server ever even saw the request.
The symptom
A Go call returns context.DeadlineExceeded, or a gRPC call fails with code DeadlineExceeded. The same expiry also shows up in disguise downstream — a database cancelling your statement is often your own deadline arriving there first.
Get "http://inventory.svc:8080/v1/stock": context deadline exceeded
rpc error: code = DeadlineExceeded desc = context deadline exceeded
pq: canceling statement due to user request
← that last one is Postgres’s view of YOUR context expiring: the driver
cancelled the statement when ctx fired
The diagnosis
1 Find whose deadline actually fired
// contexts inherit: the EARLIEST deadline in the chain wins.
// log the remaining budget where work starts:
if d, ok := ctx.Deadline(); ok {
log.Printf("budget remaining: %s", time.Until(d))
}
// budget remaining: 38ms ← caller handed you almost nothing:
// the time was spent upstream before you ran
Your own WithTimeout(2s) is meaningless under a parent that expires in 100ms. Walk the chain: every context.WithTimeout / WithDeadline between the entrypoint and the failing call, including middleware and client defaults. Logging time-until-deadline at each hop turns "something timed out" into "hop 2 consumed 90% of the budget", which is a finding, not a mystery.
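To see why a generous local timeout changes nothing under a tight parent, the sketch below derives a child context and reports the budget it actually receives. The helper names remaining and effectiveBudget are hypothetical, introduced only for illustration; the durations mirror the 2s-under-100ms case above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// remaining reports how much budget ctx has left.
// ok is false when no deadline is set anywhere in the chain.
func remaining(ctx context.Context) (left time.Duration, ok bool) {
	d, ok := ctx.Deadline()
	if !ok {
		return 0, false
	}
	return time.Until(d), true
}

// effectiveBudget (hypothetical helper) builds a parent with parentBudget,
// asks for a child with the requested timeout, and returns what the child
// really has left.
func effectiveBudget(parentBudget, requested time.Duration) time.Duration {
	parent, cancelParent := context.WithTimeout(context.Background(), parentBudget)
	defer cancelParent()
	child, cancelChild := context.WithTimeout(parent, requested)
	defer cancelChild()

	left, _ := remaining(child)
	return left
}

func main() {
	got := effectiveBudget(100*time.Millisecond, 2*time.Second)
	// The child asked for 2s but inherits the parent's earlier deadline.
	fmt.Printf("asked for 2s, effective budget within 100ms: %v\n", got <= 100*time.Millisecond)
}
```

Calling remaining at the top of each handler or client wrapper is one way to produce the per-hop log line shown above.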
2 Split connection time from response time
$ curl -w "dns %{time_namelookup} connect %{time_connect} tls %{time_appconnect} ttfb %{time_starttransfer} total %{time_total}\n" \
-o /dev/null -s http://inventory.svc:8080/v1/stock
dns 0.412 connect 0.413 tls 0.000 ttfb 2.318 total 2.319
← 412ms in DNS before a byte moved, then ~1.9s waiting on the server:
two separate problems, both hiding under one "deadline exceeded"
curl reports each value as cumulative time since the request started, so subtract neighbouring fields to get a phase: here the connect itself took 0.413 - 0.412 = 1ms. Time in namelookup: the resolver; walk it with dig. Time in connect: network path or a full accept queue on the server. Time in ttfb: the server got the request and is slow producing the answer. In Go, net/http/httptrace gives the same split in-process. The point of the split: "raise the timeout" is only the right fix for exactly one of these.
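A minimal sketch of the in-process split with net/http/httptrace follows. The phases struct and the tracedGet and demo functions are hypothetical names; the demo points at a local test server with an artificial 50ms handler delay, so no DNS lookup happens and that field stays zero. As with curl, TTFB here is measured from the start of the request.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"time"
)

// phases holds the timing split for one request (hypothetical type).
type phases struct {
	DNS, Connect, TTFB, Total time.Duration
}

// tracedGet issues a GET and records where the time went. On a reused
// connection the DNS and Connect hooks do not fire and stay zero. If the
// dialer races several addresses, the last ConnectDone wins in this sketch.
func tracedGet(ctx context.Context, url string) (phases, error) {
	var p phases
	var dnsStart, connStart time.Time
	start := time.Now()
	trace := &httptrace.ClientTrace{
		DNSStart:             func(httptrace.DNSStartInfo) { dnsStart = time.Now() },
		DNSDone:              func(httptrace.DNSDoneInfo) { p.DNS = time.Since(dnsStart) },
		ConnectStart:         func(_, _ string) { connStart = time.Now() },
		ConnectDone:          func(_, _ string, _ error) { p.Connect = time.Since(connStart) },
		GotFirstResponseByte: func() { p.TTFB = time.Since(start) },
	}
	req, err := http.NewRequestWithContext(httptrace.WithClientTrace(ctx, trace), http.MethodGet, url, nil)
	if err != nil {
		return p, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return p, err
	}
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	p.Total = time.Since(start)
	return p, err
}

// demo traces one request against a local server that is slow to answer.
func demo() (phases, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(50 * time.Millisecond) // simulated slow handler
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return tracedGet(ctx, srv.URL)
}

func main() {
	p, err := demo()
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("dns %s connect %s ttfb %s total %s\n", p.DNS, p.Connect, p.TTFB, p.Total)
}
```

Logging these fields next to the deadline error makes it possible to tell a resolver or dial stall from a slow handler without leaving the process.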
3 Check what the server saw
# client said: deadline exceeded after 2s
# server access log for the same request id:
10.0.3.41 "GET /v1/stock" 200 31ms ← server was fast and fine
# …or:
10.0.3.41 "GET /v1/stock" 200 4920ms ← server finished AFTER the client left
Three stories. Server never logged it: the request never arrived — connection-level problem, look at step 2’s connect phase. Server answered fast: the budget was consumed before the request reached it (often DNS, dialing, or a queue in a proxy). Server finished after the client gave up: the server burned real resources for an abandoned caller — make sure the context is propagated into everything (QueryContext, not Query) so server work cancels when the client leaves.
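The third story can be reproduced locally. In this sketch a handler passes the request context into its work, so the work stops when the client gives up. slowWork is a hypothetical stand-in for anything with the QueryContext shape, and abandon is a hypothetical harness with made-up durations (a 50ms client budget against 2s of server work).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowWork stands in for a query or downstream call that accepts a context:
// it returns as soon as ctx is done instead of running to completion.
func slowWork(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// abandon calls a 2s handler with a 50ms client budget and returns the
// error each side observed.
func abandon() (clientErr, serverErr error) {
	seen := make(chan error, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// r.Context() is cancelled when the client's connection goes away.
		seen <- slowWork(r.Context(), 2*time.Second)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
	if err != nil {
		return err, nil
	}
	resp, err := http.DefaultClient.Do(req)
	if err == nil {
		resp.Body.Close()
	}

	select {
	case serverErr = <-seen:
	case <-time.After(3 * time.Second):
		serverErr = errors.New("handler never reported")
	}
	return err, serverErr
}

func main() {
	clientErr, serverErr := abandon()
	// errors.Is, not string matching, tells the two errors apart.
	fmt.Println("client saw deadline exceeded:", errors.Is(clientErr, context.DeadlineExceeded))
	fmt.Println("server saw canceled:", errors.Is(serverErr, context.Canceled))
}
```

The two sides report different errors for the same event: the client's timer fired, while the server only learned that its caller went away. Replacing slowWork with a call that ignores the context would leave the handler running for the full two seconds.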
4 gRPC: confirm the channel ever became ready
rpc error: code = DeadlineExceeded desc = context deadline exceeded
# same code whether the server was slow OR the connection never existed.
$ GRPC_GO_LOG_VERBOSITY_LEVEL=2 GRPC_GO_LOG_SEVERITY_LEVEL=info ./client 2>&1 | grep -i "channel\|resolver"
INFO: [core] Channel switches to new LB policy "pick_first"
INFO: [core] Subchannel Connectivity change to CONNECTING
INFO: [core] Subchannel Connectivity change to TRANSIENT_FAILURE ← never connected
A gRPC DEADLINE_EXCEEDED with the channel stuck in CONNECTING/TRANSIENT_FAILURE means name resolution, TLS, or reachability — the server’s speed is unmeasured because no request ever left. Note that gRPC propagates your deadline to the server in the grpc-timeout header, so a well-behaved server stops working the moment the budget is gone.
The causes, ranked
- 1 A downstream dependency is genuinely slow
Confirm: traces or server-side timings show the time going into one call: a query, a cold cache, a third-party API.
- 2 Budget mismatch across hops
Confirm: logging time.Until(deadline) at each hop shows requests arriving with single-digit milliseconds left.
- 3 It never connected: DNS, dial, or TLS stall
Confirm: curl -w / httptrace shows the time in namelookup or connect; gRPC channel logs show TRANSIENT_FAILURE.
- 4 Server overload: the time is spent queueing, not working
Confirm: server handler times look healthy but client-observed latency is far higher; listener accept-queue overflows climb (nstat TcpExtListenOverflows).
The fixes
Cause 1, slow dependency: fix the dependency (index the query, warm the cache) or deliberately re-budget by raising this hop’s share and lowering another’s. Raising the top-level timeout without re-budgeting just moves where users wait.
Cause 2, budget mismatch: practice deadline budgeting. The entrypoint sets the total, each hop gets an explicit fraction, and retries must fit inside the parent budget (n attempts of a t-timeout call need roughly n×t plus backoff; if that exceeds the parent, the last attempt was always doomed).
Cause 3, never connected: give dialing its own, shorter timeout (a 5s total budget should not allow a 5s dial), fix the resolver, and keep connections warm with pools/keepalives so the dial cost is paid rarely.
Cause 4, server overload: add concurrency limits and load shedding so the server fails fast instead of slowly; an early explicit rejection is cheaper for everyone than a timeout. Then add capacity if the shed rate says so.
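As an illustration of failing fast under overload, here is a minimal load-shedding sketch: a counting semaphore in front of a handler that answers 503 immediately when every slot is taken. The limiter type, the shed middleware, the Retry-After value, and the limit of 1 are all assumptions made for the example, not recommended settings.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limiter is a counting semaphore built on a buffered channel.
type limiter chan struct{}

func newLimiter(max int) limiter { return make(limiter, max) }

// tryAcquire takes a slot without waiting; false means the server is full.
func (l limiter) tryAcquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot taken by tryAcquire.
func (l limiter) release() { <-l }

// shed wraps next so that requests beyond the limit get an immediate 503
// instead of queueing until the caller's deadline fires.
func shed(l limiter, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.tryAcquire() {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "overloaded", http.StatusServiceUnavailable)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

// statusWhenFull reports the status a request gets while every slot is taken.
func statusWhenFull() int {
	l := newLimiter(1)
	l.tryAcquire() // occupy the only slot, as an in-flight request would
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/v1/stock", nil)
	shed(l, http.NotFoundHandler()).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("status while saturated:", statusWhenFull())
}
```

Counting the rejections gives the shed rate mentioned above, which is the signal for whether more capacity is needed.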
What people get wrong
- The client timing out doesn’t stop the server. Unless the context is threaded through every layer — HTTP request, database driver, downstream RPC — the server keeps computing for a client that already left. Under load this is a death spiral: timeouts breed retries, retries breed more abandoned work. Propagate ctx into everything that accepts one.
- DeadlineExceeded and Canceled are different errors. context.DeadlineExceeded means the timer fired; context.Canceled means someone called cancel(), frequently because a remote caller’s own deadline fired and it dropped the request, which the server sees as a cancellation rather than a timeout. Distinguish with errors.Is, never by string matching, and log which one you got: they point at different layers.
- Retrying a timeout inside the same budget is theatre. A retry inherits the same parent context. If the first attempt consumed the budget, the retry times out instantly — visible in logs as pairs of failures milliseconds apart. Retries need their own sub-budgets, and ideally hedging policies decided per-route, not a blanket wrapper.
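The sub-budget idea from the last point can be sketched as a retry loop that gives each attempt its own deadline and refuses to start an attempt the parent budget cannot cover. retryWithin, errNoBudget, and attemptsAgainstSlowCall are hypothetical names, the durations are arbitrary, and backoff and hedging are left out to keep the example short.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errNoBudget marks an attempt that was skipped because it could not finish.
var errNoBudget = errors.New("not enough budget left for another attempt")

// retryWithin runs call up to maxAttempts times, each under its own
// perAttempt sub-deadline, and refuses to start an attempt that the parent
// deadline would cut short. It returns how many attempts actually ran.
func retryWithin(ctx context.Context, maxAttempts int, perAttempt time.Duration,
	call func(context.Context) error) (int, error) {
	var lastErr error
	for ran := 0; ran < maxAttempts; ran++ {
		if d, ok := ctx.Deadline(); ok && time.Until(d) < perAttempt {
			return ran, fmt.Errorf("skipped attempt %d: %w; last error: %v", ran+1, errNoBudget, lastErr)
		}
		attemptCtx, cancel := context.WithTimeout(ctx, perAttempt)
		lastErr = call(attemptCtx)
		cancel()
		if lastErr == nil {
			return ran + 1, nil
		}
	}
	return maxAttempts, lastErr
}

// attemptsAgainstSlowCall retries a call that never answers in time and
// reports how many attempts were started under the parent budget.
func attemptsAgainstSlowCall(parent, perAttempt time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), parent)
	defer cancel()
	slow := func(c context.Context) error {
		<-c.Done() // simulate a dependency that outlives every sub-deadline
		return c.Err()
	}
	ran, _ := retryWithin(ctx, 5, perAttempt, slow)
	return ran
}

func main() {
	ran := attemptsAgainstSlowCall(250*time.Millisecond, 100*time.Millisecond)
	fmt.Printf("attempts started: %d of 5\n", ran)
}
```

With a 250ms parent and 100ms per attempt, at most two attempts can start; the rest are rejected up front with errNoBudget rather than showing up in logs as failures milliseconds apart.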
Quick answers
Does DEADLINE_EXCEEDED mean the server is slow?
Not necessarily. The same error covers a slow server, a connection that never got established (DNS, TLS, reachability), time consumed upstream before the request was sent, and queueing in front of the handler. Split connect time from response time (curl -w or httptrace) before concluding anything about the server.
How do I find which timeout actually fired?
The earliest deadline in the context chain wins, so inventory every WithTimeout/WithDeadline from the entrypoint down, including middleware and client library defaults. Logging time.Until(deadline) at each hop shows exactly where the budget went — requests arriving with a few milliseconds left name the culprit hop.
Does the server stop working when the client times out?
Only if cancellation propagates. gRPC sends the deadline to the server (grpc-timeout header) and cancels server-side contexts; a plain HTTP server learns the client left only when the connection closes, and the handler stops only if it watches the request context. Anything not given the context (a Query instead of QueryContext) runs to completion for nobody.