dig

Half of "the network is broken" is DNS. The service is fine, the route is fine, and yet the client is connecting to an address that stopped being correct an hour ago, because some cache somewhere is still handing it out. Every name lookup is a question put to a specific resolver, and that resolver is allowed to answer from memory. So the debugging question is never just "what does this name resolve to"; it is "what does DNS actually say, and which resolver said it?" That is the question dig answers. This page covers the five usages worth knowing, decodes every section of the output, walks through three production incidents, and ends with a drill that touches nothing.

The question it answers

Most tools resolve names as a side effect. curl resolves a name on the way to making a request, ping resolves one on the way to sending a packet, and when resolution misbehaves they report it badly or not at all. dig inverts this: the lookup is the entire event. It builds a DNS query, sends it to a resolver, and prints the full response, every section, every flag, every TTL, plus a footer telling you exactly which server answered and how long it took. Nothing is summarised away. When DNS is the suspect, that completeness is the point.

The reason DNS needs a dedicated interrogation tool is that there is no single "the DNS." A name lookup is a question put to one particular resolver, and different resolvers can legitimately give different answers at the same moment. Your laptop asks the resolver in /etc/resolv.conf, which is usually a cache. That cache asked an upstream recursive resolver, which is also a cache. The recursive resolver asked the authoritative servers for the zone, which hold the actual records. Change a record at the authority and the truth ripples outward at the speed of cache expiry, not at the speed of light. Every "DNS propagation" mystery, every "it works on my machine but not in the pod" lookup bug, every stale-IP incident comes down to two questions: which resolver did this client ask, and what is that resolver currently holding? dig lets you put the same question to any resolver you like and compare the answers.

It helps to know what dig is not. It is not what your application does. An app calling getaddrinfo() goes through the C library's name machinery, which consults /etc/nsswitch.conf, /etc/hosts, and possibly systemd-resolved before any DNS packet exists. dig skips all of that and speaks raw DNS to a server. Most of the time the two agree; the times they do not are a pitfall with its own section below. It is also not a packet capture: it shows you the response it received, not what crossed the wire to get it. When you need to see the actual UDP packets, queries that never get answered, or a middlebox rewriting responses, that is a job for tcpdump with port 53.

The five usages that matter

dig takes dozens of options and you will use about five shapes of invocation for nearly everything. Each one varies a different part of the question: which name, which record type, and most importantly, which server gets asked.

Invocation	What it asks	When you reach for it
`dig example.com`	The A record, from your default resolver	The baseline: what does this machine's resolver say right now
`dig @8.8.8.8 example.com`	The same question, put to a specific server	Comparing resolvers, bypassing a suspect local cache, asking the authority directly
`dig +trace example.com`	The full delegation walk: root, then TLD, then authoritative	Delegation bugs, "is the registrar pointing at the right nameservers"
`dig example.com MX`	A different record type: AAAA, CNAME, MX, TXT, NS, SOA…	Mail routing, alias chains, ownership verification, zone metadata
`dig +short` / `dig -x 93.184.215.14`	Just the answer / the reverse lookup for an IP	Scripting and quick checks; naming an address found in logs or ss output

The bare form deserves one unpacking, because every part of dig example.com is a default you should be able to name. No record type means type A, the IPv4 address. No class means IN, internet, which is the only class anyone uses. And no server means dig reads /etc/resolv.conf, takes the first nameserver line, and sends the query there. That last default is the one that bites: the answer you get is whatever that one resolver currently believes, cache and all. The bare form tells you what this machine sees. It does not tell you what the rest of the world sees.

That is what @ is for. dig @8.8.8.8 example.com sends the identical question to Google's public resolver instead, ignoring resolv.conf entirely. dig @1.1.1.1 asks Cloudflare's. And, the move that settles most arguments, dig @ns1.your-dns-host.net asks one of the zone's own authoritative servers, the machines that hold the records rather than cache them. Authoritative servers do not guess and do not cache other people's data; what they return is the record as published. The triangle of local resolver, public resolver, and authority is the whole diagnostic method, and the first production scenario below is nothing but that triangle.
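The triangle is easy to script. What follows is a minimal sketch, not a finished tool: the function name ask_each and the placeholder word "default" (meaning "let dig read resolv.conf") are invented for this example, and it only asks for A records.

```shell
# ask_each NAME SERVER...
# Put the same A-record question to each server and print one line per
# server: the server, then its answers joined by commas.
# The word "default" is a placeholder for "whatever resolv.conf says".
ask_each() {
    name=$1
    shift
    for server in "$@"; do
        if [ "$server" = "default" ]; then
            answers=$(dig +short "$name" A)
        else
            answers=$(dig +short "@$server" "$name" A)
        fi
        # Sort so that round-robin ordering does not look like a difference.
        joined=$(printf '%s\n' "$answers" | sort | paste -sd, -)
        printf '%-24s %s\n' "$server" "${joined:-(no answer)}"
    done
}
```

Called as ask_each shop.example.com default 8.8.8.8 ns1.dns-host.net, it prints the three corners of the triangle side by side, and any line that differs from the authority's line is a cache that has not caught up.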

+trace changes the mode entirely. Instead of asking a recursive resolver to do the work, dig does the recursion itself in front of you: it asks a root server, which replies "the .com servers are over there," asks a .com server, which replies "the example.com servers are over there," and asks one of those, which finally returns the record. You see every referral, which makes it the tool for delegation problems: NS records that point at the wrong host, a registrar update that never took, a child zone nobody delegated. Note what it deliberately is not: a view of any cache. More on that in the pitfalls.

Record types are the third axis. A and AAAA are the IPv4 and IPv6 addresses. CNAME says "this name is an alias for that name," and chains of them are how CDNs are usually wired in. MX lists the mail exchangers for a domain, with preference numbers. TXT holds free-form text, which in practice means SPF and DKIM policies and the verification strings every SaaS vendor asks you to publish. NS lists the zone's authoritative nameservers. SOA is the zone's metadata record: serial number, refresh timers, and the field that controls negative caching. Asking for ANY used to be the lazy way to see everything; most servers now refuse it or return a minimal answer, so ask for what you want by name.

Finally the two conveniences. +short strips the response to bare answers, one per line, which is what you want in scripts and one-glance checks; everything this page says about reading the full output is the argument for not using it while debugging. -x does a reverse lookup, turning an IP back into a name by querying the PTR record under in-addr.arpa. Reverse zones are maintained by whoever owns the address block, not by whoever owns the forward name, so a missing or mismatched PTR is common and usually harmless, except to mail servers, which take PTR records personally.
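The name that -x builds is just the address written backwards. A small sketch for IPv4 only, with a function name (reverse_name) made up for the illustration:

```shell
# reverse_name IPV4
# Build the in-addr.arpa name that `dig -x` queries for an IPv4 address:
# the four octets reversed, with in-addr.arpa appended.
reverse_name() {
    printf '%s\n' "$1" |
        awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa." }'
}

reverse_name 93.184.215.14    # prints 14.215.184.93.in-addr.arpa.
```

So dig -x 93.184.215.14 asks the same question as dig 14.215.184.93.in-addr.arpa. PTR, which is also why the reverse zone belongs to whoever holds the address block.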

Reading the output

Here is a complete, unabridged response. dig's output looks noisy the first dozen times, but it has exactly five parts, and each one answers a different question. Learn the parts once and the noise becomes a report.

$ dig shop.example.com

; <<>> DiG 9.18.24 <<>> shop.example.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23198
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;shop.example.com.		IN	A

;; ANSWER SECTION:
shop.example.com.	287	IN	A	198.51.100.7

;; Query time: 2 msec
;; SERVER: 192.168.1.1#53(192.168.1.1) (UDP)
;; WHEN: Mon Jun 08 10:14:02 IST 2026
;; MSG SIZE  rcvd: 61

The header. The line starting ->>HEADER<<- is the verdict. status: is the response code, and three values cover almost everything you will meet. NOERROR means the query was answered without complaint; note that it does not guarantee an answer record exists. A name can be real but have no record of the type you asked for, in which case you get NOERROR with ANSWER: 0, a combination worth recognising on sight. NXDOMAIN means the name does not exist at all, no records of any type, says the authority. SERVFAIL means the resolver tried and could not get a usable answer: the authoritative servers were unreachable, or DNSSEC validation failed, or recursion broke somewhere. The blame is different in each case. NXDOMAIN points at the zone's contents; SERVFAIL points at the resolution path; REFUSED, the fourth one you will occasionally see, means the server declined to serve you at all, typically because you asked a server that is not configured to answer for that zone or for your address.
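The status-and-count reading can be written down as a small filter. This is a sketch under the assumption that the input is ordinary full dig output on stdin; the function name dig_verdict and the wording of the verdicts are invented here.

```shell
# dig_verdict: read full dig output on stdin and print a one-line verdict
# based on the status: field and the ANSWER: count in the header.
dig_verdict() {
    awk '
        /->>HEADER<<-/ {
            for (i = 1; i <= NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            for (i = 1; i <= NF; i++)
                if ($i == "ANSWER:") { answers = $(i + 1); sub(/,$/, "", answers) }
        }
        END {
            if (status == "") print "no header found"
            else if (status == "NOERROR" && answers + 0 == 0) print "NOERROR, but no record of that type"
            else if (status == "NOERROR") print "NOERROR with " answers " answer record(s)"
            else if (status == "NXDOMAIN") print "NXDOMAIN: look at the zone contents"
            else if (status == "SERVFAIL") print "SERVFAIL: look at the resolution path"
            else if (status == "REFUSED") print "REFUSED: this server will not answer you for this name"
            else print status
        }'
}
```

Usage is a pipe: dig shop.example.com | dig_verdict.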

The flags. Single letters, each one a fact about the conversation. qr just marks this as a response. rd means recursion desired: the client asked the server to chase the answer down for it. ra means recursion available: this server is willing to do that, which tells you that you are talking to a recursive resolver rather than a bare authority. The one to watch for is aa, authoritative answer, which appears only when the responding server actually owns the zone. Its presence or absence answers "am I hearing the source or an echo?" An answer with aa is the record as published. An answer without it came by way of a recursive resolver, almost always out of its cache. When the flags: line shows rd but no ra and you got a refusal or an empty answer, you asked an authoritative server to recurse, and it correctly declined.
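The "source or echo" test reduces to looking for one token on the flags line. A minimal sketch, with a hypothetical helper name (has_flag), that works on dig output piped to stdin:

```shell
# has_flag FLAG: read dig output on stdin; succeed if FLAG (aa, ra, tc, ...)
# appears on the ";; flags:" header line.
has_flag() {
    awk -v want="$1" '
        /^;; flags:/ {
            sub(/^;; flags:/, "")     # drop the label
            sub(/;.*/, "")            # drop the counts after the semicolon
            for (i = 1; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit (found ? 0 : 1) }'
}
```

For example, dig @ns1.dns-host.net shop.example.com | has_flag aa && echo "the source" succeeds against the zone's own server and stays silent against a recursive resolver.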

The question section plays the question back at you: name, class, type. It is worth an actual glance, because it shows the name exactly as it was sent. dig does not apply resolv.conf search domains unless you pass +search; when you do, this is where shop silently became shop.internal.corp.example.com, and the mystery is solved before you reach the answer.

The answer section is the payload, one line per record: the name, a number, the class, the type, and the data. The number is the TTL, in seconds, and it is the most operationally important column on the page. The TTL is a countdown: it is how much longer the resolver that answered you may keep serving this record from cache before it must re-fetch from upstream. Ask an authoritative server and you see the zone's configured TTL, full and constant. Ask a cache and you see the remaining time, ticking down with every repeated query until it hits zero, the cache refreshes, and the number snaps back up. This single observation explains "DNS propagation," a phrase that suggests pushing when the mechanism is purely expiry. Nothing is propagated anywhere. Old answers age out of caches, each on its own schedule, and the longest configured TTL is your worst-case wait.
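To watch the countdown without the rest of the output, the TTL column can be pulled out of the answer section. A sketch, assuming standard dig output on stdin; answer_ttls is a name chosen for this example.

```shell
# answer_ttls: read dig output on stdin and print "name ttl type" for each
# record in the ANSWER SECTION.
answer_ttls() {
    awk '
        /^;; ANSWER SECTION:/ { in_answer = 1; next }
        /^$/ || /^;;/         { in_answer = 0 }
        in_answer             { print $1, $2, $4 }'
}
```

Run dig @8.8.8.8 shop.example.com | answer_ttls a few times, a pause apart, and the middle column shrinks; run it against the authority and it stays put.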

The authority and additional sections appear when the server has something to add. In a referral, the authority section lists the NS records to go ask next, and the additional section helpfully includes their addresses; this is what +trace output is mostly made of. In a negative answer, the authority section carries the zone's SOA record, which is not decoration: its last field sets how long resolvers may cache the non-existence of the name. That is negative caching, and it has its own scenario below.

Read the SERVER line first. The stats footer tells you who actually answered (SERVER:), how long it took (Query time:), and over which transport. Every claim in the rest of the output is a claim made by that server. 127.0.0.53 means systemd-resolved's local stub answered you, not the network's resolver. A query time of 0–3 msec usually means a nearby cache; tens of milliseconds means somebody actually went and asked. An answer is only as good as the resolver that gave it, and this line names the resolver.
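When comparing many lookups, it helps to stamp each one with who answered. A small sketch that extracts the two footer facts from dig output on stdin; the function name who_answered is made up here.

```shell
# who_answered: read dig output on stdin, print the answering server and
# the query time taken from the stats footer.
who_answered() {
    awk '
        /^;; Query time:/ { time = $4 " " $5 }
        /^;; SERVER:/     { server = $3 }
        END { if (server != "") print server, time }'
}
```

Fed the sample response above, it prints 192.168.1.1#53(192.168.1.1) 2 msec.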

Three production scenarios

"Propagation" is just caches expiring

You moved shop.example.com to a new load balancer and updated the A record an hour ago. Monitoring says some traffic is still arriving at the old IP. The team chat says "DNS is still propagating," which is a description, not a diagnosis. The diagnosis is three queries:

$ dig +short @ns1.dns-host.net shop.example.com
203.0.113.50                          # the authority: the new IP is published
$ dig +short @8.8.8.8 shop.example.com
198.51.100.7                          # Google's cache: still the old IP
$ dig @8.8.8.8 shop.example.com | grep -A1 'ANSWER'
;; ANSWER SECTION:
shop.example.com.    1142    IN    A    198.51.100.7

Now you know everything. The authority is serving the new address, so the change took; whatever the DNS console claimed, the zone is correct. Google's resolver is still holding the old record and will hold it for another 1142 seconds, because the old record carried a long TTL, evidently more than an hour, when Google cached it. There is no force-refresh to send and no support ticket to file. The stale answers will be gone, everywhere, within one old-TTL of the change. The lesson lands earlier in the process: before a planned migration, lower the record's TTL to 60 seconds, wait out the old TTL so every cache has picked up the short one, make the change, then restore the long TTL. Caches drain in a minute instead of an hour or more, and "propagation" stops being weather.

One sharper edge: if the query had returned NXDOMAIN rather than a stale address — say someone fat-fingered the record name, queries failed for ten minutes, and then the fix went in — the non-existence itself gets cached, for the duration set in the SOA record's last field. Users keep failing after the fix, which reads as madness until you remember negative caching exists. dig shows you the countdown on that too, in the authority section of the negative answer.
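The negative-cache countdown can be read off the SOA line directly. Under RFC 2308, resolvers cache the non-existence for the smaller of the SOA record's own TTL and its last (minimum) field, so both numbers are worth printing. A sketch, with an invented function name (negative_ttl), reading dig output on stdin:

```shell
# negative_ttl: read dig output on stdin, find the SOA record in the
# AUTHORITY SECTION, and print its TTL column and its last (minimum) field.
negative_ttl() {
    awk '
        /^;; AUTHORITY SECTION:/ { in_auth = 1; next }
        /^$/ || /^;;/            { in_auth = 0 }
        in_auth && $4 == "SOA"   { print "soa_ttl=" $2, "minimum=" $NF }'
}
```

Against a caching resolver, soa_ttl is the remaining time on the cached NXDOMAIN; against the authority, it is the full value every cache starts from.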

A CNAME chain gone wrong

The website is down, but only sort of. www.example.com fails to resolve, yet the DNS console shows the record sitting right there. The record is a CNAME, and the console only shows your half of the story:

$ dig www.example.com
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4480
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; ANSWER SECTION:
www.example.com.   300   IN   CNAME   web-prod-7.old-cdn-provider.net.

;; AUTHORITY SECTION:
old-cdn-provider.net.  900  IN  SOA  ns1.old-cdn-provider.net. ...

Look at that header and answer together: NXDOMAIN, and yet ANSWER: 1. That combination is the signature of a broken alias chain. The resolver found your CNAME just fine, followed it to web-prod-7.old-cdn-provider.net, and that name does not exist; the CDN deprovisioned the endpoint when the contract lapsed, and your record now points into the void. The overall status reflects the end of the chain, not the start, which is why the console looked healthy and the site did not. Resolve each hop yourself (dig web-prod-7.old-cdn-provider.net) when the chain has several links, and remember that every link is a separate record with a separate TTL, owned by a separate party, cached on a separate schedule. Long CNAME chains are a way of distributing your availability across organisations that have never heard of you.
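The signature is mechanical enough to check for in a script. A minimal sketch, assuming full dig output on stdin; the helper name dangling_cname is hypothetical.

```shell
# dangling_cname: read dig output on stdin. If the status is NXDOMAIN and
# the answer section still contains a CNAME, print the last alias target,
# which is the name that does not exist. Exit status 0 only in that case.
dangling_cname() {
    awk '
        /->>HEADER<<-/ && /status: NXDOMAIN/ { nx = 1 }
        /^;; ANSWER SECTION:/                { in_answer = 1; next }
        /^$/ || /^;;/                        { in_answer = 0 }
        in_answer && $4 == "CNAME"           { target = $5 }
        END { if (nx && target != "") { print target; exit 0 } exit 1 }'
}
```

Fed the response above, it prints web-prod-7.old-cdn-provider.net. and succeeds; for a healthy chain or a plain NXDOMAIN it prints nothing and fails.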

The intermittent five-second delay

Every so often, a request that normally takes 40 ms takes 5.04 seconds. Not always; perhaps a third of the time, which is the worst kind of often. A delay of almost exactly five seconds, intermittent, is a DNS smell so distinctive you should be able to name it from across the room: one of the resolvers in resolv.conf is dead, and five seconds is the default per-nameserver timeout the C library waits before trying the next one.

$ cat /etc/resolv.conf
nameserver 10.0.0.2
nameserver 10.0.0.3
$ dig @10.0.0.2 api.example.com
;; communications error to 10.0.0.2#53: timed out
;; communications error to 10.0.0.2#53: timed out
;; communications error to 10.0.0.2#53: timed out
;; no servers could be reached
$ dig @10.0.0.3 api.example.com +short
203.0.113.50                          # the second resolver is fine

The mechanism: the stub resolver tries nameservers in listed order. When the first one is down, every fresh lookup burns the full timeout against the corpse before failing over to the healthy second entry, which answers instantly. Lookups that hit a warm local cache skip the dance entirely, which is where the intermittency comes from; only cache misses pay the toll. The fix is whatever revives or removes the dead resolver, plus, if you control the image, options timeout:1 attempts:2 or options rotate in resolv.conf to shrink the blast radius next time. dig's contribution is the proof: it isolates each nameserver and tests it alone, something the application's resolver will never do for you. And note what the application reported during all this: nothing. The lookup eventually succeeded. Slow DNS hides inside "the service is slow" with no error to its name, which is why p99 latency mysteries should meet dig early.
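Isolating each nameserver is a short loop. The sketch below is one way to do it, not a standard tool: the function name probe_nameservers, the default test name example.com, and the one-second budget are all choices made for this example. It relies on dig exiting non-zero when no server replies.

```shell
# probe_nameservers [FILE] [NAME]
# Query every nameserver listed in a resolv.conf-style file on its own,
# with a 1-second timeout and a single try, and report which ones answer.
# FILE defaults to /etc/resolv.conf; NAME defaults to example.com.
probe_nameservers() {
    file=${1:-/etc/resolv.conf}
    name=${2:-example.com}
    awk '$1 == "nameserver" { print $2 }' "$file" |
    while read -r ns; do
        if dig "@$ns" "$name" +time=1 +tries=1 +short >/dev/null 2>&1; then
            echo "$ns ok"
        else
            echo "$ns DEAD"
        fi
    done
}
```

On the machine in this scenario, probe_nameservers /etc/resolv.conf api.example.com would mark 10.0.0.2 as DEAD and 10.0.0.3 as ok in about a second, instead of the fifteen the default dig settings spend on the dead one.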

What dig is actually talking to

The output makes more sense once the machinery behind it is laid out flat. Resolution involves four kinds of party. The stub resolver is the little client inside libc (or systemd-resolved) on every machine; it does no chasing of its own, it just forwards the question to a configured server and waits. The recursive resolver is that server: your VPC's resolver, the home router, 8.8.8.8. It does the real work of walking the hierarchy, and it caches ferociously. The root and TLD servers hold no answers for your name, only referrals: the root knows who runs .com, and the .com servers know who runs example.com. The authoritative servers hold the records themselves. A cold lookup touches all four; a warm one stops at the first cache that has the answer.

Figure: The resolution chain. Plain dig asks the recursive resolver and reports what it says, cache included. dig +trace performs steps 2 through 5 itself and never consults a cache.

Caches sit at more layers than the diagram has room for. The browser keeps one. The OS keeps one (systemd-resolved on most modern distros, listening on 127.0.0.53). The recursive resolver keeps the big one. Some applications and language runtimes keep their own on top, with their own ideas about expiry; the JVM's resolver cache is famous for outliving the records it holds. Each layer honours the TTL independently, which means a record change becomes visible at different moments to clients sitting behind different stacks of caches. When two machines disagree about a name, they are not disagreeing about DNS; they are sitting behind different sets of memories.

This is also why your laptop and a Kubernetes pod can resolve the same short name differently. The pod's /etc/resolv.conf is not yours: it points at the cluster's DNS service and carries search domains like namespace.svc.cluster.local, plus an ndots option that controls when those suffixes get tried. A lookup for db in the pod becomes db.payments.svc.cluster.local and finds a Service; the same lookup on your laptop goes to the public DNS and dies. Same name, different question, because the resolver configuration rewrote it before any server was consulted. resolv.conf itself, and the question section of dig +search output (plain dig does not apply the search list), are where that rewriting becomes visible. The deeper protocol story, message format, the hierarchy, DNSSEC, lives in the networking codex's DNS page; how DNS works tells it end to end as a guide; and if you would rather watch the chain run than read about it, the DNS resolution simulator animates every referral and cache hit in this diagram.
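The rewriting follows a simple rule that can be sketched in a few lines. This models the usual glibc-style behaviour (names with fewer dots than ndots try the search list first, names with at least ndots dots are tried as-is first, and a trailing dot disables the search list); the function name candidates is invented, and the ndots value of 5 in the example is the common Kubernetes default rather than anything stated on this page.

```shell
# candidates NAME NDOTS SEARCH...
# Print, in order, the fully qualified names a glibc-style stub resolver
# would try for NAME, given the ndots value and the search list.
candidates() {
    name=$1
    ndots=$2
    shift 2
    case "$name" in
        *.) echo "$name"; return ;;      # trailing dot: absolute, no search
    esac
    # Count the dots in the name.
    dots=$(printf '%s' "$name" | tr -cd '.' | wc -c | tr -d ' ')
    if [ "$dots" -ge "$ndots" ]; then echo "$name."; fi   # enough dots: as-is first
    for domain in "$@"; do echo "$name.$domain."; done
    if [ "$dots" -lt "$ndots" ]; then echo "$name."; fi   # too few dots: as-is last
}

candidates db 5 payments.svc.cluster.local svc.cluster.local cluster.local
```

The call prints db.payments.svc.cluster.local. first and the bare db. last, which is the order of questions the pod actually asks. Each candidate can then be handed to dig one at a time.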

Pitfalls

dig does not see what your application sees. This is the big one. Applications resolve names through getaddrinfo(), which obeys /etc/nsswitch.conf: typically "check /etc/hosts first, then DNS," with mDNS or other plugins sometimes in between. dig ignores every bit of that and goes straight to a DNS server. So when someone left a stale override in /etc/hosts, the app faithfully connects to the wrong address while dig swears the DNS is perfect, because it is. The reverse happens too: dig resolves a name that the app cannot, because the app's path goes through a misconfigured systemd-resolved while dig queried the network resolver directly. The arbiter is getent hosts shop.example.com, which resolves through the same libc path the application uses. When getent and dig disagree, the difference between their two paths is the bug, and the diagram below is the map of where to look.
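The getent-versus-dig comparison can be made a one-liner to keep around. A rough sketch, with a made-up function name (libc_vs_dns); it checks only whether the first address getent returns appears among dig's A and AAAA answers, and it assumes both tools print addresses in the same textual form.

```shell
# libc_vs_dns NAME
# Compare the address the libc path returns (getent hosts) with the
# addresses raw DNS returns (dig). Prints "agree" or "DISAGREE".
libc_vs_dns() {
    name=$1
    libc_addr=$(getent hosts "$name" | awk '{ print $1; exit }')
    dns_addrs=$(dig +short "$name" A; dig +short "$name" AAAA)
    if [ -n "$libc_addr" ] && printf '%s\n' "$dns_addrs" | grep -qxF "$libc_addr"; then
        echo "agree: $libc_addr"
    else
        echo "DISAGREE: libc=${libc_addr:-none} dns=$(printf '%s' "$dns_addrs" | tr '\n' ',')"
    fi
}
```

A DISAGREE line with a libc address that DNS has never heard of is the /etc/hosts override; a DISAGREE with libc=none points at the local resolution path instead.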

Figure: Where the paths diverge. The application's lookup can be answered by /etc/hosts or a local stub cache before a single DNS packet exists; dig skips straight to the wire.

The TTL you see is local truth, not global truth. A cache shows you its own remaining countdown, which says nothing about what other caches hold; the record can be fresh in Frankfurt and stale in Singapore at the same instant. And resolvers are not contractually bound to your TTL. Some clamp very low TTLs up to a floor to protect themselves, some cap very long ones, a few misbehaving ones hold records past expiry, and application-level caches sit above all of this with their own clocks. Treat the TTL as a strong default, not a guarantee, and treat "dig @the-authority" as the only answer that is not somebody's memory.

+trace answers a different question than you usually have. Because it walks from the root itself, +trace bypasses every cache on purpose. It shows the delegation as the world's authorities currently publish it, which is exactly right for "did the NS change at the registrar take" and exactly wrong for "what are my users seeing right now," since users sit behind recursive caches that +trace never consults. The pairing to remember: +trace for delegation truth, @resolver for cache truth. You frequently need both, and they frequently disagree, and the disagreement is the finding.

UDP, truncation, and the occasional lie of omission. DNS prefers UDP, and answers too large for the negotiated packet size come back truncated with the tc flag set, at which point a correct client retries over TCP. dig does the retry automatically and tells you (;; Truncated, retrying in TCP mode), but some networks block TCP/53 outright, producing answers that work for small records and fail for large ones, a failure mode that looks supernatural until you spot the tc. dig +tcp forces the issue and turns a suspicion into a one-line test.
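The one-line test extends naturally to a two-line one: ask once over UDP with the automatic retry switched off, then once over TCP. A sketch under stated assumptions: the function name tcp53_check and the two-second budget are invented for this example; +notcp, +ignore, and +tcp are dig's own options.

```shell
# tcp53_check SERVER NAME [TYPE]
# Ask the same question once over UDP with truncation left unretried
# (+notcp +ignore) and once over TCP (+tcp), and report both outcomes.
tcp53_check() {
    server=$1 name=$2 type=${3:-A}
    udp=$(dig "@$server" "$name" "$type" +notcp +ignore +time=2 +tries=1 2>&1)
    if printf '%s\n' "$udp" | grep -q '^;; flags:[^;]* tc'; then
        echo "udp: truncated (tc set)"
    else
        echo "udp: not truncated"
    fi
    if dig "@$server" "$name" "$type" +tcp +time=2 +tries=1 >/dev/null 2>&1; then
        echo "tcp: reachable"
    else
        echo "tcp: FAILED"
    fi
}
```

The combination "udp: truncated" plus "tcp: FAILED" is the blocked-TCP/53 case: small answers work, large ones never arrive.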

A drill you can run right now

Everything below is read-only: queries against public names, nothing changed anywhere. Ten minutes, and the three ideas this page keeps circling — ask a specific resolver, read the SERVER line, watch the TTL — become things you have personally observed.

Step 1 — the baseline and the full read. Pick any site and run dig wikipedia.org, with no options at all. Read it top to bottom against the section decoder above: the status, each flag (you should see qr rd ra and no aa), the question played back, the answer with its TTL, and the footer. Say out loud which server answered. If the SERVER line shows 127.0.0.53, you now know your machine runs a systemd-resolved stub and everything you just read came from localhost.

Step 2 — same question, different resolver. Run dig @1.1.1.1 wikipedia.org and compare. The SERVER line changes, the query time probably grows from near-zero to a real network round trip, and the TTL is different, because Cloudflare's cache fetched the record at a different moment than yours did. If the addresses themselves differ, you have caught a real propagation gap, or a CDN giving different answers by geography, in the wild on a Tuesday.

Step 3 - walk the chain yourself. Run dig +trace wikipedia.org and watch the referrals go by: a root server naming the .org servers, a .org server naming wikipedia's nameservers, and finally the answer, this time from the authority itself. +trace prints only the records and a "Received ... from" line for each hop, not the header flags; to see aa set, put the question directly to the last server named, with @. That is the entire global DNS hierarchy, traversed in front of you in under a second.

Step 4 — watch a cache count down. Ask the same resolver twice, a pause apart:

$ dig wikipedia.org | grep -A1 'ANSWER'
;; ANSWER SECTION:
wikipedia.org.    246    IN    A    185.15.59.224
$ sleep 30; dig wikipedia.org | grep -A1 'ANSWER'
;; ANSWER SECTION:
wikipedia.org.    216    IN    A    185.15.59.224
$ dig @1.1.1.1 wikipedia.org +short
185.15.59.224

Thirty seconds of wall clock, thirty fewer seconds of TTL. You are watching the resolver's cache age in real time. Keep querying and the number walks down to zero, then jumps back to the zone's full TTL as the cache re-fetches from upstream. That countdown is the entire mechanism behind every "propagation" delay you will ever debug, observed directly. As a final flourish, run getent hosts wikipedia.org and confirm the libc path agrees with dig on this machine; the day those two disagree, you will know exactly what kind of bug you have and which diagram to pull up.

If you remember one line. dig @resolver name type is the whole tool: vary the resolver to find out who is holding what, read the status and the aa flag to learn how much to trust it, and read the TTL to learn how long the situation will last.
