Reading List
The most recent articles from a list of feeds I subscribe to.
Reasons for servers to support IPv6
I’ve been having a hard time understanding IPv6. On one hand, the basics initially seem pretty straightforward (there aren’t enough IPv4 addresses for all the devices on the internet, so people invented IPv6! There are enough IPv6 addresses for everyone!)
But when I try to actually understand it, I run into a lot of questions. One
question is: twitter.com does not support IPv6. Presumably it can’t be causing
them THAT many issues to not support it. So why do websites support IPv6?
I asked people on Twitter why their servers support IPv6 and I got a lot of great answers, which I’ll summarize here. These all come with the disclaimer that I have basically 0 experience with IPv6 so I can’t evaluate these reasons very well.
First though, I want to explain why it’s possible for twitter.com to not
support IPv6 because I didn’t understand that initially.
how can you tell twitter.com doesn’t support IPv6?
You can tell they don’t support IPv6 because if you look up their AAAA
record (which would contain their IPv6 address), there isn’t one. Some other big
sites like github.com and stripe.com also don’t support IPv6.
$ dig AAAA twitter.com
(empty response)
$ dig AAAA github.com
(empty response)
$ dig AAAA stripe.com
(empty response)
why does twitter.com still work for IPv6 users?
I found this really confusing, because I’ve always heard that lots of internet users are forced to use IPv6 because we’ve run out of IPv4 addresses. But if that’s true, how could twitter.com continue to work for those people without IPv6 support? Here’s what I learned from the Twitter thread yesterday.
There are two kinds of internet service providers (ISPs):
- ISPs who own enough IPv4 addresses for all of their customers
- ISPs who don’t
My ISP is in category 1 – my computer gets its own IPv4 address, and actually my ISP doesn’t even support IPv6 at all.
But lots of ISPs (especially outside of North America) are in category 2: they don’t have enough IPv4 addresses for all their customers. Those ISPs handle the problem by:
- giving all of their customers a unique IPv6 address, so they can access IPv6 sites directly
- making large groups of their customers share IPv4 addresses. This can either be with CGNAT (“carrier-grade NAT”) or “464XLAT” or maybe something else.
All ISPs need some IPv4 addresses, otherwise it would be impossible for their customers to access IPv4-only sites like twitter.com.
what are the reasons to support IPv6?
Now we’ve explained why it’s possible to not support IPv6. So why support it? There were a lot of reasons.
reason: CGNAT is a bottleneck
The argument that was most compelling to me was: CGNAT (carrier-grade NAT) is a bottleneck and it causes performance issues, and it’s going to continue to get worse over time as access to IPv4 addresses becomes more and more restricted.
Someone also mentioned that because CGNAT is a bottleneck, it’s an attractive DDoS target because you can ruin lots of people’s internet experience just by attacking 1 server.
Servers supporting IPv6 reduces the need for CGNAT (IPv6 users can just connect directly!) which makes the internet work better for everyone.
I thought this argument was interesting because it’s a “public commons” / community argument – it’s less that supporting IPv6 will make your site specifically work better, and more that if almost everyone supports IPv6 then it’ll make the experience of the internet better for everyone, especially in countries where people don’t have easy access to IPv4 addresses.
I don’t actually know how much of an issue this is in practice.
There were lots of more selfish arguments to use IPv6 too though, so let’s get into those.
reason: so IPv6-only servers can access your site
I said before that most IPv6 users still have access to IPv4 through some kind of NAT. But apparently that’s not true for everyone – some people mentioned that they run some servers which only have IPv6 addresses and which aren’t behind any kind of NAT. So those servers are actually totally unable to access IPv4-only sites.
I imagine that those servers aren’t connecting to arbitrary machines that much – maybe they only need to connect to a few hosts with IPv6 support.
But it makes sense to me that a machine should be able to access my site even if it doesn’t have an IPv4 address.
reason: better performance
For users who are using both IPv4 and IPv6 (with a dedicated IPv6 address and a shared IPv4 address), apparently IPv6 is often faster because it doesn’t need to go through an extra translation layer.
So supporting IPv6 can make the site faster for users sometimes.
In practice clients use an algorithm called “Happy Eyeballs”: they try connections over both IPv6 and IPv4 (usually giving IPv6 a small head start) and use whichever connects first.
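If you’re curious, you can roughly compare the two paths yourself with curl. This assumes the site publishes both A and AAAA records and that your own network has working IPv6 – www.google.com here is just an example of a dual-stack site:
$ curl -4 -o /dev/null -s -w 'IPv4 connect: %{time_connect}s\n' https://www.google.com
$ curl -6 -o /dev/null -s -w 'IPv6 connect: %{time_connect}s\n' https://www.google.com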
Some other performance benefits people mentioned:
- maybe sometimes using IPv6 can get you an SEO boost because of the better performance.
- maybe using IPv6 causes you to go through better (faster) network hardware because it’s a newer protocol
reason: resilience against IPv4 internet outages
One person said that they’ve run into issues where there was an internet outage that only affected IPv4 traffic, because of accidental BGP poisoning.
So supporting IPv6 means that their site can still stay partially online during those outages.
reason: to avoid NAT issues with home servers
A few people mentioned that it’s much easier to use IPv6 with home servers – instead of having to do port forwarding through your router, you can just give every server a unique IPv6 address and then access it directly.
Of course, for this to work the client needs to have IPv6 support, but more and more clients these days have IPv6 support too.
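As a rough sketch of what giving a home server its own IPv6 address looks like (the hostname here is made up): find the server’s global IPv6 address, publish it in an AAAA record, and then connect directly from any IPv6-capable client.
$ ip -6 addr show scope global     # on the server: its public IPv6 address(es)
$ ssh homeserver.example.com       # from a client: hypothetical AAAA record pointing at that address, no port forwarding needed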
reason: to learn about IPv6
One person said they work in security and in security it’s very important to understand how internet protocols work (attackers are using internet protocols!). So running an IPv6 server helps them learn how it works.
reason: to push IPv6 forward / IPv4 is “legacy”
A couple of people said that they support IPv6 because it’s the current standard, and so they want to contribute to the success of IPv6 by supporting it.
A lot of people also said that they support IPv6 because they think sites that only support IPv4 are “behind” or “legacy”.
reason: it’s easy
I got a bunch of answers along the lines of “it’s easy, why not”. Obviously adding IPv6 support is not easy in all situations, but a couple of reasons it might be easy in some cases:
- you automatically got an IPv6 address from your hosting company, so all you need to do is add an AAAA record pointing to that address
- your site is behind a CDN that supports IPv6, so you don’t need to do anything extra
reason: safer networking experimentation
Because the address space is so big, if you want to try something out you can just grab an IPv6 subnet, try out some things in it, and then literally never use that subnet again.
reason: to run your own autonomous system (AS)
A few people said they were running their own autonomous system (I talked about what an AS is a bit in this BGP post). IPv4 addresses are too expensive so they bought IPv6 addresses for their AS instead.
reason: security by obscurity
If your server only has a public IPv6 address, attackers can’t easily find it by scanning the whole internet. The IPv6 address space is too big to scan!
Obviously this shouldn’t be your only security measure, but it seems like a nice bonus – any time I run an IPv4 public server I’m always a tiny bit surprised by how it’s constantly being scanned for vulnerabilities (like old versions of WordPress, etc).
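The numbers behind this are fun: the whole IPv4 internet has 2^32 addresses, while a single standard IPv6 /64 subnet has 2^64, so even one subnet is about 4 billion times bigger than everything the scanners cover today.
$ echo $(( 2**32 ))       # how many times bigger a single /64 is than the entire IPv4 address space
4294967296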
very silly reason: you can put easter eggs in your IPv6 address
IPv6 addresses have a lot of extra bits in them that you can do frivolous
things with. For example one of Facebook’s IPv6 addresses is
“2a03:2880:f10e:83:face:b00c:0:25de” (it has face:b00c in it).
there are more reasons than I thought
That’s all I’ve learned about the “why support IPv6?” question so far.
I came away from this conversation more motivated to support IPv6 on my (very small) servers than I had been before. But that’s because I think supporting IPv6 will require very little effort for me. (right now I’m using a CDN that supports IPv6 so it comes basically for free)
I know very little about IPv6 still but my impression is that IPv6 support often isn’t zero-effort and actually can be a lot of work. For example, I have no idea how much work it would actually be for Twitter to add IPv6 support on their edge servers.
supporting IPv6 can also cause problems
A friend who runs a large service told me that their service has tried to add IPv6 support multiple times over the last 7 years, but each time it’s caused them problems. What happened to them was:
- they advertised an AAAA record
- users would get the AAAA record and try to connect to them over IPv6
- some network equipment in the user’s ISP/internal network somewhere was broken, so the IPv6 connection failed
- as a result those users were unable to use their service
I thought it was interesting and surprising that supporting IPv6 can actually in some cases make things worse for people on dual stack (IPv4 + IPv6) networks.
some more IPv6 questions
Here are some more IPv6 questions I have that maybe I’ll explore later:
- what are the disadvantages to supporting IPv6? what can go wrong? (here’s one example of an IPv6 problem someone linked me to)
- what are the incentives for ISPs that own enough IPv4 addresses for their customers to support IPv6? (another way of asking: is it likely that my ISP will move to supporting IPv6 in the next few years? or are they just not incentivized to do it so it’s unlikely?)
- DigitalOcean seems to only support IPv4 floating IPs, not IPv6 floating IPs. Why not? Shouldn’t it be easier to give out IPv6 floating IPs since there are more of them?
- when I try to ping an IPv6 address (like example.com’s IP 2606:2800:220:1:248:1893:25c8:1946) I get the error ping: connect: Network is unreachable. Why? (answer: it’s because my ISP doesn’t support IPv6 so my computer doesn’t have a public IPv6 address)
This IPv4 vs IPv6 article from Tailscale looks interesting and answers some of these questions.
Hosting my static sites with nginx
Hello! Recently I’ve been thinking about putting my static sites on servers that I run myself instead of using managed services like Netlify or GitHub Pages.
Originally I thought that running my own servers would require a lot of maintenance and be a huge pain, but I was chatting with Wesley about what kind of maintenance their servers require, and they convinced me that it might not be that bad.
So I decided to try out moving all my static sites to a $5/month server to see what it was like.
Everything in here is pretty standard but I wanted to write down what I did anyway because there are a surprising number of decisions and I like to see what choices other people make.
the constraint: only static sites
To keep things simple, I decided that this server would only run nginx and
only serve static sites. I have about 10 static sites right now, mostly projects for wizard zines.
I decided to use a $5/month DigitalOcean droplet, which should very easily be able to handle my existing traffic (about 3 requests per second and 100GB of bandwidth per month). Right now it’s using about 1% of its CPU. I picked DigitalOcean because it was what I’ve used before.
Also all the sites were already behind a CDN so they’re still behind the same CDN.
step 1: get a clean Git repo for each build
This was the most interesting problem so let’s talk about it first!
Building the static sites might seem pretty easy – each one of them already has a working build script.
But I have pretty bad hygiene around files on my laptop – often I have a bunch of uncommitted files that I don’t want to go onto the live site. So I wanted to start every build with a clean Git repo. I also wanted this to be fast – I’m impatient so I wanted to be able to build and deploy most of my sites in less than 10 seconds.
I handled this by hacking together a tiny build system called tinybuild. It’s basically a 4-line bash script, but with some extra command line arguments and error checking. Here are the 4 lines of bash:
docker build - -t tinybuild < Dockerfile
CONTAINER_ID=$(docker run -v "$PWD":/src -v "$PWD/deploy":/artifact -d -t tinybuild /bin/bash)
docker exec $CONTAINER_ID bash -c "git clone /src /build && cd /build && bash /src/scripts/build.sh"
docker exec $CONTAINER_ID bash -c "mv /build/public/* /artifact"
These 4 lines:
- Build a Docker image from a Dockerfile with all the dependencies for that build
- Clone my repo into /build in the container, so that I always start with a clean Git repo
- Run the build script (/src/scripts/build.sh)
- Copy the build artifacts into ./deploy in the local directory
Then once I have ./deploy, I can rsync the result onto the server.
It’s fast because:
- the docker build - means I don’t send any state from the repository to the Docker daemon. This matters because one of my repos is 1GB (it has a lot of PDFs in it) and sending all that to the Docker daemon takes forever
- the git clone is from the local filesystem and I have an SSD, so it’s fast even for a 1GB repo
- most of the build scripts just run hugo or cat, so they’re fast. The npm build scripts take maybe 30 seconds.
apparently local git clones make hard links
A tiny interesting fact: I tried to do git clone --depth 1 to speed up my git
clone, but git gave me this warning:
warning: --depth is ignored in local clones; use file:// instead.
I think what’s going on here is that git makes hard links of all the objects to
make a local clone (which is a lot faster than copying). So I guess with the
hard links approach --depth 1 doesn’t make sense for some reason? And
file:// forces git to copy all objects instead, which is actually slower.
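You can check the hard-linking yourself by cloning a local repo and looking at an object file’s link count (./myrepo here is a placeholder for any repo on the same filesystem):
$ git clone ./myrepo ./myrepo-clone
$ find ./myrepo-clone/.git/objects -type f | head -1 | xargs ls -l
A link count greater than 1 (the second column of ls -l’s output) means the object file is shared with the original repo instead of copied.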
bonus: now my builds are faster than they used to be!
One nice thing about this is that my build/deploy time is less than it was on
Netlify. For jvns.ca it’s about 7 seconds to build and deploy the site
instead of about a minute previously.
running the builds on my laptop seems nice
I’m the only person who develops all of my sites, so doing all the builds in a Docker container on my computer seems to make sense. My computer is pretty fast and all the files are already right there! No giant downloads! And doing it in a Docker container keeps the build isolated.
example build scripts
Here are the build scripts for this blog (jvns.ca).
Dockerfile
FROM ubuntu:20.04
RUN apt-get update && apt-get install -y git
RUN apt-get install -y wget python2
RUN wget https://github.com/gohugoio/hugo/releases/download/v0.40.1/hugo_0.40.1_Linux-64bit.tar.gz
RUN wget https://github.com/sass/dart-sass/releases/download/1.49.0/dart-sass-1.49.0-linux-x64.tar.gz
RUN tar -xf dart-sass-1.49.0-linux-x64.tar.gz
RUN tar -xf hugo_0.40.1_Linux-64bit.tar.gz
RUN mv hugo /usr/bin/hugo
RUN mv dart-sass/sass /usr/bin/sass
build-docker.sh:
set -eu
scripts/parse_titles.py
sass sass/:static/stylesheets/
hugo
deploy.sh:
set -eu
tinybuild -s scripts/build-docker.sh \
-l "$PWD/deploy" \
-c /build/public
rsync-showdiff ./deploy/ root@staticsites:/var/www/jvns.ca
rm -rf ./deploy
step 2: get rsync to just show me which files it updated
When I started using rsync to sync the files, it would list every single file instead of just files that had changed. I think this was because I was generating new files for every build, so the timestamps were always newer than the files on the server.
I did a bunch of Googling and figured out this incantation to get rsync to just show me files that were updated:
rsync -avc --out-format='%n' "$@" | grep --line-buffered -v '/$'
I put that in a script called rsync-showdiff so I could reuse it. (The -c flag makes rsync compare file checksums instead of timestamps, which is why it stops listing every freshly-built file.) There might be a better way, but this seems to work.
step 3: configuration management
All I needed to do to set up the server was:
- install nginx
- create directories in /var/www for each site, like /var/www/jvns.ca
- create an nginx configuration for each site, like /etc/nginx/sites-enabled/jvns.ca.conf
- deploy the files (with my deploy script above)
I wanted to use some kind of configuration management to do this because that’s how I’m used to managing servers. I’ve used Puppet a lot in the past at work, but I don’t really like it. So I decided to use Ansible, even though I’d never used it before, because it seemed simpler than Puppet. Here’s my current Ansible configuration, minus some of the templates it depends on.
I didn’t use any Ansible plugins because I wanted to maximize the probability that I would actually be able to run this thing in 3 years.
The most complicated thing in there is probably the reload nginx handler,
which makes sure that the configuration is still valid after I make an nginx
configuration update.
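In shell terms, what that handler does is roughly this (a sketch, not the actual Ansible task):
nginx -t && systemctl reload nginx   # only reload if the new configuration passes nginx's config check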
step 4: replace a lambda function
I was using one Netlify lambda function to calculate purchasing power parity (“PPP”) for countries that have a weaker currency relative to the US on https://wizardzines.com. Basically it gets your country using IP geolocation and then returns a discount code if you’re in a country that has a discount code. (like 70% off for India, for example). So I needed to replace it.
I handled this by rewriting the (very small) program in Go, copying the
static binary to the server, and adding a proxy_pass for that site.
The program just looks up the country code from the geolocation HTTP header in a hashmap, so it doesn’t seem like it should cause maintenance problems.
a very simple nginx config
I used the same nginx config template for almost all my sites:
server {
listen 80;
listen [::]:80;
root /var/www/{{item.dir}};
index index.html index.htm;
server_name {{item.server}};
location / {
# First attempt to serve request as file, then
# as directory, then fall back to displaying a 404.
try_files $uri $uri/ =404;
}
}
The {{item.dir}} is an Ansible thing.
I also added support for custom 404 pages (error_page /404.html) in the main nginx.conf.
I’ll probably add TLS support with certbot later. My CDN handles TLS to the client; I just need to make the connection between the CDN and the origin server use TLS.
Also I don’t know if there are problems with using such a simple nginx config. Maybe I’ll learn about them!
bonus: I can find 404s more easily
Another nice bonus of this setup is that it’s easier to see what’s happening with my site – I can just look at the nginx logs!
I ran grep 404 /var/log/nginx/access.log to figure out if I’d broken
anything during the migration, and I actually ended up finding a lot of
links that had been broken for many years, but that I’d just never noticed.
Netlify’s analytics has a “Top resources not found” section that shows you the most common 404s, but I don’t think there’s any way to see all 404s.
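With the raw logs it’s easy to build your own “top 404s” view; something like this works if you’re using nginx’s default combined log format:
$ grep ' 404 ' /var/log/nginx/access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head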
a small factor: costs
Part of my motivation for this switch was – I was getting close to the Netlify free tier’s bandwidth limit (100GB/month), and Netlify charges $20/100GB for additional bandwidth. Digital Ocean charges $1/100GB for additional bandwidth (20x less), and my droplet comes with 1TB of bandwidth. So the bandwidth pricing feels a lot more reasonable to me.
we’ll see how it goes!
All my static sites are running on my own server now. I don’t really know what this will be like to maintain, we’ll see how it goes – maybe I’ll like it! maybe I’ll hate it! I definitely like the faster build times and that I can easily look at my nginx logs.
Some ways DNS can break
When I first learned about it, DNS didn’t seem like it should be THAT complicated. Like, there are DNS records, they’re stored on a server, what’s the big deal?
But with DNS, reading about how it works in a textbook doesn’t prepare you for the sheer volume of different ways DNS can break your system in practice. It’s not just caching problems!
So I asked people on Twitter for examples of DNS problems they’ve run into, especially DNS problems that didn’t initially appear to be DNS problems (the popular “it’s always DNS” meme).
I’m not going to discuss how to solve or avoid any of these problems in this post, but I’ve linked to webpages discussing the problem where I could find them.
problem: slow network requests
Your network requests are a little bit slower than expected, and it’s actually because your DNS resolver is slow for some reason. This might be because the resolver is under a lot of load, or it has a memory leak, or something else.
I’ve run into this before with my router’s DNS forwarder – all of my DNS requests were slow, and I restarted my router and that fixed the problem.
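One quick way to check whether your resolver is the slow part is to compare its query time against a public resolver directly (example.com is just a placeholder domain):
$ dig example.com | grep 'Query time'                # your configured resolver
$ dig @1.1.1.1 example.com | grep 'Query time'       # a public resolver, for comparison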
problem: DNS timeouts
A couple of people mentioned network requests that were taking 2+ seconds or 30 seconds because of DNS queries that were timing out. This is sort of the same as “slow requests”, but it’s worse because queries can take several seconds to time out.
Sophie Haskins has a great blog post Misadventures with Kube DNS about DNS timeouts with Kubernetes.
problem: ndots
A few people mentioned a specific issue where Kubernetes sets ndots:5 in its /etc/resolv.conf.
Here’s an example /etc/resolv.conf from a Kubernetes pod (see the post “/etc/resolv.conf ndots:5 option and why it may negatively affect your application performances” for more on this):
nameserver 100.64.0.10
search namespace.svc.cluster.local svc.cluster.local cluster.local eu-west-1.compute.internal
options ndots:5
My understanding is that if this is your /etc/resolv.conf and you look up
google.com, your application will call the C getaddrinfo function, and
getaddrinfo will:
- look up google.com.namespace.svc.cluster.local.
- look up google.com.svc.cluster.local.
- look up google.com.cluster.local.
- look up google.com.eu-west-1.compute.internal.
- look up google.com.
Basically it checks if google.com is actually a subdomain of everything on the search line.
So every time you make a DNS query, you need to wait for 4 DNS queries to fail before you can get to the actual real DNS query that succeeds.
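If you want to watch this happening from inside an affected pod, one way (assuming tcpdump is available there and you can run it as root) is to capture port 53 while doing a lookup; a trailing dot marks the name as fully qualified, which skips the search list:
$ tcpdump -ni any port 53 &
$ getent hosts google.com      # goes through getaddrinfo, so you'll see the search-domain queries go out first
$ getent hosts google.com.     # fully qualified, so only one query goes out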
problem: it’s hard to tell what DNS resolver(s) your system is using
This isn’t a bug by itself, but when you run into a problem with DNS, often it’s related in some way to your DNS resolver. I don’t know of any foolproof way to tell what DNS resolver is being used.
A few things I know:
- on Linux, I think that most things use /etc/resolv.conf to choose a DNS resolver. There are definitely exceptions though, for example your browser might ignore /etc/resolv.conf and use a different DNS-over-HTTPS service instead.
- if you’re using UDP DNS, you can use sudo tcpdump port 53 to see where DNS requests are being sent. This doesn’t work if you’re using DNS over HTTPS or DNS over TLS though.
I also vaguely remember it being even more confusing on MacOS than on Linux, though I don’t know why.
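One more place to check on Linux systems that use systemd-resolved: resolvectl will tell you which DNS servers each interface is actually configured to use.
$ resolvectl status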
problem: DNS servers that return NXDOMAIN instead of NOERROR
Here’s a problem that I ran into once, where nginx couldn’t resolve a domain.
- I set up nginx to use a specific DNS server to resolve DNS queries
- when visiting the domain, nginx made 2 queries: one for an A record, and one for an AAAA record
- the DNS server returned an NXDOMAIN reply for the A query
- nginx decided “ok, that domain doesn’t exist”, and gave up
- the DNS server returned a successful reply for the AAAA query
- nginx ignored the AAAA record because it had already given up
The problem was that the DNS server should have returned NOERROR – that
domain did exist, it was just that there weren’t any A records for it. I
reported the bug, they fixed it, and that fixed the problem.
I’ve implemented this bug myself too, so I understand why it happens – it’s
easy to think “there aren’t any records for this query, I should return an
NXDOMAIN error”.
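You can see the difference in dig’s status line; github.com is a convenient example since (as shown earlier) it has no AAAA records, and the second query uses a subdomain that presumably doesn’t exist:
$ dig AAAA github.com | grep status                     # NOERROR: the name exists, it just has no AAAA records
$ dig A this-does-not-exist.github.com | grep status    # should be NXDOMAIN (assuming that subdomain really doesn't exist and there's no wildcard)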
problem: negative DNS caching
If you visit a domain before creating a DNS record for it, the absence of the record will be cached. This is very surprising the first time you run into it – I only learned about this last year!
The TTL for the cache entry is the TTL of the domain’s SOA record – for example
for jvns.ca, it’s an hour.
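You can check your domain’s SOA record (and its TTL) with dig:
$ dig +noall +answer SOA jvns.ca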
problem: nginx caching DNS records forever
If you put this in your nginx config:
location / {
proxy_pass https://some.domain.com;
}
then nginx will resolve some.domain.com once on startup and never again. This
is especially dangerous if the IP address for some.domain.com changes
infrequently, because it might keep happily working for months and then
suddenly break at 2am one day.
There are pretty well-known ways to fix this and this post isn’t about nginx so I won’t get into it, but it’s surprising the first time you run into it.
Here’s a blog post with a story of how this happened to someone with an AWS load balancer.
problem: Java caching DNS records forever
Same thing, but for Java: Apparently depending on how you configure Java, “the JVM default TTL [might be] set so that it will never refresh DNS entries until the JVM is restarted.”
I haven’t run into this myself but I asked a friend about it who writes more Java than me and they told me that it’s happened to them.
Of course, literally any software could have this problem of caching DNS records forever, but the main cases I’ve heard of in practice are nginx and Java.
problem: that entry in /etc/hosts you forgot about
Another variant on caching issues: entries in /etc/hosts that override your
usual DNS settings!
This is extra confusing because dig ignores /etc/hosts, so everything SEEMS
like it should be fine (“dig whatever.com is working!”).
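One quick way to catch this: getent hosts goes through the same lookup path as most applications (including /etc/hosts), while dig talks straight to DNS, so comparing the two makes a forgotten /etc/hosts entry obvious:
$ getent hosts whatever.com      # respects /etc/hosts
$ dig +short whatever.com        # ignores it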
problem: your email isn’t being sent / is going to spam
The way email is sent and validated is through DNS (MX records, SPF records, DKIM records), so a lot of email problems are DNS problems.
problem: internationalized domain names don’t work
You can register domain names with non-ASCII characters or emoji like https://💩.la.
The way this works with DNS is that 💩.la gets translated into xn--ls8h.la with an encoding called “punycode”.
But even though there’s a clear standard for how they should work with DNS, a lot of software doesn’t handle internationalized domain names well! There’s a fun story about this in Julian Squires’ great talk The emoji that Killed Chrome!!.
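If you want to see the punycode translation for yourself, Python’s built-in RFC 3492 codec can do it from the command line; this should print the xn--ls8h.la form mentioned above:
$ python3 -c 'print("xn--" + "💩".encode("punycode").decode() + ".la")'
xn--ls8h.la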
problem: TCP DNS is blocked by a firewall
A couple of people mentioned that some firewalls allow UDP port 53 but not TCP port 53. But large DNS responses need to use TCP port 53, so this can cause weird intermittent problems that are hard to debug.
problem: musl doesn’t support TCP DNS
A lot of applications use libc’s getaddrinfo to make DNS queries. musl is an
alternative to glibc that’s used in Alpine Docker containers, and it doesn’t
support TCP DNS. This can cause problems if you make DNS queries where the
response would be too big to fit inside a regular DNS UDP packet (512 bytes).
I’m still a bit fuzzy on this so I might have it wrong, but my understanding of how this can break is:
- musl’s getaddrinfo makes a DNS query
- the DNS server notices that the response is too big to fit in a single DNS response packet
- the DNS server returns an empty truncated response, expecting that the client will retry by making a TCP DNS query
- musl does not support TCP so it does not retry
A blog post about this: DNS resolution issue in Alpine Linux
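If you’re curious what the truncation step looks like, you can ask dig to advertise a small UDP buffer and not retry over TCP. You need a query whose answer is bigger than 512 bytes – google.com’s TXT records are just a guess at one:
$ dig +ignore +bufsize=512 TXT google.com | grep 'flags:'    # a "tc" flag in the header means the response was truncated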
problem: round robin DNS doesn’t work with getaddrinfo
One way you could approach load balancing is to use “round robin DNS”. The idea
is that every time you make a DNS query, you get a different IP address.
Apparently this works if you use gethostbyname to make DNS queries, but it
does not work if you use getaddrinfo because getaddrinfo sorts the IP
responses it receives.
So you could run into an upsetting problem if you switch from gethostbyname to getaddrinfo behind the scenes without realising that this will break your DNS load balancing.
This is especially insidious because you might not realize that you’re
switching from gethostbyname to getaddrinfo at all – if you’re not writing a
C program, those function calls are hidden inside some library. So it could be
part of a seemingly innocuous upgrade.
Here are a couple of pages discussing this:
problem: a race condition when starting a service
A problem someone mentioned with Kubernetes DNS: they had 2 containers which started simultaneously and immediately tried to resolve each other. But the DNS lookup failed because the Kubernetes DNS change hadn’t happened yet, and then the failure was cached so it kept failing.
that’s all!
I’ve definitely missed some important DNS problems here, so I’d love to hear what I’ve missed. I’d also love links to blog posts that write up examples of these problems – I think it’s really useful to see how the problem specifically manifests in practice and how people debugged it.
How to find a domain's authoritative nameservers
Here’s a very quick “how to” post on how to find your domain’s authoritative nameserver.
I’m writing this because if you made a DNS update and it didn’t work, there are 2 options:
- Your authoritative nameserver doesn’t have the correct record
- Your authoritative nameserver does have the correct record, but an old record is cached and you need to wait for the cache to expire
To be able to tell which one is happening (do you need to make a change, or do you just need to wait?), you need to be able to find your domain’s authoritative nameserver and query it to see what records it has.
But when I looked up “how to find a domain’s authoritative nameserver” to see what advice was out there, I found a lot of different methods being mentioned, some of which can give you the wrong answer.
So let’s walk through a way to find your domain’s authoritative nameservers that’s guaranteed to always give you the correct answer. I’ll also explain why some of the other methods aren’t always accurate.
first, an easy but less accurate way
If you definitely haven’t updated your authoritative DNS server in the last
week or so, a very easy way to find it is to run dig +short ns DOMAIN
$ dig +short ns jvns.ca
art.ns.cloudflare.com.
roxy.ns.cloudflare.com.
In this case, we get the correct answer. Great!
But if you have updated your authoritative DNS server in the last few days (maybe because you just registered the domain!), that can give you an inaccurate answer. So here’s the slightly more complicated way that’s guaranteed to always give you the correct answer.
step 1: query a root nameserver
We’re going to look up the authoritative nameserver for jvns.ca in this example.
No matter what domain we’re looking up, we need to start with the root
nameservers. h.root-servers.net is one of the 13 DNS root nameservers, and dig @h.root-servers.net means “send the query to h.root-servers.net”.
$ dig @h.root-servers.net jvns.ca
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42165
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 9
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;jvns.ca. IN A
;; AUTHORITY SECTION: <------------ this is the section we're interested in
ca. 172800 IN NS c.ca-servers.ca. <------- we'll use this record
ca. 172800 IN NS j.ca-servers.ca.
ca. 172800 IN NS x.ca-servers.ca.
ca. 172800 IN NS any.ca-servers.ca.
;; ADDITIONAL SECTION:
c.ca-servers.ca. 172800 IN A 185.159.196.2
j.ca-servers.ca. 172800 IN A 198.182.167.1
x.ca-servers.ca. 172800 IN A 199.253.250.68
any.ca-servers.ca. 172800 IN A 199.4.144.2
c.ca-servers.ca. 172800 IN AAAA 2620:10a:8053::2
j.ca-servers.ca. 172800 IN AAAA 2001:500:83::1
x.ca-servers.ca. 172800 IN AAAA 2620:10a:80ba::68
any.ca-servers.ca. 172800 IN AAAA 2001:500:a7::2
;; Query time: 96 msec
;; SERVER: 198.97.190.53#53(198.97.190.53)
;; WHEN: Tue Jan 11 08:30:57 EST 2022
;; MSG SIZE rcvd: 289
The answer we’re looking for is this line in the “AUTHORITY SECTION”:
ca. 172800 IN NS c.ca-servers.ca.
It doesn’t matter which line in this section you pick, you can use any of them. I just picked the first one.
This tells us the server we need to talk to in step 2: c.ca-servers.ca.
step 2: query the .ca nameservers
Now we run dig @c.ca-servers.ca jvns.ca
$ dig @c.ca-servers.ca jvns.ca
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24920
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;jvns.ca. IN A
;; AUTHORITY SECTION: <------------ this is the section we're interested in
jvns.ca. 86400 IN NS art.ns.cloudflare.com. <---- we'll use this record
jvns.ca. 86400 IN NS roxy.ns.cloudflare.com.
;; Query time: 26 msec
;; SERVER: 185.159.196.2#53(185.159.196.2)
;; WHEN: Tue Jan 11 08:32:44 EST 2022
;; MSG SIZE rcvd: 90
Same as last time: the answer we’re looking for is this line in the “AUTHORITY SECTION”:
jvns.ca. 86400 IN NS art.ns.cloudflare.com.
Again, it doesn’t matter which line in this section you pick, you can use any of them. I just picked the first one.
success! we know the authoritative nameserver!
The authoritative nameserver for jvns.ca is art.ns.cloudflare.com. Now you
can query art.ns.cloudflare.com directly to see what DNS records it has
for jvns.ca.
$ dig @art.ns.cloudflare.com. jvns.ca
jvns.ca. 292 IN A 172.64.80.1
Nice, it worked.
this is exactly what’s happening behind the scenes when you make a DNS query
The reason I like this method is that it mimics what’s happening behind the
scenes when you make a DNS query. When Google’s DNS resolver 8.8.8.8 looks
up jvns.ca, the server it queries to get jvns.ca’s authoritative nameserver is
c.ca-servers.ca (or one of the other options, like j.ca-servers.ca. or x.ca-servers.ca.)
Because this method uses the exact same information source as a real DNS query, you’re guaranteed to get a correct answer every time.
Often in practice I skip step 1 because I remember that the answer for .ca
domains is c.ca-servers.ca, so I can skip straight to step 2.
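If you just want the NS lines from step 2 without the rest of dig’s output, you can tell dig to print only the authority section:
$ dig @c.ca-servers.ca jvns.ca +noall +authority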
this is useful to do when you’re updating your nameservers
When I update my nameservers with my domain registrar, they don’t actually update the authoritative nameserver right away. It takes a while, maybe an hour. So I like to go through these steps to check if my registrar has actually updated my authoritative nameserver yet.
other ways to get a domain’s authoritative nameserver
Here are a few other ways you can get the authoritative nameserver for a domain and why I didn’t recommend them as the main method.
dig +trace jvns.ca
This does the exact same thing so it will always give you the right answer, but the output is a bit confusing to read so I’m a bit more hesitant to recommend it.
dig ns jvns.ca
This will usually give you the right answer, but there are 2 reasons it might be wrong:
- You might get an old cached record
- The NS record you get doesn’t come from the same place as it does with the method described in this post. In this example, instead of getting an NS record from c.ca-servers.ca, dig ns jvns.ca will give you an NS record from art.ns.cloudflare.com. In practice these are usually the exact same thing, but in some weird edge cases they might not be.
dig soa jvns.ca
You can also find nameservers in the SOA record!
$ dig SOA jvns.ca
jvns.ca. 3600 IN SOA art.ns.cloudflare.com. dns.cloudflare.com. 2267173366 10000 2400 604800 3600
^^^^^^^^^^^^^^^^^^^^^
here it is
This will usually give the right answer, but there are 2 reasons it might be wrong, similar to the NS record:
- This response comes from your authoritative nameserver. So if you’re in the middle of updating your nameserver, you might get the wrong answer because your DNS resolver sent the request to the old nameserver.
- Your authoritative nameserver could be returning a SOA record which doesn’t have the correct nameserver for some reason
whois jvns.ca
This will usually give you the right answer, but it might be an old cached version.
Here’s what this looks like on my machine for this example: (it gives us the right answer)
$ whois jvns.ca | grep 'Name Server'
Name Server: art.ns.cloudflare.com
Name Server: roxy.ns.cloudflare.com
that’s all!
I hope this helps some of you debug your DNS issues!
Why might you run your own DNS server?
One of the things that makes DNS difficult to understand is that it’s decentralized. There are thousands (maybe hundreds of thousands? I don’t know!) of authoritative nameservers, and at least 10 million resolvers. And they’re running lots of different software! All these different servers running different software means that there’s a lot of inconsistency in how DNS works, which can cause all kinds of frustrating problems.
But instead of talking about the problems, I’m interested in figuring out – why is it a good thing that DNS is decentralized?
why is it good that DNS is decentralized?
One reason is scalability – the decentralized design of DNS makes it easier to scale and more resilient to failures. I find it really amazing that DNS is still scaling well even though it’s almost 40 years old. This is very important but it’s not what this post is about.
Instead, I want to talk about how the fact that it’s decentralized means that you can have control of how your DNS works. You can add more servers to the giant complicated mess of DNS servers! Servers that you control!
Yesterday I asked on Twitter why you might want to run your own DNS servers, and I got a lot of great answers that I wanted to summarize here.
you can run 2 types of DNS servers
There are 2 main types of DNS servers you can run:
- if you own a domain, you can run an authoritative nameserver for that domain
- if you have a computer (or a company with lots of computers), you can run a resolver that resolves DNS for those computers
DNS isn’t a static database
I’ve seen the “phone book” metaphor for DNS a lot, where domain names are like names and IP addresses are like phone numbers.
This is an okay mental model to start with. But the “phone book” mental model
might make you think that if you make a DNS query for google.com, you’ll
always get the same result. And that’s not true at all!
Which record you get in reply to a DNS query can depend on:
- where you are in the world (maybe you’ll get an IP address of a server that’s physically closer to you!)
- if you’re on a corporate network (where you might be able to resolve internal domain names)
- whether the domain name is considered “bad” by your DNS resolver (it might be blocked!)
- the previous DNS query (maybe the DNS resolver is doing DNS-based load balancing to give you a different IP address every time)
- whether you’re using an airport wifi captive portal (airport wifi will resolve DNS records differently before you log in, it’ll send you a special IP to redirect you)
- literally anything
A lot of the reasons you might want to control your own server are related to the fact that DNS isn’t a static database – there are a lot of choices you might want to make about how DNS queries are handled (either for your domain or for your organization).
reasons to run an authoritative nameserver
These reasons aren’t in any particular order.
For some of these you don’t necessarily have to run your own authoritative nameserver, you can just choose an authoritative nameserver service that has the features you want.
To be clear: there are lots of reasons not to run your own authoritative nameserver – I don’t run my own, and I’m not trying to convince you that you should. It takes time to maintain, your service might not be as reliable, etc.
reason: security
[There’s a] risk of an attacker gaining DNS change access through your vendor’s customer support people, who only want to be helpful. Or getting locked out from your DNS (perhaps because of the lack of that). In-house may be easier to audit and verify the contents.
reason: you like running bind/nsd
One reason several people mentioned was “I’m used to writing zone files and
running bind or nsd, it’s easier for me to just do that”.
If you like the interface of bind/nsd but don’t want to operate your own server, a couple of people mentioned that you can also get the advantages of bind by running a “hidden primary” server which stores the records, while all of the actual DNS queries are served from a “secondary” server. Here are some pages I found about configuring secondary DNS from NS1, Cloudflare, and Dyn as examples.
I don’t really know what the best authoritative DNS server to run is. I think I’ve only used nsd at work.
reason: you can use new record types
Some newer DNS record types aren’t supported by all DNS services, but if you run your own you can support any record types you want.
reason: user interface
You might not like the user interface (or API, or lack of API) of the DNS service you’re using. This is pretty related to the “you like running BIND” reason – maybe you like the zone file interface!
reason: you can fix problems yourself
There are some obvious pros and cons to being able to fix problems yourself when they arise (pro: you can fix the problem, con: you have to fix the problem).
reason: do something weird and custom
You can write a DNS server that does anything you want, it doesn’t have to just return a static set of records.
A few examples:
- Replit has a blog post about why they wrote their own authoritative DNS server to handle routing
- nip.io maps 10.0.0.1.nip.io to 10.0.0.1
- I wrote a custom DNS server for mess with dns
reason: to save money
Authoritative nameserver services seem to generally charge per million DNS queries. As an example, at a quick glance it looks like Route 53 charges about $0.50 per million queries and NS1 charges about $8 per million queries.
I don’t have the best sense for how many queries a large website’s authoritative DNS server can expect to actually need to resolve (what kinds of sites get 1 billion DNS queries to their authoritative DNS server? Probably a lot, but I don’t have experience with that.). But a few people in the replies mentioned cost as a reason.
reason: you can change your registrar
If you use a separate authoritative nameserver for your domain instead of your registrar’s nameserver, then when you move to a different registrar all you have to do to get your DNS back up is to set your authoritative DNS server to the right value. You don’t need to migrate all your DNS records, which is a huge pain!
You don’t need to run your own nameserver to do this.
reason: geo DNS
You might want to return different IP addresses for your domain depending on where the client is, to give them a server that’s close to them.
This is a service lots of authoritative nameserver services offer, you don’t need to write your own to do this.
reason: avoid denial of service attacks targeted at someone else
Many authoritative DNS servers are shared. This means that if someone attacks
the DNS server for google.com or something and you happen to be using the
same authoritative DNS server, you could be affected even though the attack
wasn’t aimed at you. For example, this DDoS attack on Dyn in 2016.
reason: keep all of your configuration in one place
One person mentioned that they like to keep all of their configuration (DNS records, let’s encrypt, nginx, etc) in the same place on one server.
wild reason: use DNS as a VPN
Apparently iodine is an authoritative DNS server that lets you tunnel your traffic over DNS, if you’re on a network that only allows you to contact the outside world over DNS.
reasons to run a resolver
reason: privacy
If someone can see all your DNS lookups, they have a complete list of all the domains that you (or everyone in your organization) are visiting! You might prefer to keep that private.
reason: block malicious sites
If you run your own resolver, you can refuse to resolve DNS queries (by just not returning any results) for domains that you consider “bad”.
A few examples of resolvers that you can run yourself (or just use):
- Pi-Hole blocks advertisers
- Quad9 blocks domains that do malware/phishing/spyware. Cloudflare seems to have a similar service
- I imagine there’s also corporate security software that blocks DNS queries for domains that host malware
- DNS isn’t a static database. It’s very dynamic, and answers often depend in real time on the IP address a query came from, current load on content servers etc. That’s hard to do in real time unless you delegate serving those records to the entity making those decisions.
- DNS delegating control makes access control very simple. Everything under a zone cut is controlled by the person who controls the delegated server, so responsibility for a hostname is implicit in the DNS delegation.
reason: get dynamic proxying in nginx
Here’s a cool story from this tweet:
I wrote a DNS server into an app and then set it as nginx’s resolver so that I could get dynamic backend proxying without needing nginx to run lua. Nginx sends DNS query to app, app queries redis and responds accordingly. It worked pretty great for what I was doing.
reason: avoid malicious resolvers
Some ISPs run DNS resolvers that do bad things like resolving nonexistent domains to an IP they control that shows you ads or a weird search page.
Using either a resolver you control or a different resolver that you trust can help you avoid that.
reason: resolve internal domains
You might have an internal network with domains (like
blah.corp.yourcompany.com) that aren’t on the public internet. Running your
own resolver for machines in the internal network makes it possible to access
those domains.
You can do the same thing on a home network, either to access local-only services or to just get local addresses for services that are on the public internet.
reason: avoid your DNS queries being MITM’d
One person said:
I run a resolver on my LAN router that uses DNS over HTTPS for its upstream, so IoT and other devices that don’t support DoH or DoT don’t spray plaintext DNS outside
that’s all for now
It feels important to me to explore the “why” of DNS, because it’s such a complicated messy system and I think most people find it hard to get motivated to learn about complex topics if they don’t understand why all this complexity is useful.
Thanks to Marie and Kamal for discussing this post, and to everyone on Twitter who provided reasons.