Reading List
The most recent articles from a list of feeds I subscribe to.
New playground: memory spy
Hello! Today we’re releasing a new playground called “memory spy”. It lets you run C programs and see how their variables are represented in memory. It’s designed to be accessible to folks who don’t know C – it comes with a bunch of extremely simple example C programs that you can poke at. Here’s the link:
This is a companion to the “how integers and floats work” zine we’ve been working on, so the goal is mostly to look at how number types (integers and floats) are represented.
why spy on memory?
How computers actually represent variables can seem kind of abstract, so I wanted to make it easy for folks to see how a real computer actually represents variables in memory.
why is it useful to look at C?
You might be wondering – I don’t write C! Why should I care how C programs represent variables in memory?
In this playground I’m mostly interested in showing people how integers and floats are represented. And low-level languages generally all represent integers and floats in the same way – a 32-bit unsigned int is going to be the same in C, C++, Rust, Go, Swift, etc. The exact name of the type is different, but the representation is the same.
In higher-level languages like Python it’s a little different, but under the hood a float in Python contains a C double, so the C representation is still pretty relevant.
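To check that these representations really are shared, you can inspect the same bytes from Python with the standard struct module. This is a minimal sketch, not part of memory spy itself. It assumes little-endian packing (the "<" prefix), which matches most current hardware:

```python
import struct
import sys

# A 32-bit unsigned int ("I" in struct's format codes), little-endian.
# 258 is 0x00000102, so its lowest byte (0x02) comes first.
as_uint32 = struct.pack("<I", 258)
print(as_uint32.hex(" "))  # 02 01 00 00

# A Python float is a C double: 8 bytes of IEEE 754 binary64.
as_double = struct.pack("<d", 0.1)
print(as_double.hex(" "))  # 9a 99 99 99 99 99 b9 3f

# A C double has a 53-bit significand, and so does a Python float.
print(sys.float_info.mant_dig)  # 53
```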
you don’t have to know C
It uses C because C is the language where it’s the most straightforward to map between “the code in your program” and “what’s in your computer’s memory”.
But if you’re not comfortable with C, this playground is still for you! We put together a bunch of example programs where you can run them and look at each variable’s value.
None of the example programs use any fancy features of C – a lot of the code is extremely simple (for example, char byte = 'a';). So you should mostly be able to understand what’s going on even if you don’t know C at all.
how does it work?
Behind the scenes, there’s a server that:
- compiles the program with clang
- runs the program with the C debugger lldb (using a Python lldb script)
- returns a JSON file with the values of the variables on every line, as an array of bytes
Then the frontend formats the array of bytes so you can look at it. The display logic isn’t very fancy – ultimately it’s a pretty thin wrapper around lldb.
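As a rough illustration of that display step, here’s a Python sketch (the real frontend is Vue). The JSON shape below, with a line number, a variable name, and a list of byte values, is a hypothetical stand-in, since the post doesn’t describe the exact format:

```python
import json

# Hypothetical response for one variable; the server's real JSON may differ.
payload = json.loads('{"line": 3, "name": "x", "bytes": [44, 1, 0, 0]}')


def format_bytes(values: list[int]) -> dict[str, str]:
    """Render a list of byte values (0-255) a few human-readable ways."""
    raw = bytes(values)
    return {
        "hex": raw.hex(" "),
        "binary": " ".join(f"{b:08b}" for b in raw),
        # Assumes a little-endian unsigned integer, as on x86-64.
        "as_uint": str(int.from_bytes(raw, "little")),
    }


print(payload["name"], format_bytes(payload["bytes"]))
# x {'hex': '2c 01 00 00', 'binary': '00101100 00000001 00000000 00000000', 'as_uint': '300'}
```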
some limitations
The two main limitations I can think of right now are:
- there’s no support for loops (it’ll run them, but it’ll only tell you the value of the variable the first time through the loop)
- it only supports defining one variable per line
There are probably more, it’s a very simple project.
the inspiration
Python Tutor by Philip Guo was a huge inspiration. It has a different focus – it also lets you step through programs in a debugger, but it’s more focused on helping the user build a mental model for how variables and control flow work.
what about security?
In general my approach to running arbitrary untrusted code is 20% sandboxing and 80% making sure that it’s an extremely low value attack target so it’s not worth trying to break in.
Programs are terminated after 1 second of runtime, they run in a container with no network access, and the machine they’re running on has no sensitive data on it and a very small CPU.
some notes on the tech stack
The backend is in Go, plus a Python script to script the interactions with lldb. (here’s the source for the lldb script and the source for the Go server right now). I’m using bubblewrap to sandbox lldb.
As always the frontend is using Vue. You can see the frontend source with “view source” if you want.
The main fancy thing that happens on the frontend is that I use tree sitter to figure out which lines of the code have variables defined on them.
some design notes
As usual these days, I built this project with Marie Claire LeBlanc Flanagan. I think the design decision I’m the happiest with is how we handled navigating the program you’re running. Instead of using next/previous arrows to step through the code one line at a time, you can just click on a line to view its variables.
This “click on a line” design wouldn’t make sense in a normal debugger context because usually you have loops and a line might be run more than once. But our focus here isn’t on control flow, and none of the example programs have loops.
The other thing I’m happy with is the decision to use regular links (like <a href="#example=hexadecimal">) for all the navigation. There’s an onhashchange Javascript event that takes care of making sure we update the page to match the new URL.
I think there were more design struggles but I forget what they were right now.
that’s all!
Here’s the link again:
Let me know on Twitter or Mastodon if you notice any problems.
Introducing "Implement DNS in a Weekend"
Hello! I’m excited to announce a project I’ve been working on for a long time: a free guide to implementing your own DNS resolver in a weekend.
The whole thing is about 200 lines of Python, including implementing all of the binary DNS parsing from scratch. Here’s the link:
This project is a fun way to learn:
- How to parse a binary network protocol like DNS
- How DNS works behind the scenes (what’s actually happening when you make a DNS query?)
The testers have reported that it takes around 2-4 hours to do in Python.
what’s a DNS resolver?
A DNS resolver is a program that knows how to figure out what the IP address for a domain is. Here’s what the command line interface of the resolver you’ll write looks like:
$ python3 resolve.py example.com
93.184.216.34
implementing DNS gives me an amazing sense of confidence
In Learning DNS in 10 years, I talked about how having implemented a toy version of DNS myself from scratch gives me an unparalleled sense of confidence in my understanding of DNS.
So this guide is my attempt to share that sense of confidence with you all.
Also, if you’ve bought How DNS Works, I think this guide is a nice companion – you can implement your own DNS resolver to solidify your understanding of the concepts in the zine.
it’s a Jupyter notebook
In this guide, I wanted to mix code that you could run with explanations. I struggled to figure out the right format for months, and then I finally thought of using a Jupyter notebook! This meant that I could easily check that all of the code actually ran.
I used Jupyter Book to convert the Jupyter notebooks into a website. It reruns the notebook before converting it to HTML, so I could easily guarantee that all of the code actually runs and outputs what it says that it outputs. I ended up hacking the theme a lot to make it more minimal, as well as doing some terrible things with Beautiful Soup to get a table of contents that shows you the global TOC as well as the page’s local section headings all in one place.
You can also download the Jupyter notebooks and run them on your own computer if you’d like, using the “download the code” button on the homepage.
why Python?
I used Python for this guide instead of a lower-level language like Go or Rust to make it more approachable – when I started learning networking 10 years ago, I didn’t really know any systems languages well, and I found them kind of intimidating. Implementing traceroute using scapy in Python felt much less scary.
You can very easily pack/unpack binary data in Python with struct.pack and struct.unpack, so Python being a higher-level language doesn’t really cause any problems.
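For example, here’s a minimal struct round trip. The "!" prefix means network (big-endian) byte order and "H" means an unsigned 16-bit integer, which is the shape of the fields in a DNS header. The values are just illustrative:

```python
import struct

# Pack two unsigned 16-bit integers in network byte order.
packed = struct.pack("!HH", 0xB962, 1)
print(packed.hex())  # b9620001

# unpack always returns a tuple, even for a single field.
query_id, num_questions = struct.unpack("!HH", packed)
print(hex(query_id), num_questions)  # 0xb962 1
```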
The idea is that you can either follow the guide in Python (which is the easiest mode), or if you want a bigger challenge, you can translate the code to any language you’d like. (Go? Javascript? Rust? Bash? Lua? Ruby?)
only the standard library
It was important to me to really show how to implement DNS “from scratch”, so the guide only uses a few very basic standard library modules: struct, socket, io, random, and dataclasses.
Here’s what we use each module for:
- random is used for generating DNS query IDs
- socket is used to make a UDP connection
- struct is used for converting to/from binary (struct.pack and struct.unpack)
- dataclasses are used to make serializing / deserializing records a little more ergonomic
- io is used for BytesIO, which gives us a reader interface that stores a pointer to how much of the packet we’ve read so far. If I were implementing DNS in a language that didn’t have this kind of reader interface, I might implement my own.
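Here’s a small sketch of why that reader interface is handy: after each read, BytesIO remembers the position, so parsing code can just keep calling read. The packet bytes are made up for illustration (a 12-byte header followed by the start of a question), not taken from the guide:

```python
import io
import struct

# Illustrative bytes: a 12-byte DNS header, then a length-prefixed label.
packet = bytes.fromhex("b96201000001000000000000") + b"\x07example"
reader = io.BytesIO(packet)

# Consume the header: six unsigned 16-bit big-endian fields.
header_fields = struct.unpack("!6H", reader.read(12))
print(header_fields)  # (47458, 256, 1, 0, 0, 0)
print(reader.tell())  # 12

# The next read continues right where the header ended.
label_length = reader.read(1)[0]
label = reader.read(label_length)
print(label, reader.tell())  # b'example' 20
```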
it includes some bonus exercises
The toy DNS resolver is obviously missing a bunch of important features, so I’ve added some exercises at the end with examples of features you could add (and bugs you could fix) to make it a little more like a “real” DNS resolver.
This list isn’t particularly exhaustive though, and I’d love to hear other ideas for relatively-easy-to-implement DNS resolver features I’ve missed.
next goal: TLS
I’ve actually written toy implementations of a bunch of other network protocols in Python (ICMP, UDP, TCP, HTTP, and TLS), and I’m hoping to release “Implement TLS in a weekend” at some point.
No promises though – I have another zine to finish writing first (on all the surprising things about how integers and floats work on computers), and a toy TLS implementation is quite a bit more involved than a toy DNS implementation.
thanks to the beta testers
Thanks to everyone (Atticus, Miccah, Enric, Ben, Ben, Maryanne, Adam, Jordan, and anyone else I missed) who tested this guide and reported confusing or missing explanations, mistakes, and typos.
Also a huge thanks to my friend Allison Kaptur who designed the first “Domain Name Saturday” workshop with me at the Recurse Center in 2020.
The name was inspired by Ray Tracing in One Weekend.
here’s the link again
Here’s the link to the guide again if you’d like to try it out:
New talk: Learning DNS in 10 years
Here’s a keynote I gave at RubyConf Mini last year: Learning DNS in 10 years. It’s about strategies I use to learn hard things. I just noticed that they’d released the video the other day, so I’m just posting it now even though I gave the talk 6 months ago.
Here’s the video, as well as the slides and a transcript of (roughly) what I said in the talk.
the video
the transcript
So, we're going to talk about learning through a series of tiny deep dives. My favorite way of learning things is to do nothing, most of the time.
That's why it takes 10 years.
So for six months I'll do nothing and then like I'll furiously learn something for maybe 30 minutes or three hours or an afternoon. And then I'll declare success and go back to doing nothing for months. I find this works really well for me.
Here are some of the strategies we're going to talk about for doing these tiny deep dives
First, we're going to start briefly by talking about what DNS is.
Next, we're going to talk about spying on DNS.
Then we're gonna talk about being confused, which is my main mode. (I'm always confused about something!)
Then we'll talk about reading the specification, we're going to do some experiments, and we're going to implement our own terrible version of DNS.
When you go to a website like www.example.com, your browser needs to look up that website's IP address. So DNS translates domain names into IP addresses. It looks up other information about domain names too, but we're mostly just going to talk about IP addresses today.
For example, you're on your phone, you're using Google Maps, it needs to know, where is maps.google.com, right? Or on your computer, where's reddit.com? What's the IP address? And if we didn't have DNS, the entire internet would collapse.
I think it's fun to learn how this behind the scenes stuff works.
The other thing about DNS I find interesting is that it's really old. There's this document (RFC 1035) which defines how DNS works, that was written in 1987. And if you take that document and you write a program that works the way that document says it should, your program will work. And I think that's kind of wild, right?
The basics haven't changed since before I was born. So if you're a little slow about learning about it, that's ok: it's not going to change out from under you.
Let's say we want to look up the IP address for maps.google.com. We can do that in dig! When we run dig maps.google.com, it prints out 5 fields. Let's talk about what those 5 fields are.
I've used example.com instead of maps.google.com on this slide, but the fields are the same. Let's talk about 4 of them:
We have the domain name, no big deal
The Time To Live, which is how long to cache that record for, so here that's one day
You have the record type, A stands for address because this is an IP address
And you have the content, which is the IP address
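Those fields are just whitespace-separated columns, so they're easy to pull apart. This Python sketch assumes a single answer line in the shape dig prints with +noall +answer. The sample line is illustrative (a one-day TTL and an example IP address), not copied from real output:

```python
from dataclasses import dataclass


@dataclass
class AnswerLine:
    name: str
    ttl: int  # seconds to cache the record
    record_class: str  # the fifth field; almost always IN ("internet")
    record_type: str  # e.g. A, for an IPv4 address
    content: str  # e.g. the IP address


def parse_answer_line(line: str) -> AnswerLine:
    # maxsplit=4 keeps content with spaces (like TXT records) in one piece.
    name, ttl, record_class, record_type, content = line.split(maxsplit=4)
    return AnswerLine(name, int(ttl), record_class, record_type, content)


record = parse_answer_line("example.com.\t86400\tIN\tA\t93.184.216.34")
print(record.record_type, record.content, record.ttl // 3600, "hours")
# A 93.184.216.34 24 hours
```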
But there are other kinds of records like TXT records. So we're going to look at a TXT record really quickly just because I think this is very fun. We're going to look at twitter.com's TXT records.
So TXT records are something that people use for domain verification, for example to prove to Google that you own twitter.com.
So what you can do is set this DNS record, google-site-verification. Google will tell you what to set it to, you'll set it, and then Google will believe you.
I think it's kind of fun that you can poke around with DNS and see that Twitter is using Miro or Canva or Mixpanel, and that's all public. It's like a little peek into what people are doing inside their companies.
If you pass dig the +noall +answer options, then your dig responses look much nicer (like they did in the screenshots above) instead of having a lot of nonsense in them. Whenever possible, I try to make my tools behave in a more human way.
And so here's what it said might be going on. The first time I opened the website (before the DNS records had been set up), the DNS servers returned a negative answer, saying, hey, this domain doesn't exist yet. The code for that is NXDOMAIN, which is like a 404 for DNS.
And the resolver cached that negative NXDOMAIN response. So the fact that it didn't exist was cached.
In networking, everything has a specification. The boring technical documents are called RFCs, for "request for comments". I find this name a bit funny, because for DNS, some of the main RFCs are RFC 1034 and 1035. These were written in 1987, and the comment period ended in 1987. You can definitely no longer make comments. But anyway, that's what they're called.
I personally kind of love RFCs because they're like the ultimate answer to many questions. There's a great series of HTTP RFCs, 9110 to 9114. DNS actually has a million different RFCs, it's very upsetting, but the answers are often there. So I went looking. And I think I went looking because when I read comments on StackOverflow, I don't always trust them. How do I know if they're accurate? So I wanted to go to an authoritative source.
So, um, ok, cool. What does that mean, right? Luckily, we only have one question: I don't need to read the entire boring document. I just need to like analyze this one sentence and figure it out.
So it's saying that the cache time depends on two fields. I want to show you the actual data it's talking about, the SOA record.
dig +all asdfasdfasdfasdfasdf.jvns.ca
It says that the domain doesn't exist, NXDOMAIN. But it also returns this record called the SOA record, which has some domain metadata. And there are two fields here that are relevant.
Here. I put this on a slide to try to make it a little bit clearer. This slide is a bit messed up, but there's this field at the end that's called the MINIMUM field, and there's the TTL, time to live of the record, that I've tried to circle.
And what it's saying is that if a record doesn't exist, the amount of time the resolver should cache "it doesn't exist" for is the minimum of those two numbers.
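That rule is small enough to write down directly. Here's a minimal Python sketch. The numbers in the example are hypothetical, not the real values from the jvns.ca SOA record:

```python
def negative_cache_seconds(soa_ttl: int, soa_minimum: int) -> int:
    """How long a resolver should cache "this domain doesn't exist" (NXDOMAIN).

    It's the smaller of the SOA record's own TTL and its MINIMUM field.
    """
    return min(soa_ttl, soa_minimum)


# Hypothetical values: a 3-hour SOA TTL and a 1-day MINIMUM field.
wait = negative_cache_seconds(soa_ttl=10800, soa_minimum=86400)
print(wait / 3600, "hours")  # 3.0 hours
```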
And so I waited three hours and then everything worked. And I found this kind of fun to know because often like if you look up DNS advice it will say something like, if something has gone wrong, you need to wait 48 hours. And I do not want to wait 48 hours! I hate waiting. So I love it when I can like use my brain to figure out that I can wait for less time.
Sometimes when I find my mental model is broken, it feels like I don't know anything.
But in this case, and I think in a lot of cases, there's often just a few things I'm missing? Like this negative caching thing is like kind of weird, but it really was the one thing I was missing. There are a few more important facts about how DNS caching works that I haven't mentioned, but I haven't run into more problems I didn't understand since then. Though I'm sure there's something I don't know.
So sometimes learning one small thing really can solve all your problems.
So let's say we want to do some experiments with caching.
I think most people don't want to make experimental changes to their domain names, because they're worried about breaking something. Which I think is very understandable.
Because I was really into DNS, I wanted to experiment with DNS. And I also wanted other people to experiment with DNS without having to worry about breaking something. So I made this little website with my friend, Marie, called Mess with DNS
The idea is, if you don't want to do DNS experiments on your domain, you can do them on my domain. And if you mess something up, it's my problem, it's not your problem. And there have been no problems, so that's fine.
So let's use Mess With DNS to do a little DNS experimentation
dig @1.1.1.1 test.chair131.messwithdns.com
I've queried it a bunch of times, maybe 10 or 20.
Oh, cool. This isn't what I expected to see. This is fun, though, that's great. We made about 20 queries for that DNS record. The server logs all queries it receives, so we can count them. Our server got 1, 2, 3, 4, 5, 6, 7, 8 queries. That's kind of fun. 8 is less than 20.
One reason I like to do demos live on stage is that sometimes what happens isn't exactly what I think will happen. When I do this exact experiment at home, I just get 1 query to the resolver.
So we only saw like eight queries here. And I assume that this is because the resolver we're talking to, 1.1.1.1, has more than one independent cache. I guess there are 8 caches. This makes sense to me because Cloudflare's network is distributed -- the exact machines I'm talking to here in Providence are not the same as the ones in Montreal.
This is interesting because it complicates your idea about how caching works a little bit, right? Like maybe a given DNS resolver actually has like eight caches and which one you get is random, and you're not always talking to the same one. I think that's what's going on here.
Let's go to Wireshark and look for the packet we just sent. And we can see it there! There's some other noise in between, so I'll stop the capture.
We can see that it's the same packet because the query ID matches, B962.
So we sent a query to Google's DNS server and we got a response, right? It was like, this is totally legitimate, there's no problem. It doesn't know that we copied and pasted it and that we have no idea what it means!
We're going to see how to construct these in Ruby, but first I want to talk about what a byte is for one second. So this (b9) is the hexadecimal representation of a byte. The way I like to figure out what that means is to just type it into IRB: if you type in 0xB9, it prints out 185, so that byte is the number 185.
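The same check works in Python's interactive prompt, if that's more familiar than IRB. A small sketch:

```python
byte_value = 0xB9
print(byte_value)  # 185
print(int("b9", 16))  # 185, parsing the hex string instead
print(format(byte_value, "08b"))  # 10111001, the same byte in binary

# Two hex digits are one byte, so the query ID b962 is two bytes.
query_id_bytes = bytes.fromhex("b962")
print(len(query_id_bytes))  # 2
```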
So the header is 12 bytes. First there's b962, which is the query ID. The next number is the flags, which basically, in this case, means this is a query, like "hello, I have a question." And then there are four more fields: the number of questions, and then the number of answers in each of the three answer sections. We do not have any answers. We only have a question. So we're saying, hello, I have one question. That's what the header means.
And the way that we can do this in Ruby is we can make a little array that has the query ID, and then these numbers which correspond to the other header fields: the flags, then 1 for 1 question, and then three zeroes, one for each of the 3 sections of answers.
And then we need to tell Ruby how to take these six numbers and represent them as bytes. So n here means to represent each of these as two bytes, and it also means to use big endian byte order.
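Here's a rough Python translation of that Ruby, as a sketch. Python's struct format "!6H" plays the same role as Ruby's "n" directive here: six unsigned 16-bit numbers in big endian order. The flags value 0x0100 (the "recursion desired" bit) is an assumption, since the talk doesn't say which flags were set:

```python
import struct

query_id = 0xB962
flags = 0x0100  # assumption: only the "recursion desired" bit is set

# query ID, flags, 1 question, then 0 answers, 0 authority, 0 additional records
header = struct.pack("!6H", query_id, flags, 1, 0, 0, 0)
print(len(header), header.hex())  # 12 b96201000001000000000000
```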
I broke up the question section here. There are two parts you might recognize from example.com: there's example, and com. The way it works is that first you have a number (like 7), and then a 7-character string, like "example". The number tells you how many characters to expect in each part of the domain name. So it's 7, example, 3, com, 0.
And then at the end, you have two more fields for the type and the class. Class 1 is code for "internet". And type 1 is code for "IP address", because we want to look up the IP address.
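The question section can be sketched the same way in Python. The helper name encode_name is hypothetical, and this version assumes plain ASCII domain names with no trailing dot:

```python
import struct

TYPE_A = 1  # "IP address" (an A record)
CLASS_IN = 1  # "internet"


def encode_name(domain: str) -> bytes:
    """Encode a domain as length-prefixed parts: 7 example 3 com 0."""
    encoded = b""
    for part in domain.split("."):
        encoded += bytes([len(part)]) + part.encode("ascii")
    return encoded + b"\x00"  # a zero-length part ends the name


question = encode_name("example.com") + struct.pack("!HH", TYPE_A, CLASS_IN)
print(question)  # b'\x07example\x03com\x00\x00\x01\x00\x01'
```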
First, spy on it. I find that when I look at what's really happening under the hood, like what's actually in the bytes, it's often not as complicated as I think. Like, oh, there's just the domain name and the type. It really makes me feel far more confident that I understand that thing.
I try to notice when I'm confused, and I want to say again, that noticing when you're confused is something that like we don't always have time for right? It's something to do when you have the energy. For example there's this weird DNS query I saw in one of the demos today that I don't understand, but I ignored it because, well, I'm giving a talk. But maybe one day I'll feel like looking at it.
We talked about reading the specification, and there are few times I feel more powerful than when I'm in a discussion with someone and I KNOW that I have the right answer because, well, I read the specification! It's a really nice way to feel certain.
I love to do experiments to check that my understanding of stuff is right. And often I learn that my understanding of something is wrong! I had an example in this talk that I was going to include and I did an experiment to check that that example was true, and it wasn't! And now I know that. I love that experiments on computers are very fast and cheap and usually have no consequences.
And then the last thing we talked about, and truly my favorite (but the most work), is implementing your own terrible version. For me, the confidence I get from writing a terrible DNS implementation that works on 11 different domain names is unmatched. If my thing works at all, I feel like, wow, you can't tell me that I don't know how DNS works! I implemented it! And it doesn't matter if my implementation is "bad" because I know that it works! I've tested it. I've seen it with my own eyes. And I think that just feels amazing. And there are also no consequences because you're never going to run it in production. So it doesn't matter if it's terrible. It just exists to give you huge amounts of confidence in yourself. And I think that's really nice.
thanks to the organizers!
Thanks to the RubyConf Mini organizers for doing such a great job with the conference – it was the first conference I’d been to since 2019, and I had a great time.
a quick plug for “How DNS Works”
If you liked this talk and want to spend less than 10 years learning about how DNS works, I spent 6 months condensing everything I know about DNS into 28 pages. It’s here and you can get it for $12: How DNS Works.
New playground: integer.exposed
Hello! For the last few months we’ve been working on a zine about how integers and floating point numbers work. Whenever I make a zine I like to release a playground to go with it, like mess with dns for the DNS zine or the sql playground.
For this one, I made a simple playground called integer.exposed, inspired by Bartosz Ciechanowski’s float.exposed.
It’s a lot less elaborate than Mess With DNS, so I’ll keep this blog post short.
the inspiration: float.exposed
I did a couple of talks about how integers and floating point work last month, and in the talk about floating point I found myself CONSTANTLY referring to this site called Float Exposed by Bartosz Ciechanowski to demonstrate various things. (Aside: If you haven’t seen Ciechanowski’s incredible interactive explainers on bicycles, mechanical watches, lenses, the internal combustion engine, and more, you should check them out!)
Here’s what it looks like:
Things I’ve done with it:
- Increment the significand of a float (to show people how close together successive floats are)
- Show special values like NaN and infinity, and show how if you change the bits in NaN, it’s still NaN
- Go to a large integer value and show how the distance between floats is very large
- Show how you get drastically different precision for one million as a 32-bit float and as a 64-bit float (try incrementing the significand for each one!)
and lots more! It’s an incredible way to get hands on with floats and improve your intuition around how they work.
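You can reproduce the “increment the significand” trick and the one-million comparison outside the site too. This Python sketch adds 1 to a float’s raw bits with struct, which steps to the next representable value for positive finite numbers (on Python 3.9+, math.nextafter does the float64 version directly):

```python
import struct


def next_float32(x: float) -> float:
    """Step a positive finite float32 to the next representable float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


def next_float64(x: float) -> float:
    """The same trick for a positive finite float64 (a Python float)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits + 1))[0]


one_million = 1_000_000.0
gap32 = next_float32(one_million) - one_million
gap64 = next_float64(one_million) - one_million
print(gap32)  # 0.0625
print(gap64 == 2**-33)  # True: about 0.00000000012
```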
float.exposed, but for integers
Integers aren’t as complicated as floats, but there are some nonobvious things about them: you have signed integers and unsigned integers, you have endianness, and there are some weird operations like right/left shift. So when I was talking about integers, I found myself wanting a similar website to float.exposed to demonstrate things.
So with permission, I put one together at integer.exposed. Here’s a screenshot:
The UI is a little different: integers don’t have many different parts the way floating point numbers do, so there’s a single row of buttons that you can use to do various operations on the integer.
A note on byte order: Like float.exposed, it uses a big endian byte order, because I think it’s more intuitive to read. But you do have to keep in mind that on most computers the bytes will actually be in the reverse order.
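A quick way to see both orders side by side, as a Python sketch using struct’s “>” (big endian) and “<” (little endian) prefixes:

```python
import struct
import sys

n = 0x12345678
big = struct.pack(">I", n)  # the order integer.exposed displays
little = struct.pack("<I", n)  # the order most computers store
print(big.hex(" "))  # 12 34 56 78
print(little.hex(" "))  # 78 56 34 12
print(sys.byteorder)  # this machine's native order, e.g. little on x86-64
```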
some interesting things to try
Here are some things I think are fun to try:
- signed integers: Look at how -1 is represented. Increment and decrement it a few times and see how the signed and unsigned values change. Do the same with -128. Also look at how -1 is represented as a 16/32/64-bit integer.
- signed/unsigned right shift: Similarly with -1: try out signed right shift (also known as “arithmetic right shift”) and see how the result is different from unsigned right shift (aka “logical right shift”).
- counting in binary: Start at 0 and increment a bunch of times and watch the binary value count up.
- not: Take any number (like 123) and NOT it. See how NOT is almost exactly the same as negation, but not quite.
- swap the byte order: Take a number like 12345678 and see how if you swap the byte order, the result is an unrecognizably different number.
- look at how powers of 2 are represented
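Python’s integers don’t have a fixed width, so this sketch fakes 8-, 16-, and 32-bit integers by masking. The helpers to_unsigned and to_signed are hypothetical names for illustration. It shows a few of the experiments above in code:

```python
def to_unsigned(value: int, bits: int) -> int:
    """The unsigned integer with the same two's complement bit pattern."""
    return value & ((1 << bits) - 1)


def to_signed(pattern: int, bits: int) -> int:
    """Read an unsigned bit pattern as a two's complement signed integer."""
    if pattern & (1 << (bits - 1)):  # the top bit is the sign bit
        return pattern - (1 << bits)
    return pattern


# -1 is all ones at every width.
for bits in (8, 16, 32):
    print(bits, format(to_unsigned(-1, bits), f"0{bits}b"))

# Decrementing -128 as an 8-bit signed integer wraps around to 127.
wrapped = to_signed(to_unsigned(-128 - 1, 8), 8)

# Signed (arithmetic) vs unsigned (logical) right shift of 8-bit -1.
arithmetic = to_signed(to_unsigned(-1, 8), 8) >> 1  # Python's >> keeps the sign: -1
logical = to_unsigned(-1, 8) >> 1  # 0b01111111 == 127

# NOT is almost negation: ~x == -x - 1.
print(~123, -123)  # -124 -123

# Swap the byte order of 12345678 as a 32-bit unsigned integer.
swapped = int.from_bytes((12345678).to_bytes(4, "big"), "little")
print(swapped)  # 1315027968
```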
the tech stack
As usual for me it uses Vue.js. If you want to see how it works you can just view source – it’s only two files, index.html and script.js.
I took a bunch of the CSS from float.exposed.
that’s all!
Let me know if you notice any bugs! I might add more features, but I want to keep it pretty simple.
I’ve also built another more involved playground that I’m hoping to release and write up soon.
A list of programming playgrounds
I really like using (and making!) programming playgrounds, and I got thinking the other day about how I didn’t have a great list of playgrounds to refer to. So I asked on Mastodon for links to cool playgrounds.
Here’s what I came up with. I’d love to know what I missed.
- Compilers: godbolt compiler explorer by Matt Godbolt
- Shaders: shadertoy by Inigo Quilez and Pol Jeremias
- Arduino / IoT: wokwi from CodeMagic
- CSS/HTML/JS: CodePen by Chris Coyier, Alex Vasquez, and team
- CSS/HTML/JS: JSFiddle by Oskar Krawczyk and Piotr Zalewa
- CSS/HTML/JS: flems by Rasmus Porsager (saves all state in the URL)
- regular expressions:
- DNS: Mess With DNS by Julia Evans and Marie Flanagan
- DNS: DNS lookup tool by Julia Evans
- nginx: nginx playground by Julia Evans
- varnish: fastly fiddle from fastly
- SQLite: sqlime by Anton Zhiyanov (lets you load arbitrary SQLite databases)
- SQL: DB fiddle from Status200
- SQL: sql playground by Julia Evans
- Postgres: postgres playground from Crunchydata (runs postgres in the browser!)
- Git: oh my git by blinry and bleeptrack
- .NET bytecode: SharpLab by Andrey Shchekin
- Python bytecode: dis this by Pamela Fox
data formats
- Floating point: Float Exposed by Bartosz Ciechanowski
- Unicode: Unicode analyzer from fontspace
- Unicode: What unicode character is this? from babelstone
- ASN.1 certificates: ASN.1 JavaScript debugger by Lapo Luchini
- SVG: sssvg (interactive SVG reference) from fffuel (lots of other cool tools there)
- CBOR: CBOR playground
- JSON: JSON editor online by Jos de Jong
- cron: crontab guru from cronitor
programming languages
- official playgrounds:
- unofficial playgrounds:
- PHP: 3v4l by Sjon Hortensius
- Python/JS/C/C++/Java: Python Tutor by Philip Guo
- Javascript: JS Console by @rem
- many languages: riju by Radon Rosborough
- many languages: replit
- others: jqplay for jq, tryapl for APL