Reading List
The most recent articles from a list of feeds I subscribe to.
Some tactics for writing in public
Someone recently asked me – “how do you deal with writing in public? People on the internet are such assholes!”
I’ve often heard the advice “don’t read the comments”, but actually I’ve learned a huge amount from reading internet comments on my posts from strangers over the years, even if sometimes people are jerks. So I want to explain some tactics I use to try to make the comments on my posts more informative and useful to me, and to try to minimize the number of annoying comments I get.
talk about facts
On here I mostly talk about facts – either facts about computers, or stories about my experiences using computers.
For example this post about tcpdump contains some basic facts about how to use tcpdump, as well as an example of how I’ve used it in the past.
Talking about facts means I get a lot of fact-based comments like:
- people sharing their own similar (or different) experiences (“I use tcpdump a lot to look at our RTP sequence numbers”)
- pointers to other resources (“the documentation from F5 about tcpdump is great”)
- other interesting related facts I didn’t mention (“you can use tcpdump -X too”, “netsh on windows is great”, “you can use sudo tcpdump -s 0 -A 'tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420' to filter for HTTP GET requests”)
- potential problems or gotchas (“be careful about running tcpdump as root, try just setting the required capabilities instead”)
- questions (“Is there a way to place the BPF filter after IP packet reassembly?” or “what’s the advantage of tcpdump over wireshark?”)
- mistakes I made
In general, I’d say that people’s comments about facts tend to stay pretty normal. The main kinds of negative comments I get about facts are:
- occasionally people get a little rude about facts I didn’t mention (“Didn’t use -n in any of the examples…please…”). I think I didn’t mention -n in that post because at the time I didn’t know why the -n flag was useful (it’s useful because it turns off this annoying reverse DNS lookup that tcpdump does by default so you can see the IP addresses).
- people are also sometimes weird about mistakes. I mostly try to head this off by trying to be self-aware about my knowledge level on a topic, and saying “I’m not sure…” when I’m not sure about something.
stories are great
I think stories encourage pretty good discussion. For example, why you should understand (a little) about TCP is a story about a time it was important for me to understand how TCP worked.
When I share stories about problems I solved, the comments really help me understand how what I learned fits into a bigger context. For example:
- is this a common problem? people will often comment saying “this happened to me too!”
- what are other common related problems that come up?
- are there other possible solutions I didn’t consider?
Also I think these kinds of stories are incredibly important – that post describes a bug that was VERY hard for me to solve, and the only reason I was able to figure it out in the first place was that I read this blog post.
ask technical questions
Often in my blog posts I ask technical questions that I don’t know the answer to (or just mention “I don’t know X…”). This helps people focus their replies a little bit – an obvious comment to make is to provide an answer to the question, or explain the thing I didn’t know!
This is fun because it feels like a guaranteed way to get value out of people’s comments – people LOVE answering questions, and so they get to look smart, and I get the answer to a question I have! Everyone wins!
fix mistakes
I make a lot of mistakes in my blog posts, because I write about a lot of things that are on the edge of my knowledge. When people point out mistakes, I often edit the blog post to fix it.
Usually I’ll stay near a computer for a few hours after I post a blog post so that I can fix mistakes quickly as they come up.
Some people are very careful to list every single error they made in their blog posts (“errata: the post previously said X which was wrong, I have corrected it to say Y”). Personally I make mistakes constantly and I don’t have time for that so I just edit the post to fix the mistakes.
ask for examples/experiences, not opinions
A lot of the time when I post a blog post, people on Twitter/Mastodon will reply with various opinions they have about the thing. For example, someone recently replied to a blog post about DNS saying that they love using zone files and dislike web interfaces for managing DNS records. That’s not an opinion I share, so I asked them why.
They explained that there are some DNS record types (specifically TLSA) that they find
often aren’t supported in web interfaces. I didn’t know that people used TLSA
records, so I learned something! Cool!
I’ve found that asking people to share their experiences (“I wanted to use X DNS record type and I couldn’t”) instead of their opinions (“DNS web admin interfaces are bad”) leads to a lot of useful information and discussion. I’ve learned a lot from it over the years, and written a lot of tweets like “which DNS record types have you needed?” to try to extract more information about people’s experiences.
I try to model the same behaviour in my own work when I can – if I have an opinion, I’ll try to explain the experiences I’ve had with computers that caused me to have that opinion.
start with a little context
I think internet strangers are more likely to reply in a weird way when they have no idea who you are or why you’re writing this thing. It’s easy to make incorrect assumptions! So often I’ll mention a little context about why I’m writing this particular blog post.
For example:
A little while ago I started using a Mac, and one of my biggest frustrations with it is that often I need to run Linux-specific software.
or
I’ve started to run a few more servers recently (nginx playground, mess with dns, dns lookup), so I’ve been thinking about monitoring.
or
Last night, I needed to scan some documents for some bureaucratic reasons. I’d never used a scanner on Linux before and I was worried it would take hours to figure out…
avoid causing boring conversations
There are some kinds of programming conversations that I find extremely boring (like “should people learn vim?” or “is functional programming better than imperative programming?”). So I generally try to avoid writing blog posts that I think will result in a conversation/comment thread that I find annoying or boring.
For example, I wouldn’t write about my opinions about functional programming: I don’t really have anything interesting to say about it and I think it would lead to a conversation that I’m not interested in having.
I don’t always succeed at this of course (it’s impossible to predict what people are going to want to comment about!), but I try to avoid the most obvious flamebait triggers I’ve seen in the past.
There are a bunch of “flamebait” triggers that can set people off on a conversation that I find boring: cryptocurrency, tailwind, DNSSEC/DoH, etc. So I have a weird catalog in my head of things not to mention if I don’t want to start the same discussion about that thing for the 50th time.
Of course, if you think that conversations about functional programming are interesting, you should write about functional programming and start the conversations you want to have!
Also, it’s often possible to start an interesting conversation about a topic where the conversation is normally boring. For example I often see the same talking points about IPv6 vs IPv4 over and over again, but I remember the comments on Reasons for servers to support IPv6 being pretty interesting. In general if I really care about a topic I’ll talk about it anyway, but I don’t care about functional programming very much so I don’t see the point of bringing it up.
preempt common suggestions
Another kind of “boring conversation” I try to avoid is suggestions of things I have already considered. Like when someone says “you should do X” but I already know I could have done X and chose not to because of A B C.
So I often will add a short note like “I decided not to do X because of A B C” or “you can also do X” or “normally I would do X, here I didn’t because…”. For example, in this post about nix, I list a bunch of Nix features I’m choosing not to use (nix-shell, nix flakes, home manager) to avoid a bunch of helpful people telling me that I should use flakes.
Listing the things I’m not doing is also helpful to readers – maybe someone new to nix will discover nix flakes through that post and decide to use them! Or maybe someone will learn that there are exceptions to when a certain “best practice” is appropriate.
set some boundaries
Recently on Mastodon I complained about some gross terminology (“domain information groper”) that I’d just noticed in the dig man page on my machine. A few dudes in the replies (who by now have all deleted their posts) asked me to prove that the original author intended it to be offensive (which of course is beside the point, there’s just no need to have a term widely understood to be referring to sexual assault in the dig man page) or tried to explain to me why it actually wasn’t a problem.
So I blocked a few people and wrote a quick post:
man so many dudes in the replies demanding that i prove that the person who named dig “domain information groper” intended it in an offensive way. Big day for the block button I guess :)
I don’t do this too often, but I think it’s very important on social media to occasionally set some rules about what kind of behaviour I won’t tolerate. My goal here is usually to drive away some of the assholes (they can unfollow me!) and try to create a more healthy space for everyone else to have a conversation about computers in.
Obviously this only works in situations (like Twitter/Mastodon) where I have the ability to garden my following a little bit over time – I can’t do this on HN or Reddit or Lobsters or whatever and wouldn’t try.
As for fixing it – the dig maintainers removed the problem language years ago, but Mac OS still has a very outdated version for license reasons.
(you might notice that this section is breaking the “avoid boring conversations” rule above, this section was certain to start a very boring argument, but I felt it was important to talk about boundaries so I left it in)
don’t argue
Sometimes people seem to want to get into arguments or make dismissive comments. I don’t reply to them, even if they’re wrong. I dislike arguing on the internet and I’m extremely bad at it, so it’s not a good use of my time.
analyze negative comments
If I get a lot of negative comments that I didn’t expect, I try to see if I can get something useful out of it.
For example, I wrote a toy DNS resolver once and some of the commenters were upset that I didn’t handle parsing the DNS packet. At the time I thought this was silly (I thought DNS parsing was really straightforward and that it was obvious how to do it, who cares that I didn’t handle it?) but I realized that maybe the commenters didn’t think it was easy or obvious, and wanted to know how to do it. Which makes sense! It’s not obvious at all if you haven’t done it before!
Those comments partly inspired implement DNS in a weekend, which focuses much more heavily on the parsing aspects, and which I think is a much better explanation of how to write a DNS resolver. So ultimately those comments helped me a lot, even if I found them annoying at the time.
(I realize this section makes me sound like a Perfectly Logical Person who does not get upset by negative public criticism, I promise this is not at all the case and I have 100000 feelings about everything that happens on the internet and get upset all the time. But I find that analyzing the criticism and trying to take away something useful from it helps a bit)
that’s all!
Thanks to Shae, Aditya, Brian, and Kamal for reading a draft of this.
Some other similar posts I’ve written in the past:
Street team.
My new book, You Deserve a Tech Union, is almost here. Here’s how you can help support it!
Assistive technology shouldn’t be a mystery box
Behind "Hello World" on Linux
Today I was thinking about – what happens when you run a simple “Hello World” Python program on Linux, like this one?
print("hello world")
Here’s what it looks like at the command line:
$ python3 hello.py
hello world
But behind the scenes, there’s a lot more going on. I’ll
describe some of what happens, and (much much more importantly!) explain some tools you can use to
see what’s going on behind the scenes yourself. We’ll use readelf, strace,
ldd, debugfs, /proc, ltrace, dd, and stat. I won’t talk about the Python-specific parts at all – just what happens when you run any dynamically linked executable.
Here’s a table of contents:
- parse “python3 hello.py”
- figure out the full path to python3
- stat, under the hood
- time to fork
- the shell calls execve
- get the binary’s contents
- find the interpreter
- dynamic linking
- go to _start
- write a string
before execve
Before we even start the Python interpreter, there are a lot of things that have to happen. What executable are we even running? Where is it?
1: The shell parses the string python3 hello.py into a command to run and a list of arguments: python3, and ['hello.py']
A bunch of things like glob expansion could happen here. For example if you run python3 *.py, the shell will expand that into python3 hello.py
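Here’s a rough sketch of that parsing step in Python. It’s only an approximation: shlex and glob are Python standard library stand-ins for what a shell does, the function name parse_command is made up for this example, and real shells have many more rules (quoting, variables, and so on).

```python
import glob
import shlex

def parse_command(line):
    """Split a command line into (program, arguments), expanding globs."""
    words = shlex.split(line)
    expanded = []
    for word in words:
        # Only treat a word as a pattern if it contains glob characters.
        # (Simplification: quoting has already been stripped by shlex.)
        if any(ch in word for ch in "*?["):
            matches = sorted(glob.glob(word))
            # If nothing matches, keep the pattern as-is.
            expanded.extend(matches or [word])
        else:
            expanded.append(word)
    return expanded[0], expanded[1:]

print(parse_command("python3 hello.py"))
```

If you run parse_command("python3 *.py") in a directory that only contains hello.py, the pattern is replaced by that one filename.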
2: The shell figures out the full path to python3
Now we know we need to run python3. But what’s the full path to that binary? The way this works is that there’s a special environment variable named PATH.
See for yourself: Run echo $PATH in your shell. For me it looks like this.
$ echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
When you run a command, the shell will search every directory in that list (in order) to try to find a match.
In fish (my shell), you can see the path resolution logic here.
It uses the stat system call to check if files exist.
See for yourself: Run strace -e stat bash, and then run a command like python3. You should see output like this:
stat("/usr/local/sbin/python3", 0x7ffcdd871f40) = -1 ENOENT (No such file or directory)
stat("/usr/local/bin/python3", 0x7ffcdd871f40) = -1 ENOENT (No such file or directory)
stat("/usr/sbin/python3", 0x7ffcdd871f40) = -1 ENOENT (No such file or directory)
stat("/usr/bin/python3", {st_mode=S_IFREG|0755, st_size=5479736, ...}) = 0
You can see that it finds the binary at /usr/bin/python3 and stops: it
doesn’t continue searching /sbin or /bin.
(if this doesn’t work for you, instead try strace -o out bash, and then grep
stat out. One reader mentioned that their version of libc uses a different
system call instead of stat)
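The search described above can be sketched in a few lines of Python. This is a simplified version under some assumptions: find_in_path is a made-up name, and the “is it executable?” check just looks at the permission bits, which is not exactly what any particular shell does. (Python’s standard library has a more careful version of this in shutil.which.)

```python
import os
import stat

def find_in_path(name, path=None):
    """Return the first executable called `name` on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or ".", name)
        try:
            info = os.stat(candidate)   # the same kind of check as in the strace output
        except OSError:
            continue                    # ENOENT and friends: try the next directory
        # Accept regular files with at least one execute bit set.
        if stat.S_ISREG(info.st_mode) and info.st_mode & 0o111:
            return candidate            # stop at the first match
    return None

print(find_in_path("python3"))
```

Like the shell, it stops at the first match and never looks at the directories later in the list.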
2.1: A note on execvp
If you want to run the same PATH searching logic as the shell does without
reimplementing it yourself, you can use the libc function execvp (or one of
the other exec* functions with p in the name).
3: stat, under the hood
Now you might be wondering – Julia, what is stat doing? Well, when your OS opens a file, it’s split into 2 steps.
- It maps the filename to an inode, which contains metadata about the file
- It uses the inode to get the file’s contents
The stat system call just returns the contents of the file’s inode – it
doesn’t read the file’s contents at all. The advantage of this is that it’s a lot
faster. Let’s go on a short adventure into inodes. (this great post “A disk is a bunch of bits” by Dmitry Mazin has more details)
$ stat /usr/bin/python3
File: /usr/bin/python3 -> python3.9
Size: 9 Blocks: 0 IO Block: 4096 symbolic link
Device: fe01h/65025d Inode: 6206 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2023-08-03 14:17:28.890364214 +0000
Modify: 2021-04-05 12:00:48.000000000 +0000
Change: 2021-06-22 04:22:50.936969560 +0000
Birth: 2021-06-22 04:22:50.924969237 +0000
See for yourself: Let’s go see where exactly that inode is on our hard drive.
First, we have to find our hard drive’s device name
$ df
...
tmpfs 100016 604 99412 1% /run
/dev/vda1 25630792 14488736 10062712 60% /
...
Looks like it’s /dev/vda1. Next, let’s find out where the inode for /usr/bin/python3 is on our hard drive:
$ sudo debugfs /dev/vda1
debugfs 1.46.2 (28-Feb-2021)
debugfs: imap /usr/bin/python3
Inode 6206 is part of block group 0
located at block 658, offset 0x0d00
I have no idea how debugfs is figuring out the location of the inode for that filename, but we’re going to leave that alone.
Now, we need to calculate how many bytes into our hard drive “block 658, offset 0x0d00” is on the big array of bytes that is your hard drive. Each block is 4096 bytes, so we need to go 4096 * 658 + 0x0d00 bytes. A calculator tells me that’s 2698496
$ sudo dd if=/dev/vda1 bs=1 skip=2698496 count=256 2>/dev/null | hexdump -C
00000000 ff a1 00 00 09 00 00 00 f8 b6 cb 64 9a 65 d1 60 |...........d.e.`|
00000010 f0 fb 6a 60 00 00 00 00 00 00 01 00 00 00 00 00 |..j`............|
00000020 00 00 00 00 01 00 00 00 70 79 74 68 6f 6e 33 2e |........python3.|
00000030 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |9...............|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000060 00 00 00 00 12 4a 95 8c 00 00 00 00 00 00 00 00 |.....J..........|
00000070 00 00 00 00 00 00 00 00 00 00 00 00 2d cb 00 00 |............-...|
00000080 20 00 bd e7 60 15 64 df 00 00 00 00 d8 84 47 d4 | ...`.d.......G.|
00000090 9a 65 d1 60 54 a4 87 dc 00 00 00 00 00 00 00 00 |.e.`T...........|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Neat! There’s our inode! You can see it says python3 in it, which is a really
good sign. We’re not going to go through all of this, but the ext4 inode struct from the Linux kernel
says that the first 16 bits are the “mode”, or permissions. So let’s work out how ffa1 corresponds to file permissions.
- The bytes ffa1 correspond to the number 0xa1ff, or 41471 (because x86 is little endian)
- 41471 in octal is 0120777
- This is a bit weird – that file’s permissions could definitely be 777, but what are the first 3 digits? I’m not used to seeing those! You can find out what the 012 means in man inode (scroll down to “The file type and mode”). There’s a little table that says 012 means “symbolic link”.
Let’s list the file and see if it is in fact a symbolic link with permissions 777:
$ ls -l /usr/bin/python3
lrwxrwxrwx 1 root root 9 Apr 5 2021 /usr/bin/python3 -> python3.9
It is! Hooray, we decoded it correctly.
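If you’d rather not do the arithmetic by hand, here’s a small Python check of both calculations: the byte offset of the inode and the decoding of the mode bytes. The numbers are the ones from the debugfs and hexdump output above; the stat module is Python’s, and it knows the same file type constants that man inode describes.

```python
import stat

BLOCK_SIZE = 4096  # block size used in the calculation above

# "block 658, offset 0x0d00" as a byte offset into the partition
offset = BLOCK_SIZE * 658 + 0x0D00

# The first two bytes of the inode, in the order they appear on disk
raw_mode = bytes.fromhex("ffa1")
mode = int.from_bytes(raw_mode, "little")

print(offset)                    # 2698496
print(hex(mode), oct(mode))      # 0xa1ff 0o120777
print(stat.S_ISLNK(mode))        # True
print(oct(stat.S_IMODE(mode)))   # 0o777
print(stat.filemode(mode))       # lrwxrwxrwx
```

The last line is the same string that ls -l prints at the start of its output.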
4: Time to fork
We’re still not ready to start python3. First, the shell needs to create a
new child process to run. The way new processes start on Unix is a little weird
– first the process clones itself, and then runs execve, which replaces the
cloned process with a new process.
See for yourself: Run strace -e clone bash, then run python3. You should see something like this:
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f03788f1a10) = 3708100
3708100 is the PID of the new process, which is a child of the shell process.
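Here’s a minimal sketch of that “clone yourself, then exec” pattern in Python, using os.fork and os.execvp. It’s not what bash or fish actually do internally (they handle job control, signals, and a lot more), and run_child is a made-up name. On Linux with glibc, the os.fork call typically shows up as a clone system call if you strace it.

```python
import os
import sys

def run_child(argv):
    """Fork, exec argv[0] in the child, and return the child's exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this copy of the process with the new program.
        try:
            os.execvp(argv[0], argv)
        finally:
            # Only reached if exec failed (for example, command not found).
            os._exit(127)
    # Parent: wait for the child, like a shell does for a foreground command.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1

if __name__ == "__main__":
    print(run_child([sys.executable, "-c", "print('hello from the child')"]))
```

The value returned by os.fork in the parent is the child’s PID, the same number that clone returns in the strace output above.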
Some more tools to look at what’s going on with processes:
- pstree will show you a tree of all the processes on your system
- cat /proc/PID/stat shows you some information about the process. The contents of that file are documented in man proc. For example the 4th field is the parent PID.
4.1: What the new process inherits.
The new process (which will become python3) has inherited a bunch of things from the shell. For example, it’s inherited:
- environment variables: you can look at them with cat /proc/PID/environ | tr '\0' '\n'
- file descriptors for stdout and stderr: look at them with ls -l /proc/PID/fd
- a working directory (whatever the current directory is)
- namespaces and cgroups (if it’s in a container)
- the user and group that’s running it
- probably more things I’m not thinking of right now
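You can watch two of these get inherited with a small Python experiment: the parent sets an environment variable, starts a child process, and the child reports what it sees. DEMO_VAR and child_view are made-up names for this sketch, and it only checks the environment and the working directory, not the other things on the list.

```python
import json
import os
import subprocess
import sys

# Code the child runs: report one environment variable and the working directory.
CHILD_CODE = (
    "import json, os; "
    "print(json.dumps({'demo': os.environ.get('DEMO_VAR'), 'cwd': os.getcwd()}))"
)

def child_view():
    """Start a child Python process and return what it reports."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)

os.environ["DEMO_VAR"] = "set by the parent"
print(child_view())
```

The child never sets DEMO_VAR or changes directory itself, so anything it reports came from the parent.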
5: The shell calls execve
Now we’re ready to start the Python interpreter!
See for yourself: Run strace -f -e execve bash, then run python3. The -f is important because we want to follow any forked child subprocesses. You should see something like this:
[pid 3708381] execve("/usr/bin/python3", ["python3"], 0x560397748300 /* 21 vars */) = 0
The first argument is the binary, and the second argument is the list of command line arguments. The command line arguments get placed in a special location in the program’s memory so that it can access them when it runs.
Now, what’s going on inside execve?
6: get the binary’s contents
The first thing that has to happen is that we need to open the python3
binary file and read its contents. So far we’ve only used the stat system call to access its metadata,
but now we need its contents.
Let’s look at the output of stat again:
$ stat /usr/bin/python3
File: /usr/bin/python3 -> python3.9
Size: 9 Blocks: 0 IO Block: 4096 symbolic link
Device: fe01h/65025d Inode: 6206 Links: 1
...
This takes up 0 blocks of space on the disk. This is because the contents of
the symbolic link (python3.9) are actually in the inode itself: you can see
them here (from the binary contents of the inode above, it’s split across 2
lines in the hexdump output):
00000020 00 00 00 00 01 00 00 00 70 79 74 68 6f 6e 33 2e |........python3.|
00000030 39 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |9...............|
So we’ll need to open /usr/bin/python3.9 instead. All of this is happening
inside the kernel so you won’t see another system call for that.
Every file is made up of a bunch of blocks on the hard drive. I think each of these blocks on my system is 4096 bytes, so the minimum size of a file is 4096 bytes – even if the file is only 5 bytes, it still takes up 4KB on disk.
See for yourself: We can find the block numbers using debugfs like this: (again, I got these instructions from dmitry mazin’s “A disk is a bunch of bits” post)
$ debugfs /dev/vda1
debugfs: blocks /usr/bin/python3.9
145408 145409 145410 145411 145412 145413 145414 145415 145416 145417 145418 145419 145420 145421 145422 145423 145424 145425 145426 145427 145428 145429 145430 145431 145432 145433 145434 145435 145436 145437
Now we can use dd to read the first block of the file. We’ll set the block size to 4096 bytes, skip 145408 blocks, and read 1 block.
$ dd if=/dev/vda1 bs=4096 skip=145408 count=1 2>/dev/null | hexdump -C | head
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 c0 a5 5e 00 00 00 00 00 |..>.......^.....|
00000020 40 00 00 00 00 00 00 00 b8 95 53 00 00 00 00 00 |@.........S.....|
00000030 00 00 00 00 40 00 38 00 0b 00 40 00 1e 00 1d 00 |....@.8...@.....|
00000040 06 00 00 00 04 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000050 40 00 40 00 00 00 00 00 40 00 40 00 00 00 00 00 |@.@.....@.@.....|
00000060 68 02 00 00 00 00 00 00 68 02 00 00 00 00 00 00 |h.......h.......|
00000070 08 00 00 00 00 00 00 00 03 00 00 00 04 00 00 00 |................|
00000080 a8 02 00 00 00 00 00 00 a8 02 40 00 00 00 00 00 |..........@.....|
00000090 a8 02 40 00 00 00 00 00 1c 00 00 00 00 00 00 00 |..@.............|
You can see that we get the exact same output as if we read the file with cat, like this:
$ cat /usr/bin/python3.9 | hexdump -C | head
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 c0 a5 5e 00 00 00 00 00 |..>.......^.....|
00000020 40 00 00 00 00 00 00 00 b8 95 53 00 00 00 00 00 |@.........S.....|
00000030 00 00 00 00 40 00 38 00 0b 00 40 00 1e 00 1d 00 |....@.8...@.....|
00000040 06 00 00 00 04 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000050 40 00 40 00 00 00 00 00 40 00 40 00 00 00 00 00 |@.@.....@.@.....|
00000060 68 02 00 00 00 00 00 00 68 02 00 00 00 00 00 00 |h.......h.......|
00000070 08 00 00 00 00 00 00 00 03 00 00 00 04 00 00 00 |................|
00000080 a8 02 00 00 00 00 00 00 a8 02 40 00 00 00 00 00 |..........@.....|
00000090 a8 02 40 00 00 00 00 00 1c 00 00 00 00 00 00 00 |..@.............|
an aside on magic numbers
This file starts with ELF, which is a “magic number”, or a byte sequence that
tells us that this is an ELF file. ELF is the binary file format on Linux.
Different file formats have different magic numbers, for example the magic
number for gzip is 1f8b. The magic number at the beginning is how file blah.gz knows that it’s a gzip file.
I think file has a variety of heuristics for figuring out the file type of a
file, not just magic numbers, but the magic number is an important one.
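Checking a magic number is just comparing the first few bytes of a file. Here’s a tiny sketch in Python that knows only the two magic numbers mentioned above. The sniff function and its table are made up for this example, and the real file command checks far more than this.

```python
import gzip

MAGIC_NUMBERS = {
    b"\x7fELF": "ELF",
    b"\x1f\x8b": "gzip",
}

def sniff(data):
    """Guess a file type from the first few bytes, or return None."""
    for magic, name in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return name
    return None

print(sniff(gzip.compress(b"hello world")))    # gzip
print(sniff(bytes.fromhex("7f454c46020101")))  # ELF
```

The second example uses the first 7 bytes from the hexdump of python3.9 above.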
7: find the interpreter
Let’s parse the ELF file to see what’s in there.
See for yourself: Run readelf -a /usr/bin/python3.9. Here’s what I get (though I’ve redacted a LOT of stuff):
$ readelf -a /usr/bin/python3.9
ELF Header:
Class: ELF64
Machine: Advanced Micro Devices X86-64
...
-> Entry point address: 0x5ea5c0
...
Program Headers:
Type Offset VirtAddr PhysAddr
INTERP 0x00000000000002a8 0x00000000004002a8 0x00000000004002a8
0x000000000000001c 0x000000000000001c R 0x1
-> [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
...
-> 1238: 00000000005ea5c0 43 FUNC GLOBAL DEFAULT 13 _start
Here’s what I understand of what’s going on here:
- it’s telling the kernel to run /lib64/ld-linux-x86-64.so.2 to start this program. This is called the dynamic linker and we’ll talk about it next
- it’s specifying an entry point (at 0x5ea5c0, which is where this program’s code starts)
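The entry point is sitting right there in the bytes we read from disk earlier. Here’s a sketch that parses the start of the ELF header with Python’s struct module, using the first 4 lines of the hexdump of python3.9 from above. It assumes the 64-bit little-endian header layout described in man elf and ignores most of the fields; parse_elf64_header is a made-up name.

```python
import struct

# First 64 bytes of /usr/bin/python3.9, copied from the hexdump earlier
HEADER = bytes.fromhex(
    "7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 "
    "02 00 3e 00 01 00 00 00 c0 a5 5e 00 00 00 00 00 "
    "40 00 00 00 00 00 00 00 b8 95 53 00 00 00 00 00 "
    "00 00 00 00 40 00 38 00 0b 00 40 00 1e 00 1d 00"
)

def parse_elf64_header(data):
    """Pull a few fields out of a 64-bit little-endian ELF header."""
    # e_ident (16 bytes), e_type, e_machine, e_version, e_entry, e_phoff, e_shoff
    ident, _type, machine, _version, entry, phoff, _shoff = struct.unpack_from(
        "<16sHHIQQQ", data
    )
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    return {"machine": machine, "entry": entry, "phoff": phoff}

info = parse_elf64_header(HEADER)
print(hex(info["entry"]))   # 0x5ea5c0
print(hex(info["machine"])) # 0x3e
```

0x5ea5c0 matches the “Entry point address” line from readelf, and machine 0x3e is the number for x86-64.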
Now let’s talk about the dynamic linker.
8: dynamic linking
Okay! We’ve read the bytes from disk and we’ve started this “interpreter” thing. What next? Well, if you run strace -o out.strace python3, you’ll see a bunch of stuff like this right after the execve system call:
execve("/usr/bin/python3", ["python3"], 0x560af13472f0 /* 21 vars */) = 0
brk(NULL) = 0xfcc000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=32091, ...}) = 0
mmap(NULL, 32091, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f718a1e3000
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 l\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=149520, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f718a1e1000
...
close(3) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
This all looks a bit intimidating at first, but the part I want you to pay
attention to is openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0".
This is opening a C threading library called pthread that the Python
interpreter needs to run.
See for yourself: If you want to know which libraries a binary needs to load at runtime, you can use ldd. Here’s what that looks like for me:
$ ldd /usr/bin/python3.9
linux-vdso.so.1 (0x00007ffc2aad7000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2fd6554000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2fd654e000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f2fd6549000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2fd6405000)
libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f2fd63d6000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2fd63b9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2fd61e3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2fd6580000)
You can see that the first library listed is /lib/x86_64-linux-gnu/libpthread.so.0, which is why it was loaded first.
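If you want to use that output in a script, the format is regular enough to parse. Here’s a small sketch that turns ldd-style output into a dictionary, using a few of the lines above as sample input. parse_ldd is a made-up helper, and it assumes the “name => path (address)” layout shown here, which may differ on other systems.

```python
SAMPLE = """\
linux-vdso.so.1 (0x00007ffc2aad7000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2fd6554000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2fd61e3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2fd6580000)
"""

def parse_ldd(output):
    """Map each library name to the path after "=>" (None if there isn't one)."""
    libraries = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if "=>" in line:
            name, _, rest = line.partition("=>")
            path = rest.split("(")[0].strip()
            libraries[name.strip()] = None if path in ("", "not found") else path
        else:
            # linux-vdso and the dynamic linker itself have no "=>" part.
            libraries[line.split("(")[0].strip()] = None
    return libraries

for name, path in parse_ldd(SAMPLE).items():
    print(name, "->", path)
```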
on LD_LIBRARY_PATH
I’m honestly still a little confused about dynamic linking. Some things I know:
- Dynamic linking happens in userspace and the dynamic linker on my system is at /lib64/ld-linux-x86-64.so.2. If you’re missing the dynamic linker, you can end up with weird bugs like this weird “file not found” error
- The dynamic linker uses the LD_LIBRARY_PATH environment variable to find libraries
- The dynamic linker will also use the LD_PRELOAD environment variable to override any dynamically linked function you want (you can use this for fun hacks, or to replace your default memory allocator with an alternative one like jemalloc)
- there are some mprotects in the strace output which are marking the library code as read-only, for security reasons
- on Mac, it’s DYLD_LIBRARY_PATH instead of LD_LIBRARY_PATH
You might be wondering – if dynamic linking happens in userspace, why don’t we
see a bunch of stat system calls where it’s searching through
LD_LIBRARY_PATH for the libraries, the way we did when bash was searching the
PATH?
That’s because ld has a cache in /etc/ld.so.cache, and all of those
libraries have already been found in the past. You can see it opening the cache
in the strace output – openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3.
There are still a bunch of system calls after dynamic linking in the full strace output that I
still don’t really understand (what’s prlimit64 doing? where does the locale
stuff come in? what’s gconv-modules.cache? what’s rt_sigaction doing?
what’s arch_prctl? what’s set_tid_address and set_robust_list?). But this feels like a good start.
aside: ldd is actually a simple shell script!
Someone on mastodon pointed out that ldd is actually a shell script
that just sets the LD_TRACE_LOADED_OBJECTS=1 environment variable and
starts the program. So you can do exactly the same thing like this:
$ LD_TRACE_LOADED_OBJECTS=1 python3
linux-vdso.so.1 (0x00007ffe13b0a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f01a5a47000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f01a5a41000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f2fd6549000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2fd6405000)
libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007f2fd63d6000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2fd63b9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2fd61e3000)
/lib64/ld-linux-x86-64.so.2 (0x00007f2fd6580000)
Apparently ld is also a binary you can just run, so /lib64/ld-linux-x86-64.so.2 --list /usr/bin/python3.9 also does the same thing.
on init and fini
Let’s talk about this line in the strace output:
set_tid_address(0x7f58880dca10) = 3709103
This seems to have something to do with threading, and I think this might be
happening because the pthread library (and every other dynamically loaded library)
gets to run initialization code when it’s loaded. The code that runs when the
library is loaded is in the .init section (or maybe also the .ctors section).
See for yourself: Let’s take a look at that using readelf:
$ readelf -a /lib/x86_64-linux-gnu/libpthread.so.0
...
[10] .rela.plt RELA 00000000000051f0 000051f0
00000000000007f8 0000000000000018 AI 4 26 8
[11] .init PROGBITS 0000000000006000 00006000
000000000000000e 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 0000000000006010 00006010
0000000000000560 0000000000000010 AX 0 0 16
...
This library doesn’t have a .ctors section, just an .init. But what’s in
that .init section? We can use objdump to disassemble the code:
$ objdump -d /lib/x86_64-linux-gnu/libpthread.so.0
Disassembly of section .init:
0000000000006000 <_init>:
6000: 48 83 ec 08 sub $0x8,%rsp
6004: e8 57 08 00 00 callq 6860 <__pthread_initialize_minimal>
6009: 48 83 c4 08 add $0x8,%rsp
600d: c3
So it’s calling __pthread_initialize_minimal. I found the code for that function in glibc,
though I had to find an older version of glibc because it looks like in more
recent versions libpthread is no longer a separate library.
I’m not sure whether this set_tid_address system call actually comes from
__pthread_initialize_minimal, but at least we’ve learned that libraries can
run code on startup through the .init section.
Here’s a note from man elf on the .init section:
$ man elf
.init This section holds executable instructions that contribute to the process initialization code. When a program starts to run
the system arranges to execute the code in this section before calling the main program entry point.
There’s also a .fini section in the ELF file that runs at the end, and
.ctors / .dtors (constructors and destructors) are other sections that
could exist.
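If you want to check which of these sections a given binary or library has without reading through the whole readelf output, here's a minimal sketch that parses the ELF section header table by hand. It assumes a little-endian 64-bit ELF file (which is what you get on x86_64 Linux), and the function name elf_sections is just something made up for this example.

```python
import struct
import sys


def elf_sections(path):
    """Return {section name: (address, size)} for a little-endian 64-bit ELF file."""
    with open(path, "rb") as f:
        data = f.read()
    # e_ident: magic bytes, then class (2 = 64-bit) and data encoding (1 = little-endian)
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian 64-bit ELF file")

    # 64-byte ELF header; we only need the section header table fields
    fields = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
    shoff = fields[6]                        # file offset of the section header table
    shentsize, shnum, shstrndx = fields[11:14]

    def header(i):
        # One 64-byte section header: name, type, flags, addr, offset, size, ...
        return struct.unpack_from("<IIQQQQIIQQ", data, shoff + i * shentsize)

    # Section names live in a string table that is itself a section
    strtab_offset = header(shstrndx)[4]
    sections = {}
    for i in range(shnum):
        name_off, _type, _flags, addr, _offset, size, *_rest = header(i)
        start = strtab_offset + name_off
        end = data.index(b"\0", start)
        sections[data[start:end].decode()] = (addr, size)
    return sections


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    sections = elf_sections(path)
    for name in (".init", ".fini", ".ctors", ".dtors"):
        if name in sections:
            addr, size = sections[name]
            print(f"{name}: address {addr:#x}, size {size:#x}")
        else:
            print(f"{name}: not present")
```

Run against the libpthread above, you'd expect .init to show up with the same address and size readelf printed. Which sections exist will vary between binaries.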
Okay, that’s enough about dynamic linking.
9: go to _start
After dynamic linking is done, we go to _start in the Python interpreter.
Then it does all the normal Python interpreter things you’d expect.
I’m not going to talk about this because here I’m interested in general facts about how binaries are run on Linux, not the Python interpreter specifically.
10: write a string
We still need to print out “hello world” though. Under the hood, the Python print function calls some function from libc. But which one? Let’s find out!
See for yourself: Run ltrace -o out python3 hello.py.
$ ltrace -o out python3 hello.py
$ grep hello out
write(1, "hello world\n", 12) = 12
So it looks like it’s calling write.
I honestly am always a little suspicious of ltrace – unlike strace (which I
would trust with my life), I’m never totally sure that ltrace is actually
reporting library calls accurately. But in this case it seems to be working. And
if we look at the cpython source code, it does seem to be calling write() in some places. So I’m willing to believe that.
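You can also skip print entirely and ask for the write yourself. Here's a small sketch using os.write, which as far as I can tell is a thin wrapper around the libc write function. The function name hello_via_write is made up for this example.

```python
import os
import sys


def hello_via_write(fd=1):
    """Write "hello world\\n" straight to a file descriptor, bypassing print()."""
    message = b"hello world\n"
    # os.write returns the number of bytes actually written
    return os.write(fd, message)


if __name__ == "__main__":
    # Flush anything print() may have buffered so the output order stays sane
    sys.stdout.flush()
    hello_via_write()
```

The return value should be 12, the same number that shows up in the ltrace line above: 11 characters plus the newline.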
what’s libc?
We just said that Python calls the write function from libc. What’s libc?
It’s the C standard library, and it’s responsible for a lot of basic things
like:
- allocating memory with malloc
- file I/O (opening/closing/...)
- executing programs (with execvp, like we mentioned before)
- looking up DNS records with getaddrinfo
- managing threads with pthread
Programs don’t have to use libc (on Linux, Go famously doesn’t use it and calls Linux system calls directly instead), but the other programming languages I use (node, Python, Ruby, Rust) all use libc. I’m not sure about Java.
You can find out if you’re using libc by running ldd on your binary: if you
see something like libc.so.6, that’s libc.
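To make "libc is just a library with functions in it" a bit more concrete, here's a sketch that calls two libc functions directly from Python with ctypes. It assumes a dynamically linked Python on Linux, where ctypes.CDLL(None) gives access to the symbols already loaded into the process (including libc). The helper names libc_getpid and libc_strlen are made up for this example.

```python
import ctypes
import os

# CDLL(None) looks up symbols already loaded into this process,
# which on a typical Linux Python includes everything in libc.
libc = ctypes.CDLL(None)

# Tell ctypes the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def libc_getpid():
    """Call libc's getpid() directly instead of going through os.getpid()."""
    return libc.getpid()


def libc_strlen(s):
    """Call libc's strlen() on a bytes object."""
    return libc.strlen(s)


if __name__ == "__main__":
    print(libc_getpid(), os.getpid())
    print(libc_strlen(b"hello world\n"))
```

Both getpid calls should print the same number, since os.getpid presumably ends up in the same libc function.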
why does libc matter?
You might be wondering – why does it matter that Python calls the libc write
and then libc calls the write system call? Why am I making a point of saying
that libc is in the middle?
I think in this case it doesn’t really matter (AFAIK the write libc function
maps pretty directly to the write system call).
But there are different libc implementations, and sometimes they behave differently. The two main ones are glibc (GNU libc) and musl libc.
For example, until recently musl’s getaddrinfo didn’t support TCP DNS. Here’s a blog post talking about a bug that this caused.
a little detour into stdout and terminals
In this program, stdout (the 1 file descriptor) is a terminal. And you can do
funny things with terminals! Here’s one:
- In a terminal, run ls -l /proc/self/fd/1. I get /dev/pts/2
- In another terminal window, write echo hello > /dev/pts/2
- Go back to the original terminal window. You should see hello printed there!
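The same check as the ls -l step can be done from inside a program. Here's a minimal sketch (Linux only, because it reads /proc) that reports where a file descriptor points and whether it's a terminal. The function name describe_fd is made up for this example.

```python
import os


def describe_fd(fd):
    """Return (target, is_terminal) for a file descriptor of this process."""
    # /proc/self/fd/N is a symlink to whatever the descriptor refers to
    target = os.readlink(f"/proc/self/fd/{fd}")
    return target, os.isatty(fd)


if __name__ == "__main__":
    target, is_terminal = describe_fd(1)
    print(f"stdout -> {target} (terminal: {is_terminal})")
```

Run directly in a terminal, you'd expect something like /dev/pts/2 and True. If you pipe the output into another program, the target should look like pipe:[...] and the terminal check should come back False.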
that’s all for now!
Hopefully you have a better idea of how hello world gets printed! I’m going to stop
adding more details for now because this is already pretty long, but obviously there’s
more to say and I might add more if folks chip in with extra details. I’d
especially love suggestions for other tools you could use to inspect parts of
the process that I haven’t explained here.
Thanks to everyone who suggested corrections / additions – I’ve edited this blog post a lot to incorporate more things :)
Some things I’d like to add if I can figure out how to spy on them:
- the kernel loader and ASLR (I haven’t figured out yet how to use bpftrace + kprobes to trace the kernel loader’s actions)
- TTYs (I haven’t figured out how to trace the way write(1, "hello world", 11) gets sent to the TTY that I’m looking at)
I’d love to see a Mac version of this
One of my frustrations with Mac OS is that I don’t know how to introspect my
system on this level – when I print hello world, I can’t figure out how to
spy on what’s going on behind the scenes the way I can on Linux. I’d love to
see a really in depth explainer.
Some Mac equivalents I know about:
- ldd -> otool -L
- readelf -> otool
- supposedly you can use dtruss or dtrace on mac instead of strace but I’ve never been brave enough to turn off system integrity protection to get it to work
- strace -> sc_usage seems to be able to collect stats about syscall usage, and fs_usage about file usage
more reading
Some more links:
- A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux
- an exploration of “hello world” on FreeBSD
- hello world under the microscope for Windows
- From LWN: how programs get run (and part two) have a bunch more details on the internals of execve
- Putting the “You” in CPU by Lexi Mattick
- “Hello, world” from scratch on a 6502 (video from Ben Eater)
Post by post.
Life after Twitter remains, well, weird. Maybe this is better.