Reading List
The most recent articles from a list of feeds I subscribe to.
Write good examples by starting with real code
When I write about programming, I spend a lot of time trying to come up with good examples. I haven’t seen a lot written about how to make examples, so here’s a little bit about my approach to writing examples!
The basic idea here is to start with real code that you wrote and then remove irrelevant details to make it into a self-contained example instead of coming up with examples out of thin air.
I’ll talk about two kinds of examples: realistic examples and surprising examples.
good examples are realistic
To see why examples should be realistic, let’s first talk about an unrealistic
example! Let’s say we’re trying to explain Python lambdas (which is just the
first concept I thought of). You could give this example, of using map and a
lambda to square a set of numbers.
numbers = [1, 2, 3, 4]
squares = map(lambda x: x * x, numbers)
I think this example is unrealistic for a couple of reasons:
- squaring a set of numbers isn’t something you’re super likely to do in a real program unless it’s for Project Euler or something (there are LOTS of operations on lists that are a lot more likely)
- This usage of map is not idiomatic Python; even if you were doing this I would write [x*x for x in numbers] instead
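Here’s a small side-by-side sketch of those two spellings, just to show that they produce the same list. The variable names are mine, and the only thing to watch for is that map returns a lazy iterator in Python 3, so it needs list() around it before you can look at the values.

```python
numbers = [1, 2, 3, 4]

# map returns a lazy iterator in Python 3, so wrap it in list() to see the values
squares_with_map = list(map(lambda x: x * x, numbers))

# the list comprehension builds the same list directly
squares_with_comprehension = [x * x for x in numbers]

print(squares_with_map)
print(squares_with_comprehension)
```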
A more realistic example of Python lambdas is using them with sort, like this:
children = [{"name": "ashwin", "age": 12}, {"name": "radhika", "age": 3}]
sorted_children = sorted(children, key=lambda x: x['age'])
But this example is still pretty contrived (why exactly do we need to sort these children by age?). So how do we actually make realistic examples?
how to make your examples realistic: look at actual code you wrote
I think the easiest way to make realistic examples is, instead of pulling an
example out of thin air (like I did with that children example), to just
start by looking at real code!
For example, if I grep a bunch of Python code I wrote for sort.+key, I find
LOTS of real examples of me sorting a list by some criterion, like:
tasks.sort(key=lambda task: task['completed_time'])
emails = reversed(sorted(emails, key=lambda x:x['receivedAt']))
sorted_keysizes = sorted(scores.keys(), key=scores.get)
shows = sorted(dates[date], key=lambda x: x['time']['performanceTime'])
It’s pretty easy to see a pattern here – a lot of these are sorting by time! So now we can make a simple realistic example of sorting some objects (emails, events, etc) by time, like sorting some calendar events by their unix timestamp:
events = [
{ 'date': 1625837042, 'name': 'birthday party'},
{ 'date': 1620581136, 'name': 'dinner with Yifei'},
{ 'date': 1589045136, 'name': 'dentist appointment'},
]
sorted_events = sorted(events, key=lambda x: x['date'])
I think this is more realistic than the “sort children by age” example, and it’s just as simple!
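If you’d rather do the “search my old code” step from Python instead of grep, here’s a rough sketch. It uses the same sort.+key pattern; the function names and the sample lines are just made up for illustration, and find_in_directory assumes your code lives in .py files under whatever directory you pass it.

```python
import re
from pathlib import Path

# the same pattern as the grep: "sort", then at least one character, then "key"
SORT_WITH_KEY = re.compile(r"sort.+key")


def find_sort_examples(lines):
    """Return the (stripped) lines that match the pattern."""
    return [line.strip() for line in lines if SORT_WITH_KEY.search(line)]


def find_in_directory(root):
    """Search every .py file under root and collect the matching lines."""
    matches = []
    for path in Path(root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        matches.extend(find_sort_examples(text.splitlines()))
    return matches


# a few sample lines standing in for real files
sample = [
    "tasks.sort(key=lambda task: task['completed_time'])",
    "total = sum(values)",
    "shows = sorted(dates[date], key=lambda x: x['time']['performanceTime'])",
]
for line in find_sort_examples(sample):
    print(line)
```

Once you have the matches, the useful part is still reading through them by hand to spot what they have in common.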
realistic examples help “sell” the concept you’re trying to explain
When I’m trying to explain an idea (like Python lambdas), I’m usually also trying to convince the reader that it’s worth learning! Python lambdas are super useful! And to convince someone that lambdas are useful, it really helps to show someone how lambdas could help them do a task that they could actually imagine themselves doing, and ideally a task that they’ve done before.
distilling down examples from real code can take a long time
The example I just gave of explaining how to use sort with lambda is
pretty simple and it didn’t take me a long time to come up with, but turning
real code into a standalone example can take a really long time!
For example, I was thinking of including an example of some weird CSS behaviour in this post to illustrate how it’s fun to create examples with weird or surprising behaviour. I spent 2 hours taking a real problem I had this week, making sure I understood what was actually happening with the CSS, and making it into a minimal example.
In the end it “just” took 5 lines of HTML and a tiny bit of CSS to demonstrate the problem and it doesn’t really look like it took hours to write. But originally it was hundreds of lines of HTML/CSS/JavaScript, and it takes time to untangle all that and come up with something small that gets at the heart of the issue!
But I think it’s worth it to take the time to make examples really clear and minimal – if hundreds of people are reading your example, you’re saving them all so much time!
that’s all for now!
I think there’s a lot more to say about examples – for instance I think there are a few different types of useful examples, like:
- examples that are surprising to the reader, which are more about changing someone’s mental model than providing code to use directly
- examples that are easy to copy and paste to use as a starting point
but maybe I’ll write about that another day. :)
Reasons why bugs might feel "impossible"
Hello! I’m very slowly working on writing a zine about debugging, so I asked on Twitter the other day:
If you’ve run into a bug where it felt “impossible” to understand what was happening – what made it feel that way?
Of course, bugs always happen for logical reasons, but I’ve definitely run into bugs that felt like they might be impossible for me to understand (until I figured them out!)
I got about 400 responses, which I’ll try to summarize here. I’m not going to talk about how to deal with these various kinds of “impossible” bugs in this post, I’ll just try to classify them.
Here are the categories I came up with for ways a bug might feel impossible to understand. Each one of them has a bunch of sub variants which are bolded below.
- it’s hard to reproduce
- you don’t understand the overall system well
- it’s hard to get data about the bug
- one of your assumptions is wrong
- the bug is really complex
1. the bug is hard to reproduce locally
I thought this description was really great:
The ones that make me contemplate a career change are usually bugs that are only happen to a few users, can’t be reproduced consistently by users or at all in-house, and have slightly varying descriptions in each bug report (kinda like Bigfoot sightings).
Here are some specific ways a bug can be hard to reproduce:
the bug is nondeterministic
You run your program with the exact same inputs 1000 times, and it only fails once. This happens a lot with race conditions in multithreaded programs.
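Here’s a minimal sketch of what that kind of race can look like in Python. Everything in it (the run function, the counter, the thread counts) is invented for illustration. Each thread reads a shared counter, lets other threads run, and then writes back its stale value plus one, so updates can get lost. How many get lost changes from run to run, which is exactly what makes it hard to reproduce.

```python
import threading
import time


def run(increments_per_thread, num_threads, use_lock):
    """Increment a shared counter from several threads and return the final value."""
    counter = {"value": 0}
    lock = threading.Lock()

    def increment():
        current = counter["value"]      # read
        time.sleep(0)                   # give other threads a chance to run
        counter["value"] = current + 1  # write back (may overwrite another thread's update)

    def worker():
        for _ in range(increments_per_thread):
            if use_lock:
                with lock:  # only one thread at a time does the read and the write
                    increment()
            else:
                increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


print("expected:    ", 4 * 1000)
print("without lock:", run(1000, 4, use_lock=False))  # can be lower, and varies between runs
print("with lock:   ", run(1000, 4, use_lock=True))
```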
the bug only happens in production
Lots of bugs are hard to reproduce in your dev environment, either because it’s hard to figure out exactly which inputs trigger the bug, or because they only happen under certain conditions (like a lot of traffic) which are hard to recreate.
you don’t have access to the machine where the bug is happening
Three examples of this:
- you’re shipping software (a binary or a website) that runs on your customer’s computer, they have a problem, and you don’t have direct access to their computer to see what’s going on.
- the problem involves a managed cloud service that you don’t have a lot of access to.
- the problem only happens on an input of data that you don’t have access to (perhaps because the data is classified/private)
you don’t have access to the data you need to reproduce the bug
One person mentioned a case where the bug was easy to reproduce, but the data they needed to reproduce it was confidential, so they weren’t allowed to have access to it.
it’s very slow to reproduce
There are bugs where you know exactly how to reproduce it, but it takes a long time (like 20 minutes or way longer) to reproduce the bug. This is hard because it’s hard to maintain your focus: maybe you can only try 1 experiment per day!
2. you don’t understand the overall system well
Even if you can reproduce the bug, if you don’t understand how the part of the program with the bug works, you can end up VERY stuck.
Some examples of this that came up:
unknown unknowns: the bug involves a system or concept you didn’t know about
Sometimes bugs are caused by a part of the system that you didn’t even know existed. For example, when I was debugging this TCP issue, I’d never heard of Nagle’s algorithm or delayed ACKs. So it was pretty difficult to recognize that they were causing the problem!
The only reason I was able to diagnose that bug was that someone at work had coincidentally posted a blog post about it and I remembered the symptoms were similar.
Here’s another example of this from the Twitter replies:
I was sending strings containing null bytes (long story) between two systems that support them, but in some cases, theres a step along the way that doesn’t support them
Another example of “the bug is in a surprising place” is this case of a bug in a scanner.
The next few sections are more specific ways confusion about how the program works can make a bug difficult to solve.
the bug is in an external library you don’t understand
Sometimes the bug is in a library or an open source program you’re completely unfamiliar with, but you have to fix it anyway. This makes debugging hard because:
- you need to learn how the library works
- it’s not always easy to modify the library and get your program to use your modified version of the library, so it’s hard to experiment and make changes or add extra instrumentation to the library
you don’t understand the error message at all
Some error messages initially seem totally incomprehensible. A couple of examples of this:
- “values of β may give rise to dom!”, from this talk by Mark Allen on that error message or
- “Size must be between 0 and 16793600(16MB) First element: oints” from the talk The tales of the cursed operating systems textbook by Kiran Bhattaram
- Some compiler error messages can be very confusing if you don’t know what they mean
These are tricky because it’s not clear where to start – what is β? What is this element oints doing here?
Another variant of this is debugging output that’s formatted in a confusing way.
you don’t know what keywords to search to get more information
One case that a lot of people mentioned is: you search for a keyword that you think is related to your bug, you get 10 million results, and none of them are helpful.
the bug is in a proprietary system
Figuring out an unfamiliar system is already hard, and it’s even worse when you can’t even read the source code!
the system is poorly documented
A few variants of this:
- there’s no documentation, or very sparse documentation
- the only information about the system is from someone you can’t contact – the person who does understand it has left the company, or you don’t know who they are, or they work at a company you can’t find any contact information for
- the information you need is in a 2000 page PDF and you don’t know where to start looking
3. it’s hard to get information about the program’s internal state
Even if you generally understand the system you’re working with and you can reproduce the bug, debugging is almost impossible if you can’t get enough information about the program’s internal state when the bug happens.
Here are a few specific reasons it can be hard to get data about the program’s internal state.
there’s no output at all
Your program failed, but there’s no output at all to read to tell you why it failed. Not even an error message! It just didn’t work.
This has happened to me before with operating systems bugs – my toy OS didn’t start and because it failed before I had any way of printing output, I had no idea what was wrong – it just didn’t work!
there’s way too much output
It’s also easy to drown in too much output – I’ve turned on debug output and then been totally overwhelmed by how much information there is. It’s very hard to tell what’s relevant and what’s irrelevant in a million log lines!
information about the bug is split across many places
When investigating a distributed systems bug, the log lines related to the bug are often spread across a bunch of different services. And sometimes there’s no request ID that you can use to easily figure out which log lines from service A corresponded to the exception you saw in service B.
So you end up spending a long time manually staring at logs and trying to correlate them. I’ve spent more of my life doing this than I’d prefer :)
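For contrast, here’s a sketch of how much easier the correlation gets when every service does log a request ID. The log format (a request_id=... field on each line) and the sample lines are completely hypothetical; real logs will look different.

```python
import re
from collections import defaultdict

# hypothetical log format: every line carries "request_id=<id>"
REQUEST_ID = re.compile(r"request_id=(\S+)")


def group_by_request_id(log_lines):
    """Map each request ID to all of the log lines that mention it."""
    grouped = defaultdict(list)
    for line in log_lines:
        match = REQUEST_ID.search(line)
        if match:
            grouped[match.group(1)].append(line)
    return dict(grouped)


service_a_logs = [
    "12:00:01 service=A request_id=abc123 received GET /orders",
    "12:00:02 service=A request_id=def456 received GET /users",
]
service_b_logs = [
    "12:00:01 service=B request_id=abc123 ERROR timeout talking to database",
]

grouped = group_by_request_id(service_a_logs + service_b_logs)

# everything that happened for the request that hit the exception in service B
for line in grouped["abc123"]:
    print(line)
```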
it’s not possible to use a debugger/add print statements
For example, if you want to know something about the state of your database (like Postgres), you’re definitely not going to attach a debugger to your production database, and you probably don’t want to recompile it to add extra logging information. (though I have definitely recompiled programs just to add extra logging information I needed!)
So you need to rely on the program’s existing logging mechanisms and hope that they have the information you need.
the bug goes away when you use a debugger
Here’s a story from the Twitter replies about that:
I had a bug in C++ code that would cause a seg fault. When I compiled with the debug flag on, it worked fine. So really hard to find. Turned out I was copying a string that was 2 bytes too big into a struct. The debug flag created extra space for it!
Another reason a debugger can cause a bug to go away is if it’s a race condition – debuggers often make the program run a little bit slower which can cause the race not to happen.
A related story about how a print statement can make the bug disappear:
In c or c++ printf can act as an ad-hoc synchronization point/cooperative MT point so adding printf changes the execution order of the threads, making them problem go away.
4. one of your assumptions is wrong
For example, in almost all cases it’s fair to assume that the compiler does not have a bug and that the bug is in your code. But as someone on Twitter pointed out, very rarely it is a compiler bug! (here’s the compiler bug they experienced)
Other examples of (more mundane) assumptions that can be wrong:
- assuming your new code is being run when in fact something is being cached
- assuming some environment variable is set when it isn’t
- assuming the bug is in the software when it’s in the hardware (like a bad cable!)
- assuming the documentation is correct
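A couple of these assumptions are cheap to check directly. Here’s a small sketch: check_env_var is a helper I made up, and MY_APP_CONFIG is an invented variable name. Printing a module’s __file__ is one way to see which copy of the code Python actually imported.

```python
import json
import os


def check_env_var(name, environ=os.environ):
    """Say whether an environment variable is really set, instead of assuming."""
    value = environ.get(name)
    if value is None:
        return f"{name} is NOT set"
    return f"{name} is set to {value!r}"


# MY_APP_CONFIG is a made-up variable name for illustration
print(check_env_var("MY_APP_CONFIG"))

# which file did this module actually get loaded from?
print(json.__file__)
```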
Let’s go over a few variants of “one of your assumptions is wrong”.
the red herring
Sometimes you see something early on when debugging that looks VERY suspicious and spend a long time investigating it, but then it turns out to be totally unrelated to the bug. This is pretty normal and it often doesn’t mean you did anything wrong (you can’t take the perfect most efficient path to understanding the bug every time!). But it can be really demoralizing.
the case that works and the case that doesn’t work look EXACTLY the same
This one is SO frustrating when it happens – you’re 100% sure nothing changed but somehow the code is no longer working! (of course, the answer is that something did change, you just can’t see it)
A few examples of this.
- one input causes your code to break, but it succeeds on a bunch of other inputs and you can’t figure out what’s different about the input that makes the code break
- there’s a typo that your brain is just refusing to notice
- a very small code change has caused a bug and you really think it shouldn’t have made any difference
- the exact same code is running on the same inputs, but there’s some external factor causing the bug that you haven’t considered (like a file on disk or an environment variable)
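Here’s one tiny illustration of “something did change, you just can’t see it”. The two strings are invented: one has a regular space and the other has a non-breaking space, so they look the same when printed but aren’t equal. Python’s built-in ascii() escapes every non-ASCII character, which makes the difference visible.

```python
working_input = "hello world"
broken_input = "hello\u00a0world"  # non-breaking space instead of a regular space

# printed normally, the two look the same
print(working_input)
print(broken_input)

# but they are not equal
print(working_input == broken_input)

# ascii() escapes non-ASCII characters so the difference shows up
print(ascii(working_input))
print(ascii(broken_input))
```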
The last type we’ll talk about is bugs that are just really complex!
5. the bug is really complicated
I wanted to separate this one out because a lot of bugs that are VERY DIFFICULT to understand are actually pretty simple in the end! They’re just difficult to understand because of some of the above reasons (incorrect assumptions! you don’t understand the system! it’s hard to observe the program’s state!).
But some bugs are genuinely very complicated. A few variants of this one:
the code is complicated
One example from twitter:
too many, far-flung, and unknown influences on system behavior. e.g. multiple inheritance run amok across libraries
the error message has 0 results when you Google it
This doesn’t always mean the bug is complicated, but it’s alarming when there are 0 results, or there’s 1 result and it’s… the library’s source code, or 1 sad person on a forum posting about your exact bug but there are no replies. (“Oh no, has NOBODY ever run into this bug before?!?!”)
the bug is actually 3 bugs
With most bugs, only one thing is going wrong – everything in the system is working correctly except 1 thing and you just need to identify the 1 thing that’s causing the problem.
It’s a lot harder when multiple things are broken at once – maybe there’s a bug in your program, and also a bug in a library you’re using, and also some unexpected behaviour on the part of your load balancer.
One common example of this is security vulnerabilities – they often involve pretty complex bugs that take a long time to explain and understand even when you figure out exactly what’s going on.
bonus: you’re tired
This isn’t really a technical reason, but tricky bugs are WAY harder to fix when you’re tired or stressed out after a long day.
it’s fun to see that many people have the same types of impossible bugs
I really enjoyed seeing how many people talked about the same reasons for “impossible” bugs. Debugging sometimes feels like a really intense personal struggle (WHY is this happening to ME?!?!) and I thought it was really cool to see that even some of the weirdest reasons for bugs are shared by a lot of people! More than one person mentioned “the debugger stops the bug from happening”!
many of these can happen all at once
I was chatting with my partner about a performance problem at work that took them months to diagnose. It was challenging because:
- it was intermittent (only happened when there was a lot of traffic)
- it only happened in production
- they didn’t have direct access to the system where it was happening (it was managed by a vendor)
- it involved a Linux kernel system that they didn’t previously know existed
They figured it out, but because there were so many things that made it difficult, it took a lot of time!
If you’re interested in hearing about this debugging zine if/when I ever finish it, you can subscribe to my zine announcements mailing list. And of course I’ll post about it on this blog.
You can now buy print versions of my zines!
Hello! Quick announcement: I opened a new print zine store last week, so now you can buy print copies of my zines! To start I’ve printed 350 copies of each of the “Bite Size…” zines.
Here’s a photo of the front of the zines and some stickers:
and the back covers:
Here are some notes about how the store works:
great print quality!
I worked with a really good print company (Girlie Press) and printed the zines on some nice paper, so they look WAY nicer than they do when printed on a home printer :). I’m delighted with how they turned out.
When I originally started working on this project I thought about using a print-on-demand company briefly (it sounds so convenient!) but I ordered test prints from all the ones I could find and it turns out that none of them could produce the print quality I wanted.
free shipping!
I never like paying for shipping, so I’ve set up free shipping for US orders over $30, and international orders over $50.
All of the shipping is being managed by a delightful small company called White Squirrel near Seattle, who specialize in shipping for artists. (I’m not handling it myself because I am extremely forgetful and I would just get distracted and forget to ship your order, everybody would suffer)
stickers!
I’ve also added some stickers to the store as a bonus – there’s an strace sticker because of my great love for strace, a TCP witch (from the cover of let’s learn tcpdump), and a REALLY CUTE space penguin who I’m personally obsessed with from the cover of how containers work.
If you order the pack of all 4 zines, you’ll get all of the stickers as well as a sticker sheet of the cover of Bite Size Command Line, so you can have awk and grep stickers :)
a discount if you already bought the PDF version!
If you already bought the PDF version of these zines – thank you so much!! You can use the PASTBUYER discount code for 40% off. You’ll need to use the same email address you used when you bought the PDF. If you run into any problems with that, email me at julia@wizardzines.com.
all print zines include the PDF version too!
If you order the print version and you don’t already have the PDF version – it’s included! You’ll get a link with your confirmation email that’ll let you download the PDF right away.
more zines coming soon!
Right now only 4 zines (the “Bite Size…” zines) are available on this store because I wasn’t sure how many to order and didn’t want to end up with thousands of zines in a warehouse by accident (think of the trees!).
But I’ll definitely be adding more zines relatively soon!
hopefully also bulk rates and posters
I’m hoping to add bulk rates soon – like if you want to buy 10 copies of a zine for everyone on your team, 30 copies for a class, or 100 copies as swag for a conference.
I’d also like to add some posters to the store at some point, like a how to be a wizard programmer poster.
All of that is coming later though! Sales have been going pretty well so far (we’ve sold almost half of the initial print run!), so thank you ❤.
the link again
The print zines are at https://store.wizardzines.com, or you can find the store linked from each zine’s page, like https://wizardzines.com/zines/bite-size-bash (click on “print version”)
Blog about what you've struggled with
I was talking to Jemma recently about what stops people from blogging. One barrier that stood out to me was: it’s hard to identify which things you know will be useful to other people!
The process I use for a lot of my blog posts is:
- Struggle with something (usually computer-related)
- Eventually (days or months or years later), figure out how to solve some of the problems I had
- Write a blog post about what helped me
I think this approach is effective because if I struggled with something, there’s a pretty good chance that other people are struggling with it too, and what I learned is likely to be useful to at least some of them!
Obviously this isn’t the only approach to blogging, but it’s my approach, so that’s what I’m going to write about here :). I’ll give a few examples of specific blog posts that came out of something I struggled with.
it’s not about the struggle, it’s about what you learned
The first important thing here is that the blog posts aren’t about the struggle, exactly. I’m still not that great at writing Rust, but I wouldn’t write a blog post called “I find Rust hard” – that wouldn’t help anyone!
Instead, when I learn something that helps me, I write about it so that it can help other people too. For example, one specific thing I struggled with in Rust was understanding references, and so I wrote what’s a reference in Rust? about what I learned.
what you struggled with shows you what to focus on
Okay, Julia, you might be thinking – if it’s about what you learned, why isn’t
this blog post called “Blog about what you learned” then? Well, we’ve all
learned lots of things! For example at some point in the last 8 years I learned
Go. But what’s worth talking about with Go? Should I explain the syntax? Talk
about net/http? Explain Go modules?
If I instead think about what I’ve struggled with in Go, it suddenly gets MUCH clearer – one thing I’ve had trouble with is deadlocks! That’s way more specific, and a lot more likely to be useful to other people than an intro to Go modules – it’s not obvious how to use Go’s concurrency features well!
it can take years to figure out what you learned
When I started my first job at a “big” company 7 years ago (“big” being more than 5 people), I really didn’t understand how to work with my manager effectively and it sometimes caused misunderstandings. It wasn’t great!
But when I was first having problems with this, I didn’t have anything that useful to say about this other than “oh no, um, this is hard”. This was because I hadn’t solved my problems for myself yet, so I definitely could not tell anyone else what I learned! It took me a few years to figure out how to work with a manager well.
And I’m still figuring out new ways to explain what I learned – for example just a few months ago I realized (while talking to my old manager) that there are a lot of concrete facts that managers don’t know, and if you think your manager does know those facts, you’ll end up with a lot of miscommunications and problems.
So I wrote Things your manager might not know as another attempt at helping people who are learning to work with their manager effectively. I wrote that post a year and a half after I left my job, so I didn’t even have a manager at the time!
write it down while you still remember what was hard
It’s very easy to misidentify what you learned if you don’t remember what it was like to struggle with the topic.
When I first started using git at work, it was confusing and I made a lot of mistakes. But that was in 2011 and I can’t remember what was hard about it anymore! So I could say that the most important thing to learn to solve your git issues is git’s object model (like how branches / commits work), but I don’t exactly know if that’s true! I know that I used to struggle with git, and now I don’t, and now I have a pretty good model of how git’s object model works, but I don’t really remember exactly what got me from there to here.
advanced mode: write about other people’s struggles
But if you don’t remember what was hard about something, not all is lost! It’s definitely possible to write about a topic that somebody else is struggling with. I find that the easiest way to do this is to first teach the topic, so here’s a quick story about that.
In 2019, I wrote a zine about SQL. When I started, I thought it would be easy because I was pretty comfortable with SQL – I’d done a LOT of data analysis in SQL and so I thought I could explain it.
But I couldn’t have been more wrong. It turned out that when I started I had no idea what was actually challenging about learning SQL.
I spent a lot of time talking to a friend who was new to SQL about how it worked, and we realized that one of the blockers was that it wasn’t obvious to them in what order a given SQL query was running. So I wrote SQL queries don’t start with SELECT, and a bunch of related examples and that helped a lot of people understand SQL queries better!
The cool thing about this is that when I dig into something that I think is easy but someone else is struggling with, often I learn something new too. For example I did sort of know in what order SQL queries ran but I’d never really thought about it explicitly. And being more explicit about how it worked helped me understand window functions better, which was something I was a bit shaky on!
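Here’s a small sketch of that ordering, run against an in-memory SQLite database from Python. The cats table and its rows are made up. The numbered comments show the logical order the clauses run in, which is different from the order they’re written in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cats (owner TEXT, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO cats VALUES (?, ?, ?)",
    [
        ("ahmed", "daisy", 1),
        ("ahmed", "mr darcy", 5),
        ("ahmed", "luna", 7),
        ("jia", "pepper", 4),
        ("jia", "biscuit", 1),
    ],
)

query = """
SELECT owner, COUNT(*) AS num_cats   -- 5. pick the output columns
FROM cats                            -- 1. start with the table
WHERE age > 2                        -- 2. filter individual rows
GROUP BY owner                       -- 3. group the rows that are left
HAVING COUNT(*) >= 2                 -- 4. filter whole groups
ORDER BY owner                       -- 6. sort the result
"""

rows = conn.execute(query).fetchall()
print(rows)
```

The WHERE clause removes the young cats before any counting happens, so jia ends up with only one cat left in her group and HAVING drops her. If the counting happened first, both owners would have made it through.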
sometimes you just haven’t learned enough about a topic yet (and that’s ok)
There are still a lot of programming and career things that I’ve struggled with in the past where I still don’t have a concrete lesson that I can write about. For example, I’ve struggled a lot with Kubernetes and Envoy and I’ve written about that a bit on this blog, but I’m still not sure what I learned from some of the problems I had. And I don’t work with either of them anymore so it’s possible I’ll never really be able to say! This kind of feels bad, but it’s okay.
Every so often I’ll think about a topic I’ve struggled with in the past and reflect on whether I’ve learned anything I can write about. Usually the answer is no, but sometimes the answer is yes!
it’s a bit weird to be vulnerable on the internet
Talking about things I struggled with on the internet is kind of scary sometimes! Here are a few things I do to make it less scary:
- Mostly talk about technical problems! Talking about computer problems I had (“I didn’t understand how groups worked on Linux”) feels very neutral to me. We’re not born learning how groups work on Linux and everyone has to learn it at some point.
- Be a little vague when talking about people problems! For example, get your work recognized: write a brag document comes out of some stress I had around getting promoted. I’m not very specific about my problems because everyone’s experience with getting promoted is super different and I think focusing too much on my specific issues would distract from the lesson (“track your accomplishments!”).
- Spend a lot of time processing things! In general the more I struggled with something, the more time I need to spend processing it before I can figure out how to talk about what I learned from it in public.
- Don’t talk about everything! There are obviously lots of things I never talk about on my blog at all :)
I wrote another blog post about blogging principles I use a few years ago that talks about some more tactics I use here.
you can practice identifying what you learned
Going from “I have a problem!” to “I don’t have that problem anymore!” to “here are the specific things I learned!” is not actually that easy! But it is something you can practice. It’s easy to skip that last step – you can learn things on an intuitive level but never actually identify what exactly it was that you learned.
For example, I’m definitely better at testing than I used to be but I haven’t taken the time to identify exactly what I’ve learned about testing over the years! I think I’d write better tests if I explicitly wrote down what I’ve learned about testing so that I could more consistently do those things in the future.
talk to a friend or coworker to figure out what you’ve learned
It can be really hard to notice things you’ve learned on your own. Like we just talked about, I don’t really know what I’ve learned about testing!
I find that having conversations with friends or coworkers makes it MUCH easier to figure out what I want to write about a topic. A few reasons talking to others is great:
- It can help clarify your thoughts!
- They probably have different ideas from you!
- They can tell you if what you’re saying resonates with them or not!
why I like writing about what I learned in public
I think that whether or not you write about what you learned in public, it’s super valuable to keep track of what you learned from doing hard things. It helps you remember what you’ve learned so that you can do better work in the future!
Here are a few things I like about writing about what I’ve learned in public, though:
- It helps other people! It feels way better to have struggled with a super confusing situation and come out of it with something concrete that can help others navigate a similar situation
- Putting the writing on the internet really forces me to think about whether the lessons I think I learned actually make sense (“wait, is this REALLY true?”)
- When I’m writing I often come up with additional questions and do a little bit of extra research, so I learn even more!
- Seeing other people’s reactions often helps me learn something new
- If I want to remember what I learned about something in the past, I can just look it up on my blog!
Thanks to Jemma, Kamal, Shae, Matthieu, and Travis for feedback on a draft of this.
How to look at the stack with gdb
I was chatting with someone yesterday and they mentioned that they don’t really understand exactly how the stack works or how to look at it.
So here’s a quick walkthrough of how you can use gdb to look at the stack of a C program. I think this would be similar for a Rust program, but I’m going to use C because I find it a little simpler for a toy example and also you can do Terrible Things in C more easily.
our test program
Here’s a simple C program that declares a few variables and reads two strings from standard input. One of the strings is on the heap, and one is on the stack.
#include <stdio.h>
#include <stdlib.h>

int main() {
    char stack_string[10] = "stack";
    int x = 10;
    char *heap_string;
    heap_string = malloc(50);
    printf("Enter a string for the stack: ");
    gets(stack_string);
    printf("Enter a string for the heap: ");
    gets(heap_string);
    printf("Stack string is: %s\n", stack_string);
    printf("Heap string is: %s\n", heap_string);
    printf("x is: %d\n", x);
}
This program uses the extremely unsafe function gets which you should never
use, but that’s on purpose – we learn more when things go wrong.
step 0: compile the program.
We can compile it with gcc -g -O0 test.c -o test.
The -g flag compiles the program with debugging symbols, which is going to
make it a lot easier to look at our variables.
-O0 tells gcc to turn off optimizations which I did just to make sure our x
variable didn’t get optimized out.
step 1: start gdb
We can start gdb like this:
$ gdb ./test
It prints out some stuff about the GPL and then gives a prompt. Let’s create a breakpoint on the main function.
(gdb) b main
Breakpoint 1 at 0x1171: file test.c, line 4.
Then we can run the program:
(gdb) run
Starting program: /home/bork/work/homepage/test
Breakpoint 1, main () at test.c:4
4 int main() {
Okay, great! The program is running and we can start looking at the stack.
step 2: look at our variables’ addresses
Let’s start out by learning about our variables. Each of them has an address in memory, which we can print out like this:
(gdb) p &x
$3 = (int *) 0x7fffffffe27c
(gdb) p &heap_string
$2 = (char **) 0x7fffffffe280
(gdb) p &stack_string
$4 = (char (*)[10]) 0x7fffffffe28e
So if we look at the stack at those addresses, we should be able to see all of these variables!
concept: the stack pointer
We’re going to need to use the stack pointer so I’ll try to explain it really quickly.
There’s an x86 register called the “stack pointer” (ESP on 32-bit x86, RSP on the 64-bit machine used here). Basically
it’s the address of the start of the stack for the current function. In gdb you can access it
with $sp. When you call a new function or return from a function, the value
of the stack pointer changes.
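To make that last sentence a bit more concrete, here's a small sketch in C. It isn't part of test.c, and the names `callee_local_address` and `frame_gap` are made up for this illustration. It compares the address of a local variable in one function with the address of a local in a function it calls. On x86-64 Linux with gcc -O0 I'd expect the callee's local to sit at a lower address, but the exact gap depends on the compiler and flags.

```c
#include <stdint.h>

/* Returns the address of one of its own locals as an integer.
   The address is only compared later, never dereferenced. */
uintptr_t callee_local_address(void) {
    volatile int callee_local = 0;
    return (uintptr_t)&callee_local;
}

/* Returns (address of a local in this function) minus
   (address of a local in the function it calls).
   A positive result means the callee's local is at a lower address. */
long long frame_gap(void) {
    volatile int caller_local = 0;
    uintptr_t caller = (uintptr_t)&caller_local;
    uintptr_t callee = callee_local_address();
    return (long long)caller - (long long)callee;
}
```

Printing `frame_gap()` with `%lld` is a quick way to see that the two functions use different parts of the stack.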
step 3: look at our variables on the stack at the beginning of main
First, let’s look at the stack at the start of the main function. Here’s
the value of our stack pointer right now:
(gdb) p $sp
$7 = (void *) 0x7fffffffe270
So the stack for our current function starts at 0x7fffffffe270. Cool.
Now let’s use gdb to print out the first 40 words (aka 160 bytes) of memory after the start of the current function’s stack. It’s possible that some of this memory isn’t part of the stack because I’m not totally sure how big the stack is here. But at least the beginning of this is part of the stack.
(gdb) x/40x $sp
0x7fffffffe270: 0x00000000 0x00000000 0x55555250 0x00005555
0x7fffffffe280: 0x00000000 0x00000000 0x55555070 0x00005555
0x7fffffffe290: 0xffffe390 0x00007fff 0x00000000 0x00000000
0x7fffffffe2a0: 0x00000000 0x00000000 0xf7df4b25 0x00007fff
0x7fffffffe2b0: 0xffffe398 0x00007fff 0xf7fca000 0x00000001
0x7fffffffe2c0: 0x55555169 0x00005555 0xffffe6f9 0x00007fff
0x7fffffffe2d0: 0x55555250 0x00005555 0x3cae816d 0x8acc2837
0x7fffffffe2e0: 0x55555070 0x00005555 0x00000000 0x00000000
0x7fffffffe2f0: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffffffe300: 0xf9ce816d 0x7533d7c8 0xa91a816d 0x7533c789
I’ve bolded approximately where the stack_string, heap_string, and x
variables are and colour coded them:
- x is red and starts at 0x7fffffffe27c
- heap_string is blue and starts at 0x7fffffffe280
- stack_string is purple and starts at 0x7fffffffe28e
I think I might have bolded the location of some of those variables a bit wrong here but that’s approximately where they are.
One weird thing you might notice here is that x is the number 0x5555, but
we set x to 10! That’s because x doesn’t actually get set until after our
main function starts, and we’re at the very beginning of main.
step 3: look at the stack again on line 10
Let’s skip a few lines and wait for our variables to actually get set to the
values we initialized them to. By the time we get to line 10, x should be set to 10.
First, we need to set another breakpoint:
(gdb) b test.c:10
Breakpoint 2 at 0x5555555551a9: file test.c, line 11.
and continue running the program:
(gdb) continue
Continuing.
Breakpoint 2, main () at test.c:11
11 printf("Enter a string for the stack: ");
Okay! Let’s look at all the same things again! gdb is formatting the bytes in
a slightly different way here and I don’t actually know why. Here’s a reminder of where to find our variables on the stack:
- x is red and starts at 0x7fffffffe27c
- heap_string is blue and starts at 0x7fffffffe280
- stack_string is purple and starts at 0x7fffffffe28e
(gdb) x/80x $sp
0x7fffffffe270: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe278: 0x50 0x52 0x55 0x55 0x0a 0x00 0x00 0x00
0x7fffffffe280: 0xa0 0x92 0x55 0x55 0x55 0x55 0x00 0x00
0x7fffffffe288: 0x70 0x50 0x55 0x55 0x55 0x55 0x73 0x74
0x7fffffffe290: 0x61 0x63 0x6b 0x00 0x00 0x00 0x00 0x00
0x7fffffffe298: 0x00 0x80 0xf7 0x8a 0x8a 0xbb 0x58 0xb6
0x7fffffffe2a0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe2a8: 0x25 0x4b 0xdf 0xf7 0xff 0x7f 0x00 0x00
0x7fffffffe2b0: 0x98 0xe3 0xff 0xff 0xff 0x7f 0x00 0x00
0x7fffffffe2b8: 0x00 0xa0 0xfc 0xf7 0x01 0x00 0x00 0x00
There are a couple of interesting things to discuss here before we go further in the program.
how stack_string is represented in memory
Right now (on line 10) stack_string is set to “stack”. Let’s take a look at
how that’s represented in memory.
We can print out the bytes in the string like this:
(gdb) x/10x stack_string
0x7fffffffe28e: 0x73 0x74 0x61 0x63 0x6b 0x00 0x00 0x00
0x7fffffffe296: 0x00 0x00
The string “stack” is 5 characters which corresponds to 5 ASCII bytes –
0x73, 0x74, 0x61, 0x63, and 0x6b. 0x73 is s in ASCII, 0x74 is
t, etc.
We can also get gdb to show us the string with x/1s:
(gdb) x/1s stack_string
0x7fffffffe28e: "stack"
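If you'd like to see the same bytes without gdb, here's a minimal sketch of a helper that formats a buffer roughly the way x/10x prints it. `format_bytes` is a hypothetical name, not something from test.c or gdb.

```c
#include <stdio.h>
#include <string.h>

/* Writes the n bytes of buf into out as space-separated hex,
   for example "0x73 0x74 0x61".
   out must have room for at least 5 * n bytes. */
void format_bytes(const unsigned char *buf, size_t n, char *out) {
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* every byte after the first gets a leading space */
        sprintf(out + strlen(out), "%s0x%02x", i ? " " : "", buf[i]);
    }
}
```

Calling it on a `char stack_string[10] = "stack";` buffer with n set to 10 should give the same ten bytes as the gdb output above, because C fills the rest of the array with zero bytes.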
how heap_string and stack_string are different
You’ll notice that stack_string and heap_string are represented in very
different ways on the stack:
- stack_string has the contents of the string (“stack”)
- heap_string is a pointer to an address somewhere else in memory
Here are the bytes on the stack for the heap_string variable:
0xa0 0x92 0x55 0x55 0x55 0x55 0x00 0x00
These bytes actually get read backwards because x86 is little-endian, so the
address stored in heap_string is 0x5555555592a0.
Another way to see the address of heap_string in gdb is just to print it out
with p:
(gdb) p heap_string
$6 = 0x5555555592a0 ""
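Here's a small sketch of what "reading the bytes backwards" means in code. `read_little_endian` is a name I made up for this example. It builds a number out of bytes that are stored least significant byte first, which is the order they have on the stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Interprets n bytes (n <= 8) as a little-endian unsigned integer:
   bytes[0] is the least significant byte. */
uint64_t read_little_endian(const unsigned char *bytes, size_t n) {
    uint64_t value = 0;
    /* walk from the last byte to the first, shifting as we go */
    for (size_t i = n; i > 0; i--) {
        value = (value << 8) | bytes[i - 1];
    }
    return value;
}
```

Feeding it the 8 bytes 0xa0 0x92 0x55 0x55 0x55 0x55 0x00 0x00 gives 0x5555555592a0, the same address gdb prints for heap_string.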
the bytes that represent the integer x
x is a 32-bit integer, and the bytes that represent it are 0x0a 0x00 0x00 0x00.
We need to read these bytes backwards again (for the same reason we read the
bytes for heap_string’s address backwards), so this corresponds to the number
0x0000000a, or 0xa, which is 10.
That makes sense! We set int x = 10;!
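You can also go in the other direction and ask C for the bytes of an int. This is a sketch with made-up helper names (`int_to_bytes`, `host_is_little_endian`), and it assumes nothing about test.c. On a little-endian machine like x86, the first byte in memory is the least significant one.

```c
#include <string.h>

/* Copies the in-memory bytes of value into out, in address order. */
void int_to_bytes(int value, unsigned char out[sizeof(int)]) {
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if the lowest-addressed byte of an integer is its
   least significant byte (little-endian), 0 otherwise. */
int host_is_little_endian(void) {
    unsigned int probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On x86, `int_to_bytes(10, out)` should leave 0x0a in out[0] and zeros in the rest, which matches the 0x0a 0x00 0x00 0x00 we saw on the stack.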
step 4: read input from standard input
Okay, we’ve initialized the variables, now let’s see how the stack changes when this part of the C program runs:
printf("Enter a string for the stack: ");
gets(stack_string);
printf("Enter a string for the heap: ");
gets(heap_string);
We need to set another breakpoint:
(gdb) b test.c:16
Breakpoint 3 at 0x555555555205: file test.c, line 16.
and continue running the program
(gdb) continue
Continuing.
We’re prompted for 2 strings, and I entered 123456789012 for the stack string
and bananas for the heap.
let’s look at stack_string first (there’s a buffer overflow!)
(gdb) x/1s stack_string
0x7fffffffe28e: "123456789012"
That seems pretty normal, right? We entered 123456789012 and now it’s set to 123456789012.
But there’s something weird about this. Here’s what those bytes look like on the stack. They’re highlighted in purple again.
0x7fffffffe270: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe278: 0x50 0x52 0x55 0x55 0x0a 0x00 0x00 0x00
0x7fffffffe280: 0xa0 0x92 0x55 0x55 0x55 0x55 0x00 0x00
0x7fffffffe288: 0x70 0x50 0x55 0x55 0x55 0x55 0x31 0x32
0x7fffffffe290: 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x30
0x7fffffffe298: 0x31 0x32 0x00 0x8a 0x8a 0xbb 0x58 0xb6
0x7fffffffe2a0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe2a8: 0x25 0x4b 0xdf 0xf7 0xff 0x7f 0x00 0x00
0x7fffffffe2b0: 0x98 0xe3 0xff 0xff 0xff 0x7f 0x00 0x00
0x7fffffffe2b8: 0x00 0xa0 0xfc 0xf7 0x01 0x00 0x00 0x00
The weird thing about this is that stack_string was only supposed to be 10 bytes. But now suddenly we’ve put 13 bytes in it? What’s happening?
This is a classic buffer overflow, and what’s happening is that stack_string
wrote over other data from the program. This hasn’t caused a problem yet in our
case, but it can crash your program or, worse, open you up to Very Bad Security
Problems.
For example, if stack_string were before heap_string in memory, then we
could overwrite the address that heap_string points to. I’m not sure exactly
what’s in memory after stack_string here but we could probably use this to do
some kind of shenanigans.
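In case the "13 bytes" number is confusing: gets stores the characters you typed plus a terminating 0 byte, so 12 characters take 13 bytes. Here's a tiny sketch of that arithmetic. The function names are hypothetical, and nothing here calls gets.

```c
#include <stddef.h>
#include <string.h>

/* Number of bytes gets() would store for this input line:
   the characters plus the terminating 0 byte. */
size_t bytes_written(const char *input) {
    return strlen(input) + 1;
}

/* How many of those bytes land past the end of a buffer with the
   given capacity (0 if the input fits). */
size_t overflow_bytes(const char *input, size_t capacity) {
    size_t needed = bytes_written(input);
    return needed > capacity ? needed - capacity : 0;
}
```

For the input 123456789012 and a 10-byte buffer this gives 13 bytes written and 3 bytes of overflow, which lines up with the 0x31 0x32 0x00 at 0x7fffffffe298 in the dump above.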
something actually detects the buffer overflow
When I cause this buffer overflow problem, here’s what happens:
./test
Enter a string for the stack: 01234567891324143
Enter a string for the heap: adsf
Stack string is: 01234567891324143
Heap string is: adsf
x is: 10
*** stack smashing detected ***: terminated
fish: Job 1, './test' terminated by signal SIGABRT (Abort)
My guess about what’s happening here is that the stack_string variable is
actually at the end of this function’s stack, and so the extra bytes are going into a
different region of memory.
When you do this intentionally as a security exploit it’s called “stack smashing”, and somehow something is detecting that this is happening. Originally I wasn’t sure how this was being detected, but a couple of people emailed me to say that it’s a compiler feature called “stack protection”. Basically it adds a “canary” value to the end of the stack and when the function returns it checks to see if that value has been changed. Here’s an article about the stack smashing protector on the OSDev wiki.
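Here's a toy model of the canary idea, to show the shape of the check. This is NOT how gcc implements stack protection: the "frame" is just a byte array I made up, the canary value is a fixed constant instead of a random one, and it sits directly after the buffer, which is an assumption for the sake of the example.

```c
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 10
#define CANARY_SIZE 8
#define FRAME_SIZE (BUF_SIZE + CANARY_SIZE)

/* Made-up canary value, for illustration only. */
static const unsigned char CANARY[CANARY_SIZE] =
    {0xde, 0xad, 0xbe, 0xef, 0xca, 0xfe, 0xf0, 0x0d};

/* Lays out a pretend frame: a 10-byte buffer followed by the canary. */
void frame_init(unsigned char frame[FRAME_SIZE]) {
    memset(frame, 0, BUF_SIZE);
    memcpy(frame + BUF_SIZE, CANARY, CANARY_SIZE);
}

/* Mimics gets(): copies the input plus its 0 byte without checking
   BUF_SIZE. The copy is clamped to FRAME_SIZE so the toy itself
   never writes outside its own array. */
void unchecked_write(unsigned char frame[FRAME_SIZE], const char *input) {
    size_t n = strlen(input) + 1;
    if (n > FRAME_SIZE) {
        n = FRAME_SIZE;
    }
    memcpy(frame, input, n);
}

/* The check that would run when the function returns. */
int canary_intact(const unsigned char frame[FRAME_SIZE]) {
    return memcmp(frame + BUF_SIZE, CANARY, CANARY_SIZE) == 0;
}
```

In this toy, writing 123456789 (10 bytes with the 0 byte) leaves the canary alone, and writing 123456789012 (13 bytes) changes it, so the check fails.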
That’s all I have to say about buffer overflows.
now let’s look at heap_string
We also read a value (bananas) into the heap_string variable. Let’s see what that
looks like in memory.
Here’s what heap_string looks like on the stack after we read the variable in.
(gdb) x/40x $sp
0x7fffffffe270: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x7fffffffe278: 0x50 0x52 0x55 0x55 0x0a 0x00 0x00 0x00
0x7fffffffe280: 0xa0 0x92 0x55 0x55 0x55 0x55 0x00 0x00
0x7fffffffe288: 0x70 0x50 0x55 0x55 0x55 0x55 0x31 0x32
0x7fffffffe290: 0x33 0x34 0x35 0x36 0x37 0x38 0x39 0x30
The thing to notice here is that it looks exactly the same! It’s an address, and the address hasn’t changed. But let’s look at what’s at that address.
(gdb) x/10x 0x5555555592a0
0x5555555592a0: 0x62 0x61 0x6e 0x61 0x6e 0x61 0x73 0x00
0x5555555592a8: 0x00 0x00
Those are the bytes for bananas! Those bytes aren’t in the stack at all,
they’re somewhere else in memory (on the heap).
where are the stack and the heap?
We’ve talked about how the stack and the heap are different regions of memory, but how can you tell where they are in memory?
There’s a file for each process called /proc/$PID/maps that shows you the
memory maps for each process. Here’s where you can see the stack and the heap
in there.
$ cat /proc/24963/maps
... lots of stuff omitted ...
555555559000-55555557a000 rw-p 00000000 00:00 0 [heap]
... lots of stuff omitted ...
7ffffffde000-7ffffffff000 rw-p 00000000 00:00 0 [stack]
One thing to notice is that here the heap addresses start with 0x5555 and
the stack addresses start with 0x7fffff. So it’s pretty easy to tell the
difference between an address on the stack and an address on the heap.
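A program can also read its own maps file. Here's a Linux-only sketch that looks for a labelled region like [stack] in /proc/self/maps and checks whether an address falls inside it. `find_region` and `address_in_region` are names I made up, and the parsing assumes each line starts with the start-end address pair in hex, like the lines shown above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Finds the first line of /proc/self/maps containing label
   (for example "[stack]") and stores its address range.
   Returns 1 if found, 0 if not (or if the file can't be opened). */
int find_region(const char *label, unsigned long *start, unsigned long *end) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) {
        return 0;
    }
    char line[512];
    int found = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* each line begins with "start-end" in hex */
        if (strstr(line, label) != NULL &&
            sscanf(line, "%lx-%lx", start, end) == 2) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}

/* Returns 1 if p lies in the half-open range [start, end). */
int address_in_region(const void *p, unsigned long start, unsigned long end) {
    unsigned long a = (unsigned long)(uintptr_t)p;
    return a >= start && a < end;
}
```

In the main thread of a Linux process, I'd expect the address of a local variable to land inside the [stack] range.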
playing about with gdb like this is really helpful
This was kind of a whirlwind tour and I didn’t explain everything, but hopefully seeing what the data actually looks like in memory makes it a little more clear what the stack actually is.
I really recommend playing around with gdb like this – even if you don’t understand every single thing that you see in memory, I find that actually seeing the data in my program’s memory like this makes these abstract concepts like “the stack” and “the heap” and “pointers” a lot easier to understand.
maybe lldb is easier to use
A couple of people suggested that lldb is easier to use than gdb. I haven’t
used it yet but I looked at it quickly, and it does seem like it might be
simpler! As far as I can tell from a quick inspection everything in this
walkthrough also works in lldb, except that you need to do p/s instead of
p/1s.
ideas for more exercises
A few ideas (in no particular order) for followup exercises to think about the stack:
- try adding another function to test.c and make a breakpoint at the beginning of that function and see if you can find the stack from main! They say that “the stack grows down” when you call a function, can you see that happening in gdb?
- return a pointer from a function to a string on the stack and see what goes wrong. Why is it bad to return a pointer to a string on the stack?
- try causing a stack overflow in C and try to understand exactly what happens when the stack overflows by looking at it in gdb!
- look at the stack in a Rust program and try to find the variables!
- try some of the buffer overflow challenges in the nightmare course. The README for each challenge is the solution so avoid reading it if you don’t want to be spoiled. The idea with all of those challenges is that you’re given a binary and you need to figure out how to cause a buffer overflow to get it to print out the “flag” string.