Reading List

The most recent articles from a list of feeds I subscribe to.

How do Nix builds work?

Hello! For some reason after the last nix post I got nerdsniped by trying to understand how Nix builds work under the hood, so here’s a quick exploration I did today. There are probably some mistakes in here.

I started by complaining on Mastodon:

are there any guides to nix that start from the bottom up (for example starting with this bash script and then working up the layers of abstraction) instead of from the top down?

all of the guides I’ve seen start by describing the nix programming language or other abstractions, and I’d love to see a guide that starts with concepts I already understand like compiler flags, linker flags, Makefiles, environment variables, and bash scripts

Ross Light wrote a great blog post in response called Connecting Bash to Nix, which shows how to compile a basic C program without using most of Nix’s standard machinery.

I wanted to take this a tiny bit further and compile a slightly more complicated C program.

the goal: compile a C program, without using Nix’s standard machinery

Our goal is to compile a C program called paperjam. This is a real C program that wasn’t in the Nix repository already. I already figured out how to compile it in this post by copying and pasting a bunch of stuff I didn’t understand, but this time I wanted to do it in a more principled way where I actually understand more of the steps.

We’re going to avoid using most of Nix’s helpers for compiling C programs.

The plan is to start with an almost empty build script, and then resolve errors until we have a working build.

first: what’s a derivation?

I said that we weren’t going to talk about too many Nix abstractions (and we won’t!), but understanding what a derivation is really helped me.

Everything I read about Nix talks about derivations all the time, but I was really struggling to figure out what a derivation is. It turns out that derivation is a function in the Nix language. But not just any function! The whole point of the Nix language seems to be to call this function. The official documentation for the derivation function is actually extremely clear. Here’s what I took away:

derivation takes a bunch of keys and values as input. There are 3 required keys:

  1. system: the system, for example x86_64-darwin
  2. name: the name of the package you’re building
  3. builder: a program (usually a bash script) that runs the build

Every other key is an arbitrary string that gets passed as an environment variable to the builder shell script.
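To make the "every other key becomes an environment variable" idea concrete, here is a minimal sketch in plain shell that imitates what the builder sees. It doesn't call Nix at all: env -i starts a process with an empty environment plus the variables listed, which is roughly the situation a builder is in. The fake_builder name and the values (including the HASH placeholder) are made up for illustration.

```bash
# Imitate a derivation's builder: start from an empty environment and pass
# in only the "attributes" we choose. (A sketch; real Nix also sets a few
# variables of its own, such as out.)
fake_builder() {
  env -i \
    name="paperjam-fake" \
    system="aarch64-darwin" \
    SOURCE="/nix/store/HASH-paperjam-1.2.tar.gz" \
    /bin/sh -c 'echo "building $name for $system from $SOURCE"'
}

fake_builder
```

The script inside sh -c can only see what was passed in explicitly, which is why a key like SOURCE is enough to tell the build script where the tarball is.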

derivations automatically build all their inputs

A derivation doesn’t just call a shell script though! Let’s say I reference another derivation called pkgs.qpdf in my script.

Nix will:

  • automatically build the qpdf package
  • put the resulting output directory somewhere like /nix/store/4garxzr1rpdfahf374i9p9fbxnx56519-qpdf-11.1.0
  • expand pkgs.qpdf into that output directory (as a string), so that I can reference it in my build script

The derivation function does some other things (described in the documentation), but “it builds all of its inputs” is all we really need to know for now.

step 1: write a derivation file

Let’s write a very simple build script and call the derivation function. These don’t work yet, but I found it pretty fun to go through all the errors, fix them one at a time, and learn a little more about how Nix works by fixing them.

Here’s the build script (build-paperjam.sh). This just unpacks the tarball and runs make install.

#!/bin/bash

tar -xf "$SOURCE"
cd paperjam-1.2 
make install

And here’s the Nix code calling the derivation function (in paperjam.nix). This calls the core derivation function, without too much magic.

let pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/4d2b37a84fad1091b9de401eb450aae66f1a741e.tar.gz") {};
in

builtins.derivation {
  name = "paperjam-fake";
  builder = ./build-paperjam.sh;
  system = builtins.currentSystem;

  SOURCE = pkgs.fetchurl {
    url = "https://mj.ucw.cz/download/linux/paperjam-1.2.tar.gz";
    hash = "sha256-0AziT7ROICTEPKaA4Ub1B8NtIfLmxRXriW7coRxDpQ0";
  };

}

The main things here are:

  • fetchurl (which downloads a URL and puts the path into the SOURCE environment variable)
  • pkgs (which lets us depend on other Nix packages from the central repository). I don’t totally understand this but I’m already in a pretty deep rabbit hole so we’re going to leave that for now.

SOURCE evaluates to a string – it’s the path to the downloaded source tarball.

problem 1: tar: command not found

Nix needs you to declare all the dependencies for your builds. It enforces this by removing your PATH environment variable so that you have no binaries in your PATH at all.

This is pretty easy to fix: we just need to edit our PATH.

I added this to paperjam.nix to get tar, gzip, and make:

  PATH = "${pkgs.gzip}/bin:${pkgs.gnutar}/bin:${pkgs.gnumake}/bin";
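Here's a small plain-shell sketch of why editing PATH fixes "command not found". It uses a throwaway directory standing in for a /nix/store package and a made-up tool called hello-tool. The placeholder value /path-not-set is, as far as I know, what Nix puts in PATH during a build, but treat that detail as an assumption.

```bash
# Make a throwaway "package" with one executable in its bin/ directory.
# The directory stands in for a /nix/store path; hello-tool is made up.
fake_pkg=$(mktemp -d)
mkdir -p "$fake_pkg/bin"
printf '#!/bin/sh\necho "hello from the fake package"\n' > "$fake_pkg/bin/hello-tool"
chmod +x "$fake_pkg/bin/hello-tool"

# Run a command with an otherwise empty environment and the PATH we pass in.
run_with_path() {
  env -i PATH="$1" /bin/sh -c "$2"
}

# With a placeholder PATH the tool can't be found (exit status 127)...
run_with_path /path-not-set hello-tool 2>/dev/null || echo "exit status: $?"

# ...and it works once the package's bin directory is on the PATH.
run_with_path "$fake_pkg/bin" hello-tool
```

The PATH line in paperjam.nix does the same thing, except that each ${pkgs.NAME} expands to a real store directory.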

problem 2: we need a compiler

Next, we had this error:

g++ -O2 -Wall -Wextra -Wno-parentheses -std=gnu++11 -g -DVERSION='"1.2"' -DYEAR='"2022"' -DBUILD_DATE='""' -DBUILD_COMMIT='""'   -c -o paperjam.o paperjam.cc
make: g++: No such file or directory

So we need to put a compiler in our PATH. For some reason I felt like using clang++ to compile, not g++. To do that I need to make 2 changes to paperjam.nix:

  1. Add the line CXX="clang++";
  2. Add ${pkgs.clang}/bin to my PATH

problem 3: missing header files

The next error was:

 > ./pdf-tools.h:13:10: fatal error: 'qpdf/QPDF.hh' file not found
 > #include <qpdf/QPDF.hh>

Makes sense: everything is isolated, so it can’t access my system header files. Figuring out how to handle this was a little more confusing though.

It turns out that the way Nix handles header files is that it has a shell script wrapper around clang. So when you run clang++, you’re actually running a shell script.

On my system, the clang++ wrapper script was at /nix/store/d929v59l9a3iakvjccqpfqckqa0vflyc-clang-wrapper-11.1.0/bin/clang++. I searched that file for LDFLAGS and found that it uses 2 environment variables:

  1. NIX_LDFLAGS_aarch64_apple_darwin
  2. NIX_CFLAGS_COMPILE_aarch64_apple_darwin

So I figured I needed to put all the arguments to clang in the NIX_CFLAGS variable and all the linker arguments in NIX_LDFLAGS. Great! Let’s do that.

I added these 2 lines to my paperjam.nix, to link the libpaper and qpdf libraries:

NIX_LDFLAGS_aarch64_apple_darwin = "-L ${pkgs.qpdf}/lib   -L ${pkgs.libpaper}/lib";
NIX_CFLAGS_COMPILE_aarch64_apple_darwin = "-isystem ${pkgs.qpdf}/include   -isystem ${pkgs.libpaper}/include";

And that worked!
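A toy version of the wrapper idea in shell: a function that takes the arguments you passed and adds whatever is in the NIX_ variables. This is only a sketch of the general mechanism, not the actual nixpkgs wrapper script (which does a lot more). wrapper_sketch is a made-up name, the variable names drop the platform suffix, the HASH paths are placeholders, and it prints the command line instead of running a compiler.

```bash
# Toy compiler wrapper: combine the caller's arguments with flags taken from
# environment variables, then show the command that would be run.
# The variable names drop the platform suffix to keep the example short.
wrapper_sketch() {
  # Left unquoted on purpose so each flag becomes its own word.
  echo clang++ "$@" $NIX_CFLAGS_COMPILE $NIX_LDFLAGS
}

NIX_CFLAGS_COMPILE="-isystem /nix/store/HASH-qpdf/include"
NIX_LDFLAGS="-L /nix/store/HASH-qpdf/lib"

wrapper_sketch -c -o paperjam.o paperjam.cc
```

This is why the Makefile doesn't need to know about any /nix/store paths: the extra flags get added around it.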

problem 4: missing c++abi

The next error was:

> ld: library not found for -lc++abi

Not sure what this means, but I searched for “abi” in the Nix packages and fixed it by adding -L ${pkgs.libcxxabi}/lib to my NIX_LDFLAGS environment variable.

problem 5: missing iconv

Here’s the next error:

> Undefined symbols for architecture arm64:
>   "_iconv", referenced from: ...

I started by adding -L ${pkgs.libiconv}/lib to my NIX_LDFLAGS environment variable, but that didn’t fix it. Then I spent a while going around in circles and being confused.

I eventually figured out how to fix this by taking a working version of the paperjam build that I’d made before and editing my clang++ wrapper file to print out all of its environment variables. The LDFLAGS environment variable in the working version was different from mine: it had -liconv in it.

So I added -liconv to NIX_LDFLAGS as well and that fixed it.

why doesn’t the original Makefile have -liconv?

I was a bit puzzled by this -liconv thing though: the original Makefile links in libqpdf and libpaper by passing -lqpdf -lpaper. So why doesn’t it link in iconv, if it requires the iconv library?

I think the reason for this is that the original Makefile assumed that you were running on Linux and using glibc, and glibc includes these iconv functions by default. But I guess Mac OS libc doesn’t include iconv, so we need to explicitly set the linker flag -liconv to add the iconv library.

problem 6: missing codesign_allocate

Time for the next error:

libc++abi: terminating with uncaught exception of type std::runtime_error: Failed to spawn codesign_allocate: No such file or directory

I guess this is some kind of Mac code signing thing. I used find /nix/store -name codesign_allocate to find codesign_allocate on my system. It’s at /nix/store/a17dwfwqj5ry734zfv3k1f5n37s4wxns-cctools-binutils-darwin-973.0.1/bin/codesign_allocate.

But this doesn’t tell us what the package is called – we need to be able to refer to it as ${pkgs.XXXXXXX} and ${pkgs.cctools-binutils-darwin} doesn’t work.

I couldn’t figure out a way to go from a Nix folder to the name of the package, but I ended up poking around and finding out that it was called pkgs.darwin.cctools.

So I added ${pkgs.darwin.cctools}/bin to the PATH.
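One small thing that can be done in plain shell is pulling the name part out of a store path, since store paths look like /nix/store/HASH-NAME. This sketch only does string manipulation, and it only recovers the name embedded in the path, which (as this problem shows) is not necessarily the attribute name you'd write after pkgs. The store_path_name helper is made up.

```bash
# Print the "name" part of a store path: drop the /nix/store/ prefix,
# anything after the store entry, and the hash up to the first dash.
store_path_name() {
  entry=${1#/nix/store/}   # HASH-name/bin/...
  entry=${entry%%/*}       # HASH-name
  echo "${entry#*-}"       # name
}

store_path_name /nix/store/a17dwfwqj5ry734zfv3k1f5n37s4wxns-cctools-binutils-darwin-973.0.1/bin/codesign_allocate
```

For the path above this prints cctools-binutils-darwin-973.0.1, which is a useful search term even though it isn't the attribute name.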

problem 7: missing a2x

Easy, just add ${pkgs.asciidoc}/bin to the PATH.

problem 8: missing install

make: install: No such file or directory

Apparently install is a program? This turns out to be in coreutils, so we add ${pkgs.coreutils}/bin to the PATH. Adding coreutils also fixes some other warnings I was seeing about missing commands like date.

problem 9: can’t create /usr/local/bin/paperjam

This took me a little while to figure out because I’m not very familiar with make. The Makefile has a PREFIX of /usr/local, but we want it to be the program’s output directory in /nix/store/

I edited the build-paperjam.sh shell script to say:

make install PREFIX="$out"

and everything worked! Hooray!

our final configuration

Here’s the final paperjam.nix. It’s not so different from what we started with – we just added 4 environment variables.

let pkgs = import (fetchTarball "https://github.com/NixOS/nixpkgs/archive/ae8bdd2de4c23b239b5a771501641d2ef5e027d0.tar.gz") {};
in

builtins.derivation {
  name = "paperjam-fake";
  builder = ./build-paperjam.sh;
  system = builtins.currentSystem;

  SOURCE = pkgs.fetchurl {
    url = "https://mj.ucw.cz/download/linux/paperjam-1.2.tar.gz";
    hash = "sha256-0AziT7ROICTEPKaA4Ub1B8NtIfLmxRXriW7coRxDpQ0";
  };

  CXX="clang++";
  PATH = "${pkgs.gzip}/bin:${pkgs.gnutar}/bin:${pkgs.gnumake}/bin:${pkgs.clang}/bin:${pkgs.darwin.cctools}/bin:${pkgs.asciidoc}/bin:${pkgs.coreutils}/bin:${pkgs.bash}/bin";
  NIX_LDFLAGS_aarch64_apple_darwin = "-L ${pkgs.qpdf}/lib   -L ${pkgs.libpaper}/lib -L ${pkgs.libcxxabi}/lib -liconv -L ${pkgs.libiconv}/lib ";
  NIX_CFLAGS_COMPILE_aarch64_apple_darwin = "-isystem ${pkgs.qpdf}/include   -isystem ${pkgs.libpaper}/include";
}

And here’s the final build-paperjam.sh build script. Here we just needed to edit the make install line to set the PREFIX.

#!/bin/bash

tar -xf "$SOURCE"
cd paperjam-1.2
make install PREFIX="$out"

let’s look at our compiled derivation!

Now that we understand this configuration a little better, let’s talk about what nix-build is doing a little more.

Behind the scenes, nix-build paperjam.nix actually runs nix-instantiate and nix-store --realize:

$ nix-instantiate paperjam.nix
/nix/store/xp8kibpll55s0bm40wlpip51y7wnpfs0-paperjam-fake.drv
$ nix-store --realize /nix/store/xp8kibpll55s0bm40wlpip51y7wnpfs0-paperjam-fake.drv

I think what this means is that paperjam.nix gets compiled to some intermediate representation (also called a derivation?), and then the Nix runtime takes over and is in charge of actually running the build scripts.
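The two steps can be wrapped in a small shell function. This is a sketch: it assumes nix-instantiate prints the .drv path on standard output (as in the transcript above) and hands that path to nix-store --realize. The build_two_step name and the NIX_INSTANTIATE / NIX_STORE override variables are my own inventions, there so the function can be tried with stand-in commands.

```bash
# Do by hand what nix-build does: evaluate the .nix file to a .drv file,
# then ask the store to build ("realize") that .drv.
# NIX_INSTANTIATE and NIX_STORE are hypothetical overrides for testing.
build_two_step() {
  drv=$("${NIX_INSTANTIATE:-nix-instantiate}" "$1") || return 1
  echo "derivation: $drv" >&2
  "${NIX_STORE:-nix-store}" --realize "$drv"
}

# Usage (needs Nix installed):
#   build_two_step paperjam.nix
```

Splitting it up like this makes the boundary visible: the first command is the only one that reads the Nix language, and the second only ever sees the .drv file.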

We can look at this .drv intermediate representation with nix show-derivation

{
  "/nix/store/xp8kibpll55s0bm40wlpip51y7wnpfs0-paperjam-fake.drv": {
    "outputs": { "out": { "path": "/nix/store/bcnyqizvcysqc1vy382wfx015mmwn3bd-paperjam-fake" }
    },
    "inputSrcs": [ "/nix/store/pbjj91f0qr8g14k58m744wdl9yvr2f5k-build-paperjam.sh" ],
    "inputDrvs": {
      "/nix/store/38sikqcggyishxbgi2xnyrdsnq928gqx-asciidoc-10.2.0.drv": [ "out" ],
      "/nix/store/3llc749f9pn0amlb9vgwsi22hin7kmz4-libcxxabi-11.1.0.drv": [ "out" ],
      "/nix/store/a8ny8lrbpyn15wdxk3v89f4bdr08a38a-libpaper-1.1.28.drv": [ "out" ],
      "/nix/store/d888pj9lll12s5qx11v850g1vd4h3vxq-cctools-port-973.0.1.drv": [ "out" ],
      "/nix/store/gkpdv7xl39x9yxch0wjarq19mmv7j1pm-bash-5.2-p15.drv": [ "out" ],
      "/nix/store/hwx16m7hmkp2rcik8h67nnyjp52zj849-gnutar-1.34.drv": [ "out" ],
      "/nix/store/kqqwffajj24fmagxqps3bjcbrglbdryg-gzip-1.12.drv": [ "out" ],
      "/nix/store/lnrxa45bza18dk8qgqjayqb65ilfvq2n-qpdf-11.2.0.drv": [ "out" ],
      "/nix/store/rx7a5401h44dqsasl5g80fl25jqqih8r-gnumake-4.4.drv": [ "out" ],
      "/nix/store/sx8blaza5822y51abdp3353xkdcbkpkb-coreutils-9.1.drv": [ "out" ],
      "/nix/store/v3b7r7a8ipbyg9wifcqisf5vpy0c66cs-clang-wrapper-11.1.0.drv": [ "out" ],
      "/nix/store/wglagz34w1jnhr4xrfdk0g2jghbk104z-paperjam-1.2.tar.gz.drv": [ "out" ],
      "/nix/store/y9mb7lgqiy38fbi53m5564bx8pl1arkj-libiconv-50.drv": [ "out" ]
    },
    "system": "aarch64-darwin",
    "builder": "/nix/store/pbjj91f0qr8g14k58m744wdl9yvr2f5k-build-paperjam.sh",
    "args": [],
    "env": {
      "CXX": "clang++",
      "NIX_CFLAGS_COMPILE_aarch64_apple_darwin": "-isystem /nix/store/h25d99pd3zln95viaybdfynfq82r2dqy-qpdf-11.2.0/include   -isystem /nix/store/agxp1hx267qk1x79dl4jk1l5cg79izv1-libpaper-1.1.28/include",
      "NIX_LDFLAGS_aarch64_apple_darwin": "-L /nix/store/h25d99pd3zln95viaybdfynfq82r2dqy-qpdf-11.2.0/lib   -L /nix/store/agxp1hx267qk1x79dl4jk1l5cg79izv1-libpaper-1.1.28/lib -L /nix/store/awkb9g93ci2qy8yg5jl0zxw46f3xnvgv-libcxxabi-11.1.0/lib -liconv -L /nix/store/nmphpbjn8hhq7brwi9bw41m7l05i636h-libiconv-50/lib ",
      "PATH": "/nix/store/90cqrp3nxbcihkx4vswj5wh85x5klaga-gzip-1.12/bin:/nix/store/siv9312sgiqwsjrdvj8lx0mr3dsj3nf5-gnutar-1.34/bin:/nix/store/yy3fdgrshcblwx0cfp76nmmi24szw89q-gnumake-4.4/bin:/nix/store/cqag9fv2gia03nzcsaygan8fw1ggdf4g-clang-wrapper-11.1.0/bin:/nix/store/f16id36r9xxi50mgra55p7cf7ra0x96k-cctools-port-973.0.1/bin:/nix/store/x873pgpwqxkmyn35jvvfj48ccqav7fip-asciidoc-10.2.0/bin:/nix/store/vhivi799z583h2kf1b8lrr72h4h3vfcx-coreutils-9.1/bin:/nix/store/0q1jfjlwr4vig9cz7lnb5il9rg0y1n84-bash-5.2-p15/bin",
      "SOURCE": "/nix/store/6d2fcw88d9by4fz5xa9gdpbln73dlhdk-paperjam-1.2.tar.gz",
      "builder": "/nix/store/pbjj91f0qr8g14k58m744wdl9yvr2f5k-build-paperjam.sh",
      "name": "paperjam-fake",
      "out": "/nix/store/bcnyqizvcysqc1vy382wfx015mmwn3bd-paperjam-fake",
      "system": "aarch64-darwin"
    }
  }
}

This feels surprisingly easy to understand – you can see that there are a bunch of environment variables, our bash script, and the paths to our inputs.

the compilation helpers we’re not using: stdenv

Normally when you build a package with Nix, you don’t do all of this stuff yourself. Instead, you use a helper called stdenv, which seems to have two parts:

  1. a function called stdenv.mkDerivation which takes some arguments and generates a bunch of environment variables (it seems to be documented here)
  2. a 1600-line bash build script (setup.sh) that consumes those environment variables. This is like our build-paperjam.sh, but much more generalized.

Together, these two tools:

  • add LDFLAGS automatically for each C library you depend on
  • add CFLAGS automatically so that you can get your header files
  • run make
  • depend on clang and coreutils and bash and other core utilities so that you don’t need to add them yourself
  • set system to your current system
  • let you easily add custom bash code to run at various phases of your build
  • maybe also manage versions somehow? Not sure about this one.

and probably lots more useful things I don’t know about yet

let’s look at the derivation for jq

Let’s look at one more compiled derivation, for jq. This is quite long but there are some interesting things in here. I wanted to look at this because I wanted to see what a more typical derivation generated by stdenv.mkDerivation looked like.

$ nix show-derivation /nix/store/q9cw5rp0ibpl6h4i2qaq0vdjn4pyms3p-jq-1.6.drv
{
  "/nix/store/q9cw5rp0ibpl6h4i2qaq0vdjn4pyms3p-jq-1.6.drv": {
    "outputs": {
      "bin": { "path": "/nix/store/vabn35a2m2qmfi9cbym4z50bwq94fdzm-jq-1.6-bin" },
      "dev": { "path": "/nix/store/akda158i8gr0v0w397lwanxns8yrqldy-jq-1.6-dev" },
      "doc": { "path": "/nix/store/6qimafz8q88l90jwrzciwc27zhjwawcl-jq-1.6-doc" },
      "lib": { "path": "/nix/store/3wzlsin34l1cs70ljdy69q9296jnvnas-jq-1.6-lib" },
      "man": { "path": "/nix/store/dl1xf9w928jai5hvm5s9ds35l0m26m0k-jq-1.6-man" },
      "out": { "path": "/nix/store/ivzm5rrr7riwvgy2xcjhss6lz55qylnb-jq-1.6" }
    },
    "inputSrcs": [
      "/nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh",
      "/nix/store/jd98q1h1rxz5iqd5xs8k8gw9zw941lj6-fix-tests-when-building-without-regex-supports.patch"
    ],
    "inputDrvs": {
      "/nix/store/0lbzkxz56yhn4gv5z0sskzzdlwzkcff8-autoreconf-hook.drv": [ "out" ],
      "/nix/store/6wh5w7hkarfcx6fxsdclmlx097xsimmg-jq-1.6.tar.gz.drv": [ "out" ],
      "/nix/store/87a32xgqw85rxr1fx3c5j86y177hr9sr-oniguruma-6.9.8.drv": [ "dev" ],
      "/nix/store/gkpdv7xl39x9yxch0wjarq19mmv7j1pm-bash-5.2-p15.drv": [ "out" ],
      "/nix/store/xn1mjk78ly9wia23yvnsyw35q1mz4jqh-stdenv-darwin.drv": [ "out" ]
    },
    "system": "aarch64-darwin",
    "builder": "/nix/store/0q1jfjlwr4vig9cz7lnb5il9rg0y1n84-bash-5.2-p15/bin/bash",
    "args": [
      "-e",
      "/nix/store/6xg259477c90a229xwmb53pdfkn6ig3g-default-builder.sh"
    ],
    "env": {
      "__darwinAllowLocalNetworking": "",
      "__impureHostDeps": "/bin/sh /usr/lib/libSystem.B.dylib /usr/lib/system/libunc.dylib /dev/zero /dev/random /dev/urandom /bin/sh",
      "__propagatedImpureHostDeps": "",
      "__propagatedSandboxProfile": "",
      "__sandboxProfile": "",
      "__structuredAttrs": "",
      "bin": "/nix/store/vabn35a2m2qmfi9cbym4z50bwq94fdzm-jq-1.6-bin",
      "buildInputs": "/nix/store/xfnl6xqbvnpacx8hw9d99ca4mly9kp0h-oniguruma-6.9.8-dev",
      "builder": "/nix/store/0q1jfjlwr4vig9cz7lnb5il9rg0y1n84-bash-5.2-p15/bin/bash",
      "cmakeFlags": "",
      "configureFlags": "--bindir=${bin}/bin --sbindir=${bin}/bin --datadir=${doc}/share --mandir=${man}/share/man",
      "depsBuildBuild": "",
      "depsBuildBuildPropagated": "",
      "depsBuildTarget": "",
      "depsBuildTargetPropagated": "",
      "depsHostHost": "",
      "depsHostHostPropagated": "",
      "depsTargetTarget": "",
      "depsTargetTargetPropagated": "",
      "dev": "/nix/store/akda158i8gr0v0w397lwanxns8yrqldy-jq-1.6-dev",
      "doCheck": "",
      "doInstallCheck": "1",
      "doc": "/nix/store/6qimafz8q88l90jwrzciwc27zhjwawcl-jq-1.6-doc",
      "installCheckTarget": "check",
      "lib": "/nix/store/3wzlsin34l1cs70ljdy69q9296jnvnas-jq-1.6-lib",
      "man": "/nix/store/dl1xf9w928jai5hvm5s9ds35l0m26m0k-jq-1.6-man",
      "mesonFlags": "",
      "name": "jq-1.6",
      "nativeBuildInputs": "/nix/store/ni9k35b9llfc3hys8nv5qsipw8pfy1ln-autoreconf-hook",
      "out": "/nix/store/ivzm5rrr7riwvgy2xcjhss6lz55qylnb-jq-1.6",
      "outputs": "bin doc man dev lib out",
      "patches": "/nix/store/jd98q1h1rxz5iqd5xs8k8gw9zw941lj6-fix-tests-when-building-without-regex-supports.patch",
      "pname": "jq",
      "postInstallCheck": "$bin/bin/jq --help >/dev/null\n$bin/bin/jq -r '.values[1]' <<< '{\"values\":[\"hello\",\"world\"]}' | grep '^world$' > /dev/null\n",
      "preBuild": "rm -r ./modules/oniguruma\n",
      "preConfigure": "echo \"#!/bin/sh\" > scripts/version\necho \"echo 1.6\" >> scripts/version\npatchShebangs scripts/version\n",
      "propagatedBuildInputs": "",
      "propagatedNativeBuildInputs": "",
      "src": "/nix/store/ggjlgjx2fw29lngbnvwaqr6hiz1qhy8g-jq-1.6.tar.gz",
      "stdenv": "/nix/store/qrz2mnb2gsnzmw2pqax693daxh5hsgap-stdenv-darwin",
      "strictDeps": "",
      "system": "aarch64-darwin",
      "version": "1.6"
    }
  }
}

I thought it was interesting that some of the environment variables in here are actually bash scripts themselves – for example the postInstallCheck environment variable is a bash script. Those bash script environment variables are evaled in the main bash script (you can see that happening in setup.sh here)

The postInstallCheck environment variable in this particular derivation starts like this:

$bin/bin/jq --help >/dev/null
$bin/bin/jq -r '.values[1]' <<< '{"values":["hello","world"]}' | grep '^world$' > /dev/null

I guess this is a test to make sure that jq installed correctly.
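The pattern of storing a snippet of bash in an environment variable and eval-ing it later is easy to try outside of Nix. Here's a minimal sketch. run_hook is a made-up, much simplified stand-in for what setup.sh does (the real script is far more elaborate), and the hook contents are invented.

```bash
# If the named variable is set, eval its contents as shell code.
run_hook() {
  eval "hook_body=\${$1:-}"
  if [ -n "$hook_body" ]; then
    eval "$hook_body"
  fi
}

# A hook is just a string containing shell code, like postInstallCheck above.
preBuild='echo "cleaning up before the build"'
postInstallCheck='echo "checking that the install worked"'

run_hook preBuild
echo "(the build itself would run here)"
run_hook postInstallCheck
run_hook postFixup   # not set, so nothing happens
```

This is how a package definition can inject custom steps at fixed points without replacing the whole build script.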

finally: clean up

All of my compiler experiments used about 3GB of disk space, but nix-collect-garbage cleaned up all of it.

let’s recap the process!

I feel like I understand Nix a bit better after going through this. I still don’t feel very motivated to learn the Nix language, but now I have some idea of what Nix programs are actually doing under the hood!

My understanding is:

  1. First, .nix files get compiled into a .drv file, which is mostly a bunch of inputs and outputs and environment variables. This is where the Nix language stops being relevant.
  2. Then all the environment variables get passed to a build script, which is in charge of doing the actual build
  3. In the Nix standard environment (stdenv), some of those environment variables are themselves bash code, which gets evaled by the big build script setup.sh

That’s all! I probably made some mistakes in here, but this was kind of a fun rabbit hole.

Some notes on using nix

Recently I started using a Mac for the first time. The biggest downside I’ve noticed so far is that the package management is much worse than on Linux. At some point I got frustrated with homebrew because I felt like it was spending too much time upgrading when I installed new packages, and so I thought – maybe I’ll try the nix package manager!

nix has a reputation for being confusing (it has its whole own programming language!), so I’ve been trying to figure out how to use nix in a way that’s as simple as possible and does not involve managing any configuration files or learning a new programming language. Here’s what I’ve figured out so far! We’ll talk about how to:

  1. install packages with nix
  2. build a custom nix package for a C++ program called paperjam
  3. install a 5-year-old version of hugo with nix

As usual I’ve probably gotten some stuff wrong in this post since I’m still pretty new to nix. I’m also still not sure how much I like nix – it’s very confusing! But it’s helped me compile some software that I was struggling to compile otherwise, and in general it seems to install things faster than homebrew.

what’s interesting about nix?

People often describe nix as “declarative package management”. I don’t care that much about declarative package management, so here are two things that I appreciate about nix:

  1. It provides binary packages (hosted at https://cache.nixos.org/) that you can quickly download and install
  2. For packages which don’t have binary packages, it makes it easier to compile them

I think that the reason nix is good at compiling software is that:

  • you can have multiple versions of the same library or program installed at a time (you could have 2 different versions of libc for instance). For example I have two versions of node on my computer right now, one at /nix/store/4ykq0lpvmskdlhrvz1j3kwslgc6c7pnv-nodejs-16.17.1 and one at /nix/store/5y4bd2r99zhdbir95w5pf51bwfg37bwa-nodejs-18.9.1.
  • when nix builds a package, it builds it in isolation, using only the specific versions of its dependencies that you explicitly declared. So there’s no risk that the package secretly depends on another package on your system that you don’t know about. No more fighting with LD_LIBRARY_PATH!
  • a lot of people have put a lot of work into writing down all of the dependencies of packages

I’ll give a couple of examples later in this post of two times nix made it easier for me to compile software.

how I got started with nix

here’s how I got started with nix:

  1. Install nix. I forget exactly how I did this, but it looks like there’s an official installer and an unofficial installer from zero-to-nix.com. The instructions for uninstalling nix on MacOS with the standard multi-user install are a bit complicated, so it might be worth choosing an installation method with simpler uninstall instructions.
  2. Put ~/.nix-profile/bin on my PATH
  3. Install packages with nix-env -iA nixpkgs.NAME
  4. That’s it.

Basically the idea is to treat nix-env -iA like brew install or apt-get install.

For example, if I want to install fish, I can do that like this:

nix-env -iA nixpkgs.fish

This seems to just download some binaries from https://cache.nixos.org – pretty simple.

Some people use nix to install their Node and Python and Ruby packages, but I haven’t been doing that – I just use npm install and pip install the same way I always have.

some nix features I’m not using

There are a bunch of nix features/tools that I’m not using, but that I’ll mention. I originally thought that you had to use these features to use nix, because most of the nix tutorials I’ve read talk about them. But you don’t have to use them.

I won’t go into these because I haven’t really used them and there are lots of explanations out there.

where are nix packages defined?

I think packages in the main nix package repository are defined in https://github.com/NixOS/nixpkgs/

It looks like you can search for packages at https://search.nixos.org/packages. The two official ways to search packages seem to be:

  • nix-env -qaP NAME, which is extremely slow and which I haven’t been able to get to actually work
  • nix --extra-experimental-features 'nix-command flakes' search nixpkgs NAME, which does seem to work but is kind of a mouthful. Also all of the packages it prints out start with legacyPackages for some reason

I found a way to search nix packages from the command line that I liked better:

  1. Run nix-env -qa '*' > nix-packages.txt to get a list of every package in the Nix repository
  2. Write a short nix-search script that just greps nix-packages.txt (cat ~/bin/nix-packages.txt | awk '{print $1}' | rg "$1")
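A sketch of what such a script could look like, written as a shell function so it's easy to try. It assumes the package list has the name in the first whitespace-separated column (which is what the awk in the one-liner relies on), uses grep instead of rg so it runs without ripgrep installed, and adds a NIX_PACKAGES_FILE variable (my own invention) to point at the list.

```bash
# Search a saved package list (one package per line, name in column 1).
# NIX_PACKAGES_FILE is a hypothetical override; the default matches the
# path used in the one-liner above.
nix_search() {
  awk '{print $1}' "${NIX_PACKAGES_FILE:-$HOME/bin/nix-packages.txt}" | grep -- "$1"
}

# Usage, after generating the list once:
#   nix-env -qa '*' > ~/bin/nix-packages.txt
#   nix_search fish
```

Because the search only reads a text file, it's fast, but the list goes stale until you regenerate it.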

One of nix’s major design choices is that there isn’t one single bin with all your packages. Instead, you use symlinks, and there are a lot of layers of them. A few examples:

  • ~/.nix-profile on my machine is (indirectly) a symlink to /nix/var/nix/profiles/per-user/bork/profile-111-link/
  • ~/.nix-profile/bin/fish is a symlink to /nix/store/afkwn6k8p8g97jiqgx9nd26503s35mgi-fish-3.5.1/bin/fish

When I install something, it creates a new profile-112-link directory with new symlinks and updates my ~/.nix-profile to point to that directory.

I think this means that if I install a new version of fish and I don’t like it, I can easily go back just by running nix-env --rollback – it’ll move me to my previous profile directory.
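Here's a toy model of that mechanism using ordinary symlinks in a temporary directory. It's only meant to show why switching and rolling back are cheap (each is one symlink update). The directory layout, the version numbers, and the switch_profile helper are simplified inventions, not how nix-env is actually implemented.

```bash
root=$(mktemp -d)

# Two "store" entries holding different versions of the same program.
mkdir -p "$root/store/fish-3.5.1/bin" "$root/store/fish-3.6.0/bin"
echo "fish 3.5.1" > "$root/store/fish-3.5.1/bin/fish"
echo "fish 3.6.0" > "$root/store/fish-3.6.0/bin/fish"

# Each profile generation is a directory of symlinks into the store.
mkdir -p "$root/profile-1-link/bin" "$root/profile-2-link/bin"
ln -s "$root/store/fish-3.5.1/bin/fish" "$root/profile-1-link/bin/fish"
ln -s "$root/store/fish-3.6.0/bin/fish" "$root/profile-2-link/bin/fish"

# "Installing" or "rolling back" just repoints one symlink.
switch_profile() {
  ln -sfn "$root/profile-$1-link" "$root/profile"
}

switch_profile 2
cat "$root/profile/bin/fish"    # fish 3.6.0
switch_profile 1                # the rollback
cat "$root/profile/bin/fish"    # fish 3.5.1
```

Nothing in the store is modified or deleted when the profile changes, which also explains why old versions keep taking up disk space until they're garbage collected.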

uninstalling packages doesn’t delete them

If I uninstall a nix package like this, it doesn’t actually free any hard drive space, it just removes the symlinks.

$ nix-env --uninstall oil

I’m still not sure how to actually delete the package – I ran a garbage collection like this, which seemed to delete some things:

$ nix-collect-garbage
...
85 store paths deleted, 74.90 MiB freed

But I still have oil on my system at /nix/store/8pjnk6jr54z77jiq5g2dbx8887dnxbda-oil-0.14.0.

There’s a more aggressive version of nix-collect-garbage that also deletes old versions of your profiles (so that you can’t rollback)

$ nix-collect-garbage -d --delete-old

That doesn’t delete /nix/store/8pjnk6jr54z77jiq5g2dbx8887dnxbda-oil-0.14.0 either though and I’m not sure why.

upgrading

It looks like you can upgrade nix packages like this:

nix-channel --update
nix-env --upgrade

(similar to apt-get update && apt-get upgrade)

I haven’t really upgraded anything yet. I think that if something goes wrong with an upgrade, you can roll back (because everything is immutable in nix!) with

nix-env --rollback

Someone linked me to this post from Ian Henry that talks about some confusing problems with nix-env --upgrade – maybe it doesn’t work the way you’d expect? I guess I’ll be wary around upgrades.

next goal: make a custom package of paperjam

After a few months of installing existing packages, I wanted to make a custom package with nix for a program called paperjam that wasn’t already packaged.

I was actually struggling to compile paperjam at all even without nix because the version of libiconv I had on my system was wrong. I thought it might be easier to compile it with nix even though I didn’t know how to make nix packages yet. And it actually was!

But figuring out how to get there was VERY confusing, so here are some notes about how I did it.

how to build an example package

Before I started working on my paperjam package, I wanted to build an example existing package just to make sure I understood the process for building a package. I was really struggling to figure out how to do this, but I asked in Discord and someone explained to me how I could get a working package from https://github.com/NixOS/nixpkgs/ and build it. So here are those instructions:

step 1: Download some arbitrary package from nixpkgs on github, for example the dash package:

wget https://raw.githubusercontent.com/NixOS/nixpkgs/47993510dcb7713a29591517cb6ce682cc40f0ca/pkgs/shells/dash/default.nix -O dash.nix

step 2: Replace the first statement ({ lib, stdenv, buildPackages, autoreconfHook, pkg-config, fetchurl, fetchpatch, libedit, runCommand, dash }:) with the line with import <nixpkgs> {}; I don’t know why you have to do this, but it works.

step 3: Run nix-build dash.nix

This compiles the package

step 4: Run nix-env -i -f dash.nix

This installs the package into my ~/.nix-profile

That’s all! Once I’d done that, I felt like I could modify the dash package and make my own package.

how I made my own package

paperjam has one dependency (libpaper) that also isn’t packaged yet, so I needed to build libpaper first.

Here’s libpaper.nix. I basically just wrote this by copying and pasting from other packages in the nixpkgs repository. My guess is what’s happening here is that nix has some default rules for compiling C packages (like “run make install”), so the make install happens by default and I don’t need to configure it explicitly.

with import <nixpkgs> {};

stdenv.mkDerivation rec {
  pname = "libpaper";
  version = "0.1";

  src = fetchFromGitHub {
    owner = "naota";
    repo = "libpaper";
    rev = "51ca11ec543f2828672d15e4e77b92619b497ccd";
    hash = "sha256-S1pzVQ/ceNsx0vGmzdDWw2TjPVLiRgzR4edFblWsekY=";
  };

  buildInputs = [ ];

  meta = with lib; {
    homepage = "https://github.com/naota/libpaper";
    description = "libpaper";
    platforms = platforms.unix;
    license = with licenses; [ bsd3 gpl2 ];
  };
}

Basically this just tells nix how to download the source from GitHub.

I built this by running nix-build libpaper.nix

Next, I needed to compile paperjam. Here’s a link to the nix package I wrote. The main things I needed to do other than telling it where to download the source were:

  1. add some extra build dependencies (like asciidoc)
  2. set some environment variables for the install (installFlags = [ "PREFIX=$(out)" ];) so that it installed in the correct directory instead of /usr/local/bin.

I set the hashes by first leaving the hash empty, then running nix-build to get an error message complaining about a mismatched hash. Then I copied the correct hash out of the error message.

I figured out how to set installFlags just by running rg PREFIX in the nixpkgs repository – I figured that needing to set a PREFIX was pretty common and someone had probably done it before, and I was right. So I just copied and pasted that line from another package.

Then I ran:

nix-build paperjam.nix
nix-env -i -f paperjam.nix

and then everything worked and I had paperjam installed! Hooray!

next goal: install a 5-year-old version of hugo

Right now I build this blog using Hugo 0.40, from 2018. I don’t need any new features so I haven’t felt a need to upgrade. On Linux this is easy: Hugo’s releases are a static binary, so I can just download the 5-year-old binary from the releases page and run it. Easy!

But on this Mac I ran into some complications. Mac hardware has changed in the last 5 years, so the Mac Hugo binary I downloaded crashed. And when I tried to build it from source with go build, that didn’t work either because Go build norms have changed in the last 5 years as well.

I was working around this by running Hugo in a Linux docker container, but I didn’t love that: it was kind of slow and it felt silly. It shouldn’t be that hard to compile one Go program!

Nix to the rescue! Here’s what I did to install the old version of Hugo with nix.

installing Hugo 0.40 with nix

I wanted to install Hugo 0.40 and put it in my PATH as hugo-0.40. Here’s how I did it. I did this in a kind of weird way, but it worked (Searching and installing old versions of Nix packages describes a probably more normal method).

step 1: Search through the nixpkgs repo to find Hugo 0.40

I found the .nix file here https://github.com/NixOS/nixpkgs/blob/17b2ef2/pkgs/applications/misc/hugo/default.nix

step 2: Download that file and build it

I downloaded that file (and another file called deps.nix in the same directory), replaced the first line with with import <nixpkgs> {};, and built it with nix-build hugo.nix.

That almost worked without any changes, but I had to make two changes:

  • replace with stdenv.lib to with lib for some reason.
  • rename the package to hugo040 so that it wouldn’t conflict with the other version of hugo that I had installed

step 3: Rename hugo to hugo-0.40

I wrote a little post install script to rename the Hugo binary.

  postInstall = ''
    mv $out/bin/hugo $out/bin/hugo-0.40
  '';

I figured out how to run this by running rg 'mv ' in the nixpkgs repository and just copying and modifying something that seemed related.

step 4: Install it

I installed into my ~/.nix-profile/bin by running nix-env -i -f hugo.nix.

And it all works! I put the final .nix file into my own personal nixpkgs repo so that I can use it again later if I want.

reproducible builds aren’t magic, they’re really hard

I think it’s worth noting here that this hugo.nix file isn’t magic – the reason I can easily compile Hugo 0.40 today is that many people worked for a long time to make it possible to package that version of Hugo in a reproducible way.

that’s all!

Installing paperjam and this 5-year-old version of Hugo were both surprisingly painless and actually much easier than compiling it without nix, because nix made it much easier for me to compile the paperjam package with the right version of libiconv, and because someone 5 years ago had already gone to the trouble of listing out the exact dependencies for Hugo.

I don’t have any plans to get much more complicated with nix (and it’s still very possible I’ll get frustrated with it and go back to homebrew!), but we’ll see what happens! I’ve found it much easier to start in a simple way and then start using more features if I feel the need instead of adopting a whole bunch of complicated stuff all at once.

I probably won’t use nix on Linux – I’ve always been happy enough with apt (on Debian-based distros) and pacman (on Arch-based distros), and they’re much less confusing. But on a Mac it seems like it might be worth it. We’ll see! It’s very possible in 3 months I’ll get frustrated with nix and just go back to homebrew.

5-month update: rebuilding my nix profile

Update from 5 months in: nix is still going well, and I’ve only run into 1 problem, which is that every nix-env -iA package installation started failing with the error “bad meta.outputsToInstall”.

This script from Ross Light fixes that problem though. It lists every derivation installed in my current profile and creates a new profile with the exact same derivations. This feels like a nix bug (surely creating a new profile with the exact same derivations should be a no-op?) but I haven’t looked into it more yet.

Writing Javascript without a build system

Hello! I’ve been writing some Javascript this week, and as always when I start a new frontend project, I was faced with the question: should I use a build system?

I want to talk about what’s appealing to me about build systems, why I (usually) still don’t use them, and why I find it frustrating that some frontend Javascript libraries require that you use a build system.

I’m writing this because most of the writing I see about JS assumes that you’re using a build system, and it can be hard to navigate for folks like me who write very simple small Javascript projects that don’t require a build system.

what’s a build system?

The idea is that you have a bunch of Javascript or Typescript code, and you want to translate it into different Javascript code before you put it on your website.

Build systems can do lots of useful things, like:

  • combining 100s of JS files into one big bundle (for efficiency reasons)
  • translating Typescript into Javascript
  • typechecking Typescript
  • minification
  • adding polyfills to support older browsers
  • compiling JSX
  • treeshaking (remove unused JS code to reduce file sizes)
  • building CSS (like tailwind does)
  • and probably lots of other important things

Because of this, if you’re building a complex frontend project today, probably you’re using a build system like webpack, rollup, esbuild, parcel, or vite.

Lots of those features are appealing to me, and I’ve used build systems in the past for some of these reasons: Mess With DNS uses esbuild to translate Typescript and combine lots of files into one big file, for example.

the goal: easily make changes to old tiny websites

I make a lot of small simple websites, I have approximately 0 maintenance energy for any of them, and I change them very infrequently.

My goal is that if I have a site that I made 3 or 5 years ago, I’d like to be able to, in 20 minutes:

  • get the source from github on a new computer
  • make some changes
  • put it on the internet

But my experience with build systems (not just Javascript build systems!) is that if you have a 5-year-old site, often it’s a huge pain to get the site built again.

And because most of my websites are pretty small, the advantage of using a build system is pretty small – I don’t really need Typescript or JSX. I can just have one 400-line script.js file and call it a day.

example: trying to build the SQL playground

One of my sites (the sql playground) uses a build system (it’s using Vue). I last edited that project 2 years ago, on a different machine.

Let’s see if I can still easily build it today on my machine. To start out, we have to run npm install. Here’s the output I get.

$ npm install
[lots of output redacted]
npm ERR! code 1
npm ERR! path /Users/bork/work/sql-playground.wizardzines.com/node_modules/grpc
npm ERR! command failed
npm ERR! command sh /var/folders/3z/g3qrs9s96mg6r4dmzryjn3mm0000gn/T/install-b52c96ad.sh
npm ERR! CXX(target) Release/obj.target/grpc/deps/grpc/src/core/lib/surface/init.o
npm ERR!   CXX(target) Release/obj.target/grpc/deps/grpc/src/core/lib/avl/avl.o
npm ERR!   CXX(target) Release/obj.target/grpc/deps/grpc/src/core/lib/backoff/backoff.o
npm ERR!   CXX(target) Release/obj.target/grpc/deps/grpc/src/core/lib/channel/channel_args.o
npm ERR!   CXX(target) Release/obj.target/grpc/deps/grpc/src/core/lib/channel/channel_stack.o
npm ERR!   CXX(target) Release/obj.target/grpc/deps/grpc/src/core/lib/channel/channel_stack_builder.o
npm ERR!   CXX(target) Release/obj.target/grpc/deps/grpc/src/core/lib/channel/channel_trace.o
npm ERR!   CXX(target) Release/obj.target/grpc/deps/grpc/src/core/lib/channel/channelz.o

There’s some kind of error building grpc. No problem. I don’t really need that dependency anyway, so I can just take 5 minutes to tear it out and rebuild. Now I can npm install and everything works.

Now let’s try to build the project:

$ npm run build
  ?  Building for production...Error: error:0308010C:digital envelope routines::unsupported
    at new Hash (node:internal/crypto/hash:71:19)
    at Object.createHash (node:crypto:130:10)
    at module.exports (/Users/bork/work/sql-playground.wizardzines.com/node_modules/webpack/lib/util/createHash.js:135:53)
    at NormalModule._initBuildHash (/Users/bork/work/sql-playground.wizardzines.com/node_modules/webpack/lib/NormalModule.js:414:16)
    at handleParseError (/Users/bork/work/sql-playground.wizardzines.com/node_modules/webpack/lib/NormalModule.js:467:10)
    at /Users/bork/work/sql-playground.wizardzines.com/node_modules/webpack/lib/NormalModule.js:499:5
    at /Users/bork/work/sql-playground.wizardzines.com/node_modules/webpack/lib/NormalModule.js:356:12
    at /Users/bork/work/sql-playground.wizardzines.com/node_modules/loader-runner/lib/LoaderRunner.js:373:3
    at iterateNormalLoaders (/Users/bork/work/sql-playground.wizardzines.com/node_modules/loader-runner/lib/LoaderRunner.js:214:10)
    at iterateNormalLoaders (/Users/bork/work/sql-playground.wizardzines.com/node_modules/loader-runner/lib/LoaderRunner.js:221:10)
    at /Users/bork/work/sql-playground.wizardzines.com/node_modules/loader-runner/lib/LoaderRunner.js:236:3
    at runSyncOrAsync (/Users/bork/work/sql-playground.wizardzines.com/node_modules/loader-runner/lib/LoaderRunner.js:130:11)
    at iterateNormalLoaders (/Users/bork/work/sql-playground.wizardzines.com/node_modules/loader-runner/lib/LoaderRunner.js:232:2)
    at Array.<anonymous> (/Users/bork/work/sql-playground.wizardzines.com/node_modules/loader-runner/lib/LoaderRunner.js:205:4)
    at Storage.finished (/Users/bork/work/sql-playground.wizardzines.com/node_modules/enhanced-resolve/lib/CachedInputFileSystem.js:43:16)
    at /Users/bork/work/sql-playground.wizardzines.com/node_modules/enhanced-resolve/lib/CachedInputFileSystem.js:79:9

This stack overflow answer suggests running export NODE_OPTIONS=--openssl-legacy-provider to fix this error.

That works, and finally I can npm run build to build the project.

This isn’t really that bad (I only had to remove a dependency and pass a slightly mysterious node option!), but I would rather not be derailed by those build errors.

for me, a build system isn’t worth it for small projects

For me, a complicated Javascript build system just doesn’t seem worth it for small 500-line projects – it means giving up being able to easily update the project in the future in exchange for some pretty marginal benefits.

esbuild seems a little more stable

I want to give a quick shoutout to esbuild: I learned about esbuild in 2021 and used it for a project, and so far it does seem like a more reliable way to build JS projects.

I just tried to build an esbuild project that I last touched 8 months ago on a new computer, and it worked. But I can’t say for sure if I’ll be able to easily build that project in 2 years. Maybe it will, I hope so!

not using a build system is usually pretty easy

Here’s what the part of the nginx playground code that imports all the libraries looks like:

<script src="js/vue.global.prod.js"></script>
<script src="codemirror-5.63.0/lib/codemirror.js"></script>
<script src="codemirror-5.63.0/mode/nginx/nginx.js"></script>
<script src="codemirror-5.63.0/mode/shell/shell.js"></script>
<script src="codemirror-5.63.0/mode/javascript/javascript.js"></script>
<link rel="stylesheet" href="codemirror-5.63.0/lib/codemirror.css">
<script src="script.js"></script>

This project is also using Vue, but it just uses a <script src to load Vue – there’s no build process for the frontend.

a no-build-system template for using Vue

A couple of people asked how to get started writing Javascript without a build system. Of course you can write vanilla JS if you want, but my usual framework is Vue 3.

Here’s a tiny template I built for starting a Vue 3 project with no build system. It’s just 2 files and ~30 lines of HTML/JS.

some libraries require you to use a build system

This build system stuff is on my mind recently because I’m using CodeMirror 5 for a new project this week, and I saw there was a new version, CodeMirror 6.

So I thought – cool, maybe I should use CodeMirror 6 instead of CodeMirror 5. But – it seems like you can’t use CodeMirror 6 without a build system (according to the migration guide). So I’m going to stick with CodeMirror 5.

Similarly, you used to be able to just download Tailwind as a giant CSS file, but Tailwind 3 doesn’t seem to be available as a big CSS file at all anymore, you need to run Javascript to build it. So I’m going to keep using Tailwind 2 for now. (I know, I know, you’re not supposed to use the big CSS file, but it’s only 300KB gzipped and I really don’t want a build step)

(edit: it looks like Tailwind released a standalone CLI in 2021 which seems like a nice option)

I’m not totally sure why some libraries don’t provide a no-build-system version – maybe distributing a no-build-system version would add a lot of additional complexity to the library, and the maintainer doesn’t think it’s worth it. Or maybe the library’s design means that it’s not possible to distribute a no-build-system version for some reason.

I’d love more tips for no-build-system javascript

My main strategies so far are:

  • search for “CDN” on a library’s website to find a standalone javascript file
  • use https://unpkg.com to see if the library has a built version I can use
  • host my own version of libraries instead of relying on a CDN that might go down
  • write my own simple integrations instead of pulling in another dependency (for example I wrote my own CodeMirror component for Vue the other day)
  • if I want a build system, use esbuild

A couple of other things that look interesting but that I haven’t looked into:

Print copies of The Pocket Guide to Debugging have arrived

Hello! We released The Pocket Guide to Debugging back in December, and here’s a final update: the print copies are done printing and they’ve arrived at the warehouse, ready to ship to anyone who wants one.

You can buy the print or PDF version now, and if you preordered it, your copy should already have shipped. Some people have told me that they already received theirs! Email me if you haven’t gotten the shipping confirmation.

some pictures

Here are some photos of what the print version looks like:

what was involved in printing it

In case anyone is interested, here’s what was involved in putting together the print version:

  1. Make a PDF copy that people can print on their home printer (with a 360-line Python program)
  2. Test on my home printer that the “print at home version” prints properly
  3. Release the “print at home” version (this was back in December)
  4. Take a couple of weeks off, since it’s the end of the year
  5. Ask the illustrator to make a back cover
  6. Get a quote from the print company
  7. Agonize a bit over whether to print the zine as perfect bound or saddle stitched (stapled). Pick perfect bound.
  8. Find out from the print company how wide the spine has to be
  9. With the help of the illustrator, make a design for the spine.
  10. Get an ISBN number (just a couple of clicks at Libraries and Archives Canada)
  11. Get a bar code for the ISBN (from bookow), edit it to make it a little smaller, and put it on the back cover
  12. Send the new PDF to the print company and request a print proof
  13. Wait a week or so for the proof to get shipped across the continent
  14. Once the proof arrives, realize that the inner margins are too small, because it was perfect bound and perfect bound books need bigger margins (We’d already tried to account for that, but we didn’t make them big enough)
  15. Measure various books I have around the house and print some new sample pages to figure out the right margins
  16. Painstakingly manually readjust every single page to have slightly different proportions, so that I can increase the margins
  17. Edit the Python script to make a new PDF with the bigger margins
  18. Send the final files to the print company
  19. Wait a week for them to print 1500 copies
  20. The print copies arrive at the warehouse!
  21. Wait another 3 business days for the (amazing) folks who do the shipping to send out all 700 or so preorders
  22. Success!

Printing 1500 copies of something is always a little scary, but I’m really happy with how it turned out.

thanks so much to everyone who preordered!

If you preordered the print version, thanks so much for your patience – having the preorders really helps me decide how many to print.

And please let me know if something went wrong – 1 or 2 packages always get lost in the mail and while I can’t help find them, it’s very easy for me to just ship you another one :)

Why does 0.1 + 0.2 = 0.30000000000000004?

Hello! I was trying to write about floating point yesterday, and I found myself wondering about this calculation, with 64-bit floats:

>>> 0.1 + 0.2
0.30000000000000004

I realized that I didn’t understand exactly how it worked. I mean, I know floating point calculations are inexact, and I know that you can’t exactly represent 0.1 in binary, but: there’s a floating point number that’s closer to 0.3 than 0.30000000000000004! So why do we get the answer 0.30000000000000004?

If you don’t feel like reading this whole post with a bunch of calculations, the short answer is that 0.1000000000000000055511151231257827021181583404541015625 + 0.200000000000000011102230246251565404236316680908203125 lies exactly between 2 floating point numbers, 0.299999999999999988897769753748434595763683319091796875 (usually printed as 0.3) and 0.3000000000000000444089209850062616169452667236328125 (usually printed as 0.30000000000000004). The answer is 0.30000000000000004 (the second one) because its significand is even.

how floating point addition works

This is roughly how floating point addition works:

  1. Add together the numbers (with extra precision)
  2. Round the result to the nearest floating point number

So let’s use these rules to calculate 0.1 + 0.2. I just learned how floating point addition works yesterday so it’s possible I’ve made some mistakes in this post, but I did get the answers I expected at the end.
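Those two steps can be sketched directly in Python with the fractions module. This is only an illustration of the idea, not what the hardware does: Fraction(x) gives the exact rational value stored in a float, and (as far as I know) float() on a Fraction rounds to the nearest 64-bit float.

```python
from fractions import Fraction

# Step 1: add the exact values of the two floats with unlimited precision.
exact_sum = Fraction(0.1) + Fraction(0.2)

# Step 2: round the exact result to the nearest 64-bit float.
rounded = float(exact_sum)

print(rounded)                          # 0.30000000000000004
print(rounded == 0.1 + 0.2)             # True
print(Fraction(rounded) == exact_sum)   # False: the exact sum is not itself a float
```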

step 1: find out what 0.1 and 0.2 are

First, let’s use Python to figure out what the exact values of 0.1 and 0.2 are, as 64-bit floats.

>>> f"{0.1:.80f}"
'0.10000000000000000555111512312578270211815834045410156250000000000000000000000000'
>>> f"{0.2:.80f}"
'0.20000000000000001110223024625156540423631668090820312500000000000000000000000000'

These really are the exact values: because floating point numbers are in base 2, you can represent them all exactly in base 10. You just need a lot of digits sometimes :)
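Another way to see the exact values without guessing how many digits to ask for is the decimal module. Converting a float to Decimal is exact, so this prints every digit and nothing more:

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no rounding.
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.2))  # 0.200000000000000011102230246251565404236316680908203125
```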

step 2: add the numbers together

Next, let’s add those numbers together. We can add the fractional parts together as integers to get the exact answer:

>>> 1000000000000000055511151231257827021181583404541015625 + 2000000000000000111022302462515654042363166809082031250
3000000000000000166533453693773481063544750213623046875

So the exact sum of those two floating point numbers is 0.3000000000000000166533453693773481063544750213623046875

This isn’t our final answer though because 0.3000000000000000166533453693773481063544750213623046875 isn’t a 64-bit float.

step 3: look at the nearest floating point numbers

Now, let’s look at the floating point numbers around 0.3. Here’s the closest floating point number to 0.3 (usually written as just 0.3, even though that isn’t its exact value):

>>> f"{0.3:.80f}"
'0.29999999999999998889776975374843459576368331909179687500000000000000000000000000'

We can figure out the next floating point number after 0.3 by serializing 0.3 to 8 bytes with struct.pack, adding 1, and then using struct.unpack:

>>> import struct
>>> struct.pack("!d", 0.3)
b'?\xd3333333'
# manually add 1 to the last byte
>>> next_float = struct.unpack("!d", b'?\xd3333334')[0]
>>> next_float
0.30000000000000004
>>> f"{next_float:.80f}"
'0.30000000000000004440892098500626161694526672363281250000000000000000000000000000'

Apparently you can also do this with math.nextafter (added in Python 3.9):

>>> import math
>>> math.nextafter(0.3, math.inf)
0.30000000000000004

So the two 64-bit floats around 0.3 are 0.299999999999999988897769753748434595763683319091796875 and 0.3000000000000000444089209850062616169452667236328125

step 4: find out which one is closest to our result

It turns out that 0.3000000000000000166533453693773481063544750213623046875 is exactly in the middle of 0.299999999999999988897769753748434595763683319091796875 and 0.3000000000000000444089209850062616169452667236328125.

You can see that with this calculation:

>>> (3000000000000000444089209850062616169452667236328125000 + 2999999999999999888977697537484345957636833190917968750) // 2 == 3000000000000000166533453693773481063544750213623046875
True

So neither of them is closest.
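The same midpoint check can be done without copying long digit strings around. This is a small sketch where next_float_up is a hypothetical helper (it does the "add 1 to the bytes" trick from above and only works for positive finite floats):

```python
import struct
from fractions import Fraction

def next_float_up(x):
    # Reinterpret the 8 bytes of a positive float as an integer, add 1,
    # and reinterpret the result as a float again.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return struct.unpack(">d", struct.pack(">Q", bits + 1))[0]

below = 0.3
above = next_float_up(0.3)
exact_sum = Fraction(0.1) + Fraction(0.2)
midpoint = (Fraction(below) + Fraction(above)) / 2

print(above)                  # 0.30000000000000004
print(exact_sum == midpoint)  # True
```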

how does it know which one to round to?

In the binary representation of a floating point number, there’s a number called the “significand”. In cases like this (where the result is exactly in between 2 successive floating point numbers), it’ll round to the one with the even significand.

In this case that’s 0.300000000000000044408920985006261616945266723632812500

We actually saw the significand of this number a bit earlier:

  • 0.30000000000000004 is struct.unpack('!d', b'?\xd3333334')[0]
  • 0.3 is struct.unpack('!d', b'?\xd3333333')[0]

The last digit of the big endian hex representation of 0.30000000000000004 is 4, so that’s the one with the even significand (because the significand is at the end).
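Instead of reading the parity off the last hex digit, you can also pull out the significand and check it. A minimal sketch, assuming the standard 64-bit layout where the significand is the low 52 bits (significand here is a helper name I made up):

```python
import struct

def significand(x):
    # The significand is the low 52 bits of the 64-bit representation.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits & ((1 << 52) - 1)

for x in (0.3, 0.30000000000000004):
    s = significand(x)
    print(x, hex(s), "even" if s % 2 == 0 else "odd")
```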

let’s also work out the whole calculation in binary

Above we did the calculation in decimal, because that’s a little more intuitive to read. But of course computers don’t do these calculations in decimal – they’re done in a base 2 representation. So I wanted to get an idea of how that worked too.

I don’t think this binary calculation part of the post is particularly clear but it was helpful for me to write out. There are really a lot of numbers and it might be terrible to read.

how 64-bit floating point numbers work: exponent and significand

64-bit floating point numbers are represented with 2 integers (an exponent and a significand) and a 1-bit sign.

Here’s the equation of how the exponent and significand correspond to an actual number

$$\text{sign} \times 2^\text{exponent} (1 + \frac{\text{significand}}{2^{52}})$$

For example, if the exponent was 1, the significand was 2**51, and the sign was positive, we’d get

$$2^{1} (1 + \frac{2^{51}}{2^{52}})$$

which is equal to 2 * (1 + 0.5) , or 3.
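Here’s a rough Python sketch of that equation, including the sign bit. decode and reconstruct are hypothetical helper names, and the sketch only handles normal numbers (not zero, subnormals, infinity, or NaN). It relies on the fact that the stored exponent has 1023 added to it.

```python
import struct
from fractions import Fraction

def decode(x):
    # Split a 64-bit float into (sign, exponent, significand).
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = -1 if bits >> 63 else 1
    exponent = ((bits >> 52) & 0x7FF) - 1023  # remove the bias of 1023
    significand = bits & ((1 << 52) - 1)
    return sign, exponent, significand

def reconstruct(sign, exponent, significand):
    # sign * 2**exponent * (1 + significand / 2**52), computed exactly.
    return sign * Fraction(2) ** exponent * (1 + Fraction(significand, 2**52))

print(decode(3.0))                                   # (1, 1, 2251799813685248)
print(reconstruct(*decode(-0.1)) == Fraction(-0.1))  # True
```

Using Fraction for the reconstruction means the check is exact, so no rounding can hide a mistake.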

step 1: get the exponent and significand for 0.1 and 0.2

I wrote some inefficient functions to get the exponent and significand of a positive float in Python:

import struct

def get_exponent(f):
    # get the first 12 bits (the sign bit, which is 0 for a positive float, plus the 11 exponent bits)
    bytestring = struct.pack('!d', f)
    return int.from_bytes(bytestring, byteorder='big') >> 52

def get_significand(f):
    # get the last 52 bits by clearing the exponent bits
    bytestring = struct.pack('!d', f)
    x = int.from_bytes(bytestring, byteorder='big')
    exponent = get_exponent(f)
    return x ^ (exponent << 52)

I’m ignoring the sign bit (the first bit) because we only need these functions to work on two numbers (0.1 and 0.2) and those two numbers are both positive.

First, let’s get the exponent and significand of 0.1. We need to subtract 1023 to get the actual exponent because that’s how floating point works.

>>> get_exponent(0.1) - 1023
-4
>>> get_significand(0.1)
2702159776422298

The way these numbers work together to get 0.1 is 2**exponent + significand / 2**(52 - exponent).

Here’s that calculation in Python:

>>> 2**-4 + 2702159776422298 / 2**(52 + 4)
0.1

(you might legitimately be worried about floating point accuracy issues with this calculation, but in this case I’m pretty sure it’s fine because these numbers by definition don’t have accuracy issues – the floating point numbers starting at 2**-4 go up in steps of 1/2**(52 + 4))

We can do the same thing for 0.2:

>>> get_exponent(0.2) - 1023
-3
>>> get_significand(0.2)
2702159776422298

And here’s how that exponent and significand work together to get 0.2:

>>> 2**-3 + 2702159776422298 / 2**(52 + 3)
0.2

(by the way, it’s not a coincidence that 0.1 and 0.2 have the same significand – it’s because x and 2*x always have the same significand)
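If the accuracy worry still nags, one way to rule it out is to redo both calculations with exact rational arithmetic. A small sketch using Fraction, where nothing is ever rounded:

```python
from fractions import Fraction

significand = 2702159776422298  # shared by 0.1 and 0.2

exact_01 = Fraction(2) ** -4 + Fraction(significand, 2 ** (52 + 4))
exact_02 = Fraction(2) ** -3 + Fraction(significand, 2 ** (52 + 3))

print(exact_01 == Fraction(0.1))  # True
print(exact_02 == Fraction(0.2))  # True
print(exact_02 == 2 * exact_01)   # True: doubling only changes the exponent
```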

step 2: rewrite 0.1 to have a bigger exponent

0.2 has a bigger exponent than 0.1 – -3 instead of -4.

So we need to rewrite

2**-4 + 2702159776422298 / 2**(52 + 4)

to be X / 2**(52 + 3)

If we solve for X in 2**-4 + 2702159776422298 / 2**(52 + 4) = X / 2**(52 + 3), we get:

X = 2**51 + 2702159776422298 /2

We can calculate that in Python pretty easily:

>>> 2**51 + 2702159776422298 //2
3602879701896397

step 3: add the significands

Now we’re trying to do this addition

2**-3 + 2702159776422298 / 2**(52 + 3) + 3602879701896397 / 2**(52 + 3)

So we need to add together 2702159776422298 and 3602879701896397

>>> 2702159776422298  + 3602879701896397
6305039478318695

Cool. But 6305039478318695 is more than 2**52 - 1 (the maximum value for a significand), so we have a problem:

>>> 6305039478318695 > 2**52
True

step 4: increase the exponent

Right now our answer is

2**-3 + 6305039478318695 / 2**(52 + 3)

First, let’s subtract 2**52 to get

2**-2 + 1801439850948199 / 2**(52 + 3)

This is almost perfect, but the 2**(52 + 3) at the end there needs to be a 2**(52 + 2).

So we need to divide 1801439850948199 by 2. This is where we run into inaccuracies – 1801439850948199 is odd!

>>> 1801439850948199  / 2
900719925474099.5

It’s exactly in between two integers, so we round to the nearest even number (which is what the floating point specification says to do), so our final floating point number result is:

>>> 2**-2 + 900719925474100 / 2**(52 + 2)
0.30000000000000004

That’s the answer we expected:

>>> 0.1 + 0.2
0.30000000000000004

this probably isn’t exactly how it works in hardware

The way I’ve described the operations here isn’t literally exactly what happens when you do floating point addition (it’s not “solving for X” for example), I’m sure there are a lot of efficient tricks. But I think it’s about the same idea.
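To tie the steps together, here’s a sketch of the whole addition using only Python integers. It is an illustration under simplifying assumptions, not how hardware does it: add_positive and decode are hypothetical helpers, and they only handle positive normal floats (no zeros, subnormals, infinities, NaN, or overflow).

```python
import math
import struct

def decode(x):
    # Return (exponent, significand) of a positive normal 64-bit float.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return ((bits >> 52) & 0x7FF) - 1023, bits & ((1 << 52) - 1)

def add_positive(a, b):
    ea, sa = decode(a)
    eb, sb = decode(b)
    # Put back the implicit leading 1, so a == ma * 2**(ea - 52).
    ma = (1 << 52) | sa
    mb = (1 << 52) | sb
    # Line both numbers up on the smaller exponent and add exactly.
    e = min(ea, eb)
    total = (ma << (ea - e)) + (mb << (eb - e))  # in units of 2**(e - 52)
    # Drop low bits until the result fits in 53 bits again.
    shift = max(total.bit_length() - 53, 0)
    q, r = divmod(total, 1 << shift)
    half = (1 << shift) >> 1
    # Round to nearest; on an exact tie, round to the even value.
    if shift and (r > half or (r == half and q % 2 == 1)):
        q += 1
    return math.ldexp(q, e - 52 + shift)  # q * 2**(e - 52 + shift)

print(add_positive(0.1, 0.2))               # 0.30000000000000004
print(add_positive(0.1, 0.2) == 0.1 + 0.2)  # True
```

Working on the smaller exponent with big integers avoids losing bits before the rounding step, at the cost of being much slower than a real implementation.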

printing out floating point numbers is pretty weird

We said earlier that the floating point number 0.3 isn’t equal to 0.3. It’s actually this number:

>>> f"{0.3:.80f}"
'0.29999999999999998889776975374843459576368331909179687500000000000000000000000000'

So when you print out that number, why does it display 0.3?

The computer isn’t actually printing out the exact value of the number, instead it’s printing out the shortest decimal number d which has the property that our floating point number f is the closest floating point number to d.

It turns out that doing this efficiently isn’t trivial at all, and there are a bunch of academic papers about it like Printing Floating-Point Numbers Quickly and Accurately or How to print floating point numbers accurately.
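The inefficient version is easy to sketch, though. Here’s a brute-force illustration (not one of the algorithms from those papers, and shortest_round_trip is a made-up name): try more and more significant digits until the string converts back to exactly the same float.

```python
def shortest_round_trip(x):
    # Try 1, 2, ..., 17 significant digits and stop at the first string
    # that converts back to exactly the same float.
    for digits in range(1, 18):
        s = f"{x:.{digits}g}"
        if float(s) == x:
            return s
    raise ValueError("expected a finite 64-bit float")

print(shortest_round_trip(0.3))        # 0.3
print(shortest_round_trip(0.1 + 0.2))  # 0.30000000000000004
```

17 significant digits are always enough to round-trip a finite 64-bit float, which is why the loop stops there.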

would it be more intuitive if computers printed out the exact value of a float?

Rounding to a nice clean decimal value is nice, but in a way I feel like it might be more intuitive if computers just printed out the exact value of a floating point number – it might make it seem a lot less surprising when you get weird results.

To me, 0.1000000000000000055511151231257827021181583404541015625 + 0.200000000000000011102230246251565404236316680908203125 = 0.3000000000000000444089209850062616169452667236328125 feels less surprising than 0.1 + 0.2 = 0.30000000000000004.

Probably this is a bad idea, it would definitely use a lot of screen space.

a quick note on PHP

Someone in the comments somewhere pointed out that <?php echo (0.1 + 0.2 );?> prints out 0.3. Does that mean that floating point math is different in PHP?

I think the answer is no – if I run:

<?php echo ((0.1 + 0.2) - 0.3);?> on this page, I get the exact same answer as in Python: 5.5511151231258E-17. So it seems like the underlying floating point math is the same.

I think the reason that 0.1 + 0.2 prints out 0.3 in PHP is that PHP’s algorithm for displaying floating point numbers is less precise than Python’s – it’ll display 0.3 even if that number isn’t the closest floating point number to 0.3.
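You can imitate this in Python. This sketch assumes PHP’s echo formats floats to 14 significant digits (which I believe is the default value of its precision setting, and matches the 14 digits in 5.5511151231258E-17 above):

```python
total = 0.1 + 0.2

print(f"{total:.14g}")        # 0.3
print(f"{total:.17g}")        # 0.30000000000000004
print(f"{total - 0.3:.14g}")  # 5.5511151231258e-17
```

With only 14 digits the extra 4 at the end gets rounded away, even though the underlying float is the same.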

that’s all!

I kind of doubt that anyone had the patience to follow all of that arithmetic, but it was helpful for me to write down, so I’m publishing this post anyway. Hopefully some of this makes sense.