Reading List
The most recent articles from a list of feeds I subscribe to.
Some notes on nix flakes
I’ve been using nix for about 9 months now. For all of that time I’ve been steadfastly ignoring flakes, but everyone keeps saying that flakes are great and the best way to use nix, so I decided to try to figure out what the deal is with them.
I found it very hard to find simple examples of flake files and I ran into a few problems that were very confusing to me, so I wanted to write down some very basic examples and some of the problems I ran into in case it’s helpful to someone else who’s getting started with flakes.
First, let’s talk about what a flake is a little.
addition from a couple months later: I still do not actually understand flakes, but a couple of months after I wrote this post, Jade wrote Flakes aren’t real and cannot hurt you: a guide to using Nix flakes the non-flake way which I still haven’t fully processed but is the closest thing I’ve found to an explanation of flakes that I can understand
flakes are self-contained
Every explanation I’ve found of flakes explains them in terms of other nix concepts (“flakes simplify nix usability”, “flakes are processors of Nix code”). Personally I really needed a way to think about flakes in terms of other non-nix things and someone made an analogy to Docker containers that really helped me, so I’ve been thinking about flakes a little like Docker container images.
Here are some ways in which flakes are like Docker containers:
- you can install and compile any software you want in them
- you can use them as a dev environment (the flake sets up all your dependencies)
- you can share your flake with other people with a flake.nix file and then they can build the software exactly the same way you built it (a little like how you can share a Dockerfile, though flakes are MUCH better at the “exactly the same way you built it” thing)
flakes are also different from Docker containers in a LOT of ways:
- with a Dockerfile, you’re not actually guaranteed to get the exact same results as another user. With flake.nix and flake.lock you are.
- they run natively on Mac (you don’t need to use Linux / a Linux VM the way you do with Docker)
- different flakes can share dependencies very easily (you can technically share layers between Docker images, but flakes are MUCH better at this)
- flakes can depend on other flakes and pick and choose which parts they want to take from their dependencies
- flake.nix files are programs in the nix programming language instead of mostly a bunch of shell commands
- the way they do isolation is completely different (nix uses dynamic linker/rpath tricks instead of filesystem overlays, and there are no cgroups or namespaces or VMs or anything with nix)
Obviously this analogy breaks down pretty quickly (the list of differences is VERY long), but they do share the “you can share a dev environment with a single configuration file” design goal.
nix has a lot of pre-compiled binaries
To me one of the biggest advantages of nix is that I’m on a Mac and nix has a repository with a lot of pre-compiled binaries of various packages for Mac. I mostly mention this because people always say that nix is good because it’s “declarative” or “reproducible” or “functional” or whatever but my main motivation for using nix personally is that it has a lot of binary packages. I do appreciate that it makes it easier for me to build a 5-year-old version of hugo on mac though.
My impression is that nix has more binary packages than Homebrew does, so installing things is faster and I don’t need to build as much from source.
my goal: make a flake with every package I want installed on my system
Previously I was using nix as a Homebrew replacement like this (which I talk about more in this blog post):
- run nix-env -iA nixpkgs.whatever to install stuff
- that’s it
This worked great (except that it randomly broke occasionally, but someone helped me find a workaround for that so the random breaking wasn’t a big issue).
I thought it might be fun to have a single flake.nix file where I could maintain a list
of all the packages I wanted installed and then put all that stuff in a
directory in my PATH. This isn’t very well motivated: my previous setup was
generally working just fine, but I have a long history of fiddling with my
computer setup (Arch Linux ftw) and so I decided to have a Day Of Fiddling.
I think the only practical advantages of flakes for me are:
- I could theoretically use the flake.nix file to set up a new computer more easily
- I can never remember how to uninstall software in nix, deleting a line in a configuration file is maybe easier to remember
These are pretty minor though.
how do we make a flake?
Okay, so I want to make a flake with a bunch of packages installed in it, let’s say Ruby and cowsay to start. How do I
do that? I went to zero-to-nix and copied and pasted some things and ended up with this flake.nix file (here it is in a gist):
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-23.05-darwin";
  outputs = { self, nixpkgs }: {
    devShell.aarch64-darwin = nixpkgs.legacyPackages.aarch64-darwin.mkShell {
      buildInputs = with nixpkgs.legacyPackages.aarch64-darwin; [
        cowsay
        ruby
      ];
    };
  };
}
This has a little bit of boilerplate so let’s list the things I understand about this:
- nixpkgs is a huge central repository of nix packages
- aarch64-darwin is my machine’s architecture, this is important because I’m asking nix to download binaries
- I’ve been thinking of an “input” as a sort of dependency. nixpkgs is my one input. I get to pick and choose which bits of it I want to bring into my flake though.
- the github:NixOS/nixpkgs/nixpkgs-23.05-darwin url scheme is a bit unusual: the format is github:USER/REPO_NAME/TAG_OR_BRANCH_NAME. So this is looking at the nixpkgs-23.05-darwin branch in the NixOS/nixpkgs repository.
- mkShell is a nix function that’s apparently useful if you want to run nix develop. I stopped using it after this so I don’t know more than that.
- devShell.aarch64-darwin is the name of the output. Apparently I need to give it that exact name or else nix develop will yell at me
- cowsay and ruby are the things I’m taking from nixpkgs to put in my output
- I don’t know what self is doing here or what legacyPackages is about
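To make the github:USER/REPO_NAME/TAG_OR_BRANCH_NAME format from the list above a bit more concrete, here’s a small Python sketch that splits one of these URLs into its parts. The function name parse_github_flake_ref is something I made up for illustration (it is not part of nix), and it only handles the simple form shown in this post, so treat it as a rough sketch and not as how nix actually parses these.

```python
from typing import Optional


def parse_github_flake_ref(url: str) -> dict:
    """Split a github:USER/REPO_NAME[/TAG_OR_BRANCH_NAME] string into parts.

    Hypothetical helper for illustration only: it ignores every other
    form a flake reference can take.
    """
    scheme, sep, rest = url.partition(":")
    if scheme != "github" or not sep:
        raise ValueError(f"not a github: flake reference: {url!r}")

    # At most 3 pieces: user, repo, and an optional tag/branch name.
    parts = rest.split("/", 2)
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected github:USER/REPO[/REF], got {url!r}")

    ref: Optional[str] = parts[2] if len(parts) == 3 else None
    return {"owner": parts[0], "repo": parts[1], "ref": ref}


ref = parse_github_flake_ref("github:NixOS/nixpkgs/nixpkgs-23.05-darwin")
print(ref["owner"], ref["repo"], ref["ref"])
```

So the URL in my flake.nix is the NixOS user, the nixpkgs repo, and the nixpkgs-23.05-darwin name.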
Okay, cool. Let’s try to build it:
$ nix build
error: getting status of '/nix/store/w1v41cyqyx4d7q4g7c8nb50bp9dvjm29-source/flake.nix': No such file or directory
This error is VERY mysterious – what is /nix/store/w1v41cyqyx4d7q4g7c8nb50bp9dvjm29-source/ and why does nix think it should exist???
I was totally stuck until a very nice person on Mastodon helped me. So let’s talk about what’s going wrong here.
problem 1: nix completely ignores untracked files
Apparently nix flakes have some Weird Rules about git. The way it works is:
- if your current directory isn’t a git repo, everything is fine
- if you are in a git repository, and all your files have been git added to git, everything is fine
- but if you’re in a git directory and your flake.nix file isn’t tracked by git yet (because you just created it and are trying to get it to work), nix will COMPLETELY IGNORE YOUR FILE
After someone kindly told me what was happening, I found that this is mentioned in this blog post about flakes, which says:
Note that any file that is not tracked by Git is invisible during Nix evaluation
There’s also a github issue discussing what to do about this.
So we need to git add the file to get nix to pay attention to it. Cool. Let’s keep going.
a note on enabling the flake feature
To get any of the commands we’re going to talk about to work (like nix build), you need to enable two nix features:
- flakes
- “commands”
I set this up by putting experimental-features = nix-command flakes in my
~/.config/nix/nix.conf, but you can also run nix --extra-experimental-features "flakes nix-command" build instead of nix build.
time for nix develop
The instructions I was following told me that I could now run nix develop and get a shell inside my new environment. I tried it and it works:
$ nix develop
grapefruit:nix bork$ cowsay hi
 ____
< hi >
 ----
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
Cool! I was curious about how the PATH was set up inside this environment so I took a look:
grapefruit:nix bork$ echo $PATH
/nix/store/v5q1bxrqs6hkbsbrpwc81ccyyfpbl8wk-clang-wrapper-11.1.0/bin:/nix/store/x9jmvvxcys4zscff39cnpw0kyfvs80vp-clang-11.1.0/bin:/nix/store/3f1ii2y5fs1w7p0id9mkis0ffvhh1n8w-coreutils-9.1/bin:/nix/store/8ldvi6b3ahnph19vm1s0pyjqrq0qhkvi-cctools-binutils-darwin-wrapper-973.0.1/bin:/nix/store/5kbbxk18fp645r4agnn11bab8afm0ry3-cctools-binutils-darwin-973.0.1/bin:/nix/store/5si884h02nqx3dfcdm5irpf7caihl6f8-cowsay-3.7.0/bin:/nix/store/5bs5q2dw5bl7c4krcviga6yhdrqbvdq6-ruby-3.1.4/bin:/nix/store/3f1ii2y5fs1w7p0id9mkis0ffvhh1n8w-coreutils-9.1/bin
It looks like every dependency has been added to the PATH separately: for example there’s
/nix/store/5si884h02nqx3dfcdm5irpf7caihl6f8-cowsay-3.7.0/bin for cowsay and
/nix/store/5bs5q2dw5bl7c4krcviga6yhdrqbvdq6-ruby-3.1.4/bin for ruby. That’s
fine but it’s not how I wanted my setup to work: I wanted a single directory of
symlinks that I could just put in my PATH in my normal shell.
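If you want to see which packages ended up on the PATH without squinting at the hashes, here’s a rough Python sketch. It assumes every nix entry looks like /nix/store/HASH-NAME/bin (which is what the output above looks like), and the function name nix_packages_on_path is just something I made up for this example.

```python
import re

# Assumption: each entry looks like /nix/store/<hash>-<name>/bin,
# matching the PATH printed inside `nix develop` above.
STORE_ENTRY = re.compile(r"^/nix/store/([^-/]+)-([^/]+)/bin$")


def nix_packages_on_path(path: str) -> list:
    """Return the NAME part of each /nix/store entry, in order, no repeats."""
    names = []
    for entry in path.split(":"):
        match = STORE_ENTRY.match(entry)
        if match and match.group(2) not in names:
            names.append(match.group(2))
    return names


example_path = ":".join([
    "/nix/store/5si884h02nqx3dfcdm5irpf7caihl6f8-cowsay-3.7.0/bin",
    "/nix/store/5bs5q2dw5bl7c4krcviga6yhdrqbvdq6-ruby-3.1.4/bin",
    "/usr/bin",
])
print(nix_packages_on_path(example_path))
```

Inside the dev shell you could pass os.environ["PATH"] instead of the hardcoded example string.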
getting a directory of symlinks with buildEnv
I asked in the Nix discord and someone told me I could use buildEnv to turn
my flake into a directory of symlinks. As far as I can tell it’s just a way to
take nix packages and copy their symlinks into another directory.
After some fiddling, I ended up with this: (here’s a gist)
{
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixpkgs-23.05-darwin";
  outputs = { self, nixpkgs }: {
    defaultPackage.aarch64-darwin = nixpkgs.legacyPackages.aarch64-darwin.buildEnv {
      name = "julia-stuff";
      paths = with nixpkgs.legacyPackages.aarch64-darwin; [
        cowsay
        ruby
      ];
      pathsToLink = [ "/share/man" "/share/doc" "/bin" "/lib" ];
      extraOutputsToInstall = [ "man" "doc" ];
    };
  };
}
This put a bunch of symlinks in result/bin:
$ ls result/bin/
bundle bundler cowsay cowthink erb gem irb racc rake rbs rdbg rdoc ri ruby typeprof
Sweet! Now I have a thing I can theoretically put in my PATH – this result directory. Next I mostly just
needed to add every other package I wanted to install to this flake.nix file (I got the list
from nix-env -q).
next step: add all the packages
I ran into a bunch of weird problems adding all the packages I already had installed to my flake, so let’s talk about them.
problem 2: an unfree package
I wanted to install a non-free package called ngrok. Nix gave me 3 options for how I could do this. Option C seemed the most promising:
c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
{ allowUnfree = true; }
to ~/.config/nixpkgs/config.nix.
But adding { allowUnfree = true} to ~/.config/nixpkgs/config.nix didn’t do
anything for some reason so instead I went with option A, which did seem to
work:
$ export NIXPKGS_ALLOW_UNFREE=1
Note: For `nix shell`, `nix build`, `nix develop` or any other Nix 2.4+
(Flake) command, `--impure` must be passed in order to read this
environment variable.
problem 3: installing a flake from a relative path doesn’t work
I made a couple of flakes for custom Nix packages I’d made (which I wrote about in my first nix blog post), and I wanted to set them up like this (you can see the full configuration here):
hugoFlake.url = "path:../hugo-0.40";
paperjamFlake.url = "path:../paperjam";
This worked fine the first time I ran nix build, but when I ran nix build
again later I got some totally inscrutable error.
My workaround was just to run rm flake.lock every time before running nix
build, which seemed to fix the problem.
I don’t really understand what’s going on here but there’s a very long github issue thread about it.
problem 4: “error while reading the response from the build hook”
For a while, every time I ran nix build, I got this error:
$ nix build
error:
… while reading the response from the build hook
error: unexpected EOF reading a line
I spent a lot of time poking at my flake.nix trying to guess at what could
have gone wrong.
A very nice person on Mastodon also helped me with this one and it turned out
that what I needed to do was find the nix-daemon process and kill it. I still
have no idea what happened here or what that error message means, I did upgrade
nix at some point during this whole process so I guess the
upgrade went wonky somehow.
I don’t think this one is a common problem.
problem 5: error with share/man symlink
I wanted to install the zulu package for Java, but when I tried to add it to
my list of packages I got this error complaining about a broken symlink:
$ nix build
error: builder for '/nix/store/4n9c4707iyiwwgi9b8qqx7mshzrvi27r-julia-dev.drv' failed with exit code 2;
last 1 log lines:
> error: not a directory: `/nix/store/2vc4kf5i28xcqhn501822aapn0srwsai-zulu-11.62.17/share/man'
For full logs, run 'nix log /nix/store/4n9c4707iyiwwgi9b8qqx7mshzrvi27r-julia-dev.drv'.
$ ls /nix/store/2vc4kf5i28xcqhn501822aapn0srwsai-zulu-11.62.17/share/ -l
lrwxr-xr-x 29 root 31 Dec 1969 man -> zulu-11.jdk/Contents/Home/man
I think what’s going on here is that the zulu package in nixpkgs-23.05 was
just broken (looks like it’s since been fixed in the unstable version).
I decided I didn’t feel like dealing with that and it turned out I already had
Java installed another way outside nix, so I just removed zulu from my list
and moved on.
putting it in my PATH
Now that I knew how to fix all of the weird problems I’d run into, I wrote a
little shell script called nix-symlink to build my flake and symlink it to
the very unimaginatively named ~/.nix-flake. The idea was that then I could
put ~/.nix-flake in my PATH and have all my programs available.
I think people usually use nix flakes in a per-project way instead of “a single global flake”, but this is how I wanted my setup to work so that’s what I did.
Here’s the nix-symlink script. The rm flake.lock is because of that relative path issue,
and the NIXPKGS_ALLOW_UNFREE is so I could install ngrok.
#!/bin/bash
set -euo pipefail
export NIXPKGS_ALLOW_UNFREE=1
cd ~/work/nixpkgs/flakes/grapefruit || exit
rm -f flake.lock
nix build --impure --out-link ~/.nix-flake
I put ~/.nix-flake at the beginning of my PATH (not at the end), but I might revisit that, we’ll see.
a note on GC roots
At the end of all this, I wanted to run a garbage collection because I’d
installed a bunch of random stuff that was taking about 20GB of extra hard
drive space in my /nix/store. I think there are two different ways to collect
garbage in nix:
- nix-store --gc
- nix-collect-garbage
I have no idea what the difference between them is, but nix-collect-garbage
seemed to delete more stuff for some reason.
I wanted to check that my ~/.nix-flake directory was actually a GC root, so
that all my stuff wouldn’t get deleted when I ran a GC.
I ran nix-store --gc --print-roots to print out all the GC roots and my
~/.nix-flake was in there so everything was good! This command also runs a GC
so it was kind of a dangerous way to check if a GC was going to delete
everything, but luckily it worked.
problem 6: it’s a little slow
The last problem I ran into is speed. Previously, installing a new small package took me 2 seconds with nix-env -iA:
$ time nix-env -iA nixpkgs.sl
installing 'sl-5.05'
these 2 paths will be fetched (0.41 MiB download, 3.77 MiB unpacked):
/nix/store/yv1c98m5pncx3i5q7nr7i7mfjkiyii72-ncurses-6.4
/nix/store/2k78vf30czicjs0dq9x0sj4017ziwxkn-sl-5.05
copying path '/nix/store/yv1c98m5pncx3i5q7nr7i7mfjkiyii72-ncurses-6.4' from 'https://cache.nixos.org'...
copying path '/nix/store/2k78vf30czicjs0dq9x0sj4017ziwxkn-sl-5.05' from 'https://cache.nixos.org'...
building '/nix/store/zadpfs9k1cw5x7iniwwcqd8lb7nnc7bb-user-environment.drv'...
________________________________________________________
Executed in 1.96 secs fish external
Installing the same package with flakes takes 7 seconds, plus the time to edit the config file:
$ vim ~/work/nixpkgs/flakes/grapefruit/flake.nix
$ time nix-symlink
________________________________________________________
Executed in 7.04 secs fish external
usr time 1.78 secs 0.29 millis 1.78 secs
sys time 0.51 secs 2.03 millis 0.51 secs
I don’t know what to do about this so I’ll just live with it. We’ll see if this ends up being annoying or not
that’s it!
Now my new nix workflow is:
- edit my flake.nix to add or remove packages (this file)
- rerun my nix-symlink script after editing it
- maybe periodically run nix-collect-garbage
- that’s it
setting up the nix registry
The last thing I wanted to do was run
nix registry add nixpkgs github:NixOS/nixpkgs/nixpkgs-23.05-darwin
so that if I want to ad-hoc run a flake with nix run nixpkgs#cowsay, it’ll
take the version from the 23.05 version of nixpkgs. Mostly I just wanted this
so I didn’t have to download new versions of the nixpkgs repository all the
time – I just wanted to pin the 23.05 version.
I think nixpkgs-unstable is the default which I’m sure is fine too if you
want to have more up-to-date software.
my solutions are probably not the best
My solutions to all the nix problems I described are maybe not The Best ™,
but I’m happy that I figured out a way to install stuff that just involves one
relatively simple flake.nix file and a 6-line bash script and not a lot of other
machinery.
Personally I still feel extremely uncomfortable with nix and so it’s important to me to keep my configuration as simple as possible without a lot of extra abstraction layers that I don’t understand. I might try out flakey-profile at some point though because it seems extremely simple.
you can do way fancier stuff
You can manage a lot more stuff with nix, like:
- your npm / ruby / python / etc packages (I just do npm install and pip install and bundle install)
- your config files
There are all kinds of tools that build on top of nix and flakes like home-manager. Like I said before though, it’s important to me to keep my configuration super simple so that I can have any hope of understanding how it works and being able to fix problems when it breaks so I haven’t paid attention to any of that stuff.
there’s a useful discord
I’ve been complaining about nix a little in this post, but as usual with open source projects I assume that nix has all of these papercuts because it’s a complicated system run by a small team of volunteers with very limited time.
Folks on the unofficial nix discord have been helpful, I’ve had a somewhat mixed experience there but they have a “support forum” section in there and I’ve gotten answers to a lot of my questions.
some other nix resources
the main resources I’ve found for understanding nix flakes are:
- Nix Flakes, Part 1: An introduction and tutorial, I think by their creator
- xe iaso’s blog
- ian henry’s blog
- the nix docs
- zero to nix
Also Kamal (my partner) uses nix and that really helps, I think using nix with an experienced friend around is a lot easier.
that’s all!
I still kind of like nix after using it for 9 months despite how confused I am about it all the time, I feel like once I get things working they don’t usually break.
We’ll see if that continues to be the case with flakes! Maybe I’ll go back to
just nix-env -iAing everything if it goes badly.
How git cherry-pick and revert use 3-way merge
Hello! I was trying to explain to someone how git cherry-pick works the other
day, and I found myself getting confused.
What went wrong was: I thought that git cherry-pick was basically applying a
patch, but when I tried to actually do it that way, it didn’t work!
Let’s talk about what I thought cherry-pick did (applying a patch), why
that’s not quite true, and what it actually does instead (a “3-way merge”).
This post is extremely in the weeds and you definitely don’t need to understand this stuff to use git effectively. But if you (like me) are curious about git’s internals, let’s talk about it!
cherry-pick isn’t applying a patch
The way I previously understood git cherry-pick COMMIT_ID is:
- calculate the diff for COMMIT_ID, like git show COMMIT_ID --patch > out.patch
- Apply the patch to the current branch, like git apply out.patch
Before we get into this – I want to be clear that this model is mostly right, and if that’s your mental model that’s fine. But it’s wrong in some subtle ways and I think that’s kind of interesting, so let’s see how it works.
If I try to do the “calculate the diff and apply the patch” thing in a case where there’s a merge conflict, here’s what happens:
$ git show 10e96e46 --patch > out.patch
$ git apply out.patch
error: patch failed: content/post/2023-07-28-why-is-dns-still-hard-to-learn-.markdown:17
error: content/post/2023-07-28-why-is-dns-still-hard-to-learn-.markdown: patch does not apply
This just fails – it doesn’t give me any way to resolve the conflict or figure out how to solve the problem.
This is quite different from what actually happens when I run git cherry-pick,
which is that I get a merge conflict:
$ git cherry-pick 10e96e46
error: could not apply 10e96e46... wip
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
So it seems like the “git is applying a patch” model isn’t quite right. But the error message literally does say “could not apply 10e96e46”, so it’s not quite wrong either. What’s going on?
so what is cherry-pick doing?
I went digging through git’s source code to see how cherry-pick works, and
ended up at this line of code:
res = do_recursive_merge(r, base, next, base_label, next_label, &head, &msgbuf, opts);
So a cherry-pick is a… merge? What? How? What is it even merging? And how does merging even work in the first place?
I realized that I didn’t really know how git’s merge worked, so I googled it and found out that git does a thing called “3-way merge”. What’s that?
how git merges files: the 3-way merge
Let’s say I want to merge these 2 files. We’ll call them v1.py and v2.py.
v1.py:
def greet():
    greeting = "hello"
    name = "julia"
    return greeting + " " + name

v2.py:
def say_hello():
    greeting = "hello"
    name = "aanya"
    return greeting + " " + name
There are two lines that differ: we have
- def greet() and def say_hello()
- name = "aanya" and name = "julia"
How do we know what to pick? It seems impossible!
But what if I told you that the original function was this (base.py)?
def say_hello():
    greeting = "hello"
    name = "julia"
    return greeting + " " + name
Suddenly it seems a lot clearer! v1 changed the function’s name to greet
and v2 set name = "aanya". So to merge, we should make both those changes:
def greet():
    greeting = "hello"
    name = "aanya"
    return greeting + " " + name
We can ask git to do this merge with git merge-file, and it gives us exactly
the result we expected: it picks def greet() and name = "aanya".
$ git merge-file v1.py base.py v2.py -p
def greet():
    greeting = "hello"
    name = "aanya"
    return greeting + " " + name⏎
This way of merging where you merge 2 files + their original version is called a 3-way merge.
If you want to try it out yourself in a browser, I made a little playground at jvns.ca/3-way-merge/. I made it very quickly so it’s not mobile friendly.
git merges changes, not files
The way I think about the 3-way merge is – git merges changes, not files. We have an original file and 2 possible changes to it, and git tries to combine both of those changes in a reasonable way. Sometimes it can’t (for example if both changes change the same line), and then you get a merge conflict.
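Here’s a tiny Python sketch of that idea, to show the rule for each line: if only one side changed a line, take that side’s version, and if both sides changed it differently, that’s a conflict. This is NOT git’s algorithm. It assumes all 3 files have the same number of lines (so it can’t handle inserted or deleted lines at all), and the function name three_way_merge is just something I made up.

```python
def three_way_merge(v1: list, base: list, v2: list) -> list:
    """Naive line-by-line 3-way merge.

    Assumption for this sketch: v1, base and v2 all have the same number
    of lines, so line i in each file is "the same line".
    """
    if not (len(v1) == len(base) == len(v2)):
        raise ValueError("this sketch only handles files with equal line counts")

    merged = []
    for ours, original, theirs in zip(v1, base, v2):
        if ours == theirs:
            merged.append(ours)      # both sides agree (changed or not)
        elif ours == original:
            merged.append(theirs)    # only v2 changed this line
        elif theirs == original:
            merged.append(ours)      # only v1 changed this line
        else:
            # both sides changed the same line in different ways
            raise ValueError(f"merge conflict: {ours!r} vs {theirs!r}")
    return merged


base = [
    'def say_hello():',
    '    greeting = "hello"',
    '    name = "julia"',
    '    return greeting + " " + name',
]
v1 = ['def greet():'] + base[1:]                   # v1 renamed the function
v2 = base[:2] + ['    name = "aanya"'] + base[3:]  # v2 changed the name

merged = three_way_merge(v1, base, v2)
print("\n".join(merged))
```

On the example from this post it picks def greet() and name = "aanya", the same as git merge-file did.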
Git can also merge more than 2 possible changes: you can have an original file and 8 possible changes, and it can try to reconcile all of them. That’s called an octopus merge but I don’t know much more than that, I’ve never done one.
how git uses 3-way merge to apply a patch
Now let’s get a little weird! When we talk about git “applying a patch” (as you
do in a rebase or revert or cherry-pick), it’s not actually creating a
patch file and applying it. Instead, it’s doing a 3-way merge.
Here’s how applying commit X as a patch to your current commit corresponds to
this v1, v2, and base setup from before:
- The version of the file in your current commit is v1.
- The version of the file before commit X is base
- The version of the file in commit X. Call that v2
- Run git merge-file v1 base v2 to combine them (technically git does not actually run git merge-file, it runs a C function that does it)
Together, you can think of base and v2 as being the “patch”: the diff between
them is the change that you want to apply to v1.
how cherry-pick works
Let’s say we have this commit graph, and we want to cherry-pick Y on to main:
A - B (main)
 \
  \
   X - Y - Z
How do we turn that into a 3-way merge? Here’s how it translates into our v1, v2 and base from earlier:
- B is v1
- X is the base, Y is v2
So together X and Y are the “patch”.
And git rebase is just like git cherry-pick, but repeated a bunch of times.
how revert works
Now let’s say we want to run git revert Y on this commit graph
X - Y - Z - A - B
- B is v1
- Y is the base, X is v2
This is exactly like a cherry-pick, but with X and Y reversed. We have to
flip them because we want to apply a “reverse patch”.
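To make the “same thing but flipped” idea concrete, here’s a small Python sketch of how I think about picking v1, base and v2 for cherry-pick and revert. The names (MergeInputs, cherry_pick_inputs, revert_inputs) are made up for this post and don’t correspond to anything in git’s source. It also assumes the commit has exactly one parent.

```python
from typing import NamedTuple


class MergeInputs(NamedTuple):
    """The three versions handed to a 3-way merge (hypothetical type)."""
    v1: str    # the current commit
    base: str  # the "before" side of the patch
    v2: str    # the "after" side of the patch


def cherry_pick_inputs(head: str, commit: str, parent: str) -> MergeInputs:
    # Apply the change parent -> commit on top of head.
    return MergeInputs(v1=head, base=parent, v2=commit)


def revert_inputs(head: str, commit: str, parent: str) -> MergeInputs:
    # Apply the reverse change commit -> parent on top of head.
    return MergeInputs(v1=head, base=commit, v2=parent)


# Y's parent is X in both commit graphs above, and B is the current commit.
print(cherry_pick_inputs(head="B", commit="Y", parent="X"))
print(revert_inputs(head="B", commit="Y", parent="X"))
```

The only difference between the two functions is which of the commit and its parent goes in the base slot.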
Revert and cherry-pick are so closely related in git that they’re actually implemented in the same file: revert.c.
this “3-way patch” is a really cool trick
This trick of using a 3-way merge to apply a commit as a patch seems really clever and cool and I’m surprised that I’d never heard of it before! I don’t know of a name for it, but I kind of want to call it a “3-way patch”.
The idea is that with a 3-way patch, you specify the patch as 2 files: the file
before the patch and after (base and v2 in our language in this post).
So there are 3 files involved: 1 for the original and 2 for the patch.
The point is that the 3-way patch is a much better way to patch than a normal patch, because you have a lot more context for merging when you have both full files.
Here’s more or less what a normal patch for our example looks like:
@@ -1,1 +1,1 @@:
- def greet():
+ def say_hello():
greeting = "hello"
and a 3-way patch. This “3-way patch” is not a real file format, it’s just something I made up.
BEFORE: (the full file)
def greet():
    greeting = "hello"
    name = "julia"
    return greeting + " " + name

AFTER: (the full file)
def say_hello():
    greeting = "hello"
    name = "julia"
    return greeting + " " + name
“Building Git” talks about this
The book Building Git by James Coglan
is the only place I could find other than the git source code explaining how
git cherry-pick actually uses 3-way merge under the hood (I thought Pro Git might
talk about it, but it didn’t seem to as far as I could tell).
I actually went to buy it and it turned out that I’d already bought it in 2019 so it was a good reference to have here :)
merging is actually much more complicated than this
There’s more to merging in git than the 3-way merge – there’s something called a “recursive merge” that I don’t understand, and there are a bunch of details about how to deal with handling file deletions and moves, and there are also multiple merge algorithms.
My best idea for where to learn more about this stuff is Building Git, though I haven’t read the whole thing.
so what does git apply do?
I also went looking through git’s source to find out what git apply does, and it
seems to (unsurprisingly) be in apply.c. That code parses a patch file, and
then hunts through the target file to figure out where to apply it. The core logic
seems to be around here:
I think the idea is to start at the line number that the patch suggested and
then hunt forwards and backwards from there to try to find it:
/*
 * There's probably some smart way to do this, but I'll leave
 * that to the smart and beautiful people. I'm simple and stupid.
 */
backwards = current;
backwards_lno = line;
forwards = current;
forwards_lno = line;
current_lno = line;
for (i = 0; ; i++) {
	...
That all seems pretty intuitive and about what I’d naively expect.
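Here’s my rough Python version of that “start at the suggested line and hunt backwards and forwards” idea. It’s only a sketch of the search strategy as I understand it, not a port of apply.c: it needs every line of the hunk to match exactly, and the name find_hunk_position is made up.

```python
from typing import Optional


def find_hunk_position(target: list, hunk: list, suggested: int) -> Optional[int]:
    """Find where `hunk` (the old lines from a patch) appears in `target`.

    Starts at the 0-based line index the patch suggested, then tries
    positions further and further away in both directions. Returns the
    matching index closest to `suggested`, or None if nothing matches.
    """
    last_start = len(target) - len(hunk)
    if last_start < 0:
        return None  # the hunk is longer than the file

    # Clamp the suggestion so it's a valid starting position.
    suggested = max(0, min(suggested, last_start))

    for offset in range(last_start + 1):
        # Look `offset` lines backwards, then `offset` lines forwards.
        for start in (suggested - offset, suggested + offset):
            in_range = 0 <= start <= last_start
            if in_range and target[start:start + len(hunk)] == hunk:
                return start
    return None


lines = ["a", "b", "c", "d", "e", "b", "c"]
print(find_hunk_position(lines, ["b", "c"], suggested=4))
print(find_hunk_position(lines, ["b", "c"], suggested=0))
```

With the example file, a suggestion of 4 finds the later copy of the hunk and a suggestion of 0 finds the earlier one, because each search stops at the nearest match.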
how git apply --3way works
git apply also has a --3way flag that does a 3-way merge. So we actually
could have more or less implemented git cherry-pick with git apply like
this:
$ git show 10e96e46 --patch > out.patch
$ git apply out.patch --3way
Applied patch to 'content/post/2023-07-28-why-is-dns-still-hard-to-learn-.markdown' with conflicts.
U content/post/2023-07-28-why-is-dns-still-hard-to-learn-.markdown
--3way doesn’t just use the contents of the patch file though! The patch file starts with:
index d63ade04..65778fc0 100644
d63ade04 and 65778fc0 are the IDs of the old/new versions of that file in
git’s object database, so git can retrieve them to do a 3-way patch
application. This won’t work if someone emails you a patch and you don’t have
the files for the new/old versions of the file though: if you’re missing the
blobs you’ll get this error:
$ git apply out.patch --3way
error: repository lacks the necessary blob to perform 3-way merge.
3-way merge is old
A couple of people pointed out that 3-way merge is much older than git, it’s from the late 70s or something. Here’s a paper from 2007 talking about it
that’s all!
I was pretty surprised to learn that I didn’t actually understand the core way that git applies patches internally – it was really cool to learn about!
I have lots of issues with git’s UI but I think this particular thing is not one of them. The 3-way merge seems like a nice unified way to solve a bunch of different problems, it’s pretty intuitive for people (the idea of “applying a patch” is one that a lot of programmers are used to thinking about, and the fact that it’s implemented as a 3-way merge under the hood is an implementation detail that nobody actually ever needs to think about).
Also a very quick plug: I’m working on writing a zine about git, if you’re interested in getting an email when it comes out you can sign up to my very infrequent announcements mailing list.
git rebase: what can go wrong?
Hello! While talking with folks about Git, I’ve been seeing a comment over and over to the effect of “I hate rebase”. People seemed to feel pretty strongly about this, and I was really surprised because I don’t run into a lot of problems with rebase and I use it all the time.
I’ve found that if many people have a very strong opinion that’s different from mine, usually it’s because they have different experiences around that thing from me.
So I asked on Mastodon:
today I’m thinking about the tradeoffs of using git rebase a bit. I think the goal of rebase is to have a nice linear commit history, which is something I like.
but what are the costs of using rebase? what problems has it caused for you in practice? I’m really only interested in specific bad experiences you’ve had here – not opinions or general statements like “rewriting history is bad”
I got a huge number of incredible answers to this, and I’m going to do my best to summarize them here. I’ll also mention solutions or workarounds to those problems in cases where I know of a solution. Here’s the list:
- fixing the same conflict repeatedly is annoying
- rebasing a lot of commits is hard
- undoing a rebase is hard
- force pushing to shared branches can cause lost work
- force pushing makes code reviews harder
- losing commit metadata
- more difficult reverts
- rebasing can break intermediate commits
- accidentally run git commit --amend instead of git rebase --continue
- splitting commits in an interactive rebase is hard
- complex rebases are hard
- rebasing long lived branches can be annoying
- rebase and commit discipline
- a “squash and merge” workflow
- miscellaneous problems
My goal with this isn’t to convince anyone that rebase is bad and you shouldn’t use it (I’m certainly going to keep using rebase!). But seeing all these problems made me want to be more cautious about recommending rebase to newcomers without explaining how to use it safely. It also makes me wonder if there’s an easier workflow for cleaning up your commit history that’s harder to accidentally mess up.
my git workflow assumptions
First, I know that people use a lot of different Git workflows. I’m going to be talking about the workflow I’m used to when working on a team, which is:
- the team uses a central Github/Gitlab repo to coordinate
- there’s one central main branch. It’s protected from force pushes.
- people write code in feature branches and make pull requests to main
- The web service is deployed from main every time a pull request is merged.
- the only way to make a change to main is by making a pull request on Github/Gitlab and merging it
This is not the only “correct” git workflow (it’s a very “we run a web service” workflow, and open source projects or desktop software with releases generally use a slightly different workflow). But it’s what I know so that’s what I’ll talk about.
two kinds of rebase
Also before we start: one big thing I noticed is that there were 2 different kinds of rebase that kept coming up, and only one of them requires you to deal with merge conflicts.
- rebasing on an ancestor, like git rebase -i HEAD^^^^^^^ to squash many small commits into one. As long as you’re just squashing commits, you’ll never have to resolve a merge conflict while doing this.
- rebasing onto a branch that has diverged, like git rebase main. This can cause merge conflicts.
I think it’s useful to make this distinction because sometimes I’m thinking about rebase type 1 (which is a lot less likely to cause problems), but people who are struggling with it are thinking about rebase type 2.
Now let’s move on to all the problems!
fixing the same conflict repeatedly is annoying
If you make many tiny commits, sometimes you end up in a hellish loop where you have to fix the same merge conflict 10 times. You can also end up fixing merge conflicts totally unnecessarily (like dealing with a merge conflict in code that a future commit deletes).
There are a few ways to make this better:
- first do a git rebase -i HEAD^^^^^^^^^^^ to squash all of the tiny commits into 1 big commit and then a git rebase main to rebase onto a different branch. Then you only have to fix the conflicts once.
- use git rerere to automate repeatedly resolving the same merge conflicts (“rerere” stands for “reuse recorded resolution”, it’ll record your previous merge conflict resolutions and replay them). I’ve never tried this but I think you need to set git config rerere.enabled true and then it’ll automatically help you.
Also if I find myself resolving merge conflicts more than once in a rebase,
I’ll usually run git rebase --abort to stop it and then squash my commits into
one and try again.
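Here’s a minimal sketch of the “squash first, then rebase” idea in a throwaway repo. The file names and commit messages are made up, and to keep it non-interactive I’ve swapped git rebase -i for git reset --soft plus a fresh commit, which is just one way to squash everything on a branch.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any git version
git config user.name "Example"
git config user.email "example@example.com"
git config rerere.enabled true          # optional: record conflict resolutions for reuse

echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

# A feature branch with several tiny commits
git checkout -q -b feature
for i in 1 2 3; do
  echo "step $i" >> feature.txt
  git add feature.txt
  git commit -q -m "wip $i"
done

# Meanwhile, main moves ahead
git checkout -q main
echo "more" >> app.txt
git commit -q -am "change on main"

# Step 1: squash the tiny commits into one, without touching main yet
git checkout -q feature
git reset -q --soft "$(git merge-base main feature)"
git commit -q -m "add feature"

# Step 2: rebase the single squashed commit onto main
git rebase -q main
git log --oneline main..feature   # one commit: "add feature"
```

If step 2 does hit a conflict, there’s only one commit being replayed, so you only have to resolve it once.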
rebasing a lot of commits is hard
Generally when I’m doing a rebase onto a different branch, I’m rebasing 1-2 commits. Maybe sometimes 5! Usually there are no conflicts and it works fine.
Some people described rebasing hundreds of commits by many different people onto a different branch. That sounds really difficult and I don’t envy that task.
undoing a rebase is hard
I heard from several people that when they were new to rebase, they messed up a rebase and permanently lost a week of work that they then had to redo.
The problem here is that undoing a rebase that went wrong is much more complicated
than undoing a merge that went wrong (you can undo a bad merge with something like git reset --hard HEAD^).
Many newcomers to rebase don’t even realize that undoing a rebase is even
possible, and I think it’s pretty easy to understand why.
That said, it is possible to undo a rebase that went wrong. Here’s an example of how to undo a rebase using git reflog.
step 1: Do a bad rebase (for example run git rebase -i HEAD^^^^^ and just delete 3 commits)
step 2: Run git reflog. You should see something like this:
ee244c4 (HEAD -> main) HEAD@{0}: rebase (finish): returning to refs/heads/main
ee244c4 (HEAD -> main) HEAD@{1}: rebase (pick): test
fdb8d73 HEAD@{2}: rebase (start): checkout HEAD^^^^^^^
ca7fe25 HEAD@{3}: commit: 16 bits by default
073bc72 HEAD@{4}: commit: only show tooltips on desktop
step 3: Find the entry immediately before rebase (start). In my case that’s ca7fe25
step 4: Run git reset --hard ca7fe25
A couple of other ways to undo a rebase:
- Apparently @ always refers to your current branch in git, so you can run git reset --hard @{1} to reset your branch to its previous location.
- Another solution folks mentioned that avoids having to use the reflog is to make a “backup branch” with git switch -c backup before rebasing, so you can easily get back to the old commit.
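To make the undo steps concrete, here’s a small sketch you can run in a throwaway repo. The commit names are made up, and the “bad rebase” is simulated non-interactively with git rebase --onto (which drops one commit) instead of deleting lines in the git rebase -i editor. I’ve also used git branch backup, which creates the backup branch without switching to it.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

for name in one two three; do
  echo "$name" > "$name.txt"
  git add "$name.txt"
  git commit -q -m "$name"
done
before=$(git rev-parse HEAD)

# Optional safety net: a backup branch pointing at the pre-rebase commit
git branch backup

# A "bad" rebase: replay only the last commit onto HEAD~2, dropping commit "two"
git rebase -q --onto HEAD~2 HEAD~1
git log --oneline                 # only "one" and "three" are left

# Undo it: main@{1} is where main pointed before the rebase finished
git reset -q --hard 'main@{1}'
git log --oneline                 # "one", "two" and "three" are back
```

If the reflog feels too scary, git reset --hard backup would get you to the same place here.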
force pushing to shared branches can cause lost work
A few people mentioned the following situation:
- You’re collaborating on a branch with someone
- You push some changes
- They rebase the branch and run git push --force (maybe by accident)
- Now when you run git pull, it’s a mess – you get a fatal: Need to specify how to reconcile divergent branches error
- While trying to deal with the fallout you might lose some commits, especially if some of the people involved aren’t very comfortable with git
This is an even worse situation than the “undoing a rebase is hard” situation because the missing commits might be split across many different people’s reflogs, and the only thing worse than having to hunt through the reflog is multiple different people having to hunt through the reflog.
This has never happened to me because the only branch I’ve ever collaborated on
is main, and main has always been protected from force pushing (in my
experience the only way you can get something into main is through a pull
request). So I’ve never even really been in a situation where this could
happen. But I can definitely see how this would cause problems.
The main tools I know to avoid this are:
- don’t rebase on shared branches
- use --force-with-lease when force pushing, to make sure that nobody else has pushed to the branch since your last fetch
Apparently the “since your last fetch” is important here – if you run git
fetch immediately before running git push --force-with-lease, the
--force-with-lease won’t protect you at all.
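Here’s a sketch of --force-with-lease refusing to overwrite someone else’s push. It’s all local: a bare repo stands in for the shared remote, and “alice”, “bob” and the clone_as helper are hypothetical names I made up for the example.

```shell
set -eu
work=$(mktemp -d)

# A bare repo stands in for the shared GitHub/GitLab remote
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/main

# Hypothetical helper: clone the remote and set an identity for that clone
clone_as() {
  git clone -q "$work/remote.git" "$work/$1" 2>/dev/null
  git -C "$work/$1" config user.name "$1"
  git -C "$work/$1" config user.email "$1@example.com"
}

# Alice creates the branch and pushes it
clone_as alice
cd "$work/alice"
git symbolic-ref HEAD refs/heads/main
echo "v1" > file.txt
git add file.txt
git commit -q -m "alice: first commit"
git push -q origin main

# Bob clones, commits, and pushes
clone_as bob
cd "$work/bob"
echo "bob's work" >> file.txt
git commit -q -am "bob: important work"
git push -q origin main
bob_commit=$(git rev-parse HEAD)

# Alice rewrites her commit WITHOUT fetching first, then force pushes
cd "$work/alice"
git commit -q --amend -m "alice: first commit (reworded)"
if git push -q --force-with-lease origin main 2>/dev/null; then
  result="overwrote bob's work"
else
  result="rejected"
fi
echo "$result"
```

The push is rejected because Alice’s origin/main still points at her old commit, which no longer matches the remote. If she had run git fetch right before pushing, origin/main would match the remote and the force push would go through, taking Bob’s commit with it.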
I was curious about why people would run git push --force on a shared branch. Some reasons people gave were:
- they’re working on a collaborative feature branch, and the feature branch needs to be rebased onto main. The idea here is that you’re just really careful about coordinating the rebase so nothing gets lost.
- as an open source maintainer, sometimes they need to rebase a contributor’s branch to fix a merge conflict
- they’re new to git, read some instructions online that suggested git rebase and git push --force as a solution, and followed them without understanding the consequences
- they’re used to doing git push --force on a personal branch and ran it on a shared branch by accident
force pushing makes code reviews harder
The situation here is:
- You make a pull request on GitHub
- People leave some comments
- You update the code to address the comments, rebase to clean up your commits, and force push
- Now when the reviewer comes back, it’s hard for them to tell what you changed since the last time they saw it – all the commits show up as “new”.
One way to avoid this is to push new commits addressing the review comments, and then after the PR is approved do a rebase to reorganize everything.
I think some reviewers are more annoyed by this problem than others; it’s kind of a personal preference. Also this might be a Github-specific issue, and other code review tools might have better ways of managing this.
losing commit metadata
If you’re rebasing to squash commits, you can lose important commit metadata
like Co-Authored-By. Also if you GPG sign your commits, rebase loses the
signatures.
There’s probably other commit metadata that you can lose that I’m not thinking of.
I haven’t run into this one so I’m not sure how to avoid it. I think GPG signing commits isn’t as popular as it used to be.
more difficult reverts
Someone mentioned that it’s important for them to be able to easily revert merging any branch (in case the branch broke something), and if the branch contains multiple commits and was merged with rebase, then you need to do multiple reverts to undo the commits.
In a merge workflow, I think you can revert merging any branch just by reverting the merge commit.
rebasing can break intermediate commits
If you’re trying to have a very clean commit history where the tests pass on every commit (very admirable!), rebasing can result in some intermediate commits that are broken and don’t pass the tests, even if the final commit passes the tests.
Apparently you can avoid this by using git rebase -x to run the test suite at
every step of the rebase and make sure that the tests are still passing. I’ve
never done that though.
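Here’s a minimal sketch of what git rebase -x looks like. The “test suite” is a stand-in I made up (it only checks that app.txt exists and appends a line to a log file); in a real project you’d put your actual test command there.

```shell
set -eu
repo=$(mktemp -d)
log=$(mktemp)          # records one line per commit the "tests" ran on
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

git checkout -q -b feature
for i in 1 2; do
  echo "feature $i" > "feature$i.txt"
  git add "feature$i.txt"
  git commit -q -m "feature commit $i"
done

git checkout -q main
echo "main change" > other.txt
git add other.txt
git commit -q -m "change on main"

# Rebase feature onto main, running the stand-in test command after each commit.
# If the command fails, the rebase stops at that commit so you can fix it.
git checkout -q feature
git rebase -q -x "test -f app.txt && echo ok >> '$log'" main

cat "$log"             # one "ok" per rebased commit
```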
accidentally run git commit --amend instead of git rebase --continue
A couple of people mentioned issues with running git commit --amend instead of git rebase --continue when resolving a merge conflict.
The reason this is confusing is that there are two reasons you might want to edit files during a rebase:
- editing a commit (by using edit in git rebase -i), where you need to write git commit --amend when you’re done
- a merge conflict, where you need to run git rebase --continue when you’re done
It’s very easy to get these two cases mixed up because they feel very similar. I think what goes wrong here is that you:
- Start a rebase
- Run into a merge conflict
- Resolve the merge conflict, and run git add file.txt
- Run git commit because that’s what you’re used to doing after you run git add
- But you were supposed to run git rebase --continue! Now you have a weird extra commit, and maybe it has the wrong commit message and/or author
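Here’s a sketch of the merge conflict case done the intended way, in a throwaway repo with made-up file contents. GIT_EDITOR=true is only there so the example doesn’t open an editor; it just accepts the existing commit message.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "initial commit"

git checkout -q -b feature
echo "feature version" > file.txt
git commit -q -am "feature: edit file"

git checkout -q main
echo "main version" > file.txt
git commit -q -am "main: edit file"

# Rebasing feature onto main stops with a merge conflict in file.txt
git checkout -q feature
git rebase main >/dev/null 2>&1 || echo "rebase stopped on a conflict"

# Resolve the conflict and stage the result...
echo "merged version" > file.txt
git add file.txt

# ...then continue the rebase (NOT git commit or git commit --amend)
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

git log --format=%s -2   # "feature: edit file", then "main: edit file"
```

The rebased commit keeps its original message and author, which is exactly what gets lost when you run git commit by hand in the middle.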
splitting commits in an interactive rebase is hard
The whole point of rebase is to clean up your commit history, and combining
commits with rebase is pretty easy. But what if you want to split up a commit into 2
smaller commits? It’s not as easy, especially if the commit you want to split
is a few commits back! I actually don’t really know how to do it even though I
feel very comfortable with rebase. I’d probably just do git reset HEAD^^^ or
something and use git add -p to redo all my commits from scratch.
One person shared their workflow for splitting commits with rebase.
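For the simplest case (splitting the most recent commit), here’s a sketch of the “reset and redo” approach. The file names are made up, and the split is by file so that it runs without prompts; for changes inside one file you’d use git add -p to pick hunks interactively.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "initial commit"

# One commit that really contains two unrelated changes
echo "parser" > parser.txt
echo "docs" > docs.txt
git add parser.txt docs.txt
git commit -q -m "add parser and docs"

# Undo the commit but keep its changes in the working directory
git reset -q HEAD^

# Re-commit the changes as two separate commits
git add parser.txt
git commit -q -m "add parser"
git add docs.txt
git commit -q -m "add docs"

git log --format=%s   # "add docs", "add parser", "initial commit"
```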
complex rebases are hard
If you try to do too many things in a single git rebase -i (reorder commits
AND combine commits AND modify a commit), it can get really confusing.
To avoid this, I personally prefer to only do 1 thing per rebase, and if I want to do 2 different things I’ll do 2 rebases.
rebasing long lived branches can be annoying
If your branch is long-lived (like for 1 month), having to rebase repeatedly gets painful. It might be easier to just do 1 merge at the end and only resolve the conflicts once.
The dream is to avoid this problem by not having long-lived branches but it doesn’t always work out that way in practice.
miscellaneous problems
A few more issues that I think are not that common:
- Stopping a rebase wrong: If you try to abort a rebase that’s going badly with git reset --hard instead of git rebase --abort, things will behave weirdly until you stop it properly
- Weird interactions with merge commits: A couple of quotes about this: “If you rebase your working copy to keep a clean history for a branch, but the underlying project uses merges, the result can be ugly. If you do rebase -i HEAD~4 and the fourth commit back is a merge, you can see dozens of commits in the interactive editor.”, “I’ve learned the hard way to never rebase if I’ve merged anything from another branch”
rebase and commit discipline
I’ve seen a lot of people arguing about rebase. I’ve been thinking about why this is and I’ve noticed that people work at a few different levels of “commit discipline”:
- Literally anything goes, “wip”, “fix”, “idk”, “add thing”
- When you make a pull request (on github/gitlab), squash all of your crappy commits into a single commit with a reasonable message (usually the PR title)
- Atomic Beautiful Commits – every change is split into the appropriate number of commits, where each one has a nice commit message and where they all tell a story around the change you’re making
Often I think different people inside the same company have different levels of commit discipline, and I’ve seen people argue about this a lot. Personally I’m mostly a Level 2 person. I think Level 3 might be what people mean when they say “clean commit history”.
I think Level 1 and Level 2 are pretty easy to achieve without rebase – for
level 1, you don’t have to do anything, and for level 2, you can either press
“squash and merge” in github or run git switch main; git merge --squash mybranch on the command line.
But for Level 3, you either need rebase or some other tool (like GitUp) to help you organize your commits to tell a nice story.
I’ve been wondering if when people argue about whether people “should” use rebase or not, they’re really arguing about which minimum level of commit discipline should be required.
I think how this plays out also depends on how big the changes folks are making – if folks are usually making pretty small pull requests anyway, squashing them into 1 commit isn’t a big deal, but if you’re making a 6000-line change you probably want to split it up into multiple commits.
a “squash and merge” workflow
A couple of people mentioned using this workflow that doesn’t use rebase:
- make commits
- Run git merge main to merge main into the branch periodically (and fix conflicts if necessary)
- When you’re done, use GitHub’s “squash and merge” feature (which is the equivalent of running git checkout main; git merge --squash mybranch) to squash all of the changes into 1 commit. This gets rid of all the “ugly” merge commits.
I originally thought this would make the log of commits on my branch too ugly,
but apparently git log main..mybranch will just show you the changes on your
branch, like this:
$ git log main..mybranch
756d4af (HEAD -> mybranch) Merge branch 'main' into mybranch
20106fd Merge branch 'main' into mybranch
d7da423 some commit on my branch
85a5d7d some other commit on my branch
Of course, the goal here isn’t to force people who have made beautiful atomic commits to squash their commits – it’s just to provide an easy option for folks to clean up a messy commit history (“add new feature; wip; wip; fix; fix; fix; fix; fix;”) without having to use rebase.
I’d be curious to hear about other people who use a workflow like this and if it works well.
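Here’s a command-line sketch of that workflow in a throwaway repo (branch contents and messages are made up). Note that git merge --squash only stages the combined changes, so it needs a git commit afterwards.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

# Messy commits on a feature branch
git checkout -q -b mybranch
for msg in "add new feature" "wip" "fix"; do
  echo "$msg" >> feature.txt
  git add feature.txt
  git commit -q -m "$msg"
done

# main moves ahead, and we merge it into the branch instead of rebasing
git checkout -q main
echo "main change" > other.txt
git add other.txt
git commit -q -m "change on main"
git checkout -q mybranch
git merge -q --no-edit main

# Squash everything from mybranch into a single commit on main
git checkout -q main
git merge -q --squash mybranch
git commit -q -m "add new feature"

git log --format=%s   # "add new feature", "change on main", "initial commit"
```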
there are more problems than I expected
I went into this really feeling like “rebase is fine, what could go wrong?” But many of these problems actually have happened to me in the past, it’s just that over the years I’ve learned how to avoid or fix all of them.
And I’ve never really seen anyone share best practices for rebase, other than “never force push to a shared branch”. All of these honestly make me a lot more reluctant to recommend using rebase.
To recap, I think these are my personal rebase rules I follow:
- stop a rebase if it’s going badly instead of letting it finish (with git rebase --abort)
- know how to use git reflog to undo a bad rebase
- don’t rebase a million tiny commits (instead do it in 2 steps: git rebase -i HEAD^^^^ and then git rebase main)
- don’t do more than one thing in a git rebase -i. Keep it simple.
- never force push to a shared branch
- never rebase commits that have already been pushed to main
Thanks to Marco Rogers for encouraging me to think about the problems people have with rebase, and to everyone on Mastodon who helped with this.
Confusing git terminology
Hello! I’m slowly working on explaining git. One of my biggest problems is that after almost 15 years of using git, I’ve become very used to git’s idiosyncrasies and it’s easy for me to forget what’s confusing about it.
So I asked people on Mastodon:
what git jargon do you find confusing? thinking of writing a blog post that explains some of git’s weirder terminology: “detached HEAD state”, “fast-forward”, “index/staging area/staged”, “ahead of ‘origin/main’ by 1 commit”, etc
I got a lot of GREAT answers and I’ll try to summarize some of them here. Here’s a list of the terms:
- HEAD and “heads”
- “detached HEAD state”
- “ours” and “theirs” while merging or rebasing
- “Your branch is up to date with ‘origin/main’”
- HEAD^, HEAD~, HEAD^^, HEAD~~, HEAD^2, HEAD~2
- .. and ...
- “can be fast-forwarded”
- “reference”, “symbolic reference”
- refspecs
- “tree-ish”
- “index”, “staged”, “cached”
- “reset”, “revert”, “restore”
- “untracked files”, “remote-tracking branch”, “track remote branch”
- checkout
- reflog
- merge vs rebase vs cherry-pick
- rebase --onto
- commit
- more confusing terms
I’ve done my best to explain what’s going on with these terms, but they cover basically every single major feature of git which is definitely too much for a single blog post so it’s pretty patchy in some places.
HEAD and “heads”
A few people said they were confused by the terms HEAD and refs/heads/main,
because it sounds like it’s some complicated technical internal thing.
Here’s a quick summary:
- “heads” are “branches”. Internally in git, branches are stored in a directory called .git/refs/heads. (technically the official git glossary says that the branch is all the commits on it and the head is just the most recent commit, but they’re 2 different ways to think about the same thing)
- HEAD is the current branch. It’s stored in .git/HEAD.
I think that “a head is a branch, HEAD is the current branch” is a good
candidate for the weirdest terminology choice in git, but it’s definitely too
late for a clearer naming scheme so let’s move on.
There are some important exceptions to “HEAD is the current branch”, which we’ll talk about next.
“detached HEAD state”
You’ve probably seen this message:
$ git checkout v0.1
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
[...]
Here’s the deal with this message:
- In Git, usually you have a “current branch” checked out, for example main.
- The place the current branch is stored is called HEAD.
- Any new commits you make will get added to your current branch, and if you run git merge other_branch, that will also affect your current branch
- But HEAD doesn’t have to be a branch! Instead it can be a commit ID.
- Git calls this state (where HEAD is a commit ID instead of a branch) “detached HEAD state”
- For example, you can get into detached HEAD state by checking out a tag, because a tag isn’t a branch
- if you don’t have a current branch, a bunch of things break:
  - git pull doesn’t work at all (since the whole point of it is to update your current branch)
  - neither does git push unless you use it in a special way
  - git commit, git merge, git rebase, and git cherry-pick do still work, but they’ll leave you with “orphaned” commits that aren’t connected to any branch, so those commits will be hard to find
- You can get out of detached HEAD state by either creating a new branch or switching to an existing branch
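Here’s a small sketch of getting into and out of detached HEAD state in a throwaway repo. The tag and branch names are made up for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "first"
git tag v0.1
echo "v2" >> file.txt
git commit -q -am "second"

# Checking out a tag puts us in detached HEAD state
git checkout -q v0.1
cat .git/HEAD                                    # a commit ID, not "ref: refs/heads/..."
git symbolic-ref -q HEAD || echo "HEAD is detached"

# A commit made here doesn't belong to any branch
echo "experiment" >> file.txt
git commit -q -am "experiment"

# Creating a branch gets us out of detached HEAD state and keeps the commit
git checkout -q -b experiment
git symbolic-ref --short HEAD                    # experiment
```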
“ours” and “theirs” while merging or rebasing
If you have a merge conflict, you can run git checkout --ours file.txt to pick the version of file.txt from the “ours” side. But which side is “ours” and which side is “theirs”?
I always find this confusing and I never use git checkout --ours because of
that, but I looked it up to see which is which.
For merges, here’s how it works: the current branch is “ours” and the branch you’re merging in is “theirs”, like this. Seems reasonable.
$ git checkout merge-into-ours # current branch is "ours"
$ git merge from-theirs # branch we're merging in is "theirs"
For rebases it’s the opposite – the current branch is “theirs” and the target branch we’re rebasing onto is “ours”, like this:
$ git checkout theirs # current branch is "theirs"
$ git rebase ours # branch we're rebasing onto is "ours"
I think the reason for this is that under the hood git rebase main is
repeatedly merging commits from the current branch into a copy of the main branch (you can
see what I mean by that in this weird shell script that implements git rebase using git merge). But I
still find it confusing.
This nice tiny site explains the “ours” and “theirs” terms.
A couple of people also mentioned that VSCode calls “ours”/“theirs” “current change”/“incoming change”, and that it’s confusing in the exact same way.
“Your branch is up to date with ‘origin/main’”
This message seems straightforward – it’s saying that your main branch is up
to date with the origin!
But it’s actually a little misleading. You might think that this means that
your main branch is up to date. It doesn’t. What it actually means is –
if you last ran git fetch or git pull 5 days ago, then your main branch
is up to date with all the changes as of 5 days ago.
So if you don’t realize that, it can give you a false sense of security.
I think git could theoretically give you a more useful message like “is up to
date with the origin’s main as of your last fetch 5 days ago” because the time
that the most recent fetch happened is stored in the reflog, but it doesn’t.
HEAD^, HEAD~, HEAD^^, HEAD~~, HEAD^2, HEAD~2
I’ve known for a long time that HEAD^ refers to the previous commit, but I’ve
been confused for a long time about the difference between HEAD~ and HEAD^.
I looked it up, and here’s how these relate to each other:
- HEAD^ and HEAD~ are the same thing (1 commit ago)
- HEAD^^^ and HEAD~~~ and HEAD~3 are the same thing (3 commits ago)
- HEAD^3 refers to the third parent of a commit, and is different from HEAD~3
This seems weird – why are HEAD~ and HEAD^ the same thing? And what’s the
“third parent”? Is that the same thing as the parent’s parent’s parent? (spoiler: it
isn’t) Let’s talk about it!
Most commits have only one parent. But merge commits have multiple parents –
they’re merging together 2 or more commits. In Git HEAD^ means “the parent of
the HEAD commit”. But what if HEAD is a merge commit? What does HEAD^ refer
to?
The answer is that HEAD^ refers to the first parent of the merge,
HEAD^2 is the second parent, HEAD^3 is the third parent, etc.
But I guess they also wanted a way to refer to “3 commits ago”, so HEAD^3 is
the third parent of the current commit (which may have many parents if it’s a merge commit), and HEAD~3 is the parent’s parent’s
parent.
I think in the context of the merge commit ours/theirs discussion earlier, HEAD^ is “ours” and HEAD^2 is “theirs”.
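Here’s a sketch that builds a tiny merge commit so you can see the difference. The commit names (A, B, C, S) and the small commit helper function are made up for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: make a commit named $1 that adds the file $1.txt
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
commit B
git checkout -q -b side
commit S
git checkout -q main
commit C
git merge -q --no-ff --no-edit side   # HEAD is now a merge commit with 2 parents

git log -1 --format=%s 'HEAD^'    # C: first parent (the branch we were on, "ours")
git log -1 --format=%s 'HEAD^2'   # S: second parent (the branch we merged in, "theirs")
git log -1 --format=%s 'HEAD~2'   # B: the first parent's first parent
```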
.. and ...
Here are two commands:
- git log main..test
- git log main...test
What’s the difference between .. and ...? I never use these so I had to look it up in man git-range-diff. It seems like the answer is that in this case:
A - B main
  \
    C - D test
- main..test is commits C and D
- test..main is commit B
- main...test is commits B, C, and D
But it gets worse: apparently git diff also supports .. and ..., but
they do something completely different than they do with git log? I think the summary is:
- git log test..main shows changes on main that aren’t on test, whereas git log test...main shows changes on both sides.
- git diff test..main shows test changes and main changes (it diffs B and D) whereas git diff test...main diffs A and B (it only shows you the diff on one side).
this blog post talks about it a bit more.
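Here’s a sketch that recreates the little A/B/C/D history from the diagram so you can try both syntaxes. Each commit adds one file named after itself (a convention I made up for the example), which makes the diffs easy to read.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: make a commit named $1 that adds the file $1.txt
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
git checkout -q -b test
commit C
commit D
git checkout -q main
commit B

git log --format=%s main..test     # D and C: on test but not on main
git log --format=%s test..main     # B: on main but not on test
git log --format=%s main...test    # B, C and D: on one side but not both

git diff --name-only test..main    # B.txt, C.txt, D.txt: compares the two tips (D and B)
git diff --name-only test...main   # B.txt: compares the merge base (A) with main (B)
```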
“can be fast-forwarded”
Here’s a very common message you’ll see in git status:
$ git status
On branch main
Your branch is behind 'origin/main' by 2 commits, and can be fast-forwarded.
(use "git pull" to update your local branch)
What does “fast-forwarded” mean? Basically it’s trying to say that the two branches look something like this: (newest commits are on the right)
main: A - B - C
origin/main: A - B - C - D - E
or visualized another way:
A - B - C - D - E (origin/main)
        |
        main
Here origin/main just has 2 extra commits that main doesn’t have, so it’s
easy to bring main up to date – we just need to add those 2 commits.
Literally nothing can possibly go wrong – there’s no possibility of merge
conflicts. A fast forward merge is a very good thing! It’s the easiest way to combine 2 branches.
After running git pull, you’ll end up in this state:
main: A - B - C - D - E
origin/main: A - B - C - D - E
Here’s an example of a state which can’t be fast-forwarded.
A - B - C - X (main)
        |
        - - D - E (origin/main)
Here main has a commit that origin/main doesn’t have (X). So
you can’t do a fast forward. In that case, git status would say:
$ git status
Your branch and 'origin/main' have diverged,
and have 1 and 2 different commits each, respectively.
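Here’s a local sketch of both situations. There’s no real remote involved: a branch I’ve called ahead plays the role of origin/main, and git merge --ff-only is used so that git refuses to do anything other than a fast-forward.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Hypothetical helper: make a commit named $1 that adds the file $1.txt
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
commit B
commit C
git checkout -q -b ahead    # stands in for origin/main
commit D
commit E
git checkout -q main

# main is strictly behind "ahead", so this is a fast-forward: main just moves to E
git merge -q --ff-only ahead
git log -1 --format=%s      # E

# Now make the two branches diverge: X on main, F on ahead
commit X
git checkout -q ahead
commit F
git checkout -q main

if git merge -q --ff-only ahead 2>/dev/null; then
  echo "fast-forwarded"
else
  echo "can't fast-forward: the branches have diverged"
fi
```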
“reference”, “symbolic reference”
I’ve always found the term “reference” kind of confusing. There are at least 3 things that get called “references” in git
- branches and tags like main and v0.2
- HEAD, which is the current branch
- things like HEAD^^^ which git will resolve to a commit ID. Technically these are probably not “references”, I guess git calls them “revision parameters” but I’ve never used that term.
“symbolic reference” is a very weird term to me because personally I think the only
symbolic reference I’ve ever used is HEAD (the current branch), and HEAD
has a very central place in git (most of git’s core commands’ behaviour depends
on the value of HEAD), so I’m not sure what the point of having it as a
generic concept is.
refspecs
When you configure a git remote in .git/config, there’s this +refs/heads/main:refs/remotes/origin/main thing.
[remote "origin"]
url = git@github.com:jvns/pandas-cookbook
fetch = +refs/heads/main:refs/remotes/origin/main
I don’t really know what this means, I’ve always just used whatever the default
is when you do a git clone or git remote add, and I’ve never felt any
motivation to learn about it or change it from the default.
“tree-ish”
The man page for git checkout says:
git checkout [-f|--ours|--theirs|-m|--conflict=<style>] [<tree-ish>] [--] <pathspec>...
What’s tree-ish??? What git is trying to say here is when you run git checkout THING ., THING can be either:
- a commit ID (like 182cd3f)
- a reference to a commit ID (like main or HEAD^^ or v0.3.2)
- a subdirectory inside a commit (like main:./docs)
- I think that’s it????
Personally I’ve never used the “directory inside a commit” thing and from my perspective “tree-ish” might as well just mean “commit or reference to commit”.
“index”, “staged”, “cached”
All of these refer to the exact same thing (the file .git/index, which is where your changes are staged when you run git add):
- git diff --cached
- git rm --cached
- git diff --staged
- the file .git/index
Even though they all ultimately refer to the same file, there’s some variation in how those terms are used in practice:
- Apparently the flags --index and --cached do not generally mean the same thing. I have personally never used the --index flag so I’m not going to get into it, but this blog post by Junio Hamano (git’s lead maintainer) explains all the gnarly details
- the “index” lists untracked files (I guess for performance reasons) but you don’t usually think of the “staging area” as including untracked files
“reset”, “revert”, “restore”
A bunch of people mentioned that “reset”, “revert” and “restore” are very similar words and it’s hard to differentiate them.
I think it’s made worse because
- git reset --hard and git restore . on their own do basically the same thing. (though git reset --hard COMMIT and git restore --source COMMIT . are completely different from each other)
- the respective man pages don’t give very helpful descriptions:
  - git reset: “Reset current HEAD to the specified state”
  - git revert: “Revert some existing commits”
  - git restore: “Restore working tree files”
Those short descriptions do give you a better sense for which noun is being affected (“current HEAD”, “some commits”, “working tree files”) but they assume you know what “reset”, “revert” and “restore” mean in this context.
Here are some short descriptions of what they each do:
- git revert COMMIT: Create a new commit that’s the “opposite” of COMMIT on your current branch (if COMMIT added 3 lines, the new commit will delete those 3 lines)
- git reset --hard COMMIT: Force your current branch back to the state it was at COMMIT, erasing any new changes since COMMIT. Very dangerous operation.
- git restore --source=COMMIT PATH: Take all the files in PATH back to how they were at COMMIT, without changing any other files or commit history.
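Here’s a sketch that runs all three on the same tiny history so you can compare what each one changes. The file name and commit messages are made up, and git restore needs a reasonably recent git (it was added in 2.23).

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

for n in 1 2 3; do
  echo "line $n" >> notes.txt
  git add notes.txt
  git commit -q -m "add line $n"
done

# revert: adds a NEW commit that undoes "add line 3"; history gets longer
git revert --no-edit HEAD >/dev/null
git rev-list --count HEAD              # 4
wc -l < notes.txt                      # 2 lines

# restore: copy notes.txt from an older commit into the working directory;
# no commits are added or removed
git restore --source=HEAD~1 notes.txt
wc -l < notes.txt                      # 3 lines (HEAD~1 is "add line 3")
git rev-list --count HEAD              # still 4

# reset --hard: move the branch itself back, discarding the later commits
# AND the uncommitted change we just made with restore
git reset -q --hard HEAD~2
git rev-list --count HEAD              # 2
wc -l < notes.txt                      # 2 lines
```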
“untracked files”, “remote-tracking branch”, “track remote branch”
Git uses the word “track” in 3 different related ways:
- Untracked files: in the output of git status. This means those files aren’t managed by Git and won’t be included in commits.
- a “remote tracking branch” like origin/main. This is a local reference, and it’s the commit ID that main pointed to on the remote origin the last time you ran git pull or git fetch.
- “branch foo set up to track remote branch bar from origin”
The “untracked files” and “remote tracking branch” thing is not too bad – they both use “track”, but the context is very different. No big deal. But I think the other two uses of “track” are actually quite confusing:
- main is a branch that tracks a remote
- origin/main is a remote-tracking branch
But a “branch that tracks a remote” and a “remote-tracking branch” are different things in Git and the distinction is pretty important! Here’s a quick summary of the differences:
- main is a branch. You can make commits to it, merge into it, etc. It’s often configured to “track” the remote main in .git/config, which means that you can use git pull and git push to push/pull changes.
- origin/main is not a branch. It’s a “remote-tracking branch”, which is not a kind of branch (I’m sorry). You can’t make commits to it. The only way you can update it is by running git pull or git fetch to get the latest state of main from the remote.
I’d never really thought about this ambiguity before but I think it’s pretty easy to see why folks are confused by it.
checkout
Checkout does two totally unrelated things:
- git checkout BRANCH switches branches
- git checkout file.txt discards your unstaged changes to file.txt
This is well known to be confusing and git has actually split those two
functions into git switch and git restore (though you can still use
checkout if, like me, you have 15 years of muscle memory around git checkout
that you don’t feel like unlearning)
Also personally after 15 years I still can’t remember the order of the
arguments to git checkout main file.txt for restoring the version of
file.txt from the main branch.
I think sometimes you need to pass -- to checkout as an argument somewhere
to help it figure out which argument is a branch and which ones are paths but I
never do that and I’m not sure when it’s needed.
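Here’s a tiny sketch of the “branch first, then paths” order, with -- in between to make it unambiguous. The branch and file names are made up for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo "main version" > file.txt
git add file.txt
git commit -q -m "add file on main"

git checkout -q -b feature
echo "feature version" > file.txt
git commit -q -am "edit file on feature"

# Take file.txt as it is on main: the branch comes first, then "--", then the paths
git checkout main -- file.txt
cat file.txt          # main version
```

One thing to know is that this form of git checkout also stages the restored file, so git status will show it as a change to be committed.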
reflog
Lots of people mentioned reading reflog as re-flog and not ref-log. I
won’t get deep into the reflog here because this post is REALLY long but:
- “reference” is an umbrella term git uses for branches, tags, and HEAD
- the reference log (“reflog”) gives you the history of everything a reference has ever pointed to
- It can help get you out of some VERY bad git situations, like if you accidentally delete an important branch
- I find it one of the most confusing parts of git’s UI and I try to avoid needing to use it.
merge vs rebase vs cherry-pick
A bunch of people mentioned being confused about the difference between merge and rebase and not understanding what the “base” in rebase was supposed to be.
I’ll try to summarize them very briefly here, but I don’t think these 1-line explanations are that useful because people structure their workflows around merge / rebase in pretty different ways and to really understand merge/rebase you need to understand the workflows. Also pictures really help. That could really be its whole own blog post though so I’m not going to get into it.
- merge creates a single new commit that merges the 2 branches
- rebase copies commits on the current branch to the target branch, one at a time.
- cherry-pick is similar to rebase, but with a totally different syntax (one big difference is that rebase copies commits FROM the current branch, cherry-pick copies commits TO the current branch)
rebase --onto
git rebase has a flag called --onto. This has always seemed confusing to me
because the whole point of git rebase main is to rebase the current branch
onto main. So what’s the extra onto argument about?
I looked it up, and --onto definitely solves a problem that I’ve rarely/never
actually had, but I guess I’ll write down my understanding of it anyway.
A - B - C (main)
     \
      D - E - F - G (mybranch)
          |
          otherbranch
Imagine that for some reason I just want to move commits F and G to be
rebased on top of main. I think there’s probably some git workflow where this
comes up a lot.
Apparently you can run git rebase --onto main otherbranch mybranch to do
that. It seems impossible to me to remember the syntax for this (there are 3
different branch names involved, which for me is too many), but I heard about it from a
bunch of people so I guess it must be useful.
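To convince myself that this does what it says, here's a sketch that builds that history in a throwaway repo and runs the command. The commit helper is made up, and I'm assuming mybranch forks off at B:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"
git config user.name "Example"

commit() {  # hypothetical helper: make a commit that adds one new file
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A; commit B
git checkout -q -b mybranch      # fork off at B
commit D; commit E
git branch otherbranch           # otherbranch points at E
commit F; commit G
git checkout -q main
commit C

# "take the commits on mybranch that aren't on otherbranch (F and G)
#  and replay them on top of main"
git rebase -q --onto main otherbranch mybranch

git log --oneline mybranch
```

Afterwards mybranch is A, B, C, F, G, and otherbranch still points at E.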
commit
Someone mentioned that they found it confusing that commit is used both as a verb and a noun in git.
for example:
- verb: “Remember to commit often”
- noun: “the most recent commit on main”
My guess is that most folks get used to this relatively quickly, but this use of “commit” is different from how it’s used in SQL databases, where I think “commit” is just a verb (you “COMMIT” to end a transaction) and not a noun.
Also in git you can think of a Git commit in 3 different ways:
- a snapshot of the current state of every file
- a diff from the parent commit
- a history of every previous commit
None of those are wrong: different commands use commits in all of these ways.
For example git show treats a commit as a diff, git log treats it as a
history, and git restore treats it as a snapshot.
But git’s terminology doesn’t do much to help you understand in which sense a commit is being used by a given command.
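Here's a tiny sketch of those three commands run against the same commits, in a throwaway repo with made-up file contents:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"
git config user.name "Example"

echo "one" > file.txt
git add file.txt
git commit -q -m "first"
echo "two" > file.txt
git commit -q -am "second"

git show HEAD                          # commit as a diff: what changed since the parent
git log --oneline HEAD                 # commit as a history: this commit and its ancestors
git restore --source=HEAD~1 file.txt   # commit as a snapshot: copy file.txt out of it
cat file.txt
```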
more confusing terms
Here are a bunch more confusing terms. I don’t know what a lot of these mean.
things I don’t really understand myself:
- “the git pickaxe” (maybe this is git log -S and git log -G, for searching the diffs of previous commits?)
- submodules (all I know is that they don’t work the way I want them to work)
- “cone mode” in git sparse checkout (no idea what this is but someone mentioned it)
things that people mentioned finding confusing but that I left out of this post because it was already 3000 words:
- blob, tree
- the direction of “merge”
- “origin”, “upstream”, “downstream”
- that push and pull aren’t opposites
- the relationship between fetch and pull (pull = fetch + merge)
- git porcelain
- subtrees
- worktrees
- the stash
- “master” or “main” (it sounds like it has a special meaning inside git but it doesn’t)
- when you need to use origin main (like git push origin main) vs origin/main
github terms people mentioned being confused by:
- “pull request” (vs “merge request” in gitlab which folks seemed to think was clearer)
- what “squash and merge” and “rebase and merge” do (I’d never actually heard of git merge --squash until yesterday, I thought “squash and merge” was a special github feature)
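Since git merge --squash was new to me, here's a quick sketch of what it does locally, in a throwaway repo. The branch name topic and the commit helper are made up, and I'm not claiming this is exactly what GitHub's button does:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"
git config user.name "Example"

commit() {  # hypothetical helper: make a commit that adds one new file
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit A
git checkout -q -b topic
commit T1; commit T2
git checkout -q main

# stages the combined changes from topic, but doesn't create a commit
git merge --squash topic >/dev/null
# so we commit them ourselves, as one ordinary single-parent commit
git commit -q -m "squashed topic"

git log --oneline main
```

The result on main is one new commit that contains both T1.txt and T2.txt.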
it’s genuinely “every git term”
I was surprised that basically every other core feature of git was mentioned by at least one person as being confusing in some way. I’d be interested in hearing more examples of confusing git terms that I missed too.
There’s another great post about this from 2012 called the most confusing git terminology. It talks more about how git’s terminology relates to CVS and Subversion’s terminology.
If I had to pick the 3 most confusing git terms, I think right now I’d pick:
- a head is a branch, HEAD is the current branch
- “remote tracking branch” and “branch that tracks a remote” being different things
- how “index”, “staged”, “cached” all refer to the same thing
that’s all!
I learned a lot from writing this – I learned a few new facts about git, but more importantly I feel like I have a slightly better sense now for what someone might mean when they say that everything in git is confusing.
I really hadn’t thought about a lot of these issues before – like I’d never realized how “tracking” is used in such a weird way when discussing branches.
Also as usual I might have made some mistakes, especially since I ended up in a bunch of corners of git that I hadn’t visited before.
Also a very quick plug: I’m working on writing a zine about git, if you’re interested in getting an email when it comes out you can sign up to my very infrequent announcements mailing list.
Some miscellaneous git facts
I’ve been very slowly working on writing about how Git works. I thought I already knew Git pretty well, but as usual when I try to explain something I’ve been learning some new things.
None of these things feel super surprising in retrospect, but I hadn’t thought about them clearly before.
The facts are:
- the “index”, “staging area” and “--cached” are all the same thing
- the stash is a bunch of commits
- not all references are branches or tags
- merge commits aren’t empty
Let’s talk about them!
the “index”, “staging area” and “--cached” are all the same thing
When you run git add file.txt, and then git status, you’ll see something like this:
$ git add content/post/2023-10-20-some-miscellaneous-git-facts.markdown
$ git status
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
new file: content/post/2023-10-20-some-miscellaneous-git-facts.markdown
People usually call this “staging a file” or “adding a file to the staging area”.
When you stage a file with git add, behind the scenes git adds the file to its object
database (in .git/objects) and updates a file called .git/index to refer to
the newly added file.
This “staging area” actually gets referred to by 3 different names in Git. All
of these refer to the exact same thing (the file .git/index):
- git diff --cached
- git diff --staged
- the file .git/index
I felt like I should have realized this earlier, but I didn’t, so there it is.
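If you want to check this yourself, here's a minimal sketch in a throwaway repo (the file contents are made up). It compares the output of the two flags and then lists what's in the index:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"
git config user.name "Example"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first"

echo "more" >> file.txt
git add file.txt                  # stage a change

cached=$(git diff --cached)
staged=$(git diff --staged)
[ "$cached" = "$staged" ] && echo "same diff"

# list the entries recorded in .git/index (mode, blob id, stage number, path)
git ls-files --stage
```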
the stash is a bunch of commits
When I run git stash to stash my changes, I’ve always been a bit confused
about where those changes actually went. It turns out that when you run git
stash, git makes some commits with your changes and labels them with a reference
called stash (in .git/refs/stash).
Let’s stash this blog post and look at the log of the stash reference:
$ git log stash --oneline
6cb983fe (refs/stash) WIP on main: c6ee55ed wip
2ff2c273 index on main: c6ee55ed wip
... some more stuff
Now we can look at the commit 2ff2c273 to see what it contains:
$ git show 2ff2c273 --stat
commit 2ff2c273357c94a0087104f776a8dd28ee467769
Author: Julia Evans <julia@jvns.ca>
Date: Fri Oct 20 14:49:20 2023 -0400
index on main: c6ee55ed wip
content/post/2023-10-20-some-miscellaneous-git-facts.markdown | 40 ++++++++++++++++++++++++++++++++++++++++
Unsurprisingly, it contains this blog post. Makes sense!
git stash actually creates 2 separate commits: one for the index, and one for
your changes that you haven’t staged yet. I found this kind of heartening
because I’ve been working on a tool to snapshot and restore the state of a git
repository (that I may or may not ever release) and I came up with a very
similar design, so that made me feel better about my choices.
Apparently older commits in the stash are stored in the reflog.
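Here's a small sketch for looking at those two commits yourself in a throwaway repo (the file contents are made up). As far as I can tell, the stash commit's first parent is the commit you were on and its second parent is the "index on ..." commit:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"
git config user.name "Example"

echo "one" > file.txt
git add file.txt
git commit -q -m "wip"

echo "staged change" >> file.txt
git add file.txt
echo "unstaged change" >> file.txt
git stash -q

git log --oneline --graph stash    # the WIP commit, the index commit, and the commit we were on
git rev-parse HEAD 'stash^1'       # these two ids should match
git log -1 --format=%s 'stash^2'   # the "index on main: ..." commit
git stash list                     # older stashes come from the reflog of refs/stash
```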
not all references are branches or tags
Git’s documentation often refers to “references” in a generic way that I find
a little confusing sometimes. Personally 99% of the time when I deal with
a “reference” in Git it’s a branch or HEAD and the other 1% of the time it’s a tag. I
actually didn’t know ANY examples of references that weren’t branches or tags or HEAD.
But now I know one example – the stash is a reference, and it’s not a branch or tag! So that’s cool.
Here are all the references in my blog’s git repository (other than HEAD):
$ find .git/refs -type f
.git/refs/heads/main
.git/refs/remotes/origin/HEAD
.git/refs/remotes/origin/main
.git/refs/stash
Some other references people mentioned in responses to this post:
- refs/notes/*, from git notes
- refs/pull/123/head and refs/pull/123/merge for GitHub pull requests (which you can get with git fetch origin refs/pull/123/merge)
- refs/bisect/*, from git bisect
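Another way to list references, instead of looking inside .git/refs, is git for-each-ref. Here's a sketch in a throwaway repo that creates a tag, a note, and a stash so that there's something to list (the tag name v1 and the note text are made up):

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"
git config user.name "Example"

echo "hello" > file.txt
git add file.txt
git commit -q -m "first"

git tag v1                          # refs/tags/v1
git notes add -m "a note" HEAD      # refs/notes/commits
echo "change" >> file.txt
git stash -q                        # refs/stash

# print the full name of every reference in the repo
git for-each-ref --format='%(refname)'
```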
merge commits aren’t empty
Here’s a toy git repo where I created two branches x and y, each with 1
file (x.txt and y.txt) and merged them. Let’s look at the merge commit.
$ git log --oneline
96a8afb (HEAD -> y) Merge branch 'x' into y
0931e45 y
1d8bd2d (x) x
If I run git show 96a8afb, the commit looks “empty”: there’s no diff!
$ git show 96a8afb
commit 96a8afbf776c2cebccf8ec0dba7c6c765ea5d987 (HEAD -> y)
Merge: 0931e45 1d8bd2d
Author: Julia Evans <julia@jvns.ca>
Date: Fri Oct 20 14:07:00 2023 -0400
Merge branch 'x' into y
But if I diff the merge commit against each of its two parent commits separately, you can see that of course there is a diff:
$ git diff 0931e45 96a8afb --stat
x.txt | 1 +
1 file changed, 1 insertion(+)
$ git diff 1d8bd2d 96a8afb --stat
y.txt | 1 +
1 file changed, 1 insertion(+)
It seems kind of obvious in retrospect that merge commits aren’t actually “empty” (they’re snapshots of the current state of the repo, just like any other commit), but I’d never thought about why they appear to be empty.
Apparently the reason that these merge diffs are empty is that merge diffs only show conflicts – if I instead create a repo
with a merge conflict (one branch added x and another branch added y to the
same file), and show the merge commit where I resolved the conflict, it looks
like this:
$ git show HEAD
commit 3bfe8311afa4da867426c0bf6343420217486594
Merge: 782b3d5 ac7046d
Author: Julia Evans <julia@jvns.ca>
Date: Fri Oct 20 15:29:06 2023 -0400
Merge branch 'x' into y
diff --cc file.txt
index 975fbec,587be6b..b680253
--- a/file.txt
+++ b/file.txt
@@@ -1,1 -1,1 +1,1 @@@
- y
 -x
++z
It looks like this is trying to tell me that one branch added x, another
branch added y, and the merge commit resolved it by putting z instead. But
in the earlier example, there was no conflict, so Git didn’t display a diff at all.
(thanks to Jordi for telling me how merge diffs work)
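If you want to reproduce the no-conflict case, here's a sketch that builds a similar toy repo. Unlike my repo above, I'm assuming both branches start from a shared base commit, and the file contents are made up:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "you@example.com"
git config user.name "Example"

echo "base" > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b x main
echo "x" > x.txt
git add x.txt
git commit -q -m "x"

git checkout -q -b y main
echo "y" > y.txt
git add y.txt
git commit -q -m "y"

git merge -q --no-edit x

git show HEAD                     # the merge commit, with no diff shown
git diff --stat 'HEAD^1' HEAD     # vs. the first parent (y): x.txt was added
git diff --stat 'HEAD^2' HEAD     # vs. the second parent (x): y.txt was added
```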
that’s all!
I’ll keep this post short, maybe I’ll write another blog post with more git facts as I learn them.