Reading List
The most recent articles from a list of feeds I subscribe to.
Singularity short story fika, co-writing, and Upplandsmuseet
This was my first “proper” weekend in the new place. Yesterday morning I met with a friend from my writing group at Himlen Är Blå Som En Apelsin, a really cozy cafe that is a two-minute walk from my apartment.
Turn off AI features by default (to reduce their climate impact)
Generative AI features have a large climate impact and consume a lot of water. We can weigh that impact against those features' benefits, but what if they are left unused? What if lots of people don't, in fact, use the thing? That seems like a lot of avoidable waste. And that matters: we're in a climate emergency, and we're dangerously far from meeting the 1.5 degrees target.
I know, we all want people to use the features we build, but it is safe to assume they often don't. For my business, I use a lot of very beautiful self-service portals that I only ever log in to in order to download a PDF my accountant needs. The beautifully considered UI, fancy spinners and clever copywriting are there, but if I'm honest, I mostly ignore them (sorry).
Is that OK? A button in your app that your user doesn't press wastes little energy. But if your app automatically generates summaries, captions or suggestions, and the user didn't want or use that functionality, a lot of energy and water was wasted, while serving no purpose. It's that combo of waste and purposelessness that we should avoid at all times.
Wait, that's absurd, you say. Does this really happen? Yeah, I come across it all the time, and it's not just because I'm something of a Luddite myself.
Features I didn't use
Some examples of AI features that ran on my behalf just in the past week, but that I didn't use:
- Loom's transcripts and automated titles and descriptions. They show up almost instantly after upload. I always remove them: they fail to get the point across, which I want to do pointedly, to save colleagues who review the video some time.
- Parabol's automated summary of team retrospectives: it emailed us key points, some of them incorrect, even though we had already written them down correctly.
- Notion's AI assistance that shows up whenever you press ‘Space’. OK, granted, it only runs once you've actually typed a prompt, but it's a good example for this post, as it's one of those features I hear many people want to turn off, and you can only do that “on Enterprise”, according to this Reddit topic dedicated to turning that feature off.
Of course, these features are not redundant if users benefit from them. But let's be real: oftentimes users didn't want to generate anything. It happened anyway, unsolicited. They will probably discard the output. In those cases, the energy-intensive feature was redundant. And that's an issue, as we don't have energy to spare.
Meanwhile, most major tech companies have announced they are letting go of their net-zero goals. Some have struck deals with nuclear power plants to cater for their energy needs (see Microsoft's plans with the Three Mile Island plant). This confirms to me that we don't have abundant energy. Maybe one day, but not today.
An ethical web
Is this ethical? The W3C's Ethical Web Principles have a specific principle that applies here: 2.9 The web is an environmentally sustainable platform.
It suggests new technologies should not harm the environment:
We will endeavor not to do further harm to the environment when we introduce new technologies to the web (…)
and recognises that the people who benefit are not always the people who are harmed:
and keep in mind that people most affected by the environmental consequences of new technologies may not be those who benefit from the features introduced.
If a feature is useful for some, but indirectly causes the energy or water bills to go up for others, we're breaking this Ethical Web Principle.
Conclusion
So, I'm just a guy sitting behind a keyboard, begging anyone who includes generative AI features in their product: only put them to work when users indicate they want that to happen. That's also going to turn out cheaper when OpenAI increases its rates, which is likely, as investors are going to want returns. Why not consider leaving out that new LLM-powered feature in the first place? Not everything needs to be “artificially intelligent”; sometimes a bunch of if statements make a killer feature (dude, that's so paternalistic and you're oversimplifying the realities of software engineering, you say… yeah, sorry, I'm trying to react to the overcomplexification that also happens).
Do you have other examples of software that forced LLM-generated content on you? Let me know and I'll add them to the post.
Further reading
- Thinking about using AI? Here’s what you can and (probably) can’t change about its environmental impact by the Green Web Foundation
- AI’s Growing Carbon Footprint (cites that data centres account for 2.5-3.7% of global greenhouse gas emissions, exceeding aviation)
- We’re getting a better idea of AI’s true carbon footprint
- Is generative AI bad for the environment? A computer scientist explains the carbon footprint of ChatGPT and its cousins
Originally posted as Turn off AI features by default (to reduce their climate impact) on Hidde's blog.
Why pipes sometimes get "stuck": buffering
Here’s a niche terminal problem that has bothered me for years but that I never really understood until a few weeks ago. Let’s say you’re running this command to watch for some specific output in a log file:
tail -f /some/log/file | grep thing1 | grep thing2
If log lines are being added to the file relatively slowly, the result I’d see is… nothing! It doesn’t matter if there were matches in the log file or not, there just wouldn’t be any output.
I internalized this as “uh, I guess pipes just get stuck sometimes and don’t show me the output, that’s weird”, and I’d handle it by just running grep thing1 /some/log/file | grep thing2 instead, which would work.
So as I’ve been doing a terminal deep dive over the last few months I was really excited to finally learn exactly why this happens.
why this happens: buffering
The reason why “pipes get stuck” sometimes is that it’s VERY common for programs to buffer their output before writing it to a pipe or file. So the pipe is working fine, the problem is that the program never even wrote the data to the pipe!
This is for performance reasons: writing all output immediately as soon as you can uses more system calls, so it’s more efficient to save up data until you have 8KB or so of data to write (or until the program exits) and THEN write it to the pipe.
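To get a feel for that tradeoff, here’s a small Python sketch. It’s my own illustration, not how libc works internally. It pushes 1000 short lines through Python’s io.BufferedWriter and counts how many times the underlying raw stream’s write() gets called. CountingRaw is made up for this example, and each of its write() calls stands in for one write system call.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream; each write() call stands in for one write system call."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.calls += 1
        self.data += b  # copy the bytes now; the buffer layer may reuse its memory
        return len(b)


def count_writes(lines, flush_every_line, bufsize=8192):
    """Send `lines` through a BufferedWriter; return (raw write calls, bytes delivered)."""
    raw = CountingRaw()
    out = io.BufferedWriter(raw, buffer_size=bufsize)
    for line in lines:
        out.write(line)
        if flush_every_line:
            out.flush()  # line-buffering style: hand over each line right away
    out.flush()  # block-buffering style: the leftovers go out at the end
    return raw.calls, bytes(raw.data)


lines = [b"log line number %04d\n" % i for i in range(1000)]  # 21 bytes each
line_calls, line_data = count_writes(lines, flush_every_line=True)
block_calls, block_data = count_writes(lines, flush_every_line=False)
print(line_calls, block_calls)
```

Flushing after every line makes 1000 raw writes. Letting the 8KB buffer fill up makes only a few, and both versions deliver exactly the same bytes.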
In this example:
tail -f /some/log/file | grep thing1 | grep thing2
the problem is that grep thing1 is saving up all of its matches until it has 8KB of data to write, which might literally never happen.
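Here’s a rough Python sketch of that situation. It’s Unix only because it uses select on a pipe, and the function names are made up. A block-buffered writer stands in for grep thing1 and writes three short matches. The read end of the pipe (where the second grep would be reading) has nothing to read until the writer explicitly flushes.

```python
import os
import select


def pipe_has_data(fd):
    """True if the read end of the pipe has bytes waiting (checked without blocking)."""
    readable, _, _ = select.select([fd], [], [], 0)
    return bool(readable)


def grep_stage_demo():
    r, w = os.pipe()
    # Block-buffered writer on the pipe, standing in for `grep thing1` feeding `grep thing2`.
    out = os.fdopen(w, "wb", buffering=8192)
    for match in [b"thing1 a\n", b"thing1 b\n", b"thing1 c\n"]:
        out.write(match)
    before_flush = pipe_has_data(r)  # the matches are in this process's buffer, not the pipe
    out.flush()  # a single write() finally hands them to the pipe
    after_flush = pipe_has_data(r)
    received = os.read(r, 1024)
    out.close()
    os.close(r)
    return before_flush, after_flush, received
```

The pipe itself is fine the whole time. The data just never reached it until the flush.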
programs don’t buffer when writing to a terminal
Part of why I found this so disorienting is that tail -f file | grep thing will work totally fine, but then when you add the second grep, it stops working!! The reason for this is that the way grep handles buffering depends on whether it’s writing to a terminal or not.
Here’s how grep (and many other programs) decides to buffer its output:
- Check if stdout is a terminal or not using the isatty function
- If it’s a terminal, use line buffering (print every line immediately as soon as you have it)
- Otherwise, use “block buffering” – only print data if you have at least 8KB or so of data to print
So if grep is writing directly to your terminal then you’ll see the line as soon as it’s printed, but if it’s writing to a pipe, you won’t.
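Here’s a Python sketch of that decision rule. It’s an illustration under my own assumptions, not glibc’s actual code. open_like_stdio is a made-up helper that checks os.isatty and picks line buffering for a terminal and an 8KB block buffer otherwise. As far as I know, CPython makes a similar choice for sys.stdout. The demo opens one pipe and one pseudo-terminal (os.openpty), so it needs a Unix system.

```python
import io
import os


def open_like_stdio(fd, bufsize=8192):
    """Hypothetical helper: line-buffered text stream if fd is a terminal, block-buffered otherwise."""
    is_tty = os.isatty(fd)
    raw = io.FileIO(fd, "w", closefd=False)  # the caller keeps ownership of fd
    buffered = io.BufferedWriter(raw, buffer_size=bufsize)
    return io.TextIOWrapper(buffered, encoding="utf-8", line_buffering=is_tty)


def buffering_modes():
    """Return (line_buffering for a pipe, line_buffering for a pseudo-terminal)."""
    r, w = os.pipe()
    master, slave = os.openpty()
    to_pipe = open_like_stdio(w)
    to_tty = open_like_stdio(slave)
    modes = (to_pipe.line_buffering, to_tty.line_buffering)
    to_pipe.close()  # closefd=False, so the fds stay open here...
    to_tty.close()
    for fd in (r, w, master, slave):  # ...and are closed explicitly
        os.close(fd)
    return modes
```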
Of course the buffer size isn’t always 8KB for every program; it depends on the implementation. For grep the buffering is handled by libc, and libc’s buffer size is defined by the BUFSIZ constant. Here’s where that’s defined in glibc.
(as an aside: “programs do not use 8KB output buffers when writing to a terminal” isn’t, like, a law of terminal physics, a program COULD use an 8KB buffer when writing output to a terminal if it wanted, it would just be extremely weird if it did that, I can’t think of any program that behaves that way)
commands that buffer & commands that don’t
One annoying thing about this buffering behaviour is that you kind of need to remember which commands buffer their output when writing to a pipe.
Some commands that don’t buffer their output:
- tail
- cat
- tee
I think almost everything else will buffer output, especially if it’s a command where you’re likely to be using it for batch processing. Here’s a list of some common commands that buffer their output when writing to a pipe, along with the flag that disables block buffering.
- grep (--line-buffered)
- sed (-u)
- awk (there’s a fflush() function)
- tcpdump (-l)
- jq (--unbuffered)
- tr (-u)
- cut (can’t disable buffering)
Those are all the ones I can think of. Lots of unix commands (like sort) may or may not buffer their output, but it doesn’t matter because sort can’t do anything until it finishes receiving input anyway.
Also I did my best to test both the Mac OS and GNU versions of these but there are a lot of variations and I might have made some mistakes.
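To make the --line-buffered idea concrete, here’s a toy grep in Python. mini_grep is hypothetical and only handles a single regex. The point is that line buffering amounts to an extra flush() after every match. FlushCountingOutput is a test double that counts those flushes.

```python
import io
import re
from typing import IO, Iterable


def mini_grep(pattern: str, lines: Iterable[str], out: IO[str], line_buffered: bool = False) -> int:
    """Toy grep: copy lines matching `pattern` to `out` and return how many matched.
    With line_buffered=True it flushes after every match, like `grep --line-buffered`."""
    regex = re.compile(pattern)
    matches = 0
    for line in lines:
        if regex.search(line):
            out.write(line)
            matches += 1
            if line_buffered:
                out.flush()  # hand this match to the next process now, not when the buffer fills
    return matches


class FlushCountingOutput(io.StringIO):
    """Test double that records how many times flush() is called."""

    def __init__(self):
        super().__init__()
        self.flushes = 0

    def flush(self):
        self.flushes += 1
        super().flush()


def run_filter(line_buffered):
    """Return (matches, flushes, output) for a tiny fixed input."""
    out = FlushCountingOutput()
    n = mini_grep("thing1", ["thing1 a\n", "other\n", "thing1 b\n"], out, line_buffered)
    return n, out.flushes, out.getvalue()
```

The output is identical either way. Only the number of flushes, and therefore when the next stage sees the data, changes.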
programming languages where the default “print” statement buffers
Also, here are a few programming languages where the default print statement will buffer output when writing to a pipe, and some ways to disable buffering if you want:
- C (disable with setvbuf)
- Python (disable with python -u, or PYTHONUNBUFFERED=1, or sys.stdout.reconfigure(line_buffering=True), or print(x, flush=True))
- Ruby (disable with STDOUT.sync = true)
- Perl (disable with $| = 1)
I assume that these languages are designed this way so that the default print function will be fast when you’re doing batch processing.
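Here’s a small Python sketch of two of those options. It wraps a pipe in a TextIOWrapper over a default-sized BufferedWriter, which is roughly (my assumption) what sys.stdout looks like when it’s connected to a pipe. A plain print stays buffered, print(..., flush=True) pushes everything out, and after reconfigure(line_buffering=True) each newline triggers a flush. python -u and PYTHONUNBUFFERED take effect at interpreter startup, so they aren’t shown here. Unix only.

```python
import io
import os
import select


def readable(fd):
    """True if bytes are waiting on fd (non-blocking check)."""
    return bool(select.select([fd], [], [], 0)[0])


def print_buffering_demo():
    r, w = os.pipe()
    # Text layer over a default-sized (8 KiB) buffer, similar to sys.stdout on a pipe.
    out = io.TextIOWrapper(io.BufferedWriter(io.FileIO(w, "w")), encoding="utf-8")
    results = []

    print("first", file=out)
    results.append(readable(r))  # False: "first\n" is still buffered in this process

    print("second", file=out, flush=True)  # flush=True pushes out everything buffered so far
    results.append(os.read(r, 1024))

    out.reconfigure(line_buffering=True)  # from now on, every newline triggers a flush
    print("third", file=out)
    results.append(os.read(r, 1024))

    out.close()  # also closes w
    os.close(r)
    return results
```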
Also, whether output is buffered or not might depend on how you print. For example, in C++ cout << "hello\n" buffers when writing to a pipe but cout << "hello" << endl will flush its output.
when you press Ctrl-C on a pipe, the contents of the buffer are lost
Let’s say you’re running this command as a hacky way to watch for DNS requests to example.com, and you forgot to pass -l to tcpdump:
sudo tcpdump -ni any port 53 | grep example.com
When you press Ctrl-C, what happens? In a magical perfect world, what I would want to happen is for tcpdump to flush its buffer, grep would search for example.com, and I would see all the output I missed.
But in the real world, what happens is that all the programs get killed and the output in tcpdump’s buffer is lost.
I think this problem is probably unavoidable – I spent a little time with strace to see how this works, and grep receives the SIGINT before tcpdump anyway, so even if tcpdump tried to flush its buffer, grep would already be dead.
After a little more investigation, there is a workaround: if you find tcpdump’s PID and kill -TERM $PID, then tcpdump will flush the buffer so you can see the output. That’s kind of a pain but I tested it and it seems to work.
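I haven’t looked at how tcpdump implements this, but the general pattern (catch SIGTERM, flush, exit) is easy to sketch in Python. In this hypothetical example, a child process prints two lines into a pipe, where they sit in its buffer, and then sleeps. With a SIGTERM handler, the parent receives both lines. With the default SIGTERM behaviour, the process is just killed and the buffered lines are lost. POSIX only.

```python
import subprocess
import sys

# Child program: buffers two "matches" on a pipe, then idles like a long-running tcpdump.
CHILD = r'''
import signal, sys, time
if sys.argv[1] == "handle":
    def on_term(signum, frame):
        sys.stdout.flush()  # write out whatever is still buffered...
        sys.exit(0)         # ...then exit normally
    signal.signal(signal.SIGTERM, on_term)
print("match 1")
print("match 2")
sys.stderr.write("ready\n")
sys.stderr.flush()
time.sleep(30)
'''


def run_and_terminate(handle_term):
    """Start the child with stdout on a pipe, send SIGTERM once it is ready, return what reached the pipe."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, "handle" if handle_term else "default"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    proc.stderr.readline()  # wait until both lines are sitting in the child's buffer
    proc.terminate()  # sends SIGTERM on POSIX
    out, _ = proc.communicate(timeout=10)
    return out
```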
redirecting to a file also buffers
It’s not just pipes, this will also buffer:
sudo tcpdump -ni any port 53 > output.txt
Redirecting to a file doesn’t have the same “Ctrl-C will totally destroy the contents of the buffer” problem though – in my experience it usually behaves more like you’d want, where the contents of the buffer get written to the file before the program exits. I’m not 100% sure whether this is something you can always rely on or not.
a bunch of potential ways to avoid buffering
Okay, let’s talk solutions. Let’s say you’ve run this command:
tail -f /some/log/file | grep thing1 | grep thing2
I asked people on Mastodon how they would solve this in practice and there were 5 basic approaches. Here they are:
solution 1: run a program that finishes quickly
Historically my solution to this has been to just avoid the “command writing to pipe slowly” situation completely and instead run a program that will finish quickly like this:
cat /some/log/file | grep thing1 | grep thing2 | tail
This doesn’t do the same thing as the original command but it does mean that you get to avoid thinking about these weird buffering issues.
(you could also do grep thing1 /some/log/file but I often prefer to use an “unnecessary” cat)
solution 2: remember the “line buffer” flag to grep
You could remember that grep has a flag to avoid buffering and pass it like this:
tail -f /some/log/file | grep --line-buffered thing1 | grep thing2
solution 3: use awk
Some people said that if they’re specifically dealing with a multiple-greps situation, they’ll rewrite it to use a single awk instead, like this:
tail -f /some/log/file | awk '/thing1/ && /thing2/'
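Or you would write a more complicated grep, like this:
tail -f /some/log/file | grep -E 'thing1.*thing2'
(awk also buffers, so for this to work you’ll want awk to be the last command in the pipeline. Also note that the grep -E version only matches lines where thing1 comes before thing2, while the awk version matches them in either order.)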
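The awk version and the grep -E version aren’t quite the same filter. awk '/thing1/ && /thing2/' accepts the two words in any order, while thing1.*thing2 needs thing1 to come first. That’s easy to check with Python’s re module, assuming (as I believe holds for literal patterns like these) that Python regexes behave the same way as grep -E here. The function names are made up.

```python
import re


def awk_both(line, patterns=("thing1", "thing2")):
    """Like awk '/thing1/ && /thing2/': every pattern must appear, in any order."""
    return all(re.search(p, line) for p in patterns)


def grep_ordered(line):
    """Like grep -E 'thing1.*thing2': thing1 must appear before thing2 on the line."""
    return re.search(r"thing1.*thing2", line) is not None


for line in ["a thing1 b thing2", "thing2 then thing1", "only thing1"]:
    print(f"{line!r}: awk={awk_both(line)} grep={grep_ordered(line)}")
# 'a thing1 b thing2': awk=True grep=True
# 'thing2 then thing1': awk=True grep=False
# 'only thing1': awk=False grep=False
```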
solution 4: use stdbuf
stdbuf uses LD_PRELOAD to turn off libc’s buffering, and you can use it to turn off output buffering like this:
tail -f /some/log/file | stdbuf -o0 grep thing1 | grep thing2
Like any LD_PRELOAD solution it’s a bit unreliable: it doesn’t work on static binaries, I think it won’t work if the program isn’t using libc’s buffering, and it doesn’t always work on Mac OS. Harry Marr has a really nice How stdbuf works post.
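A related gotcha: as far as I know, CPython’s sys.stdout does its own buffering instead of going through libc’s stdio, so I wouldn’t expect stdbuf to affect a Python program. The Python-specific knob is the PYTHONUNBUFFERED environment variable. This sketch runs a Python child with its stdout on a pipe and reports sys.stdout.write_through, which should be True when PYTHONUNBUFFERED is set and False otherwise. child_write_through is a made-up name.

```python
import os
import subprocess
import sys

CHILD = "import sys; print(sys.stdout.write_through)"


def child_write_through(unbuffered):
    """Run a Python child with stdout on a pipe; report whether its stdout is write-through."""
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)  # start from a known state
    if unbuffered:
        env["PYTHONUNBUFFERED"] = "1"
    result = subprocess.run(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, env=env, check=True
    )
    return result.stdout.strip()
```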
solution 5: use unbuffer
unbuffer program will force the program’s output to be a TTY, which means that it’ll behave the way it normally would on a TTY (less buffering, colour output, etc). You could use it in this example like this:
tail -f /some/log/file | unbuffer grep thing1 | grep thing2
Unlike stdbuf it will always work, though it might have unwanted side effects: for example, grep thing1 will also colour its matches.
If you want to install unbuffer, it’s in the expect package.
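unbuffer comes from expect, but the core trick is to give the program a pseudo-terminal instead of a pipe, so that isatty returns true. That can be sketched with os.openpty. This is my own illustration, not how expect does it, and it’s Linux-oriented: reading from the pty master raises OSError (EIO) once the child has exited and closed its end, which the loop treats as the end of the output. run_with_stdout is a made-up name.

```python
import os
import subprocess
import sys

CHILD = "import sys; print(sys.stdout.isatty())"


def run_with_stdout(use_pty: bool) -> bytes:
    """Run CHILD with stdout on a pipe, or on a pseudo-terminal like unbuffer would."""
    if not use_pty:
        return subprocess.run(
            [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, check=True
        ).stdout
    master, slave = os.openpty()
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=slave)
    os.close(slave)  # the parent keeps only the master end
    chunks = []
    while True:
        try:
            chunk = os.read(master, 1024)
        except OSError:  # on Linux, EIO once the child side is closed
            break
        if not chunk:
            break
        chunks.append(chunk)
    proc.wait()
    os.close(master)
    return b"".join(chunks)
```

Over the pty, the child sees a terminal. The terminal driver may also translate its newline into \r\n, which is one of those side effects of pretending to be a TTY.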
that’s all the solutions I know about!
It’s a bit hard for me to say which one is “best”. I think personally I’m most likely to use unbuffer because I know it’s always going to work.
If I learn about more solutions I’ll try to add them to this post.
I’m not really sure how often this comes up
I think it’s not very common for me to have a program that slowly trickles data into a pipe like this. Normally if I’m using a pipe, a bunch of data gets written very quickly, processed by everything in the pipeline, and then everything exits. The only examples I can come up with right now are:
- tcpdump
- tail -f
- watching log files in a different way, like with kubectl logs
- the output of a slow computation
what if there were an environment variable to disable buffering?
I think it would be cool if there were a standard environment variable to turn off buffering, like PYTHONUNBUFFERED in Python. I got this idea from a couple of blog posts by Mark Dominus in 2018. Maybe NO_BUFFER like NO_COLOR?
The design seems tricky to get right; Mark points out that NetBSD has environment variables called STDBUF, STDBUF1, etc., which give you a ton of control over buffering, but I imagine most developers don’t want to implement many different environment variables to handle a relatively minor edge case.
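If someone did define a single NO_BUFFER variable, the check a program would need is tiny. Here’s a sketch of what that might look like in Python. NO_BUFFER is hypothetical (it isn’t a standard), and I’ve borrowed NO_COLOR’s rule that the variable only counts when it’s present and not an empty string.

```python
from typing import Mapping


def use_line_buffering(stdout_is_tty: bool, env: Mapping[str, str]) -> bool:
    """Hypothetical NO_BUFFER convention, modelled on NO_COLOR:
    a present, non-empty NO_BUFFER forces line buffering; otherwise use the isatty() rule."""
    if env.get("NO_BUFFER", ""):
        return True
    return stdout_is_tty
```

A program would call this once at startup, for example with os.isatty(1) and os.environ, and configure its output stream accordingly.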
I’m also curious about whether there are any programs that just automatically flush their output buffers after some period of time (like 1 second). It feels like it would be nice in theory but I can’t think of any program that does that so I imagine there are some downsides.
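Here’s what the “flush every so often” idea might look like in Python. PeriodicFlusher is a hypothetical helper, not something in the standard library. A background thread flushes a buffered stream every interval seconds, so a slow trickle of output shows up after at most about one interval. It relies on Python’s buffered binary streams being documented as safe to use from multiple threads. The demo uses select on a pipe, so it’s Unix only.

```python
import os
import select
import threading


class PeriodicFlusher:
    """Hypothetical helper: flush a buffered binary stream at least every `interval` seconds."""

    def __init__(self, stream, interval=1.0):
        self.stream = stream
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Event.wait returns False when it times out and True once stop() sets the event.
        while not self._stop.wait(self.interval):
            self.stream.flush()  # buffered binary streams lock internally, so this can run alongside write()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self.stream.flush()


def trickle_demo(interval=0.05, timeout=2.0):
    """Write one short line to a block-buffered pipe; see whether it arrives with no explicit flush."""
    r, w = os.pipe()
    out = os.fdopen(w, "wb", buffering=8192)
    flusher = PeriodicFlusher(out, interval)
    out.write(b"slow line\n")  # far below 8 KiB, so normally it would sit in the buffer
    ready, _, _ = select.select([r], [], [], timeout)  # wait up to `timeout` seconds for data
    data = os.read(r, 1024) if ready else b""
    flusher.stop()
    out.close()
    os.close(r)
    return data
```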
stuff I left out
Some things I didn’t talk about in this post since these posts have been getting pretty long recently and seriously does anyone REALLY want to read 3000 words about buffering?
- the difference between line buffering and having totally unbuffered output
- how buffering to stderr is different from buffering to stdout
- this post is only about buffering that happens inside the program, your operating system’s TTY driver also does a little bit of buffering sometimes
- other reasons you might need to flush your output other than “you’re writing to a pipe”
Trains are offices
Taking the train for work travel can cost more time than going by car or plane. But it's one of the most energy-efficient ways to travel, and I get this weird productivity boost from trains.
Note that you can absolutely also chill out, read, sleep or listen to music on trains. I like that too, sometimes. But this post is about when I use train time to work. In Europe.
Enjoy the benefits
As mentioned in the intro, a major reason for my traveling by train is the low environmental impact (relatively speaking, and apart from not traveling at all). I travel a lot for work, so my impact is relatively high, especially when I don't manage to avoid planes. Despite doubts about the effectiveness of compensation, I do compensate in various ways, but it still doesn't feel great; avoiding is ideal.
Trains also offer a productivity boost. If you're forced to be in a seat, on a connection too unstable to take work calls, this presents an opportunity. An opportunity to get things done (no I am not for hire as a productivity coach).
I don't know what it is about trains, but I really find the time flies when I have something specific to write, code, or design. Trains have proven themselves as great offices to me. I don't know if I can convince my bookkeeper that train tickets are in fact office rental costs, but do you see how they can feel like that?
There are some more benefits to trains:
- good views. Depending on your route and time of travel, there's always plenty to see outside. Sunrise and sunset are particularly nice.
- arrival in central locations. European train stations are usually close to where the fun happens.
- fewer checks. Within Schengen, international trains usually don't have border checks or luggage checks, so there's a lot less hassle and queuing: you can show up and go.
Embrace the caveats
Things can go wrong, too. Like with any kind of travel.
Delays
If your international train has stopovers, delays can be a headache, especially between train companies. Plan for stopover time and consider flexible tickets for onward journeys. And stay calm: it happens. Also, in Europe, you have lots of rights regarding train delays, and you'll often get part of your fare back when delays happen.
Data
Large parts of Europe don't have stable mobile data, especially outside cities. Or, depending on your plan, you may not get a great connection while roaming. Have some of your work available offline so that a poor connection doesn't interrupt you. Know which web apps to trust (the progressively enhanced parts of GitHub are the best).
Power
Not all power outlets will work, so I follow the ‘ABC’ rule: Always be charging when you have the opportunity, so that batteries are full and ready to go when you don't.
In conclusion
Trains are nice, and their benefits outweigh the caveats, especially if you anticipate those in advance. I'm curious if people have found other benefits or caveats regarding taking the train for work. What's exciting you or holding you back? Feel free to reply by email or via Mastodon or Bluesky.
Originally posted as Trains are offices on Hidde's blog.