Reading List
The most recent articles from a list of feeds I subscribe to.
Five things about public speaking
CVE-2023-36325: Attackers can de-anonymize i2p hidden services with a message replay attack
tl;dr: If you host eepsites with Java i2p and are running a version older than i2p 2.3.0, update as soon as possible. More details below.

A sufficiently determined attacker may be able to de-anonymize the public IPv4 and IPv6 addresses of i2p hidden services (eepsites) by combining a brute-force sweep of the entire i2p router set with a replayed message. This is CVE-2023-36325.
This issue was originally discovered by a user with the identifier hbapm6le75xwc342hnkltwfnnmt4ccafr5wyf7b6jhw6jxn3fwqa.b32.i2p, whom I will refer to as "hbapm6". While hbapm6 was working on a custom version of i2p, they found that replaying messages sent down client tunnels to target i2p routers could cause the i2p software to drop the packet instead of sending a "wrong destination" response. An attacker can then de-anonymize a given eepsite by correlating the public IPv4 or IPv6 address of the contacted router with the dropped packets.
This is fixed in i2p 2.3.0 by adding a unique identifier to every message ID and separating out bloom filters and other datastores so that such correlation attacks are harder to pull off in the future. These changes are protocol-compatible and all users are encouraged to apply them as soon as possible.
There is insufficient data on which versions of i2p are vulnerable, but we are certain that 2.2.1 is. Older versions of i2p are likely also vulnerable; assume they are.
This attack takes days to complete and requires fairly detailed knowledge of the i2p protocol to successfully de-anonymize target eepsites.
Users of i2pd are not affected.
With this understood, here is the CVSS score breakdown for this attack:
| Overall CVSS Score | 3.4 |
|---|---|
| CVSS Base Score | 5.3 |
| Impact Subscore | 1.4 |
| Exploitability Subscore | 3.9 |
| CVSS Temporal Score | 4.8 |
| CVSS Environmental Score | 3.4 |
| Modified Impact Subscore | 1.4 |
AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N/E:P/RL:O/RC:C/CR:M/IR:X/AR:X/MAV:N/MAC:H/MPR:N/MUI:N/MS:U/MC:L/MI:N/MA:N
Affected users should update to i2p 2.3.0 as soon as it is available.
The vulnerability has been mitigated by a refactor of the relevant codepaths involved with message parsing. Additionally, the network information database was sharded off with the hope of preventing future attacks.
On a side note, I have been very impressed with the i2p project's handling of the circumstances surrounding hbapm6 and the issues tracked as CVE-2023-36325. For an unknown reason, hbapm6 decided that the best way to get attention for these issues was to impersonate me. The i2p project contacted me because hbapm6 was acting very strangely (i.e. claiming to have a vuln while refusing to show proof of it or how they triggered it; if you have a de-anonymization attack for such a network, just share your code and demonstrate it when asked, it will save so much time for everyone involved). After a month or two of cajoling, hbapm6 eventually managed to de-anonymize a throwaway VPS that was acting as an i2p router. This confirmed the vuln and led to me filing this CVE.
I guess this is part of my threat profile now. Fun.
At the very least I got to have a conversation that was like (names have been changed to protect the innocent):
(hbapm6) Why all the snooping? [...] What is this, a game of Among Us?
(Me) <link to my website to an ascii art of an amogus with proof that I am the actual Xe Iaso>
I still have no idea why that person impersonated me. If you're out there and reading this and I wronged you somehow, I'm sorry and would like to know what I fucked up so I can change for the better.
There are some other related vulnerabilities, but none of them have viable attacks. Most of the changes being done are various hardening passes on the pokey edges of the network database and other things. I expect these are fairly minor issues, and when the patch comes out you should probably update.
Why is DNS still hard to learn?
I write a lot about technologies that I found hard to learn about. A while back my friend Sumana asked me an interesting question – why are these things so hard to learn about? Why do they seem so mysterious?
For example, take DNS. We’ve been using DNS since the 80s (for more than 35 years!). It’s used in every website on the internet. And it’s pretty stable – in a lot of ways, it works the exact same way it did 30 years ago.
But it took me YEARS to figure out how to confidently debug DNS issues, and I’ve seen a lot of other programmers struggle with debugging DNS problems as well. So what’s going on?
Here are a couple of thoughts about why learning to troubleshoot DNS problems is hard.
(I’m not going to explain DNS very much in this post, see Implement DNS in a Weekend or my DNS blog posts for more about how DNS works)
it’s not because DNS is super hard
When I finally learned how to troubleshoot DNS problems, my reaction was “what, that was it???? that’s not that hard!“. I felt a little bit cheated! I could explain to you everything that I found confusing about DNS in a few hours.
So – if DNS is not all that complicated, why did it take me so many years to
figure out how to troubleshoot pretty basic DNS issues (like “my domain doesn’t
resolve even though I’ve set it up correctly” or “dig and my browser have
different DNS results, why?“)?
And I wasn’t alone in finding DNS hard to learn! I’ve talked to a lot of smart friends who are very experienced programmers about DNS over the years, and many of them either:
- didn’t feel comfortable making simple DNS changes to their websites
- or were confused about basic facts about how DNS works (like that records are pulled and not pushed)
- or did understand DNS basics pretty well, but had some of the same knowledge gaps that I’d struggled with (negative caching and the details of how dig and your browser do DNS queries differently)
So if we’re all struggling with the same things about DNS, what’s going on? Why is it so hard to learn for so many people?
Here are some ideas.
a lot of the system is hidden
When you make a DNS request on your computer, the basic story is:
- your computer makes a request to a server called a resolver
- the resolver checks its cache, and makes requests to some other servers called authoritative nameservers
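That two-step story starts with a surprisingly small packet. Here’s a toy Python sketch (my own illustration, not code from any real resolver) of the query a stub resolver sends, including the RD (“recursion desired”) bit that dig’s +norecurse option clears:

```python
import struct

def build_query(domain: str, recursion_desired: bool = True) -> bytes:
    """Build a minimal DNS query packet: a 12-byte header plus one question."""
    flags = 0x0100 if recursion_desired else 0x0000  # RD bit (what +norecurse toggles)
    header = struct.pack("!HHHHHH", 0x1234, flags, 1, 0, 0, 0)  # ID, flags, QDCOUNT=1
    # Question section: length-prefixed labels, then QTYPE=A (1), QCLASS=IN (1)
    labels = b"".join(bytes([len(part)]) + part.encode() for part in domain.split("."))
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)

query = build_query("google.com")  # 28 bytes total for this name
```

Everything after this packet leaves your machine — the cache lookups and the authoritative queries all happen on the resolver’s side, out of sight.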
Here are some things you don’t see:
- the resolver’s cache. What’s in there?
- which library code on your computer is making the DNS request (is it libc getaddrinfo? if so, is it the getaddrinfo from glibc, or musl, or apple? is it your browser’s DNS code? is it a different custom DNS implementation?). All of these options behave slightly differently and have different configuration, approaches to caching, available features, etc. For example, musl DNS didn’t support TCP until early 2023.
- the conversation between the resolver and the authoritative nameservers. I think a lot of DNS issues would be SO simple to understand if you could magically get a trace of exactly which authoritative nameservers were queried downstream during your request, and what they said. (like, what if you could run dig +debug google.com and it gave you a bunch of extra debugging information?)
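One way to peek at the first hidden layer is to call getaddrinfo yourself. A small sketch — Python’s socket module wraps the system’s getaddrinfo, so whatever glibc/musl/apple behavior your machine has applies here too (resolving localhost, which normally comes from /etc/hosts, so no network is needed):

```python
import socket

# Ask the system's resolver library (libc getaddrinfo) directly.
results = socket.getaddrinfo("localhost", None, proto=socket.IPPROTO_TCP)
# Each result is (family, type, proto, canonname, sockaddr); the
# address is the first element of sockaddr.
addresses = sorted({res[4][0] for res in results})
print(addresses)
```

Whether you get just 127.0.0.1, just ::1, or both depends on your hosts file and your library’s IPv6 handling — which is exactly the kind of per-library difference this section is about.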
dealing with hidden systems
A couple of ideas for how to deal with hidden systems
- just teaching people what the hidden systems are makes a huge difference. For a long time I had no idea that my computer had many different DNS libraries that were used in different situations and I was confused about this for literally years. This is a big part of my approach.
- with Mess With DNS we tried out this “fishbowl” approach where it shows you some parts of the system (the conversation with the resolver and the authoritative nameserver) that are normally hidden
- I feel like it would be extremely cool to extend DNS to include a “debugging information” section. (edit: it looks like this already exists! It’s called Extended DNS Errors, or EDE, and tools are slowly adding support for it.)
Extended DNS Errors seem cool
Extended DNS Errors are a new way for DNS servers to provide extra debugging information in DNS response. Here’s an example of what that looks like:
$ dig @8.8.8.8 xjwudh.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 39830
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
; EDE: 12 (NSEC Missing): (Invalid denial of existence of xjwudh.com/a)
;; QUESTION SECTION:
;xjwudh.com. IN A
;; AUTHORITY SECTION:
com. 900 IN SOA a.gtld-servers.net. nstld.verisign-grs.com. 1690634120 1800 900 604800 86400
;; Query time: 92 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Sat Jul 29 08:35:45 EDT 2023
;; MSG SIZE rcvd: 161
Here I’ve requested a nonexistent domain, and I got the extended error EDE:
12 (NSEC Missing): (Invalid denial of existence of xjwudh.com/a). I’m not
sure what that means (it’s some DNSSEC thing), but it’s cool to see an extra
debug message like that.
I did have to install a newer version of dig to get the above to work.
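If you’re curious what’s inside that EDE field: per RFC 8914 it’s just a 2-byte info-code plus optional UTF-8 text. Here’s a toy parser for the option body (my own sketch, not code from dig or any real tool):

```python
import struct

def parse_ede(option_data: bytes) -> tuple[int, str]:
    """Parse an Extended DNS Error (RFC 8914) option body:
    a 2-byte INFO-CODE followed by optional UTF-8 EXTRA-TEXT."""
    (info_code,) = struct.unpack("!H", option_data[:2])
    extra_text = option_data[2:].decode("utf-8", errors="replace")
    return info_code, extra_text

# Info-code 12 is "NSEC Missing", matching the dig output above.
code, text = parse_ede(struct.pack("!H", 12) + b"Invalid denial of existence")
```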
confusing tools
Even though a lot of DNS stuff is hidden, there are a lot of ways to figure out
what’s going on by using dig.
For example, you can use dig +norecurse to figure out if a given DNS resolver
has a particular record in its cache. 8.8.8.8 seems to return a SERVFAIL
response if the response isn’t cached.
here’s what that looks like for google.com
$ dig +norecurse @8.8.8.8 google.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11653
;; flags: qr ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;google.com. IN A
;; ANSWER SECTION:
google.com. 21 IN A 172.217.4.206
;; Query time: 57 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Jul 28 10:50:45 EDT 2023
;; MSG SIZE rcvd: 55
and for homestarrunner.com:
$ dig +norecurse @8.8.8.8 homestarrunner.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 55777
;; flags: qr ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;homestarrunner.com. IN A
;; Query time: 52 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Fri Jul 28 10:51:01 EDT 2023
;; MSG SIZE rcvd: 47
Here you can see we got a normal NOERROR response for google.com (which is
in 8.8.8.8’s cache) but a SERVFAIL for homestarrunner.com (which isn’t).
This doesn’t mean there’s no DNS record for homestarrunner.com (there is!), it’s just not cached.
But this output is really confusing to read if you’re not used to it! Here are a few things that I think are weird about it:
- the headings are weird (there’s ->>HEADER<<-, flags:, OPT PSEUDOSECTION:, QUESTION SECTION:, ANSWER SECTION:)
- the spacing is weird (why is there no newline between OPT PSEUDOSECTION and QUESTION SECTION?)
- MSG SIZE rcvd: 47 is weird (are there other fields in MSG SIZE other than rcvd? what are they?)
- it says that there’s 1 record in the ADDITIONAL section but doesn’t show it, you have to somehow magically know that the “OPT PSEUDOSECTION” record is actually in the additional section
In general dig’s output has the feeling of a script someone wrote in an ad-hoc
way that grew organically over time and not something that was intentionally
designed.
dealing with confusing tools
some ideas for improving on confusing tools:
- explain the output. For example I wrote how to use dig explaining how dig’s output works and how to configure it to give you a shorter output by default
- make new, more friendly tools. For example for DNS there’s dog and doggo and my dns lookup tool. I think these are really cool, but personally I don’t use them because sometimes I want to do something a little more advanced (like using +norecurse), and as far as I can tell neither dog nor doggo supports +norecurse. I’d rather use 1 tool for everything, so I stick to dig. Replacing the breadth of functionality of dig is a huge undertaking.
- make dig’s output a little more friendly. If I were better at C programming, I might try to write a dig pull request that adds a +human flag that formats the long form output in a more structured and readable way, maybe something like this:
$ dig +human +norecurse @8.8.8.8 google.com
HEADER:
opcode: QUERY
status: NOERROR
id: 11653
flags: qr ra
records: QUESTION: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
QUESTION SECTION:
google.com. IN A
ANSWER SECTION:
google.com. 21 IN A 172.217.4.206
ADDITIONAL SECTION:
EDNS: version: 0, flags:; udp: 512
EXTRA INFO:
Time: Fri Jul 28 10:51:01 EDT 2023
Elapsed: 52 msec
Server: 8.8.8.8:53
Protocol: UDP
Response size: 47 bytes
This makes the structure of the DNS response more clear – there’s the header, the question, the answer, and the additional section.
And it’s not “dumbed down” or anything! It’s the exact same information, just formatted in a more structured way. My biggest frustration with alternative DNS tools is that they often remove information in the name of clarity. And though there’s definitely a place for those tools, I want to see all the information! I just want it to be presented clearly.
We’ve learned a lot about how to design more user friendly command line tools in the last 40 years and I think it would be cool to apply some of that knowledge to some of our older crustier tools.
dig +yaml
One quick note on dig: newer versions of dig do have a +yaml output format which feels a little clearer to me, though it’s too verbose for my taste (a pretty simple DNS response doesn’t fit on my screen).
weird gotchas
DNS has some weird stuff that’s relatively common to run into, but pretty hard to learn about if nobody tells you what’s going on. A few examples (there are more in some ways DNS can break):
- negative caching! (which I talk about in this talk) It took me probably 5 years to realize that I shouldn’t visit a domain that doesn’t have a DNS record yet, because then the nonexistence of that record will be cached, and it gets cached for HOURS, and it’s really annoying.
- differences in getaddrinfo implementations: until early 2023, musl didn’t support TCP DNS
- resolvers that ignore TTLs: if you set a TTL on your DNS records (like “5 minutes”), some resolvers will ignore those TTLs completely and cache the records for longer, like maybe 24 hours instead
- if you configure nginx wrong (like this), it’ll cache DNS records forever.
- how ndots can make your Kubernetes DNS slow
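To make the negative caching gotcha concrete, here’s a toy cache (entirely made up for illustration, not any real resolver’s code) showing how an NXDOMAIN answer sticks around:

```python
class NegativeCache:
    """Toy negative cache: remembers that a name did NOT exist,
    for up to the configured negative TTL (often the SOA minimum)."""

    def __init__(self, negative_ttl: int = 3600):
        self.negative_ttl = negative_ttl
        self._nxdomain: dict[str, float] = {}  # name -> expiry timestamp

    def record_nxdomain(self, name: str, now: float) -> None:
        # Cache the *absence* of the record until now + negative TTL.
        self._nxdomain[name] = now + self.negative_ttl

    def is_cached_nxdomain(self, name: str, now: float) -> bool:
        expiry = self._nxdomain.get(name)
        return expiry is not None and now < expiry

cache = NegativeCache(negative_ttl=3600)
# Visiting the domain before its record exists poisons the cache for an hour:
cache.record_nxdomain("example.com", now=0)
```

Until that entry expires, the resolver keeps answering “that name doesn’t exist” — even after you’ve created the record at the authoritative nameserver.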
dealing with weird gotchas
I don’t have answers here as good as I would like, but knowledge about weird gotchas is extremely hard won (again, it took me years to figure out negative caching!) and it feels very silly to me that people have to rediscover them for themselves over and over and over again.
A few ideas:
- It’s incredibly helpful when people call out gotchas when explaining a topic. For example (leaving DNS for a moment), Josh Comeau’s Flexbox intro explains this minimum size gotcha which I ran into SO MANY times for several years before finally finding an explanation of what was going on.
- I’d love to see more community collections of common gotchas. For bash, shellcheck is an incredible collection of bash gotchas.
One tricky thing about documenting DNS gotchas is that different people are going to run into different gotchas – if you’re just configuring DNS for your personal domain once every 3 years, you’re probably going to run into different gotchas than someone who administrates DNS for a domain with heavy traffic.
A couple of more quick reasons:
infrequent exposure
A lot of people only deal with DNS extremely infrequently. And of course if you only touch DNS every 3 years it’s going to be harder to learn!
I think cheat sheets (like “here are the steps to changing your nameservers”) can really help with this.
it’s hard to experiment with
DNS can be scary to experiment with – you don’t want to mess up your domain. We built Mess With DNS to make this one a little easier.
that’s all for now
I’d love to hear other thoughts about what makes DNS (or your favourite mysterious technology) hard to learn.
“AI” content and user centered design
Large language models (LLMs), like ChatGPT and Bard, can be used to generate sentences based on statistical likeliness. While the results of these tools can look very impressive (they're designed to), I can't think of cases where the use of LLM-generated content actually improves an end user's experience. Much of the time, LLM output is nonsensical, false, unclear and boring. Hence, when organisations force LLM output on users instead of paying people to create their content, they don't center users.
User centered design means we make the user our main concern when we design. When I recently told a friend about this concept, explaining my new job is at a government department focused on centering users, they laughed in surprise. “This is a thing?”, they asked. “What else would you make the main concern when you design?” It made little sense to them that users had to be specifically centered.
If you work in tech, you have probably seen projects center things other than users. Business needs, the profit margin, search engines, that one designer's personal preference, the desire to look as cool as a tech brand you love… and so on. Sadly, projects center them instead of users all the time. Most arguments I heard for using LLMs in the content production process quoted at least one of these non-user-centric reasons.
Organisations are starting to use or at least experiment with LLMs to create content for web projects. The hype is real and I worry that, by increasing nonsense, falsehoods and boredom, LLM-generated content is going to worsen user experiences across the board. Why force this content on users? And what about the impact of LLM-generated content beyond individual websites and user experiences: it's also going to pollute the web as a whole and make search worse (as well as itself).
None of this is new, we've had robot-like interactions way before LLMs. When the tax office sends a letter that means you need to pay or receive money, that information is often buried in civil servant speak. When Silicon Valley startup founders announce they were bought, they will mention their “incredible journey”. When lawyers describe employment, or when customer service phone lines pronounce “your call is important to us” (a great read, BTW)… this is all to say that, even without LLMs, we're used to people who sound more robotic and less human. They speak a lingo.
Lingo gets in the way of clarity. Not just because it feels impersonal and boring, it is also made-up, however brilliantly our prompts are ‘engineered’. Yes, even if it's sourced—or stolen, in many cases—from original content. That makes it like the lingo humans produce, but much worse. Sure, LLM-generated content could give users clarity, except in a way that's only helpful if the user already knows a lot about the thing that is clarified (so that they can spot falsehoods). This is the crux and why the practical applicability of LLMs isn't nearly as wide as their makers claim.
I can see how a doctor's practice / government department / bank / school could save money and time by putting a chatbot between themselves and the people. There are benefits to one-click-content-creation for organisations. But I don't see how end users could benefit, at all. Who would prefer reading convincing-but-potentially-false chatbot advice to a conversation with their doctor (or force the bot on others)? Zooming out from specific use cases to the wider ecosystem… aren't even those who shrug at ideals like centering humans worried that LLM-generated content wipes out the very “value” capitalists want to extract from the web (by enshittification)? I certainly hope so.
Addendum: I didn't know when writing this post that OpenAI's CEO Sam Altman literally wrote he looked forward to “AI medical advisors for people who can't afford care”. From his thread on 19 February 2023:
the adaptation to a world deeply integrated with AI tools is probably going to happen pretty quickly; the benefits (and fun!) have too much upside.
these tools will help us be more productive (can't wait to spend less time doing email!), healthier (AI medical advisors for people who can’t afford care), smarter (students using ChatGPT to learn), and more entertained (AI memes lolol).
(…)
we think showing these tools to the world early, while still somewhat broken, is critical if we are going to have sufficient input and repeated efforts to get it right. the level of individual empowerment coming is wonderful, but not without serious challenges.
He talks about “individual empowerment [that] is wonderful”; I think it's incredibly dystopian.
Originally posted as “AI” content and user centered design on Hidde's blog.
Aaron Francis at Laracon US 2023: Publishing Your Work
I've attended this talk twice this year and it hit just as hard the second time. Happy to see it available online. Start watching already!