Reading List
This isn't the way to speed up Rust compile times from Christine Dodrill's Blog RSS feed.
This isn't the way to speed up Rust compile times
Recently serde, one of the most popular Rust libraries made a decision that supposedly sped up compile times by using a precompiled version of a procedural macro instead of compiling it on the fly. Like any technical decision, there are tradeoffs and advantages to everything. I don't think the inherent ecosystem risks in slinging around precompiled binaries are worth the build speed advantages, and in this article I'm going to cover all of the moving parts for this space.
serde
serde is one of the biggest libraries in the Rust ecosystem. It provides the tooling for serializing and deserializing (ser/de) arbitrary data structures into arbitrary formats. The main difference between serde and other approaches is that serde doesn't prefer an individual encoding format. Compare this struct in Rust vs the equivalent struct in Go:
#[derive(Debug, Deserialize, Eq, PartialEq, Clone, Serialize)]
pub struct WebMention {
pub source: String,
pub title: Option<String>,
}
type WebMention struct {
Source string `json:"source"`
Title *string `json:"title"`
}
Besides syntax, the main difference is in how the serialization/deserialization works. In Go the encoding/json package uses runtime reflection to parse the structure metadata. This does work, but it's expensive compared to having all that information already there.
The way serde works is by having an implementation of Deserialize or Serialize on the data types you want to encode or decode. This effectively pushes all of the data that is normally inspected at runtime with reflection into compile-time data. In the process, this makes the code run a little bit faster and more deterministically, but at the cost of adding some time at compile time to determine that reflection data up front.
I think this is a fair tradeoff due to the fundamental improvements in developer experience. In Go, you have to declare the encoding/decoding rules for every codec individually. This can lead to stuctures that look like this:
type WebMention struct {
Source string `json:"source" yaml:"source" toml:"source"`
Title *string `json:"title" yaml:"title" toml:"source"`
}
toml:"source"
defined on the Title
field, didn't you
mean to say toml:"title"
?type WebMention struct {
Source string `json:"source" yaml:"source" toml:"source"`
Title *string `json:"title" yaml:"title" toml:"title"`
}
This becomes unwieldy and can make your code harder to read. Some
codecs get around this by reading and using the same tag rules that
encoding/json does, but the Rust equivalent works for any codec that
can be serialized into or deserialized from. That same WebMention
struct works with JSON, YAML, TOML, msgpack,
or anything else you can imagine. serde is one of the most used
packages for a reason: it's so convenient and widespread that it's
widely seen as being effectively in the standard library.
If you need to add additional behavior such as parsing a string to markdown, you can do that with your own implementation of the Deserialize trait. I do this with the VODs pages in order to define my stream VOD information in configuration. The markdown inside strings compiles to the HTML you see on the VOD page, including the embedded video on XeDN. This is incredibly valuable to me and something I really want to keep doing until I figure out how to switch my site to using something like contentlayer and MDX.
The downsides
It's not all sunshine, puppies and roses though. The main downside to the serde approach is the fact that it relies on a procedural macro. Procedural macros are effectively lisp-style "syntax hygenic" macros. Effectively you can view them as a function that takes in some syntax, does stuff to it, and then returns the result to be compiled in the program.
This is how it can derive the serialization/deserialization code, it takes the tokens that make up the struct type, walks through the fields, and inserts the correct serialization or deserialization code so that you can construct values correctly. If it doesn't know how to deal with a given type, it will blow up at compile-time, meaning that you may need to resort to increasingly annoying hacks to get things working.
When you write your own procedural macro, you create a separate crate for this. This separate crate is compiled against a special set of libraries that allow it to take tokens from the Rust compiler and emit tokens back to the rust compiler. These compiled proc macros are run as dynamic libraries inside invocations of the Rust compiler. This means that proc macros can do anything as the permissions of the Rust compiler, including crashing the compiler, stealing your SSH key and uploading it to a remote server, running arbitrary commands with sudo power, and much more.
A victim of success
Procedural macros are not free. They take nonzero amounts of time to run because they are effectively extending the compiler with arbitrary extra behavior at runtime. This gives you a lot of power to do things like what serde does, but as more of the ecosystem uses it more and more, it starts taking nontrivial amounts of time for the macros to run. This causes more and more of your build time being spent waiting around for a proc macro to finish crunching things, and if the proc macro isn't written cleverly enough it will potentially waste time doing the same behavior over and over again.
This can slow down build times, which make people investigate the problem and (rightly) blame serde for making their builds slow. Amusingly enough, serde is used by the Rust compiler rustc and package manager cargo. This means that the extra time compiling proc macros are biting literally everyone, including the Rust team.
hyperfine --prepare "cargo clean" "cargo
build --release"
Here's the results on our new MacBook Pro
M2 Max:$ hyperfine --prepare "cargo clean" "cargo build --release"
Benchmark 1: cargo build --release
Time (mean ± σ): 41.872 s ± 0.295 s [User: 352.774 s, System: 22.339 s]
Range (min … max): 41.389 s … 42.169 s 10 runs
In comparison, the homelab shellbox machine that
production builds are made on scores this much:hyperfine --prepare "cargo clean" "cargo build --release"
Benchmark 1: cargo build --release
Time (mean ± σ): 103.852 s ± 0.654 s [User: 1058.321 s, System: 42.296 s]
Range (min … max): 102.272 s … 104.843 s 10
runs
Procedural macros are plenty fast, it's always a
tradeoff because they always could be faster. For additional timing
information about xesite builds, look at the timing
report.The change
In essence, the change makes serde's derive macro use a precompiled binary instead of compiling a new procedural macro binary every time you build the serde_derive dependency. This removes the need for that macro to be compiled from source, which can speed up build times across the entire ecosystem in a few cases.
However, this means that the most commonly used crate is shipping an arbitrary binary for production builds without any way to opt-out. This could allow a sufficiently determined attacker to use the serde_derive library as a way to get code execution on every CI instance where Rust is used at the same time.
Combine that with the fact that the Rust ecosystem doesn't currently have a solid story around cryptographic signatures for crates and you get a pretty terrible situation all around.
But this does speed things up for everyone...at the cost of using serde as a weapon to force ecosystem change.
In my testing the binary they ship is a statically linked Linux binary:
$ file ./serde_derive-x86_64-unknown-linux-gnu
./serde_derive-x86_64-unknown-linux-gnu: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), static-pie linked, BuildID[sha1]=b8794565e3bf04d9d58ee87843e47b039595c1ff, stripped
$ ldd ./serde_derive-x86_64-unknown-linux-gnu
statically linked
ldd
on untrusted
executables. ldd
works by setting the environment variable LD_TRACE_LOADED_OBJECTS=1
and then executing the command. This causes your system's C dynamic
linker/loader to print all of the dependencies, however malicious
applications can and will still execute their malicious code even when
that environment variable is set. I've seen evidence of applications
exhibiting different malicious behavior when that variable is set.
Stay safe and use virtual machines when dealing with unknown
code.Out of date "file not found" error with a friend using cargo2nix
Frustratingly, a friend of mine that uses cargo2nix is reporting getting a "file not found" error when trying to build programs depending on serde. This is esepecially confusing given that the binary is a statically linked binary, but I guess we'll figure out what's going on in the future.
There's also additional concerns around the binary in question not being totally reproducible, which is slightly concerning from a security standpoint. If we're going to be trusting some random guy's binaries, I think we are in the right to demand that it is byte-for-byte reproducible on commodity hardware without having to reverse-engineer the build process and figure out which nightly version of the compiler is being used to compile this binary blob that will be run everywhere.
I also can't imagine that distribution maintainers are happy with this now that Rust is basically required to be in distribution package managers. It's unfortunate to see crates.io turn from a source code package manager to a binary package manager like this.
This doesn't even make build times faster
The most frustrating part about this whole affair is that while I was writing the majority of this article, I assumed that it actually sped up compliation. Guess what: it only speeds up compilation when you are doing a brand new build without an existing build cache. In many cases this means that you only gain the increased build speed in very limited cases: when you are doing a brand new clean build or when you update serde_derive.
This would be much more worth the tradeoff if it actually gave a significant compile speed tradeoff, but in order for this to make sense you'd need to be building many copies of serde_derive in your CI builds constantly. Or you'd need to have every procedural macro in the ecosystem also follow this approach. Even then, you'd probably only save about 20-30 seconds in cold builds on extreme cases. I really don't think it's worth it.
The middle path
Everything sucks here. This is a Kobayashi Maru situation. In order to really obviate the need for these precompiled binary blobs being used to sidestep compile time you'd need a complete redesign of the procedural macro system.
One of the huge advantages of the proc macro system as it currently
exists is that you can easily use any Rust library you want at compile
time. This makes doing things like generating C library bindings on
the fly using bindgen
trivial.
Maybe there could be a lot of speed to be gained with aggressive caching of derived compiler code. I think that could solve a lot of the issues at the cost of extra disk space being used. Disk space is plenty cheap though, definitely cheaper than developer time. The really cool advantage of making it at the derive macro level is that it would also apply for traits like Debug and Clone that are commonly derived anyways.
I have no idea what the complexities and caveates of doing this would be, but it could also be interesting to have the crate publishing step do aggressive borrow checking logic for every supported platform but then disable the borrow checker on crates downloaded from crates.io. The borrow checker contributes a lot of time to the compilation process, and if you gate acceptance to crates.io on the borrow checker passing then you can get away without needing to run the extra borrow checker logic when compiling dependencies.
Tangent about using WebAssembly
WASM for procedural macros?
I'm also pretty sure that there is an easier argument to be made for shipping easily replicatable WASM blobs like Zig does instead of shipping around machine code like serde does.
One of the core issues with procedural macros is that they run unsandboxed machine code. Sandboxing programs is basically impossible to do cross-platform without a bunch of ugly hacks at every level.
I guess you'd need to totally rewrite the proc macro system to use WebAssembly instead of native machine code. Doing this with WebAssembly would let the Rust compiler control the runtime environment that applications would run under. This would let packages do things like:
- Declare what permissions it needs and have permissions changes on updates to the macros cause users to have to confirm them
- Declare "cache storage" so that things like derive macro implementations could avoid needing to recompute code that has already passed muster.
- Let people ship precompiled binaries without having to worry as much about supporting every platform under the sun. The same binary would run perfectly on every platform.
- More easily prove reproducibility of the proc macro binaries, especially if the binaries were built on the crates.io registry server somehow.
- Individually allow/deny execution of commands so that common
behaviors like
bindgen
,pkg-config
, and compiling embedded C source code continue working.
This would require a lot of work and would probably break a lot of existing proc macro behavior unless care was taken to make things as compatible. One of the main pain points would be dealing with C dependencies as it is nearly impossible* to deterministically prove where the dependencies in question are located without running a bunch of shell script and C code.
One of the biggest headaches would be making a WebAssembly JIT/VM that would work well enough across platforms that the security benefits would make up for the slight loss in execution speed. This is annoyingly hard to sell given that the current state of the world is suffering from long compilation times. It also doesn't help that WebAssembly is still very relatively new so there's not yet the level of maturity needed to make things stable. There is a POSIX-like layer for WebAssembly programs called WASI that does bridge a lot of the gap, but it misses a lot of other things that would be needed for full compatibility including network socket and subprocess execution support.
This entire situation sucks. I really wish things were better. Hopefully the fixes in serde-rs/serde#2580 will be adopted and make this entire thing a non-issue. I understand why the serde team is making the decisions they are, but I just keep thinking that this isn't the way to speed up Rust compile times. There has to be other options.
I don't know why they made serde a malware vector by adding this unconditional precompiled binary in a patch release in exchange for making cold builds in CI barely faster.
The biggest fear I have is that this practice becomes widespread across the Rust ecosystem. I really hate that the Rust ecosystem seems to have so much drama. It's scaring people away from using the tool to build scalable and stable systems.