Generics & monomorphisation
A generic function in Rust is a template the compiler stamps out once per concrete type it's used with. That single decision buys "zero-cost abstraction" — iterator chains that compile to the same loop you'd write by hand — and it's also where your binary size and compile times go. This page follows one generic function through the compiler: type checking, instance collection, codegen units, the de-duplication machinery, and the dial you turn when the bill gets too high — dyn.
Long read · from typeck to codegen units, share-generics and the size/speed dials · references at the end
1 · One definition, N functions
```rust
fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
    let mut best = items[0];
    for &it in items {
        if it > best { best = it; }
    }
    best
}

fn main() {
    println!("{}", largest(&[3i32, 7, 2]));
    println!("{}", largest(&[1.5f64, 0.2]));
}
```

After type checking, the compiler collects every instantiation, meaning every (function, concrete types) pair the program actually reaches. It then generates a separate, fully specialised function for each. The binary ends up containing the moral equivalent of:
```rust
fn largest_i32(items: &[i32]) -> i32 { ... } // i32 compare instructions
fn largest_f64(items: &[f64]) -> f64 { ... } // f64 compare instructions
// Real symbol names carry the crate, the path and a hash of the substitutions.
// In legacy mangling the hash is "h" plus 16 hex digits, e.g. for largest::<i32>:
// _ZN4demo7largest17h<16 hex digits>E
// This pass is driven by rustc_monomorphize::collector, which walks MIR
// from the roots (main, exported fns) and records every reachable instance.
```

There is no boxing, no type tag and no runtime dispatch: by the time LLVM sees the code, generics are gone. Each copy is optimised against its concrete type, exactly as if you'd written both by hand. The i32 version can vectorise integer compares, while the f64 version uses floating-point compares with IEEE NaN semantics. The same applies to generic structs (Vec<i32> and Vec<String> are unrelated types with separately compiled methods). It also applies to const generics, where [T; N] code is stamped out once per value of N.
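Here is a minimal sketch of the same stamping for a generic struct and a const-generic function. Pair and checksum are hypothetical names chosen for illustration. The comments mark which instantiation each call creates. You can't observe that at runtime, but you can see it in the symbol table or on godbolt.

```rust
/// Hypothetical generic struct: Pair<i32> and Pair<String> are unrelated
/// types, each with its own separately compiled `swap` method.
struct Pair<T> {
    a: T,
    b: T,
}

impl<T> Pair<T> {
    fn swap(self) -> Pair<T> {
        Pair { a: self.b, b: self.a }
    }
}

/// Hypothetical const-generic function: checksum::<4> and checksum::<8>
/// are two separate instantiations, each with N fixed at compile time.
fn checksum<const N: usize>(bytes: [u8; N]) -> u32 {
    bytes.iter().map(|&b| u32::from(b)).sum()
}

fn main() {
    let p = Pair { a: 1i32, b: 2 }.swap(); // instantiates Pair<i32>::swap
    let q = Pair { a: String::from("x"), b: String::from("y") }.swap(); // Pair<String>::swap
    println!("{} {} {} {}", p.a, p.b, q.a, q.b);

    println!("{}", checksum([1, 2, 3, 4])); // N = 4
    println!("{}", checksum([1, 1, 1, 1, 1, 1, 1, 1])); // N = 8
}
```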
2 · Checked once, stamped many times — not C++ templates
The crucial difference from C++ templates is that a Rust generic is type-checked once, at its definition, against its trait bounds. The body may only use capabilities the bounds grant. So if a caller's types satisfy the bounds, instantiation cannot run into a type error inside the library's body. The rare exceptions, such as a constant that fails to evaluate only for some T, are known as post-monomorphisation errors.
```rust
fn largest<T: PartialOrd>(items: &[T]) -> &T {
    let mut best = &items[0];
    for it in items {
        if it > best { best = it; } // ok: PartialOrd grants >
        // it + *best               // error HERE, at definition:
        //                          // no `Add` in the bounds
    }
    best
}
// C++: expressions that depend on T are checked only at instantiation (even
// C++20 concepts don't check the body), so errors surface there,
// 150 lines deep in someone else's header.
// Rust: errors appear at the definition, or at the call site as a clean
// "the trait bound `X: PartialOrd` is not satisfied".
```

This is also why trait bounds are part of a crate's public API in a hard sense. The body cannot quietly depend on anything the bounds don't grant, so using a new capability means adding a bound. Adding a bound is a breaking change, because callers whose types don't satisfy it stop compiling. The same rule is why the error messages can always name the missing trait.
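To make the commented-out addition compile, the capability has to be requested in the bounds. Below is a minimal sketch using a hypothetical `total` function (not from the original). Its bounds grant `+`, a starting zero, and copying items out of the slice; each line of the body uses only what those bounds provide.

```rust
use std::ops::Add;

/// Sums a slice. Every capability the body uses comes from a bound:
/// `Add<Output = T>` for `+`, `Default` for the starting zero,
/// `Copy` to read items out of the borrowed slice.
fn total<T: Add<Output = T> + Default + Copy>(items: &[T]) -> T {
    let mut acc = T::default();
    for &it in items {
        acc = acc + it; // allowed: the Add bound grants +
    }
    acc
}

fn main() {
    println!("{}", total(&[3i32, 7, 2]));
    println!("{}", total(&[1.5f64, 0.25]));
    // total(&["a", "b"]) would be rejected at this call site, not inside
    // `total`, because &str does not implement Add.
}
```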
3 · Where the copies are made — your crate, not the library's
A subtlety with big build-time consequences: when a library crate defines a generic
function, the library's compiled artifact contains no machine code for it —
just the MIR. Codegen happens in the crate that instantiates it, with its concrete types.
Use serde, and your crate compiles every
Deserialize instantiation for your types; that's why heavily
generic dependencies make downstream builds slow even when the dependency itself compiled
quickly, and why cargo llvm-lines regularly shows a handful of generic
functions accounting for half the LLVM IR in a workspace.
- Codegen units. rustc splits a crate into CGUs, which LLVM compiles in parallel. By default there are 16 for non-incremental builds such as release, and 256 for incremental ones such as debug. An instantiation needed by several CGUs is normally placed in one CGU and called from the others. Functions meant to be inlinable (such as #[inline] ones in optimised builds) are instead copied into every CGU that uses them: more parallelism, more repeated work.
- share-generics. The unstable -Zshare-generics mode (enabled by default for unoptimised builds) lets crates in one build graph reuse instantiations another crate has already generated instead of re-stamping them. It is a debug-build compile-time win.
- #[inline] on generics is mostly redundant, because generic bodies are already available cross-crate by necessity. It matters for the non-generic functions you want inlinable across crates (absent LTO).
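Here is a minimal sketch of that difference, written as if it lived in a library crate. All names are hypothetical. The generic helper needs no attribute for downstream crates to instantiate and inline it. The non-generic one carries #[inline] so that its body is exported for cross-crate inlining without LTO.

```rust
/// Non-generic: #[inline] makes the body available to downstream crates.
/// Without it (and without LTO), callers in other crates generally get
/// only an external symbol to call.
#[inline]
pub fn clamp_unit(x: f64) -> f64 {
    x.clamp(0.0, 1.0)
}

/// Generic: every downstream crate stamps out its own copy anyway, so the
/// body is already available to it (and inlinable) with no attribute.
pub fn clamp_to<T: PartialOrd>(x: T, lo: T, hi: T) -> T {
    if x < lo {
        lo
    } else if x > hi {
        hi
    } else {
        x
    }
}

fn main() {
    println!("{}", clamp_unit(1.7));
    println!("{}", clamp_to(15u8, 0, 10));
}
```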
4 · "Zero-cost", demonstrated and qualified
```rust
pub fn sum_even_squares(v: &[i64]) -> i64 {
    v.iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x * x)
        .sum()
}
// Two closures, two adapters over slice::Iter (Map<Filter<slice::Iter<'_, i64>, _>, _>)
// and the generic fold behind .sum(): all monomorphised together, inlined
// into one frame, and compiled to a single loop, which -O typically auto-vectorises.
// Check on godbolt.org (rustc, -O): no calls, no allocations remain.
```

That's the promise kept: the abstraction costs nothing at runtime. The honest ledger has two other columns.

Compile time: every instantiation is real work for LLVM, and optimising the same inlined iterator machinery hundreds of times is a large share of release-build wall time.

Binary size: N instantiations are N copies of the code, and embedded targets and serverless cold starts both notice.

In rare cases the instruction-cache pressure from that bloat can even make the "faster" static dispatch a net loss, when many large copies of similar code compete for cache on a hot path.
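One way to check the claim at the level of results is to put a hand-written loop next to the chain and compare them on the same inputs. The loop is our own sketch of what the chain should reduce to after inlining, not rustc output. Confirming that the assembly is identical still means reading it on godbolt.

```rust
/// The iterator version from above.
pub fn sum_even_squares(v: &[i64]) -> i64 {
    v.iter().filter(|&&x| x % 2 == 0).map(|&x| x * x).sum()
}

/// The loop you would write by hand; after monomorphisation and inlining,
/// the iterator version is expected to optimise to essentially this.
pub fn sum_even_squares_loop(v: &[i64]) -> i64 {
    let mut acc = 0;
    for &x in v {
        if x % 2 == 0 {
            acc += x * x;
        }
    }
    acc
}

fn main() {
    let data = [1, 2, 3, 4, -6];
    // Even values are 2, 4 and -6, so both print 4 + 16 + 36 = 56.
    println!("{}", sum_even_squares(&data));
    println!("{}", sum_even_squares_loop(&data));
}
```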
5 · The other dial: dyn dispatch
Every generic API has a sibling design with dyn Trait: one compiled copy
that takes a fat pointer and dispatches through a vtable (the
trait objects page
covers the mechanics). The trade is mechanical:
| | fn f<T: Trait>(x: T) | fn f(x: &dyn Trait) |
|---|---|---|
| copies of f | one per T used | exactly one |
| call cost | direct, inlinable | indirect via vtable, opaque to the optimiser |
| binary / compile time | grows with uses | flat |
| heterogeneous collections | no (one T per instantiation) | yes — Vec<Box<dyn Trait>> |
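Each row of the table shows up in a short sketch. The trait Shape and its two implementors are hypothetical, used only for illustration. The generic function is instantiated once per shape type, while the dyn version has a single copy that calls through the vtable. Only the dyn form can hold both shapes in one Vec.

```rust
use std::f64::consts::PI;

/// Hypothetical trait for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Circle(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        PI * self.0 * self.0
    }
}

/// Static dispatch: one compiled copy per concrete S; the call is direct and inlinable.
fn area_generic<S: Shape>(s: &S) -> f64 {
    s.area()
}

/// Dynamic dispatch: exactly one compiled copy; the call goes through the vtable.
fn area_dyn(s: &dyn Shape) -> f64 {
    s.area()
}

/// Heterogeneous collection: only possible with trait objects.
fn total_area(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let sq = Square(2.0);
    let c = Circle(1.0);
    println!("{}", area_generic(&sq)); // instantiates area_generic::<Square>
    println!("{}", area_generic(&c)); // instantiates area_generic::<Circle>
    println!("{}", area_dyn(&sq)); // the same single copy serves both
    println!("{}", area_dyn(&c));

    let shapes: Vec<Box<dyn Shape>> = vec![Box::new(sq), Box::new(c)];
    println!("{}", total_area(&shapes));
}
```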
The std library's own pattern is worth stealing: generic at the edge for ergonomics,
concrete inside for one copy. std::fs::read is literally this:
```rust
pub fn read<P: AsRef<Path>>(path: P) -> io::Result<Vec<u8>> {
    fn inner(path: &Path) -> io::Result<Vec<u8>> {
        // ... all the actual work, compiled exactly once ...
    }
    inner(path.as_ref()) // the generic shim is tiny: a conversion and a call
}
// Callers get read("x"), read(String), read(PathBuf) for free;
// the binary gets one copy of the I/O logic. The `momo` crate
// generates this transformation with an attribute macro (#[momo]).
```

6 · Polymorphisation — the experiment that didn't make it
Monomorphisation is often wasteful: Vec::<String>::len and
Vec::<u64>::len compile to identical code, and many instantiations
differ only in a type parameter the function never actually touches.
Polymorphisation was rustc's attempt to detect unused or layout-irrelevant
parameters and share one copy across them — implemented behind
-Zpolymorphize from 2020. It never matured: the analysis stayed limited,
interacted badly with other compiler features, and the maintenance cost outweighed the
wins, so it was removed from rustc in late 2024.
What survives in practice: the linker's ICF (identical code folding,
-Wl,--icf=all with lld) merges byte-identical instantiations after the fact;
LLVM's mergefunc does some of the same earlier; and the manual techniques on this page
remain the reliable tool. The conceptual lesson stands — most generic code doesn't need
most of its specialisation — but today acting on it is your job, not the compiler's.
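Doing it by hand usually means the inner-function trick with one twist. When the work depends only on a layout fact such as size_of::<T>(), pass that fact to a non-generic core instead of T itself. Below is a minimal sketch with hypothetical names. Only the one-line shim is stamped per T, and the core is compiled once.

```rust
use std::mem::size_of;

/// Non-generic core: compiled exactly once, whatever T the callers use.
fn describe_buffer(len: usize, elem_size: usize) -> String {
    let bytes = len * elem_size;
    format!("{len} elements, {bytes} bytes")
}

/// Generic shim: the only per-T work is computing size_of::<T>().
fn describe<T>(items: &[T]) -> String {
    describe_buffer(items.len(), size_of::<T>())
}

fn main() {
    println!("{}", describe(&[1u64, 2, 3]));
    println!("{}", describe(&[String::new(), String::new()]));
}
```

The trade-off is the same one polymorphisation tried to automate. The shim still exists per T, but it is small enough that duplicating it costs almost nothing.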
7 · Measuring and taming the bill
```console
# Who is filling my binary?
$ cargo bloat --release -n 10
File .text Size Crate Name
4.1% 11.2% 142.3KiB serde_json serde_json::value::Value::deserialize...
2.8% 7.7% 97.9KiB regex regex::exec::ExecBuilder::build
...
# Who is costing me compile time? (counts LLVM IR lines per generic fn,
# summed over all instantiations: the proxy for LLVM work)
$ cargo llvm-lines --release | head
Lines Copies Function name
30043 (5.2%) 408 (3.1%) core::result::Result<T,E>::map_err
18027 (3.1%) 156 (1.2%) alloc::raw_vec::RawVec<T,A>::grow_amortized
...
```

- Box the cold paths. Route error construction, logging and config parsing through dyn, and keep monomorphisation for the hot loops.
- Hoist the non-generic core out of generic functions (the inner trick above). It has by far the best size-per-effort ratio.
- Watch nested generics. A combinator<F: Fn> taking closures means every call site is a fresh instantiation; a &dyn Fn parameter collapses them.
- For size-critical targets: opt-level = "z", lto = true, codegen-units = 1, panic = "abort", then re-measure. LTO often claws back most duplicate-instantiation cost on its own.
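Here is the nested-generics point in code, using a hypothetical retry helper. Every closure has its own type, so the generic version is stamped once per closure passed to it, whole loop included. The &dyn Fn version is compiled once and calls the closure indirectly.

```rust
/// Generic over the closure type: each call site that passes a different
/// closure creates a new instantiation of this whole function body.
fn retry_generic<F: Fn(u32) -> Result<u32, String>>(attempts: u32, op: F) -> Result<u32, String> {
    let mut last = Err(String::from("no attempts"));
    for i in 0..attempts {
        last = op(i);
        if last.is_ok() {
            break;
        }
    }
    last
}

/// Takes a trait object: one compiled copy, whatever closure is passed.
fn retry_dyn(attempts: u32, op: &dyn Fn(u32) -> Result<u32, String>) -> Result<u32, String> {
    let mut last = Err(String::from("no attempts"));
    for i in 0..attempts {
        last = op(i);
        if last.is_ok() {
            break;
        }
    }
    last
}

fn main() {
    // Two distinct closures: two instantiations of retry_generic ...
    println!("{:?}", retry_generic(5, |i| if i >= 2 { Ok(i) } else { Err(format!("try {i}")) }));
    println!("{:?}", retry_generic(1, |_| Err(String::from("always fails"))));
    // ... but only ever one copy of retry_dyn.
    println!("{:?}", retry_dyn(5, &|i| if i >= 2 { Ok(i) } else { Err(format!("try {i}")) }));
}
```

The cost moves to an indirect call per attempt. That is usually irrelevant on cold paths like retries and exactly the wrong trade inside a hot loop.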
References
- rustc dev guide: monomorphization — the collector and partitioning, from the implementers.
- The Book: performance of generics — the ground-floor statement of the model.
- cargo-llvm-lines — instantiation-cost profiling.
- cargo-bloat — binary-size attribution.
- rust-lang/rust #133883 — the PR removing polymorphization, with the post-mortem rationale.
- David Lattimore — speeding up the Rust build cycle — where generics cost shows up in real builds.