The allocator API

Every Box::new, every Vec::push past capacity, bottoms out in one trait with two required methods. This page covers GlobalAlloc and Layout, how #[global_allocator] swaps in jemalloc or mimalloc with five lines, what happens on OOM (and the fallible alternative), the still-unstable Allocator trait behind Vec::new_in, and why embedded and high-performance code treats the allocator as a first-class design decision.

Long read · GlobalAlloc and Layout through allocator_api, arenas, and no_std · references at the end


1 · The trait at the bottom

rust core/src/alloc/global.rs · the whole contract
pub unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;          // required
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);     // required

    // Provided methods; the defaults are built from the two above:
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }  // alloc + zero-fill
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout,
                      new_size: usize) -> *mut u8 { ... }              // alloc + copy + dealloc
}

Note the asymmetry with C: dealloc receives the Layout back. free() gets only a pointer, so a C allocator has to recover the size from a header next to the allocation or from its own page metadata; Rust callers (Vec, Box) statically know what they allocated and pass it in, so an allocator may run headerless and route by size class with no lookup. The trait is unsafe on both sides of the contract: implementors must return memory that is aligned as requested and not shared with any other live allocation, or null on failure; callers must pass dealloc the same layout the pointer was allocated with.
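
To make the headerless point concrete, here is a minimal sketch of routing by layout alone. The power-of-two size classes and the names size_class and class_index are illustrative assumptions, not how any particular allocator bins its requests.

```rust
use std::alloc::Layout;

/// Hypothetical size-class rule: round the request up to a power of two
/// that is at least as large as the alignment. Real allocators use finer bins.
pub fn size_class(layout: Layout) -> usize {
    layout.size().max(layout.align()).next_power_of_two()
}

/// Index of the free list that would serve this layout, with the
/// 8-byte class at index 0 and every smaller request folded into it.
pub fn class_index(layout: Layout) -> u32 {
    size_class(layout).max(8).trailing_zeros() - 3
}
```

Because alloc and dealloc both receive the same Layout, both can call class_index and land on the same free list without reading anything stored next to the block.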

Layout is the two numbers every allocation needs: a size and a power-of-two alignment. Layout::new::<T>() computes it for any sized type; for dynamic structures you build it by hand with Layout::array::<T>(n), Layout::from_size_align, and Layout::extend for header-plus-payload layouts, which handles the padding arithmetic the layout page describes.
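
As a small illustration of building a layout by hand, the sketch below computes the layout of a made-up record: a u32 length followed by n u64 items. The record shape and the name record_layout are assumptions for the example; the Layout methods are the stable std ones.

```rust
use std::alloc::{Layout, LayoutError};

/// Layout for a hypothetical `{ len: u32, items: [u64; n] }` record.
/// Returns the combined layout and the byte offset of `items`.
pub fn record_layout(n: usize) -> Result<(Layout, usize), LayoutError> {
    let header = Layout::new::<u32>();       // 4 bytes, 4-byte aligned
    let items = Layout::array::<u64>(n)?;    // 8 * n bytes; Err on overflow
    // extend inserts whatever padding `items` needs after the header.
    let (combined, offset) = header.extend(items)?;
    // extend leaves trailing padding off; pad_to_align adds it.
    Ok((combined.pad_to_align(), offset))
}
```

On a typical 64-bit target, record_layout(3) places items at offset 8 and reports a 32-byte, 8-byte-aligned layout. An absurd n surfaces as a LayoutError rather than a wrapped size.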

2 · The default, and a short history

Unless told otherwise, a Rust binary uses std::alloc::System: malloc/free on Unix, HeapAlloc on Windows (with aligned variants where the layout demands it). It wasn't always so — early Rust shipped jemalloc as the default for executables, which made binaries bigger and surprised C interop; rustc 1.32 (January 2019) switched the default to the system allocator and left jemalloc as an opt-in. rustc itself still links jemalloc for its own use, which says something about both sides of the trade.

Why opt out of the system allocator? Long-running multi-threaded servers are the usual case: allocators differ in per-thread caching, fragmentation behaviour over days of uptime, and contention under parallel load. jemalloc spreads threads across multiple arenas with per-thread caches and has best-in-class introspection/profiling; mimalloc is small and consistently fast; glibc malloc is fine until it isn't (its per-thread arenas are a known RSS amplifier under many threads). Throughput changes of 5–20% from a swap are commonly reported on allocation-heavy services (the number is workload-dependent, so measure your own), which is a lot of win for five lines:

rust src/main.rs · the five lines
// Cargo.toml: tikv-jemallocator = "0.6"   (or mimalloc = "0.1")
use tikv_jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // every Box, Vec, String, HashMap in the whole program — including
    // all dependencies — now allocates through jemalloc.
}

One binary, one global allocator: the attribute may appear once in the crate graph, the pick happens at link time, and there is no per-call dispatch cost — calls compile to direct calls into the chosen implementation via the __rust_alloc symbols.

3 · Writing one — the counting wrapper

Implementing GlobalAlloc is rarely about writing malloc from scratch; the production-grade pattern is the wrapper — instrument the real allocator:

rust src/alloc_meter.rs · live heap accounting, ~20 lines
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

pub struct Meter;

pub static LIVE: AtomicUsize = AtomicUsize::new(0);
pub static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Meter {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let p = unsafe { System.alloc(layout) };
        if !p.is_null() {
            let now = LIVE.fetch_add(layout.size(), Relaxed) + layout.size();
            PEAK.fetch_max(now, Relaxed);
        }
        p
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        LIVE.fetch_sub(layout.size(), Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static A: Meter = Meter;
// Same shape powers: per-request allocation budgets, alloc-count
// assertions in benchmarks ("this hot path allocates zero times"),
// and leak hunting without external tooling.

One real constraint: code inside the allocator must not allocate (no println!, no format!, no panicking paths that build messages), because that recurses straight back into alloc and overflows the stack. Atomics and raw syscalls only.
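
The per-request budget idea mentioned in the listing can be sketched with the same wrapper shape. Everything here is an assumption for illustration: the Budgeted type, its limit field, and the choice to count layout.size() rather than the allocator's real block size. It uses only atomics, so it respects the no-allocation rule.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

/// Hypothetical wrapper that refuses to hold more than `limit` live bytes.
pub struct Budgeted {
    live: AtomicUsize,
    limit: usize,
}

impl Budgeted {
    pub const fn new(limit: usize) -> Self {
        Budgeted { live: AtomicUsize::new(0), limit }
    }

    /// Bytes currently handed out and not yet returned.
    pub fn live(&self) -> usize {
        self.live.load(Relaxed)
    }
}

unsafe impl GlobalAlloc for Budgeted {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Reserve the bytes first; the closure returns None (and the
        // update fails) if the new total would overflow or exceed the limit.
        let reserved = self.live.fetch_update(Relaxed, Relaxed, |cur| {
            cur.checked_add(layout.size()).filter(|&next| next <= self.limit)
        });
        if reserved.is_err() {
            return std::ptr::null_mut(); // over budget: report it like any OOM
        }
        let p = unsafe { System.alloc(layout) };
        if p.is_null() {
            self.live.fetch_sub(layout.size(), Relaxed); // undo the reservation
        }
        p
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        self.live.fetch_sub(layout.size(), Relaxed);
    }
}
```

Registered with #[global_allocator] (for example as a static built with Budgeted::new(512 << 20)), a null return here is treated by the std containers like any other allocation failure, so a budget like this is only useful together with fallible call sites.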

4 · OOM: abort by default, fallible by request

When alloc returns null, the std containers call handle_alloc_error(layout), which aborts the process — not a panic, no unwinding, no destructors. The reasoning: by the time allocation fails, running recovery code (which itself allocates) rarely goes well, and on overcommitting Linux you often get the OOM killer before you ever see a null. For the cases that genuinely must survive allocation failure — databases honouring memory budgets, kernels, anything on a small device — std grew fallible entry points:

rust src/main.rs · try_reserve (stable since 1.57)
use std::collections::TryReserveError;

fn load(n: usize) -> Result<Vec<u64>, TryReserveError> {
    let mut v: Vec<u64> = Vec::new();
    v.try_reserve_exact(n)?;          // Err(..) instead of abort
    v.extend(0..n as u64);            // fits the reserved capacity, so no reallocation
    Ok(v)
}

fn main() {
    match load(usize::MAX / 16) {
        Ok(v) => println!("loaded {}", v.len()),
        Err(e) => println!("backpressure instead of death: {e}"),
    }
}

This is the honest state of fallible allocation on stable: try_reserve / try_reserve_exact on Vec, String, HashMap and friends. Box::try_new exists too, but only on nightly, behind the unstable allocator_api feature. Linux-kernel Rust, which forbids infallible allocation outright, builds on its own variants of the alloc crate for exactly this reason.
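
The usual pattern on stable is to reserve fallibly and then do the infallible work inside the capacity you already hold. The two helpers below are a sketch of that; their names are made up for the example.

```rust
use std::collections::TryReserveError;

/// Appends `chunk`, reporting allocation failure instead of aborting.
pub fn append_fallible(buf: &mut Vec<u8>, chunk: &[u8]) -> Result<(), TryReserveError> {
    buf.try_reserve(chunk.len())?; // the only step that can allocate
    buf.extend_from_slice(chunk);  // fits in the capacity reserved above
    Ok(())
}

/// Builds a zero-filled buffer whose length comes from untrusted input.
pub fn zeroed_fallible(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut v = Vec::new();
    v.try_reserve_exact(len)?;     // Err on capacity overflow or allocator failure
    v.resize(len, 0);              // cannot reallocate: capacity >= len
    Ok(v)
}
```

A length such as usize::MAX is rejected before the allocator is even asked, because the requested capacity exceeds what a Vec can represent. Smaller but still unreasonable sizes may succeed on an overcommitting system, so an explicit upper bound on input-controlled lengths is still worth having.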

5 · allocator_api — per-container allocators, still unstable

GlobalAlloc is one allocator per process. The richer design — pass an allocator per container — has lived on nightly for years as the allocator_api feature (tracking issue #32838):

rust nightly · the Allocator trait and the second type parameter
pub unsafe trait Allocator {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError>;
    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout);
    // + grow / shrink / by-ref combinators, all fallible by design
}

// The collections gained a defaulted allocator parameter:
//   pub struct Vec<T, A: Allocator = Global> { ... }

// Using it (nightly only; the feature gate goes at the top of the crate root):
#![feature(allocator_api)]
use std::alloc::Global;

fn main() {
    let v: Vec<u8, Global> = Vec::new_in(Global);
    let b = Box::new_in(42u64, Global);
    // ...and with an arena allocator A, Vec::new_in(arena) puts the
    // elements in the arena, freed all at once when the arena drops.
    let _ = (v, b);
}

Differences from GlobalAlloc worth noticing: allocate returns Result<NonNull<[u8]>, AllocError> — fallibility and the actual (possibly larger) size are in the signature, not bolted on; and allocators are passed by value/reference as ordinary generic parameters, so a Vec<T, &Bump> borrows its arena. Why it's still unstable after a decade: the type parameter infects every API that touches collections, and questions like "what does Box<T, A>::into_raw mean across allocators" and how this interacts with dyn and async traits keep reopening. On stable, the allocator-api2 crate mirrors the trait (used by hashbrown), and arena crates ship their own handles:

rust src/main.rs · arenas on stable, today
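// bumpalo: bump-pointer arena. Allocation = pointer bump + bounds
// check. No per-object free; everything dies with the Bump.
// Cargo.toml: bumpalo = { version = "3", features = ["collections"] }
use bumpalo::Bump;
use bumpalo::collections::Vec as BumpVec;

fn main() {
    let arena = Bump::new();
    let mut spans: BumpVec<&str> = BumpVec::new_in(&arena);
    for i in 0..1000u32 {
        // Copy the text into the arena. Parking a std String there with
        // arena.alloc(..) would leak its heap buffer: Bump never runs Drop.
        let label: &str = arena.alloc_str(&format!("span-{i}"));
        spans.push(label);
    }
    drop(spans);   // the vector borrows the arena, so it has to go first
    drop(arena);   // every chunk freed together, no per-object free calls
}
// Compilers, parsers, request handlers with per-request arenas:
// this is the pattern. rustc's own type interner works this way.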
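
To show why bump allocation is so cheap, here is a minimal hand-rolled arena using only std. It is a sketch under stated assumptions, not bumpalo's implementation: TinyBump is a made-up name, it owns one fixed chunk and never grows, it bumps upward, and like bumpalo it never runs destructors for the values it stores.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::cell::Cell;
use std::ptr::NonNull;

/// Hypothetical single-chunk bump arena.
pub struct TinyBump {
    base: NonNull<u8>,
    chunk: Layout,
    used: Cell<usize>, // bytes handed out so far, including padding
}

impl TinyBump {
    pub fn with_capacity(bytes: usize) -> Self {
        assert!(bytes > 0, "zero-sized chunks cannot be allocated");
        let chunk = Layout::from_size_align(bytes, 16).expect("valid chunk layout");
        // SAFETY: `chunk` has a non-zero size.
        let base = NonNull::new(unsafe { alloc(chunk) }).expect("chunk allocation failed");
        TinyBump { base, chunk, used: Cell::new(0) }
    }

    /// The whole allocator: align the cursor, bounds-check, advance.
    pub fn alloc_layout(&self, layout: Layout) -> Option<NonNull<u8>> {
        let addr = self.base.as_ptr() as usize + self.used.get();
        let padding = addr.wrapping_neg() & (layout.align() - 1);
        let start = self.used.get().checked_add(padding)?;
        let end = start.checked_add(layout.size())?;
        if end > self.chunk.size() {
            return None; // chunk exhausted
        }
        self.used.set(end);
        // SAFETY: start <= end <= chunk size, so the pointer stays in bounds.
        NonNull::new(unsafe { self.base.as_ptr().add(start) })
    }

    /// Moves `value` into the arena. Its destructor will never run.
    pub fn alloc<T>(&self, value: T) -> Option<&mut T> {
        let p = self.alloc_layout(Layout::new::<T>())?.cast::<T>();
        // SAFETY: `p` is aligned for T and the range is never handed out twice.
        unsafe {
            p.as_ptr().write(value);
            Some(&mut *p.as_ptr())
        }
    }

    pub fn used(&self) -> usize {
        self.used.get()
    }
}

impl Drop for TinyBump {
    fn drop(&mut self) {
        // One dealloc call releases everything the arena ever handed out.
        unsafe { dealloc(self.base.as_ptr(), self.chunk) }
    }
}
```

With a 16-byte chunk, storing a u8 and then a u64 uses all 16 bytes (1 byte, 7 bytes of padding, 8 bytes), and the next request returns None. The borrow checker ties every returned reference to the arena, which is what makes the drop-everything-at-once model safe.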

6 · no_std and embedded — bringing your own heap

On bare metal there is no system allocator, but the machinery above is exactly how you get one. core never allocates; the alloc crate (Box, Vec, String, BTreeMap…) works anywhere you provide a #[global_allocator]:

rust src/main.rs · a heap on a microcontroller
#![no_std]
#![no_main]
extern crate alloc;

use embedded_alloc::LlffHeap as Heap;   // linked-list first-fit
use panic_halt as _;                    // assumption: any #[panic_handler] crate will do

#[global_allocator]
static HEAP: Heap = Heap::empty();

#[cortex_m_rt::entry]
fn main() -> ! {
    // Carve the heap out of RAM once, at boot:
    use core::mem::MaybeUninit;
    const SIZE: usize = 16 * 1024;
    static mut MEM: [MaybeUninit<u8>; SIZE] = [MaybeUninit::uninit(); SIZE];
    unsafe { HEAP.init(&raw mut MEM as usize, SIZE) }

    let mut log: alloc::vec::Vec<u32> = alloc::vec::Vec::new();
    log.push(0xC0FFEE);
    loop {}
}

The embedded discipline that follows: allocate at startup, then stop — steady-state allocation on a 16 KiB heap means fragmentation roulette. Many firmware codebases go further and stay heapless (heapless::Vec<T, N>, fixed pools), using the allocator only during init. The same instincts — bound it, front-load it, measure it — are what the jemalloc-on-a-server crowd applies at 10,000x the scale.
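
The heapless style is easy to picture with a stripped-down fixed-capacity vector. This is a simplified sketch, not the heapless crate's implementation: FixedVec is a made-up type, and the Copy + Default bound is an assumption that keeps the example free of unsafe code.

```rust
/// Hypothetical fixed-capacity vector: storage lives inline, so no heap is needed.
pub struct FixedVec<T: Copy + Default, const N: usize> {
    items: [T; N],
    len: usize,
}

impl<T: Copy + Default, const N: usize> FixedVec<T, N> {
    pub fn new() -> Self {
        FixedVec { items: [T::default(); N], len: 0 }
    }

    /// Hands the value back instead of growing when the buffer is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.items[self.len] = value;
        self.len += 1;
        Ok(())
    }

    pub fn as_slice(&self) -> &[T] {
        &self.items[..self.len]
    }
}
```

The capacity is part of the type, so the worst-case memory use is known at compile time and a full buffer is an ordinary Result the caller has to handle.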

7 · A working checklist

  • Default stance: the system allocator is fine. Swap only with a benchmark and a memory-profile in hand; jemalloc for long-running contended services and when you want its profiling, mimalloc for raw speed in a small package.
  • Allocation-heavy hot path? Before reaching for a faster malloc, allocate less: reuse buffers (clear() keeps capacity), with_capacity up front, arenas per request/phase.
  • Need to not die on OOM? try_reserve at the points where size is attacker- or input-controlled; treat everything else as infallible.
  • Auditing memory? A wrapper allocator is 20 lines and works in production; jemalloc's heap profiling (jemalloc_pprof) answers "what is holding 6 GB" properly.
  • Tracking the future: allocator_api (#32838) for per-container allocators; the Rust-for-Linux fork of alloc shows what an all-fallible std would look like.
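
The "allocate less" item is the cheapest one to apply. A minimal sketch of buffer reuse, with fill_squares standing in for whatever per-request work fills the scratch buffer:

```rust
/// Hypothetical per-request work: rebuilds `scratch` without freeing its storage.
pub fn fill_squares(scratch: &mut Vec<u64>, n: u64) {
    scratch.clear();                        // length drops to 0, capacity is kept
    scratch.extend((0..n).map(|i| i * i));  // reuses the buffer while n fits
}
```

Create the buffer once with Vec::with_capacity sized for the largest expected request and pass it to every call. As long as n stays within that capacity, the buffer's pointer and capacity never change, which is exactly the property a counting allocator can assert on.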

References
