V · Strings & sequences

Suffix array

All suffixes, sorted.

The story

Manber and Myers proposed an alternative to suffix trees that used 4× less memory while supporting most of the same queries. Sort all suffixes of the string lexicographically; store just the starting indices. Find any pattern with binary search in O(m log n).

How it works

Build the array of starting positions of every suffix, sorted by the suffix string. Pattern search: binary search for the lexicographic position of the pattern. SA-IS (2009) constructs in linear time.

Where it lives

BWA / Bowtie genome alignment (combined with FM-index). bzip2's Burrows-Wheeler step. Full-text search engines' index merging.

The key insight

A suffix array + LCP array gives every operation a suffix tree gives, in less memory. The "suffix tree" you read about in textbooks has been suffix arrays in production for two decades.