Vector databases.
A database built to answer "what is most similar to this?" by finding the nearest points in space, fast.
Once you turn text or images into embeddings — points in space where nearness means similar meaning — you need somewhere to store millions of them and, crucially, to ask "which stored points are closest to this new one?" A normal database is built for exact matches and ranges, not for finding nearest neighbours in hundreds of dimensions.
A vector database is built for exactly that question. It stores embeddings and is optimised to return the most similar ones to a query vector quickly, even across enormous collections. It is the storage-and-search engine behind semantic search, recommendations, and feeding relevant context to AI models.
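As a rough illustration of the question a vector database answers, here is a minimal in-memory sketch. The `TinyVectorStore` class, its method names, and the three-dimensional vectors are all hypothetical, chosen only for illustration. Real embeddings have hundreds of dimensions and come from a model, and a real database would not scan every vector the way this one does.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical store: keeps (item, vector) pairs and searches by full scan."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, item: str, vector: Sequence[float]) -> None:
        self._items.append((item, list(vector)))

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored vector against the query, then keep the best k.
        scored = [(item, cosine_similarity(query, vec)) for item, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Made-up toy vectors: the first axis loosely means "animal", the second "vehicle".
store = TinyVectorStore()
store.add("cat", [0.9, 0.1, 0.0])
store.add("kitten", [0.8, 0.2, 0.1])
store.add("car", [0.1, 0.9, 0.3])
store.add("truck", [0.0, 0.8, 0.5])

results = store.search([0.85, 0.15, 0.05], k=2)
for item, score in results:
    print(f"{item}: {score:.3f}")
```

The query vector points in nearly the same direction as the two animal vectors, so those come back first. The full scan inside `search` is the part that stops working at scale.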
- Millions of dots to store.
You already turned everything into points on a map. Now there are millions of them.
- What’s nearest to here?
The question is always the same: which stored points sit closest to this new one?
- Compare all of them? No.
Checking every dot one by one is hopeless once the map holds millions.
- Skip straight to the block.
So the database builds an index that jumps near the right neighbourhood instead of scanning every point.
- Close enough, way faster.
The index trades a sliver of accuracy for a huge speed-up: "good enough, almost instantly."
- Here are your top neighbours.
Back come the nearest matches with their original items, which is the engine behind RAG.
Why exact search does not scale
Finding the truly closest vector means comparing the query against every stored vector, so the work grows in step with both the number of vectors and the number of dimensions: fine for thousands, hopeless for millions. So vector databases use approximate nearest neighbour (ANN) indexes, structures that return most of the true closest matches for a tiny fraction of the work, trading a sliver of accuracy for an enormous speed-up. Tuning that accuracy-versus-speed dial is a core part of using them well.
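One common family of ANN indexes partitions the vectors into clusters and searches only the clusters nearest the query. The sketch below is a toy version of that idea, not any particular product's implementation. The `ToyIVFIndex` class and the `n_lists` and `nprobe` names are assumptions for illustration, the data is random, and the centroids are simply sampled from the data, where a real index would usually train them with k-means.

```python
import random
from typing import Dict, List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (enough for ranking neighbours)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(vectors: List[List[float]], query: Sequence[float], k: int) -> List[int]:
    """Brute force: compare the query against every stored vector."""
    order = sorted(range(len(vectors)), key=lambda i: sq_dist(vectors[i], query))
    return order[:k]


class ToyIVFIndex:
    """Hypothetical cluster-based index: scan only the nearest few clusters."""

    def __init__(self, vectors: List[List[float]], n_lists: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vectors = vectors
        # Simplification: sampled data points stand in for trained centroids.
        self.centroids = [list(v) for v in rng.sample(vectors, n_lists)]
        self.lists: Dict[int, List[int]] = {c: [] for c in range(n_lists)}
        for i, v in enumerate(vectors):
            self.lists[self.nearest_centroids(v, 1)[0]].append(i)

    def nearest_centroids(self, v: Sequence[float], n: int) -> List[int]:
        order = sorted(range(len(self.centroids)),
                       key=lambda c: sq_dist(self.centroids[c], v))
        return order[:n]

    def search(self, query: Sequence[float], k: int, nprobe: int) -> List[int]:
        # Gather candidates from the nprobe closest clusters only.
        candidates: List[int] = []
        for c in self.nearest_centroids(query, nprobe):
            candidates.extend(self.lists[c])
        candidates.sort(key=lambda i: sq_dist(self.vectors[i], query))
        return candidates[:k]


rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
query = [rng.random() for _ in range(8)]

index = ToyIVFIndex(data, n_lists=20)
truth = exact_search(data, query, k=10)

recalls: Dict[int, float] = {}
scanned_counts: Dict[int, int] = {}
for nprobe in (1, 5, 20):
    found = index.search(query, k=10, nprobe=nprobe)
    # Recall: the fraction of the true top 10 that the index returned.
    recalls[nprobe] = len(set(found) & set(truth)) / len(truth)
    scanned_counts[nprobe] = sum(
        len(index.lists[c]) for c in index.nearest_centroids(query, nprobe)
    )
    print(f"nprobe={nprobe:2d} scanned={scanned_counts[nprobe]:4d} "
          f"recall={recalls[nprobe]:.2f}")
```

Here `nprobe` is the accuracy-versus-speed dial. A small value scans few vectors and may miss some true neighbours. Setting it equal to `n_lists` scans everything and reproduces the exact result, at exact-search cost.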
Where they fit, and the options
Vector databases are the backbone of retrieval-augmented generation (RAG): embed your documents, store them, and at query time fetch the most relevant chunks to hand a language model as context. Choices range from adding vector search to an existing database (such as the pgvector extension for Postgres, which is simple if your data already lives there) to purpose-built services like Pinecone and Weaviate that specialise in scale, filtering, and managed operation. The trade-off is convenience and unified storage versus dedicated performance and features.
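The retrieval half of RAG can be sketched in a few lines. Everything here is a stand-in: `toy_embed` is a hypothetical word-count embedding used only so the example runs without a model, the chunks are invented, and the final prompt format is an assumption. A real pipeline would call an embedding model and query a vector database instead of sorting a Python list.

```python
import math
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(chunks: List[str]) -> Dict[str, int]:
    """Assign each distinct word in the chunks its own vector position."""
    vocab: Dict[str, int] = {}
    for chunk in chunks:
        for word in tokenize(chunk):
            vocab.setdefault(word, len(vocab))
    return vocab


def toy_embed(text: str, vocab: Dict[str, int]) -> List[float]:
    """Hypothetical embedding: word counts. A real system uses a learned model."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], vectors: List[List[float]],
             vocab: Dict[str, int], k: int = 2) -> List[str]:
    """Return the k chunks whose vectors are most similar to the question."""
    q = toy_embed(question, vocab)
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(q, vectors[i]),
                    reverse=True)
    return [chunks[i] for i in ranked[:k]]


# Step 1: embed and store the document chunks (done once, ahead of time).
chunks = [
    "pgvector adds vector search to Postgres.",
    "Approximate nearest neighbour indexes trade accuracy for speed.",
    "Bananas are rich in potassium.",
]
vocab = build_vocab(chunks)
vectors = [toy_embed(c, vocab) for c in chunks]

# Step 2: at query time, fetch the most relevant chunk and build the prompt.
question = "How do I add vector search to Postgres?"
context = retrieve(question, chunks, vectors, vocab, k=1)
prompt = ("Answer using only this context:\n" + "\n".join(context)
          + "\n\nQuestion: " + question)
print(prompt)
```

The language model only ever sees `prompt`, so the quality of the answer depends heavily on whether retrieval surfaced the right chunks.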