ELI5 · AI & data

Embeddings.

Turning words and images into points on a map, so things that mean the same thing sit close together.

A computer cannot compare meanings the way you do. To it, "car" and "automobile" are just two different strings of letters, and nothing in the spelling says they mean the same thing. Embeddings fix that by turning each piece of text (a word, a sentence, a whole document) into a list of numbers, which works as a point in space.

The trick is where the points land. A model trained on huge amounts of text places things with similar meaning near each other on this map and unrelated things far apart. So "car" and "automobile" end up as neighbours, and now the computer can measure meaning as plain distance.

  1. These two? Total strangers.

    To a computer, "car" and "automobile" are just letters with nothing in common.

  2. car → a row of numbers.

    An embedding model reads each word and turns it into a list of numbers, such as 0.8, 0.1, 0.4 for "car".

  3. Numbers are a spot on the map.

    Those numbers are really coordinates: a single point dropped on a map of meaning.

  4. Synonyms? Right next door.

    Trained on mountains of text, the model puts similar meanings near each other: "car" and "auto" side by side, "banana" far away.

  5. Closeness equals meaning.

    Now meaning is just distance: close points are similar, far-apart points are unrelated.

  6. Different words, same spot.

    That powers search by meaning: "reset my password" finds "account recovery."

Words become points on a map, where things that mean the same thing sit close together.
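The walkthrough above can be sketched in a few lines of Python. This is only a toy: the three-number vectors are hand-picked for illustration (the "car" values echo the example numbers above), not the output of a real embedding model, and real embeddings usually have hundreds or thousands of numbers per item. The name `TOY_EMBEDDINGS` is made up for this sketch.

```python
import math

# Hand-picked toy coordinates, NOT output from a real embedding model.
TOY_EMBEDDINGS = {
    "car":        [0.8, 0.1, 0.4],
    "automobile": [0.7, 0.2, 0.4],
    "banana":     [0.1, 0.9, 0.0],
}


def distance(a, b):
    """Straight-line (Euclidean) distance between two points."""
    return math.dist(a, b)


car_to_auto = distance(TOY_EMBEDDINGS["car"], TOY_EMBEDDINGS["automobile"])
car_to_banana = distance(TOY_EMBEDDINGS["car"], TOY_EMBEDDINGS["banana"])

# A smaller number means the two points sit closer together on the map.
print(f"car <-> automobile: {car_to_auto:.2f}")
print(f"car <-> banana:     {car_to_banana:.2f}")
```

With these made-up numbers, "car" lands much nearer to "automobile" than to "banana", which is the whole idea: once words are points, "how similar?" becomes "how far apart?".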

Why distance equals meaning

The model learns the map by reading enormous amounts of text and noticing which words and ideas appear in similar contexts. Words used the same way drift together. The result is that "nearness" on the map lines up with how related things actually are, even capturing relationships — the direction from "king" to "queen" can echo the direction from "man" to "woman".
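To make the "direction" idea concrete, here is a small sketch with two-number vectors chosen by hand so the pattern is easy to see. Treat every number as an assumption: real models do not have neatly labelled axes, and in practice the king/queen pattern holds only approximately, and not for every word pair.

```python
# Toy 2-D vectors chosen by hand for illustration only.
# Axis 0 loosely stands for "royalty" and axis 1 for "gender".
toy = {
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
}


def subtract(a, b):
    return [x - y for x, y in zip(a, b)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# The step you take to get from "man" to "woman", and from "king" to "queen".
man_to_woman = subtract(toy["woman"], toy["man"])
king_to_queen = subtract(toy["queen"], toy["king"])

# Start at "king", take the man -> woman step, and see which word is nearest.
landing_point = add(toy["king"], man_to_woman)
nearest = min(toy, key=lambda word: squared_distance(toy[word], landing_point))

print("man -> woman step: ", man_to_woman)
print("king -> queen step:", king_to_queen)
print("king + (woman - man) lands nearest to:", nearest)
```

In this toy setup the two steps point the same way, so starting at "king" and taking the "man to woman" step lands on "queen".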

Because meaning is now a position, comparing two things is just measuring how close their points are.

What they unlock

This is what powers semantic search: instead of matching exact keywords, you embed the query and find the nearest documents by meaning, so "how do I reset my password" finds an article titled "account recovery." The same idea drives recommendations, clustering, and the retrieval step that feeds relevant context to a language model.
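A minimal sketch of that search step, under loud assumptions: the document and query vectors below are written by hand to stand in for what an embedding model would return, and the names `DOCUMENT_VECTORS`, `QUERY_VECTOR`, and `search` are hypothetical. Closeness is scored with cosine similarity, a common choice that compares the direction of two vectors.

```python
import math

# Hypothetical vectors standing in for real embedding-model output.
DOCUMENT_VECTORS = {
    "Account recovery":     [0.9, 0.1, 0.2],
    "Billing and invoices": [0.1, 0.9, 0.1],
    "Keyboard shortcuts":   [0.2, 0.1, 0.9],
}

# Hypothetical vector for the query "how do I reset my password".
QUERY_VECTOR = [0.8, 0.2, 0.3]


def cosine_similarity(a, b):
    """Score from -1 to 1; higher means the vectors point in a more similar direction.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, documents, top_k=2):
    """Return the top_k document titles, most similar to the query first."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in documents.items()
    ]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


results = search(QUERY_VECTOR, DOCUMENT_VECTORS)
print(results)
```

The query shares no keywords with "Account recovery", yet that title ranks first because its point is the closest. This sketch compares the query against every document in turn; systems with large collections typically use a vector index to avoid checking each one.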

The catch: an embedding only knows what its model was trained on, so meaning is approximate and can carry the biases of that training data.
