Should you use NoSQL?

Q: Should you use NoSQL?

Default to relational; reach for a specific NoSQL store only when a named access pattern demands it. A well-tuned Postgres carries most products further than the hype admits. Switch when you can point at the workload a particular store serves better, not at scale you might have one day.

NoSQL is not one thing, and scale is not the question

Start from the default: relational, until a specific store earns its way in. "NoSQL" lumps together very different databases — a document store like MongoDB, a wide-column store like Cassandra, a key-value store like DynamoDB or Redis, and a graph database are about as alike as a hammer and a paint roller — so the only version of the question that has an answer is whether one specific store fits one specific access pattern of yours. Until you can name that pattern, stay relational.

The second confusion is scale. People reach for NoSQL because they heard it is web scale, as if relational databases tip over at some traffic threshold. They do not. A well-tuned Postgres on decent hardware handles thousands of writes a second and tens of thousands of reads — comfortably more than the vast majority of products ever see. Scale is rarely the real reason to switch, and almost never the reason early.

What overturns the default is the shape of your data and the shape of your queries, never a vibe about growth. Get honest about both, and stay relational until they argue otherwise.

When NoSQL is the right call

When your access pattern is simple and known up front — fetch a session by ID, pull a whole product document, append an event to a time series — a key-value or document store does that one thing with brutal efficiency and scales horizontally without much drama. You hand it a key, it hands you a blob, and adding nodes is a first-class operation rather than a months-long sharding project.

The other genuine win is write throughput at scale. Stores like Cassandra spread writes across many nodes and stay available even when some are unreachable. If you ingest a firehose of sensor data or activity events and read it back in predictable ways, that design fits the problem closely.

Document stores also shine when the data really is a self-contained document with a flexible, sparse shape — a catalog where every category has different attributes — and you almost always read the whole thing at once. No joins, no impedance mismatch. The object you store is the object you use.

When to stick with SQL

If your data is relational and you run varied, ad-hoc queries, stay with SQL. A join will run circles around lookups you hand-roll in application code, and the query planner answers questions you did not anticipate at schema-design time. The flexibility is the feature.

Skip NoSQL too when you need multi-record transactions and strong consistency by default, or when the only reason you are reaching for it is to avoid designing a schema. The schema does not disappear when you drop SQL. It moves into your code, scattered and unenforced, and you rebuild it badly.

What NoSQL actually costs

You give up the join and the ad-hoc query. You model for the queries you know, and a new question often means a new table, a duplicated copy of the data, or a slow scan over everything. The flexibility SQL hands you for free becomes engineering work.

You usually give up rich transactions. Many NoSQL stores offer single-item atomicity, not the multi-row, all-or-nothing transactions SQL gives by default. If an invariant spans records — move money between two accounts and both sides must agree — that guarantee is hard to rebuild on a store that does not provide it.
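To make the money-transfer invariant concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational engine. The accounts table, the transfer function, and the balances are illustrative assumptions, not part of any real system. The credit is applied first on purpose, so that the failed debit shows the rollback undoing work already done.

```python
import sqlite3

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst; both updates commit together or not at all."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)  # both sides change together

overdraft_rejected = False
try:
    transfer(conn, "alice", "bob", 500)  # debit violates the CHECK constraint
except sqlite3.IntegrityError:
    overdraft_rejected = True  # the credit to bob was rolled back as well

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(balances)
```

On a store that offers only single-item atomicity, the two updates would be separate operations, and the application would have to detect and repair the half-applied case itself.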

And you give up the database enforcing your schema. Three years and four developers later, the same field exists in six slightly different shapes across your documents, and nothing stopped it. That cost shows up slowly, which is exactly why it surprises people.
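A small sketch of that drift, under stated assumptions: a plain Python list stands in for a schemaless collection, and an in-memory SQLite table stands in for an enforced schema. The products table, its field names, and the sample records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "sku TEXT PRIMARY KEY, "
    "price_cents INTEGER NOT NULL "
    "CHECK (typeof(price_cents) = 'integer' AND price_cents >= 0))"
)

candidates = [
    {"sku": "A1", "price_cents": 1999},
    {"sku": "B2", "price": "19.99"},     # different field name, string value
    {"sku": "C3", "price_cents": None},  # value missing
]

documents = []  # schemaless stand-in: accepts every shape
rejected = []   # SKUs the relational table refused

for doc in candidates:
    documents.append(doc)  # nothing stops the drift here
    try:
        with conn:
            conn.execute(
                "INSERT INTO products VALUES (?, ?)",
                (doc["sku"], doc.get("price_cents")),
            )
    except sqlite3.IntegrityError:
        rejected.append(doc["sku"])

stored_rows = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(len(documents), stored_rows, rejected)
```

The list keeps all three shapes without complaint, while the table keeps only the row that matches the declared schema. In a schemaless store, that same check has to live in application code, and every writer has to remember to run it.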

The cost math, roughly

The two worlds bill differently, and the shape of the bill matters more than the rates. A relational box is a fixed cost that steps: you pay for the instance whether it is busy or idle, and growth means jumping to the next size, roughly doubling the bill in one move. Managed NoSQL in the DynamoDB mould is metered: you pay per read, per write, per gigabyte stored, so the bill is near zero at idle and grows linearly with traffic, with no instance-size cliff to plan around.

Which shape wins depends on your load curve. Spiky or low traffic favours metered: you stop paying for the quiet hours a provisioned box charges you for. Sustained high throughput flips it: per-request pricing at constant load usually loses to a box you have already paid for, sometimes badly. Denormalisation adds its own line item, too. Every duplicated copy of a value is paid storage and a paid write, so a model that keeps five copies of the data writes, and bills, five times for every change.

The rule: sketch your traffic over a day before you pick. If it is spiky, bursty, or small, metered pricing is cheap insurance and the always-on box is the waste. If it is a steady high plateau, price the provisioned equivalent first, because constant load is exactly where metering stops being a deal.
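The comparison can be roughed out in a few lines. Every number below is a placeholder chosen for easy arithmetic, not a real price from any provider, and the function names are hypothetical. Substitute your own rates and traffic before drawing conclusions.

```python
def metered_cost(operations: float, price_per_million: float, copies: int = 1) -> float:
    """Monthly metered bill. `copies` models denormalised writes:
    each logical write is billed once per stored copy."""
    return operations * copies / 1_000_000 * price_per_million

def break_even_operations(provisioned_monthly: float, price_per_million: float) -> float:
    """Monthly operation count at which metered and provisioned cost the same."""
    return provisioned_monthly / price_per_million * 1_000_000

# Placeholder figures (assumptions, not quotes from any price list).
PROVISIONED_MONTHLY = 400.0  # fixed cost of the always-on instance
PRICE_PER_MILLION = 1.0      # metered price per million operations

spiky = metered_cost(20_000_000, PRICE_PER_MILLION)              # small, bursty load
steady = metered_cost(2_000_000_000, PRICE_PER_MILLION)          # high plateau
fanout = metered_cost(20_000_000, PRICE_PER_MILLION, copies=5)   # five copies per write
threshold = break_even_operations(PROVISIONED_MONTHLY, PRICE_PER_MILLION)

print(spiky, steady, fanout, threshold)
```

With these made-up rates the small workload costs a fraction of the provisioned box, the steady plateau costs several times more, and keeping five copies multiplies the write bill by five. The real decision turns on where your traffic sits relative to the break-even point.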

The trap: modelling NoSQL like SQL

The classic mistake is bringing relational habits to a non-relational store. People normalise into separate collections, then need to join them, so they fetch one collection, loop, and fetch the related items one by one in application code. That is a hand-rolled join — slow, and without any of the optimisation a real query planner applies. You have taken the worst of both worlds.
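The difference is easy to see by counting round trips. The sketch below uses an in-memory SQLite database purely for illustration; the authors and posts tables and the run helper are hypothetical. Against a networked store, each extra query would also pay network latency.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'Engines'), (2, 2, 'Compilers'), (3, 1, 'Notes');
""")

counter = {"queries": 0}

def run(sql, params=()):
    """Execute a query and count it as one round trip."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()

# Hand-rolled join: one query for the posts, then one more per post.
hand_rolled = []
for _post_id, author_id, title in run("SELECT id, author_id, title FROM posts ORDER BY id"):
    [(name,)] = run("SELECT name FROM authors WHERE id = ?", (author_id,))
    hand_rolled.append((title, name))
hand_rolled_queries = counter["queries"]

# Real join: the engine does the matching in a single query.
counter["queries"] = 0
joined = run(
    "SELECT p.title, a.name FROM posts p "
    "JOIN authors a ON a.id = p.author_id ORDER BY p.id"
)
join_queries = counter["queries"]

print(hand_rolled_queries, join_queries)
```

Both approaches return the same rows, but the loop issues one query per post plus one for the list, so its cost grows with the data while the join stays at a single query.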

In NoSQL the queries come first, then the model. You decide exactly how you will read the data, then shape the storage — often denormalised, the same value duplicated in several places — so each read is a single lookup. The denormalisation is the deal: you trade write cost, storage, and the risk of inconsistent copies for fast, simple reads. If you are not willing to make that trade deliberately, you are not ready for the store.
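As a rough illustration of that trade, the sketch below uses plain Python dictionaries as a stand-in for a key-value or document store. The query being designed for ("show one customer's order history with their display name") and all names are assumptions made for the example.

```python
# Two "tables", each keyed for exactly one read path.
customer_profile = {}    # customer_id -> {"name": ...}
orders_by_customer = {}  # customer_id -> list of orders, newest first

def add_customer(customer_id, name):
    customer_profile[customer_id] = {"name": name}
    orders_by_customer[customer_id] = []

def place_order(customer_id, order_id, total_cents):
    # Denormalise on write: copy the display name into the order itself.
    name = customer_profile[customer_id]["name"]
    order = {"order_id": order_id, "total_cents": total_cents, "customer_name": name}
    orders_by_customer[customer_id].insert(0, order)

def order_history(customer_id):
    # The read the model was shaped for: one lookup, no join.
    return orders_by_customer.get(customer_id, [])

def rename_customer(customer_id, new_name):
    """Update every stored copy of the name; returns how many writes it took."""
    customer_profile[customer_id]["name"] = new_name
    writes = 1
    for order in orders_by_customer.get(customer_id, []):
        order["customer_name"] = new_name
        writes += 1
    return writes

add_customer("c1", "Ada")
place_order("c1", "o1", 1200)
place_order("c1", "o2", 800)
writes_for_rename = rename_customer("c1", "Ada L.")
history = order_history("c1")
print(writes_for_rename, [o["order_id"] for o in history])
```

The read is a single lookup, and the rename shows the price: one logical change becomes one write per copy. In a real distributed store those writes are not applied atomically, which is where the risk of inconsistent copies comes from.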

NoSQL vs a relational database

A relational database is the flexible generalist: rich queries, joins, transactions, and a schema the engine enforces. It costs you horizontal write scaling, which is hard, and it assumes one big node can hold the working set — true for most products for a long time. A NoSQL store is the specialist: pick it and one access pattern gets blazing fast and scales sideways cleanly, at the price of joins, ad-hoc queries, and cross-record transactions.

Rule of thumb: default to relational, because it carries you further than the hype implies. Reach for a specific NoSQL store when you can name the pattern it serves better — "we read sessions by key millions of times a day and never query across them," "we ingest events and read them back by time range." A concrete pattern, not a vibe about future growth.

How to choose without regret

Start relational and prove the access pattern before you specialise. When you can point at a real workload that a particular store serves better, add that store for that workload — not for the whole system.

It is rarely all-or-nothing. The mature answer is usually polyglot: Postgres for the core data, Redis in front for hot key-value reads, maybe a document store for one document-shaped corner. Pick the store per workload, justify each by the pattern it serves, and you will rarely look back.
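One common way to wire a key-value store in front of the relational core is a cache-aside read. The sketch below is an assumption-laden illustration, not a production pattern: an in-memory SQLite table stands in for Postgres, a plain dictionary stands in for Redis, and the sessions table and get_session_user function are hypothetical. Expiry and invalidation are deliberately left out.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the relational core
db.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, user_name TEXT NOT NULL)")
db.execute("INSERT INTO sessions VALUES ('s1', 'ada')")
db.commit()

cache = {}              # stand-in for a Redis-like key-value store
db_reads = {"count": 0}

def get_session_user(session_id):
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    if session_id in cache:
        return cache[session_id]
    db_reads["count"] += 1
    row = db.execute(
        "SELECT user_name FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    if row is None:
        return None
    cache[session_id] = row[0]
    return row[0]

first = get_session_user("s1")     # miss: reads the database, fills the cache
second = get_session_user("s1")    # hit: served from the cache
missing = get_session_user("nope") # unknown key: nothing cached
print(first, second, missing, db_reads["count"])
```

The relational store stays the source of truth, and the key-value layer serves only the one hot access pattern it was added for.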