Databases · Updated 2026-06-11

DuckDB vs ClickHouse

Both are vectorised columnar engines and both are absurdly fast on one machine. The split is the deployment model. DuckDB is a library: analytics inside your process, against files, on a laptop or a Lambda, zero servers. ClickHouse is a service: continuous ingestion, materialized rollups, and thousands of concurrent dashboard queries against a cluster that is always on. Pick the shape, not the speed.

DuckDB
In-process analytical SQL: the OLAP engine you import.
Since: 2019 · By: CWI / DuckDB Labs · License: MIT · duckdb.org

ClickHouse
Client-server columnar OLAP built for ingestion and serving.
Since: 2016 · By: ClickHouse Inc. (originally Yandex) · License: Apache 2.0 · clickhouse.com

The DuckDB vs ClickHouse comparison is unusual because the engines agree on almost everything — columns, vectorised execution, SQL — and the products share almost nothing. One is an embedded dependency you version-bump; the other is infrastructure you run, scale, and page someone about. Most teams that frame this as a performance question are actually facing an architecture question.

Quick takes

If you're…

  • Analysts query Parquet files in S3 or on laptops
    DuckDB: it queries Parquet/CSV/Iceberg in place from a notebook. No cluster, no load step.
  • You serve customer-facing dashboards with concurrent users
    ClickHouse: a server with materialized views and sub-second aggregates under concurrency is the job description.
  • Analytics runs inside an application, CLI, or Lambda
    DuckDB: in-process means no network hop and nothing to deploy. DuckDB ships inside your artifact.
  • Events arrive continuously at high volume and must be queryable now
    ClickHouse: MergeTree absorbs streaming inserts (Kafka engine, async inserts) while serving reads.
  • The dataset fits on one good machine and jobs are batch
    DuckDB: a single NVMe box covers hundreds of gigabytes of Parquet; renting a cluster adds cost and ops for nothing.
  • You need petabyte scale with replication and failover
    ClickHouse: distributed tables, replicas via Keeper, and horizontal scale-out are core features.
  • dbt or pipeline transforms on small-to-medium data
    DuckDB: as a local transform engine it is fast, free, and CI-friendly.
  • Observability or clickstream workloads with TTL rollups
    ClickHouse: TTL expressions, incremental materialized views, and compression codecs were built for exactly this.

Decision wizard

A few questions, a verdict.

  Q1. Who runs the queries?
  Q2. Where does the data live?
  Q3. How big, really?
  Q4. Appetite for running a database server?
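
The four questions can be read as a small rule table. The sketch below is one hypothetical encoding, with rules paraphrased from the quick takes; the `Answers` fields, the answer vocabulary, and the `recommend` function are illustrative assumptions, not the logic behind the page's wizard.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    who: str                # Q1: "analysts", "application", or "customers"
    where: str              # Q2: "files" (Parquet/CSV on disk or S3) or "stream"
    fits_one_machine: bool  # Q3: does the dataset fit on one good machine?
    will_run_server: bool   # Q4: is someone willing to own a database server?


def recommend(a: Answers) -> str:
    """Return a pick using rules paraphrased from the quick takes (hypothetical)."""
    # Any one of these makes the job server-shaped.
    needs_server = (
        a.who == "customers"        # concurrent customer-facing dashboards
        or a.where == "stream"      # continuous high-volume ingestion
        or not a.fits_one_machine   # needs horizontal scale-out
    )
    if not needs_server:
        return "DuckDB"
    # Managed hosting removes the operations burden, not the client-server shape.
    return "ClickHouse" if a.will_run_server else "ClickHouse Cloud"


print(recommend(Answers("analysts", "files", True, False)))
print(recommend(Answers("customers", "files", True, True)))
```

Note the asymmetry in the rules: a single server-shaped requirement decides for ClickHouse, while DuckDB wins only when none of them apply.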

At a glance

The scorecard.

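| Dimension | DuckDB | ClickHouse | Edge |
|---|---|---|---|
| Deployment model | In-process library | Client-server system | depends |
| Storage engine | One file + reads lakes in place | MergeTree parts + sparse index | depends |
| Scale ceiling | One machine, used fully | Horizontal, petabyte-proven | ClickHouse |
| Continuous ingestion | Batch loads | Streaming-native | ClickHouse |
| Concurrent serving | Per-process parallelism | High-concurrency serving | ClickHouse |
| Materialized views and rollups | Views; rollups are jobs | Incremental MVs at insert time | ClickHouse |
| Developer ergonomics | Zero-config, notebook-native | Powerful, more knobs | DuckDB |
| Ecosystem and trajectory | Python-stack darling, MIT | Infra standard, fast-moving | tie |
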
In depth

Dimension by dimension.

core

Deployment model

depends
DuckDB

A library. pip install duckdb, import, query — the engine runs inside your Python, Node, Java, or Wasm process, even in the browser. There is no server, no port, no auth layer, because there is no separate thing to connect to.

ClickHouse

A server (or a fleet). Clients connect over HTTP or the native protocol; clusters add Keeper for coordination, replicas, and distributed tables. ClickHouse Cloud removes the hosting but not the client-server shape.
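
For contrast, every ClickHouse query is a request to another process. The sketch below builds such a request for the HTTP interface using only the standard library, without sending it; port 8123 is ClickHouse's default HTTP port, while the host, the `events` table, and the `build_query_request` helper are assumptions for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(sql: str, host: str = "localhost", port: int = 8123,
                        database: str = "default") -> Request:
    """Build (but do not send) a POST request for ClickHouse's HTTP interface."""
    url = f"http://{host}:{port}/?{urlencode({'database': database})}"
    # The SQL statement travels in the request body.
    return Request(url, data=sql.encode("utf-8"), method="POST")


req = build_query_request("SELECT count() FROM events FORMAT JSONEachRow")
print(req.get_method(), req.full_url)

# Sending needs a reachable server, so it is left commented out:
# import urllib.request
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode())
```

Even in this tiny form the client-server shape shows: there is an address to configure, a network hop to take, and a server that must be up.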

core

Storage engine

depends
DuckDB

A single-file columnar format with compression — plus first-class external reads: Parquet, CSV, JSON, Iceberg, Delta, over local disk, HTTP, or S3. For many users DuckDB never "stores" anything; it is a query engine over files they already have.

ClickHouse

MergeTree: inserts land as sorted immutable parts, background merges consolidate, a sparse primary index skips data at scan time. Specialised codecs, TTLs, and projections layer on. One of the most refined storage engines for analytics, and you tune it accordingly.
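
To make the sparse primary index concrete, here is a toy model in plain Python. It is not ClickHouse's implementation: background merges are left out, granules hold 4 rows instead of the default 8192, and `build_part` and `scan_range` are hypothetical names. It only shows why storing one key per granule is enough to skip most of a sorted part.

```python
from bisect import bisect_left, bisect_right

GRANULE = 4  # rows per granule; tiny for illustration (ClickHouse defaults to 8192)


def build_part(rows):
    """Sort rows by key and record the first key of each granule (the sparse index)."""
    data = sorted(rows)  # a part is written once, sorted by key
    marks = [data[i][0] for i in range(0, len(data), GRANULE)]
    return data, marks


def scan_range(part, lo, hi):
    """Return rows with lo <= key <= hi, reading only granules that may hold them."""
    data, marks = part
    # The granule just before the first mark >= lo may still contain lo.
    first = max(bisect_left(marks, lo) - 1, 0)
    # Granules whose first key is greater than hi cannot match.
    last = bisect_right(marks, hi) - 1
    hits, granules_read = [], 0
    for g in range(first, last + 1):
        granules_read += 1
        for key, value in data[g * GRANULE:(g + 1) * GRANULE]:
            if lo <= key <= hi:
                hits.append((key, value))
    return hits, granules_read


part = build_part([(k, f"v{k}") for k in reversed(range(20))])
hits, granules_read = scan_range(part, 9, 10)
print(hits, granules_read)
```

The index holds 5 marks for 20 rows, and the range lookup touches a single granule. The same idea explains why the choice of ORDER BY key matters so much: only filters on a prefix of the sort key can skip data this way.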

core

Scale ceiling

edge: ClickHouse
DuckDB

One process, one machine. Vectorised, multi-threaded, and spills to disk past memory, so a big instance carries you further than people expect — but there is no clustering and none is planned. The ceiling is real.

ClickHouse

Sharding and replication are native; production clusters run to petabytes and trillions of rows. When data outgrows a node, ClickHouse keeps going and DuckDB simply stops being the right tool.

features

Continuous ingestion

edge: ClickHouse
DuckDB

Batch-shaped: load files, append from a pipeline step, rebuild. A single writer per database and no streaming-ingest machinery — fine for hourly jobs, wrong for firehoses.

ClickHouse

Built to drink from the firehose: millions of rows per second per node, a Kafka table engine, async inserts for many small writers, and reads stay fast while writes pour in. This is the workload ClickHouse was invented for at Yandex.
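
The idea behind async inserts, many small writes collected into fewer large ones, can be sketched in plain Python. This is a simplified model, not ClickHouse code: the real feature also flushes on a timeout, which is omitted here, and `AsyncInsertBuffer` is a hypothetical name.

```python
class AsyncInsertBuffer:
    """Collects small inserts from many writers and flushes them as one batch."""

    def __init__(self, flush, max_rows=4):
        self._flush = flush        # callback that writes one batch (one "part")
        self._max_rows = max_rows  # size threshold that triggers a flush
        self._pending = []

    def insert(self, rows):
        self._pending.extend(rows)
        if len(self._pending) >= self._max_rows:
            self.flush()

    def flush(self):
        if self._pending:
            batch, self._pending = self._pending, []
            self._flush(batch)


parts = []
buf = AsyncInsertBuffer(parts.append, max_rows=4)
for i in range(10):
    buf.insert([i])  # ten single-row writers
buf.flush()          # final flush for the remainder
print(parts)
```

Ten one-row inserts become three parts instead of ten, which is what keeps background merges manageable when many small writers are active.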

ops

Concurrent serving

edge: ClickHouse
DuckDB

Concurrency is per-process: your app can run parallel reads, but DuckDB is not a shared server many clients hammer at once. Embedding one engine per worker works; pretending it is a warehouse endpoint does not.

ClickHouse

Designed to serve: thousands of concurrent queries with workload isolation via quotas and settings profiles. Customer-facing analytics — every user slicing their own dashboard — is a flagship use case.

features

Materialized views and rollups

edge: ClickHouse
DuckDB

Standard views only; no incremental materialized views. Rollups are pipeline steps you re-run — idiomatic in a batch world, a gap if you wanted always-fresh aggregates.

ClickHouse

Incremental materialized views compute aggregates at insert time, so the rollup is ready before anyone queries it. With TTL-based downsampling, raw data ages out while summaries stay. A genuinely killer feature for metrics and clickstream.
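
The difference between the two models can be sketched in plain Python. This toy is not ClickHouse code, and the `Table` and `HourlyCount` classes are hypothetical; it shows a rollup maintained from each inserted batch next to the batch-world alternative of rescanning every row.

```python
from collections import defaultdict


class Table:
    """Toy table whose inserts also feed any attached incremental views."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, batch):
        self.rows.extend(batch)
        for view in self.views:
            view.on_insert(batch)  # a view sees only the new batch, never old rows


class HourlyCount:
    """Event counts per (hour, page), maintained at insert time."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_insert(self, batch):
        for ts, page in batch:
            self.counts[(ts // 3600, page)] += 1


events = Table()
rollup = HourlyCount()
events.views.append(rollup)

events.insert([(10, "/home"), (20, "/home"), (30, "/pricing")])
events.insert([(3700, "/home")])

# Batch-world equivalent: recompute the rollup by rescanning every row.
recomputed = defaultdict(int)
for ts, page in events.rows:
    recomputed[(ts // 3600, page)] += 1

print(dict(rollup.counts) == dict(recomputed))
```

Both paths produce the same numbers. The incremental one pays a small cost on every insert, while the batch one pays the full scan every time the rollup is refreshed.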

features

Developer ergonomics

edge: DuckDB
DuckDB

The friendliest SQL in analytics: GROUP BY ALL, SELECT * EXCLUDE, reading a CSV by passing its filename, zero configuration, instant startup. Tight pandas/Polars/Arrow interop makes it the default engine of the Python data stack.

ClickHouse

Rich SQL with hundreds of analytical functions, but also a server to configure and MergeTree decisions to get right (ORDER BY keys, partitioning, merges). clickhouse-local offers a taste of the file-querying workflow without the server.

ecosystem

Ecosystem and trajectory

tie
DuckDB

MIT-licensed with a foundation holding the IP; an extension ecosystem (httpfs, spatial, Iceberg, community extensions) and DuckLake, DuckDB Labs’ SQL-metadata lakehouse format. Adoption rode the Python data wave — it is in everything now.

ClickHouse

Apache 2.0 core with a well-funded company driving fast releases; integrations across Kafka, Grafana, dbt, and most major BI tools; chDB embeds ClickHouse in-process, the company’s explicit answer to DuckDB. Both projects are healthy; neither is a risky bet.

The grey area

The overlap zone: single-node ClickHouse vs DuckDB.

Each project has reached into the other’s territory, which is why the comparison stays interesting.

ClickHouse can play the embedded game: clickhouse-local queries Parquet and CSV files from the command line with no server, and chDB — now developed inside ClickHouse Inc. — packages the whole engine as an in-process Python library, a direct answer to DuckDB’s pip-install ergonomics. From the other side, DuckDB keeps growing upward: larger-than-memory execution, Iceberg and Delta readers, and DuckLake, a lakehouse format that keeps data in Parquet while putting the metadata in a real SQL database. The honest reading: ClickHouse versus DuckDB on a single node is closer to a coin flip on raw speed than either fan base admits, and ClickBench — ClickHouse’s own published benchmark suite — shows both in the top tier depending on query and hardware.

So the tiebreakers in the overlap zone are not speed. They are: who maintains it (a server needs an owner; a library needs a version bump), what the query concurrency looks like (one analyst vs a thousand dashboard sessions), and whether data arrives in batches or as a stream. A single beefy ClickHouse node serving Grafana panels around the clock is doing a job DuckDB structurally cannot; a DuckDB process inside a Lambda crunching one tenant’s Parquet on demand is a job where a ClickHouse server would be pure overhead.

The least-regret pattern we see: DuckDB for exploration, local pipelines, and anything embedded; ClickHouse when the same queries graduate to always-on, multi-user serving. The SQL dialects are close enough that the migration is mostly DDL — which is also why getting this choice wrong early is survivable.

When to pick neither

A different shape of problem.

  • Embedded transactional storage, not analytics
  • Modest analytics riding on the OLTP database you already run
  • chDB / clickhouse-local
    ClickHouse’s engine in DuckDB’s in-process clothing
  • Org-wide warehouse with governance and sharing
  • Apache Pinot / Druid
    Sub-second OLAP on streaming data with strict freshness SLAs
  • Polars
    DataFrame ergonomics instead of SQL for the same single-node jobs
Situational picks

For specific cases.

Data team querying a Parquet lake for internal analysis

DuckDB

Query the lake in place from notebooks and dbt. No cluster bill, no ingestion pipeline, no ops.

Product feature: per-customer analytics dashboards

ClickHouse

Concurrent serving with materialized rollups is the core competency. Embedding DuckDB per request reloads the data on every call and then throws that work away.

Observability: logs, traces, metrics at high volume

ClickHouse

Streaming ingest, brutal compression, TTL downsampling. Many observability vendors build on ClickHouse for a reason.

Analytics inside a CLI, desktop app, or serverless function

DuckDB

In-process is the requirement, and DuckDB is the best in-process analytical engine — with Wasm if it has to run in the browser.

Single-node, always-on analytics service in the overlap zone

ClickHouse

If it serves many clients continuously, the server shape wins even on one box. Revisit DuckDB only if concurrency stays trivial.

You want both: local dev speed, server-grade prod

Either

DuckDB in development and pipelines, ClickHouse in serving. The pairing is common and the SQL gap is small.
