Both are vectorised columnar engines and both are absurdly fast on one machine. The split is the deployment model. DuckDB is a library: analytics inside your process, against files, on a laptop or a Lambda, zero servers. ClickHouse is a service: continuous ingestion, materialized rollups, and thousands of concurrent dashboard queries against a cluster that is always on. Pick the shape, not the speed.
DuckDB
In-process analytical SQL — the OLAP engine you import.
The DuckDB vs ClickHouse comparison is unusual because the engines agree on almost everything — columns, vectorised execution, SQL — and the products share almost nothing. One is an embedded dependency you version-bump; the other is infrastructure you run, scale, and page someone about. Most teams that frame this as a performance question are actually facing an architecture question.
Quick takes
If you're…
Analysts query Parquet files in S3 or on laptops → DuckDB: DuckDB queries Parquet/CSV/Iceberg in place from a notebook. No cluster, no load step.
You serve customer-facing dashboards with concurrent users → ClickHouse: A server with materialized views and sub-second aggregates under concurrency is the job description.
Analytics runs inside an application, CLI, or Lambda → DuckDB: In-process means no network hop and nothing to deploy; DuckDB ships inside your artifact.
Events arrive continuously at high volume and must be queryable now → ClickHouse: MergeTree absorbs streaming inserts (Kafka engine, async inserts) while serving reads.
The dataset fits on one good machine and jobs are batch → DuckDB: A single NVMe box covers hundreds of gigabytes of Parquet; renting a cluster adds cost and ops for nothing.
You need petabyte scale with replication and failover → ClickHouse: Distributed tables, replicas via Keeper, and horizontal scale-out are core features.
dbt or pipeline transforms on small-to-medium data → DuckDB: DuckDB as a local transform engine is fast, free, and CI-friendly.
Observability or clickstream workloads with TTL rollups → ClickHouse: TTL expressions, incremental materialized views, and compression codecs were built for exactly this.
DuckDB
A library. Run pip install duckdb, import it, and query: the engine runs inside your Python, Node, Java, or Wasm process, even in the browser. There is no server, no port, and no auth layer, because there is no separate thing to connect to.
ClickHouse
A server (or a fleet). Clients connect over HTTP or the native protocol; clusters add Keeper for coordination, replicas, and distributed tables. ClickHouse Cloud removes the hosting but not the client-server shape.
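To make the library shape concrete, here is a minimal Python sketch. It assumes the duckdb package is installed; the file name events.csv, its columns, and both helper functions are made up for illustration. The point is only that the query runs inside the calling process, with nothing to connect to.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_csv(directory: Path) -> Path:
    """Write a tiny CSV so the example has something to query (hypothetical data)."""
    path = directory / "events.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "amount"])
        writer.writerows([["eu", 10], ["eu", 20], ["us", 5]])
    return path


def total_by_region(csv_path: Path) -> list[tuple]:
    """Aggregate the CSV inside this process: no server, no port, no credentials."""
    import duckdb  # third-party; imported lazily so the CSV helper works without it

    # Escape single quotes so the path is a valid SQL string literal.
    path_literal = str(csv_path).replace("'", "''")
    con = duckdb.connect()  # in-memory database that lives in this Python process
    try:
        return con.execute(
            "SELECT region, sum(amount) AS total "
            f"FROM read_csv_auto('{path_literal}') "
            "GROUP BY region ORDER BY region"
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(total_by_region(write_sample_csv(Path(tmp))))
```

The ClickHouse equivalent would start by opening a connection to a host and port, which is exactly the difference this row describes.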
core
Storage engine
depends
DuckDB
A single-file columnar format with compression — plus first-class external reads: Parquet, CSV, JSON, Iceberg, Delta, over local disk, HTTP, or S3. For many users DuckDB never "stores" anything; it is a query engine over files they already have.
ClickHouse
MergeTree: inserts land as sorted immutable parts, background merges consolidate, a sparse primary index skips data at scan time. Specialised codecs, TTLs, and projections layer on. One of the most refined storage engines for analytics, and you tune it accordingly.
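The sparse primary index is easier to picture with a toy model. The sketch below is plain Python, not ClickHouse code: it assumes rows are already sorted by key, groups them into tiny "granules" of four rows, and keeps only the first key of each granule. All names and the granule size are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right

GRANULE_ROWS = 4  # toy value chosen for readability; real granules are far larger


def build_marks(sorted_keys: list[int]) -> list[int]:
    """Sparse index: remember only the first key of every granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), GRANULE_ROWS)]


def granules_to_scan(marks: list[int], low: int, high: int) -> range:
    """Granule numbers that might hold keys in [low, high]."""
    # Start one granule before the first mark >= low, because that earlier
    # granule can still end with rows equal to low. This is conservative.
    first = max(bisect_left(marks, low) - 1, 0)
    # Any granule whose first key is greater than high cannot match.
    last = bisect_right(marks, high)
    return range(first, last)


def scan(sorted_keys: list[int], marks: list[int], low: int, high: int):
    """Read only the candidate granules; return (matching keys, rows read)."""
    hits, rows_read = [], 0
    for g in granules_to_scan(marks, low, high):
        granule = sorted_keys[g * GRANULE_ROWS:(g + 1) * GRANULE_ROWS]
        rows_read += len(granule)
        hits.extend(k for k in granule if low <= k <= high)
    return hits, rows_read


if __name__ == "__main__":
    keys = list(range(20))
    print(scan(keys, build_marks(keys), 9, 10))
```

The index stays small because it stores one entry per granule rather than one per row, and a range filter on the sort key touches only a few granules. That is why the choice of ORDER BY key matters so much when tuning a MergeTree table.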
core
Scale ceiling
edge: ClickHouse
DuckDB
One process, one machine. Vectorised, multi-threaded, and spills to disk past memory, so a big instance carries you further than people expect — but there is no clustering and none is planned. The ceiling is real.
ClickHouse
Sharding and replication are native; production clusters run to petabytes and trillions of rows. When data outgrows a node, ClickHouse keeps going and DuckDB simply stops being the right tool.
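As a rough illustration of what sharding buys, the following plain-Python toy routes rows to shards by a hash of a key and answers an aggregate by merging per-shard partial results. It is a conceptual sketch only: the function names, the crc32-based routing, and the in-memory lists standing in for nodes are all assumptions, not how ClickHouse implements distributed tables.

```python
from collections import Counter
from zlib import crc32


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard from a stable hash of the sharding key."""
    return crc32(key.encode("utf-8")) % shard_count


def insert(shards: list[list[tuple[str, int]]], rows: list[tuple[str, int]]) -> None:
    """Route each (user, amount) row to exactly one shard."""
    for user, amount in rows:
        shards[shard_for(user, len(shards))].append((user, amount))


def total_per_user(shards: list[list[tuple[str, int]]]) -> dict[str, int]:
    """Scatter-gather: aggregate on every shard, then merge the partial sums."""
    merged: Counter = Counter()
    for shard in shards:  # on a real cluster each shard would work in parallel
        partial: Counter = Counter()
        for user, amount in shard:
            partial[user] += amount
        merged.update(partial)  # Counter.update adds values for matching keys
    return dict(merged)


if __name__ == "__main__":
    shards = [[] for _ in range(3)]
    insert(shards, [("ann", 5), ("bob", 7), ("ann", 1), ("cy", 2)])
    print(total_per_user(shards))
```

A single-process engine has no equivalent of the outer loop running on other machines, which is the ceiling this row is about.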
features
Continuous ingestion
edge: ClickHouse
DuckDB
Batch-shaped: load files, append from a pipeline step, rebuild. Only one process at a time can open a database file for writing, and there is no streaming-ingest machinery. Fine for hourly jobs, wrong for firehoses.
ClickHouse
Built to drink from the firehose: millions of rows per second per node, a Kafka table engine, async inserts for many small writers, and reads stay fast while writes pour in. This is the workload ClickHouse was invented for at Yandex.
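The idea behind async inserts, many small writes collected into fewer large ones, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class name, the thresholds, and the injectable clock are invented for illustration, and ClickHouse does this buffering on the server rather than in client code like this.

```python
import time
from typing import Callable


class InsertBuffer:
    """Collect small inserts and flush them as one batch by size or by age."""

    def __init__(
        self,
        flush: Callable[[list], None],
        max_rows: int = 1000,
        max_age_seconds: float = 0.2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._max_age = max_age_seconds
        self._clock = clock
        self._pending: list = []
        self._first_at = 0.0

    def insert(self, rows: list) -> None:
        if not self._pending:
            self._first_at = self._clock()  # age is measured from the oldest row
        self._pending.extend(rows)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        """Flush when the buffer is large enough or its oldest row is old enough."""
        if not self._pending:
            return
        too_big = len(self._pending) >= self._max_rows
        too_old = self._clock() - self._first_at >= self._max_age
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        batch, self._pending = self._pending, []
        if batch:
            self._flush(batch)


if __name__ == "__main__":
    buf = InsertBuffer(flush=lambda batch: print(len(batch), "rows written"), max_rows=3)
    for row in range(7):
        buf.insert([row])
    buf.flush()  # write whatever is left
```

Fewer, larger batches mean fewer parts for background merges to consolidate, which is why this pattern suits a MergeTree-style engine fed by many small writers.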
ops
Concurrent serving
edge: ClickHouse
DuckDB
Concurrency is per-process: your app can run parallel reads, but DuckDB is not a shared server many clients hammer at once. Embedding one engine per worker works; pretending it is a warehouse endpoint does not.
ClickHouse
Designed to serve: thousands of concurrent queries with workload isolation via quotas and settings profiles. Customer-facing analytics — every user slicing their own dashboard — is a flagship use case.
features
Materialized views and rollups
edge: ClickHouse
DuckDB
Standard views only; no incremental materialized views. Rollups are pipeline steps you re-run — idiomatic in a batch world, a gap if you wanted always-fresh aggregates.
ClickHouse
Incremental materialized views compute aggregates at insert time, so the rollup is ready before anyone queries it. With TTL-based downsampling, raw data ages out while summaries stay. A genuinely killer feature for metrics and clickstream.
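A small Python model may help show what "computed at insert time" means. It is a conceptual sketch, not ClickHouse syntax or internals: the class, the hourly page-count rollup, and the expire_raw method (standing in for a TTL) are hypothetical names chosen for this example.

```python
from collections import defaultdict


class EventsTable:
    """Raw events plus a rollup that is updated as part of every insert."""

    def __init__(self) -> None:
        self.rows: list[tuple[int, str]] = []  # (unix_timestamp, page)
        self.hourly_counts: dict[tuple[int, str], int] = defaultdict(int)

    def insert(self, block: list[tuple[int, str]]) -> None:
        self.rows.extend(block)
        # The "materialized view": aggregate only the newly inserted block
        # and fold it into the existing rollup. Old rows are never rescanned.
        for ts, page in block:
            hour_start = ts - ts % 3600
            self.hourly_counts[(hour_start, page)] += 1

    def expire_raw(self, older_than: int) -> None:
        """Stand-in for a TTL: drop old raw rows, keep the summaries."""
        self.rows = [row for row in self.rows if row[0] >= older_than]


if __name__ == "__main__":
    table = EventsTable()
    table.insert([(3600, "/home"), (3650, "/home"), (7300, "/docs")])
    table.insert([(7201, "/docs")])
    table.expire_raw(older_than=7200)
    print(len(table.rows), dict(table.hourly_counts))
```

Reading the rollup costs a dictionary lookup regardless of how many raw rows were ever inserted. In a batch engine the same result comes from re-running an aggregation step, which is the trade-off this row describes.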
features
Developer ergonomics
edge: DuckDB
DuckDB
The friendliest SQL in analytics: GROUP BY ALL, SELECT * EXCLUDE, reading a CSV by passing its filename, zero configuration, instant startup. Tight pandas/Polars/Arrow interop makes it the default engine of the Python data stack.
ClickHouse
Rich SQL with hundreds of analytical functions, but also a server to configure and MergeTree decisions to get right (ORDER BY keys, partitioning, merges). clickhouse-local offers a taste of the file-querying workflow without the server.
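Two of the DuckDB conveniences named above, GROUP BY ALL and SELECT * EXCLUDE, look like this when driven from Python. The sketch assumes the duckdb package is installed; the sales table, its columns, and the function name are invented for the example.

```python
# GROUP BY ALL groups by every non-aggregated column in the SELECT list,
# so region and product do not have to be repeated.
GROUP_BY_ALL_SQL = """
    SELECT region, product, sum(amount) AS total
    FROM sales
    GROUP BY ALL
    ORDER BY region, product
"""

# EXCLUDE returns every column except the ones listed.
EXCLUDE_SQL = """
    SELECT * EXCLUDE (id)
    FROM sales
    ORDER BY region, amount
"""


def run_demo() -> tuple[list, list]:
    """Create a small in-memory table and run both queries against it."""
    import duckdb  # third-party; imported lazily so the SQL above stays importable

    con = duckdb.connect()
    try:
        con.execute(
            "CREATE TABLE sales (id INTEGER, region VARCHAR, product VARCHAR, amount INTEGER)"
        )
        con.execute(
            "INSERT INTO sales VALUES (1, 'eu', 'a', 10), (2, 'eu', 'a', 5), (3, 'us', 'b', 7)"
        )
        grouped = con.execute(GROUP_BY_ALL_SQL).fetchall()
        trimmed = con.execute(EXCLUDE_SQL).fetchall()
        return grouped, trimmed
    finally:
        con.close()


if __name__ == "__main__":
    for result in run_demo():
        print(result)
```

Neither feature changes what SQL can express; both remove typing and the class of bugs where a GROUP BY list drifts out of sync with the SELECT list.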
ecosystem
Ecosystem and trajectory
tie
DuckDB
MIT-licensed with a foundation holding the IP; an extension ecosystem (httpfs, spatial, Iceberg, community extensions) and DuckLake, DuckDB Labs’ SQL-metadata lakehouse format. Adoption rode the Python data wave — it is in everything now.
ClickHouse
Apache 2.0 core with a well-funded company driving fast releases; integrations across Kafka, Grafana, dbt, and every BI tool; chDB embeds ClickHouse in-process — the company’s explicit answer to DuckDB. Both projects are healthy; neither is a risky bet.
The grey area
The overlap zone: single-node ClickHouse vs DuckDB.
Each project has reached into the other’s territory, which is why the comparison stays interesting.
ClickHouse can play the embedded game: clickhouse-local queries Parquet and CSV files from the command line with no server, and chDB — now developed inside ClickHouse Inc. — packages the whole engine as an in-process Python library, a direct answer to DuckDB’s pip-install ergonomics. From the other side, DuckDB keeps growing upward: larger-than-memory execution, Iceberg and Delta readers, and DuckLake, a lakehouse format that keeps data in Parquet while putting the metadata in a real SQL database. The honest reading: ClickHouse versus DuckDB on a single node is closer to a coin flip on raw speed than either fan base admits, and ClickBench — ClickHouse’s own published benchmark suite — shows both in the top tier depending on query and hardware.
So the tiebreakers in the overlap zone are not speed. They are: who maintains it (a server needs an owner; a library needs a version bump), what the query concurrency looks like (one analyst vs a thousand dashboard sessions), and whether data arrives in batches or as a stream. A single beefy ClickHouse node serving Grafana panels around the clock is doing a job DuckDB structurally cannot; a DuckDB process inside a Lambda crunching one tenant’s Parquet on demand is a job where a ClickHouse server would be pure overhead.
The least-regret pattern we see: DuckDB for exploration, local pipelines, and anything embedded; ClickHouse when the same queries graduate to always-on, multi-user serving. The SQL dialects are close enough that the migration is mostly DDL — which is also why getting this choice wrong early is survivable.