Updated 7 Jun 2026 · 5 min read

DuckDB: the single-node engine eating the warehouse

Most companies' data is not big enough to justify a distributed warehouse. A single fat box running DuckDB reads Parquet and Iceberg off S3 directly and answers the median analytics query in under a second, for a fixed bill and no cold start. The big-data era was mostly oversizing.

AI-assisted post. Drafted with help from Claude, edited and fact-checked by Mart.
[Image: a rubber duck in front of a laptop full of code]

The small thing quietly doing serious work. Photo: Jakub T. Jankiewicz, CC BY-SA 4.0.

Your data is not as big as your warehouse bill thinks it is

Here is the uncomfortable claim that the distributed-warehouse industry depends on you never examining: the median analytics workload does not need a cluster. It needs one big machine and a good columnar engine. A generation of data teams runs Spark or Snowflake over datasets that would fit on a laptop SSD, not because the data is big, but because "big data" was the dominant procurement story for a decade and nobody got fired for buying a cluster. DuckDB is the engine that makes the alternative embarrassingly easy to demonstrate. It is an in-process OLAP database ("SQLite for analytics" is the standard line, and it is accurate) that reached its stability-focused 1.0 release on 3 June 2024, reads Parquet and Iceberg straight off object storage, and answers most queries before you have finished tabbing away from the terminal.

What DuckDB actually is

DuckDB was built by Mark Raasveldt and Hannes Mühleisen at CWI, the Dutch national research institute for mathematics and computer science, and first released in 2019. The defining design choices are all about getting analytical work done inside a single process with no server, no cluster, and no network coordination:

  • In-process. It runs as a library inside your Python, R, Java, or CLI process — there is no daemon to provision and no cold start, the recurring tax I wrote about for serverless warehouses. Importing the library is the startup.
  • Vectorized execution. It processes data in batches of 2,048 values at a time rather than row by row, which is the single biggest reason a single core does so much work per second.
  • Reads object storage directly. It queries remote Parquet and Iceberg files on S3 without importing them first, with roughly a 7% overhead versus local disk, which is the cost of HTTP and S3 protocol parsing. This is the same decoupled storage-and-compute model the lakehouse sells, minus the cluster.
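
To make the in-process and direct-read points concrete, here is a minimal sketch using the DuckDB Python package. The bucket path and the event_date and event_type columns are hypothetical, and the sketch assumes the duckdb package is installed and that the bucket is public or that S3 credentials are already configured for the process.

```python
from typing import Any


def build_scan_sql(parquet_glob: str, since: str) -> str:
    """Build an aggregate over remote Parquet files.

    The column names (event_date, event_type) are hypothetical.
    """
    # Escape single quotes so the literals stay valid SQL. Inputs are
    # assumed to come from trusted config, not from end users.
    path = parquet_glob.replace("'", "''")
    cutoff = since.replace("'", "''")
    return (
        "SELECT event_type, count(*) AS n "
        f"FROM read_parquet('{path}') "
        f"WHERE event_date >= DATE '{cutoff}' "
        "GROUP BY event_type ORDER BY n DESC"
    )


def run_scan(parquet_glob: str, since: str) -> list[tuple[Any, ...]]:
    """Run the query inside the current Python process."""
    # Imported lazily so this module still loads where duckdb is absent.
    import duckdb

    con = duckdb.connect()  # in-memory database, no server to start
    con.execute("INSTALL httpfs")  # extension for s3:// and https:// reads
    con.execute("LOAD httpfs")
    return con.execute(build_scan_sql(parquet_glob, since)).fetchall()
```

A call such as run_scan("s3://example-bucket/events/*.parquet", "2026-01-01") would return a list of (event_type, n) tuples. The date filter and the two-column select are what give the engine room to skip row groups and columns it does not need.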

The benchmark headline is that DuckDB's in-memory variant reached #1 on ClickBench in October 2025 and remains the top open-source system on hot runs. ClickBench is a single-table analytics benchmark, so it flatters single-node engines — DataFusion, chDB, and ClickHouse all trade the top spots there too — but the point stands: a single box is now genuinely the fastest place to run a large class of analytical queries.

The "big data is dead" thesis, stated plainly

The intellectual backbone of all this belongs to Jordan Tigani, a founding engineer on BigQuery who later co-founded MotherDuck, and his essay "Big Data is Dead". The argument is not that large datasets do not exist (they do) but that big compute is no longer relevant for the overwhelming majority of workloads, for two reasons. First, most organisations simply do not have much data: the essay reports industry feedback that 100 GB was the right order of magnitude for a data warehouse, and a survey of portfolio companies by one of MotherDuck's investors in which the largest B2B companies sat around 1 TB and the largest B2C companies around 10 TB. Second, even when the dataset is large, queries almost always touch only a small hot slice of it (recent data, a few partitions), so the working set is tiny even when the archive is not.

Put those two facts together and the distributed warehouse is solving a problem most teams do not have. A modern cloud box can carry hundreds of gigabytes of RAM and dozens of cores; a 1 TB Parquet dataset with predicate and projection pushdown rarely scans more than a few percent of itself. The cluster exists to parallelise a scan that, for the median workload, never needed parallelising.
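
The "few percent" figure is easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes, unrealistically, that every column and every partition is the same size, and the column and partition counts are made-up illustrations rather than measurements.

```python
def bytes_scanned(total_bytes: int, cols_read: int, cols_total: int,
                  parts_read: int, parts_total: int) -> int:
    """Estimate bytes read after projection and predicate pushdown.

    Assumes uniform column widths and partition sizes, so treat the
    result as an order-of-magnitude figure only.
    """
    if cols_total <= 0 or parts_total <= 0:
        raise ValueError("totals must be positive")
    if not (0 <= cols_read <= cols_total and 0 <= parts_read <= parts_total):
        raise ValueError("read counts must be between 0 and the totals")
    # Projection keeps cols_read/cols_total of the bytes; partition
    # pruning keeps parts_read/parts_total of what is left.
    return total_bytes * cols_read * parts_read // (cols_total * parts_total)


TB = 10**12
# Hypothetical query: 4 of 20 columns, 3 of 30 daily partitions.
scanned = bytes_scanned(TB, cols_read=4, cols_total=20,
                        parts_read=3, parts_total=30)
print(f"{scanned / 10**9:.0f} GB scanned, "
      f"{100 * scanned / TB:.0f}% of the dataset")
# prints "20 GB scanned, 2% of the dataset"
```

Under those assumptions a 1 TB table turns into a 20 GB scan, which is well within the memory of a single large machine.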

The cost and latency arithmetic

This is where the single-node case stops being a research curiosity and starts being a finance decision. I worked through the serverless-versus-always-on math in detail in The cold-start tax, and DuckDB sits at the cheap, fast corner of that same trade. A serverless warehouse charges you for provisioning time, queue time, and idle-timeout time, most of which is not query execution. DuckDB has none of those line items because there is no warehouse — the engine is already in your process.

                        | Distributed warehouse (serverless)                    | Single fat box + DuckDB
------------------------+-------------------------------------------------------+-----------------------------------
Cold start              | Seconds to minutes per cold query                     | None (in-process)
Billing model           | Per-DBU/credit, charged from provision to idle-timeout | Fixed EC2/VM hourly, or your laptop
Coordination overhead   | Distributed planner, shuffle, network                 | None (single process)
Median-workload latency | Sub-second once warm, after the wait                  | Sub-second, no wait
Operational burden      | Managed, but cost is per-query and unpredictable      | Run one box; back up the Parquet

For a sub-TB dataset queried on any kind of regular cadence, a single always-on box wins on cost, wins on latency, and wins on bill predictability. The distributed warehouse only pulls ahead when the working set genuinely exceeds one machine's memory and the query rate is high enough to keep a cluster busy — which is a real workload, just a much rarer one than the procurement narrative implies.
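
The billing line items described above can be put into a small model. This is a rough sketch, not any vendor's actual billing logic: it assumes every query arrives cold and that each one is billed for provisioning, queueing, execution, and a full idle timeout. All the timings in the example are hypothetical placeholders.

```python
SECONDS_PER_DAY = 86_400


def billed_seconds_per_day(cold_queries: int, provision_s: int, queue_s: int,
                           exec_s: int, idle_timeout_s: int) -> int:
    """Seconds billed per day if every query starts a cold warehouse."""
    per_query = provision_s + queue_s + exec_s + idle_timeout_s
    # Crude cap: a warehouse cannot bill more than 24 hours in a day.
    return min(cold_queries * per_query, SECONDS_PER_DAY)


def execution_share(cold_queries: int, exec_s: int, billed_s: int) -> float:
    """Fraction of billed time that was actual query execution."""
    return (cold_queries * exec_s) / billed_s if billed_s else 0.0


def breakeven_price_ratio(billed_s: int) -> float:
    """Box-to-warehouse hourly price ratio at which costs are equal.

    An always-on box bills 24 hours a day, so it is cheaper whenever its
    hourly price is below this fraction of the warehouse's hourly price.
    """
    return billed_s / SECONDS_PER_DAY


# Hypothetical day: 50 cold queries, 30 s provision, 5 s queue,
# 1 s execution, 600 s idle timeout.
billed = billed_seconds_per_day(50, 30, 5, 1, 600)
share = execution_share(50, 1, billed)
ratio = breakeven_price_ratio(billed)
print(f"billed {billed / 3600:.2f} h/day, {share:.2%} of it executing, "
      f"break-even price ratio {ratio:.2f}")
```

With these placeholder timings almost all of the billed time is waiting rather than executing, and the always-on box comes out cheaper as long as its hourly price is under roughly a third of the warehouse's. Plug in your own query counts and timeouts before drawing conclusions.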

MotherDuck, and where the single-node story gets nuanced

The honest counterpoint to "just run DuckDB on a box" is that a single process is a single point of failure, a single tenant, and a single set of credentials, none of which a team wants to manage by hand at scale. MotherDuck is the managed-DuckDB answer: a hosted service co-founded by Tigani that runs DuckDB in the cloud, adds multi-tenancy, persistence, and sharing, and does hybrid execution — splitting a query between the cloud and the DuckDB instance in your local process. That hybrid model is the genuinely new idea. It means the engine can run the scan close to the data in the cloud and the final aggregation on your laptop, which collapses the egress and the round-trip latency at once.

It is also a reminder that "single-node" is an engine property, not a deployment constraint. You can run DuckDB embedded in a Lambda, in a notebook, in a dashboard backend, or as the query engine behind a DuckLake catalog where the metadata lives in Postgres and the data lives in Parquet on S3. The engine is small; the deployments are not limited to small.

When you still need the cluster

I am not arguing nobody needs Spark or Snowflake. I am arguing the threshold is far higher than most teams assume. The distributed warehouse earns its keep when:

  • The working set — not the archive, the working set — genuinely exceeds the memory of the largest single box you can rent, and you cannot partition the query to avoid that.
  • You need hundreds of concurrent users hitting the same compute, which is a multi-tenancy problem a single in-process engine does not solve on its own.
  • You are doing heavy distributed shuffles — large multi-table joins across enormous fact tables — where the parallelism is real work, not coordination overhead.

Outside those cases, reaching for a cluster is paying for elasticity you will not use and a planner you do not need. The tell is simple: look at the actual bytes scanned by your typical query, not the total size of your lake. If the answer is "a few gigabytes," you have been renting a distributed system to scan a working set that fits in a single machine's page cache.
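
One way to run that check is to pull bytes scanned per query from your warehouse's query history and summarise it against the memory of a single machine. The sketch below uses a made-up ten-query log and a hypothetical 256 GB box; how you export the real numbers depends on your warehouse.

```python
import math
import statistics


def scan_profile(bytes_per_query: list[int], box_memory_bytes: int) -> dict:
    """Summarise per-query bytes scanned against one machine's memory."""
    if not bytes_per_query:
        raise ValueError("need at least one query in the log")
    ordered = sorted(bytes_per_query)
    # Nearest-rank 95th percentile: smallest value with at least 95% of
    # observations at or below it.
    p95 = ordered[math.ceil(0.95 * len(ordered)) - 1]
    fitting = sum(1 for b in ordered if b <= box_memory_bytes)
    return {
        "median": statistics.median(ordered),
        "p95": p95,
        "fraction_fitting": fitting / len(ordered),
    }


GB = 10**9
# Hypothetical log: mostly small scans, one large backfill.
log = [g * GB for g in (2, 3, 1, 4, 2, 5, 3, 2, 40, 600)]
profile = scan_profile(log, box_memory_bytes=256 * GB)
print(f"median {profile['median'] / GB:.0f} GB, "
      f"p95 {profile['p95'] / GB:.0f} GB, "
      f"{profile['fraction_fitting']:.0%} of queries fit in memory")
```

In this invented log the median query scans 3 GB and nine in ten queries fit on the box, while a single 600 GB backfill dominates the tail. That is the shape to look for: size the engine for the median and handle the outlier separately.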

A short close

DuckDB is winning not because it is the most powerful engine — by raw distributed throughput it obviously is not — but because it correctly diagnosed that most workloads never needed distribution in the first place. The big-data era oversized a generation of data stacks against a problem the median company does not have. A single fat box running a vectorized columnar engine, reading Parquet and Iceberg straight off S3, answers the typical analytics query faster and cheaper than the cluster that was sold to replace it, with a fixed bill and no cold start. The contrarian move in 2026 is not buying more compute. It is measuring how little of it you actually use, and then buying one good box.
