DuckLake: metadata belongs in a database, not a pile of files

A duck, on a lake. Photo: Mihael Grmek, CC BY-SA 3.0.

The lakehouse rebuilt a database in object storage, badly

Here is the joke at the centre of the modern lakehouse¹, and once you see it you cannot unsee it. Apache Iceberg² and Delta Lake³ exist to give object storage the things a database gives you for free — atomic commits, snapshots, schema evolution, time travel. They deliver those by writing a tree of metadata files into S3: a root file, a manifest list, manifest files, all in JSON and Avro⁴. In other words, they reimplemented the transactional bookkeeping of a database engine as a pile of immutable files on a system that has no transactions. Then, because that pile cannot safely coordinate concurrent writers on its own, production deployments put a real database in front of it — a catalog like Polaris or Unity — to hand out the atomic commit. You end up running a database to coordinate the files that exist to avoid running a database.

🔗 Learn more — ¹ What is a data lakehouse?

🔗 Learn more — ² How Apache Iceberg actually works

🔗 Learn more — ³ What is Delta Lake (and how does it compare to Iceberg)?

🔗 Learn more — ⁴ What is Apache Avro (and how is it different from Parquet)?

DuckLake's argument, published as a manifesto by the DuckDB team in May 2025, is that this is exactly backwards. If you already need a SQL database in the loop to make the thing correct, stop fighting it: put all the metadata in the database, keep only the bulk Parquet⁵ data in object storage, and delete the file-based metadata layer entirely. It is the kind of idea that is obvious in retrospect and mildly heretical in the moment, because it tells two enormous open-source ecosystems that the hard part of their design was unnecessary.

🔗 Learn more — ⁵ How Parquet works: columnar storage explained

What is actually wrong with metadata-in-files

The DuckLake authors — Mark Raasveldt and Hannes Mühleisen, the same CWI pair behind DuckDB itself — make a specific technical complaint, not a vibe. In Iceberg, every root metadata file contains all existing snapshots complete with schema information, and for every single change a new file is written that contains the complete history. Manifests get batched into two-layer structures specifically to avoid writing or reading too many small files, "something that would not be efficient on blob stores". And making small changes to data is "a largely unsolved problem that requires complex cleanup procedures that are still not very well understood".
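To put a rough shape on the "every change rewrites the complete history" complaint, here is a toy model in Python. It is a sketch under loud assumptions: the 200-byte snapshot entry is a made-up number, the function names are mine, and it ignores snapshot expiry and compression, both of which reduce the real cost. The point is only the growth curve: rewriting the full history on every commit is quadratic in total bytes written, while appending one row per commit is linear.

```python
def file_based_bytes(commits: int, snapshot_entry_bytes: int) -> int:
    """Total metadata bytes written if commit k rewrites a root file listing all k snapshots."""
    return sum(k * snapshot_entry_bytes for k in range(1, commits + 1))


def row_based_bytes(commits: int, snapshot_entry_bytes: int) -> int:
    """Total metadata bytes written if each commit appends exactly one snapshot row."""
    return commits * snapshot_entry_bytes


# Hypothetical entry size; only the ratio between the two models matters here.
ENTRY_BYTES = 200

for n in (10, 1_000, 100_000):
    f = file_based_bytes(n, ENTRY_BYTES)
    r = row_based_bytes(n, ENTRY_BYTES)
    print(f"{n:>7} commits: file-based {f:,} B, row-based {r:,} B, ratio {f / r:.1f}x")
```

Under this model the ratio is (n + 1) / 2, so the gap widens with every commit a table receives.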

That last point is the one that bites in production. A small update means new data files plus new metadata files plus, eventually, a compaction job to clean up the litter — which is the exact small-files-in-object-storage cost structure I picked apart in The hidden cost of a lakehouse on S3, where reading a million tiny objects costs orders of magnitude more than the bytes are worth. A relational table does not have this problem. An UPDATE is an UPDATE. There is no manifest to rewrite, no snapshot file to append, no orphaned-file sweep to schedule.

The DuckLake design, in one sentence

Put the catalog and the table metadata in a SQL database; put the data in Parquet on object storage; coordinate everything with ordinary SQL transactions. That is the whole thing.

flowchart TB
  subgraph iceberg["Iceberg / Delta"]
    cat1["Catalog DB<br/>(Polaris / Unity)"] --> meta["metadata.json<br/>manifest list<br/>manifest files<br/>(JSON/Avro in S3)"]
    meta --> data1["Parquet data files (S3)"]
  end
  subgraph ducklake["DuckLake"]
    cat2["SQL database<br/>(catalog + ALL metadata)"] --> data2["Parquet data files (S3)"]
  end

The metadata is defined as a set of relational tables and pure-SQL transactions, and the catalog can be any system that speaks SQL and supports primary keys: the reference implementation can use SQLite, PostgreSQL, or DuckDB itself as the catalog database. Because the metadata is just rows in a database, you get multi-table transactions, schema evolution, and time travel from the database's existing machinery instead of reinventing them in a file tree. The Parquet data stays in an open format in your bucket, so you have not traded away the one genuinely good property of the lakehouse: cheap, open, vendor-neutral bulk storage.
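A minimal sketch of what "metadata as rows" can look like, using Python's built-in sqlite3 module as the catalog. The table names, column names, and bucket paths below are illustrative assumptions of mine, not the schema from the DuckLake spec. The sketch shows two of the properties named above: a commit that touches two tables atomically, and time travel as an ordinary query over snapshot ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, heavily simplified catalog schema (not the real DuckLake tables).
conn.executescript("""
CREATE TABLE snapshot (
    snapshot_id  INTEGER PRIMARY KEY,
    committed_at TEXT NOT NULL
);
CREATE TABLE data_file (
    path           TEXT PRIMARY KEY,
    table_name     TEXT NOT NULL,
    begin_snapshot INTEGER NOT NULL,
    end_snapshot   INTEGER          -- NULL while the file is still live
);
""")


def commit(conn, snapshot_id, added=(), removed=()):
    """Record one snapshot and all of its file changes in a single SQL transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        conn.execute(
            "INSERT INTO snapshot VALUES (?, datetime('now'))", (snapshot_id,)
        )
        conn.executemany(
            "INSERT INTO data_file (path, table_name, begin_snapshot) VALUES (?, ?, ?)",
            [(path, table, snapshot_id) for table, path in added],
        )
        conn.executemany(
            "UPDATE data_file SET end_snapshot = ? WHERE path = ?",
            [(snapshot_id, path) for path in removed],
        )


def files_at(conn, table, snapshot_id):
    """Time travel: list the Parquet files visible to `table` as of `snapshot_id`."""
    rows = conn.execute(
        """SELECT path FROM data_file
           WHERE table_name = ?
             AND begin_snapshot <= ?
             AND (end_snapshot IS NULL OR end_snapshot > ?)
           ORDER BY path""",
        (table, snapshot_id, snapshot_id),
    )
    return [r[0] for r in rows]


# Snapshot 1 adds one file; snapshot 2 rewrites it and touches a second table atomically.
commit(conn, 1, added=[("orders", "s3://bucket/orders/a.parquet")])
commit(
    conn,
    2,
    added=[
        ("orders", "s3://bucket/orders/b.parquet"),
        ("customers", "s3://bucket/customers/a.parquet"),
    ],
    removed=["s3://bucket/orders/a.parquet"],
)

print(files_at(conn, "orders", 1))
print(files_at(conn, "orders", 2))
```

Nothing is deleted on a rewrite: the old file's row simply gets an end snapshot, which is why querying an older snapshot still finds it.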

Why this is elegant, and why it is rude

It is elegant because it collapses two systems into one. Iceberg-in-production is always files plus a catalog database; DuckLake is just the database, with the files demoted to dumb data storage. Every metadata operation that was a sequence of S3 round trips (resolve the snapshot, read the manifest list, read the manifests) becomes a single SQL query against an indexed table. The bring-your-own-compute model Mühleisen describes follows naturally: the database holds "a centralized, unified view of your data and then you scale out compute by basically pushing that all into the clients." The metadata is small and lives in a fast indexed store; the compute is whatever engine you point at it.
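The round-trip difference can be made concrete with a toy comparison. This is a sketch, not a benchmark: the "object store" is a Python dict with a GET counter, the object names are a made-up simplification of a file-based metadata tree, and real engines cache and parallelise these reads. It only counts how many dependent lookups each approach needs to produce the same list of data files.

```python
import sqlite3


class CountingStore:
    """A dict pretending to be an object store; counts every GET."""

    def __init__(self, objects):
        self.objects = objects
        self.gets = 0

    def get(self, key):
        self.gets += 1
        return self.objects[key]


# Hypothetical file-based metadata tree: root -> manifest list -> manifests -> data files.
store = CountingStore({
    "metadata.json": {"current_manifest_list": "snap-2.list"},
    "snap-2.list": ["manifest-a", "manifest-b", "manifest-c"],
    "manifest-a": ["data/1.parquet"],
    "manifest-b": ["data/2.parquet", "data/3.parquet"],
    "manifest-c": ["data/4.parquet"],
})


def plan_from_files(store):
    """Walk the tree; each level must be fetched before the next is known."""
    root = store.get("metadata.json")
    manifest_list = store.get(root["current_manifest_list"])
    files = []
    for manifest in manifest_list:
        files.extend(store.get(manifest))
    return sorted(files)


# The same information as rows in an indexed table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_file (snapshot_id INTEGER, path TEXT)")
conn.execute("CREATE INDEX data_file_by_snapshot ON data_file (snapshot_id)")
conn.executemany(
    "INSERT INTO data_file VALUES (2, ?)",
    [(f"data/{i}.parquet",) for i in range(1, 5)],
)


def plan_from_sql(conn, snapshot_id):
    """One query returns the full file list for a snapshot."""
    rows = conn.execute(
        "SELECT path FROM data_file WHERE snapshot_id = ? ORDER BY path",
        (snapshot_id,),
    )
    return [r[0] for r in rows]


file_plan = plan_from_files(store)
sql_plan = plan_from_sql(conn, 2)
print(f"{store.gets} object-store GETs vs 1 SQL query, same plan: {file_plan == sql_plan}")
```

With three manifests the file walk needs five GETs, and the count grows with the number of manifests, while the SQL side stays at one query.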

It is rude because the whole premise is a quiet indictment of the catalog wars that Snowflake's Polaris and Databricks⁶' Unity Catalog have been fighting. DuckLake's position is essentially that you have all been building elaborate REST catalogs to coordinate a file-based metadata layer that should not have existed. Jordan Tigani made the point sharply: production lakehouses already require a catalog database like Polaris or Unity, so Iceberg's constraint-driven file design just adds complexity on top of a database you were going to run anyway. When the people who built the dominant single-node engine tell the lakehouse establishment that the hard-won metadata format was a detour, that lands as a provocation, however politely it is phrased.

🔗 Learn more — ⁶ What is Databricks?

The trade-offs are real, not hidden

I am a practitioner, not a fan, so the honest objections matter:

The database is now a dependency and a scaling axis. Iceberg's file-based metadata is annoying but it inherits object storage's near-infinite durability and scale. A Postgres catalog is one more thing to size, back up, fail over, and keep available. For very high write concurrency or truly enormous metadata, the catalog database becomes the bottleneck the file design was trying to avoid.
Ecosystem maturity. Iceberg has every major engine reading and writing it. DuckLake is younger. The mitigation is that the team added Iceberg compatibility, so you are not fully cut off from the Iceberg world.
Vendor-neutrality optics. A format whose reference implementation is most natural with DuckDB invites the question of whether it is a standard or a product feature. The spec is open and the catalog can be plain Postgres, which is the strongest argument that it is the former.
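On the first objection, it helps to see mechanically why every writer ends up funnelling through the catalog database. The following is a minimal sketch of an optimistic commit, assuming each writer claims the next snapshot id and the primary key turns a race into an error the loser retries. The table, the function names, and the retry scheme are my own illustration, not DuckLake's actual commit protocol.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE snapshot (snapshot_id INTEGER PRIMARY KEY, writer TEXT NOT NULL)"
)
conn.execute("INSERT INTO snapshot VALUES (1, 'bootstrap')")
conn.commit()


def latest(conn):
    """Return the newest committed snapshot id."""
    return conn.execute("SELECT MAX(snapshot_id) FROM snapshot").fetchone()[0]


def try_commit(conn, based_on, writer):
    """Claim snapshot `based_on + 1`; return False if another writer got there first."""
    try:
        with conn:  # rolls back automatically if the insert fails
            conn.execute(
                "INSERT INTO snapshot VALUES (?, ?)", (based_on + 1, writer)
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a duplicate snapshot id: we lost the race.
        return False


# Both writers read the same starting point before either commits.
seen_by_a = latest(conn)
seen_by_b = latest(conn)

a_ok = try_commit(conn, seen_by_a, "writer-a")            # claims snapshot 2
b_ok = try_commit(conn, seen_by_b, "writer-b")            # collides on snapshot 2
b_retry_ok = try_commit(conn, latest(conn), "writer-b")   # re-reads, claims snapshot 3

print(a_ok, b_ok, b_retry_ok, latest(conn))
```

The correctness comes cheaply from the primary key, but every commit from every writer lands on this one table, which is exactly where very high write concurrency would start to hurt.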

The catalog-as-metadata insight is older than it looks

Worth saying plainly, because it makes DuckLake feel less like a novelty and more like a return to form: a SQL database storing table metadata is not a new idea, it is the original idea. Every Hive metastore⁷ was backed by a relational database⁸ holding table definitions. Every traditional warehouse keeps its catalog in system tables you can query with SQL. What the lakehouse did was take that metastore, decide it was a scaling bottleneck, and try to push the metadata out into the object store alongside the data, which solved a scale problem most tables never have and created the manifest-file complexity in exchange. DuckLake's contribution is to notice that the working set of metadata is small, indexable, and transactional by nature, which is precisely what relational databases are best at, and that the scale argument for file-based metadata was solving for a tail case at the expense of the common one. The same instinct runs through the single-node engine argument: most workloads are smaller than the architecture assumes, and the architecture should fit the median, not the outlier.

🔗 Learn more — ⁷ What is the Hive metastore?

🔗 Learn more — ⁸ What is a database?

Adoption as of 2026

This is no longer just a manifesto. DuckLake reached a production-ready v1.0 on 13 April 2026 alongside DuckDB 1.5.2, with guaranteed backward compatibility and a stable spec. The ducklake extension is among DuckDB's top 10 core extensions by download, there are clients for Apache DataFusion, Apache Spark, Trino, and pandas, and the team says it is "already used in production at dozens of companies". "Dozens" is not "the industry," and it is worth being clear-eyed that Iceberg's installed base is orders of magnitude larger. But the trajectory from a May 2025 manifesto to an April 2026 production release with a multi-engine client ecosystem is fast for an infrastructure format, and the idea has clearly escaped the lab.

A short close

DuckLake's bet is that the lakehouse took a wrong turn when it decided to encode transactional metadata as files in object storage, and that the fix is to stop pretending object storage is a database and just use a database. The data stays in open Parquet where it belongs; the metadata moves to SQL where it always wanted to be. The trade is a real catalog-database dependency in exchange for deleting the manifest-file machinery and the catalog wars on top of it. Whether it displaces Iceberg or just pressures it into a better design, DuckLake has already done the useful thing a heresy does: it made the incumbents explain why the complicated way was necessary, and the explanations are not as convincing as they used to be.