Updated 18 Jun 2026 · 3 min read

What is Delta Lake (and how does it compare to Iceberg)?

Delta Lake is an open table format that adds ACID transactions, time travel, and schema enforcement to a data lake. Here is how it compares to Apache Iceberg.

#data
#lakehouse
#table-formats
#ai-assisted

Delta Lake is an open table format (a transaction log of JSON entries plus Parquet [1] data files) that adds ACID [2] transactions, time travel, and schema enforcement to a plain data lake [3]. It was created by Databricks [4]. The idea behind every table format is the same: a folder of Parquet files in object storage has no notion of a "table," no transactions, and no safe way for two writers to touch it at once. Delta Lake fixes that by writing an authoritative metadata layer alongside the data, turning a directory of files into something a query engine [5] can treat as a single, consistent table, which is the core building block of a data lakehouse [6].

🔗 Learn more [1]: How Parquet works: columnar storage explained
🔗 Learn more [2]: What is ACID (database transactions)?
🔗 Learn more [3]: What is a data lake?
🔗 Learn more [4]: What is Databricks?
🔗 Learn more [5]: What is a query engine (Trino, Presto, and friends)?
🔗 Learn more [6]: What is a data lakehouse?

How the delta log works

The mechanism is a single ordered log called the delta log, living in a _delta_log/ subdirectory next to the Parquet files. Every change to the table (adding files, removing files, changing the schema) is written as a numbered JSON commit whose name is the version zero-padded to 20 digits: 00000000000000000000.json, 00000000000000000001.json, and so on. To read the table's current state you replay those commits in order; the result tells you exactly which Parquet files belong to the table right now. Because each commit is an atomic create of the next-numbered file, two concurrent writers cannot both claim the same number: the loser sees that the version already exists, re-checks its change against the new commit, and retries. That is how Delta gets ACID isolation on object storage, provided the store (or a coordinating service in front of it) offers an atomic create-if-absent.
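The commit-and-replay idea can be sketched in a few lines of Python. This is a toy model on a local filesystem, not the Delta implementation: the helper names (`commit_path`, `try_commit`, `live_files`) are hypothetical, the actions carry only a `path` field (real `add` and `remove` actions carry more metadata), and `os.O_EXCL` stands in for whatever create-if-absent primitive the storage layer provides.

```python
import json
import os
import tempfile


def commit_path(log_dir: str, version: int) -> str:
    # Commit files are named by version, zero-padded to 20 digits.
    return os.path.join(log_dir, f"{version:020d}.json")


def try_commit(log_dir: str, version: int, actions: list) -> bool:
    """Try to claim `version`; return False if another writer already did."""
    try:
        # O_EXCL makes the create fail if the file exists (create-if-absent).
        fd = os.open(commit_path(log_dir, version),
                     os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    # Toy simplification: a real writer would make the full content
    # visible atomically, not create the file and then fill it.
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")  # one JSON action per line
    return True


def live_files(log_dir: str) -> set:
    """Replay all commits in version order and return the live data files."""
    files = set()
    names = sorted(n for n in os.listdir(log_dir) if n.endswith(".json"))
    for name in names:
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


log_dir = os.path.join(tempfile.mkdtemp(), "_delta_log")
os.makedirs(log_dir)

first = try_commit(log_dir, 0, [{"add": {"path": "part-a.parquet"}}])
second = try_commit(log_dir, 1, [{"add": {"path": "part-b.parquet"}}])
# A competing writer that also tries to claim version 1 loses the race.
loser = try_commit(log_dir, 1, [{"add": {"path": "part-x.parquet"}}])
third = try_commit(log_dir, 2, [{"remove": {"path": "part-a.parquet"}},
                                {"add": {"path": "part-c.parquet"}}])

current = live_files(log_dir)
print(sorted(current))  # ['part-b.parquet', 'part-c.parquet']
```

The losing writer's file never enters the table, because its commit was never created. In a real system it would re-read the log, check whether its change conflicts with the winning commit, and try again at the next version.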

Replaying thousands of tiny JSON files would be slow, so Delta periodically writes a checkpoint: a Parquet summary of the log state up to that point. A reader loads the latest checkpoint and then only the JSON commits after it.
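As a rough illustration of that read path, the sketch below takes a listing of log file names and decides which checkpoint to load and which JSON commits to replay on top of it. The function name `files_to_read` is hypothetical, and the sketch assumes single-file checkpoints named `<version>.checkpoint.parquet`; it ignores multi-part checkpoints and does not actually read any Parquet.

```python
import re
from typing import List, Optional, Tuple

COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint\.parquet$")


def files_to_read(log_listing: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (latest checkpoint or None, commits newer than it, in order)."""
    commits = {}
    checkpoints = {}
    for name in log_listing:
        m = COMMIT_RE.match(name)
        if m:
            commits[int(m.group(1))] = name
            continue
        m = CHECKPOINT_RE.match(name)
        if m:
            checkpoints[int(m.group(1))] = name

    if checkpoints:
        cp_version = max(checkpoints)
        checkpoint = checkpoints[cp_version]
    else:
        # No checkpoint yet: replay everything from version 0.
        cp_version, checkpoint = -1, None

    tail = [commits[v] for v in sorted(commits) if v > cp_version]
    return checkpoint, tail


# Thirteen commits (versions 0..12) and one checkpoint covering 0..10.
listing = [f"{v:020d}.json" for v in range(13)]
listing.append(f"{10:020d}.checkpoint.parquet")

checkpoint, tail = files_to_read(listing)
print(checkpoint)  # 00000000000000000010.checkpoint.parquet
print(tail)        # the commits for versions 11 and 12
```

With the checkpoint in place, the reader touches three files instead of thirteen. Delta also keeps a small `_last_checkpoint` pointer file in the log directory so readers can usually skip the full listing; the sketch leaves that out.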

This is the key contrast with Apache Iceberg [7]. Iceberg does not keep one linear log; it maintains a metadata tree: a metadata file pointing at the current snapshot, which points at a manifest list, which points at manifests, which list the data files. Both designs give the same guarantees, but the shapes differ: Delta's flat ordered log is simple to reason about, while Iceberg's tree was built for very large tables and engine-neutral planning from the start. Apache Hudi [8] is a third format in the same space, with its own log-and-timeline approach.

🔗 Learn more [7]: How Apache Iceberg actually works
🔗 Learn more [8]: What is Apache Hudi?
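To make the shape of that tree concrete, here is a deliberately simplified in-memory model in Python. The class and field names are illustrative only and do not match the Iceberg spec's on-disk field names; real manifests also carry per-file statistics and partition data that engines use for pruning.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Manifest:
    # A manifest lists a group of data files.
    data_files: List[str]


@dataclass
class Snapshot:
    snapshot_id: int
    # Stands in for the manifest list: the manifests that make up this snapshot.
    manifest_list: List[Manifest]


@dataclass
class TableMetadata:
    current_snapshot_id: int
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)


def current_data_files(meta: TableMetadata) -> List[str]:
    """Walk metadata -> current snapshot -> manifest list -> manifests -> files."""
    snapshot = meta.snapshots[meta.current_snapshot_id]
    return [path
            for manifest in snapshot.manifest_list
            for path in manifest.data_files]


m1 = Manifest(["a.parquet", "b.parquet"])
m2 = Manifest(["c.parquet"])

meta = TableMetadata(
    current_snapshot_id=2,
    snapshots={
        1: Snapshot(1, [m1]),
        # Snapshot 2 reuses m1 unchanged and adds one new manifest.
        2: Snapshot(2, [m1, m2]),
    },
)

files = current_data_files(meta)
print(files)  # ['a.parquet', 'b.parquet', 'c.parquet']
```

Resolving the table is a walk down the tree from a single pointer rather than a replay of history, and an append can reuse existing manifests instead of rewriting them.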

What you get from it

The features follow from having that metadata layer:

  • ACID transactions: writes either fully commit or do not appear, even with concurrent writers.
  • Time travel: because old commits and the files they referenced are retained until cleanup (such as VACUUM) removes them, you can query the table as of a past version or timestamp within that retention window.
  • Schema evolution: add or rename columns over time, with schema enforcement rejecting writes that do not match.
  • MERGE: upserts and deletes against a lake table, the operation that makes change data capture [9] and GDPR-style row deletion practical without rewriting everything by hand.

🔗 Learn more [9]: What is Change Data Capture (CDC)?
%% color = green: live table data, amber: history kept for time travel, grey: engine
flowchart TD
    ENG["Query engine"] --> LOG["_delta_log: read commits + latest checkpoint"]
    LOG --> CUR["Resolve current set of Parquet files"]
    LOG --> OLD["Older commits retained"]
    CUR --> DATA["Live Parquet data files"]
    OLD --> DATA

    classDef grey stroke:#7b88a1,stroke-width:2.5px
    classDef green stroke:#a3be8c,stroke-width:2.5px
    classDef amber stroke:#ebcb8b,stroke-width:2.5px
    class ENG,LOG grey
    class CUR,DATA green
    class OLD amber
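Time travel falls out of the log almost for free: stop the replay at an earlier version. The Python sketch below models commits as in-memory lists of simplified actions; `files_as_of` is a hypothetical helper, and real actions carry more than a `path`.

```python
from typing import Dict, List, Set

Action = Dict[str, Dict[str, str]]


def files_as_of(commits: List[List[Action]], version: int) -> Set[str]:
    """Replay commits 0..version and return the files live at that version."""
    if not 0 <= version < len(commits):
        raise ValueError(f"version {version} is not in the log")
    files: Set[str] = set()
    for actions in commits[: version + 1]:
        for action in actions:
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


commits = [
    [{"add": {"path": "part-a.parquet"}}],          # version 0
    [{"add": {"path": "part-b.parquet"}}],          # version 1
    [{"remove": {"path": "part-a.parquet"}},        # version 2 replaces a with c
     {"add": {"path": "part-c.parquet"}}],
]

v1 = files_as_of(commits, 1)
latest = files_as_of(commits, len(commits) - 1)
print(sorted(v1))      # ['part-a.parquet', 'part-b.parquet']
print(sorted(latest))  # ['part-b.parquet', 'part-c.parquet']
```

Reading version 1 only works while part-a.parquet still physically exists. Once cleanup deletes the removed file, the log can still name it but the query can no longer read it.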

The honest comparison

Delta Lake is genuinely capable, and on Databricks it is excellent. But for years it was effectively Databricks-steered: the open-source version trailed the proprietary one, and the most useful features landed inside the platform first. Apache Iceberg, by contrast, was developed at Netflix and donated to the Apache Software Foundation early, and it won the table-format war largely on neutral, vendor-independent governance — Snowflake, AWS, Google, and others standardized on it without one company holding the keys. If your worry is lock-in, that governance difference is the real distinction, not any feature checklist.

The two sides are converging. Delta now ships UniForm, which writes Iceberg-readable metadata alongside the delta log so a single set of Parquet files can be read as either format. And Delta Lake itself moved to the Linux Foundation, loosening the single-vendor grip. The practical reading: pick the format your primary engine supports best, but lean toward whichever keeps you portable — and treat UniForm and Iceberg's broad adoption as signs the lock-in stakes are falling either way.

The short version: Delta Lake turns a Parquet data lake into an ACID table using an ordered JSON-plus-checkpoint log; Iceberg does the same with a metadata tree and won on neutral governance; UniForm and the Linux Foundation move mean the choice matters less than it used to.