What is Delta Lake (and how does it compare to Iceberg)?

Delta Lake is an open table format — a transaction log of JSON entries plus Parquet¹ data files — that adds ACID² transactions, time travel, and schema enforcement to a plain data lake³. It was created by Databricks⁴. The idea behind every table format is the same: a folder of Parquet files in object storage has no notion of a "table," no transactions, and no safe way for two writers to touch it at once. Delta Lake fixes that by writing an authoritative metadata layer alongside the data, turning a directory of files into something a query engine⁵ can treat as a single, consistent table — the core building block of a data lakehouse⁶.

🔗 Learn more — ¹ How Parquet works: columnar storage explained

🔗 Learn more — ² What is ACID (database transactions)?

🔗 Learn more — ³ What is a data lake?

🔗 Learn more — ⁴ What is Databricks?

🔗 Learn more — ⁵ What is a query engine (Trino, Presto, and friends)?

🔗 Learn more — ⁶ What is a data lakehouse?

How the delta log works

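The mechanism is a single ordered log called the delta log, stored in a _delta_log/ subdirectory next to the Parquet files. Every change to the table (adding files, removing files, changing the schema) is written as a numbered JSON commit file named by a zero-padded 20-digit version: 00000000000000000000.json, 00000000000000000001.json, and so on. Each commit holds one JSON action per line, such as add or remove for a data file. To read the table's current state you replay those commits in order; the result tells you exactly which Parquet files belong to the table right now. A writer commits by creating the next-numbered file only if it does not already exist, so two concurrent writers cannot both claim the same version: the loser sees the conflict, checks whether its change still applies, and retries at the next number. This optimistic concurrency control is how Delta gets ACID isolation on object storage. On stores that historically lacked an atomic put-if-absent, such as S3, Delta needs an extra coordination service for safe multi-writer commits.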
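
A minimal sketch of both halves of that mechanism, assuming a local directory stands in for object storage. The helper names (commit_path, try_commit, current_files) are hypothetical. The add and remove actions are cut down to a path, while real Delta actions carry more fields such as size and partition values. os.O_EXCL plays the role of put-if-absent. A real writer must also ensure readers never see a half-written commit file; this sketch ignores that.

```python
import json
import os
import tempfile

def commit_path(log_dir: str, version: int) -> str:
    # Commit files are named by a 20-digit zero-padded version number.
    return os.path.join(log_dir, f"{version:020d}.json")

def try_commit(log_dir: str, version: int, actions: list[dict]) -> bool:
    """Claim `version` by creating its file; return False if another writer already did."""
    try:
        # O_CREAT | O_EXCL fails if the file exists: a local stand-in for put-if-absent.
        fd = os.open(commit_path(log_dir, version), os.O_WRONLY | os.O_CREAT | os.O_EXCL)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")  # one JSON action per line
    return True

def current_files(log_dir: str) -> set[str]:
    """Replay every commit in version order and return the live data files."""
    live: set[str] = set()
    # Zero-padding makes lexical order equal version order.
    for name in sorted(n for n in os.listdir(log_dir) if n.endswith(".json")):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live

with tempfile.TemporaryDirectory() as log_dir:
    assert try_commit(log_dir, 0, [{"add": {"path": "part-0.parquet"}}])
    # A second writer racing for version 0 loses and would retry at version 1.
    assert not try_commit(log_dir, 0, [{"add": {"path": "part-x.parquet"}}])
    try_commit(log_dir, 1, [{"remove": {"path": "part-0.parquet"}},
                            {"add": {"path": "part-1.parquet"}}])
    print(sorted(current_files(log_dir)))  # ['part-1.parquet']
```

The loser's commit never lands, so replay sees only the winner's history.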

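Replaying thousands of tiny JSON files would be slow, so Delta periodically writes a checkpoint (by default every 10 commits): a Parquet file summarizing the full table state as of that version. A reader finds the latest checkpoint through the small _last_checkpoint pointer file in _delta_log/, loads it, and then replays only the JSON commits after it.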
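
The checkpoint idea can be sketched in memory, assuming each commit is a list of add/remove actions and a checkpoint is just the set of live files at one version. Real checkpoints are Parquet files that also carry the table's metadata, protocol, and not-yet-expired remove tombstones. The function names here are hypothetical.

```python
Commit = list[dict]  # one commit = its list of actions

def apply_commit(live: set[str], commit: Commit) -> set[str]:
    """Return a new live-file set with one commit's add/remove actions applied."""
    live = set(live)
    for action in commit:
        if "add" in action:
            live.add(action["add"]["path"])
        elif "remove" in action:
            live.discard(action["remove"]["path"])
    return live

def write_checkpoint(commits: list[Commit], version: int) -> dict:
    """Collapse commits 0..version into one snapshot of live files."""
    live: set[str] = set()
    for commit in commits[: version + 1]:
        live = apply_commit(live, commit)
    return {"version": version, "files": sorted(live)}

def read_table(commits: list[Commit], checkpoint=None) -> tuple[set[str], int]:
    """Start from the checkpoint (if any) and replay only the commits after it.

    Returns the live files and how many JSON commits had to be replayed.
    """
    if checkpoint is None:
        live, start = set(), 0
    else:
        live, start = set(checkpoint["files"]), checkpoint["version"] + 1
    for commit in commits[start:]:
        live = apply_commit(live, commit)
    return live, len(commits) - start

# 25 commits: each adds a new file and removes the previous one.
commits: list[Commit] = []
for v in range(25):
    commit = [{"add": {"path": f"part-{v}.parquet"}}]
    if v > 0:
        commit.append({"remove": {"path": f"part-{v - 1}.parquet"}})
    commits.append(commit)

cp = write_checkpoint(commits, version=20)
full, n_full = read_table(commits)
fast, n_fast = read_table(commits, cp)
print(full == fast, n_full, n_fast)  # True 25 4
```

Both reads resolve the same live set; the checkpointed read replays 4 commits instead of 25.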

This is the key contrast with Apache Iceberg⁷. Iceberg does not keep one linear log; it maintains a metadata tree — a metadata file pointing at the current snapshot, which points at manifest lists, which point at manifests, which list the data files. Both designs give the same guarantees, but the shapes differ: Delta's flat ordered log is simple to reason about, while Iceberg's tree was built for very large tables and engine-neutral planning from the start. Apache Hudi⁸ is a third format in the same space, with its own log-and-timeline approach.

🔗 Learn more — ⁷ How Apache Iceberg actually works

🔗 Learn more — ⁸ What is Apache Hudi?
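
For contrast, here is a toy model of Iceberg's metadata tree. It is a hedged sketch, not the Iceberg spec: the class and field names are hypothetical, and real Iceberg keeps partition summaries in the manifest list and per-file column statistics in manifests, both stored as Avro files. Attaching one partition value to each manifest is a simplification that still shows what the tree buys: planning can skip whole manifests without listing their data files, and a new snapshot can reuse unchanged manifests.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Manifest:
    partition: str               # simplified: one partition value per manifest
    data_files: tuple[str, ...]  # paths of the Parquet files this manifest tracks

@dataclass(frozen=True)
class ManifestList:
    manifests: tuple[Manifest, ...]

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifest_list: ManifestList

@dataclass
class TableMetadata:
    current_snapshot_id: int
    snapshots: dict[int, Snapshot]

def plan_files(meta: TableMetadata, partition: Optional[str] = None,
               snapshot_id: Optional[int] = None) -> list[str]:
    """Walk metadata -> snapshot -> manifest list -> manifests -> data files."""
    sid = meta.current_snapshot_id if snapshot_id is None else snapshot_id
    files: list[str] = []
    for manifest in meta.snapshots[sid].manifest_list.manifests:
        if partition is not None and manifest.partition != partition:
            continue  # pruned at the manifest level: its data files are never listed
        files.extend(manifest.data_files)
    return files

m_jan = Manifest("2024-01", ("a.parquet", "b.parquet"))
m_feb = Manifest("2024-02", ("c.parquet",))
s1 = Snapshot(1, ManifestList((m_jan,)))
s2 = Snapshot(2, ManifestList((m_jan, m_feb)))  # s2 reuses m_jan unchanged
meta = TableMetadata(current_snapshot_id=2, snapshots={1: s1, 2: s2})

print(plan_files(meta))                       # ['a.parquet', 'b.parquet', 'c.parquet']
print(plan_files(meta, partition="2024-02"))  # ['c.parquet']
print(plan_files(meta, snapshot_id=1))        # ['a.parquet', 'b.parquet']
```

Where the Delta sketch above replays a flat sequence, this one walks down a tree from a single current-snapshot pointer.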

What you get from it

The features follow from having that metadata layer:

ACID transactions — writes either fully commit or do not appear, even with concurrent writers.
Time travel — because old commits and the files they referenced are retained, you can query the table as of a past version or timestamp, until VACUUM deletes unreferenced files older than the retention period (seven days by default).
Schema evolution — add columns over time (renaming or dropping columns requires enabling column mapping), with schema enforcement rejecting writes whose schema does not match.
MERGE — upserts and deletes against a lake table, the operation that makes change data capture⁹ and GDPR-style row deletion practical without rewriting everything by hand.

🔗 Learn more — ⁹ What is Change Data Capture (CDC)?
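
Time travel and MERGE both fall out of the same log. The sketch below is an in-memory toy under stated assumptions: a dict of row lists stands in for Parquet files, rows are keyed on a hypothetical id column, and merge rewrites every file containing an affected row (copy-on-write), emitting remove and add actions in one commit. Real Delta MERGE supports arbitrary match conditions, and newer Delta versions can use deletion vectors instead of rewriting whole files. Old files stay in the dict, which is what lets read(version=...) see the past, much as Delta keeps them until VACUUM.

```python
from typing import Optional

class Table:
    """Toy Delta-style table: immutable data files plus an ordered commit log."""

    def __init__(self) -> None:
        self.files: dict[str, list[dict]] = {}  # path -> rows (stands in for Parquet)
        self.log: list[list[dict]] = []         # log[v] = actions of commit v
        self._next_file = 0

    def _write_file(self, rows: list[dict]) -> str:
        path = f"part-{self._next_file}.parquet"
        self._next_file += 1
        self.files[path] = list(rows)  # never mutated afterwards
        return path

    def live_files(self, version: Optional[int] = None) -> set[str]:
        """Replay commits 0..version (default: all) to find the live files."""
        end = len(self.log) if version is None else version + 1
        live: set[str] = set()
        for commit in self.log[:end]:
            for action in commit:
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
        return live

    def read(self, version: Optional[int] = None) -> list[dict]:
        """Time travel: read the table as of `version` (default: latest)."""
        rows = [r for p in self.live_files(version) for r in self.files[p]]
        return sorted(rows, key=lambda r: r["id"])

    def append(self, rows: list[dict]) -> int:
        self.log.append([{"add": {"path": self._write_file(rows)}}])
        return len(self.log) - 1

    def merge(self, upserts: list[dict], delete_ids=()) -> int:
        """Upsert `upserts` and delete `delete_ids`, keyed on 'id', in one commit."""
        by_id = {r["id"]: r for r in upserts}
        to_delete = set(delete_ids)
        actions: list[dict] = []
        matched: set = set()
        for path in sorted(self.live_files()):
            rows = self.files[path]
            if not any(r["id"] in by_id or r["id"] in to_delete for r in rows):
                continue  # untouched file stays live; nothing is rewritten
            kept = []
            for r in rows:
                if r["id"] in to_delete:
                    continue  # a deleted row is simply not copied forward
                if r["id"] in by_id:
                    matched.add(r["id"])
                    kept.append(by_id[r["id"]])  # updated row replaces the old one
                else:
                    kept.append(r)
            actions.append({"remove": {"path": path}})
            if kept:
                actions.append({"add": {"path": self._write_file(kept)}})
        inserts = [r for k, r in by_id.items() if k not in matched and k not in to_delete]
        if inserts:
            actions.append({"add": {"path": self._write_file(inserts)}})
        self.log.append(actions)  # all removes and adds land as one atomic commit
        return len(self.log) - 1

t = Table()
t.append([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}])  # version 0
t.merge([{"id": 2, "v": "B"}, {"id": 3, "v": "c"}])   # version 1: update 2, insert 3
t.merge([], delete_ids=[1])                           # version 2: delete 1
print(t.read())           # [{'id': 2, 'v': 'B'}, {'id': 3, 'v': 'c'}]
print(t.read(version=0))  # [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b'}]
```

Only files that contain a matched or deleted row are rewritten; everything else stays live untouched, which keeps a small MERGE cheap relative to the table size.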

%% color = green: live table data, amber: history kept for time travel, grey: engine
flowchart TD
    ENG["Query engine"] --> LOG["_delta_log: read commits + latest checkpoint"]
    LOG --> CUR["Resolve current set of Parquet files"]
    LOG --> OLD["Older commits retained"]
    CUR --> DATA["Live Parquet data files"]
    OLD --> DATA

    classDef grey stroke:#7b88a1,stroke-width:2.5px
    classDef green stroke:#a3be8c,stroke-width:2.5px
    classDef amber stroke:#ebcb8b,stroke-width:2.5px
    class ENG,LOG grey
    class CUR,DATA green
    class OLD amber

The honest comparison

Delta Lake is genuinely capable, and on Databricks it is excellent. But for years it was effectively Databricks-steered: the open-source version trailed the proprietary one, and the most useful features landed inside the platform first. Apache Iceberg, by contrast, was developed at Netflix and donated to the Apache Software Foundation early, and it won the table-format war largely on neutral, vendor-independent governance — Snowflake, AWS, Google, and others standardized on it without one company holding the keys. If your worry is lock-in, that governance difference is the real distinction, not any feature checklist.

The two sides are converging. Delta now ships UniForm, which writes Iceberg-readable metadata alongside the delta log so a single set of Parquet files can be read as either format. Delta Lake itself has also been hosted by the Linux Foundation since 2019, loosening the single-vendor grip. The practical reading: pick the format your primary engine supports best, but lean toward whichever keeps you portable, and treat UniForm and Iceberg's broad adoption as signs that the lock-in stakes are falling either way.

The short version: Delta Lake turns a Parquet data lake into an ACID table using an ordered JSON-plus-checkpoint log; Iceberg does the same with a metadata tree and won on neutral governance; UniForm and the Linux Foundation move mean the choice matters less than it used to.