What is ACID (database transactions)?

ACID is the set of four guarantees a database makes about a transaction, a group of reads and writes treated as a single unit. The letters stand for Atomicity, Consistency, Isolation, and Durability. Together they are the promise that lets you trust a database with money, inventory, and anything else where a half-finished change is worse than no change at all. The acronym named properties that reliable systems already aimed for; it traces to Härder and Reuter's 1983 paper on transaction-oriented recovery.¹

The four guarantees

Atomicity means a transaction is all-or-nothing. The classic example is a bank transfer: debit account A, credit account B. If the system crashes between the two writes, you must not end up having taken money from A without giving it to B. Atomicity guarantees that either both writes land or neither does — the transaction commits as a whole or rolls back as if it never happened.
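The bank transfer can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a production pattern: the `accounts` table, the `transfer` helper, and the `fail_midway` flag (which stands in for a crash between the two writes) are all made up for the example.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from `src` to `dst` as one all-or-nothing transaction."""
    try:
        # `with conn` commits if the block finishes and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            if fail_midway:
                # Hypothetical stand-in for a crash between debit and credit.
                raise RuntimeError("simulated failure after the debit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()

transfer(conn, "A", "B", 30, fail_midway=True)  # debit is rolled back
after_failure = balances(conn)
transfer(conn, "A", "B", 30)                    # debit and credit commit together
after_success = balances(conn)
```

After the failed attempt the balances are unchanged; only the second call moves the money.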

Consistency means a transaction moves the database from one valid state to another, never leaving it violating the rules you declared — foreign keys, uniqueness, check constraints. If a transaction would break an invariant, it is rejected rather than committed. This is the one letter that is partly your responsibility: the database enforces the constraints you define, but it cannot know a rule you never told it about.
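A declared rule in action might look like the sketch below, again using `sqlite3`. The "balance may not go negative" rule, the table, and the `withdraw` helper are assumptions chosen for illustration; the point is only that the database rejects a write that would break a constraint it was told about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the declared invariant
)
conn.execute("INSERT INTO accounts VALUES ('A', 100)")
conn.commit()

def withdraw(conn, account, amount):
    """Try to withdraw; return False if the CHECK constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, account),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first_ok = withdraw(conn, "A", 60)    # 100 -> 40, allowed
second_ok = withdraw(conn, "A", 60)   # would be -20, rejected
final_balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 'A'"
).fetchone()[0]
```

Had the table been created without the `CHECK`, the second withdrawal would have gone through. That is the part of consistency that stays your job.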

Isolation means concurrent transactions do not step on each other. Run two transactions at the same time and, at the strictest level, the result is as if they ran one after the other. Without isolation you get anomalies: a dirty read sees another transaction's uncommitted change that may later roll back; a non-repeatable read gets two different answers for the same row because another transaction committed an update in between; a phantom read finds new rows appearing in a range you already queried. Databases offer isolation levels that trade strictness for concurrency, each forbidding more of those anomalies than the last: Read Committed rules out dirty reads, Repeatable Read also rules out non-repeatable reads, and Serializable rules out phantoms as well.²
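The dirty-read case can be shown with two `sqlite3` connections to the same file. This sketch covers only that one anomaly, and it assumes SQLite's default rollback-journal mode; the file name, table, and `read_balance` helper are illustrative. Other engines expose configurable isolation levels that SQLite does not.

```python
import os
import sqlite3
import tempfile

# Two connections to one database file play the two concurrent transactions.
path = os.path.join(tempfile.mkdtemp(), "isolation_demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES ('A', 100)")
writer.commit()

def read_balance(conn):
    """Read account A's balance; fetchall() finishes the statement so no read lock lingers."""
    rows = conn.execute("SELECT balance FROM accounts WHERE id = 'A'").fetchall()
    return rows[0][0]

# The writer changes the row but has not committed yet.
writer.execute("UPDATE accounts SET balance = 50 WHERE id = 'A'")
seen_before_commit = read_balance(reader)  # still the old, committed value

writer.commit()
seen_after_commit = read_balance(reader)   # now the new value is visible

writer.close()
reader.close()
```

The reader never observes the in-flight `50` until the writer commits, which is exactly what forbidding dirty reads means.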

Durability means once a transaction commits, it survives — power loss, crash, restart. The database does not report success until the change is on persistent storage (typically by flushing a write-ahead log), so a committed transfer is still there after the machine comes back up.
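As a rough sketch of the settings involved, SQLite can be asked to use a write-ahead log and to sync it on every commit. The file path and table here are made up, and closing and reopening the connection only approximates a restart; it does not simulate power loss.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "durable_demo.db")

conn = sqlite3.connect(path)
# Ask for a write-ahead log; the PRAGMA returns the mode actually in effect.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
# FULL makes SQLite sync the log to disk before a commit is reported as done.
conn.execute("PRAGMA synchronous=FULL")
conn.execute("CREATE TABLE transfers (id INTEGER PRIMARY KEY, amount INTEGER)")

with conn:  # commit happens when this block exits
    conn.execute("INSERT INTO transfers (amount) VALUES (30)")
conn.close()

# A fresh connection stands in for "the machine came back up".
reopened = sqlite3.connect(path)
rows = reopened.execute("SELECT amount FROM transfers").fetchall()
reopened.close()
```

The committed row is still there after the reopen. Relaxing `synchronous` trades some of that guarantee for faster commits, which is a common tuning decision.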

Where it gets hard

On a single node, ACID is well understood and your database does the heavy lifting. The difficulty starts when data spans multiple machines. A transaction that has to commit atomically across several nodes needs coordination (two-phase commit and friends), which adds network round trips and can block: if the coordinator or a participant goes silent mid-protocol, the others wait while holding their locks.
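The shape of two-phase commit fits in a few lines. This is a toy, in-memory sketch under heavy assumptions: the `Participant` class and its `can_commit` flag are hypothetical, and there is no network, no timeout, no durable log, and no coordinator failure, which are precisely the parts that make the real protocol hard.

```python
from dataclasses import dataclass

@dataclass
class Participant:
    """A hypothetical node that can vote on, then apply, a transaction."""
    name: str
    can_commit: bool = True  # illustrative switch for "this node votes no"
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1: promise to commit if asked, or vote no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"

def two_phase_commit(participants) -> bool:
    """Return True if every participant committed, False if all aborted."""
    # Phase 1 (voting): every participant must agree before anything commits.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (completion): broadcast the single global decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision

happy = [Participant("orders"), Participant("payments")]
happy_result = two_phase_commit(happy)

unhappy = [Participant("orders"), Participant("payments", can_commit=False)]
unhappy_result = two_phase_commit(unhappy)
```

One "no" vote aborts everyone, which is how atomicity is preserved across nodes. In a real deployment, a participant that has voted yes cannot decide on its own and has to wait for the coordinator's answer.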

This is the practical face of the CAP theorem: when the network partitions, a distributed system has to choose between staying consistent (refusing writes it cannot safely coordinate) and staying available (accepting writes that may conflict later). You cannot have both during a partition. Consistency in CAP means every node agrees on the latest data, which is not the same thing as the C in ACID, but strong, ACID-style guarantees still pull toward that side. That is why many large-scale systems relaxed them in favor of availability and "eventual" consistency, and why the pendulum has since swung back toward distributed databases that work hard to keep real transactions.
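The choice can be caricatured with two replicas that cannot reach each other. Everything in this sketch is an assumption made for illustration: the `Replica` class, the `"CP"` and `"AP"` mode labels, and the single stored value. It models no real database.

```python
class Replica:
    """A toy replica that holds one value and knows whether it is cut off."""

    def __init__(self, mode: str):
        self.mode = mode          # "CP": refuse when cut off; "AP": accept anyway
        self.value = "v0"
        self.partitioned = False

    def write(self, value: str) -> bool:
        if self.partitioned and self.mode == "CP":
            # Cannot coordinate with the peer, so refuse rather than diverge.
            return False
        # Either the peer is reachable, or availability was chosen.
        self.value = value
        return True

def run_partition(mode: str):
    """Partition two replicas, send each a different write, report the outcome."""
    left, right = Replica(mode), Replica(mode)
    left.partitioned = right.partitioned = True
    accepted = (left.write("from-left"), right.write("from-right"))
    replicas_agree = left.value == right.value
    return accepted, replicas_agree

cp_outcome = run_partition("CP")  # writes refused, replicas still agree
ap_outcome = run_partition("AP")  # writes accepted, replicas now disagree
```

The consistent pair stays in agreement by turning clients away; the available pair keeps serving and leaves a conflict to reconcile once the partition heals.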

flowchart TD
    TX["Transaction — group of reads and writes"] --> A["Atomicity — all-or-nothing commit"]
    TX --> C["Consistency — invariants preserved"]
    TX --> I["Isolation — concurrent runs don't collide"]
    TX --> D["Durability — survives a crash"]
    I --> ANOM["Isolation levels trade strictness for concurrency"]

    %% color = the four ACID guarantees (green) vs derived detail (plain)
    classDef acid stroke:#a3be8c,stroke-width:2.5px
    classDef plain stroke:#7b88a1,stroke-width:2.5px
    class A,C,I,D acid
    class TX,ANOM plain

ACID on a data lake

In data engineering the phrase shows up in an unexpected place: "ACID on a data lake¹." A data lake is just files in object storage, and a folder of Parquet² files has no notion of a transaction. Two jobs writing at once can clobber each other, and a reader catching a writer mid-update can see a half-written table: no atomicity, no isolation, and no clean rollback.

🔗 Learn more — ¹ What is a data lake?

🔗 Learn more — ² How Parquet works: columnar storage explained

Table formats like Apache Iceberg³ add those guarantees back without a database engine sitting in front of the files. The trick is metadata plus an atomic pointer swap: a write produces new data files and a new metadata snapshot, then commits by atomically swapping a single pointer to that snapshot. Readers always see one complete snapshot, never a partial write — that is the atomicity and the isolation. The previous snapshots stick around, which is where time travel comes from.

🔗 Learn more — ³ How Apache Iceberg actually works
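The metadata-plus-pointer-swap idea can be imitated on a local filesystem. This is a toy sketch and not Iceberg's actual file layout or API: the JSON files, the `current` pointer file, and both helper functions are invented for the example. It leans on `os.replace` being atomic on a local POSIX filesystem and does nothing to arbitrate between concurrent writers; real table formats rely on a catalog or a storage-level atomic operation for the swap.

```python
import json
import os
import tempfile
import uuid

def commit_snapshot(table_dir, rows):
    """Write data, write a snapshot describing it, then swap the pointer."""
    snapshot_id = uuid.uuid4().hex

    # 1. New data file; readers cannot see it yet.
    data_file = f"data-{snapshot_id}.json"
    with open(os.path.join(table_dir, data_file), "w") as f:
        json.dump(rows, f)

    # 2. New metadata snapshot listing the data files that make up the table.
    snapshot_file = f"snapshot-{snapshot_id}.json"
    with open(os.path.join(table_dir, snapshot_file), "w") as f:
        json.dump({"data_files": [data_file]}, f)

    # 3. The commit: atomically repoint "current" at the new snapshot.
    tmp_pointer = os.path.join(table_dir, f"current-{snapshot_id}.tmp")
    with open(tmp_pointer, "w") as f:
        f.write(snapshot_file)
    os.replace(tmp_pointer, os.path.join(table_dir, "current"))
    return snapshot_file

def read_table(table_dir, snapshot_file=None):
    """Read the current snapshot, or an older one by name (time travel)."""
    if snapshot_file is None:
        with open(os.path.join(table_dir, "current")) as f:
            snapshot_file = f.read()
    with open(os.path.join(table_dir, snapshot_file)) as f:
        snapshot = json.load(f)
    rows = []
    for name in snapshot["data_files"]:
        with open(os.path.join(table_dir, name)) as f:
            rows.extend(json.load(f))
    return rows

table_dir = tempfile.mkdtemp()
first = commit_snapshot(table_dir, [{"id": 1}])
second = commit_snapshot(table_dir, [{"id": 1}, {"id": 2}])

latest_rows = read_table(table_dir)          # what a normal reader sees
older_rows = read_table(table_dir, first)    # the earlier snapshot, still intact
```

A reader that opens `current` gets either the old snapshot or the new one, never a mix, because nothing it can reach changes until the final rename.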

The difference from a transactional database is the scope. A traditional OLTP⁴ database gives you full multi-statement transactions over many tiny row-level writes per second, with rich isolation between concurrent users — built for an application's hot path. A table format on a lakehouse⁵ gives you atomic, isolated table-level commits over large batches, tuned for analytics where writes are bulk operations and not thousands of concurrent point updates. Both earn the word ACID; they are solving it at opposite ends of the size-and-frequency spectrum, which is exactly the OLTP-versus-OLAP split showing up again at the storage layer.

🔗 Learn more — ⁴ OLTP vs OLAP: two opposite jobs

🔗 Learn more — ⁵ What is a data lakehouse?

The short version: ACID is four promises (atomic, consistent, isolated, durable) that turn a pile of reads and writes into something you can reason about. A transactional database gives them to you per transaction over row-level writes at high frequency; a table format gives them to you per snapshot over a lake. Knowing which kind of ACID you actually need keeps you from reaching for the wrong tool.

🔗 Sources — ¹ ACID, which traces the term to Härder & Reuter's 1983 paper · ² Isolation (database systems) — Wikipedia