What is ClickHouse?

ClickHouse is an open-source columnar OLAP¹ database, written in C++, built for very fast analytical queries over large tables. It pairs vectorized execution with the MergeTree storage engine to answer aggregations and filters across billions of rows in a fraction of a second. It is one of my favorites: it does the analytical job at the top of the performance envelope, with no JVM, no cluster of services to babysit, and a SQL dialect close enough to standard that you are productive on day one.

🔗 Learn more — ¹ OLTP vs OLAP: two opposite jobs

Columnar, vectorized, compressed

ClickHouse stores data by column, not by row. An analytical query usually touches a handful of columns out of dozens, so reading only those columns — instead of dragging entire rows off disk — slashes I/O. This is the core idea behind columnar storage and OLAP engines generally.

Because each column holds values of a single type, compression is dramatic: similar values sit next to each other, so codecs like LZ4 and ZSTD shrink the data hard, which means even less to read. On top of that, ClickHouse uses vectorized execution: instead of processing one row at a time, it runs operations over batches of column values, keeping the CPU's caches and SIMD instructions busy. Columnar layout plus compression plus vectorization is why a GROUP BY over a huge table finishes in the time a row store spends warming up. DuckDB applies the same playbook for in-process analytics; ClickHouse scales it to a clustered server.

MergeTree: the engine that makes it fast

The default table engine, MergeTree, is what makes ingestion and reads both fast. When you insert data, ClickHouse writes it as an immutable sorted part — a self-contained chunk of columns ordered by the table's sorting key. Inserts are therefore cheap appends, not in-place edits. In the background, ClickHouse continuously merges these parts into larger sorted parts, which is where the "merge" in the name comes from.

%% color = green: the merged part that queries read fastest
flowchart TD
    I1["INSERT batch 1 → sorted part"] --> M["background merge"]
    I2["INSERT batch 2 → sorted part"] --> M
    I3["INSERT batch 3 → sorted part"] --> M
    M --> BIG["larger sorted part + sparse primary index"]

    classDef grey stroke:#7b88a1,stroke-width:2.5px
    classDef green stroke:#a3be8c,stroke-width:2.5px
    class BIG green
    class I1,I2,I3,M grey

Rather than indexing every row, MergeTree keeps a sparse primary index: one entry per block of rows (a granule). When a query filters on the sorting key, ClickHouse skips straight to the relevant granules and reads only those, instead of scanning the whole column. Small index, big skips — that is the trade that lets filtering and range scans stay fast as tables grow.

Real-time ingestion and materialized views

ClickHouse shines when data is arriving constantly. It ingests high-throughput streams happily because each insert is just a new sorted part, and it can serve sub-second analytical queries straight from that same source — no separate serving layer needed. A materialized view² in ClickHouse is not a cached snapshot; it is an insert trigger. As rows land in a source table, the view's query runs on the new block and writes aggregated results into a destination table. You pre-aggregate at ingest time, so dashboards read tiny rollups instead of recomputing over raw events. This real-time-serving angle is the same territory Apache Druid³ targets, and ClickHouse covers it with one binary.

🔗 Learn more — ² What is a materialized view?

🔗 Learn more — ³ What is Apache Druid?

The honest tradeoffs

ClickHouse is an OLAP engine, and it is not pretending otherwise. Joins are weaker than in a mature row-store query planner; large many-to-many joins can blow up memory, and you often denormalize or use dictionary lookups instead. Updates and deletes exist but are mutations — asynchronous, heavy operations that rewrite parts, not the cheap single-row edits an OLTP database gives you. It is not a transactional system: no multi-statement transactions, and behavior is eventually consistent in places. Because merges happen in the background, freshly inserted data can briefly appear before duplicates are collapsed, which surprises people who expect strict consistency.

None of that is a flaw; it is the cost of the design. If you need transactional integrity and frequent row-level updates, reach for an OLTP database. If you need to filter, aggregate, and serve analytics over enormous tables — and you want it fast without a heavyweight cluster — ClickHouse is hard to beat.