What is Apache Druid?

Apache Druid is a real-time OLAP¹ datastore for sub-second queries on high-volume event data, combining streaming ingestion with columnar storage and pre-aggregation. It exists to answer a specific question well: given a firehose of events — ad impressions, clicks, network flows, application metrics — how do you serve interactive, filtered, grouped analytics to many concurrent users while the data is still arriving? That is a different job from a data warehouse², and Druid is built end to end around it rather than bolted onto a general engine.

🔗 Learn more — ¹ OLTP vs OLAP: two opposite jobs

🔗 Learn more — ² What is a data warehouse?

Segments, columns, and time

Druid stores data in immutable files called segments. Each segment covers a time range, holds a fixed set of rows, and is organized as columnar storage — values for one column are stored together, so a query that touches three columns reads only those three off disk. Because Druid is built for event data, every dataset is partitioned by time first: the timestamp is not just another column, it is the primary partition key. A query filtered to "the last hour" prunes to a handful of segments before reading anything, which is most of why latency stays low at scale.

On top of the columns, Druid builds bitmap indexes for the string dimensions you filter on. A predicate like country = "EE" AND device = "mobile" becomes a couple of bitmap lookups intersected together, so Druid scans only the matching rows instead of every row in the segment. Compression and dictionary encoding keep those columns small. The combination — time partitioning, columnar layout, bitmap indexes — is what makes the sub-second, high-concurrency story real rather than aspirational.

The multi-process architecture

Druid is not one daemon. It is a cluster of specialized processes, each doing one part of the job:

flowchart TD
    INGEST["Streaming + batch ingestion"] --> MM["Middle Manager: ingest, build segments, serve recent data"]
    MM --> DEEP["Deep storage (S3 / HDFS)"]
    DEEP --> HIST["Historical: load + serve older segments"]
    COORD["Coordinator: assign segments, balance"] --> HIST
    QUERY["Query"] --> BROKER["Broker: route, scatter-gather, merge"]
    BROKER --> HIST
    BROKER --> MM

    classDef write stroke:#ebcb8b,stroke-width:2.5px
    classDef read stroke:#a3be8c,stroke-width:2.5px
    classDef ctrl stroke:#7b88a1,stroke-width:2.5px
    %% color = role: amber = ingest/write path, green = query/read path, grey = control plane
    class INGEST,MM,DEEP write
    class QUERY,BROKER,HIST read
    class COORD ctrl

Middle Manager handles ingestion: it reads from the source, builds new segments, and serves queries against the freshest, not-yet-handed-off data so streaming results are visible immediately.
Historical processes load completed segments from deep storage and answer queries against them. This is where the bulk of your data lives and where most query work happens; you scale read throughput by adding Historicals.
Broker is the query front door. It figures out which segments hold the relevant data, scatters sub-queries to the right Historicals and Middle Managers, and merges the partial results.
Coordinator is the control plane for data placement: it decides which Historical loads which segment, balances them, and applies retention and replication rules.

There is also an Overlord (supervises ingestion tasks) and a Router (optional query gateway), but those four carry the mental model. Deep storage — typically object storage — is the durable source of truth; the cluster is a caching, serving layer in front of it.

Ingestion and roll-up

Druid ingests in two modes. Real-time ingestion pulls directly from a stream like Kafka³, making events queryable within seconds of arrival; batch ingestion loads bounded files for backfills and historical loads. Both produce the same segment format, so old and new data query identically.

🔗 Learn more — ³ What is Apache Kafka?

The feature that defines Druid's economics is roll-up: optional pre-aggregation at ingestion time. You declare a granularity (say, one minute) and a set of metrics, and Druid collapses all events sharing the same dimensions and time bucket into a single pre-aggregated row as it ingests. If your queries never need individual events — only counts, sums, and approximate distinct-counts per minute — roll-up can shrink storage and accelerate queries by an order of magnitude. The trade is real: rolled-up data has lost the raw rows, so you cannot drill below the chosen granularity later.

Where it fits, and where it does not

Druid is excellent for user-facing analytics: dashboards and embedded charts hit by many concurrent users, over time-series event data, demanding consistent sub-second latency. That is its sweet spot and few systems match it there.

Be honest about the cost. The multi-process cluster is operationally complex — several process types, a metadata store, deep storage, and ZooKeeper-style coordination all to run well together. It is purpose-built, not a general warehouse: arbitrary joins, frequent updates, and ad-hoc one-off SQL are not what it is for. If you do not have the scale or concurrency to justify the operations, the obvious alternatives in this same niche are ClickHouse⁴, which delivers similar columnar OLAP speed with a far simpler single-binary footprint, and Apache Pinot⁵, Druid's closest architectural cousin. Reach for Druid when high-concurrency, real-time, time-series analytics is the actual workload — and skip it when it is not.

🔗 Learn more — ⁴ What is ClickHouse?

🔗 Learn more — ⁵ What is Apache Pinot?

Segments, columns, and time

The multi-process architecture

Ingestion and roll-up

Where it fits, and where it does not

Sources