What is Apache Pinot?

Apache Pinot is a real-time distributed OLAP¹ datastore, originally built at LinkedIn, designed for ultra-low-latency, high-concurrency analytics — the kind that powers user-facing dashboards seen by millions of people at once. The defining constraint is not "fast for a human running a report." It is fast at thousands of queries per second, where every one of those queries comes from an end user clicking around a product and expecting a result in milliseconds. That workload shapes every design decision in the system.

🔗 Learn more — ¹ OLTP vs OLAP: two opposite jobs

Most analytical databases assume a handful of analysts. Pinot assumes the analyst is your entire user base. Think of a "who viewed your profile" panel, a seller-facing sales dashboard, or live in-app metrics: each page load fires an OLAP query, and there are millions of page loads. Pinot exists to keep p99 latency low while QPS climbs.

Columnar segments and rich indexing

Like other analytics engines, Pinot stores data column-by-column rather than row-by-row, so a query that touches three columns reads only those three. Data is sliced into units called segments, each a self-contained chunk of rows with its own metadata and indexes. Segments are immutable once completed (a real-time segment still accepts rows while it is consuming from the stream). They are distributed across the cluster and queried in parallel.
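A minimal sketch of the columnar idea. It assumes a segment can be modeled as an in-memory dict of per-column lists. The `Segment` class and its methods are hypothetical. Real Pinot segments are on-disk files with encoded columns, dictionaries, and index files.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Segment:
    """Hypothetical in-memory stand-in for a columnar segment."""

    name: str
    columns: dict[str, list[Any]]  # column name -> values, one per row
    metadata: dict[str, Any] = field(default_factory=dict)

    @staticmethod
    def from_rows(name: str, rows: list[dict[str, Any]]) -> "Segment":
        # Pivot row-oriented records into one list per column.
        names = list(rows[0]) if rows else []
        columns = {c: [r[c] for r in rows] for c in names}
        return Segment(name, columns, {"num_docs": len(rows)})

    def read(self, *wanted: str) -> dict[str, list[Any]]:
        # Only the requested columns are touched; the rest are never read.
        return {c: self.columns[c] for c in wanted}


rows = [
    {"country": "US", "browser": "chrome", "views": 3},
    {"country": "DE", "browser": "firefox", "views": 5},
]
seg = Segment.from_rows("views_0", rows)
print(seg.read("views"))  # {'views': [3, 5]}
```

A query that sums `views` reads one list. The `country` and `browser` lists stay untouched, which is the point of the layout.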

What sets Pinot apart is how much indexing it bolts onto each segment. Beyond plain columnar storage you can attach several indexes:

- an inverted index, which maps a value straight to the rows containing it, so filters skip full scans;
- a sorted index, where data is physically ordered by one column, so the rows for any value or value range form a single contiguous block;
- range and text indexes;
- the headline feature, the star-tree index.
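The two simplest lookups can be sketched in a few lines of Python. This toy makes two assumptions:

- doc ids are row positions within one segment;
- posting lists are plain Python lists, whereas Pinot keeps them as compressed bitmaps.

The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_inverted_index(values):
    """Hypothetical helper: map each value to the ascending doc ids that hold it."""
    index = defaultdict(list)
    for doc_id, v in enumerate(values):
        index[v].append(doc_id)  # appended in ascending doc-id order
    return dict(index)


def sorted_range(sorted_values, lo, hi):
    """Hypothetical helper: on a column the segment is sorted by, rows with
    lo <= value <= hi are one contiguous block found by binary search."""
    start = bisect_left(sorted_values, lo)
    end = bisect_right(sorted_values, hi)
    return range(start, end)


country = ["US", "DE", "US", "FR", "US"]
inv = build_inverted_index(country)
print(inv["US"])  # [0, 2, 4], found without scanning the other rows

ts = [100, 105, 110, 120, 130]  # the column this segment is sorted on
print(list(sorted_range(ts, 105, 120)))  # [1, 2, 3]
```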

Why star-tree indexes make pre-aggregated queries fast

Dashboards ask the same aggregations over and over: sums and counts grouped by a handful of dimensions. A star-tree index pre-computes those aggregations and stores them in a tree structure inside the segment. Instead of scanning every matching row and summing at query time, Pinot walks the tree to nodes that already hold pre-aggregated values and combines only those. That is far fewer records than a full scan would touch.

The clever part is the configurable space-time tradeoff. You choose which dimensions to split on (and in what order), which metric aggregations to materialize, and how small a node must get before the tree stops splitting. That caps how much extra storage the index costs while still serving the common query shapes near-instantly. Queries that fall outside the pre-aggregated paths still run, just by scanning rather than by lookup. That is what lets Pinot hold low latency at high concurrency on aggregation-heavy workloads.
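To make the tradeoff concrete, here is a deliberately simplified star-tree in Python. It assumes a single SUM metric, equality filters only, and no group-by. All names are hypothetical.

The structure works like this:

- Each level splits on one dimension in a fixed order.
- Each level also adds a star child that aggregates over every value of that dimension.
- A node stops splitting once it holds at most `max_leaf_records` records, and queries then scan those records.

Pinot exposes corresponding knobs in a table's star-tree config (`dimensionsSplitOrder`, `functionColumnPairs`, `maxLeafRecords`). Its real on-disk structure and query path are considerably more involved.

```python
from collections import defaultdict

STAR = "*"  # stands for "every value of this dimension"


class Node:
    def __init__(self):
        self.children = {}  # dimension value or STAR -> child Node
        self.leaf = None    # list of (dims tuple, metric sum) once splitting stops


def preaggregate(records):
    """Sum the metric over records that share an identical dimension tuple."""
    sums = defaultdict(int)
    for dims, metric in records:
        sums[dims] += metric
    return list(sums.items())


def build(records, level, num_dims, max_leaf_records):
    node = Node()
    if len(records) <= max_leaf_records or level == num_dims:
        node.leaf = records  # small enough: queries scan these records directly
        return node
    groups = defaultdict(list)
    for dims, metric in records:
        groups[dims[level]].append((dims, metric))
    for value, group in groups.items():
        node.children[value] = build(group, level + 1, num_dims, max_leaf_records)
    # Star child: collapse this dimension to STAR, then re-aggregate.
    starred = [(dims[:level] + (STAR,) + dims[level + 1:], m) for dims, m in records]
    node.children[STAR] = build(preaggregate(starred), level + 1, num_dims, max_leaf_records)
    return node


def query_sum(root, filters, split_order):
    """SUM(metric) WHERE dim = value for every (dim, value) in `filters`.

    Returns (total, records_scanned_at_leaf).
    """
    node, level = root, 0
    while node.leaf is None:
        key = filters.get(split_order[level], STAR)  # unfiltered dim -> star child
        if key not in node.children:
            return 0, 0  # the filter value never occurs
        node, level = node.children[key], level + 1
    # Dimensions before `level` were resolved by the walk; check the rest here.
    pending = [(i, filters[d]) for i, d in enumerate(split_order)
               if i >= level and d in filters]
    total = sum(m for dims, m in node.leaf if all(dims[i] == v for i, v in pending))
    return total, len(node.leaf)


split_order = ("country", "browser")
raw = [(("US", "chrome"), 3), (("US", "firefox"), 5),
       (("DE", "chrome"), 2), (("US", "chrome"), 4)]
records = preaggregate(raw)
full = build(records, 0, len(split_order), max_leaf_records=1)
coarse = build(records, 0, len(split_order), max_leaf_records=10)
print(query_sum(full, {"browser": "chrome"}, split_order))    # (9, 1)
print(query_sum(coarse, {"browser": "chrome"}, split_order))  # (9, 3)
```

With `max_leaf_records=1` every combination is materialized, and the query touches a single pre-aggregated record. Raising the threshold shrinks the tree but leaves more records to scan at the leaf, which is the space-time dial in miniature.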

Real-time ingestion and the cluster architecture

Pinot ingests from both worlds at once. It consumes directly from streaming sources (Kafka² and similar), so freshly produced events are queryable within seconds, and it loads historical data in batch. A single hybrid table can blend a real-time portion (recent stream data) with an offline portion (older batch-loaded segments). The broker splits each query at a time boundary, sending the older range to the offline segments and the newer range to the real-time ones. It then merges the results, so users see live and historical data as one view without double-counting the overlap. This is where stream processing³ upstream meets the serving layer.
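The stitching step can be sketched as a split on a time boundary, assuming a single SUM aggregation and a boundary value that is simply passed in. Rows at or before the boundary come from the offline part and rows after it from the real-time part, so events present in both are counted once. The names are hypothetical. How a Pinot broker actually derives the boundary from offline segment metadata is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    ts: int      # event time, e.g. epoch seconds
    clicks: int


def hybrid_sum(offline, realtime, time_boundary):
    """SUM(clicks) over a hypothetical hybrid table split at `time_boundary`."""
    # Older data (ts <= boundary) is answered by the batch-loaded segments...
    from_offline = sum(r.clicks for r in offline if r.ts <= time_boundary)
    # ...and newer data (ts > boundary) by the stream-fed segments.
    from_realtime = sum(r.clicks for r in realtime if r.ts > time_boundary)
    return from_offline + from_realtime


offline = [Row(100, 1), Row(200, 2), Row(300, 4)]   # batch-loaded history
realtime = [Row(200, 2), Row(300, 4), Row(400, 8)]  # stream still holds recent overlap
print(hybrid_sum(offline, realtime, time_boundary=300))  # 15
```

Summing both parts without the boundary would return 21, because the events at timestamps 200 and 300 would be counted twice.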

🔗 Learn more — ² What is Apache Kafka?

🔗 Learn more — ³ Batch vs stream processing

The cluster splits into three roles:

- Servers host segments and do the actual scanning and indexing work.
- Brokers receive each query, fan it out to the servers holding the relevant segments, gather the partial results, and merge them into a final answer.
- Controllers manage cluster state (segment assignment, metadata, and coordination), leaning on Apache Helix and ZooKeeper underneath.

Scaling out means adding servers and spreading segments across them, and adding brokers when query volume rather than data volume is the bottleneck.
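The broker's scatter/gather loop can be sketched with a thread pool standing in for network calls. The `Server` and `Broker` classes, the routing list, and the in-memory segments are all hypothetical. The point of the example is the merge step. In this sketch, servers return a (sum, count) pair for AVG rather than their own average, so the broker can combine the partials correctly.

```python
from concurrent.futures import ThreadPoolExecutor


class Server:
    """Hypothetical server: holds segments, each a list of metric values."""

    def __init__(self, segments):
        self.segments = segments  # segment name -> list of values

    def partial_avg(self, segment_names):
        # Server-side work: scan only the segments the broker routed here.
        values = [v for name in segment_names for v in self.segments[name]]
        return sum(values), len(values)  # mergeable partial, not an average


class Broker:
    """Hypothetical broker; routing stands in for the controller's segment assignment."""

    def __init__(self, routing):
        self.routing = routing  # list of (server, [segment names])

    def avg(self):
        # Scatter: query every server in parallel.
        with ThreadPoolExecutor() as pool:
            futures = [pool.submit(srv.partial_avg, segs) for srv, segs in self.routing]
            partials = [f.result() for f in futures]
        # Gather: combine sums and counts, then divide once.
        total = sum(s for s, _ in partials)
        count = sum(c for _, c in partials)
        return total / count if count else None


s1 = Server({"seg0": [1, 2, 3], "seg1": [4]})
s2 = Server({"seg2": [10]})
broker = Broker([(s1, ["seg0", "seg1"]), (s2, ["seg2"])])
print(broker.avg())  # 4.0
```

Averaging the two servers' own averages would give 6.25 instead of 4.0, which is why the partials carry sums and counts.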

%% color = green: where the user-facing query enters and exits
flowchart TD
    Q["user query (high QPS)"] --> B["broker: route + scatter/gather"]
    B --> S1["server: segments + indexes"]
    B --> S2["server: segments + indexes"]
    K["event stream"] --> S1
    K --> S2
    C["controller: cluster state"] -.-> S1
    C -.-> S2

    classDef grey stroke:#7b88a1,stroke-width:2.5px
    classDef green stroke:#a3be8c,stroke-width:2.5px
    class Q,B green
    class S1,S2,K,C grey

Where it fits, honestly

Pinot is exceptional at one thing: user-facing real-time analytics at high QPS with tight latency budgets. If that is your workload, few systems match it. But it is purpose-built and operationally involved — a multi-component cluster (servers, brokers, controllers, ZooKeeper) you have to run, tune, and reason about. That is real overhead you take on deliberately.

It also overlaps heavily with Apache Druid⁴, which targets nearly the same real-time OLAP niche, and with ClickHouse⁵, which is simpler to operate and brutally fast for general analytics but less specialized for the millions-of-concurrent-users serving case. None of these is universally "better." Pick by workload, not hype: reach for Pinot when you genuinely need low-latency analytics served to a large concurrent audience, and look elsewhere when you do not.

🔗 Learn more — ⁴ What is Apache Druid?

🔗 Learn more — ⁵ What is ClickHouse?