What is Snowflake?

Snowflake is a fully-managed cloud data warehouse¹ — its marketing now calls it a "data cloud" — whose signature design choice is separating storage from compute. Your data lives once in shared columnar storage on object stores like S3, and you spin up independent, per-query virtual warehouses to read it. Each virtual warehouse is just a pool of compute you size and start on demand; two of them can hit the same tables at the same time without fighting over resources. That decoupling is the whole pitch, and it is genuinely a good one.

🔗 Learn more — ¹ What is a data warehouse?

This is what a data warehouse looks like rebuilt for the cloud era. A classic on-prem warehouse glued storage and compute together on the same boxes, so scaling one meant scaling both and a heavy ETL² job could starve the dashboards. Snowflake breaks that link.

🔗 Learn more — ² What is ETL (and how is ELT different)?

Storage and compute, pulled apart

Storage is a single shared layer: columnar, compressed, sitting in cloud object storage, billed by the terabyte. Compute is separate virtual warehouses you turn on, resize from X-Small to enormous, and turn off. Because they are independent, you can give the finance team a small warehouse, the data scientists a large one, and a nightly load job its own — all reading the same data, none blocking the others. You pay for storage continuously and for each virtual warehouse only while it runs, per second.

Multi-cluster warehouses extend this: under a burst of concurrent queries, Snowflake automatically adds more clusters of the same size and removes them when the spike passes. You set a min and max; it handles the rest. This is the elasticity people actually buy Snowflake for — concurrency that scales itself without anyone provisioning hardware.

%% color = green: the shared storage every warehouse reads from
flowchart TD
    WH1["Virtual warehouse: BI / dashboards"] --> STORE["Shared columnar storage (object store)"]
    WH2["Virtual warehouse: data science"] --> STORE
    WH3["Virtual warehouse: nightly load"] --> STORE

    classDef green stroke:#a3be8c,stroke-width:2.5px
    classDef grey stroke:#7b88a1,stroke-width:2.5px
    class STORE green
    class WH1,WH2,WH3 grey

Near-zero administration

The other half of the appeal is operational. There are no indexes to tune, no vacuum jobs, no partitions to define by hand. Under the hood Snowflake stores data in micro-partitions — small immutable columnar files, each carrying metadata about the range of values it holds. At query time it prunes any micro-partition that cannot contain matching rows, so a filtered query reads only the files it needs. Automatic clustering keeps that data ordered as it changes, again without you managing it.

The result is a warehouse that mostly administers itself. For a small team this is the real value: you write SQL, results come back fast, and nobody is paged about storage layout. That low-friction UX is why Snowflake spread the way it did.

Because storage is a managed shared layer, Snowflake can grant another account read access to your tables without copying anything — data sharing. The consumer queries your live data with their own compute, and you both pay only for what you each use. It turns the warehouse into a distribution mechanism, which is a legitimately novel capability.

Now the skeptical-fair part. Snowflake is a proprietary, closed platform; your data and your SQL live inside it, and leaving means a migration. And the consumption pricing that makes it elastic also makes it easy to overspend: an oversized warehouse left running, a forgotten auto-resume, or a few analysts firing heavy ad-hoc queries can run the monthly bill far past expectations. It is a pay-more-cloud incumbent — excellent, but you are buying convenience at a premium, and you do not own the engine.

The open alternative is a lakehouse³ stack: keep your data in an open table format like Apache Iceberg⁴ on your own object storage, and point any query engine⁵ at it. You give up some of Snowflake's seamless UX and take on more integration work, but you avoid lock-in, you can swap engines, and you keep the data in a format anything can read. BigQuery⁶ is the closest managed peer, and Databricks⁷ pushes hard on the open-lakehouse angle. Snowflake itself now reads Iceberg tables, a tacit admission that open formats are where the gravity is heading.

🔗 Learn more — ³ What is a data lakehouse?

🔗 Learn more — ⁴ How Apache Iceberg actually works

🔗 Learn more — ⁵ What is a query engine (Trino, Presto, and friends)?

🔗 Learn more — ⁶ What is BigQuery?

🔗 Learn more — ⁷ What is Databricks?

The short version: Snowflake is the most polished managed warehouse going — storage and compute split clean, elastic multi-cluster compute, self-managing micro-partitions, and copy-free data sharing. Buy it for the UX and elasticity; watch the consumption bill, and weigh the lock-in against an open lakehouse before you commit.

Storage and compute, pulled apart

Near-zero administration

Data sharing and the bill

Sources