The cold-start tax: serverless warehouses vs an always-on box

Jack Nicholson frozen in the snow at the end of The Shining

Waiting for a serverless warehouse to cold-start. Still from The Shining (1980), Warner Bros.

The previous post in this series, The hidden cost of a lakehouse on S3, worked out what a single SELECT * costs on the storage side: roughly 43 cents per scan on a badly-laid-out 5 GB table, almost all of it in S3 GET requests. That number is half the bill. The other half is what the compute side charges to exist long enough to answer the query. On a serverless warehouse the compute side is usually larger than the storage side, and most of it pays for time the query was not running.

What "serverless" actually means on a warehouse

Serverless does not mean instant. It means the provider manages capacity behind a queue. The cluster underneath still exists; the user just doesn't run it. On Databricks¹ the same SQL surface comes in three modes with three very different cold-start profiles:

🔗 Learn more — ¹ What is Databricks?

Serverless SQL Warehouse. Warm pool maintained by Databricks. Cold start 2–6 seconds, per Databricks' own documentation. Highest DBU rate per hour.
Pro / Classic SQL Warehouse. Cluster spins up on the customer's cloud account. Cold start around 4 minutes. Lower DBU rate.
Job cluster / on-demand notebook compute. Spun up per job. Cold start 5–10 minutes depending on instance-pool config and image cache. Lowest DBU rate, slowest to provision.

The vendor framing is "scales to zero". The customer framing is "scales from zero, slowly, on every cold query". Whether the cold start is six seconds or six minutes, the DBU clock starts at provision, not at the first row of output.
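To put the three profiles side by side, here is a minimal sketch that turns the cold-start ranges quoted above into minutes of waiting per day. The mode keys, the function name, and the figure of 20 cold queries a day are illustrative assumptions, not Databricks identifiers.

```python
# Cold-start ranges in seconds, taken from the three modes listed above.
# The dictionary keys are illustrative labels, not Databricks identifiers.
COLD_START_S = {
    "serverless_sql": (2, 6),
    "pro_classic_sql": (240, 240),  # "around 4 minutes"
    "job_cluster": (300, 600),
}


def daily_wait_minutes(mode: str, cold_queries_per_day: int) -> tuple[float, float]:
    """Return (low, high) minutes per day spent waiting on cold starts."""
    low_s, high_s = COLD_START_S[mode]
    return (
        low_s * cold_queries_per_day / 60,
        high_s * cold_queries_per_day / 60,
    )


# Assumption: 20 cold queries a day across the team.
for mode in COLD_START_S:
    low, high = daily_wait_minutes(mode, cold_queries_per_day=20)
    print(f"{mode:16s} {low:6.1f} to {high:6.1f} min/day waiting")
```

At 20 cold queries a day the serverless mode costs at most a couple of minutes of waiting, while the Pro mode costs 80 minutes.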

The DBU clock starts at provision, not at first row

Take a typical ad-hoc shape: a three-node Pro SQL Warehouse with a four-minute cold start, a query that runs for twelve seconds once the cluster is up, and a ten-minute idle auto-stop timeout. The customer's wall clock looks like:

0:00  → query submitted, cluster provisioning starts
4:00  → cluster ready, query begins
4:12  → result returned
14:12 → idle timeout fires, cluster stops

The DBU bill covers 14 minutes 12 seconds, of which only twelve seconds were spent doing useful work. At a representative rate of $0.70/DBU-hour and ~3 DBUs per node-hour across three nodes, the bill for that one query is on the order of $2.50. A query that produced one screenful of rows cost two and a half dollars, of which roughly $0.04 was the query and $2.46 was cold start and idle timeout.
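The timeline arithmetic can be sketched as follows. `ColdQuery`, `split`, and `dbu_line` are hypothetical names, and the split assumes the bill accrues evenly over the billed window. Note that the quoted rate inputs on their own (9 DBU/hour at $0.70) give a DBU line of about $1.49 for this window, so the $2.50 is best read as an order-of-magnitude figure; the split below takes the text's $2.50 as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColdQuery:
    cold_start_s: int    # provisioning time before the query can run
    run_s: int           # time the query actually executes
    idle_timeout_s: int  # auto-stop delay after the result returns

    @property
    def billed_s(self) -> int:
        """Seconds on the DBU clock: provision to auto-stop."""
        return self.cold_start_s + self.run_s + self.idle_timeout_s

    def split(self, bill: float) -> tuple[float, float]:
        """Split a bill pro rata into (useful work, waiting and idle)."""
        useful = bill * self.run_s / self.billed_s
        return useful, bill - useful


def dbu_line(dbu_per_hour: float, usd_per_dbu: float, billed_s: int) -> float:
    """DBU charge for a billed window, from an hourly DBU burn and a rate."""
    return dbu_per_hour * usd_per_dbu * billed_s / 3600


q = ColdQuery(cold_start_s=4 * 60, run_s=12, idle_timeout_s=10 * 60)
useful, overhead = q.split(2.50)  # $2.50 is the per-query figure used in the text

print(q.billed_s)                                        # 852 seconds = 14 min 12 s
print(f"{q.run_s / q.billed_s:.1%} of billed time")      # 1.4%
print(f"${useful:.2f} useful, ${overhead:.2f} waiting")  # $0.04 and $2.46
print(f"${dbu_line(3 * 3, 0.70, q.billed_s):.2f} DBU line from the quoted rates")
```

Whatever the exact rate, the ratio is what matters: about 1.4 % of the billed time is query execution.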

Stack the S3 bill on top: post 8 put the per-scan S3 charge at $0.43 in same-region cost. A full ad-hoc SELECT * from cold on a badly-shaped table is therefore roughly $3 per query — $0.43 to read the data, $2.50 to be a warehouse for fourteen minutes. The storage side is the smaller half.

The hidden 100×/day pattern

One ad-hoc query a day per analyst with that shape is about $75 a month per analyst in pure cold-start tax (30 days at $2.50). Ten analysts each firing two such queries a day works out to ~$1,500/month on the same 30-day basis, or about $1,100/month on a strict Monday-to-Friday cadence of roughly 22 working days. That is DBU spend on waiting alone, before the query work itself, the S3 request bill, or any production dashboard refreshes.

The bill scales linearly with the number of cold queries and not at all with the size of the data. A SELECT 1 against a cold warehouse costs the same as a SELECT * FROM huge_table against a cold warehouse, because the cost is the warehouse, not the query.
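A minimal sketch of that scaling, assuming the $2.50 per cold query from the worked example above. The function name and parameters are illustrative. Data size does not appear as an input, which is the point.

```python
def monthly_cold_start_tax(
    analysts: int,
    cold_queries_per_day: int,
    days_per_month: int,
    usd_per_cold_query: float = 2.50,  # per-query figure from the worked example
) -> float:
    """Monthly DBU spend attributable to cold queries; linear in query count."""
    return analysts * cold_queries_per_day * days_per_month * usd_per_cold_query


print(monthly_cold_start_tax(1, 1, 30))   # 75.0   one analyst, one query a day
print(monthly_cold_start_tax(10, 2, 30))  # 1500.0 ten analysts, two a day, 30 days
print(monthly_cold_start_tax(10, 2, 22))  # 1100.0 same team, about 22 working days
```

The $1,500/month figure corresponds to 30 billed days; counting only working days lowers it to about $1,100 without changing the shape of the argument.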

The self-hosted alternative

An m5.4xlarge (16 vCPU, 64 GiB) in us-east-1 on-demand is about $0.768/hr, or roughly $560/month always-on. A three-node ClickHouse² cluster on that shape is around $1,680/month on-demand, before EBS gp3 storage at $0.08/GB-month and a small ClickHouse Keeper quorum. Reserved instances cut 40–60 % off the EC2 line; a three-year RI brings the same cluster down to roughly $1,000/month, all-in.

🔗 Learn more — ² What is ClickHouse?

The license cost is zero. ClickHouse open source is Apache 2.0. The customer pays for EC2, EBS, network, and the engineer-time to run it. The honest hidden cost is that engineer time: a stable mid-size cluster eats roughly 10–20 % of one engineer for sharding, the Keeper quorum, monitoring, backups, and upgrades. At a loaded €80k a year that is roughly €670–€1,330/month, comparable to the EC2 line itself, and it falls toward zero per query as the cluster grows.

The break-even arithmetic is clean. The ten-analyst, two-cold-queries-a-day shape above is paying ~$1,500/month for the privilege of waiting on cold Databricks clusters, which is roughly the entire EC2 bill for a three-node always-on ClickHouse that would run those same queries in well under a second.
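The self-hosted side of that comparison can be sketched the same way. The 730-hour month is an assumption (a common averaging convention), the function names are illustrative, and EBS, network, and the Keeper nodes are left out.

```python
HOURS_PER_MONTH = 730  # assumption: average hours in a month


def ec2_monthly(usd_per_hour: float, nodes: int, discount: float = 0.0) -> float:
    """Always-on EC2 cost per month, optionally with a reserved-instance discount."""
    return usd_per_hour * HOURS_PER_MONTH * nodes * (1 - discount)


def engineer_monthly(loaded_annual: float, share: float) -> float:
    """Monthly cost of a fraction of one engineer (same currency as the input)."""
    return loaded_annual * share / 12


on_demand = ec2_monthly(0.768, nodes=3)                # three m5.4xlarge, on-demand
reserved = ec2_monthly(0.768, nodes=3, discount=0.40)  # low end of the 40-60 % range
cold_start_tax = 1500.0                                # ten-analyst figure from above

print(f"on-demand EC2: ${on_demand:,.0f}/month")       # about $1,682
print(f"reserved EC2:  ${reserved:,.0f}/month")        # about $1,009
print(f"engineer time: EUR {engineer_monthly(80_000, 0.10):,.0f}"
      f" to {engineer_monthly(80_000, 0.20):,.0f}/month")
print(f"cold-start tax as share of on-demand EC2: {cold_start_tax / on_demand:.0%}")
```

On these inputs the cold-start tax alone is close to 90 % of the on-demand EC2 line, before counting engineer time on one side or query work and S3 requests on the other.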

The latency story

ClickHouse on the same shape returns sub-second on aggregations over billions of rows when the schema is selective and the query is columnar. There is no cold start because there is no scale-to-zero. The cluster is on. The query runs. The result is rendered before the analyst has time to switch tabs. ClickHouse's own benchmarks claim up to 6.6× faster than Databricks and Snowflake on join-heavy workloads — vendor benchmarks, so take the number with the usual pinch of salt, but the architectural reasons are real: a single-process columnar engine on local disk does not pay for any of the coordination overhead a distributed Spark³ planner needs.

🔗 Learn more — ³ What is Apache Spark?

The user-experience difference is the part that does not show up on any pricing page. A Databricks SELECT * LIMIT 100 from a cold Pro warehouse is "open Slack while it spins up". The same query on a warm ClickHouse is "result rendered before the eyes refocus". Both query patterns produce identical data; one of them is fun to use and one of them is not.

When serverless is the right answer

Which option is cheaper depends on the shape of the workload:

Serverless wins on bursty, low-frequency, asymmetric loads. A daily ETL⁴ at 03:00 followed by twenty-three hours of nothing is the case the autoscale story was designed for. The cold-start tax amortises into a big job and the scale-to-zero hours are real savings. Same for one-off ad-hoc analysis a few times a month, or pre-production experimentation where the cluster runs maybe an hour a week.
Always-on wins the moment the workload is even mildly steady. Interactive dashboards, analyst self-service, hourly aggregates, anything that fires queries on a regular cadence. Fixed monthly compute cost dominates variable per-query cost the moment the query rate goes above a few hundred a day, and the always-on cluster is also faster per query because there is no cold start to amortise.

🔗 Learn more — ⁴ What is ETL (and how is ELT different)?

The rough break-even is how many minutes a day the team is actively running queries. Below ~30 minutes a day in aggregate, serverless wins. Above that, always-on wins.

The accidental architecture

"Serverless is cheaper" is true under one assumption: the warehouse is idle most of the time. Most BI teams are not idle most of the time.

The common accident is the team that picks serverless for the autoscale narrative, then deploys fifty dashboards that auto-refresh every five minutes, then watches the DBU bill triple month-over-month and cannot work out why. Each of those dashboard refreshes either spins a cluster up or resets the idle timer on one that is already running; the autoscaler is doing exactly what it was sold to do; the bill is exactly what the pricing page predicts under sustained query load. The architecture made sense in a brochure, and made no sense in production.
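A rough sketch of why that pattern never scales to zero. The function names are illustrative, the 12-second run time and ten-minute auto-stop are borrowed from the earlier worked example as assumptions, and cold-start time is ignored for simplicity.

```python
def refreshes_per_day(dashboards: int, interval_min: int) -> int:
    """Total scheduled refreshes per day across all dashboards."""
    return dashboards * (24 * 60 // interval_min)


def billed_fraction(interval_min: float, run_min: float, idle_timeout_min: float) -> float:
    """Share of each refresh interval the warehouse stays up (cold start ignored).

    If the run time plus the auto-stop delay covers the whole interval,
    the warehouse never stops and the fraction is 1.0.
    """
    return min(1.0, (run_min + idle_timeout_min) / interval_min)


print(refreshes_per_day(dashboards=50, interval_min=5))  # 14400
# Assumption: 12-second refresh, ten-minute auto-stop as in the earlier example.
print(billed_fraction(interval_min=5, run_min=0.2, idle_timeout_min=10))  # 1.0
# Assumption: an aggressive one-minute auto-stop instead.
print(round(billed_fraction(interval_min=5, run_min=0.2, idle_timeout_min=1), 2))  # 0.24
```

With a five-minute refresh and a ten-minute auto-stop, even a single dashboard keeps the warehouse billed around the clock; fifty of them only make the load more continuous.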

Self-hosted ClickHouse is the boring counterexample. It does not autoscale. It does not have a cold start. The bill arrives once a month, the same shape, regardless of whether anyone looked at a dashboard that day. That predictability is the entire feature. A finance team can budget it; an engineering team does not need to defend a query rate; an analyst does not need to wait for a cluster.

A short close

Serverless warehouses are priced for the idle-most-of-the-time case. Anyone running a query pattern that is even mildly steady ends up paying the cold-start tax over and over, with most of the bill arriving as time spent waiting and time spent idle-timing-out. An always-on box on EC2 with a free columnar engine trades the autoscale story for a fixed bill and sub-second latency. For interactive analytics that trade is almost always favourable, and the only reason it isn't the default architecture is that fixed monthly cost sells less well in a product demo than "scales to zero".