Zero-ETL: querying data without moving it

Kellar, Levitation — magician poster, ca. 1894. The work did not disappear; it moved where you cannot see it. Public domain.

The "T" did not go anywhere

"Zero-ETL¹" is the cleanest piece of category marketing the data industry has produced in years, and like all good marketing it is technically a lie of omission. The promise is that you stop building pipelines that extract, transform, and load data between systems, and instead query it where it already lives. The reality is that of the three letters, only the E and L are actually eliminated. The T — the joins, the deduplication, the type coercion, the business logic that turns raw rows into something a dashboard can trust — does not disappear. It moves. Either to query time, where every reader re-runs it, or up into the catalog layer, where it becomes someone else's standing problem. The honest name, as more than one practitioner has pointed out, would be "zero-EL." That is a worse name, which is why nobody uses it.

🔗 Learn more — ¹ What is ETL (and how is ELT different)?

None of which means zero-ETL is worthless. The data-duplication reduction underneath the slogan is real and valuable. The job of this post is to separate the part that is genuinely new from the part that is federation wearing a 2024 haircut.

Two different things wear the same label

The first source of confusion is that "zero-ETL" covers two architecturally distinct patterns that happen to share a marketing budget. They behave very differently and fail very differently.

	Managed replication	Federation (query in place)
What it does	Streams changes from a source DB into a warehouse continuously	Queries the source/lake directly, no copy
Example	Aurora → Redshift zero-ETL	Snowflake/Databricks² querying shared Iceberg³
Is data copied?	Yes — into Redshift⁴, just without your pipeline	No — read where it sits
Where the "T" lands	Query time in the warehouse	Query time, every time, in the reader
Eliminates duplication?	No — it automates the duplication	Yes

🔗 Learn more — ² What is Databricks?

🔗 Learn more — ³ How Apache Iceberg actually works

🔗 Learn more — ⁴ What is Amazon Redshift?

AWS's Aurora MySQL → Redshift zero-ETL integration went GA in November 2023, with Aurora PostgreSQL and DynamoDB following to GA in 2024. It is genuinely impressive engineering — AWS quotes over 1 million transactions per minute landing in Redshift in under 15 seconds at p50. But notice what it is: it is still copying every row from Aurora into Redshift. The data is duplicated; you just did not write the pipeline. The "zero" refers to the pipeline code you did not maintain, not to the copy that still exists. That is real value — CDC⁵ pipelines are miserable to operate — but it is automation of ETL, not the absence of it.
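
To make "automation of ETL" concrete, here is a toy sketch of what any change-replication loop does conceptually. It is not AWS's implementation: the event shape (`op`, `key`, `row`) and the function name `apply_changes` are hypothetical, chosen only for illustration.

```python
from typing import Any

# Hypothetical change-event shape (an assumption, not a real CDC wire format):
# {"op": "insert" | "update" | "delete", "key": <primary key>, "row": {...}}
ChangeEvent = dict[str, Any]


def apply_changes(replica: dict[int, dict], events: list[ChangeEvent]) -> dict[int, dict]:
    """Apply change events, in order, to a replica keyed by primary key."""
    for event in events:
        if event["op"] in ("insert", "update"):
            # A second physical copy of the row now exists in the replica.
            replica[event["key"]] = dict(event["row"])
        elif event["op"] == "delete":
            replica.pop(event["key"], None)
        else:
            raise ValueError(f"unknown op: {event['op']}")
    return replica


source_changes = [
    {"op": "insert", "key": 1, "row": {"email": "a@example.com", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"email": "b@example.com", "plan": "pro"}},
    {"op": "update", "key": 1, "row": {"email": "a@example.com", "plan": "pro"}},
    {"op": "delete", "key": 2},
]

warehouse_copy = apply_changes({}, source_changes)
print(warehouse_copy)
```

The replica ends up holding its own copy of every surviving row. A managed integration runs a loop like this for you, which is the value, but the copy is still there.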

🔗 Learn more — ⁵ What is Change Data Capture (CDC)?

The genuinely new part: a shared substrate

The federation half is where something architecturally new is happening, and the enabler is the thing I wrote about in "How Iceberg won the table-format war." Once every engine reads the same open table format, "query in place" stops being a slow JDBC bridge and becomes "two engines reading the same Parquet⁶ files through the same catalog." That is qualitatively different from the old federation, which translated your query into the foreign system's dialect and shipped it over a connector that was always the bottleneck.

🔗 Learn more — ⁶ How Parquet works: columnar storage explained

Salesforce's Zero Copy Partner Network, launched April 2024, is the clearest example of the new model in production. Its file federation uses Apache Iceberg, the Iceberg REST catalog, and Parquet to read data files directly from Snowflake, Databricks, and generic Iceberg catalogs on S3 or Azure — no copy into Salesforce. The Snowflake side of the same integration is built on Snowflake's Iceberg Tables and Secure Data Sharing. The shared substrate is the new thing. When two platforms point at the same Iceberg table through a REST catalog like Polaris or Unity, neither of them holds a copy, and that is a real reduction in duplicated storage, drift, and reconciliation work that older federation never delivered.
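
A deliberately tiny model of the shared substrate, under loud assumptions: a CSV file stands in for a Parquet data file, a plain dict stands in for an Iceberg REST catalog, and the two "engines" are ordinary functions. None of these names correspond to a real Iceberg, Snowflake, or Salesforce API. The only point is the shape: both readers resolve the table through one catalog and scan the same file.

```python
import csv
import tempfile
from pathlib import Path

# Toy stand-ins (assumptions): CSV plays the role of Parquet, a dict plays
# the role of the catalog. All names here are hypothetical.
storage = Path(tempfile.mkdtemp())
data_file = storage / "orders-00001.csv"
with data_file.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["order_id", "amount"])
    writer.writerows([[1, "19.99"], [2, "5.00"], [3, "42.50"]])

catalog = {"sales.orders": [data_file]}  # table name -> list of data files


def scan(table: str) -> list[dict[str, str]]:
    """Resolve the table through the catalog, then read its files directly."""
    rows = []
    for path in catalog[table]:
        with path.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


def engine_a_row_count(table: str) -> int:
    """First 'engine': counts rows."""
    return len(scan(table))


def engine_b_revenue(table: str) -> float:
    """Second 'engine': sums the amount column from the same files."""
    return round(sum(float(r["amount"]) for r in scan(table)), 2)


row_count = engine_a_row_count("sales.orders")
revenue = engine_b_revenue("sales.orders")
files_on_disk = list(storage.iterdir())
print(row_count, revenue, len(files_on_disk))
```

Two readers, two different answers to two different questions, one file on disk. Nothing was exported from one engine to the other, which is the property old connector-based federation could not offer at file-scan speed.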

What is rebranded federation, and what is not

Query federation is decades old — Presto⁷ and Trino were doing cross-source queries long before "zero-ETL" had a logo, and the allyticstechperspectives breakdown of federation versus replication walks through why the distinction matters operationally. So what is actually new versus relabelled?

🔗 Learn more — ⁷ What is a query engine (Trino, Presto, and friends)?

Rebranded: the basic act of querying a remote source without copying it. We have always been able to do this. JDBC federation, external tables, Presto connectors — old hat.

Genuinely new:

Open-format-native federation. The reader engine works on the same physical files through the same catalog, not through a query-translation bridge. The performance ceiling is the file scan, not the connector.
Catalog-mediated sharing. Iceberg REST catalogs let one team grant another read access to a table by reference, with governance, instead of exporting a copy. Salesforce reports zero-copy ingestion grew 341% year-over-year to 15 trillion records — the volume is real even if the framing is generous.
Schema-on-read⁸ as the default. Data sits raw and the schema is applied at query time, which is exactly where the relocated "T" lives.
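
The schema-on-read point is easy to see in a few lines. This is a minimal sketch, assuming raw rows arrive as untyped strings; the column names and the `read_with_schema` helper are hypothetical.

```python
from datetime import date

# Raw rows as they might land in the lake: every value is a string and
# nothing has been validated. Column names are illustrative assumptions.
raw_rows = [
    {"customer_id": "17", "signup_date": "2024-03-01", "mrr": "49.00"},
    {"customer_id": "18", "signup_date": "2024-03-09", "mrr": ""},
]

# Hypothetical schema: column name -> function that coerces the raw string.
schema = {
    "customer_id": int,
    "signup_date": date.fromisoformat,
    "mrr": lambda s: float(s) if s else None,  # empty string becomes a null
}


def read_with_schema(rows, schema):
    """Apply the schema at read time; this coercion is the relocated "T"."""
    for row in rows:
        yield {col: cast(row[col]) for col, cast in schema.items()}


typed_rows = list(read_with_schema(raw_rows, schema))
print(typed_rows[0])
```

Every reader that calls `read_with_schema` repeats the type coercion and the null handling. That work used to happen once, in the pipeline.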

🔗 Learn more — ⁸ Schema-on-read vs schema-on-write

The warehouse vendors are federating with each other now

The clearest sign that the shared-substrate model is more than a slide is that the two companies that spent a decade trying to lock each other's data in are now reading each other's tables. Snowflake open-sourced Polaris and committed hard to Iceberg so that external engines could read Snowflake-managed Iceberg tables without an export. Databricks answered by open-sourcing Unity Catalog, which speaks the same Iceberg REST catalog API and is explicitly designed to let non-Databricks engines query Databricks-governed data. Both moves are zero-ETL plays dressed as governance announcements: the point is that another vendor's compute can read your tables in place, under your access controls, without a copy crossing the boundary. When the incumbents start federating with their direct competitors, the duplication-reduction value has stopped being theoretical. The catch, and there is always a catch, is that interoperability between two open catalogs implementing the same spec is still patchier in practice than the press releases suggest, which is why the catalog wars are the part of this story still actively being fought.

Where the work actually went

This is the part the marketing skips, and it is the part you have to plan for. Deferring transformation does not make it free; it changes who pays and when:

Query-time cost and latency. If you do not pre-transform, every query re-derives the joins and cleaning. On a warehouse that bills per scan or per credit, you pay the transformation cost on every read instead of once on write, plus a cold-start tax each time if your warehouse scales to zero between queries. Federation can quietly invert the cost curve for read-heavy workloads.
Governance moves to the catalog. Access control, lineage, and data contracts now live at the catalog layer because that is the only place that sees all the readers. The catalog becomes the load-bearing component, which is one more reason the catalog wars matter.
Correctness debt. With no transform step, every reader is responsible for applying the same business logic the same way. Skip the shared semantic layer and ten teams will compute "active customer" ten subtly different ways. The pipeline you deleted was, among other things, the place that definition used to live.
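
The correctness-debt problem fits in a small sketch. The data, the two team definitions, and the `is_active_customer` function are all invented for illustration; the shared function is a stand-in for a semantic layer, not a real product's API.

```python
from datetime import date, timedelta

# Made-up data for illustration.
customers = [
    {"id": 1, "last_login": date(2026, 1, 28), "paying": True},
    {"id": 2, "last_login": date(2026, 1, 5), "paying": True},
    {"id": 3, "last_login": date(2026, 1, 30), "paying": False},
]
today = date(2026, 2, 1)


# Two teams, two ad hoc definitions of "active customer".
def active_team_a(c):
    return today - c["last_login"] <= timedelta(days=7)


def active_team_b(c):
    return c["paying"] and today - c["last_login"] <= timedelta(days=30)


# One shared definition, owned in one place (stand-in for a semantic layer).
def is_active_customer(c, as_of=today, window_days=30):
    return c["paying"] and as_of - c["last_login"] <= timedelta(days=window_days)


ids_a = {c["id"] for c in customers if active_team_a(c)}
ids_b = {c["id"] for c in customers if active_team_b(c)}
ids_shared = {c["id"] for c in customers if is_active_customer(c)}
print(ids_a, ids_b, ids_shared)
```

Both teams report two active customers, but not the same two, so the totals agree while the underlying lists do not. A single imported definition removes that class of disagreement; where it lives matters less than that there is only one.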

The skeptical practitioner's summary: zero-ETL does not remove transformation work, it removes the staging copy and reassigns the transformation to query time and the catalog. Whether that is a win depends entirely on your read pattern. Read-light, freshness-critical, duplication-sensitive workloads love it. Read-heavy workloads with expensive transforms can end up paying more, more often, for the privilege of not having a pipeline.
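
A back-of-the-envelope model of that trade-off, with every number invented: unit costs are arbitrary, the refresh rate is an assumption, and the model ignores storage cost for the staged copy, caching, and freshness. It is a sketch of the shape of the curve, not a pricing calculator.

```python
def pipeline_cost(transform_cost: float, read_cost: float, writes: int, reads: int) -> float:
    """Transform once per write; each read scans the already-clean table."""
    return writes * transform_cost + reads * read_cost


def query_time_cost(transform_cost: float, read_cost: float, reads: int) -> float:
    """No pipeline: every read pays for the scan and the transform."""
    return reads * (read_cost + transform_cost)


# Hypothetical unit costs (arbitrary units) and an hourly refresh.
TRANSFORM, READ, WRITES_PER_DAY = 10.0, 1.0, 24

for reads_per_day in (2, 24, 500):
    etl = pipeline_cost(TRANSFORM, READ, WRITES_PER_DAY, reads_per_day)
    zero_etl = query_time_cost(TRANSFORM, READ, reads_per_day)
    print(f"reads/day={reads_per_day}: pipeline={etl}, query-time={zero_etl}")
```

In this simplified model the break-even sits where reads equal refreshes. Below it, transforming at query time is cheaper because most refreshes were never read; above it, the per-read transform cost dominates and grows linearly with traffic.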

A short close

"Zero-ETL" is two patterns under one slogan: managed replication that automates the copy, and open-format federation that genuinely eliminates it. The transformation never left — it relocated to query time and to the catalog, and pretending otherwise is how teams get surprised by their warehouse bill. What is real, and worth the attention, is that an Iceberg-shaped shared substrate finally makes "query it where it lives" fast enough to be a default instead of a fallback. Strip the marketing and the durable insight is narrow but correct: in 2026, copying data is increasingly the thing you do on purpose, not the thing you do because the tools left you no choice.