Updated 18 Jun 2026 · 3 min read

What is Dataform?

Dataform is a SQL-based transformation framework, now part of Google Cloud, for managing ELT inside BigQuery with SQLX models, ref dependencies, and tests.

Data & lakehouse
#data
#transformation
#sql
#ai-assisted

Dataform is a SQL-based transformation framework for managing ELT [3] in the warehouse. It is similar to dbt [1], and it is now part of Google Cloud and integrated with BigQuery [2]. You define tables and views as SQLX files, link them with ref-based dependencies, attach built-in assertions, and keep the whole project under version control. Google acquired the original Dataform startup in 2020 and folded it into BigQuery, where it now runs as a managed service rather than a standalone product.

🔗 Learn more [1]: What is dbt?
🔗 Learn more [2]: What is BigQuery?
🔗 Learn more [3]: What is ETL (and how is ELT different)?

The framing to hold onto: Dataform owns the transform step. It does not load data into the warehouse, and it is not the tool you use to analyze the results. You load raw data into BigQuery first, then Dataform tells BigQuery what SQL to run, in what order, to turn that raw data into clean, documented, tested tables.

SQLX models and the dependency graph

A model in Dataform is a SQLX file — plain SQL extended with a config block. Any valid SQL file is valid SQLX, so the simplest model is just a SELECT; the config block on top adds type (table, view, incremental), documentation, and data-quality rules. The file defines one relation, and Dataform creates it in BigQuery.
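As a minimal sketch of that anatomy, a staging model saved as definitions/stg_events.sqlx could look like the file below. The file path, the raw.events source table, and every column name are hypothetical, chosen only for illustration. By default the relation takes its name from the file, so this one would be created as stg_events.

```sqlx
config {
  type: "view",
  description: "Raw events with types cast and columns renamed.",
  columns: {
    event_id: "Unique identifier of the event.",
    event_ts: "Time the event occurred, as a TIMESTAMP."
  }
}

-- Hypothetical source table and column names, for illustration only.
-- Everything below the config block is ordinary BigQuery SQL.
SELECT
  event_id,
  session_id,
  user_id,
  CAST(event_time AS TIMESTAMP) AS event_ts
FROM raw.events
```

Changing type to "table" would materialize the result instead of exposing it as a view, and "incremental" would add new rows on each run rather than rebuilding the whole relation.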

The piece that turns a folder of queries into a project is the ref() function. Instead of hardcoding a dataset and table name, one model references another with ref("stg_orders"). That call resolves to the correct, environment-specific table name at compile time, and it simultaneously declares a dependency. Because every model states what it reads from, Dataform assembles the entire project into a dependency graph and derives the correct build order automatically — staging models before the marts that depend on them. You never hand-write the execution order; the references are the order. Dataform also supports JavaScript blocks and includes for generating repetitive SQL, which is its main answer to dbt's Jinja templating.
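A sketch of a downstream model, assuming a stg_events model already exists in the same project. The file name fct_sessions.sqlx, the column names, and the MIN_EVENTS constant are all hypothetical. The small js block shows the JavaScript escape hatch in its simplest form: a constant that the SQL below can interpolate.

```sqlx
config {
  type: "table",
  description: "One row per session, aggregated from cleaned events."
}

js {
  // Hypothetical threshold; a js block makes constants available to the SQL below.
  const MIN_EVENTS = 2;
}

-- ref() resolves to the full table name at compile time and
-- tells Dataform that stg_events must be built before this model.
SELECT
  session_id,
  user_id,
  MIN(event_ts) AS session_start,
  MAX(event_ts) AS session_end,
  COUNT(*) AS event_count
FROM ${ref("stg_events")}
GROUP BY session_id, user_id
HAVING COUNT(*) >= ${MIN_EVENTS}
```

After compilation the FROM clause would name a fully qualified table, with the project and dataset taken from the repository's settings rather than from this file, so the same model can compile against a development or a production dataset unchanged.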

```mermaid
%% color = green: the published mart downstream tools query
flowchart TD
    RAW["raw.events (loaded into BigQuery)"] --> STG["stg_events (SQLX: clean + cast)"]
    STG --> MART["fct_sessions (SQLX: aggregate)"]
    ASSERT["assertions: unique, non-null"] -.checks.-> MART
    MART --> BI["BI tool / analyst SQL"]

    classDef grey stroke:#7b88a1,stroke-width:2.5px
    classDef green stroke:#a3be8c,stroke-width:2.5px
    class MART green
    class RAW,STG,ASSERT,BI grey
```

Assertions and scheduling

Assertions are Dataform's data tests. You declare expectations — a column is unique, a column is never null, or an arbitrary condition holds — and Dataform compiles each into a query that returns the rows violating it. If the query returns anything, the assertion fails. You can attach these inline in a model's config or write them as standalone files, and they run as part of the workflow, so a bad load fails loudly instead of quietly corrupting a downstream report.
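The inline form can be sketched as follows, assuming a hypothetical fct_sessions model built from a stg_events model; the column names are illustrative. The three keys cover the three kinds of expectation described above: uniqueness, non-null columns, and an arbitrary per-row condition.

```sqlx
config {
  type: "table",
  assertions: {
    uniqueKey: ["session_id"],
    nonNull: ["session_id", "user_id"],
    rowConditions: ["session_end >= session_start"]
  }
}

-- Hypothetical model body; the assertions above are checked after it is built.
SELECT
  session_id,
  user_id,
  MIN(event_ts) AS session_start,
  MAX(event_ts) AS session_end
FROM ${ref("stg_events")}
GROUP BY session_id, user_id
```

The standalone form is a separate SQLX file whose config sets type: "assertion" and whose body is a SELECT that returns the offending rows, for example sessions whose end precedes their start. An empty result means the assertion passes.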

Because Dataform lives inside Google Cloud, scheduling is handled there too. You define a release configuration and a workflow configuration, then trigger runs on a cron-like [4] schedule, through Cloud Scheduler, or via Cloud Composer (managed Airflow [5]) when you need orchestration alongside other Google Cloud jobs. Repositories connect to a Git provider (GitHub, GitLab, Azure DevOps, or Bitbucket), so models live in branches with normal code review, and development happens in isolated workspaces with live compilation and a browsable dependency graph.

🔗 Learn more [4]: What is cron?
🔗 Learn more [5]: What is Apache Airflow?

How it compares to dbt

Dataform and dbt solve the same problem the same way: SQL models, declared dependencies, tests, version control, lineage. If you already live in BigQuery and Google Cloud, Dataform is a competent, fully managed choice — no separate runner to host, scheduling and Git built in, and no per-seat cost for the core service.

The honest trade-off is reach. Dataform is effectively BigQuery-centric; it is not designed to target Snowflake, Redshift [6], or other warehouses. dbt runs across many warehouses and has a far broader ecosystem of adapters, packages, community, and integrations, and SQLMesh [7] is a newer cross-engine alternative worth knowing about. So the decision is mostly about lock-in: choose Dataform if BigQuery is your home and you value the managed, integrated experience; choose dbt if you want portability across data warehouses [8] or the larger community behind it.

🔗 Learn more [6]: What is Amazon Redshift?
🔗 Learn more [7]: What is SQLMesh (and how does it compare to dbt)?
🔗 Learn more [8]: What is a data warehouse?

The short version: Dataform is BigQuery's native transform layer — SQLX models linked by ref() into a dependency graph, guarded by assertions, scheduled and versioned inside Google Cloud. Strong fit if you live in BigQuery; dbt still owns the cross-warehouse world.

🔗 Sources: Dataform overview (Google Cloud docs), Welcoming Dataform to BigQuery (Google Cloud Blog)