← Learn··Updated 18 Jun 2026·3 min read

What is dbt?

dbt runs the T in ELT inside the warehouse: you write SELECT statements as models, and dbt orders dependencies, materializes tables, tests, and docs.

Data & lakehouse
#data
#transformation
#sql
#ai-assisted

dbt (data build tool) is a transformation framework that runs the "T" in ELT1 directly inside your warehouse. You write transformations as plain SELECT statements — dbt calls each one a model — and dbt works out the dependency order, compiles everything to SQL, runs it against the warehouse, and materializes the results as tables or views. On top of that it layers tests, documentation, and lineage, so a pile of SQL becomes a maintainable, version-controlled project.

🔗 Learn more1 What is ETL (and how is ELT different)?

The key reframing: in classic ETL you transformed data on the way in, before loading. In ELT you load raw data into a data warehouse2 first, then transform it where it sits — using the warehouse's own compute. dbt owns that last step and nothing else. It does not move bytes; it tells the warehouse what SQL to run, in what order.

🔗 Learn more2 What is a data warehouse?

Models and ref() build a DAG

A model is a .sql file containing one SELECT. The file name becomes the relation dbt creates. The trick that makes dbt more than a folder of queries is the ref() function: instead of hardcoding a table name, one model selects from another with {{ ref('stg_orders') }}. That reference is both the real, environment-correct table name at compile time and a declared dependency.

Because every model declares what it reads from, dbt assembles the whole project into a DAG — a directed acyclic graph3 of models. From that graph it knows the correct build order: staging models first, then the marts that depend on them, never the other way around. You never write the schedule by hand; the dependencies are the schedule.

🔗 Learn more3 What is a DAG (and why orchestrators use them)?
%% color = green: the mart that downstream consumers actually query
flowchart TD
    SRC["source: raw.orders (loaded by EL tool)"] --> STG["stg_orders: clean, rename, cast"]
    SRC2["source: raw.customers"] --> STG2["stg_customers: clean, rename, cast"]
    STG --> MART["fct_orders: join + aggregate"]
    STG2 --> MART
    MART --> BI["BI tool / SQL consumer"]

    classDef grey stroke:#7b88a1,stroke-width:2.5px
    classDef green stroke:#a3be8c,stroke-width:2.5px
    class MART green
    class SRC,SRC2,STG,STG2,BI grey

Materializations: view, table, incremental

The same model can be built different ways, controlled by its materialization. A view is just a saved query — cheap to build, recomputed on every read. A table is fully rebuilt and stored, so reads are fast but each run rewrites everything. An incremental model is the important one at scale: dbt builds the full table once, then on later runs only processes new or changed rows and appends them, instead of reprocessing all of history. You declare the materialization in config; the SELECT stays the same.

Tests, sources, and why it took off

dbt also lets you assert things about your data. A source is the declared entry point — the raw tables your EL tool landed — and you can attach freshness checks and tests to them. Tests are reusable assertions written in YAML: this column is unique, that one is not_null, this key has a valid relationships reference to another table. They run as SQL and fail the build when the data violates them.

The reason dbt spread so fast is that it brought ordinary software practices to analytics. SQL transformations became modular files in version control, with code review, environments, automated tests, and generated documentation plus a browsable data lineage4 graph. Before dbt, this logic lived in thousand-line stored procedures or hand-ordered scripts nobody dared touch.

🔗 Learn more4 What is data lineage?

The honest caveats

dbt is transform-only. It does not ingest data — you need a separate EL tool (Fivetran, Airbyte, custom loaders) to land raw data first. It does not orchestrate; people trigger dbt run from Airflow5, cron6, or dbt Cloud. And it is warehouse-centric and SQL-bound: dbt assumes a SQL engine underneath and that your transformations can be expressed as SELECT statements. If your problem needs row-by-row procedural logic or lives outside a warehouse, dbt is the wrong tool.

🔗 Learn more5 What is Apache Airflow?
🔗 Learn more6 What is cron?

The short version: dbt is the disciplined SELECT-statement layer of an analytics stack — models linked by ref() into a DAG, materialized as views, tables, or incremental builds, with tests, docs, and lineage attached. It runs the transform, and assumes everything else is someone else's job.