What is Dagster?

Dagster is a Python data orchestrator built around software-defined assets — you declare the data assets you want and their dependencies, and Dagster builds the DAG¹, tracks lineage, and runs or backfills them. Instead of writing a graph of tasks and reasoning about what each task happens to produce, you describe the tables, files, and models that should exist, and the orchestrator works out the execution graph from how those assets reference each other. That single reframing is what sets Dagster apart from the older generation of schedulers.

🔗 Learn more — ¹ What is a DAG (and why orchestrators use them)?

Assets, not tasks

For years the dominant model was task-centric, set by Apache Airflow²: you write a DAG of operators, and each one is an opaque unit of work. In that model the orchestrator does not know whether a task actually produced the table it was supposed to; the graph describes the order of operations, not what data exists.

🔗 Learn more — ² What is Apache Airflow?

Dagster inverts this. The core unit is the asset: a persistent object in your data platform, such as a warehouse table, a Parquet³ file, or a trained model, declared as a decorated Python function. The function's return value is the asset's contents, and the upstream assets it reads are declared as its dependencies, most simply by naming them as function parameters. You never hand-wire the DAG. Dagster reads the dependency declarations and derives the graph, so the structure of your code and the structure of your data pipeline⁴ are the same thing.

🔗 Learn more — ³ How Parquet works: columnar storage explained

🔗 Learn more — ⁴ What is a data pipeline?
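The asset-as-function idea described above can be illustrated with a small standard-library sketch. This is a toy model, not Dagster's implementation: the `asset` decorator, the `REGISTRY` dict, and the `dependency_graph` and `materialize_all` helpers are hypothetical stand-ins written for this example, and the sample rows are made up. It only shows how a graph can be derived from parameter names instead of being wired by hand.

```python
import inspect
from graphlib import TopologicalSorter

# Hypothetical toy registry: asset name -> function. Not Dagster's API.
REGISTRY = {}


def asset(fn):
    """Register a function as an asset under its own name."""
    REGISTRY[fn.__name__] = fn
    return fn


@asset
def raw_orders():
    # Made-up sample rows standing in for a source table.
    return [{"id": 1, "amount": 20}, {"id": 2, "amount": -5}]


@asset
def stg_orders(raw_orders):
    # The parameter name is what declares the upstream dependency.
    return [row for row in raw_orders if row["amount"] > 0]


@asset
def fct_orders(stg_orders):
    return {
        "order_count": len(stg_orders),
        "revenue": sum(row["amount"] for row in stg_orders),
    }


def dependency_graph():
    """Map each asset to the set of assets named in its parameters."""
    return {
        name: set(inspect.signature(fn).parameters)
        for name, fn in REGISTRY.items()
    }


def materialize_all():
    """Run every asset in dependency order, passing upstream values down."""
    graph = dependency_graph()
    values = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = {dep: values[dep] for dep in graph[name]}
        values[name] = REGISTRY[name](**inputs)
    return values
```

Nothing in the sketch states the execution order explicitly; it falls out of the function signatures, which is the property the asset model relies on.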

```mermaid
%% color = green: the asset consumers actually query
flowchart TD
    RAW["@asset raw_orders"] --> STG["@asset stg_orders"]
    RAW2["@asset raw_customers"] --> STG2["@asset stg_customers"]
    STG --> MART["@asset fct_orders"]
    STG2 --> MART
    MART --> BI["dashboards / consumers"]

    classDef grey stroke:#7b88a1,stroke-width:2.5px
    classDef green stroke:#a3be8c,stroke-width:2.5px
    class MART green
    class RAW,RAW2,STG,STG2,BI grey
```

Because Dagster knows which assets exist and how they relate, data lineage⁵ is not a bolt-on — it is the native data model. The asset graph in the UI is the lineage graph, kept current automatically rather than reconstructed by a separate scraping tool.

🔗 Learn more — ⁵ What is data lineage?
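Once the dependency declarations are the data model, lineage questions reduce to graph traversal. The sketch below assumes the asset graph from the diagram is available as a plain dict; the `UPSTREAM` mapping and both helper functions are hypothetical names for illustration, not a Dagster API.

```python
# Hypothetical asset graph: asset name -> direct upstream assets.
UPSTREAM = {
    "raw_orders": set(),
    "raw_customers": set(),
    "stg_orders": {"raw_orders"},
    "stg_customers": {"raw_customers"},
    "fct_orders": {"stg_orders", "stg_customers"},
}


def upstream_lineage(asset_name, graph=UPSTREAM):
    """Return every asset that asset_name transitively depends on."""
    seen = set()
    stack = list(graph[asset_name])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


def downstream_impact(asset_name, graph=UPSTREAM):
    """Return every asset that would be affected by a change to asset_name."""
    return {
        name for name in graph
        if asset_name in upstream_lineage(name, graph)
    }
```

Asking what feeds `fct_orders`, or what breaks if `raw_orders` changes, is then a lookup over the same structure that drives execution.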

Typing, partitions, and the UI

Two features make the asset model practical at scale. The first is typed inputs and outputs: an asset's inputs and return value can carry Python type annotations, and values pass through an I/O manager that handles persistence, so a function can return a DataFrame and Dagster takes care of writing it to storage and loading it for downstream assets. The boundary between business logic and where the bytes live is explicit, which is what lets the same asset be re-targeted from a local file to a warehouse by swapping the I/O manager rather than rewriting the function.
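To make that boundary concrete, here is a toy version of the pattern in plain Python. The class and function names are hypothetical. The method names loosely echo the `handle_output` / `load_input` pair on Dagster's `IOManager` base class, but they take simplified arguments and are not its API.

```python
import json
from pathlib import Path


class InMemoryIO:
    """Keeps asset values in a dict; convenient for tests."""

    def __init__(self):
        self._store = {}

    def handle_output(self, asset_name, value):
        self._store[asset_name] = value

    def load_input(self, asset_name):
        return self._store[asset_name]


class JsonFileIO:
    """Persists each asset as <base_dir>/<asset_name>.json."""

    def __init__(self, base_dir):
        self._base = Path(base_dir)

    def handle_output(self, asset_name, value):
        (self._base / f"{asset_name}.json").write_text(json.dumps(value))

    def load_input(self, asset_name):
        return json.loads((self._base / f"{asset_name}.json").read_text())


def stg_orders(raw_orders):
    # Business logic only: no knowledge of where the rows are stored.
    return [row for row in raw_orders if row["amount"] > 0]


def run_stg_orders(io):
    """Load the upstream value, compute, and persist via the given I/O object."""
    result = stg_orders(io.load_input("raw_orders"))
    io.handle_output("stg_orders", result)
    return result
```

Passing `JsonFileIO` instead of `InMemoryIO` to `run_stg_orders` changes where the data lives without touching `stg_orders`.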

The second is partitions. You declare an asset as partitioned — by date, by region, by customer — and each partition is independently materialized, tracked, and backfillable. When a backfill is "rebuild March 1–31," Dagster launches and tracks those 31 partitions as first-class units rather than as one giant opaque run. This is the kind of operation that turns into a fragile hand-rolled loop in a task-centric setup.
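The bookkeeping behind such a backfill can be sketched with the standard library. The helper names, the year 2024, and the simulated failure are all assumptions made for illustration. Dagster ships its own daily partitioning (the `DailyPartitionsDefinition` class), which this toy does not use.

```python
from datetime import date, timedelta


def daily_partition_keys(start, end):
    """Return one ISO date key per day from start to end, inclusive."""
    days = (end - start).days + 1
    return [(start + timedelta(days=i)).isoformat() for i in range(days)]


def backfill(keys, materialize_partition):
    """Materialize each partition independently and record its own status."""
    status = {}
    for key in keys:
        try:
            materialize_partition(key)
            status[key] = "success"
        except Exception as exc:  # one bad day does not hide the others
            status[key] = f"failed: {exc}"
    return status


def flaky_materialize(key):
    # Stand-in compute step that fails for a single day.
    if key.endswith("-03-15"):
        raise ValueError("source file missing")


march_keys = daily_partition_keys(date(2024, 3, 1), date(2024, 3, 31))
march_status = backfill(march_keys, flaky_materialize)
failed = [key for key, state in march_status.items() if state != "success"]
```

Because each day has its own status, only the failed key needs to be retried instead of the whole month.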

Around all of this sits the web UI. It shows the full asset graph, the freshness and materialization status of every asset, run history, logs, and the lineage between them. Having all of that in one place is what makes a sprawling data pipeline legible, and it is one of the strongest parts of the tool.

Local dev and testing

Dagster takes the developer experience seriously, which is where it most clearly improves on its predecessors. Definitions are plain Python objects, so an asset is just a function you can import and call in a unit test with fabricated inputs — no scheduler, no database, no live warehouse. The I/O managers and external connections are supplied as configurable resources, so the same asset runs against a local SQLite or in-memory store in tests and against the real warehouse in production, swapped by configuration rather than by editing the asset.
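The testing pattern can be sketched without any orchestrator at all. In the toy below, `FakeWarehouse`, its `fetch` method, and this version of `fct_orders` are hypothetical names. The point is only that the asset receives its external connection as an argument, so a test can pass in an in-memory stand-in. Dagster's resource system formalizes this kind of injection, but none of its API appears here.

```python
import unittest


class FakeWarehouse:
    """In-memory stand-in for a warehouse client, used only in tests."""

    def __init__(self, tables):
        self._tables = tables

    def fetch(self, table_name):
        return list(self._tables[table_name])


def fct_orders(stg_orders, warehouse):
    """Join staged orders to customer names fetched through the resource."""
    names = {c["id"]: c["name"] for c in warehouse.fetch("customers")}
    return [
        {
            "order_id": order["id"],
            "customer": names[order["customer_id"]],
            "amount": order["amount"],
        }
        for order in stg_orders
    ]


class FctOrdersTest(unittest.TestCase):
    def test_joins_customer_name(self):
        # Fabricated inputs: no scheduler, database, or live warehouse.
        warehouse = FakeWarehouse({"customers": [{"id": 7, "name": "Ada"}]})
        rows = fct_orders([{"id": 1, "customer_id": 7, "amount": 20}], warehouse)
        self.assertEqual(
            rows, [{"order_id": 1, "customer": "Ada", "amount": 20}]
        )
```

In production the same function would be handed a real client that exposes the same `fetch` method; the asset body stays unchanged.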

Running dagster dev starts the full UI and daemon locally, so you iterate against the same interface you use in production. Treating pipelines as ordinary, testable software, rather than as YAML or operator graphs that are hard to test, is the through-line of the whole project.

The honest trade-offs

The asset model and the developer experience are a real step up from Airflow for analytics pipelines: lineage you get for free, partitions and backfills that are first-class, and code you can actually unit-test. But it is worth being fair about the cost. Dagster is younger and has a smaller installed base — Airflow has years of accumulated operators, integrations, and people who already know it, and it still dominates production by sheer count. Dagster is also opinionated: the asset abstraction is a genuine mental shift, and if your work is irregular task plumbing rather than a graph of materialized data, that opinion can fight you. Prefect⁶, by contrast, stays closer to the task model with lighter ceremony. Dagster's bet is that for analytics — where the deliverable really is a set of tables and models — modelling the assets directly is the right abstraction. For that workload, the bet largely pays off.

🔗 Learn more — ⁶ What is Prefect?

For deeper reference, the official Dagster documentation covers assets, partitions, and resources in detail.