← Learn··Updated 18 Jun 2026·2 min read

What is the medallion architecture (bronze, silver, gold)?

A layered lakehouse pattern: bronze holds raw ingested data, silver is cleaned and conformed, gold is business-ready aggregates. Refinement flows upward.

Data & lakehouse
#data
#architecture
#lakehouse
#ai-assisted

The medallion architecture is a convention for organizing a data lakehouse1 into three layers — bronze, silver, and gold — where data gets progressively cleaner and more useful as it moves up. It is not a technology or a specification; it is a naming scheme, popularized by Databricks2, for a discipline most mature pipelines already followed: keep the raw input, refine in stages, and serve from the top.

🔗 Learn more1 What is a data lakehouse?
🔗 Learn more2 What is Databricks?

The three layers

flowchart TD
    SRC["Sources: apps, APIs, CDC, files"] --> BRONZE["Bronze: raw, as-ingested, append-only"]
    BRONZE --> SILVER["Silver: cleaned, deduplicated, conformed, typed"]
    SILVER --> GOLD["Gold: business aggregates and marts"]
    GOLD --> BI["BI, reports, ML features"]

    %% color = refinement level: amber raw, grey cleaned, green serve-ready
    classDef raw stroke:#ebcb8b,stroke-width:2.5px
    classDef mid stroke:#7b88a1,stroke-width:2.5px
    classDef ready stroke:#a3be8c,stroke-width:2.5px
    class BRONZE raw
    class SILVER mid
    class GOLD,BI ready
    class SRC mid
  • Bronze is the landing zone: data exactly as it arrived, appended without edits. It is ugly on purpose. Its job is to be a faithful, replayable record of what the source actually sent, so you can reprocess everything downstream without re-ingesting.
  • Silver is where the cleaning happens: deduplication, type casting, schema enforcement, joining reference data, applying change-data-capture to get current state. Silver is the trustworthy, queryable model of the business entities.
  • Gold is the serving layer: aggregations, denormalized marts, metrics, and feature tables shaped for specific consumers — dashboards, reports, ML.

Why the separation earns its keep

The value is not the medals; it is the reproducibility and clear contracts between stages. Because bronze keeps the raw input untouched, any bug in cleaning or aggregation is fixable by replaying from bronze — you never have to beg the source for last quarter's data again. Each layer is a stable interface: analysts build on silver, executives consume gold, and a change in raw format is absorbed at bronze without rippling everywhere at once.

It also separates concerns that otherwise tangle. Ingestion reliability lives at bronze, data-quality logic at silver, and business semantics at gold. Each can be owned, tested, and reasoned about on its own.

The honest caveat

Three layers is a default, not a law. A small pipeline moving one clean source into one table does not need three physical copies of the data and the storage and orchestration that implies — that is ceremony, not architecture. The pattern also tempts layer sprawl: bronze-bronze, pre-silver, "platinum," a maze of intermediate tables nobody can map.

Take the discipline, not the dogma. The real lesson is older than the branding: keep your raw data immutable and replayable, refine in explicit stages with clear contracts, and serve from a layer shaped for the consumer. Whether you call them bronze, silver, and gold — or raw, clean, and serving — is cosmetic. The staging is the point.