Orchestration, catalogs & governance
11 articles in this category.
·2 min read
What is the medallion architecture (bronze, silver, gold)?
A layered lakehouse pattern: bronze holds raw ingested data, silver holds cleaned and conformed data, and gold holds business-ready aggregates. Refinement flows upward from bronze to gold.
#data
#architecture
#lakehouse
#ai-assisted
·3 min read
What is Apache Airflow?
Airflow schedules and monitors pipelines defined in Python as DAGs of tasks. It orchestrates work, deciding what runs, when, and in what order, without doing the heavy compute itself. This article covers how the scheduler, DAGs, operators, and executor fit together.
#data
#airflow
#orchestration
#ai-assisted
·3 min read
What is AWS Glue?
AWS Glue is Amazon's serverless data-integration service: a managed Hive-compatible Data Catalog, schema crawlers, and serverless Spark ETL jobs.
#data
#catalog
#etl
#ai-assisted
·2 min read
What is data lineage?
Data lineage is the recorded provenance of data — where a field came from, what transformed it, and what depends on it. It's how you do impact analysis and debug wrong numbers.
#data
#governance
#observability
#ai-assisted
·3 min read
What is Dagster?
Dagster is a Python data orchestrator built on software-defined assets: you declare the data assets you want and their dependencies, and Dagster derives the DAG, lineage, and runs from them.
#data
#orchestration
#ai-assisted
·2 min read
What is a data mesh?
A data mesh decentralizes data ownership to the domain teams that produce it, treats datasets as products, and federates governance — an org model, not a tool.
#data
#architecture
#governance
#ai-assisted
·3 min read
What is Mage?
Mage is an open-source data pipeline tool with a notebook-style, block-based editor in which each runnable block is a node in the pipeline's DAG.
#data
#orchestration
#ai-assisted
·3 min read
What is Project Nessie?
Project Nessie is an open-source transactional catalog that brings git-like branches, tags, and atomic commits to data lake tables.
#data
#catalog
#lakehouse
#ai-assisted
·3 min read
What is Prefect?
Prefect is a Python-native orchestrator: plain functions become flows and tasks via decorators, with dynamic, runtime-determined workflows instead of a static DAG.
#data
#orchestration
#ai-assisted
·3 min read
What is Temporal (durable execution)?
Temporal is a durable-execution platform where you write workflows as ordinary code that survives crashes and resumes exactly where it left off.
#data
#orchestration
#workflows
#ai-assisted
·3 min read
What is the Hive metastore?
The Hive metastore maps table names to schemas, partitions, and file locations so engines can treat directories of files as SQL tables.
#data
#catalog
#ai-assisted