Pipelines & ingestion

6 articles in this category.

·3 min read
Batch vs stream processing
Batch processes finite datasets on a schedule for throughput; stream processing handles unbounded events continuously for low latency. When to use each.
#data
#streaming
#processing
#ai-assisted
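The batch-versus-stream contrast in the teaser above can be sketched with a single aggregation, a click count per page, computed both ways. This is a minimal illustration under simple assumptions: the function names `batch_counts` and `stream_counts` and the click data are hypothetical, and a Python generator stands in for a real unbounded event source.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_counts(events: list[str]) -> Counter:
    # Batch: the whole finite dataset is available up front,
    # so compute the result once and emit it once.
    return Counter(events)


def stream_counts(events: Iterable[str]) -> Iterator[dict[str, int]]:
    # Stream: events arrive one at a time with no known end, so keep
    # running state and emit an updated result after every event.
    counts: Counter = Counter()
    for ev in events:
        counts[ev] += 1
        yield dict(counts)


clicks = ["home", "cart", "home"]  # hypothetical click events
print(batch_counts(clicks))        # Counter({'home': 2, 'cart': 1})
for snapshot in stream_counts(iter(clicks)):
    print(snapshot)
# {'home': 1}
# {'home': 1, 'cart': 1}
# {'home': 2, 'cart': 1}
```

Both approaches reach the same final answer. The stream version produces a usable result after every event (low latency), while the batch version waits for the whole dataset and does the work in one pass (throughput).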
·3 min read
What is a DAG (and why orchestrators use them)?
A DAG models a pipeline as tasks (nodes) and dependencies (edges) with no cycles — so a valid run order always exists. Why orchestrators rely on it.
#data
#orchestration
#ai-assisted
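The teaser's claim that "no cycles" guarantees a valid run order is what a topological sort relies on. Below is a minimal sketch of Kahn's algorithm, assuming a hypothetical `run_order` helper whose input maps each task to the set of tasks it waits on. It illustrates the idea and does not reproduce how any particular orchestrator schedules tasks.

```python
from collections import deque


def run_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a valid run order; deps[t] is the set of tasks t waits on.

    Raises ValueError if the dependencies contain a cycle (not a DAG).
    """
    # Every task, including ones that only appear as a dependency.
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    # Number of unmet upstream dependencies per task.
    remaining = {t: len(deps.get(t, ())) for t in tasks}
    # Reverse edges: which tasks are waiting on each task.
    downstream: dict[str, list[str]] = {t: [] for t in tasks}
    for t, ds in deps.items():
        for d in ds:
            downstream[d].append(t)

    # Start with tasks that have no dependencies (sorted for a stable order).
    ready = deque(sorted(t for t in tasks if remaining[t] == 0))
    order: list[str] = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in sorted(downstream[t]):
            remaining[nxt] -= 1
            if remaining[nxt] == 0:  # all of nxt's upstream tasks are done
                ready.append(nxt)

    # Tasks never released are stuck waiting on each other: a cycle.
    if len(order) != len(tasks):
        raise ValueError("cycle detected: no valid run order exists")
    return order


pipeline = {"transform": {"extract"}, "load": {"transform"},
            "report": {"load", "transform"}}
print(run_order(pipeline))  # ['extract', 'transform', 'load', 'report']
```

Calling `run_order({"a": {"b"}, "b": {"a"}})` raises `ValueError`, because neither task can ever become ready. That failure mode is what the "acyclic" in DAG rules out.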
·3 min read
What is a data pipeline?
A data pipeline moves data from sources through ingest, transform, store, and serve — reliably, on a schedule or as a stream. The stages, batch vs streaming, and where pipelines rot.
#data
#pipelines
#ai-assisted
·4 min read
What is Change Data Capture (CDC)?
CDC streams inserts, updates, and deletes out of a database as they happen — log-based, query-based, or trigger-based — so downstream systems stay in sync.
#data
#cdc
#pipelines
#ai-assisted
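On the consuming side, CDC comes down to replaying inserts, updates, and deletes in order against a keyed copy of the data. This is a minimal sketch under stated assumptions: the `ChangeEvent` shape and the `apply_changes` helper are hypothetical, and real CDC tools define their own event formats and delivery guarantees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    # Hypothetical event shape; real CDC tools use their own formats.
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the changed row
    row: Optional[dict] = None   # new row state; None for deletes


def apply_changes(replica: dict[int, dict], events: list[ChangeEvent]) -> None:
    """Apply change events, in source commit order, to a keyed downstream copy."""
    for ev in events:
        if ev.op in ("insert", "update"):
            replica[ev.key] = ev.row   # write the latest row state
        elif ev.op == "delete":
            replica.pop(ev.key, None)  # drop the row if present
        else:
            raise ValueError(f"unknown op: {ev.op}")


replica: dict[int, dict] = {}
apply_changes(replica, [
    ChangeEvent("insert", 1, {"id": 1, "status": "new"}),
    ChangeEvent("insert", 2, {"id": 2, "status": "new"}),
    ChangeEvent("update", 1, {"id": 1, "status": "shipped"}),
    ChangeEvent("delete", 2),
])
print(replica)  # {1: {'id': 1, 'status': 'shipped'}}
```

The order of events matters here: applying the delete before the insert for key 2 would leave a stale row behind. This is one reason log-based CDC, which reads changes in commit order, is often preferred.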
·3 min read
What is ETL (and how is ELT different)?
ETL extracts data, transforms it, then loads it. ELT loads raw first and transforms inside the warehouse. Why cheap cloud compute flipped the order, and where each still fits.
#data
#etl
#pipelines
#ai-assisted
·3 min read
What is idempotency (in data pipelines)?
An idempotent step gives the same result whether it runs once or ten times — the property that lets a crashed, re-run pipeline stay correct instead of double-counting.
#data
#pipelines
#reliability
#ai-assisted
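The double-counting problem in the teaser above can be shown by loading the same batch twice, once as a plain append and once as an upsert keyed on a natural key. This is a minimal sketch: `load_append`, `load_upsert`, and the `order_id` column are hypothetical names, and in-memory Python containers stand in for a real target table.

```python
def load_append(target: list[dict], batch: list[dict]) -> None:
    # Not idempotent: every re-run appends the same rows again.
    target.extend(batch)


def load_upsert(target: dict[str, dict], batch: list[dict]) -> None:
    # Idempotent: rows are keyed by "order_id" (a hypothetical natural key),
    # so writing the same batch again overwrites instead of duplicating.
    for row in batch:
        target[row["order_id"]] = row


batch = [{"order_id": "A1", "amount": 10}, {"order_id": "A2", "amount": 5}]

appended: list[dict] = []
upserted: dict[str, dict] = {}
for _ in range(2):  # the original run plus one retry after a simulated crash
    load_append(appended, batch)
    load_upsert(upserted, batch)

print(sum(r["amount"] for r in appended))           # 30: double-counted
print(sum(r["amount"] for r in upserted.values()))  # 15: same as a single run
```

The upsert version yields 15 no matter how many times the batch is retried. That property is what allows a crashed pipeline to simply be run again.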