Processing & streaming

9 articles in this category.

·3 min read
What is Amazon Kinesis?
Amazon Kinesis is AWS's managed streaming family: Data Streams (a sharded log), Firehose (managed delivery), and Managed Service for Apache Flink.
#data
#streaming
#cloud
#ai-assisted
·4 min read
What is Apache Kafka?
Kafka is a distributed, append-only log you can publish events to and replay later. Not a queue and not a database — a durable commit log that decouples the systems producing data from the ones consuming it. How the log, partitions, and consumer groups fit together.
#data
#kafka
#streaming
#ai-assisted
·3 min read
What is Apache Beam?
Apache Beam is a unified model for batch and streaming: write one pipeline and run it on a pluggable runner like Dataflow, Flink, or Spark.
#data
#processing
#streaming
#ai-assisted
·3 min read
What is Apache Flink?
Apache Flink is a distributed engine for stateful stream processing: true record-at-a-time streaming, event-time processing with watermarks, and exactly-once state consistency.
#data
#streaming
#processing
#ai-assisted
·4 min read
What is Apache Spark?
Spark is a distributed compute engine that runs one logical query across a cluster of machines. It builds a lazy plan, splits it into parallel tasks, and shuffles data between stages. How the driver, executors, and DAG actually fit together.
#data
#spark
#distributed-computing
#ai-assisted
·3 min read
What is Apache Pulsar?
Apache Pulsar is a distributed messaging and streaming platform that splits serving (brokers) from storage (BookKeeper), with multi-tenancy built in.
#data
#streaming
#messaging
#ai-assisted
·3 min read
What is Hadoop (and why MapReduce faded)?
Hadoop launched the big-data era with HDFS, MapReduce, and YARN. Foundational, but largely superseded by Spark, S3, and cloud warehouses.
#data
#processing
#history
#ai-assisted
·3 min read
What is RabbitMQ (and how is it different from Kafka)?
RabbitMQ is a smart message broker that routes messages through exchanges to queues and deletes them once consumers acknowledge them, unlike Kafka's replayable log.
#data
#messaging
#ai-assisted
·3 min read
What is Redpanda?
Redpanda is a Kafka-API-compatible streaming platform written in C++, with no JVM and no ZooKeeper, built for lower latency and simpler operations.
#data
#streaming
#kafka
#ai-assisted