#data

4 posts

·7 min read
The hidden cost of a lakehouse on S3
A lakehouse on object storage looks cheap because storage is cheap. The bill is built from request count and managed-tier access fees, both of which scale with file count, not data volume. 5 GB stored as one thousand 5 MB files is a different invoice than 5 GB stored as ten 512 MB files.
#data
#lakehouse
#s3
#iceberg
#cost
#ai-assisted
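The file-count arithmetic behind that summary fits in a few lines of Python. This is a minimal sketch that assumes binary units (GiB and MiB) and one GET per file for a full scan. `object_count` and `scan_get_requests` are hypothetical helpers. `gets_per_file` is an assumed knob, because columnar readers often issue several ranged GETs per file. The sketch counts requests only, so multiply the result by your own region's per-request price.

```python
# Hypothetical helpers: show that request volume tracks object count,
# not bytes stored. Binary units (MiB/GiB) are an assumption.
MiB = 1024 ** 2
GiB = 1024 ** 3


def object_count(total_bytes: int, file_bytes: int) -> int:
    """Objects needed to hold total_bytes in files of file_bytes each (ceiling division)."""
    return -(-total_bytes // file_bytes)


def scan_get_requests(total_bytes: int, file_bytes: int, gets_per_file: int = 1) -> int:
    """GET requests for one full scan; gets_per_file is an assumed reader behaviour."""
    return object_count(total_bytes, file_bytes) * gets_per_file


if __name__ == "__main__":
    small = scan_get_requests(5 * GiB, 5 * MiB)    # many small files
    large = scan_get_requests(5 * GiB, 512 * MiB)  # few large files
    print(small, large, small // large)            # prints: 1024 10 102
```

Both layouts hold exactly 5 GiB. Only the object count changes, by roughly 100×, and every per-request or per-object charge scales with it.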
·5 min read
The cold-start tax: serverless warehouses vs an always-on box
A serverless warehouse that auto-provisions three nodes for a SELECT * spends most of its bill on the time you spent waiting for it. A self-hosted ClickHouse on EC2 trades elasticity for sub-second latency and a fixed monthly line item, a trade that pays off whenever the query load is even mildly steady.
#data
#warehouse
#databricks
#clickhouse
#cost
#ai-assisted
·6 min read
YAML vs YML, and what 'markup language' actually means
The .yml extension is a 1990s DOS artifact. The 'YAML Ain't Markup Language' acronym is a 2002 self-correction. Both questions resolve cleanly once you know markup languages and data serialisation formats are different categories with different ancestors.
#yaml
#markup
#history
#data
#opinion
#ai-assisted
·5 min read
JSON: discovered, not invented
Douglas Crockford has said for twenty years that he did not invent JSON; he discovered it. The format was sitting inside JavaScript the whole time, waiting for someone to extract it. The story of how a 2001 footnote in a browser scripting language ate XML's lunch is shorter than most people think.
#json
#history
#data
#ai-assisted