← Blog··Updated 21 May 2026·6 min read

YAML vs YML, and what 'markup language' actually means

The .yml extension is a 1990s DOS artifact. The 'YAML Ain't Markup Language' acronym is a 2002 self-correction. Both questions resolve cleanly once you know markup languages and data serialisation formats are different categories with different ancestors.

AI-assisted postDrafted with help from Claude, edited and fact-checked by Mart. See transparency policy →

Two questions come up about YAML often enough to be worth answering side-by-side. The first is small and practical: .yaml or .yml, which extension is correct? The second is more historical: what does the acronym actually stand for, given that the official spelling is "YAML Ain't Markup Language" and the project's own logo says so? Both answers are short. Both point to a bigger category question that is worth fifteen hundred words.

The .yml extension is a DOS artifact

The short extension exists because of an operating-system constraint that was already historical when YAML was named. From 1981 through the mid-1990s, the FAT file systems that shipped with PC DOS, MS-DOS, and pre-Windows-95 versions of Windows enforced the 8.3 filename convention: eight characters of base name, a dot, then no more than three characters of extension. MYFILE.TXT worked; MY_LONG_FILE_NAME.TEXT did not. Long filenames arrived with VFAT in Windows 95 in 1995, six years before YAML was even named, so the technical reason for short extensions had been gone for half a decade by the time anyone needed to save a file as .yaml.

The convention persisted anyway. Cross-platform tooling, batch scripts, and editor file-type detection assumed three-character extensions for portability, and a generation of file-extension habits had calcified. When YAML was first released in 2001, both .yaml and .yml got picked up by different parsers and different ecosystems, and neither one ever fully displaced the other. The official YAML organisation has recommended .yaml as the canonical extension since 2006 and the YAML 1.2 specification in 2009 made .yaml the preferred form, but .yml survives in any ecosystem that locked in its convention before 2006 — Ruby on Rails for config/database.yml, Docker for docker-compose.yml, the early GitHub Actions documentation. Ecosystems that arrived later — Kubernetes, OpenAPI, Helm — went with .yaml. Both extensions parse identically. The only practical reason to pick one over the other in 2026 is to match what the rest of the repository already uses.

The recursive acronym

YAML itself was first released on 2001-05-11, when Clark Evans posted "YAML Draft 0.1" to the sml-dev mailing list, a forum that had spun off from the larger xml-dev list specifically to talk about simpler alternatives to XML (YAML — Wikipedia). Evans designed YAML jointly with Ingy döt Net and Oren Ben-Kiki, and at the launch the acronym stood for Yet Another Markup Language — a deliberate riff on the proliferation of XML-derived formats that had defined the late 1990s. The XML hype cycle was at its peak; "yet another" was the right amount of self-deprecation.

The name lasted less than a year. Between December 2001 and April 2002 the project re-acronymed YAML to YAML Ain't Markup Language, a recursive acronym in the GNU tradition. The change was not a marketing exercise. It was a category correction. The maintainers had realised, correctly, that YAML's purpose was data serialisation — taking an in-memory data structure and writing it to text so another program could read it back — and that calling it a markup language would invite a permanent misunderstanding about what category of thing it was. The recursive acronym preserved the original four letters while disowning the original four words. The original yaml.org front page still leads with the corrected acronym twenty-five years later.

What "markup" actually means

A markup language is a system for annotating a document with metadata about its structure, presentation, or both. The document is the primary artifact; the markup is the annotation laid over it. The annotation typically takes the form of tags, marks, or sigils interspersed with the document text, and the rendering or processing stage interprets those marks to produce the final output.

The category name traces to IBM in the late 1960s, where Charles Goldfarb, Edward Mosher, and Raymond Lorie developed Generalized Markup Language (GML) to standardise document formatting across IBM's product lines. Goldfarb coined the name partly to describe the language's intent and partly using the initials of the three authors' surnames. GML was the seed; the team's work expanded into a multi-vendor standardisation effort that ran through the 1970s and into the early 1980s under the International Organization for Standardization. Goldfarb eventually chaired the committee. The standardisation effort took eight years. The result, SGML — Standard Generalized Markup Language — was published as ISO 8879 in October 1986 and is the common ancestor of nearly every markup language anyone has used since.

The family tree

SGML itself was almost never edited by hand. It was a meta-standard — a language for describing markup languages — and the real action happened in its descendants:

  • HTML was invented at CERN in 1989–1990 by Tim Berners-Lee as a hypertext document format defined as an SGML application. Pragmatic, browser-permissive, tag-soup-tolerant. The first commercially relevant markup language and still the most-deployed one in the world.
  • XML was published by the W3C in 1998 as a strict re-design of SGML for the emerging web era. Self-describing structure, mandatory well-formedness, namespaces, schemas. The Java/enterprise era's interchange format of choice from roughly 1999 through 2012.
  • XHTML in 2000 was HTML reformulated as XML. It enjoyed about a decade of orthodoxy among standards-minded web developers and was then quietly abandoned as a mainline standard in favour of HTML5 in 2014.
  • Markdown was created in March 2004 by John Gruber, in collaboration with Aaron Swartz, as a plain-text shorthand that compiles to HTML (Markdown — Wikipedia). The aesthetic is inverted from XML — minimal symbols, maximum source-readability — but the purpose is the same: annotate a document so a processor can render it.
  • DocBook, AsciiDoc, reStructuredText, MDX all sit in the same family. All annotate a document. All have an SGML or XML ancestry in their heads or hearts.

This is what markup language means as a technical term. The document is what you are looking at; the markup is the metadata layered on top.

Data serialisation formats are a different category

What YAML, JSON, TOML, and INI are actually doing is a different thing. They take a data structure — usually a tree or graph of dictionaries, lists, and primitives that exists in a program's memory — and project it to a text or byte representation so that another program (or the same program later) can reconstitute the data structure. There is no document being annotated. The artifact is the data.

A quick walk through the family in roughly chronological order:

  • CSV dates back to the 1970s and predates most of the others by a decade. No nesting, no types, just rows.
  • INI files arrived with Windows 3.x in the early 1990s. Sectioned key-value pairs.
  • XML was used so often for data serialisation in the 2000s — SOAP, RSS, enterprise messaging, Maven pom.xml, Spring config — that for most engineers under 35, "XML" means the verbose way Java applications store configuration, not a generalised markup standard. The dual use is real; the category drift is real.
  • JSON was formalised by Douglas Crockford in the early 2000s as JavaScript Object Notation, extracted from JavaScript's object literal syntax. Specified in RFC 4627 in 2006 and then ECMA-404 in 2013. The dominant interchange format on the web from roughly 2009 onward.
  • YAML in 2001, designed primarily for human authoring rather than machine generation. YAML 1.2 is on paper a strict superset of JSON, which means any valid JSON file is also a valid YAML file. Indentation-significant, which is the source of every YAML horror story.
  • TOML was released by Tom Preston-Werner in 2013, explicitly designed in reaction to YAML's whitespace-fragility. INI-flavoured, unambiguous, machine-parseable. Now used by Cargo.toml, pyproject.toml, and Gemfile.lock-adjacent tooling.
  • MessagePack, BSON, Protocol Buffers, Avro, Cap'n Proto are all binary serialisation formats — same category as the text ones, different optimisation target. Smaller on the wire, faster to encode and decode, not human-readable.

None of these are markup languages. None of them annotate a document. All of them serialise data.

XML is the overlap

The category most responsible for the confusion is XML itself. XML is a markup language by design — a strict subset of SGML, fully tag-based, descended from the document-annotation lineage. It was also used so heavily for data serialisation during the 2000s that an entire generation of engineers grew up encountering XML in pom.xml and web.xml and applicationContext.xml, where there was no document being annotated and the tags were just a verbose way of expressing a tree of key-value pairs. The original category was right; the cultural memory drifted. By the time JSON had displaced XML as the default web-API serialisation format around 2009, the colloquial meaning of "markup language" had blurred into "any structured text format a computer reads."

This is the same imprecision that produced the original "Yet Another Markup Language" expansion of YAML. The maintainers fixed it on their side in 2002. The broader vocabulary never caught up.

A short close

The .yml extension is a DOS-era artifact from a generation of tooling that assumed three-character extensions for portability. The "YAML Ain't Markup Language" acronym is a small, deliberate category correction the maintainers made a year after the project launched. The bigger lesson, behind both, is that markup language is a technical category with a specific lineage — Goldfarb's IBM team in the 1960s, SGML in 1986, HTML and XML and Markdown all descending from that root — and is not a synonym for any text file a computer reads. YAML, JSON, TOML, and the binary formats sit in the next family over. Both families are useful. They are doing different work. The word for that work matters.

Read next