What does grep stand for, and the seventy-five-year history of the regular expression

Joe Condon and Ken Thompson with Belle at Bell Labs, 1977

Joe Condon and Ken Thompson working on Belle, 1977 — picture courtesy of Bell Laboratories. Nine years before this photo, Thompson had implemented Stephen Kleene's 1951 regex notation in the QED editor on MIT's CTSS; four years before this photo, he extracted the g/re/p command from QED's Unix successor ed as a standalone utility.

The name grep is shorter than the explanation. It comes from a single command in the ed line editor: g/re/p, meaning globally search for a regular expression and print every matching line. Ken Thompson extracted that command into a standalone program in 1973 and the name travelled with the binary. Almost every Linux distribution in 2026 ships the same four letters, doing roughly the same thing, against text streams that no longer look anything like the punched cards Thompson was running it on. The reason the name survived is the same reason the underlying notation survived: a 1951 mathematician wrote down a tiny grammar for describing patterns, and seventy-five years of computing has not produced a better one.

▶ Try the regex tester — live match highlighting, capture groups, and a token-by-token explanation of any pattern.

Kleene's 1951 memo

The mathematical foundation predates Unix by twenty years. In 1951, Stephen Cole Kleene published a RAND Corporation memorandum titled Representation of Events in Nerve Nets and Finite Automata. The memo expanded in 1956 into a chapter in the Automata Studies volume from Princeton. Kleene was extending Warren McCulloch and Walter Pitts' 1943 paper on neural nets — an attempt to formalise how a network of binary-threshold neurons could compute — and the contribution Kleene made was a notation for describing the set of inputs a finite automaton would accept.

He called the sets regular events; the algebra he invented to describe them used three operations: union (now |), concatenation (implicit), and the star (*) for "zero or more repetitions". Those three operators are enough to describe any language recognisable by a finite automaton, and the proof that they suffice is the Kleene theorem. The mathematical category exists exactly because Kleene drew the line: a regular language is one a finite automaton can recognise, and Kleene's algebra is the notation for spelling out which one.

The whole thing was an exercise in theoretical computer science. Nobody implemented it on a computer for seventeen years.

Thompson and CTSS, 1968

Ken Thompson read Kleene's work in the 1960s while at Bell Labs and recognised that the algebra was the right notation for pattern matching in a text editor. He wrote up the implementation in Regular Expression Search Algorithm in CACM in June 1968 — the first published software implementation of regex. The trick was a just-in-time compiler: Thompson translated each regex into native IBM 7094 machine code at the moment the user typed it, ran the compiled matcher against the text, and discarded the code at the end of the search. The technique is now known as Thompson's construction; it underlies the linear-time matching algorithms that modern engines use seven decades later.

Thompson built the matcher into QED, the time-sharing editor on the Compatible Time-Sharing System at MIT, and from QED it travelled into ed, the line editor that shipped with the first Unix in 1971. Inside ed, the command g/regex/p would print every line in the buffer that matched a given pattern. The command was idiomatic enough that in 1973 Thompson extracted it as a standalone Unix utility, named after the syntax: grep.

What grep actually stands for

The Unix lore here is more boring than the modern repackaging suggests. The four letters are not an acronym for any phrase. They are the literal characters of the ed command, in order: g, the global modifier; re, the regex; p, the print action. Strip the slashes and what is left is grep. The name preserves the command's lineage rather than describing the program's purpose.

Three later variants kept the naming game:

egrep (1973) — extended grep, supporting alternation and grouping that the original grep could not match. Now usually a synonym for grep -E.
fgrep — fixed-string grep, for literal-string searches without regex compilation. Now grep -F.
pgrep — process grep, for matching processes by name. A 1990s addition, not from Thompson.

The grep name is older than most engineers who reach for it daily. The name will outlive the binary.

POSIX standardises regex, twice

By 1988 every Unix had a regex implementation and every implementation differed slightly. The POSIX standardisation effort — the IEEE 1003 working group also responsible for codifying the rest of the Unix interface — formalised regex in IEEE 1003.2 in 1992 under two specifications:

POSIX BRE (Basic Regular Expressions) — the older grep flavour. Special characters like +, ?, (, ) are literal unless escaped with backslash. Mostly preserved for backward compatibility with older Unix utilities and shell¹ scripts.
POSIX ERE (Extended Regular Expressions) — the egrep flavour. Special characters are special by default; escape with backslash to match literally. The form most modern tools use under grep -E.

🔗 Learn more — ¹ What is a shell?

POSIX did not standardise lookarounds, backreferences across alternation, non-greedy quantifiers, named groups, or Unicode property classes. Every one of those features was added later, by different authors, in different tools, with different syntax. The flavour war is a direct consequence of the POSIX standard's deliberate narrowness.

Perl and the extension explosion

Larry Wall released Perl 1.0 in December 1987 with a regex engine that bolted a series of extensions on top of POSIX ERE. Over Perl 2 (1988) through Perl 5 (1994), the extensions accumulated:

Lazy quantifiers (*?, +?).
Lookaheads and lookbehinds ((?=...), (?<=...)).
Backreferences (\1, \2).
Named groups ((?P<name>...) in Python, (?<name>...) in .NET, neither matching Perl's own later syntax).
Character class shortcuts (\d, \w, \s).
The (?:...) non-capturing group.

Wall's philosophy was that regex was a programming language inside the programming language, and that the right move was to make it expressive enough to handle real text-processing tasks. The cost of the philosophy was that Perl's regex engine — backtracking, recursive, expressive — has exponential worst-case time on certain input patterns. Almost every Perl-compatible engine since has inherited the same algorithmic risk.

PCRE: Perl's regex as a C library

In 1997, Philip Hazel at the University of Cambridge wrote PCRE — Perl Compatible Regular Expressions — as a standalone C library so other programs could use Perl's flavour without embedding Perl. PCRE became the regex engine inside Apache HTTP Server, PHP, nginx, R, and dozens of other tools. PCRE2, released in 2015, was a clean-slate rewrite with a more consistent API and better Unicode support; PCRE2 is the version every modern project should be using.

The practical consequence: the Perl-compatible flavour is the de facto standard outside of POSIX-only environments. When a tutorial says "use this regex", it almost certainly means PCRE syntax. The minor incompatibilities between PCRE, Python's re module, Java's java.util.regex, .NET's System.Text.RegularExpressions, and JavaScript's RegExp are the source of every regex porting bug since 2001.

The flavour war, in one table

The common pitfalls when porting a regex between languages:

Feature	POSIX ERE	PCRE / Perl	Python `re`	JavaScript	RE2 / Go
Lookbehind	no	yes (fixed-width Perl; variable PCRE2)	yes (fixed-width)	yes (2018+)	no
Named groups	no	`(?<name>...)`	`(?P<name>...)`	`(?<name>...)`	yes (Perl-style)
Backreferences	optional	yes	yes	yes	no
Recursion `(?R)`	no	yes	no	no	no
Unicode classes `\p{...}`	no	yes	yes	yes (u flag)	yes
Atomic groups `(?>...)`	no	yes	yes (3.11+)	no	no
Exponential worst case	yes	yes	yes	yes	no

The last row is the load-bearing one. PCRE-family engines can take exponential time on adversarial patterns — the family of bugs known as ReDoS, responsible for the Cloudflare outage of 2 July 2019 among others.

RE2 and the linear-time school

Russ Cox at Google released RE2 in 2010 and wrote a series of essays arguing that backtracking regex engines are a security liability and that linear-time matching using Thompson's original construction is achievable without losing most of the practical features. RE2 powers grep-like tools at Google scale, the Go standard library, and a handful of other languages that explicitly do not want exponential blowups. The cost is dropping backreferences and recursion — features that Thompson's NFA construction cannot support in linear time.

The RE2 vs PCRE split is the active flavour war in 2026. Most languages still ship a PCRE-style engine. A growing minority ship RE2 or a similar linear-time engine. The choice is between expressiveness and worst-case latency. Most application code does not need backreferences, and almost no application code wants a regex to ever hang the request.

A short close

The notation Kleene wrote down in 1951 is still the right notation seventy-five years later, with three operators that handle most of the patterns engineers actually want to match. The grep name is an ed command preserved as a binary name. The flavour war is a thirty-year argument about which extensions on top of Kleene's algebra are worth their algorithmic cost. The math underneath has not changed since Automata Studies. The argument on top of it shows no sign of ending.

Try it

Build and test patterns in the regex tester & explainer — live match highlighting, capture groups, a token-by-token breakdown, and short lessons for each category. New to the syntax? Start with What is a regular expression?.