What is a segfault?
A short reference on segmentation faults — what actually happens between the MMU and SIGSEGV, the five bugs that cause almost all of them, how to get and read the core dump, and why memory-safe languages mostly (but not entirely) make them extinct.
The one-line definition
A segmentation fault (segfault) is the operating system killing your process for touching memory it has no right to touch. The hardware's memory-management unit (MMU) catches the illegal access, the kernel receives the fault and delivers a SIGSEGV signal to your process, and the default action for that signal is to terminate the process and dump core. It is not the program noticing a bug; it is the floor beneath the program refusing to exist there.
What actually happens, step by step
- Your code dereferences an address — through a null pointer, a stale pointer, an out-of-range index.
- The CPU asks the MMU to translate that virtual address. The MMU finds no valid mapping for your process (or a mapping without permission — writing to read-only memory counts).
- The hardware raises a page fault; the kernel inspects it. A legitimate fault (a page swapped out, a lazily-allocated stack page) gets fixed silently — page faults are routine. An illegitimate one has no fix.
- The kernel delivers SIGSEGV. Unhandled, the process dies and the kernel writes the core dump, per `ulimit -c` and `core_pattern`.
The word segmentation is itself a fossil: it survives from the segment-based memory models of 1960s–70s machines, where memory really was divided into named segments and violating one had a literal meaning. Paged virtual memory replaced segments; the error's name never updated.
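The step-by-step sequence above can be observed from the outside. The following is a minimal sketch, assuming a POSIX system and CPython: a child interpreter reads from address 0 through `ctypes` (which bypasses Python's own checks), and the parent inspects how the child died. The names `CHILD_SOURCE`, `run_null_dereference`, and `describe` are illustrative, not part of any library.

```python
import signal
import subprocess
import sys

# Child program: read a C string at address 0, i.e. dereference a null pointer.
# ctypes bypasses Python's safety checks, so the access reaches the MMU.
CHILD_SOURCE = "import ctypes; ctypes.string_at(0)"


def run_null_dereference():
    """Run the faulting code in a child process and return its return code."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def describe(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited normally with status {returncode}"


if __name__ == "__main__":
    print(describe(run_null_dereference()))
```

Running the fault in a child process keeps the parent alive to report on it, which is the same separation a shell provides when it prints "Segmentation fault" for a command that died.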
The five bugs behind almost every segfault
| Bug | The shape of it |
|---|---|
| Null dereference | Following a pointer that is 0 — the "you forgot to check the return value" classic |
| Use-after-free | Memory freed, pointer kept, pointer used. Sometimes works (worse!), sometimes faults, always wrong |
| Buffer overflow | Writing past the end of an array into whatever lives next door |
| Stack overflow | Usually runaway recursion — the stack grows into the guard page |
| Dangling stack pointer | Returning the address of a local variable, then using it after the frame is gone |
The cruelty is that none of these bugs is guaranteed to fault: an out-of-bounds write that lands inside a still-mapped page corrupts data silently instead of crashing. A segfault is the lucky outcome, the bug that announces itself.
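The silent case is easy to reproduce without crashing anything. The sketch below is an illustration under stated assumptions, not a real-world bug: it uses `ctypes` to lay out a hypothetical `Record` with an 8-byte buffer followed by an integer, then writes past the end of the buffer. The write stays inside memory the process owns, so the MMU has nothing to object to, and the neighbouring field is simply overwritten.

```python
import ctypes


class Record(ctypes.Structure):
    # An 8-byte buffer followed directly by a 32-bit integer:
    # the integer is "whatever lives next door" to the buffer.
    _fields_ = [
        ("buf", ctypes.c_char * 8),
        ("neighbor", ctypes.c_uint32),
    ]


def overflow_into_neighbor():
    """Overrun buf by 4 bytes and return what is left of neighbor."""
    record = Record()
    record.neighbor = 0xDEADBEEF
    # buf starts at offset 0 and holds 8 bytes; writing sizeof(record) = 12
    # bytes runs 4 bytes past its end, still inside the record's own memory.
    ctypes.memset(ctypes.addressof(record), 0x41, ctypes.sizeof(record))
    return record.neighbor


if __name__ == "__main__":
    print(hex(overflow_into_neighbor()))
```

No signal is raised and no exception is thrown; the only evidence of the overflow is that `neighbor` no longer holds the value it was given.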
Debugging one
- Get the core. Run `ulimit -c unlimited` before the crash (or on systemd machines, `coredumpctl list`: modern distros route dumps there via `core_pattern`).
- Open it: `gdb ./yourprog core`, then `bt` for the backtrace. It is the single most information-dense command in debugging; given a build with debug symbols (`-g`), it names the file and line of death.
- Catch it live instead: run under `gdb` and it stops at the faulting instruction; `valgrind` narrates the illegal access and, crucially, the earlier free that set it up.
- Build with sanitizers: compile with `-fsanitize=address` (ASan) and most of the five bugs above are caught at their first occurrence with a readable report, not at their eventual crash.
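The core-size limit that `ulimit -c` controls is the process resource limit `RLIMIT_CORE`, and a launcher script can raise it for itself and the children it starts. A minimal sketch, assuming a POSIX system; `enable_core_dumps` is a hypothetical helper name, and it raises the soft limit only as far as the hard limit allows, which may be lower than "unlimited":

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit; return the new limits."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Raising the soft limit up to the hard limit never needs privileges.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = enable_core_dumps()
    # resource.RLIM_INFINITY means "unlimited".
    print("soft:", soft, "hard:", hard)
```

Whether a dump actually appears, and where, still depends on `core_pattern`; this only removes the size limit as an obstacle.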
Why you rarely see them in Python or Go
Memory-safe languages take the pointer arithmetic away: bounds checks turn overflows into exceptions (IndexError, panic: index out of range), garbage collection makes use-after-free unrepresentable, and null dereferences surface as catchable errors or panics instead of killing the process outright. A segfault in pure Python is by definition not your bug: it is a bug in the interpreter or a C extension underneath. That's the loophole: FFI, native extensions, and unsafe blocks reopen the door, which is why numpy or a Rust unsafe module can still hand you a genuine SIGSEGV.
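To make the contrast concrete, here is a small sketch of the two most common C crash patterns written in pure Python. The function names are illustrative. Both mistakes are intercepted by the interpreter and reported as ordinary exceptions, so the program can handle them or exit with a traceback.

```python
def write_out_of_bounds():
    """Attempt a buffer overflow; the bounds check raises IndexError instead."""
    buf = [0] * 4
    try:
        buf[10] = 1  # index 10 does not exist in a 4-element list
    except IndexError as exc:
        return f"IndexError: {exc}"
    return "no error"


def dereference_none():
    """Attempt a null dereference; attribute lookup raises AttributeError instead."""
    node = None
    try:
        return node.value
    except AttributeError as exc:
        return f"AttributeError: {exc}"


if __name__ == "__main__":
    print(write_out_of_bounds())
    print(dereference_none())
```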
The footguns
Segfault ≠ bus error. SIGBUS (bus error) is the sibling: the address was valid but the access malformed — misaligned reads on strict architectures, truncated mmap'd files. Different signal, different causes.
The OOM killer is not a segfault. A process that dies because the machine ran out of memory is killed with SIGKILL by the kernel's OOM killer — no signal handler, no core, exit code 137. "It just died with 137" is a memory-quantity problem; a segfault is a memory-correctness problem.
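The number 137 is not arbitrary: POSIX shells report a signal death as 128 plus the signal number, so SIGKILL (9) shows up as 137 and SIGSEGV (11) as 139. The sketch below assumes a POSIX system and simulates the OOM killer's effect by having a child send itself SIGKILL; it does not exhaust memory. `shell_exit_status` and `run_and_kill_self` are hypothetical helper names.

```python
import signal
import subprocess
import sys


def shell_exit_status(returncode):
    """Map a subprocess return code to the status a POSIX shell shows in $?."""
    if returncode < 0:
        # subprocess uses -N for "killed by signal N"; shells use 128 + N.
        return 128 + (-returncode)
    return returncode


def run_and_kill_self():
    """Start a child that sends itself SIGKILL, standing in for the OOM killer."""
    child = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    return subprocess.run([sys.executable, "-c", child]).returncode


if __name__ == "__main__":
    print(shell_exit_status(run_and_kill_self()))
    print(shell_exit_status(-signal.SIGSEGV))
```

Reading the exit status this way is a quick first triage step: it tells you which signal ended the process before you go looking for a core file that, in the SIGKILL case, will not exist.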
A caught SIGSEGV is a trap. You can install a handler and continue — and almost never should: after an illegal access, the process state is untrustworthy by definition. Handlers are for logging and dying gracefully, not for soldiering on.
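Python's standard-library `faulthandler` module is one example of the log-and-die approach: once enabled, it writes a traceback to stderr when SIGSEGV arrives and then lets the process die from the signal. A minimal sketch, assuming a POSIX system and CPython, with the crash again provoked through `ctypes` in a child process; `CHILD_SOURCE` and `crash_with_report` are illustrative names.

```python
import signal
import subprocess
import sys

# Child program: enable the fault handler, then dereference a null pointer.
CHILD_SOURCE = (
    "import ctypes, faulthandler\n"
    "faulthandler.enable()\n"
    "ctypes.string_at(0)\n"
)


def crash_with_report():
    """Crash a child interpreter and return (return code, captured stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    returncode, report = crash_with_report()
    print("return code:", returncode)
    print(report)
```

The handler makes no attempt to resume execution: the report is written and the process still ends with SIGSEGV, so the exit status and any core dump remain available to the usual tools.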