RAID explained: levels, redundancy, and which to pick

RAID — Redundant Array of Independent Disks — combines several physical disks into one logical volume, to buy you some mix of three things: redundancy (survive a disk dying), capacity (one big volume instead of many small ones), and speed (read and write across several spindles at once). Which of those you get, and what you trade for it, is the whole point of "RAID levels."

Before anything else, the one rule that saves people from disaster:

RAID is not a backup. It protects against a disk failing. It does nothing about a file you deleted, a filesystem that corrupted, a bad write your application made, or ransomware — all of those replicate across every mirror and parity stripe instantly. Backups are a separate job. (Here's where the storage backup line belongs.)

Hardware RAID, software RAID, and md

RAID can live in a dedicated controller card (hardware RAID), in firmware ("fake RAID"), or in the operating system (software RAID). On Linux the software option is md — the multiple-device layer, driven by mdadm — and for a homelab it's almost always the right choice: no proprietary controller to fail and lock your array to one card model, full visibility, and it's free. (Hardware vs software RAID makes the full case — and why bit-rot is a separate, filesystem-layer question, not part of this one.) Everything below is described in md terms, but the levels themselves are universal.

A RAID array presents a single block device (e.g. /dev/md0); you then put a filesystem on top of it like any other disk.

The levels

Level	What it does	Min disks	Survives	Usable capacity	Pay for it with
0	Stripe, no redundancy	2	0 disks	n disks (100%)	any disk dies → all data gone
1	Mirror	2	n−1 disks	1 disk	capacity (you keep one disk's worth)
5	Stripe + 1 parity	3	1 disk	n−1 disks	slow writes, risky rebuilds
6	Stripe + 2 parity	4	2 disks	n−2 disks	slower writes, more parity overhead
10	Mirror + stripe	4	1 disk (more if in different mirror pairs)	n/2 disks (50%)	half your raw capacity
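
The usable-capacity column is simple arithmetic once the disks are all the same size. A minimal shell sketch, assuming equal-size disks and two-way mirrors for RAID 10; usable_disks is a hypothetical helper name, not part of mdadm:

```shell
# usable_disks LEVEL N: print how many disks' worth of space is usable.
# Hypothetical helper; assumes N equal-size disks and 2-way mirrors for RAID 10.
usable_disks() (
    level=$1 n=$2
    case $level in
        0)  echo "$n" ;;
        1)  echo 1 ;;
        5)  echo $(( n - 1 )) ;;
        6)  echo $(( n - 2 )) ;;
        10) echo $(( n / 2 )) ;;
        *)  echo "unsupported level: $level" >&2; exit 1 ;;
    esac
)

# Example: six 4 TB disks at each level.
for level in 0 1 5 6 10; do
    disks=$(usable_disks "$level" 6)
    echo "RAID $level: $(( disks * 4 )) TB usable"
done
```

With six disks, RAID 6 keeps four disks' worth of space and RAID 10 keeps three, which is the capacity gap the table's last column is pointing at.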

RAID 0 — striping

Data is split across all disks, so reads and writes hit several spindles at once: roughly N× the throughput and N× the capacity. There is zero redundancy — lose any one disk and the entire array is gone, because every file is scattered across all of them. Only for scratch space, caches, or data you can lose without flinching.
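
To make the striping concrete, here is a simplified model of how consecutive chunks land on disks in round-robin order. It assumes equal-size disks and ignores the actual chunk size; locate_chunk is a hypothetical helper for illustration, not an md interface:

```shell
# locate_chunk CHUNK NDISKS: print which disk holds a logical chunk and
# which stripe (row) it sits in. Hypothetical helper, simplified model.
locate_chunk() (
    chunk=$1 ndisks=$2
    echo "chunk $chunk -> disk $(( chunk % ndisks )), stripe $(( chunk / ndisks ))"
)

# Walk the first six chunks of a three-disk stripe set.
for chunk in 0 1 2 3 4 5; do
    locate_chunk "$chunk" 3
done
```

Chunks 0 to 2 fill the first stripe, one per disk, and chunk 3 wraps back to disk 0. Any file larger than a few chunks therefore has pieces on every disk, which is why one dead disk takes everything with it.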

RAID 1 — mirroring

Every disk holds an identical copy. Two disks → one disk's capacity, but you can lose either and keep running, and reads can be served from both. The simple, boring, correct choice for a two-disk boot or data volume. Add more disks for more copies (a three-way mirror survives two failures).

RAID 5 — striping with single parity

Data is striped across all disks along with one disk's worth of parity distributed among them, so any single disk can be reconstructed from the rest. Usable capacity is N−1 disks, which is efficient. The catches: a write smaller than a full stripe becomes a read-modify-write, because the old data and parity must be read before the new parity can be written (the "write penalty"), and rebuilds are dangerous on large drives. When a disk dies, rebuilding forces a full read of every remaining disk; on multi-terabyte drives the odds of hitting an unrecoverable read error (URE) during that read are non-trivial, and with no redundancy left, a second fault mid-rebuild loses data or the whole array. This is why RAID 5 is widely discouraged for large modern SATA drives: the rebuild is exactly when you can least afford a second fault.
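
The parity itself is just XOR, which is what makes single-disk reconstruction possible. A toy sketch with one byte per disk (the values are arbitrary; real arrays do this per chunk and rotate which disk holds the parity):

```shell
# One byte from each of three data disks (arbitrary example values).
d0=165 d1=60 d2=15

# Parity is the XOR of all data blocks in the stripe.
parity=$(( d0 ^ d1 ^ d2 ))

# Disk 1 dies: XOR the survivors with the parity to get its byte back.
rebuilt_d1=$(( d0 ^ d2 ^ parity ))

echo "parity=$parity rebuilt_d1=$rebuilt_d1"   # prints: parity=150 rebuilt_d1=60
```

The same identity explains the write penalty: changing d1 alone means the parity must be recomputed, and that requires knowing the old d1 and old parity (or every other block in the stripe).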

RAID 6 — striping with double parity

Like RAID 5 but with two independent parity blocks, so it survives two simultaneous disk failures — including a second failure during a rebuild, which is the scenario that kills RAID 5. Capacity is N−2. The standard choice for large arrays (6+ big drives) where you want parity efficiency without the rebuild-roulette.

RAID 10 — mirror plus stripe

Disks are paired into mirrors, and data is striped across the pairs. You get mirror-grade safety and stripe-grade speed, with no parity math (so no write penalty). Usable capacity is half the raw total. It survives any single disk failure, and survives a double failure as long as the two dead disks aren't in the same mirror pair. It has the best random-IO profile of any redundant level, which is why it's the homelab sweet spot for mixed workloads. Here's a real four-disk RAID 10 picked apart.
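
The double-failure rule can be checked by brute force. The sketch below assumes a four-disk array in which disks 0 and 1 form one mirror pair and disks 2 and 3 the other (an assumed layout for illustration), and counts which two-disk failures the array survives:

```shell
# Assumed layout: disks 0,1 are one mirror pair; disks 2,3 are the other.
# Integer division by 2 therefore gives each disk's pair number.
survivable=0 fatal=0
for a in 0 1 2 3; do
    for b in 0 1 2 3; do
        [ "$a" -lt "$b" ] || continue          # visit each unordered pair once
        if [ $(( a / 2 )) -eq $(( b / 2 )) ]; then
            fatal=$(( fatal + 1 ))             # both copies of the same data lost
        else
            survivable=$(( survivable + 1 ))   # each pair still has one copy
        fi
    done
done
echo "two-disk failures: $survivable survivable, $fatal fatal"
```

Under this layout, four of the six possible two-disk failures are survivable; the two fatal ones are exactly those that take out both halves of a pair.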

(Levels 2, 3, and 4 exist but are mostly historical curiosities; you are unlikely to deploy them today.)

Which one should I pick?

flowchart TD
    START["Picking a RAID level"] --> Q1{"Need to survive a disk failure?"}
    Q1 -->|no| R0["RAID 0 — max speed & capacity, zero redundancy"]
    Q1 -->|yes| Q2{"How many disks?"}
    Q2 -->|two| R1["RAID 1 — mirror"]
    Q2 -->|four or more| Q3{"What matters more?"}
    Q3 -->|speed & safe rebuilds| R10["RAID 10"]
    Q3 -->|usable capacity| R6["RAID 6 — not RAID 5 on big drives"]

    %% color = redundancy claim: red holds no copy, green survives a failure
    classDef danger stroke:#bf616a,stroke-width:2.5px
    classDef safe stroke:#a3be8c,stroke-width:2.5px
    classDef plain stroke:#7b88a1,stroke-width:2.5px
    class R0 danger
    class R1,R10,R6 safe
    class START,Q1,Q2,Q3 plain

Work backwards from the goal, the drive count, and the drive size:

Two disks, want safety: RAID 1. Done.
Performance and redundancy, four disks, mixed read/write: RAID 10. Half capacity is the price; you get speed and simple, fast rebuilds.
Maximise capacity across many large disks, can tolerate slower writes: RAID 6. Not RAID 5 — on multi-terabyte drives the rebuild risk isn't worth the one extra disk of capacity.
Pure throughput, data is disposable: RAID 0. Know that any failure is total.
One disk: that's not RAID. Just back it up.

Two factors push you toward more redundancy than feels necessary: big drives (longer, riskier rebuilds → prefer RAID 6 or 10 over 5) and matched-batch disks bought together (they can fail around the same time → spread them so a correlated failure doesn't take out both halves of one mirror).
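
The rebuild risk on big drives can be put in rough numbers. This back-of-envelope sketch assumes the commonly quoted consumer-drive spec of one unrecoverable read error per 10^14 bits read and treats errors as independent. Both are simplifying assumptions (real drives often do better than the datasheet figure), and ure_risk is a hypothetical helper name:

```shell
# ure_risk TB_READ: rough chance of hitting at least one URE while reading
# TB_READ terabytes. ASSUMPTION: 1 error per 1e14 bits, errors independent.
ure_risk() {
    awk -v tb="$1" 'BEGIN {
        bits = tb * 1e12 * 8                       # terabytes -> bits
        printf "%.0f%%\n", (1 - exp(-bits / 1e14)) * 100
    }'
}

# Rebuilding a four-disk RAID 5 of 4 TB drives reads the three survivors: 12 TB.
ure_risk 12
```

Under those pessimistic assumptions, a clean 12 TB read is far from guaranteed, which is the intuition behind preferring RAID 6 or 10 over 5 on large drives. Treat the output as an upper-bound illustration, not a prediction for any particular drive.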

Creating one with mdadm

For completeness, a four-disk RAID 10 is one command:

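# Run as root. This destroys any existing data on the listed partitions.
# /dev/sd{a,b,c,d}1 is bash brace expansion for sda1 sdb1 sdc1 sdd1.
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sd{a,b,c,d}1
# Then put a filesystem on it:
mkfs.ext4 /dev/md0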

After that it's a normal block device. Persist the config so the array auto-assembles on boot: mdadm --detail --scan >> /etc/mdadm/mdadm.conf on Debian and Ubuntu (other distributions typically use /etc/mdadm.conf). Then do the step everyone skips: wire mdadm --monitor to an address you actually read (the MAILADDR line in mdadm.conf), and schedule periodic scrubs with mdadm --action=check /dev/md0 to surface bad sectors before a real read trips over them.
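
mdadm --monitor is the proper alerting path, but a degraded array is also visible directly in /proc/mdstat, where each array's member map shows U for a healthy disk and _ for a missing one. A small sketch of a check that could run from cron; degraded_arrays is a hypothetical helper name, and the parsing assumes the usual /proc/mdstat layout in which the member map follows the array's "mdN :" line:

```shell
# degraded_arrays [FILE]: print the name of every md array whose member map
# (e.g. [UU_U]) contains "_", meaning a disk is missing or failed.
# FILE defaults to /proc/mdstat; pass "-" to read standard input.
degraded_arrays() {
    awk '
        /^md/ { name = $1 }                      # e.g. "md0 : active raid10 ..."
        match($0, /\[[U_]+\]/) {                 # member map such as [UUUU]
            if (index(substr($0, RSTART, RLENGTH), "_"))
                print name
        }
    ' "${1:-/proc/mdstat}"
}

# Only meaningful on a Linux host with the md driver loaded.
if [ -r /proc/mdstat ]; then
    degraded_arrays
fi
```

No output means every array reports all members present. This complements mdadm --monitor rather than replacing it, since it only sees the state at the moment it runs.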

The catch RAID can't fix

Every level above protects against a disk dying. None of them protects against a disk lying — silently returning wrong bytes (bit-rot) that md happily mirrors or parity-protects without ever knowing they're wrong. ext4's metadata_csum catches corrupted filesystem metadata, but not your file data. Catching and self-healing data corruption needs checksums and redundancy in the same layer — which is exactly what ZFS and btrfs do, and where mdadm + ext4 reaches its limit.