Hardware vs software RAID, with the filesystem kept out of it

Most hardware-vs-software RAID arguments online are a mess, because they quietly mix two questions that live at two different layers:

Where does the RAID happen — a controller card (hardware) or the kernel (software, md/mdadm)?
Does the storage detect and heal silent corruption (bit-rot) — a property of the filesystem on top, not of the RAID underneath?

Conflate those and you get the usual "software RAID won because of checksums!" — which is wrong, because plain software RAID (mdadm + ext4) has no more bit-rot protection than a hardware controller does. Checksumming belongs to ZFS and btrfs, and that's a filesystem decision made one layer up. So let's decide the actual hardware-vs-software question with integrity off the table, then bring it back deliberately — because it's the one thing that links the two.

The baseline: same filesystem on both, integrity set aside

Compare apples to apples. Same ext4 (or xfs) on top, RAID 10 underneath, the only variable being where the RAID lives:

```
hardware:   disks → controller card → /dev/sdX → ext4 → files
software:   disks → md (kernel)     → /dev/md0 → ext4 → files
```

Identical shape. On data integrity they are tied — neither checksums your file data, both will mirror or parity-protect a corrupt block without noticing. Redundancy math is identical too. So the baseline is decided entirely on the operational axes:

| Axis | Hardware RAID | Software RAID (mdadm) |
| --- | --- | --- |
| Portability | Proprietary on-disk metadata: if the controller dies, you need the same model/firmware to read the array | Open metadata: disks import on any Linux box (`mdadm --assemble`) |
| Visibility | Disks hidden behind the card; vendor-specific tooling | Per-disk SMART, mismatch counters, scrub control, mature recovery tooling |
| Cost | A decent card (with battery) costs real money | Free; an HBA to attach the disks is cheap |
| Flexibility | Reshape/grow/level-change limited by the card | `md` grows and reshapes in place fairly freely |
| Write cache | Battery/flash-backed cache: a real latency win for sync writes, and an extra failure point | No persistent write cache |
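
To make the "Visibility" row concrete: md reports array state as plain text in `/proc/mdstat`, so a few lines of script can spot a degraded array. The sketch below parses a sample of that text rather than the live file. The sample contents (device names, sizes) and the helper name `degraded_arrays` are made up for illustration, and real output can carry extra lines (bitmap, resync progress) that this simply ignores.

```python
import re

# Sample text in the layout the kernel prints to /proc/mdstat.
# Device names and sizes are invented for this example.
SAMPLE_MDSTAT = """\
Personalities : [raid10]
md0 : active raid10 sdd[3] sdc[2] sdb[1](F) sda[0]
      1953260544 blocks super 1.2 512K chunks 2 near-copies [4/3] [U_UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: (expected, active)} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line like "md0 : active raid10 ...".
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The following status line carries "[expected/active]", e.g. "[4/3]".
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[current] = (expected, active)
            current = None
    return result


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE_MDSTAT))
```

On a real box you would pass in the contents of `/proc/mdstat` instead of the sample string. Behind a hardware controller there is no equivalent file; the same question goes through the vendor's own tool.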

Four of those five favour software. The lock-in row is the one that actually hurts: a hardware array is hostage to a specific controller, and "find the identical card with matching firmware, years later, after the original died" is exactly the situation you're trying to avoid by running RAID in the first place.

The one genuine hardware win is the battery-backed write cache. For a write-heavy, fsync-bound workload — a busy database committing constantly — the controller can acknowledge writes from protected cache and cut latency in a way md has no equivalent for. That's a real edge, and it's also a narrow one: most homelab workloads (media, backups, game servers, general file storage) never touch it.

Baseline verdict

Building a plain ext4/xfs array from scratch for a homelab: software RAID (mdadm). It wins four of five axes and loses only a cache feature you probably won't use.
You already own a server with a RAID card and a working battery: it'll run fine with ext4 today — but seriously consider migrating to software anyway, and plan it for your next rebuild. The faults don't go away: the battery is a wear item that silently drops the array to slow write-through the day it dies, the data stays hostage to that one controller model and firmware, and you're locked out of ever adding ZFS/btrfs integrity without pulling the card first. None of that is an emergency; all of it is real. An IT-mode HBA plus mdadm deletes every one of those failure points for the price of a single migration — so the honest default even here is "keep it running, but treat software as where you're heading," not "leave it forever."
Write-heavy database with heavy sync commits: the battery-backed cache earns its keep. This is hardware RAID's real niche.

Notice none of that mentioned bit-rot. That's the point — it doesn't belong in this comparison.

The separate question: integrity lives in the filesystem

Bit-rot protection is not something either RAID layer provides, because both treat your data as opaque blocks. Detecting silent corruption needs a checksum stored apart from the data; healing it additionally needs redundancy that the same layer controls, so a block that fails its checksum can be rewritten from a copy that passes. That combination only exists when the filesystem and the RAID are fused, as in ZFS and btrfs. Per the ZFS documentation's data-integrity discussion, traditional filesystems and RAID store the checksum (if any) alongside the data, so in-flight corruption and phantom reads/writes "are undetectable by most filesystems"; ZFS stores each block's checksum in its parent pointer "so that the entire pool self-validates" and can heal from a good copy.
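
A toy model makes the difference visible. This is a minimal sketch, not how md or ZFS are implemented: the class names `PlainMirror` and `ChecksummedMirror`, the single-block "disk", and the use of SHA-256 are all assumptions chosen for illustration. It only shows why a checksum held apart from the data, plus redundancy the same layer owns, is what turns "two copies" into "self-healing".

```python
import hashlib


def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class PlainMirror:
    """Two copies, no checksum: a stand-in for a mirror under ext4/xfs."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]

    def read(self) -> bytes:
        # Served from one copy (here the first); nothing validates it.
        return bytes(self.copies[0])

    def scrub(self) -> bool:
        # Can tell that the copies differ, but not which one is right.
        return self.copies[0] == self.copies[1]


class ChecksummedMirror:
    """Two copies plus a checksum kept apart from the data blocks."""

    def __init__(self, data: bytes):
        self.copies = [bytearray(data), bytearray(data)]
        self.expected = checksum(data)  # lives outside the data it covers

    def read(self) -> bytes:
        for copy in self.copies:
            if checksum(bytes(copy)) == self.expected:
                # Rewrite every copy that fails validation from the good one.
                for j, other in enumerate(self.copies):
                    if checksum(bytes(other)) != self.expected:
                        self.copies[j] = bytearray(copy)
                return bytes(copy)
        raise IOError("all copies fail their checksum")


def flip_bit(copy: bytearray) -> None:
    """Simulate silent corruption of one stored copy."""
    copy[0] ^= 0x01


if __name__ == "__main__":
    plain = PlainMirror(b"family photos")
    flip_bit(plain.copies[0])
    print(plain.read())   # corrupt bytes, returned as if valid
    print(plain.scrub())  # False: a mismatch exists, culprit unknown

    fused = ChecksummedMirror(b"family photos")
    flip_bit(fused.copies[0])
    print(fused.read())   # the good copy
    print(fused.copies[0] == fused.copies[1])  # True: bad copy was repaired
```

The plain mirror has the redundancy but no way to pick the winner; the checksummed one has both, which is the whole argument of this section in forty lines.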

So integrity is a three-bucket picture, and the RAID layer barely participates:

| Stack | RAID lives in | Filesystem | Detects + heals bit-rot? |
| --- | --- | --- | --- |
| Hardware RAID + ext4/xfs | controller | ext4 / xfs | no |
| mdadm + ext4/xfs | kernel | ext4 / xfs | no |
| ZFS / btrfs | fused with the FS | itself | yes |

The first two are the same bucket for integrity. This is the part people get backwards: mdadm doesn't beat hardware RAID on bit-rot — they're equally blind. ZFS/btrfs beat both.

Where the two questions finally connect

Here's the link that makes the hardware-vs-software choice matter beyond the baseline: getting integrity requires raw disks, and hardware RAID won't give them up.

ZFS/btrfs can only self-heal if they own the redundancy: they need to see the individual disks, not one logical volume the controller pretends is a single drive. Put ZFS on a hardware-RAID volume and it can still detect a bad checksum but has no redundant copy of its own to repair file data from (short of spending space on `copies=2`). You've kept the detection and thrown away the cure.
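
The same point as a toy model. Everything here is hypothetical scaffolding (`HardwareRaidVolume`, `read_verified`, one block standing in for a whole volume): the controller keeps two copies internally but exposes a single `read()`, so a checksumming layer on top can see that the data is wrong and has nowhere else to look.

```python
import hashlib


class HardwareRaidVolume:
    """A controller mirror exposed as ONE block device; copies are hidden."""

    def __init__(self, data: bytes):
        self._copies = [bytearray(data), bytearray(data)]

    def read(self) -> bytes:
        # The controller picks a copy; the caller cannot choose or compare.
        return bytes(self._copies[0])


def read_verified(volume, expected_sha256: str) -> bytes:
    """What a checksumming filesystem can do with a single logical device."""
    data = volume.read()
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        # Detection works. Repair does not: no second copy is visible here.
        raise IOError("checksum mismatch, no redundant copy to repair from")
    return data


if __name__ == "__main__":
    original = b"ledger page 1"
    expected = hashlib.sha256(original).hexdigest()
    volume = HardwareRaidVolume(original)

    volume._copies[0][0] ^= 0x01  # silent corruption inside the controller

    try:
        read_verified(volume, expected)
    except IOError as err:
        print("detected:", err)

    # The intact copy still exists, but only the controller can reach it.
    print(bytes(volume._copies[1]) == original)
```

Swap the controller for raw disks and the filesystem holds both copies itself, which is exactly what the HBA recommendation below buys you.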

This isn't opinion; it's the project's own guidance. OpenZFS's hardware documentation states it flatly:

"Hardware RAID controllers should not be used with ZFS." … "Hardware RAID will limit opportunities for ZFS to perform self healing on checksum failures." … "It is best to use a HBA instead of a RAID controller, for both performance and reliability."

That's why the homelab move is an HBA flashed to IT mode (or plain AHCI on the motherboard): it presents raw disks so the software layer — md for plain redundancy, ZFS/btrfs for redundancy plus integrity — can do its job. (Vendor blogs from 45Drives and Klara Systems make the same case well, though both sell software-defined storage, so weight them accordingly — the OpenZFS doc is the neutral anchor.)

So the real long-term cost of hardware RAID isn't losing the baseline by a little. It's that the controller forecloses the integrity upgrade path. On mdadm + ext4 the disks are already raw, so a later move to ZFS is a backup, a rebuild of the same disks as a pool, and a restore; behind a RAID card, step one of that migration is "remove the card."

Don't confuse either with fakeRAID

One trap sits below both: motherboard/BIOS "RAID" (Intel RST, AMD RAIDXpert), a.k.a. fakeRAID. It's the worst of both worlds — the parity/mirroring runs on your CPU like software RAID (no dedicated processor, no battery-backed cache), but it writes proprietary, chipset-locked metadata like hardware RAID. You pay the CPU cost and accept the lock-in, and get neither the cache nor the portability. If your board offers it, set the controller to AHCI and do RAID in software instead.

Picking

```mermaid
flowchart TD
    START["How should I do RAID?"] --> Q1{"Want bit-rot detection and self-heal?"}
    Q1 -->|yes| ZFS["ZFS or btrfs on an HBA / IT-mode, never hardware RAID"]
    Q1 -->|"no, plain ext4 or xfs"| Q2{"Write-heavy sync workload (busy database)?"}
    Q2 -->|no| SW["mdadm: portable, free, visible, flexible"]
    Q2 -->|yes| Q3{"Already own a card with a working battery?"}
    Q3 -->|yes| HW["Hardware RAID: write-cache niche only; still plan to migrate off"]
    Q3 -->|no| SW2["mdadm on SSD/NVMe: don't buy a card just for the cache"]

    %% green = recommended for its case; amber = right only in a narrow niche
    classDef good stroke:#a3be8c,stroke-width:2.5px
    classDef niche stroke:#ebcb8b,stroke-width:2.5px
    classDef plain stroke:#7b88a1,stroke-width:2.5px
    class ZFS,SW,SW2 good
    class HW niche
    class START,Q1,Q2,Q3 plain
```
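
If you'd rather read the flowchart as code, here is the same decision as a small function. The function name `pick_raid` and its three boolean parameters are just labels for the three questions in the chart, nothing more official than that.

```python
def pick_raid(want_integrity: bool,
              sync_heavy: bool,
              own_card_with_battery: bool) -> str:
    """Walk the three questions of the flowchart in order."""
    if want_integrity:
        # Integrity needs raw disks, so the RAID card is out regardless.
        return "ZFS or btrfs on an HBA / IT-mode"
    if not sync_heavy:
        return "mdadm"
    if own_card_with_battery:
        # The one niche: the cache you already paid for, on a sync-heavy load.
        return "hardware RAID (write-cache niche; plan to migrate off)"
    return "mdadm on SSD/NVMe"


if __name__ == "__main__":
    # A typical homelab: no database, no integrity requirement.
    print(pick_raid(want_integrity=False, sync_heavy=False,
                    own_card_with_battery=False))
```

Note that the integrity question comes first and short-circuits everything else: once the answer is yes, the write-cache and card-ownership questions never get asked.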

The whole thing in one breath: decide hardware vs software on lock-in, tooling, cost, and write cache — software wins that for almost every homelab. Decide integrity separately, at the filesystem layer, where ZFS and btrfs are the only options that have it. And the two questions touch at exactly one point: hardware RAID is the choice that slams the door on the integrity one.

For the levels themselves see RAID explained; for the integrity layer in depth see ZFS, btrfs, and when to leave mdadm + ext4; for a real array built on exactly this reasoning, the homelab RAID 10 anatomy.