Monitoring your homelab

Level: Intermediate

ℹ️ Hands-on — you'll create files and run commands on your own Debian box.

A homelab grows quietly: one service becomes ten, and one day something breaks and you only find out when you try to use it. Monitoring is how the machine tells you there is a problem, instead of the other way round. The trap is reaching straight for the heavyweight stack everyone screenshots; the right move is to start with the simplest thing that answers "is it up?" and add depth only when you actually feel its absence.

What monitoring covers

Three different questions, three different tools — and you do not need all three on day one:

flowchart TD
    Q["What do I need to know?"] -->|"is it up? alert me"| K["Uptime Kuma"]
    Q -->|"how is this host doing?"| N["Netdata"]
    Q -->|"history + many hosts + dashboards"| P["Prometheus + Grafana"]

Pick by the question you have today, not by what looks impressive; most homelabs live happily on Uptime Kuma alone for a long time.

Availability — is the service reachable, and tell me the moment it is not.
Metrics — CPU, memory, disk, temperature, per-container resource use, over time.
Logs — what a service actually said when it misbehaved.

Start here: Uptime Kuma

What uptime monitoring is

Uptime Kuma is a self-hosted "is it up?" monitor: it probes your services on a schedule (HTTP, TCP, ping and other check types), shows a green/red dashboard, and, the part that matters, alerts you (email, Telegram, Discord, ntfy, dozens more) as soon as a check fails. It runs as one container, like everything else in this series. Give it its own folder (~/uptime-kuma/) and save this compose file there:

docker-compose.yml

services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    volumes:
      - kuma_data:/app/data
    restart: unless-stopped
volumes:
  kuma_data:

Bring it up from that folder, on your Debian box:

docker compose up -d

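You should see Docker pull the image and report Container uptime-kuma-uptime-kuma-1 Started. The compose file publishes no ports, so the dashboard is reachable only through your reverse proxy: point the proxy at port 3001 of the container, which is Uptime Kuma's default. Then add a check per service, wire one notification channel, and you have covered most of what monitoring is for in an afternoon.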

💡 Tip — An alert you never set up is the failure mode that bites. Configure at least one notification channel on day one — a dashboard nobody is looking at when a service dies is not monitoring.
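To make "a check plus a notification channel" concrete, here is a minimal sketch of what one check boils down to. It is illustrative only: Uptime Kuma adds scheduling, retries, history and many notification providers on top of this idea. The function names and the ntfy topic are hypothetical, so substitute your own.

```shell
#!/usr/bin/env bash
# Sketch of a single "is it up?" check with one alert channel.

# Hypothetical ntfy topic; choose your own hard-to-guess name.
NTFY_URL="${NTFY_URL:-https://ntfy.sh/my-homelab-alerts}"

# Send a one-line alert by POSTing the message body to the ntfy topic.
notify() {
  curl --silent --max-time 10 --data "$1" "$NTFY_URL" >/dev/null
}

# Probe one URL; print its state and send an alert when the probe fails.
check_service() {
  local name="$1" url="$2"
  if curl --silent --fail --max-time 10 --output /dev/null "$url"; then
    echo "UP $name"
  else
    echo "DOWN $name"
    notify "$name is down ($url)"
    return 1
  fi
}
```

After sourcing the file, something like `check_service wiki http://192.168.1.10:8080/` (a made-up address) prints the state and pushes an alert on failure. Sending yourself one deliberate test alert is a cheap way to confirm the channel works before you depend on it.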

When you want metrics: Netdata

When "it's up" stops being enough — you want to know why the box is slow — Netdata gives you per-second metrics for CPU, RAM, disk, network, temperatures, and per-container usage out of the box, with almost no configuration. One agent per host, a rich live dashboard, sensible default alarms.

ℹ️ Note — Netdata answers "how is this machine doing right now?" extremely well. It is the natural second step when a single host starts feeling loaded; you do not need Prometheus yet.
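If you want to try Netdata the same way as the other services, the sketch below writes a compose file for one agent. It is modelled on the layout Netdata's Docker install documentation describes, trimmed for readability, so compare it with the current docs before relying on it. The folder name and the named volumes are assumptions of this example.

```shell
# Run from an empty folder of its own, for example ~/netdata/ (hypothetical).
# Writes a compose file for a single Netdata agent on this host.
cat > docker-compose.yml <<'EOF'
services:
  netdata:
    image: netdata/netdata
    pid: host                # see host processes, not only the container's
    network_mode: host       # see host network interfaces
    restart: unless-stopped
    cap_add:
      - SYS_PTRACE
      - SYS_ADMIN
    security_opt:
      - apparmor:unconfined
    volumes:
      - netdata_config:/etc/netdata
      - netdata_lib:/var/lib/netdata
      - netdata_cache:/var/cache/netdata
      - /:/host/root:ro,rslave
      - /etc/passwd:/host/etc/passwd:ro
      - /etc/group:/host/etc/group:ro
      - /etc/os-release:/host/etc/os-release:ro
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro   # container names
volumes:
  netdata_config:
  netdata_lib:
  netdata_cache:
EOF
```

Start it with docker compose up -d from that folder. With host networking the dashboard listens on port 19999 of the box. The read-only Docker socket mount is what lets Netdata label per-container usage with container names; leave it out if you would rather not expose the socket.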

The full stack: Prometheus + Grafana

What Prometheus and Grafana are

Prometheus is a time-series database that scrapes metrics from your hosts and containers and stores their history; Grafana is the dashboard layer that graphs and alerts on that data. Together they form the de facto standard open-source observability stack, and the same shape that runs on real production clusters like the k3s setup this series builds toward.

flowchart LR
    E["node_exporter<br/>cAdvisor"] -->|"scraped"| P["Prometheus<br/><i>stores history</i>"]
    P --> G["Grafana<br/><i>dashboards + alerts</i>"]
    P --> AM["Alertmanager<br/><i>routes alerts</i>"]

Exporters expose metrics, Prometheus stores them, Grafana visualises, Alertmanager notifies — powerful, and more to run.
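The "scraped" arrow in the diagram is just a list of targets in Prometheus's config file. As a minimal sketch, the block below writes a prometheus.yml with one job per exporter. The hostnames node-exporter and cadvisor are hypothetical compose service names; 9100 and 8080 are the default ports of node_exporter and cAdvisor.

```shell
# Writes a minimal Prometheus scrape config into the current folder.
# Hostnames are hypothetical service names; the ports are exporter defaults.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s        # how often Prometheus pulls from each target

scrape_configs:
  - job_name: node            # host metrics: CPU, memory, disk, network
    static_configs:
      - targets: ["node-exporter:9100"]

  - job_name: cadvisor        # per-container resource use
    static_configs:
      - targets: ["cadvisor:8080"]
EOF
```

Each extra host means another target in the node job, which is where the "exporters per host" maintenance cost starts to show.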

⚠️ Warning — Prometheus + Grafana is several containers, a scrape config, exporters per host, and dashboards to build. It is worth it across multiple machines or when you want long-term history — but it is real ongoing maintenance. Do not start here for a single box; you will spend more time tending the monitoring than the things it monitors.

Don't forget the logs

Metrics tell you that something broke; logs tell you what it said. Two commands cover most of it on Debian:

journalctl -u <service> — the systemd journal for a system service (add -f to follow live, -e to jump to the end).
docker logs <container> — a container's stdout/stderr (add -f to follow).
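Both commands take filters that save you from scrolling through everything. A small sketch, with hypothetical helper names, that wraps two common combinations:

```shell
#!/usr/bin/env bash
# Hypothetical helpers around journalctl and docker logs.

# Messages of priority "err" or worse from one systemd unit, last hour only.
recent_unit_errors() {
  journalctl -u "$1" -p err --since "1 hour ago" --no-pager
}

# Up to 100 of the latest lines a container wrote in the past 30 minutes,
# with stderr merged into stdout so both can be piped together.
recent_container_logs() {
  docker logs --since 30m --tail 100 "$1" 2>&1
}
```

For example, `recent_unit_errors ssh` checks the SSH daemon on Debian, and `recent_container_logs uptime-kuma-uptime-kuma-1` shows what the monitor itself has been saying. Reading the journal of a system service may need sudo or membership of the systemd-journal group.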

For many hosts you can later centralise logs (Loki, the Grafana ecosystem's log store), but journalctl and docker logs are enough for a single homelab box.

A short close

Monitoring is the habit that turns a pile of containers into something you can trust to run unattended. Start at the cheap, high-value end — Uptime Kuma plus one alert channel — and climb to Netdata, then Prometheus and Grafana, only when a real need pushes you there. Pair it with knowing where the logs live (journalctl, docker logs) and you will hear about problems before your users do. The last step in this series is what happens when one Docker host is no longer enough: from Docker to k3s. The whole arc is in the homelab series hub.