Reading time: ~16 minutes
Audience: Homelab and self-hosting enthusiasts


What Are Prometheus and Grafana?

Overview

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It scrapes metrics from endpoints (exporters) using HTTP pulls, stores them in a time-series database, and evaluates alerting rules. Grafana is a visualization platform that queries Prometheus (and other data sources) to create dashboards, alerts, and annotations. Together, they form the de facto standard for cloud-native monitoring and are the backbone of most homelab observability stacks.

Why Use Them for Docker Monitoring?

  • Container Awareness: cAdvisor exports per-container CPU, memory, network, and filesystem metrics
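  • Service Discovery: with docker_sd_configs, Prometheus can discover containers through the Docker API and use their labels to select scrape targets (the configuration in this guide uses static targets instead)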
  • Flexible Queries: PromQL allows complex aggregations (e.g., “top 5 containers by memory usage”)
  • Alerting: Alertmanager routes alerts to Slack, Telegram, PagerDuty, or email
  • Visual Dashboards: Grafana's dashboard library offers ready-made community dashboards for Docker and Node Exporter

Why Monitor Your Docker Homelab?

Prevent Resource Exhaustion

A runaway container or memory leak can crash the host. Monitoring reveals trends before they become outages. In a homelab, this means your Plex server stays online during a movie night.

Capacity Planning

Metrics show whether your current hardware is sufficient. If CPU usage is consistently above 80%, it is time to upgrade or consolidate services.
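"CPU usage" here means the share of CPU time not spent idle. Node Exporter exposes it as the cumulative counter node_cpu_seconds_total, split by cpu and mode labels, and a common PromQL form is 100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))). The minimal sketch below reproduces that arithmetic from two snapshots of the counters; the snapshot format and the sample values are made up for illustration. It divides idle time by total CPU time rather than by wall-clock time, which gives the same result when every CPU accrues one second per second.

```python
from typing import Dict, Tuple

# Hypothetical snapshot format: {(cpu, mode): cumulative seconds}, mirroring the
# cpu and mode labels on node_cpu_seconds_total.
Snapshot = Dict[Tuple[str, str], float]


def cpu_busy_percent(before: Snapshot, after: Snapshot) -> float:
    """Share of CPU time spent in any mode other than idle between two snapshots."""
    total = 0.0
    idle = 0.0
    for key, end_value in after.items():
        delta = end_value - before.get(key, 0.0)
        total += delta
        if key[1] == "idle":
            idle += delta
    if total <= 0:
        raise ValueError("snapshots must span a positive amount of CPU time")
    return 100.0 * (1.0 - idle / total)


if __name__ == "__main__":
    # Made-up counters for a 2-core host, sampled 60 s apart.
    t0 = {("0", "idle"): 1000.0, ("0", "user"): 200.0,
          ("1", "idle"): 1000.0, ("1", "user"): 150.0}
    t1 = {("0", "idle"): 1010.0, ("0", "user"): 250.0,
          ("1", "idle"): 1030.0, ("1", "user"): 180.0}
    print(f"CPU busy: {cpu_busy_percent(t0, t1):.1f}%")  # → CPU busy: 66.7%
```

A value that stays above 80 over days, not just during spikes, is the signal the paragraph above describes.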

Troubleshooting

When a service slows down, dashboards reveal whether the bottleneck is CPU, disk I/O, or network latency. This replaces guesswork with data.


Installation

Prerequisites

  • Docker and Docker Compose
  • A Linux server (2+ cores, 4 GB RAM)
  • Ports 3000 (Grafana), 9090 (Prometheus), and 9093 (Alertmanager) free on the host

Method 1: Docker Compose (Recommended)

version: "3.8"

networks:
  monitoring:
    driver: bridge

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: always
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./rules:/etc/prometheus/rules:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: always
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=***
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: always
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: always
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager_data:/alertmanager
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

alertmanager.yml

global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: 'you@gmail.com'
  smtp_auth_username: 'you@gmail.com'
  smtp_auth_password: '***'  # for Gmail, use an app password rather than your account password

route:
  receiver: 'default'
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'default'
    email_configs:
      - to: 'you@gmail.com'
        headers:
          Subject: 'Homelab Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
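With group_by: ['alertname'], Alertmanager batches firing alerts that share an alertname into a single notification, waits group_wait before sending a new group, and re-sends an unchanged group only after repeat_interval. The sketch below models only the grouping step, using hypothetical alert dictionaries (the TargetDown alert is invented for the example); it is not Alertmanager's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]


def group_alerts(alerts: List[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Bucket alerts by the values of the group_by labels (a missing label counts as "")."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    firing = [
        {"alertname": "HighMemoryUsage", "name": "nextcloud"},
        {"alertname": "HighMemoryUsage", "name": "plex"},
        {"alertname": "TargetDown", "job": "node-exporter"},
    ]
    # One notification per group: a single email lists both containers.
    for key, members in group_alerts(firing, ["alertname"]).items():
        print(key, [a.get("name", a.get("job")) for a in members])
```

Adding more labels to group_by (for example name) produces smaller, more specific notifications at the cost of more emails.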

Deploy:

docker compose up -d

Basic Setup and Configuration

Step 1: Add Prometheus as a Grafana Data Source

  1. Open Grafana at http://your-server:3000 and log in as admin with the password set in GF_SECURITY_ADMIN_PASSWORD
  2. Go to Connections → Data sources → Add data source
  3. Select Prometheus
  4. URL: http://prometheus:9090 (the Compose service name, which resolves on the monitoring network)
  5. Click Save & Test

Step 2: Import Pre-Built Dashboards

Grafana’s dashboard library has thousands of templates. Recommended dashboards:

  • Node Exporter Full: ID 1860
  • Docker and Host Monitoring: ID 179
  • cAdvisor Exporter: ID 14282

Import via Dashboards → New → Import, enter the ID, and select your Prometheus data source.

Step 3: Verify Metrics

In Prometheus (http://your-server:9090), go to Graph and query:

up
node_cpu_seconds_total
container_cpu_usage_seconds_total

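You should see time-series data. If up is 0 for a target, Prometheus cannot scrape it: the Targets page under Status shows the scrape error, and docker compose ps shows whether the exporter container is running.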
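The up check can also be scripted against the Prometheus HTTP API: GET /api/v1/query?query=up returns a JSON vector in which each sample's value is a [timestamp, "string value"] pair. This minimal sketch uses only the standard library and assumes Prometheus is reachable at http://localhost:9090 (the port published in the Compose file); find_down_targets and instant_query are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import List

PROMETHEUS_URL = "http://localhost:9090"  # assumption: port published by docker-compose.yml


def find_down_targets(response: dict) -> List[str]:
    """Return 'job/instance' for every series in an `up` result whose value is 0."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    down = []
    for sample in response["data"]["result"]:
        _timestamp, value = sample["value"]  # value is a string, e.g. "0"
        if float(value) == 0:
            labels = sample["metric"]
            down.append(f"{labels.get('job', '?')}/{labels.get('instance', '?')}")
    return down


def instant_query(expr: str) -> dict:
    """Run an instant PromQL query and return the decoded JSON body."""
    url = f"{PROMETHEUS_URL}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Usage from the Docker host: print(find_down_targets(instant_query("up"))). Keeping the parsing separate from the HTTP call makes it easy to reuse in a cron job or a health-check script.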


Advanced Features

Recording Rules

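Recording rules pre-compute expensive or frequently used queries and store the results as new series. Because --web.enable-lifecycle is set, Prometheus re-reads its configuration and rule files after curl -X POST http://localhost:9090/-/reload, so no restart is needed.

Create rules.yml in the host directory mounted at /etc/prometheus/rules (the rule_files path in prometheus.yml), for example ./rules/rules.yml with a ./rules:/etc/prometheus/rules:ro volume on the prometheus service: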

groups:
  - name: docker_rules
    interval: 30s
    rules:
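      - record: name:container_memory_usage_bytes:avg5m
        # container_memory_usage_bytes is a gauge, so average it over time; rate() is only meaningful for counters
        expr: sum by (name) (avg_over_time(container_memory_usage_bytes{name!=""}[5m]))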
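Why rate() belongs on counters only: it computes a per-second increase and treats any drop in value as a counter reset (for example, after a container restart). Applied to a gauge such as container_memory_usage_bytes, an ordinary decrease is misread as a reset. The simplified sketch below shows that reset handling on (timestamp, value) samples; unlike PromQL's rate(), it does not extrapolate to the edges of the range window.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a counter over (timestamp_s, value) samples.

    Any decrease is treated as a counter reset, as rate() does. Simplified:
    no extrapolation to the edges of the range window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value is the increase.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


if __name__ == "__main__":
    # A counter (e.g. CPU seconds) that resets between t=15 and t=30.
    print(simple_rate([(0, 100.0), (15, 103.0), (30, 1.5), (45, 4.5)]))
    # A gauge (e.g. memory bytes) dropping from 500 to 400 is wrongly counted as +400.
    print(simple_rate([(0, 500.0), (10, 400.0)]))
```

The second call reports a growth of 40 units per second for a value that actually shrank, which is why gauges are summarized with functions such as avg_over_time instead.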

Alert Rules

Create alerts.yml in the same rules directory (for example ./rules/alerts.yml on the host):

groups:
  - name: docker_alerts
    rules:
      - alert: HighMemoryUsage
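        # Containers without a memory limit report a limit of 0; filter them out so the ratio is not +Inf
        expr: container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0) > 0.85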
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} high memory usage"
          description: "Memory usage has been above 85% of the container's memory limit for more than 5 minutes."
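The for: 5m clause means an alert whose expression becomes true first enters the pending state and only fires once the expression has stayed true at every evaluation for at least five minutes; a single false evaluation resets it. A simplified sketch of that state machine, with evaluations every 15 s (the evaluation_interval in prometheus.yml) and hypothetical true/false inputs:

```python
from typing import Iterable, List, Optional


def alert_states(results: Iterable[bool], interval_s: float, for_s: float) -> List[str]:
    """Return the alert state after each evaluation: inactive, pending, or firing."""
    states: List[str] = []
    active_since: Optional[float] = None
    t = 0.0
    for condition in results:
        if not condition:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
        else:
            if active_since is None:
                active_since = t
            states.append("firing" if t - active_since >= for_s else "pending")
        t += interval_s
    return states


if __name__ == "__main__":
    # Condition true for 21 evaluations: pending until t=300 s, then firing.
    states = alert_states([True] * 21, interval_s=15, for_s=300)
    print(states[0], states[19], states[20])  # → pending pending firing
```

This is why a brief memory spike does not page you, while a container that stays above 85% for five minutes does.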

Grafana Annotations

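Annotate dashboards with deployment events. Use the Grafana HTTP API (or a webhook from your deployment tooling) to mark when a container is updated. Run the request from the Docker host, where Grafana is published on port 3000, with the admin password from docker-compose.yml stored in the GRAFANA_PASSWORD shell variable. Without a dashboard ID or UID, Grafana stores an organization-wide annotation that dashboards can display by filtering on its tags:

curl -H "Content-Type: application/json" \
  -X POST \
  -u "admin:$GRAFANA_PASSWORD" \
  http://localhost:3000/api/annotations \
  -d '{"tags": ["deploy", "nextcloud"], "text": "Deployed Nextcloud v28"}'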
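A Python version of this call can be useful inside deployment scripts. The sketch builds the POST /api/annotations request with the standard library only; Grafana expects the time field in epoch milliseconds. The URL, the admin user, and the idea of passing the password in from an environment variable are assumptions for this example, and build_annotation_request is a hypothetical helper.

```python
import base64
import json
import time
import urllib.request
from typing import List, Optional

GRAFANA_URL = "http://localhost:3000"  # assumption: port published by docker-compose.yml


def build_annotation_request(text: str, tags: List[str], password: str,
                             epoch_ms: Optional[int] = None,
                             user: str = "admin") -> urllib.request.Request:
    """Build, but do not send, a POST /api/annotations request using basic auth."""
    payload = {
        # Grafana expects epoch milliseconds; default to "now".
        "time": epoch_ms if epoch_ms is not None else int(time.time() * 1000),
        "tags": tags,
        "text": text,
    }
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{GRAFANA_URL}/api/annotations",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {credentials}"},
        method="POST",
    )
```

To send it: urllib.request.urlopen(build_annotation_request("Deployed Nextcloud v28", ["deploy", "nextcloud"], os.environ["GRAFANA_PASSWORD"])).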

Integrating with Your Homelab

Reverse Proxy

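Expose Grafana via HTTPS by adding these labels to the grafana service. They assume Traefik is already running, shares a Docker network with Grafana, and has a certificate resolver named letsencrypt: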

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.grafana.rule=Host(`grafana.yourdomain.com`)"
  - "traefik.http.routers.grafana.tls.certresolver=letsencrypt"
  - "traefik.http.services.grafana.loadbalancer.server.port=3000"

Loki Integration

Add Grafana Loki as a second data source for log correlation. See our Grafana Loki guide.

Backup

Back up Grafana dashboards and Prometheus data:

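# Grafana dashboards (GRAFANA_PASSWORD holds the admin password from docker-compose.yml)
mkdir -p ./grafana-dashboards
curl -s -u "admin:$GRAFANA_PASSWORD" "http://localhost:3000/api/search?type=dash-db" | jq -r '.[].uid' | \
  while read -r uid; do
    curl -s -u "admin:$GRAFANA_PASSWORD" "http://localhost:3000/api/dashboards/uid/$uid" > "./grafana-dashboards/$uid.json"
  done

# Prometheus data: stop Prometheus first for a consistent copy.
# The volume name is <compose project>_prometheus_data; the project defaults to the directory name.
docker compose stop prometheus
sudo tar czf prometheus-backup.tar.gz /var/lib/docker/volumes/monitoring_prometheus_data/_data
docker compose start prometheus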

Alternatives to Consider

InfluxDB + Telegraf

InfluxDB is a purpose-built time-series database queried with InfluxQL (SQL-like) or Flux (a functional scripting language). Telegraf is a collection agent with hundreds of input plugins. This stack is a good fit if you prefer SQL-style queries or need to ingest data from industrial or IoT sensors.

Zabbix

Zabbix is an enterprise monitoring platform with auto-discovery, agent-based monitoring, and native alerting. It is more complex to set up than Prometheus, but it ships its own web frontend with dashboards, so Grafana is optional. It suits homelab operators who want an all-in-one solution.

Netdata

Netdata provides real-time, per-second metrics with minimal configuration. It is ideal for quick troubleshooting but has limited long-term storage and query capabilities compared to Prometheus.

| Tool | Best For | Query Language | Alerting | Long-Term Storage |
|---|---|---|---|---|
| Prometheus + Grafana | Cloud-native, Docker | PromQL | Alertmanager | Yes (TSDB) |
| InfluxDB + Telegraf | SQL-like queries, IoT | InfluxQL/Flux | Native | Yes |
| Zabbix | Enterprise all-in-one | SQL | Built-in | Yes |
| Netdata | Real-time troubleshooting | None (web UI) | Limited | Limited |

Frequently Asked Questions

How much RAM does Prometheus use?

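Prometheus memory usage scales mainly with the number of active time series. To see where you stand, query prometheus_tsdb_head_series (active series) and process_resident_memory_bytes{job="prometheus"} (resident memory). Limit cardinality by dropping unused metrics and labels with metric_relabel_configs; a longer scrape interval reduces the number of samples ingested, but not the number of series.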
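To find which metrics contribute the most series, you can count sample lines per metric name in an exporter's /metrics output (the Prometheus text exposition format: one name{labels} value line per series, plus # HELP and # TYPE comments). Inside Prometheus, topk(10, count by (__name__)({__name__=~".+"})) gives a similar view. The rough sketch below assumes well-formed input and ignores edge cases such as spaces or braces inside label values.

```python
from collections import Counter


def series_per_metric(exposition: str) -> Counter:
    """Count sample lines (series) per metric name in Prometheus text-format output."""
    counts: Counter = Counter()
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first '{' (labels) or space (value).
        end = len(line)
        for sep in ("{", " "):
            idx = line.find(sep)
            if idx != -1:
                end = min(end, idx)
        counts[line[:end]] += 1
    return counts
```

Feed it the text of any exporter's /metrics page and call .most_common(10) on the result; cAdvisor, with one set of series per container, is usually the first place to look.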

Can I monitor remote servers?

Yes. Run Node Exporter on each remote server, make sure port 9100 is reachable from the Prometheus host, and add a job under scrape_configs in prometheus.yml:

  - job_name: 'remote-server'
    static_configs:
      - targets: ['192.168.1.20:9100']
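With many remote hosts, editing static_configs gets tedious. Prometheus also supports file_sd_configs, which reads target groups from JSON or YAML files and picks up changes without a restart. The sketch below writes such a JSON file; the host list, the site label, and the idea of storing it under a path like /etc/prometheus/targets/*.json (referenced from the job's file_sd_configs files entry and mounted into the container) are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional


def build_file_sd(hosts: List[str], port: int = 9100,
                  labels: Optional[Dict[str, str]] = None) -> List[dict]:
    """Return a file_sd target list: one group holding every host:port target."""
    return [{
        "targets": [f"{host}:{port}" for host in hosts],
        "labels": dict(labels or {}),
    }]


def write_file_sd(path: Path, groups: List[dict]) -> None:
    """Write the target file atomically so Prometheus never reads a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(groups, indent=2))
    tmp.replace(path)  # rename is atomic on the same filesystem


if __name__ == "__main__":
    # Hypothetical hosts; replace with your own servers.
    groups = build_file_sd(["192.168.1.20", "192.168.1.21"], labels={"site": "homelab"})
    print(json.dumps(groups, indent=2))
```

Writing to a .tmp file first matters because Prometheus watches the target files and may re-read one while it is still being written.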

How do I update Prometheus?

Update the Docker image and recreate the container. Prometheus data persists in the named volume.

docker pull prom/prometheus:latest
docker compose up -d prometheus

Does Prometheus support push metrics?

Yes, via the Pushgateway. However, pulls are preferred. Use Pushgateway only for batch jobs or ephemeral containers that cannot expose a scrape endpoint.
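Pushgateway accepts the same text exposition format that Prometheus scrapes, sent with HTTP PUT or POST to /metrics/job/<job_name>: PUT replaces every metric in that group, while POST replaces only metrics with the same names. Pushgateway is not part of the Compose stack above, so the pushgateway:9091 address (its default port) and the metric names are assumptions; Prometheus would also need a scrape job for the Pushgateway with honor_labels: true. A minimal standard-library sketch:

```python
import urllib.parse
import urllib.request
from typing import Dict

PUSHGATEWAY_URL = "http://pushgateway:9091"  # assumption: Pushgateway added to the stack


def build_push_request(job: str, metrics: Dict[str, float]) -> urllib.request.Request:
    """Build a PUT that replaces all metrics in the Pushgateway group for `job`.

    Each entry in `metrics` is pushed as an untyped-by-label gauge with a TYPE line.
    """
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    body = ("\n".join(lines) + "\n").encode()  # every line must end with a newline
    url = f"{PUSHGATEWAY_URL}/metrics/job/{urllib.parse.quote(job, safe='')}"
    return urllib.request.Request(url, data=body, method="PUT",
                                  headers={"Content-Type": "text/plain; version=0.0.4"})
```

A nightly backup job could send urllib.request.urlopen(build_push_request("nightly_backup", {"backup_duration_seconds": 42.5})) when it finishes, and an alert rule could then watch for the metric going stale.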


Conclusion

Summary

Prometheus and Grafana are the gold standard for monitoring Docker-based homelabs. They provide per-container metrics, host health, alerting, and beautiful dashboards — all without cost. With Docker Compose, the entire stack deploys in minutes. With Alertmanager, you are notified before a problem becomes an outage.

Next Steps

  • Add the Loki data source for log correlation
  • Create custom dashboards for your specific services
  • Configure recording rules for frequently used queries
  • Set up multi-channel alerting (Slack + email)
