Docker Monitoring with Grafana and Prometheus: A Complete Homelab Guide

Reading time: ~16 minutes
Audience: Homelab and self-hosting enthusiasts

What Are Prometheus and Grafana?

Overview

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It scrapes metrics from endpoints (exporters) using HTTP pulls, stores them in a time-series database, and evaluates alerting rules. Grafana is a visualization platform that queries Prometheus (and other data sources) to create dashboards, alerts, and annotations. Together, they form the de facto standard for cloud-native monitoring and are the backbone of most homelab observability stacks.

Why Use Them for Docker Monitoring?

Container Awareness: cAdvisor exports per-container CPU, memory, network, and filesystem metrics
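Service Discovery: Prometheus can discover containers automatically through Docker service discovery (docker_sd_configs) and filter them by label; the configuration in this guide uses static targets for simplicity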
Flexible Queries: PromQL allows complex aggregations (e.g., “top 5 containers by memory usage”)
Alerting: Alertmanager routes alerts to Slack, Telegram, PagerDuty, or email
Visual Dashboards: Grafana provides pre-built Docker and Node Exporter dashboards
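
To make the "top 5 containers by memory usage" example concrete, here is a minimal Python sketch that builds the instant-query URL for the Prometheus HTTP API and ranks the containers in a response. The base URL http://localhost:9090, the name!="" matcher, the helper names, and the sample payload are assumptions for illustration, not part of the stack described in this guide.

```python
import json
from urllib.parse import urlencode

# PromQL: the five containers using the most memory right now.
# The name!="" matcher (an assumption) skips cAdvisor's aggregate cgroup series.
TOP5_MEMORY = 'topk(5, container_memory_usage_bytes{name!=""})'


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def rank_containers(response: dict) -> list[tuple[str, float]]:
    """Turn an instant-vector response into (container, bytes) pairs, largest first."""
    rows = [
        (series["metric"].get("name", "<unknown>"), float(series["value"][1]))
        for series in response["data"]["result"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical response body, trimmed to two series.
SAMPLE = json.loads("""
{"status": "success", "data": {"resultType": "vector", "result": [
  {"metric": {"name": "grafana"}, "value": [1700000000, "157286400"]},
  {"metric": {"name": "plex"}, "value": [1700000000, "943718400"]}
]}}
""")

print(build_query_url("http://localhost:9090", TOP5_MEMORY))
for name, used in rank_containers(SAMPLE):
    print(f"{name}: {used / 1024**2:.0f} MiB")
```

In practice the payload would come from fetching the printed URL. The topk() call already limits the result to five series, so the sort only fixes the display order.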

Why Monitor Your Docker Homelab?

Prevent Resource Exhaustion

A runaway container or memory leak can crash the host. Monitoring reveals trends before they become outages. In a homelab, this means your Plex server stays online during a movie night.

Capacity Planning

Metrics show whether your current hardware is sufficient. If CPU usage is consistently above 80%, it is time to upgrade or consolidate services.
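
One way to make "consistently above 80%" measurable is to look at what share of samples over a period exceed the threshold. The sketch below is only an illustration: the PromQL expression is a common way to derive CPU utilisation from Node Exporter, while the helper names, the 0.8 cutoff for "consistently", and the sample values are assumptions.

```python
# PromQL for host CPU utilisation in percent, derived from Node Exporter's idle counter.
CPU_BUSY_PERCENT = (
    '100 * (1 - avg by (instance) '
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])))'
)


def fraction_above(samples: list[float], threshold: float = 80.0) -> float:
    """Return the share of samples that exceed the threshold (0.0 for no samples)."""
    if not samples:
        return 0.0
    return sum(1 for value in samples if value > threshold) / len(samples)


def needs_more_capacity(samples: list[float], threshold: float = 80.0,
                        min_fraction: float = 0.8) -> bool:
    """Treat usage as 'consistently high' when most samples exceed the threshold.

    min_fraction=0.8 is an arbitrary cutoff chosen for this sketch.
    """
    return fraction_above(samples, threshold) >= min_fraction


# Hypothetical CPU percentages, e.g. values returned by a range query.
busy_day = [85, 91, 88, 79, 93, 87, 90, 86, 95, 89]
quiet_day = [12, 85, 20, 15, 30, 18, 22, 90, 25, 14]

print(needs_more_capacity(busy_day))
print(needs_more_capacity(quiet_day))
```

Short spikes, as in the second series, do not trip the check; only sustained load does.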

Troubleshooting

When a service slows down, dashboards reveal whether the bottleneck is CPU, disk I/O, or network latency. This replaces guesswork with data.

Installation

Prerequisites

Docker and Docker Compose
A Linux server (2+ cores, 4 GB RAM)
Ports 3000 (Grafana), 9090 (Prometheus), and 9093 (Alertmanager) available

Method 1: Docker Compose (Recommended)

version: "3.8"

networks:
  monitoring:
    driver: bridge

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: always
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      # Rule files matched by rule_files in prometheus.yml
      - ./rules:/etc/prometheus/rules:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=30d'
      - '--web.enable-lifecycle'
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: always
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=***
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: always
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: always
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager_data:/alertmanager
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

prometheus.yml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

alertmanager.yml

global:
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: '***'

route:
  receiver: 'default'
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'default'
    email_configs:
      - to: '[email protected]'
        headers:
          Subject: 'Homelab Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}

Deploy:

docker compose up -d
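
Once the stack is running, it can be useful to confirm that notifications actually arrive before relying on them. The following sketch builds a synthetic alert and, if you uncomment the last line, posts it to Alertmanager's v2 API. The alert name HomelabTestAlert, the helper names, and the URL http://localhost:9093 are assumptions for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_alert(summary: str, description: str) -> list[dict]:
    """Build the JSON body for Alertmanager's POST /api/v2/alerts endpoint."""
    return [{
        "labels": {"alertname": "HomelabTestAlert", "severity": "info"},
        "annotations": {"summary": summary, "description": description},
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]


def send_alert(base_url: str, alerts: list[dict]) -> int:
    """POST the alerts and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


payload = build_test_alert("Test alert", "Checking the email route.")
print(json.dumps(payload, indent=2))
# To actually send it (requires the stack to be running):
# send_alert("http://localhost:9093", payload)
```

With the route shown in alertmanager.yml, the notification should go out after the 30s group_wait has elapsed.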

Basic Setup and Configuration

Step 1: Add Prometheus as a Grafana Data Source

Open Grafana at http://your-server:3000
Go to Connections → Data Sources → Add data source
Select Prometheus
URL: http://prometheus:9090
Click Save & Test

Step 2: Import Pre-Built Dashboards

Grafana’s dashboard library has thousands of templates. Recommended dashboards:

Node Exporter Full: ID 1860
Docker and Host Monitoring: ID 179
cAdvisor Exporter: ID 14282

Import via Dashboards → Import → enter the ID.

Step 3: Verify Metrics

In Prometheus (http://your-server:9090), go to Graph and query:

up
node_cpu_seconds_total
container_cpu_usage_seconds_total

You should see time-series data. If up is 0 for a target, check the exporter’s connectivity.
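
The same check can be scripted. The sketch below, which assumes Prometheus is reachable at a URL such as http://localhost:9090 and uses hypothetical helper names, queries up through the HTTP API and lists every target reporting 0. The sample payload is invented to show the response shape.

```python
import json
import urllib.request
from urllib.parse import urlencode


def fetch_up(base_url: str) -> dict:
    """Run the instant query `up` against the Prometheus HTTP API."""
    url = f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': 'up'})}"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def down_targets(response: dict) -> list[str]:
    """Return 'job/instance' for every series whose value is 0."""
    return [
        f'{series["metric"].get("job", "?")}/{series["metric"].get("instance", "?")}'
        for series in response["data"]["result"]
        if float(series["value"][1]) == 0.0
    ]


# Hypothetical response: Node Exporter is up, cAdvisor is down.
SAMPLE_UP = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"job": "node-exporter", "instance": "node-exporter:9100"},
         "value": [1700000000, "1"]},
        {"metric": {"job": "cadvisor", "instance": "cadvisor:8080"},
         "value": [1700000000, "0"]},
    ]},
}

print(down_targets(SAMPLE_UP))
# Against a live server: down_targets(fetch_up("http://localhost:9090"))
```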

Advanced Features

Recording Rules

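Recording rules pre-compute expensive queries so dashboards can read the stored result instead. Create rules.yml in the directory that is mounted into the Prometheus container as /etc/prometheus/rules (the path matched by rule_files in prometheus.yml):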

groups:
  - name: docker_rules
    interval: 30s
    rules:
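      # container_memory_usage_bytes is a gauge, so smooth it with
      # avg_over_time(); rate() is only meaningful for counters.
      - record: container:memory_usage_bytes:avg5m
        expr: avg_over_time(container_memory_usage_bytes[5m])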

Alert Rules

Create alerts.yml in the rules directory that is mounted at /etc/prometheus/rules inside the Prometheus container:

groups:
  - name: docker_alerts
    rules:
      - alert: HighMemoryUsage
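        # "> 0" skips containers without a memory limit, whose limit is reported as 0
        expr: (container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0)) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} high memory usage"
          description: "Memory usage has been above 85% of the container's limit for more than 5 minutes."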

Grafana Annotations

Annotate dashboards with deployment events. Use the Grafana API or webhooks to mark when a container is updated:

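# GRAFANA_ADMIN_PASSWORD is a placeholder variable for the admin password set in docker-compose.yml
curl -H "Content-Type: application/json" \
  -u "admin:${GRAFANA_ADMIN_PASSWORD}" \
  -X POST \
  http://grafana:3000/api/annotations \
  -d '{"dashboardId": 1, "text": "Deployed Nextcloud v28"}'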
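
For deployment scripts, the same call can be made from Python. This is a minimal sketch under a few assumptions: the helper names are hypothetical, the tags are made up, and the annotation omits a dashboard reference so that it is stored at the organization level and can be shown on any dashboard through a tag-based annotation query.

```python
import base64
import json
import time
import urllib.request
from typing import Optional


def build_annotation(text: str, tags: list[str], when: Optional[float] = None) -> dict:
    """Build an annotation body; Grafana expects `time` in epoch milliseconds."""
    timestamp = time.time() if when is None else when
    return {"time": int(timestamp * 1000), "tags": tags, "text": text}


def post_annotation(base_url: str, user: str, password: str, annotation: dict) -> int:
    """POST the annotation with HTTP basic auth and return the status code."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/annotations",
        data=json.dumps(annotation).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


annotation = build_annotation("Deployed Nextcloud v28", ["deploy", "nextcloud"],
                              when=1700000000)
print(json.dumps(annotation))
# To send it (requires a running Grafana and your real admin password):
# post_annotation("http://localhost:3000", "admin", "<password>", annotation)
```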

Integrating with Your Homelab

Reverse Proxy

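Expose Grafana via HTTPS through a reverse proxy. With Traefik, for example, add these labels to the grafana service (this assumes a running Traefik instance with a certificate resolver named letsencrypt):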

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.grafana.rule=Host(`grafana.yourdomain.com`)"
  - "traefik.http.routers.grafana.tls.certresolver=letsencrypt"
  - "traefik.http.services.grafana.loadbalancer.server.port=3000"

Loki Integration

Add Grafana Loki as a second data source for log correlation. See our Grafana Loki guide.

Backup

Back up Grafana dashboards and Prometheus data:

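# Grafana dashboards (exported by UID; set GRAFANA_ADMIN_PASSWORD to your admin password)
mkdir -p ./grafana-dashboards
curl -s -u "admin:${GRAFANA_ADMIN_PASSWORD}" "http://localhost:3000/api/search?type=dash-db" | jq -r '.[].uid' | \
  while read -r uid; do
    curl -s -u "admin:${GRAFANA_ADMIN_PASSWORD}" "http://localhost:3000/api/dashboards/uid/${uid}" \
      > "./grafana-dashboards/${uid}.json"
  done

# Prometheus data (the volume prefix is the Compose project name; stop Prometheus first for a consistent copy)
docker compose stop prometheus
sudo tar czf prometheus-backup.tar.gz /var/lib/docker/volumes/monitoring_prometheus_data/_data
docker compose start prometheus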

Alternatives to Consider

InfluxDB + Telegraf

InfluxDB is a purpose-built time-series database. Its InfluxQL language is SQL-like, while Flux (InfluxDB 2.x) is a functional scripting language. Telegraf is a plugin-driven agent with hundreds of input plugins. This stack is a good fit if you prefer SQL-like queries or need to integrate with industrial sensors.

Zabbix

Zabbix is an enterprise monitoring platform with auto-discovery, agent-based monitoring, and native alerting. It is more complex than Prometheus but has a built-in dashboard and no separate Grafana requirement. Good for homelab operators who want an all-in-one solution.

Netdata

Netdata provides real-time, per-second metrics with minimal configuration. It is ideal for quick troubleshooting but has limited long-term storage and query capabilities compared to Prometheus.

Tool	Best For	Query Language	Alerting	Long-Term Storage
Prometheus + Grafana	Cloud-native, Docker	PromQL	Alertmanager	Yes (TSDB)
InfluxDB + Telegraf	SQL-like queries, IoT	InfluxQL/Flux	Native	Yes
Zabbix	Enterprise all-in-one	None (item keys and trigger expressions)	Built-in	Yes (SQL database)
Netdata	Real-time troubleshooting	None (web UI)	Limited	Limited

Frequently Asked Questions

How much RAM does Prometheus use?

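Prometheus memory usage scales mainly with the number of active time series. As a rough rule of thumb, budget a few kilobytes of RAM per active series on top of a baseline for the process itself; the prometheus_tsdb_head_series and process_resident_memory_bytes metrics show the real figures for your instance. Reduce memory by dropping metrics and high-cardinality labels you do not need; a longer scrape interval mainly reduces sample volume and disk usage.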
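
To see where series come from, a common approach is to count series per metric name. The sketch below parses the result of such a query. The query string is a widely used pattern, while the helper name and the sample payload (including its counts) are assumptions for illustration.

```python
# PromQL: the ten metric names with the most series.
# This scans every series, which is fine on a homelab but slow on large servers.
TOP_METRICS_BY_SERIES = 'topk(10, count by (__name__) ({__name__=~".+"}))'


def series_per_metric(response: dict) -> list[tuple[str, int]]:
    """Return (metric name, series count) pairs, highest count first."""
    rows = [
        (series["metric"].get("__name__", "<unnamed>"), int(float(series["value"][1])))
        for series in response["data"]["result"]
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical response, trimmed to three metric names.
SAMPLE_COUNTS = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"__name__": "node_cpu_seconds_total"}, "value": [1700000000, "64"]},
        {"metric": {"__name__": "container_fs_reads_total"}, "value": [1700000000, "310"]},
        {"metric": {"__name__": "up"}, "value": [1700000000, "3"]},
    ]},
}

for metric, count in series_per_metric(SAMPLE_COUNTS):
    print(f"{count:>6}  {metric}")
```

Metric names near the top of this list that no dashboard or alert uses are the first candidates for dropping.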

Can I monitor remote servers?

Yes. Install Node Exporter on each remote server and add it to prometheus.yml:

  - job_name: 'remote-server'
    static_configs:
      - targets: ['192.168.1.20:9100']

How do I update Prometheus?

Update the Docker image and recreate the container. Prometheus data persists in the named volume.

docker pull prom/prometheus:latest
docker compose up -d prometheus

Does Prometheus support push metrics?

Yes, via the Pushgateway. However, pulls are preferred. Use Pushgateway only for batch jobs or ephemeral containers that cannot expose a scrape endpoint.
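
As a rough sketch of what a batch job might do, the code below renders one gauge in the Prometheus text exposition format and pushes it to a Pushgateway. Note that a Pushgateway is not part of the Compose file in this guide; the URL http://localhost:9091, the job name, the metric name, and the helper names are all assumptions for illustration.

```python
import urllib.request
from urllib.parse import quote


def format_metric(name: str, value: float, help_text: str) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} gauge\n"
        f"{name} {value}\n"
    )


def push(base_url: str, job: str, body: str) -> int:
    """PUT the metrics for `job`, replacing anything pushed for it earlier."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/metrics/job/{quote(job, safe='')}",
        data=body.encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


body = format_metric(
    "backup_last_success_timestamp_seconds",
    1700000000,
    "Unix time of the last successful backup.",
)
print(body, end="")
# To push it (requires a running Pushgateway):
# push("http://localhost:9091", "nightly_backup", body)
```

Prometheus would then scrape the Pushgateway like any other target; setting honor_labels: true on that scrape job keeps the pushed job label intact.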

Conclusion

Summary

Prometheus and Grafana are the gold standard for monitoring Docker-based homelabs. They provide per-container metrics, host health, alerting, and beautiful dashboards, all at no cost. With Docker Compose, the entire stack deploys in minutes. With Alertmanager, you are notified before a problem becomes an outage.

Next Steps

Add the Loki data source for log correlation
Create custom dashboards for your specific services
Configure recording rules for frequently used queries
Set up multi-channel alerting (Slack + email)
