Reading time: ~16 minutes
Audience: Homelab and self-hosting enthusiasts
What Are Prometheus and Grafana?
Overview
Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It scrapes metrics from endpoints (exporters) over HTTP, stores them in a time-series database, and evaluates alerting rules. Grafana is a visualization platform that queries Prometheus (and other data sources) to build dashboards, alerts, and annotations. Together, they are the de facto standard for cloud-native monitoring and a common backbone for homelab observability stacks.
Why Use Them for Docker Monitoring?
- Container Awareness: cAdvisor exports per-container CPU, memory, network, and filesystem metrics
- Service Discovery: Prometheus can discover containers through the Docker API (docker_sd_configs) and select which ones to scrape by Docker labels
- Flexible Queries: PromQL allows complex aggregations (e.g., “top 5 containers by memory usage”)
- Alerting: Alertmanager routes alerts to Slack, Telegram, PagerDuty, or email
- Visual Dashboards: Grafana’s community library offers ready-made Docker and Node Exporter dashboards
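The basic setup below lists its scrape targets statically, so label-based discovery is something you opt into. A minimal sketch of a scrape job using Prometheus's docker_sd_configs, assuming the Docker socket is mounted read-only into the Prometheus container (/var/run/docker.sock:/var/run/docker.sock:ro) and is readable by the Prometheus process. The job name and the opt-in Docker label prometheus.scrape=true are hypothetical choices for this example:

```yaml
scrape_configs:
  - job_name: 'docker-containers'   # hypothetical job name
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 30s
    relabel_configs:
      # Keep only containers that opt in with the Docker label prometheus.scrape=true
      # (dots in Docker label names become underscores in the meta label name).
      - source_labels: [__meta_docker_container_label_prometheus_scrape]
        regex: 'true'
        action: keep
      # Copy the container name, minus its leading slash, into a "container" label
      - source_labels: [__meta_docker_container_name]
        regex: '/(.*)'
        target_label: container
```

Docker discovery can create several targets per container (one per network and exposed port), so containers exposing more than one port may need an extra relabel rule on __meta_docker_port_private.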
Why Monitor Your Docker Homelab?
Prevent Resource Exhaustion
A runaway container or memory leak can crash the host. Monitoring reveals trends before they become outages. In a homelab, this means your Plex server stays online during a movie night.
Capacity Planning
Metrics show whether your current hardware is sufficient. If CPU usage is consistently above 80%, it is time to upgrade or consolidate services.
Troubleshooting
When a service slows down, dashboards reveal whether the bottleneck is CPU, disk I/O, or network latency. This replaces guesswork with data.
Installation
Prerequisites
- Docker and Docker Compose
- A Linux server (2+ cores, 4 GB RAM)
- Ports 3000 (Grafana), 9090 (Prometheus), and 9093 (Alertmanager) free on the host
Method 1: Docker Compose (Recommended)
```yaml
networks:
  monitoring:
    driver: bridge

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: always
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      # Rule files loaded through rule_files in prometheus.yml
      - ./rules:/etc/prometheus/rules:ro
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=30d'
      # Enables POST /-/reload so config and rule changes apply without a restart
      - '--web.enable-lifecycle'
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: always
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      # Replace *** with your own strong password
      - GF_SECURITY_ADMIN_PASSWORD=***
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: always
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: always
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    privileged: true
    networks:
      - monitoring

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: always
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml:ro
      - alertmanager_data:/alertmanager
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:
```
prometheus.yml
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
```
alertmanager.yml
```yaml
global:
  smtp_smarthost: 'smtp.gmail.com:587'
  # Placeholder addresses: use your own. Gmail typically requires an app password here.
  smtp_from: 'your-address@gmail.com'
  smtp_auth_username: 'your-address@gmail.com'
  smtp_auth_password: '***'

route:
  receiver: 'default'
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'default'
    email_configs:
      - to: 'you@example.com'
        # The subject is set as an email header; "text" is the plain-text body
        headers:
          Subject: 'Homelab Alert: {{ .GroupLabels.alertname }}'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}
```
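The route above sends every alert to email. To send critical alerts to Slack instead, here is a sketch of the route and receivers sections, assuming Alertmanager 0.22 or later (for the matchers syntax) and a Slack incoming-webhook URL. The receiver name slack, the webhook URL, and the email address are placeholders:

```yaml
route:
  receiver: 'default'             # anything not matched below still goes to email
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    # Alerts labelled severity="critical" go to Slack instead of email
    - matchers:
        - severity="critical"
      receiver: 'slack'

receivers:
  - name: 'default'
    email_configs:
      - to: 'you@example.com'     # placeholder address
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/YOURS'  # placeholder webhook
        send_resolved: true
        text: |
          {{ range .Alerts }}{{ .Annotations.summary }}
          {{ end }}
```

The severity label that this route matches on comes from the labels block of each alerting rule.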
Deploy:
```bash
docker compose up -d
```
Basic Setup and Configuration
Step 1: Add Prometheus as a Grafana Data Source
- Open Grafana at http://your-server:3000
- Go to Connections → Data Sources → Add data source
- Select Prometheus
- URL: http://prometheus:9090 (Grafana reaches Prometheus by container name over the shared monitoring network)
- Click Save & Test
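The Compose file already mounts ./grafana/provisioning into the container, so you can also declare the data source in a file instead of adding it through the UI. A minimal sketch, assuming the file is saved as ./grafana/provisioning/datasources/prometheus.yml (the file name is arbitrary) and Grafana is restarted afterwards:

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend makes the requests
    url: http://prometheus:9090    # container name on the monitoring network
    isDefault: true
```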
Step 2: Import Pre-Built Dashboards
Grafana’s dashboard library has thousands of templates. Recommended dashboards:
- Node Exporter Full: ID 1860
- Docker and Host Monitoring: ID 179
- cAdvisor Exporter: ID 14282
Import via Dashboards → Import → enter the ID, then select your Prometheus data source when prompted.
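Imported dashboards are stored in Grafana's database. If you would rather keep them as files under version control, Grafana can load dashboard JSON from disk through a provider file. The sketch below assumes it is saved as ./grafana/provisioning/dashboards/homelab.yml and that the dashboard JSON files go in ./grafana/provisioning/dashboards/json/. The provider and folder names are hypothetical. Dashboards downloaded from grafana.com may contain a ${DS_PROMETHEUS} placeholder that only the Import screen fills in, so you may need to replace it with your data source's name or UID in the JSON file.

```yaml
apiVersion: 1
providers:
  - name: 'homelab'              # hypothetical provider name
    orgId: 1
    folder: 'Homelab'            # Grafana folder the dashboards appear in
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30    # how often Grafana rescans the directory
    allowUiUpdates: true
    options:
      path: /etc/grafana/provisioning/dashboards/json
```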
Step 3: Verify Metrics
In Prometheus (http://your-server:9090), go to Graph and query:
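```promql
up
node_cpu_seconds_total
container_cpu_usage_seconds_total
```
You should see time-series data. If up is 0 for a target, open the targets page under Status in the Prometheus UI (http://your-server:9090/targets) to see the scrape error, then check that the exporter container is running and attached to the monitoring network.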
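You can also have Prometheus watch the same condition instead of checking up by hand. Here is a sketch of an alerting rule, assuming it is saved as a file such as targets.yml in the directory Prometheus loads rule files from (/etc/prometheus/rules/ inside the container). The file, group, and alert names are hypothetical:

```yaml
groups:
  - name: target_health
    rules:
      - alert: TargetDown
        # up is 0 when the most recent scrape of a target failed
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} ({{ $labels.job }}) is down"
          description: "Prometheus has failed to scrape {{ $labels.instance }} for at least 2 minutes."
```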
Advanced Features
Recording Rules
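Recording rules pre-compute expensive queries and store the results as new series. Create rules/rules.yml on the host. Prometheus reads it from /etc/prometheus/rules/ inside the container, so mount that directory in the prometheus service (for example ./rules:/etc/prometheus/rules:ro):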
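```yaml
groups:
  - name: docker_rules
    interval: 30s
    rules:
      # container_memory_usage_bytes is a gauge, so rate() does not apply;
      # avg_over_time() records a 5-minute average per container instead.
      - record: container:memory_usage_bytes:avg5m
        expr: avg_over_time(container_memory_usage_bytes{name!=""}[5m])
```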
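rate() is meant for counters such as container_cpu_usage_seconds_total. The sketch below records per-container CPU usage and the five containers using the most memory, which covers the "top 5" example mentioned earlier. It assumes the rules go in a separate file in the same rules directory, for example rules/cpu.yml (a hypothetical name). The record names follow the level:metric:operation convention but are otherwise arbitrary:

```yaml
groups:
  - name: docker_cpu_rules
    interval: 30s
    rules:
      # CPU cores used per container, averaged over 5 minutes
      - record: name:container_cpu_usage_seconds_total:rate5m
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m]))
      # The five containers with the highest current memory usage
      - record: container:memory_usage_bytes:top5
        expr: topk(5, container_memory_usage_bytes{name!=""})
```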
Alert Rules
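Create alerts.yml in the same rules directory. After adding or editing rule files, reload Prometheus without a restart by running curl -X POST http://localhost:9090/-/reload (this works because --web.enable-lifecycle is set):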
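```yaml
groups:
  - name: docker_alerts
    rules:
      - alert: HighMemoryUsage
        # Divide only by non-zero limits: cAdvisor reports 0 for containers without a
        # memory limit, and dividing by 0 yields +Inf, which would always fire.
        # The working set excludes reclaimable page cache, so it tracks OOM risk more closely.
        expr: |
          container_memory_working_set_bytes{name!=""}
            / (container_spec_memory_limit_bytes{name!=""} > 0) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} high memory usage"
          description: "Memory usage has been above 85% of the container's limit for more than 5 minutes."
```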
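The HighMemoryUsage alert compares usage against container_spec_memory_limit_bytes, which cAdvisor reports as 0 for containers started without a memory limit. The alert is therefore only meaningful for containers that have one. Here is a sketch of setting a limit in docker-compose.yml for a hypothetical nextcloud service. The 1G value is an arbitrary example:

```yaml
services:
  nextcloud:                  # hypothetical service
    image: nextcloud:latest
    deploy:
      resources:
        limits:
          memory: 1G          # exported by cAdvisor as container_spec_memory_limit_bytes
```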
Grafana Annotations
Annotate dashboards with deployment events. Use the Grafana API or webhooks to mark when a container is updated:
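```bash
# Run on the Docker host; replace your-password with the Grafana admin password
curl -u admin:your-password \
  -H "Content-Type: application/json" \
  -X POST \
  http://localhost:3000/api/annotations \
  -d '{"dashboardId": 1, "tags": ["deploy"], "text": "Deployed Nextcloud v28"}'
```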
Integrating with Your Homelab
Reverse Proxy
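Expose Grafana over HTTPS by adding these Traefik labels to the grafana service in docker-compose.yml (the letsencrypt certificate resolver must be defined in your Traefik configuration):
```yaml
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.grafana.rule=Host(`grafana.yourdomain.com`)"
      - "traefik.http.routers.grafana.tls.certresolver=letsencrypt"
      - "traefik.http.services.grafana.loadbalancer.server.port=3000"
```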
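Traefik can only route to Grafana if both containers share a Docker network, and Grafana builds absolute links from its configured root URL. Here is a sketch of the extra settings to merge into the existing grafana service. It assumes Traefik is attached to an external network named proxy, which is a hypothetical name; use whatever network your Traefik container is on.

```yaml
services:
  grafana:
    environment:
      - GF_SERVER_ROOT_URL=https://grafana.yourdomain.com
    labels:
      # Tell Traefik which network to use, since Grafana is on more than one
      - "traefik.docker.network=proxy"
    networks:
      - monitoring
      - proxy

networks:
  proxy:
    external: true
```

Once Traefik serves Grafana, the "3000:3000" port mapping is no longer needed for external access.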
Loki Integration
Add Grafana Loki as a second data source for log correlation. See our Grafana Loki guide.
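If you provision Grafana from files, you can declare Loki the same way. A sketch, assuming a Loki container named loki listening on its default port 3100 on the monitoring network, saved as ./grafana/provisioning/datasources/loki.yml (the file name is arbitrary):

```yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100    # assumes a container named loki on the monitoring network
```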
Backup
Back up Grafana dashboards and Prometheus data:
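```bash
# Grafana dashboards: list dashboard UIDs, then export each dashboard's JSON
GRAFANA_URL="http://localhost:3000"
GRAFANA_AUTH="admin:your-password"   # replace with your Grafana admin credentials
mkdir -p ./grafana-dashboards
curl -s -u "$GRAFANA_AUTH" "$GRAFANA_URL/api/search?type=dash-db" | jq -r '.[].uid' |
while read -r uid; do
  curl -s -u "$GRAFANA_AUTH" "$GRAFANA_URL/api/dashboards/uid/$uid" > "./grafana-dashboards/$uid.json"
done

# Prometheus data: stop Prometheus so the TSDB files are consistent on disk.
# The volume is named <compose project>_prometheus_data; confirm with: docker volume ls
docker compose stop prometheus
sudo tar czf prometheus-backup.tar.gz -C /var/lib/docker/volumes/monitoring_prometheus_data/_data .
docker compose start prometheus
```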
Alternatives to Consider
InfluxDB + Telegraf
InfluxDB is a purpose-built time-series database queried with InfluxQL (SQL-like) or Flux (a functional scripting language). Telegraf is an agent with hundreds of input plugins. This stack is a good fit if you prefer SQL-like queries or need to ingest IoT and industrial sensor data.
Zabbix
Zabbix is an enterprise monitoring platform with auto-discovery, agent-based monitoring, and native alerting. It is more complex than Prometheus but has a built-in dashboard and no separate Grafana requirement. Good for homelab operators who want an all-in-one solution.
Netdata
Netdata provides real-time, per-second metrics with minimal configuration. It is ideal for quick troubleshooting but has limited long-term storage and query capabilities compared to Prometheus.
| Tool | Best For | Query Language | Alerting | Long-Term Storage |
|---|---|---|---|---|
| Prometheus + Grafana | Cloud-native, Docker | PromQL | Alertmanager | Yes (TSDB) |
| InfluxDB + Telegraf | SQL-like queries, IoT | InfluxQL/Flux | Native | Yes |
| Zabbix | Enterprise all-in-one | Trigger expressions | Built-in | Yes |
| Netdata | Real-time troubleshooting | None (UI and API) | Built-in | Limited |
Frequently Asked Questions
How much RAM does Prometheus use?
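Prometheus memory usage grows roughly in proportion to the number of active time series, so it depends on what you scrape more than on how many hosts you have. Check prometheus_tsdb_head_series and process_resident_memory_bytes{job="prometheus"} on your own instance to see where you stand. To limit cardinality, drop unused metrics and high-cardinality labels. A longer scrape interval reduces the number of samples stored but not the number of series.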
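Cardinality is usually easiest to trim at scrape time. Here is a sketch that drops a few cAdvisor metrics before they are stored, using metric_relabel_configs on the existing cadvisor job. The metric names are examples of series a small dashboard may not use, so check them against the panels you rely on before dropping anything:

```yaml
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
    metric_relabel_configs:
      # Drop whole metrics by name before ingestion (the regex must match the full name)
      - source_labels: [__name__]
        regex: 'container_(tasks_state|memory_failures_total|blkio_device_usage_total)'
        action: drop
```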
Can I monitor remote servers?
Yes. Install Node Exporter on each remote server and add it to prometheus.yml:
```yaml
  - job_name: 'remote-server'
    static_configs:
      - targets: ['192.168.1.20:9100']
```
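On the remote machine, Node Exporter can run from its own Compose file. The sketch below is modeled on the Docker example in the Node Exporter README and uses host networking, so the exporter listens on port 9100 of the remote host. The metrics endpoint has no authentication by default, so restrict that port to your Prometheus server with a firewall.

```yaml
services:
  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    network_mode: host        # listen on :9100 directly on the host's interfaces
    pid: host                 # see host processes
    volumes:
      - '/:/host:ro,rslave'
    command:
      - '--path.rootfs=/host'
```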
How do I update Prometheus?
Update the Docker image and recreate the container. Prometheus data persists in the named volume.
```bash
docker pull prom/prometheus:latest
docker compose up -d prometheus
```
Does Prometheus support push metrics?
Yes, via the Pushgateway. However, pulls are preferred. Use Pushgateway only for batch jobs or ephemeral containers that cannot expose a scrape endpoint.
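If you do run a Pushgateway, Prometheus scrapes it like any other target. Here is a sketch of the scrape job, assuming a prom/pushgateway container named pushgateway on the monitoring network, listening on its default port 9091. Setting honor_labels: true keeps the job and instance labels set by the pushing batch job instead of overwriting them with the Pushgateway's own.

```yaml
  - job_name: 'pushgateway'
    honor_labels: true
    static_configs:
      - targets: ['pushgateway:9091']
```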
Conclusion
Summary
Prometheus and Grafana are a free, widely used standard for monitoring Docker-based homelabs. They provide per-container metrics, host health, alerting, and polished dashboards at no licensing cost. With Docker Compose, the entire stack deploys in minutes. With Alertmanager, you can be notified of warning signs, such as a container nearing its memory limit, before they turn into outages.
Next Steps
- Add the Loki data source for log correlation
- Create custom dashboards for your specific services
- Configure recording rules for frequently used queries
- Set up multi-channel alerting (Slack + email)