Production Guide: Deploy Grafana + Prometheus with Docker Compose + Nginx on Ubuntu

A production-oriented blueprint for secure metrics collection, resilient dashboards, and operational guardrails using Docker Compose and an Nginx TLS edge.

When teams move from ad-hoc server checks to reliable operations, they usually need three things quickly: a trustworthy metrics pipeline, clear dashboards for non-SRE stakeholders, and a deployment model that can be audited and recovered under pressure. This guide delivers exactly that by deploying Prometheus for scraping and storage, Grafana for visualization, and Nginx as a hardened public edge. The stack is intentionally opinionated for small and mid-size production environments where you need practical reliability without committing to full Kubernetes complexity on day one.

The workflow in this guide mirrors real operations: isolate services on a private Docker network, expose only a TLS-terminated reverse proxy, keep credentials outside image layers, and build verification checks that catch silent misconfiguration before users do. You will deploy, validate, and harden the environment, then add backup and restore procedures so the monitoring platform remains dependable during incidents and upgrades.

Architecture and flow overview

Request flow is straightforward: administrators access Grafana over HTTPS through Nginx, while Prometheus and exporters remain private on an internal bridge network. Prometheus scrapes configured targets on fixed intervals, stores time-series data in its local volume, and exposes query endpoints to Grafana through the internal network. This keeps public attack surface minimal while preserving low-latency data access between services.

For resilience, persistent volumes are used for both Prometheus TSDB and Grafana configuration/dashboards. If the host reboots, containers restart automatically and state remains intact. Operationally, this design separates concerns cleanly: Nginx controls certificates and edge policy, Prometheus handles collection and retention, and Grafana handles presentation, access control, and alerting integrations.

# Request flow
# Browser -> Nginx :443 -> Grafana :3000
# Prometheus :9090 and exporters stay on internal network
# Grafana queries Prometheus over docker network only

Prerequisites

  • Ubuntu 22.04+ host (minimum 2 vCPU, 4 GB RAM; recommended 4 vCPU, 8 GB RAM).
  • A DNS name pointed to your server (example: metrics.example.com).
  • Docker Engine and Docker Compose plugin installed.
  • Ports 80/443 reachable from the internet (80 for Let's Encrypt HTTP validation and redirects, 443 for HTTPS traffic).
  • A non-root sudo user for day-to-day operations.
  • SMTP or chat webhook destination for alert notifications.

Step-by-step deployment

1) Prepare directories and strict permissions

Create an isolated project path and lock down secrets files so API keys and admin credentials are never world-readable.

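sudo mkdir -p /opt/monitoring/{prometheus,grafana/nginx}
# Hand the project tree to the operating user so the remaining steps work without sudo.
sudo chown -R "$USER":"$(id -gn)" /opt/monitoring
cd /opt/monitoring
install -m 700 -d secrets
touch .env
chmod 600 .env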

2) Generate secrets and environment variables

Use long random secrets and keep them in .env. Avoid hardcoding credentials in compose files or committing them to source control.

GRAFANA_ADMIN_PASSWORD=$(openssl rand -base64 36 | tr -d "\n")
cat > /opt/monitoring/.env <
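
A minimal sketch of how the .env file could be completed, assuming it carries the variables the rest of this guide refers to: PROM_RETENTION (used in the Compose file) and GF_SERVER_ROOT_URL (used in the troubleshooting section), plus Grafana's standard GF_SECURITY_ADMIN_USER and GF_SECURITY_ADMIN_PASSWORD settings. The ENV_FILE variable, the 15d retention value, and the admin user name are illustrative assumptions rather than the guide's original content.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical target path; on the host this would be /opt/monitoring/.env.
ENV_FILE="${ENV_FILE:-$(mktemp -d)/.env}"

# Long random password; falls back to /dev/urandom if openssl is not installed.
GRAFANA_ADMIN_PASSWORD=$( { openssl rand -base64 36 2>/dev/null || head -c 36 /dev/urandom | base64; } | tr -d "\n")

# Create the file with owner-only permissions before any secret is written to it.
( umask 077; : > "$ENV_FILE" )

# Unquoted EOF so the shell expands the password variable into the file.
cat > "$ENV_FILE" <<EOF
PROM_RETENTION=15d
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_ADMIN_PASSWORD}
GF_SERVER_ROOT_URL=https://metrics.example.com
EOF

echo "wrote $ENV_FILE"
```

Compose reads .env from the project directory for ${...} interpolation, and the grafana service also receives the same file through env_file, so one file can serve both purposes in this layout.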

3) Create Prometheus configuration

Start with core scrape jobs for Prometheus itself and node-exporter. Keep scrape intervals explicit so capacity planning and troubleshooting are predictable.

cat > /opt/monitoring/prometheus/prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['prometheus:9090']

  - job_name: node-exporter
    static_configs:
      - targets: ['node-exporter:9100']
EOF

4) Create Docker Compose stack

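This stack keeps Prometheus internal (no published ports), publishes Grafana only to localhost, and routes public traffic through Nginx. That pattern keeps security policy and TLS controls centralized at the edge. The image tags below use latest for brevity; in production, pin each image to a specific version tag so upgrades are deliberate and reversible.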

cat > /opt/monitoring/docker-compose.yml <<'EOF'
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=${PROM_RETENTION}
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - prometheus_data:/prometheus
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    command:
      - --path.rootfs=/host
    volumes:
      - /:/host:ro,rslave
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    env_file: .env
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - 127.0.0.1:3000:3000
    depends_on:
      - prometheus
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
EOF

5) Configure Nginx reverse proxy with TLS

Nginx terminates TLS and forwards requests to Grafana on localhost. Add security headers and conservative request limits to reduce abuse risk.

sudo apt-get update && sudo apt-get install -y nginx certbot python3-certbot-nginx

sudo tee /etc/nginx/sites-available/grafana >/dev/null <<'EOF'
server {
  listen 80;
  server_name metrics.example.com;

  location / {
    proxy_pass http://127.0.0.1:3000;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_read_timeout 60s;
  }
}
EOF

sudo ln -s /etc/nginx/sites-available/grafana /etc/nginx/sites-enabled/grafana
sudo nginx -t && sudo systemctl reload nginx
sudo certbot --nginx -d metrics.example.com --redirect --agree-tos -m admin@example.com -n
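
The server block above proxies traffic but does not yet set the security headers or request limits this step mentions. One way to add them is a small include file, sketched here. The file name grafana-hardening.conf, the SNIPPET_DIR variable, and the specific header values and limits are assumptions to adapt, not settings taken from the guide.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical output directory; on the host this would be /etc/nginx/snippets.
SNIPPET_DIR="${SNIPPET_DIR:-$(mktemp -d)}"
SNIPPET="$SNIPPET_DIR/grafana-hardening.conf"

cat > "$SNIPPET" <<'EOF'
# Response headers; "always" applies them to error responses as well.
add_header X-Content-Type-Options "nosniff" always;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
# Enable HSTS only after HTTPS is confirmed working for this hostname.
add_header Strict-Transport-Security "max-age=31536000" always;

# Conservative request limits (assumed values; tune for your dashboards).
client_max_body_size 10m;
proxy_connect_timeout 5s;
EOF

echo "wrote $SNIPPET"
# To use it, add this line inside the server block, then test and reload Nginx:
#   include /etc/nginx/snippets/grafana-hardening.conf;
```

Keeping the directives in a separate include means certbot can rewrite the site file for TLS without touching the hardening settings.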

6) Launch services and initialize Grafana datasource

Bring the stack up, verify container health, and register Prometheus as a data source. For repeatable provisioning, use Grafana provisioning files in production rather than manual clicks.

cd /opt/monitoring
docker compose up -d
docker compose ps

# quick health checks
curl -sf http://127.0.0.1:3000/api/health
docker compose exec prometheus wget -qO- http://localhost:9090/-/ready
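
The commands above start the stack but do not register the data source. A minimal sketch of file-based provisioning follows, assuming the Compose service name prometheus and port 9090 from the stack in step 4. The PROJECT_DIR variable and the grafana/provisioning directory layout are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical project root; on the host this would be /opt/monitoring.
PROJECT_DIR="${PROJECT_DIR:-$(mktemp -d)}"
PROV_DIR="$PROJECT_DIR/grafana/provisioning/datasources"
mkdir -p "$PROV_DIR"

cat > "$PROV_DIR/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Compose service name and port; resolved on the internal Docker network.
    url: http://prometheus:9090
    isDefault: true
EOF

echo "wrote $PROV_DIR/prometheus.yaml"
# Mount the directory read-only in the grafana service, for example:
#   - ./grafana/provisioning:/etc/grafana/provisioning:ro
```

Grafana reads provisioning files at startup, so after adding the volume mount a `docker compose up -d` recreates the container with the data source already in place.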

7) Add retention, backups, and upgrade discipline

Monitoring data is most valuable during incidents, so backup discipline matters. Capture Grafana and Prometheus volumes with predictable retention and test restores monthly.

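sudo tee /usr/local/bin/monitoring-backup.sh >/dev/null <<'EOF'
#!/usr/bin/env bash
# Archive the Grafana and Prometheus named volumes into a timestamped directory.
set -euo pipefail
TS=$(date +%F-%H%M%S)
DEST=/var/backups/monitoring/$TS
mkdir -p "$DEST"
# Volume names carry the Compose project prefix (the directory name, "monitoring").
docker run --rm -v monitoring_grafana_data:/src -v "$DEST":/dest alpine tar czf /dest/grafana.tgz -C /src .
docker run --rm -v monitoring_prometheus_data:/src -v "$DEST":/dest alpine tar czf /dest/prometheus.tgz -C /src .
# Remove snapshot directories older than 14 days.
find /var/backups/monitoring -mindepth 1 -maxdepth 1 -type d -mtime +14 -exec rm -rf {} +
EOF
sudo chmod +x /usr/local/bin/monitoring-backup.sh
# Install the job in root's crontab so it can write to /var/backups and reach the Docker socket.
( sudo crontab -l 2>/dev/null; echo "20 2 * * * /usr/local/bin/monitoring-backup.sh" ) | sudo crontab -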
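
Because restores should be tested regularly, a quick integrity check of the newest snapshot is a useful first step. This is a sketch only: the verify_archive function and the BACKUP_ROOT variable are hypothetical names, and the check confirms that each archive is readable, not that Grafana or Prometheus can start from it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# verify_archive FILE: succeed only if FILE is a non-empty, readable gzip tarball.
verify_archive() {
  local archive="$1"
  [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
  # Listing reads the whole archive, so truncated or corrupt files fail here.
  tar tzf "$archive" >/dev/null 2>&1 || { echo "corrupt: $archive" >&2; return 1; }
  echo "ok: $archive"
}

# Pick the newest snapshot directory; the timestamped names sort chronologically.
BACKUP_ROOT="${BACKUP_ROOT:-/var/backups/monitoring}"
latest=$(find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | sort | tail -n 1 || true)

if [ -n "$latest" ]; then
  verify_archive "$latest/grafana.tgz"
  verify_archive "$latest/prometheus.tgz"
else
  echo "no snapshots found under $BACKUP_ROOT"
fi
```

A full restore drill would go further: extract each archive into a scratch volume, start the service against it, and confirm that dashboards and recent metrics are present.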

Configuration and secrets handling best practices

Use environment variables only for non-sensitive defaults and prefer Docker secrets or external secret managers for high-trust environments. Restrict shell history in shared bastions, rotate Grafana admin credentials after handoff, and use separate service accounts for automation. If you integrate cloud metrics or managed databases, place API tokens in secret files mounted read-only into the container and avoid embedding tokens in dashboard JSON exports.

At the network level, enforce host firewall rules so only 80/443 are public and all monitoring backplane ports remain private. If teams need remote Prometheus access, expose it through authenticated private networking (VPN, Tailscale, WireGuard) rather than opening port 9090 publicly. Finally, define retention based on incident review windows and available disk IOPS; long retention without storage tuning causes preventable query latency and compaction churn.
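
The host firewall policy described above could look like the following sketch, assuming ufw is the firewall in use (the guide does not name one). It prints the commands by default and only executes them when APPLY=1 is set; the run and firewall_rules helpers are hypothetical names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print each command by default; set APPLY=1 to execute it with sudo.
run() {
  if [ "${APPLY:-0}" = "1" ]; then sudo "$@"; else echo "$*"; fi
}

firewall_rules() {
  run ufw default deny incoming
  run ufw default allow outgoing
  run ufw allow OpenSSH      # keep SSH reachable before enabling the firewall
  run ufw allow 80/tcp       # HTTP for certificate validation and redirects
  run ufw allow 443/tcp      # HTTPS edge served by Nginx
  run ufw --force enable
}

firewall_rules
```

Ports published by Docker are handled in Docker's own iptables chains and can bypass ufw rules, which is one reason the Compose file binds Grafana to 127.0.0.1 and publishes no port for Prometheus.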

Verification checklist

  • docker compose ps shows every service as running (a healthy status appears only for services that define a healthcheck, which this stack does not).
  • https://metrics.example.com loads with valid TLS certificate.
  • Grafana can query Prometheus and dashboard panels return recent datapoints.
  • Node exporter metrics include CPU, memory, filesystem, and network timeseries.
  • Backup script creates restorable archives and old snapshots rotate automatically.

Run the checks from the project directory (the last command assumes jq is installed):

cd /opt/monitoring
docker compose ps
curl -I https://metrics.example.com
curl -sf https://metrics.example.com/api/health
docker compose exec -T prometheus wget -qO- http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'
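
For unattended checks, the target listing can be reduced to a pass/fail result. The sketch below avoids the jq dependency by matching the health field in the API response with grep; count_unhealthy and the LIVE switch are hypothetical names, and the pattern assumes the compact JSON that the Prometheus HTTP API returns.

```shell
#!/usr/bin/env bash
set -euo pipefail

# count_unhealthy: read a /api/v1/targets response on stdin and print how many
# active targets report a health value other than "up".
count_unhealthy() {
  { grep -o '"health":"[a-z]*"' || true; } | { grep -vc '"health":"up"' || true; }
}

# Set LIVE=1 on the monitoring host to query the running stack.
if [ "${LIVE:-0}" = "1" ]; then
  down=$(docker compose exec -T prometheus wget -qO- http://localhost:9090/api/v1/targets | count_unhealthy)
  if [ "$down" -gt 0 ]; then
    echo "$down target(s) not up" >&2
    exit 1
  fi
  echo "all targets up"
fi
```

A non-zero exit code makes the script usable from cron or a deployment pipeline as a post-change gate.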

Common issues and fixes

Grafana login loops after reverse proxy setup

Set the external URL and protocol correctly via GF_SERVER_ROOT_URL and ensure Nginx forwards X-Forwarded-Proto. Mismatch between internal and external URL is the usual root cause.

Prometheus target is down for node-exporter

Confirm exporter container is attached to the same compose network and target hostname matches service name. Check for host firewall rules blocking internal container traffic if custom bridge networks are used.

Disk usage grows unexpectedly

Retention and series cardinality are often misaligned with available disk. Reduce label explosion, lengthen scrape intervals for noisy jobs, and tune retention to match available storage and recovery objectives.
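
To see how retention, series count, and scrape interval interact, the rule of thumb from the Prometheus storage documentation (retention seconds × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample) can be sketched in shell arithmetic. The workload numbers below are assumptions for illustration, not measurements from this stack, and estimate_tsdb_bytes is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

# estimate_tsdb_bytes RETENTION_DAYS ACTIVE_SERIES SCRAPE_INTERVAL_S BYTES_PER_SAMPLE
# Integer arithmetic only, so the result is a rough upper-level estimate.
estimate_tsdb_bytes() {
  local days="$1" series="$2" interval="$3" bytes_per_sample="$4"
  local samples_per_second=$(( series / interval ))
  echo $(( days * 86400 * samples_per_second * bytes_per_sample ))
}

# Assumed workload: 15 days, 30000 active series, 15s interval, 2 bytes per sample.
bytes=$(estimate_tsdb_bytes 15 30000 15 2)
echo "estimated TSDB size: $(( bytes / 1024 / 1024 )) MiB"

# Doubling the scrape interval halves the ingest rate and therefore the estimate.
bytes_30s=$(estimate_tsdb_bytes 15 30000 30 2)
echo "with a 30s interval: $(( bytes_30s / 1024 / 1024 )) MiB"
```

The estimate covers sample data only; leave headroom for the write-ahead log and compaction before settling on a retention value.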

Intermittent 502 from Nginx

Grafana may be restarting during plugin changes or memory pressure. Inspect container logs, increase host memory/swap baseline, and add upstream timeouts conservatively.

FAQ

Can I expose Prometheus directly for remote troubleshooting?

Avoid public exposure. Use private networking plus authentication, or route through a bastion with strict access control and audit logging.

How much retention should I keep in Prometheus?

Start with 15 days for small environments, then tune by incident review needs, cardinality, and disk performance. Increase only after measuring compaction overhead.

Should I run Alertmanager in the same stack?

Yes for most production setups. Keep Alertmanager internal, integrate with email/chat/webhooks, and test escalation paths during daytime drills.

How do I make dashboards reproducible across environments?

Provision datasources and dashboards as code using Grafana provisioning files and version control. Avoid one-off UI-only changes in production.

Is Docker Compose enough for production monitoring?

For many teams yes, especially single-region workloads with clear backup and restore procedures. Move to Kubernetes when you need multi-node scheduling, automatic failover, or other orchestration features that Compose does not provide.

How do I rotate Grafana admin credentials safely?

Create named admin users first, validate access, then rotate or disable bootstrap credentials in a maintenance window with rollback notes.

What is the safest way to add application metrics?

Expose application metrics on private interfaces and scrape through Prometheus service discovery. Do not place metrics endpoints directly on public networks.
