Production Guide: Deploy Grafana with Docker Compose + Nginx + PostgreSQL on Ubuntu

A production-oriented Grafana deployment with hardened config, reverse proxy, backups, and operations runbooks.

When teams adopt Grafana quickly, they often begin with a single-container demo and then hit production blockers: dashboard loss during upgrades, ad-hoc credentials, weak network boundaries, and alert routing that silently fails under stress. This guide gives you a practical production baseline that remains simple enough for one Ubuntu host while still enforcing disciplined operations.

We deploy Grafana with Docker Compose, put Nginx in front as the policy boundary, and store Grafana state in PostgreSQL. The workflow is intentionally operations-first: deterministic startup ordering, explicit network boundaries, backup and restore paths, and troubleshooting procedures your on-call team can execute quickly without improvisation.

Real-world use case: a SaaS team has outgrown ad-hoc monitoring. They need reliable dashboards and alerting for app, infrastructure, and database performance across staging and production, with role-based access and an upgrade path to multi-node architecture later. This setup provides that baseline today and keeps migration options open for tomorrow.

Architecture and flow overview

The stack has three core services: PostgreSQL for durable Grafana state, Grafana for dashboards and alerting, and Nginx as the edge reverse proxy. Incoming requests land at Nginx, which forwards them to Grafana over a private Docker network; Grafana in turn writes its metadata to PostgreSQL on that same internal network, which is never exposed directly to the public internet.

This architecture keeps host exposure minimal: only Nginx binds to ports 80/443; Grafana and PostgreSQL stay private. It also avoids brittle defaults. Dashboards, user settings, teams, and alerting state survive restarts and upgrades because Grafana's metadata lives in PostgreSQL rather than the default SQLite file, and named Docker volumes persist both data directories across container recreation.

Operationally, this model is easy to reason about. If edge traffic fails, inspect Nginx first. If login and dashboard rendering fail, inspect Grafana logs and DB health next. If persistence breaks, restore PostgreSQL data from tested backups. Clear ownership boundaries reduce incident MTTR significantly.

Prerequisites

  • Ubuntu 22.04/24.04 with at least 2 vCPU, 4 GB RAM, and 30+ GB available disk.
  • Docker Engine + Docker Compose plugin installed and verified.
  • DNS record for your Grafana hostname (for example, grafana.example.com).
  • A deployment user with sudo access and ability to manage firewall rules.
  • A backup location for PostgreSQL dumps and periodic snapshots.

Do not place secrets directly in compose YAML or shell history. Use a locked-down .env file (or secret manager), then rotate credentials after first successful login.
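
For the long random values those placeholders call for, `openssl` (preinstalled on stock Ubuntu) is a convenient source. This sketch prints only the lengths, so real secrets never land in your terminal scrollback:

```shell
# Generate two independent 32-byte secrets, base64-encoded (44 characters each)
PG_SECRET=$(openssl rand -base64 32)
GF_SECRET=$(openssl rand -base64 32)
echo "lengths: ${#PG_SECRET} ${#GF_SECRET}"
```

Paste the values into the .env placeholders in the next step; if you do echo a real secret, clear your shell history afterwards.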

Step-by-step deployment

1) Prepare host and project directories

Create a stable directory layout for reproducible maintenance, upgrades, and handoffs.

sudo apt update && sudo apt install -y curl ca-certificates gnupg
mkdir -p ~/apps/grafana/{nginx,postgres,provisioning,backups}
cd ~/apps/grafana

2) Create environment file and lock permissions

Generate strong secrets and keep them readable only by the deployment account.

cd ~/apps/grafana
cat > .env <<'EOF'
POSTGRES_DB=grafana
POSTGRES_USER=grafana
POSTGRES_PASSWORD=REPLACE_WITH_LONG_RANDOM_SECRET
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=REPLACE_WITH_ANOTHER_LONG_SECRET
GF_SERVER_DOMAIN=grafana.example.com
GF_SERVER_ROOT_URL=https://grafana.example.com
EOF
chmod 600 .env
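
To confirm the permissions actually took, GNU `stat` (part of Ubuntu's coreutils) shows the octal mode directly:

```shell
# Expect "600 .env"; anything looser exposes credentials to other local users
if [ -f .env ]; then stat -c '%a %n' .env; fi
```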

3) Create docker-compose stack

Use explicit restart policies, health checks, and network scoping.

cat > docker-compose.yml <<'EOF'
services:
  postgres:
    image: postgres:16
    container_name: grafana-postgres
    env_file: .env
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 10
    restart: unless-stopped
    networks: [backend]

  grafana:
    image: grafana/grafana:11.1.0
    container_name: grafana-app
    env_file: .env
    environment:
      GF_DATABASE_TYPE: postgres
      GF_DATABASE_HOST: postgres:5432
      GF_DATABASE_NAME: ${POSTGRES_DB}
      GF_DATABASE_USER: ${POSTGRES_USER}
      GF_DATABASE_PASSWORD: ${POSTGRES_PASSWORD}
      GF_SERVER_DOMAIN: ${GF_SERVER_DOMAIN}
      GF_SERVER_ROOT_URL: ${GF_SERVER_ROOT_URL}
      GF_SECURITY_ADMIN_USER: ${GF_SECURITY_ADMIN_USER}
      GF_SECURITY_ADMIN_PASSWORD: ${GF_SECURITY_ADMIN_PASSWORD}
      GF_USERS_ALLOW_SIGN_UP: "false"
      GF_AUTH_ANONYMOUS_ENABLED: "false"
      GF_SECURITY_COOKIE_SECURE: "true"
      GF_SECURITY_COOKIE_SAMESITE: strict
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - grafana_data:/var/lib/grafana
      - ./provisioning:/etc/grafana/provisioning
    restart: unless-stopped
    networks: [backend]

  nginx:
    image: nginx:1.27
    container_name: grafana-nginx
    depends_on:
      - grafana
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/default.conf:/etc/nginx/conf.d/default.conf:ro
    restart: unless-stopped
    networks: [backend]

networks:
  backend:
    driver: bridge

volumes:
  pg_data:
  grafana_data:
EOF
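
Because the compose file substitutes every `${...}` value from .env at config time, a missing key silently becomes an empty string. A small pre-flight check (a sketch; `check_env` is a hypothetical helper whose key list matches the .env created above) catches that before `docker compose up`:

```shell
# check_env FILE: fail if any required key is missing or still a placeholder
check_env() {
  local file="$1" key ok=0
  local required=(POSTGRES_DB POSTGRES_USER POSTGRES_PASSWORD
                  GF_SECURITY_ADMIN_USER GF_SECURITY_ADMIN_PASSWORD
                  GF_SERVER_DOMAIN GF_SERVER_ROOT_URL)
  for key in "${required[@]}"; do
    if ! grep -q "^${key}=" "$file"; then echo "missing: ${key}"; ok=1; fi
    if grep -q "^${key}=REPLACE_WITH" "$file"; then echo "placeholder not replaced: ${key}"; ok=1; fi
  done
  if [ "$ok" -eq 0 ]; then echo "env pre-flight OK"; fi
  return "$ok"
}
# Usage before starting the stack:
# check_env .env && docker compose up -d
```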

4) Create Nginx reverse proxy config

Begin with HTTP validation, then add certificates for full TLS termination at the edge.

cat > nginx/default.conf <<'EOF'
server {
  listen 80;
  server_name grafana.example.com;

  client_max_body_size 20m;
  proxy_read_timeout 90s;

  location / {
    proxy_pass http://grafana:3000;
    proxy_http_version 1.1;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
  }
}
EOF
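
The hardcoded `Connection "upgrade"` header above works for Grafana Live, but the conventional Nginx pattern uses a `map` so plain HTTP requests keep normal connection handling. If you prefer that, add this block at `http` scope (for example, in its own conf.d file) and change the location to `proxy_set_header Connection $connection_upgrade;`:

```nginx
# Map the client's Upgrade header to the right Connection value:
# WebSocket requests get "upgrade", everything else gets "close".
map $http_upgrade $connection_upgrade {
  default upgrade;
  ''      close;
}
```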

5) Start stack and verify basic health

Confirm startup order and container readiness before exposing to users.

docker compose pull
docker compose up -d
docker compose ps
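
Note that `docker compose up -d` returns before Grafana is actually serving traffic. A small retry helper (a sketch; `wait_for` is a hypothetical function, and the URL and attempt counts are assumptions to adjust) lets deploy scripts block until the health endpoint answers instead of sleeping a fixed time:

```shell
# wait_for TRIES DELAY CMD... : retry CMD until it succeeds or TRIES is exhausted
wait_for() {
  local tries="$1" delay="$2" i
  shift 2
  for ((i = 0; i < tries; i++)); do
    if "$@" > /dev/null 2>&1; then return 0; fi
    sleep "$delay"
  done
  echo "timed out waiting for: $*" >&2
  return 1
}
# Example: poll Grafana's health API through the proxy for up to 60 seconds
# wait_for 30 2 curl -fsS http://localhost/api/health
```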

6) Apply host firewall baseline

Restrict ingress to SSH and web ports; deny all else.

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status verbose

7) Add backup automation for PostgreSQL

Automate dumps early so you are never “about to configure backups” during an outage.

cat > backup-postgres.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
cd ~/apps/grafana
source .env
mkdir -p backups
TS=$(date +%F-%H%M)
docker compose exec -T postgres pg_dump -U "$POSTGRES_USER" "$POSTGRES_DB" > "backups/grafana-$TS.sql"
find backups -type f -name 'grafana-*.sql' -mtime +14 -delete
EOF
chmod +x backup-postgres.sh
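
The `find ... -mtime +14 -delete` retention line is easy to get subtly wrong (`-mtime 14` and `+14` behave differently). A quick rehearsal against fake dump files in a throwaway directory (a sketch using GNU `touch -d`) confirms only dumps older than 14 days are pruned:

```shell
# Rehearse the retention rule in a temp dir before trusting it with real backups
dir=$(mktemp -d)
touch -d '20 days ago' "$dir/grafana-old.sql"   # older than 14 days: should be deleted
touch "$dir/grafana-new.sql"                    # fresh: should survive
find "$dir" -type f -name 'grafana-*.sql' -mtime +14 -delete
ls "$dir"   # prints: grafana-new.sql
```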

Configuration and secret-handling best practices

Store production secrets outside source control and outside chat transcripts. Use placeholders in versioned files and inject real values at deployment time through your secret manager or CI environment variables. Rotate secrets periodically and immediately after staffing changes or incident exposure.

After first login, disable shared admin usage. Create named accounts, assign least-privilege roles, and require MFA/SSO where your identity stack supports it. Keep admin access to a small set of operators and document emergency break-glass procedure separately.

Use immutable image tags for controlled upgrades. Never switch to floating tags in production without testing because plugin compatibility and migration behavior can change unexpectedly between patch lines. Maintain a changelog with rollback notes and validated backup IDs.

For PostgreSQL, run both logical backups and periodic filesystem snapshots. Logical dumps provide portable restores and schema visibility; snapshots provide fast host-level rollback. Validate restore quarterly at minimum so backup confidence is based on evidence, not assumption.

Production operations playbook

Good observability systems fail when operations are inconsistent, not when YAML is imperfect. Build repeatable runbooks now: upgrade windows, smoke tests, backup validation, alert route checks, and post-change documentation. This shrinks recovery time and avoids tribal knowledge dependence.

Track platform health with a concise SLO-focused dashboard set first: request latency, panel load time, error ratio, database connection health, container restart counts, disk growth, and alert delivery success. Expand only after core signals are stable and actionable.

As usage grows, this architecture evolves cleanly: externalize PostgreSQL to managed service, add ingress policy controls, and split telemetry pipelines by criticality. Because service boundaries are explicit in this baseline, migration friction stays manageable.

Verification checklist

  • Grafana UI loads from your configured domain.
  • Admin login succeeds, then admin password is rotated.
  • A test dashboard remains after container restart.
  • PostgreSQL healthcheck reports healthy consistently.
  • Nginx proxy forwards WebSocket and normal HTTP traffic.
  • Backup script runs and restore test succeeds in a disposable environment.

Quick command checks:

docker compose ps
docker compose logs --tail=120 grafana
docker compose logs --tail=120 postgres
docker compose logs --tail=120 nginx

docker compose exec grafana /bin/sh -lc 'wget -qO- http://localhost:3000/api/health'

Common issues and fixes

Grafana migration errors on startup

Usually caused by invalid DB credentials, stale failed schema attempts, or race conditions from bad startup order. Validate env values, keep a single writer, and confirm PostgreSQL readiness before Grafana starts.

Dashboards disappear after restart

Most often indicates SQLite fallback or missing persistent volume mapping. Confirm PostgreSQL backend is active and check Grafana startup logs for database configuration lines.

Reverse proxy returns 502/504

Verify service DNS resolution inside Docker network and ensure proxy_pass target matches service name/port. Inspect Nginx logs and check container health side-by-side.

Alert notifications arrive late or fail intermittently

Validate outbound egress, endpoint credentials, and host time synchronization. Alerting reliability often degrades when NTP drift or flaky outbound connectivity is ignored.

Slow dashboard rendering under load

Profile expensive panels, optimize datasource queries, and reduce overly broad time windows. If saturation persists, tune PostgreSQL and consider scaling Grafana frontends horizontally.

FAQ

Can I skip PostgreSQL and keep SQLite for small teams?

You can for short-lived proof-of-concepts, but production reliability and concurrency improve significantly with PostgreSQL, especially as dashboard edits and alerting complexity increase.

Where should TLS terminate in this architecture?

Terminate at Nginx so certificate lifecycle, edge headers, and routing policy are centralized. Keep Grafana on the private network without direct public port exposure.

How frequently should backups run?

At least daily for low-change environments, plus pre-upgrade backups every time. High-change teams should run more frequent dumps and periodic snapshot backups.
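
For the daily baseline, a crontab entry for the deployment user covers it (a sketch; the `/home/deploy` path is an assumption based on this guide's `~/apps/grafana` layout — adjust to your user):

```
# crontab -e for the deployment user: daily dump at 02:15, output logged for review
15 2 * * * /home/deploy/apps/grafana/backup-postgres.sh >> /home/deploy/apps/grafana/backups/backup.log 2>&1
```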

What is the safest upgrade strategy for Grafana?

Pin image versions, run upgrades in a maintenance window, create fresh backup artifacts, validate smoke tests immediately, and keep rollback steps documented before starting.

How do I support multiple teams securely?

Use Grafana organizations, folders, and team-level permissions. Limit admin rights, enforce identity controls, and separate datasource privileges by environment sensitivity.

Do I need to monitor the monitoring stack itself?

Absolutely. Monitor dashboard response times, datasource failures, queue latency, storage growth, and alert delivery health so the observability platform remains trustworthy during incidents.

What first signal indicates this stack is under-provisioned?

You will typically see rising panel query latency, increasing DB saturation, and frequent container restarts during peak windows. Capacity planning should begin before user-visible failures.
