
How to Deploy Grafana in Production with Docker Compose + systemd

A practical, production-focused implementation guide with operational checks and recovery playbooks.

When dashboards are mission-critical, a quick container launch is not enough. Teams need reliable startup behavior, secured secrets, predictable upgrades, and a repeatable way to verify that monitoring is still healthy after every change. This guide provides a production-first Grafana deployment pattern using Docker Compose for application packaging and systemd for lifecycle reliability on Linux hosts.

The approach is designed for real operations teams managing incidents, change windows, and audit requirements. Instead of treating monitoring as a side project, we treat it as a service with clear ownership: codified configuration, documented checks, and routine backup/restore discipline. By the end, you will have a deployment you can hand to on-call engineers with confidence.

Architecture and flow overview

The stack keeps responsibilities clean. Grafana runs as a container behind an HTTPS reverse proxy. Data is persisted on host storage, while systemd ensures the Compose service restarts after host reboots and recovers from failures. Secrets are supplied through an environment file with strict permissions.

  • Edge: HTTPS termination and proxy headers
  • App: Grafana container on localhost bridge network
  • State: Persistent mount for Grafana database and plugins
  • Ops: systemd unit controlling docker compose lifecycle
  • Safety: Scheduled backup and restore test process

This model is small enough for one VM and robust enough for production if you follow verification and change-control steps below.

Because this stack does not depend on Kubernetes, it is approachable for smaller teams that still need production rigor. You get declarative infrastructure in Compose, reliable process supervision through systemd, and clear separation between secrets and code. Over time, the same patterns can migrate to larger orchestrators, but the operational habits you build here—verification, backup discipline, and change control—will transfer directly.

Prerequisites

  • Linux server (Ubuntu 22.04+ or Debian 12+) with sudo access
  • Docker Engine + Docker Compose plugin installed
  • A DNS record for your monitoring subdomain
  • Firewall allowing 22, 80, and 443
  • At least 2 vCPU and 4 GB RAM for initial workloads
  • A password manager entry for Grafana admin credentials

Step-by-step deployment

1) Prepare directories and permissions

Use stable paths so backup jobs and runbooks stay consistent.

sudo mkdir -p /opt/grafana/{compose,provisioning,dashboards,backups}
sudo mkdir -p /var/lib/grafana-data
sudo chown -R 472:472 /var/lib/grafana-data
sudo chown -R $USER:$USER /opt/grafana


2) Create production environment file

Keep secrets out of compose YAML and source control.

# Generate the password first: a quoted heredoc ('EOF') would store the literal
# text "$(openssl rand -base64 32)" instead of a real password.
ADMIN_PW=$(openssl rand -base64 32)
echo "Store this in your password manager now: ${ADMIN_PW}"
cat > /opt/grafana/.env.prod <<EOF
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=${ADMIN_PW}
GF_SERVER_ROOT_URL=https://monitoring.example.com
GF_INSTALL_PLUGINS=grafana-clock-panel,grafana-simple-json-datasource
EOF
chmod 600 /opt/grafana/.env.prod


3) Define docker-compose service

cat > /opt/grafana/compose/docker-compose.yml <<'EOF'
services:
  grafana:
    image: grafana/grafana:11.1.4
    container_name: grafana
    restart: unless-stopped
    env_file:
      - /opt/grafana/.env.prod
    ports:
      - "127.0.0.1:3000:3000"
    volumes:
      - /var/lib/grafana-data:/var/lib/grafana
      - /opt/grafana/provisioning:/etc/grafana/provisioning
      - /opt/grafana/dashboards:/var/lib/grafana/dashboards
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://127.0.0.1:3000/api/health"]
      interval: 30s
      timeout: 5s
      retries: 5
      start_period: 40s
EOF


4) Add systemd unit for compose lifecycle

Stage the unit file in /tmp, then install it into /etc/systemd/system.

cat > /tmp/grafana-compose.service <<'EOF'
[Unit]
Description=Grafana Docker Compose Stack
Requires=docker.service
After=docker.service network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/grafana/compose
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target
EOF

sudo mv /tmp/grafana-compose.service /etc/systemd/system/grafana-compose.service
sudo systemctl daemon-reload
sudo systemctl enable --now grafana-compose.service


5) Configure reverse proxy and TLS

Terminate TLS at your reverse proxy and forward traffic to Grafana on 127.0.0.1:3000. Keep HSTS enabled and pass the standard forwarding headers (X-Forwarded-Proto, X-Forwarded-For) so Grafana sees the original scheme and client address.
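A hedged Nginx example, staged in /tmp following the same pattern as the systemd step. The domain and certificate paths assume a certbot-managed Let's Encrypt setup; adapt them before installing.

```shell
cat > /tmp/monitoring.conf <<'EOF'
server {
    listen 443 ssl;
    server_name monitoring.example.com;

    # Paths assume certbot-managed certificates; adjust for your CA tooling.
    ssl_certificate     /etc/letsencrypt/live/monitoring.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/monitoring.example.com/privkey.pem;
    add_header Strict-Transport-Security "max-age=31536000" always;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # Grafana Live features use WebSockets
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

server {
    listen 80;
    server_name monitoring.example.com;
    return 301 https://$host$request_uri;
}
EOF
```

Move the file into /etc/nginx/sites-available/, symlink it into sites-enabled, then run nginx -t and reload Nginx before testing.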

# Example checks after proxy config
curl -I http://monitoring.example.com
curl -I https://monitoring.example.com
curl -s http://127.0.0.1:3000/api/health | jq


6) Configure backups and retention

cat > /opt/grafana/backups/backup-grafana.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
TS=$(date +%F-%H%M)
mkdir -p /opt/grafana/backups/archive
tar -czf /opt/grafana/backups/archive/grafana-$TS.tgz \
  /var/lib/grafana-data /opt/grafana/provisioning /opt/grafana/dashboards
find /opt/grafana/backups/archive -type f -mtime +14 -delete
EOF
chmod +x /opt/grafana/backups/backup-grafana.sh

# Schedule from root's crontab: reading the UID-472 data directory requires root,
# and a cron job cannot answer an interactive sudo prompt
(sudo crontab -l 2>/dev/null; echo "0 2 * * * /opt/grafana/backups/backup-grafana.sh") | sudo crontab -

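A backup you have never restored is only a hope. The guide calls for restore tests; here is a minimal sketch that unpacks the newest archive into a scratch directory and checks for the SQLite database, assuming archives produced by the backup script above.

```shell
# Restore test sketch: verify the newest archive unpacks and contains the Grafana DB.
LATEST=$(ls -1t /opt/grafana/backups/archive/grafana-*.tgz | head -n1)
SCRATCH=$(mktemp -d)
sudo tar -xzf "$LATEST" -C "$SCRATCH"
# tar strips the leading '/' on create, so members land under the scratch root
sudo test -f "$SCRATCH/var/lib/grafana-data/grafana.db" \
  && echo "restore test passed: $LATEST" \
  || echo "restore test FAILED: $LATEST" >&2
sudo rm -rf "$SCRATCH"
```

Record the result in your runbook; a monthly cadence matches the FAQ guidance below.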

7) Create an operational acceptance test

Run a short acceptance script after deploys and upgrades.

systemctl status grafana-compose.service --no-pager
docker compose -f /opt/grafana/compose/docker-compose.yml ps
curl -s http://127.0.0.1:3000/api/health | jq
curl -I https://monitoring.example.com


Configuration and secrets handling best practices

Use a dedicated service account for Grafana administration and rotate admin credentials after initial setup. Keep SMTP, OAuth, and API tokens in a protected environment file with 0600 permissions and root-owned parent directories. If you later move to external secret management (Vault, SOPS, cloud secret managers), keep the same variable names so your Compose file remains stable.

Avoid putting Grafana directly on the public internet without reverse-proxy controls. Apply request size limits, enable access logs, and enforce HTTPS redirects. For teams with compliance requirements, configure audit log retention and document who has org-admin permissions.

Finally, treat dashboards and data sources as code where possible. Provision known-good defaults from versioned files in /opt/grafana/provisioning, then review manual UI changes weekly to reduce environment drift.
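Note that the dashboards mount in the Compose file only takes effect if a file provider is provisioned. A minimal provider definition, staged in /tmp like the systemd unit (the folder name "Operations" is an arbitrary choice):

```shell
cat > /tmp/dashboards.yml <<'EOF'
apiVersion: 1
providers:
  - name: default
    folder: Operations
    type: file
    disableDeletion: true
    options:
      # Container-side path from the Compose volume mount
      path: /var/lib/grafana/dashboards
EOF
```

Move it to /opt/grafana/provisioning/dashboards/ and restart the stack; JSON files dropped into /opt/grafana/dashboards will then load automatically.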

When adding new data sources, prefer read-only service accounts and rotate credentials on the same schedule as host access keys. Document each source in a short runbook entry so incident responders know which systems feed the dashboards and who owns those credentials. If a source fails, your runbook should include a direct link to the upstream system owner and the expected recovery procedure.
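Data sources can be provisioned the same way. A hedged sketch for a local Prometheus instance — the name and URL are assumptions; substitute your real backend:

```shell
cat > /tmp/prometheus-ds.yml <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Assumed local Prometheus; point this at your actual metrics backend
    url: http://127.0.0.1:9090
    isDefault: true
    editable: false
EOF
```

Move it to /opt/grafana/provisioning/datasources/ and restart; setting editable to false keeps the UI from drifting away from the versioned file.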

Verification checklist

Run these checks immediately after deployment and after each upgrade window.

# systemd and container health
systemctl status grafana-compose.service --no-pager
docker compose -f /opt/grafana/compose/docker-compose.yml ps

# local app health
curl -s http://127.0.0.1:3000/api/health | jq

# external HTTPS check
curl -I https://monitoring.example.com

# certificate auto-renew dry-run
sudo certbot renew --dry-run


Expected outcomes: service shows active state, container status is healthy, health endpoint returns "database":"ok", HTTPS returns 200/302, and certbot dry-run succeeds. Capture these outputs in your operations runbook.

Common issues and fixes

Grafana container restarts repeatedly

Most often caused by invalid environment variables or volume permission mismatches. Validate .env.prod syntax and ensure UID 472 can write to /var/lib/grafana-data.
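A quick triage sketch, using the container and paths defined earlier in this guide:

```shell
# Last log lines usually name the bad variable or permission error
docker logs grafana --tail 50
# Exit code and error from the most recent crash
docker inspect --format '{{.State.ExitCode}} {{.State.Error}}' grafana
# Ownership must match the container's grafana user (UID 472)
stat -c '%u:%g %a' /var/lib/grafana-data
```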

Login succeeds but dashboards fail to save

This is usually storage permission drift after restoring from backup. Re-apply ownership and restart the stack.
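Restated as commands, reusing the UID and service name from the steps above:

```shell
sudo chown -R 472:472 /var/lib/grafana-data
sudo systemctl restart grafana-compose.service
# Confirm recovery: response should include "database": "ok"
curl -s http://127.0.0.1:3000/api/health
```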

TLS works but Grafana generates wrong callback URLs

Set GF_SERVER_ROOT_URL to the public HTTPS domain and pass X-Forwarded-Proto in Nginx.

High memory usage during dashboard refresh spikes

Start by reducing panel refresh intervals, then move heavy queries to recording rules in your metrics backend. Scale VM memory only after query optimization.

Upgrade introduces plugin incompatibility

Pin plugin versions, test in staging, and keep one rollback artifact from the previous image and data backup. Use maintenance windows for production upgrades.
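A hedged rollback sketch under this guide's paths; <TIMESTAMP> is a placeholder for your last known-good archive name, and the image tag edit is manual by design so it is reviewed:

```shell
sudo systemctl stop grafana-compose.service
# Restore the last known-good data backup (replace <TIMESTAMP> with a real archive)
sudo tar -xzf /opt/grafana/backups/archive/grafana-<TIMESTAMP>.tgz -C /
# Edit /opt/grafana/compose/docker-compose.yml back to the previously deployed
# image tag, then bring the stack back up:
sudo systemctl start grafana-compose.service
```

Follow with the acceptance checks from step 7 before closing the maintenance window.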

FAQ

Can I run Grafana without Nginx?

Yes, but for production you should still place a reverse proxy in front to handle TLS, headers, and policy controls cleanly.

How often should I back up Grafana?

Daily is a practical minimum for most teams. For critical operations, back up more frequently and test restore monthly.

Should I use SQLite or an external database?

SQLite is fine for many small to medium deployments. Move to MySQL/PostgreSQL when concurrency, HA, or compliance requirements grow.

How do I rotate the admin password safely?

During a maintenance window, reset it with Grafana's admin CLI or the UI — GF_SECURITY_ADMIN_PASSWORD only seeds the password on first startup, so editing .env.prod alone does not rotate it on an existing install. Update .env.prod to match so a future re-initialization agrees, verify login, then revoke stale sessions.
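A hedged rotation sketch against the running container. Recent images expose the CLI as "grafana cli"; older releases use the "grafana-cli" binary instead.

```shell
NEW_PW=$(openssl rand -base64 32)
# Reset the live admin password inside the container
docker exec grafana grafana cli admin reset-admin-password "$NEW_PW"
# Keep the env file in sync for any future re-initialization
sudo sed -i "s|^GF_SECURITY_ADMIN_PASSWORD=.*|GF_SECURITY_ADMIN_PASSWORD=${NEW_PW}|" /opt/grafana/.env.prod
echo "Store the new password in your password manager now"
```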

What is the safest way to upgrade Grafana?

Pin versions, back up first, test in staging, then roll forward during a controlled window with explicit verification checks.

Can this pattern be adapted to multiple environments?

Yes. Keep the same structure and automate per-environment variables. Separate domains and credentials for dev, staging, and production.

How do I prove the deployment is healthy to stakeholders?

Share a short acceptance report: service status, health endpoint output, TLS verification, and successful backup artifact creation.

Related guides


Production Guide: Deploy Rundeck with Docker Compose + Caddy + PostgreSQL on Ubuntu
Runbook automation with scheduled jobs, secure secrets, automatic HTTPS, and production backups.