Production monitoring needs more than a running container. Teams need reliable startup, controlled upgrades, secure secrets, and a repeatable acceptance checklist. This guide shows a practical Grafana deployment pattern using Docker Compose with systemd supervision so the service behaves predictably during restarts, incidents, and maintenance windows.
The objective is operational confidence: any engineer on rotation should be able to deploy, verify, and recover the stack without tribal knowledge. We focus on clear structure, version pinning, and runbook-friendly commands instead of one-off setup shortcuts.
In real environments, monitoring failures are often discovered during an outage, exactly when stress is highest. A resilient baseline prevents avoidable firefighting by giving your team deterministic behavior on boot, clear health checks, and rollback-ready backups. This tutorial is intentionally production-oriented, with explicit controls for reliability and ownership.
Another practical advantage of this approach is onboarding speed. New engineers can review the same directory structure, the same systemd unit semantics, and the same verification checklist, rather than learning a custom setup from chat history. The result is fewer surprises, faster incident response, and cleaner accountability across platform and application teams.
Architecture and flow overview
Grafana runs as a container, receives traffic through a TLS reverse proxy, and stores persistent state on host storage. systemd controls lifecycle so service state remains consistent after reboot and during controlled restarts.
- Edge: HTTPS reverse proxy with forwarded headers
- App: Grafana container managed by Docker Compose
- State: Mounted persistent data path
- Control: systemd unit for startup and stop semantics
- Ops: Backup workflow plus restore drill cadence
This separation keeps each concern explicit and easier to audit.
Prerequisites
- Linux VM/server with sudo access
- Docker Engine + Compose plugin
- DNS record for monitoring domain
- Firewall access for SSH and HTTPS
- Password manager entry for admin credentials
- Owner assigned for monitoring operations
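The prerequisites above can be sanity-checked from the shell before starting. This is an illustrative preflight sketch, not official tooling; it only probes for the binaries the later steps assume:

```shell
# Preflight sketch: probe for the tools the deployment steps assume.
# "docker compose" is checked separately because Compose v2 ships as a
# CLI plugin rather than a standalone binary.
for cmd in docker curl tar; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "found: $cmd"
  else
    echo "MISSING: $cmd"
  fi
done
if docker compose version >/dev/null 2>&1; then
  echo "found: docker compose plugin"
else
  echo "MISSING: docker compose plugin"
fi
```

Run it once per host before step 1; anything reported MISSING should be installed before continuing.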
Step-by-step deployment
1) Create directories and permissions
Use a fixed layout so every runbook references the same paths.
sudo mkdir -p /opt/grafana/{compose,provisioning,dashboards,backups}
sudo mkdir -p /var/lib/grafana-data
sudo chown -R 472:472 /var/lib/grafana-data
sudo chown -R $USER:$USER /opt/grafana
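As a runbook aid, the fixed layout can be verified with a small helper. This is a sketch (the `check_layout` name and its root parameter are ours, added so the same function can be exercised against a scratch directory); on the real host, call it with `/` as the root:

```shell
# Hypothetical layout check: confirm every path the runbooks reference
# exists under the given root ("/" on the real host, a temp dir in tests).
check_layout() {
  local root=$1 rc=0 d
  for d in opt/grafana/compose opt/grafana/provisioning \
           opt/grafana/dashboards opt/grafana/backups var/lib/grafana-data; do
    if [ -d "$root/$d" ]; then
      echo "ok: /$d"
    else
      echo "missing: /$d"
      rc=1
    fi
  done
  return "$rc"
}
# Example: check_layout /
```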
2) Store secrets in environment file
Keep credentials out of compose yaml and repositories.
cat > /opt/grafana/.env.prod <<'EOF'
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=REPLACE_WITH_LONG_RANDOM_PASSWORD
GF_SERVER_ROOT_URL=https://monitoring.example.com
GF_USERS_ALLOW_SIGN_UP=false
GF_AUTH_ANONYMOUS_ENABLED=false
GF_LOG_MODE=console
EOF
chmod 600 /opt/grafana/.env.prod
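For the REPLACE_WITH_LONG_RANDOM_PASSWORD placeholder, one common approach is to draw from the kernel CSPRNG (shown here as a sketch; a password manager's generator works equally well):

```shell
# Generate a 32-character alphanumeric secret from /dev/urandom.
PW=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 32)
echo "generated password length: ${#PW}"   # prints: generated password length: 32
```

Store the value in your password manager first, then paste it into the env file; avoid leaving it in shell history.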
3) Define compose service
cat > /opt/grafana/compose/docker-compose.yml <<'EOF'
services:
  grafana:
    image: grafana/grafana:11.1.4
    container_name: grafana
    restart: unless-stopped
    env_file:
      - /opt/grafana/.env.prod
    ports:
      - "127.0.0.1:3000:3000"
    volumes:
      - /var/lib/grafana-data:/var/lib/grafana
      - /opt/grafana/provisioning:/etc/grafana/provisioning
      - /opt/grafana/dashboards:/var/lib/grafana/dashboards
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://127.0.0.1:3000/api/health"]
      interval: 30s
      timeout: 5s
      retries: 5
      start_period: 40s
EOF
4) Create systemd unit file
Use a unit wrapper so startup ordering and service state are predictable.
cat > /tmp/grafana-compose.service <<'EOF'
[Unit]
Description=Grafana Docker Compose Stack
Requires=docker.service
After=docker.service network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/grafana/compose
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target
EOF
Install the unit, reload systemd, and enable it so the stack starts on boot:
sudo install -m 0644 /tmp/grafana-compose.service /etc/systemd/system/grafana-compose.service
sudo systemctl daemon-reload
sudo systemctl enable --now grafana-compose.service
5) Verify runtime state
systemctl status grafana-compose.service --no-pager
docker compose -f /opt/grafana/compose/docker-compose.yml ps
curl -s https://monitoring.example.com/api/health
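Right after restarts, the health endpoint can lag the container start (note the 40s start_period in the compose healthcheck), so a one-shot curl can be flaky. A short polling helper makes runbook checks deterministic; this is a sketch, and the `wait_healthy` name and parameters are ours:

```shell
# Poll a health URL until it answers 2xx or the attempts run out.
# Usage: wait_healthy <url> [tries] [delay_seconds]
wait_healthy() {
  local url=$1 tries=${2:-10} delay=${3:-3} i
  for i in $(seq 1 "$tries"); do
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "unhealthy after $tries attempt(s)"
  return 1
}
# Example: wait_healthy https://monitoring.example.com/api/health
```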
6) Backup job
cat > /opt/grafana/backups/backup-grafana.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
TS=$(date +%F-%H%M)
mkdir -p /opt/grafana/backups/archive
sudo tar -czf /opt/grafana/backups/archive/grafana-$TS.tgz /var/lib/grafana-data /opt/grafana/provisioning /opt/grafana/dashboards
echo "backup created: grafana-$TS.tgz"
EOF
chmod +x /opt/grafana/backups/backup-grafana.sh
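The "restore drill cadence" from the overview can be scripted as well. This sketch (the `restore_drill` name is ours) unpacks the newest archive into a scratch directory and confirms Grafana's database file came back, without touching the live data path:

```shell
# Restore drill: extract the newest backup and verify grafana.db exists.
# tar strips the leading "/" on create, so archive members are relative.
restore_drill() {
  local archive_dir=$1 latest scratch
  latest=$(ls -t "$archive_dir"/grafana-*.tgz 2>/dev/null | head -n 1)
  if [ -z "$latest" ]; then
    echo "no archives found in $archive_dir"
    return 1
  fi
  scratch=$(mktemp -d)
  tar -xzf "$latest" -C "$scratch"
  if [ -f "$scratch/var/lib/grafana-data/grafana.db" ]; then
    echo "restore drill ok: $latest"
  else
    echo "restore drill FAILED: grafana.db missing in $latest"
    return 1
  fi
}
# Example: restore_drill /opt/grafana/backups/archive
```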
7) Acceptance checklist execution
docker compose -f /opt/grafana/compose/docker-compose.yml ps
curl -s https://monitoring.example.com/api/health
ls -lh /opt/grafana/backups/archive | tail -n 3
Configuration/secrets handling
Limit admin access to named maintainers and rotate credentials on a schedule. Document who can change dashboards, data sources, and alert policies. If you are in a regulated environment, keep change approvals linked to pull requests and maintenance tickets.
Secrets should be managed as first-class assets. At minimum, protect environment files with strict permissions and encrypted storage. In larger environments, migrate to a dedicated secret manager and keep key names stable so deployment templates do not drift between environments.
Treat provisioning as code: dashboards, folders, and data source settings should be reviewed and versioned. This reduces unexpected behavior during incidents and helps new team members understand intent quickly. Consistency is more important than cleverness.
Finally, define upgrade policy early. Pin versions, test in staging, and require validation evidence before production rollouts. Monitoring must be dependable under pressure; operational discipline is what makes that true over time.
From an operations-management perspective, assign a primary and secondary owner for this service and document decision rights clearly: who can approve upgrades, who can roll back, and who must be paged when acceptance tests fail. This governance layer is often missing from technical tutorials, but it is essential for stable production ownership and faster decision-making during incidents.
Verification
After deployment and each upgrade, verify service state, health endpoint response, and backup output. Save results in a runbook so anyone on-call can confirm baseline health in minutes.
- Service starts after host reboot
- Container reports healthy status
- Health endpoint responds successfully
- TLS path remains valid through proxy
- Backup artifact is generated on schedule
Verification is a contract with future incident response. If checks are skipped, downtime risk increases.
For stronger operational maturity, keep a short acceptance template with timestamp, environment, operator, and outcome. Store these records in your incident-management system so future postmortems can correlate service changes with platform behavior.
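The acceptance template described above (timestamp, environment, operator, outcome) can be a tiny helper in the runbook. A sketch, with names and log path of our choosing:

```shell
# Emit one acceptance record: UTC timestamp, environment, operator, outcome.
record_acceptance() {
  printf '%s env=%s operator=%s outcome=%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "$3"
}
# Example: record_acceptance production alice pass >> /opt/grafana/backups/acceptance.log
```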
Common issues/fixes
- Container exits on startup: check environment-variable syntax and filesystem permissions on the mounted data paths.
- Login works but settings are not saved: usually a write-permission mismatch on persistent storage; reapply ownership (uid 472) and restart.
- Wrong callback URLs: set the public root URL and confirm forwarded-proto header handling in the proxy layer.
- Slow dashboards during incidents: reduce refresh intervals and optimize data-source queries before scaling compute.
- Plugin issues after upgrade: pin plugin versions and maintain rollback artifacts for the previous image and data.
- Service not available after reboot: validate unit dependencies so Docker is ready before the compose service starts.
FAQ
Is Compose enough for production Grafana?
Yes for many teams, if you add lifecycle control, verification, and backup discipline.
How often should restores be tested?
Monthly minimum, and after backup-script changes.
Should Grafana be internet-exposed directly?
No. Put it behind an HTTPS proxy and restrict direct port exposure.
What is a safe upgrade process?
Pin versions, test in staging, back up first, and verify after rollout.
Can we start with .env and migrate later?
Yes. Begin with strict permissions, then migrate to secret management as maturity increases.
What should the handoff include?
Runbook commands, acceptance checks, rollback steps, and escalation contacts.
How many admins are recommended?
Keep it minimal: two named maintainers plus a controlled break-glass account.
Internal links
- https://sysbrix.com/blog/guides-3/production-guide-deploy-uptime-kuma-with-docker-compose-nginx-postgresql-on-ubuntu-313
- https://sysbrix.com/blog/guides-3/production-guide-deploy-gitea-with-docker-compose-traefik-postgresql-on-ubuntu-307
- https://sysbrix.com/blog/guides-3/how-to-deploy-authentik-with-docker-compose-and-traefik-production-guide-299
Talk to us
If you want support designing or hardening your observability platform, we can help with architecture, migration planning, and production readiness.