
Production Guide: Deploy Grafana Loki with Docker Compose + Caddy on Ubuntu

A production-ready, security-conscious Loki stack with Promtail ingestion, Grafana dashboards, TLS, and operational runbooks.

Logs are the first place teams look when production behavior changes, but many environments still rely on fragmented SSH sessions and ad-hoc grep commands under pressure. This guide shows how to deploy Grafana Loki with Docker Compose + Caddy on Ubuntu so your operations team gets centralized logs, encrypted access, and a workflow that scales from one VM to multiple services.

The real-world use case is straightforward: your apps are running in Docker on one or a few Linux hosts, incidents happen outside business hours, and you need consistent evidence quickly. By the end of this guide you will have Loki for log storage and query, Promtail for shipping host and container logs, Grafana for search and dashboards, and Caddy for automatic HTTPS at the edge.

Architecture and flow overview

This stack is intentionally pragmatic. Loki stores compressed log chunks on local disk and indexes labels for fast retrieval. Promtail tails host and Docker log files, labels them, and pushes them to Loki. Grafana queries Loki and gives your team a clean user interface for ad-hoc search, filters, and incident timelines. Caddy sits in front of Grafana, terminates TLS automatically, and applies baseline security headers.

For small and medium production workloads, this architecture is resilient enough when you combine it with retention limits, controlled label cardinality, and regular backups. If your volume grows significantly, you can move Loki to object storage and add read/write components later without rewriting your operational model.

  • Ingress: Caddy handles public HTTPS and reverse proxies to Grafana.
  • Visualization: Grafana is pre-provisioned with a Loki datasource.
  • Collection: Promtail tails system and container log files.
  • Storage: Loki stores data with 7-day retention in this baseline.

Prerequisites

Before deployment, confirm the following:

  • Ubuntu 22.04/24.04 server with at least 4 vCPU, 8 GB RAM, and 80+ GB SSD.
  • A DNS record such as logs.yourdomain.com pointing to your server IP.
  • Docker Engine and Docker Compose plugin installed.
  • Ports 80/443 reachable from the internet for automatic certificate issuance.
  • A credential vault or secret manager for Grafana admin and integration credentials.

Capacity note: if your estate writes verbose JSON logs, size disk with growth headroom. A 7-day retention can balloon quickly if noisy debug logs are left enabled in production.
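As a back-of-the-envelope check, you can sketch the estimate with shell arithmetic. All three inputs below are assumptions to replace with your measured values; the 10:1 compression ratio is a rough figure for typical text logs, so measure your own after a few days of ingestion.

```shell
# Rough capacity estimate: raw daily volume * retention / compression ratio.
RAW_GB_PER_DAY=5
RETENTION_DAYS=7
COMPRESSION_RATIO=10
NEEDED_GB=$((RAW_GB_PER_DAY * RETENTION_DAYS / COMPRESSION_RATIO))
echo "Estimated chunk storage: ${NEEDED_GB} GB (add headroom for index and spikes)"
```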

Step-by-step deployment

1) Prepare the host and baseline security

Start with package updates, a minimal firewall policy, and a known timezone. Even if this is "just" observability, this VM will hold sensitive operational data. Treat it like a production asset.

sudo apt update && sudo apt -y upgrade
sudo apt -y install ca-certificates curl gnupg ufw jq
sudo timedatectl set-timezone UTC
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw --force enable


2) Create directories and data paths

Separate configuration under /opt and persistent Loki data under /var/lib. This keeps backups and upgrades predictable.

sudo mkdir -p /opt/loki-stack/{loki,promtail,grafana/provisioning/datasources,caddy}
sudo mkdir -p /var/lib/loki/{chunks,index,cache,wal,compactor}
sudo chown -R $USER:$USER /opt/loki-stack
sudo chown -R 10001:10001 /var/lib/loki


3) Configure Loki with retention controls

Use a simple single-node filesystem backend first. The key production decision is retention: keep enough history to investigate incidents without letting storage grow unbounded.

cat > /opt/loki-stack/loki/loki-config.yml <<'YAML'
auth_enabled: false
server:
  http_listen_port: 3100
common:
  path_prefix: /var/loki
  storage:
    filesystem:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
storage_config:
  tsdb_shipper:
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
  filesystem:
    directory: /var/loki/chunks
compactor:
  working_directory: /var/loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
limits_config:
  retention_period: 168h
YAML


Why this matters in practice: long retention with high-cardinality labels can degrade query latency and create hidden cost pressure. Start conservative, then tune after measuring real usage.

4) Configure Promtail ingestion jobs

Promtail collects from host logs and Docker JSON logs. Keep labels small and stable. Avoid adding request IDs, session IDs, or dynamic user IDs as labels; they explode index cardinality.

cat > /opt/loki-stack/promtail/promtail-config.yml <<'YAML'
server:
  http_listen_port: 9080
positions:
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: varlogs
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          host: ${HOSTNAME}
          __path__: /var/log/*.log
  - job_name: docker
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          host: ${HOSTNAME}
          __path__: /var/lib/docker/containers/*/*-json.log
YAML


5) Provision Grafana datasource automatically

Provisioning avoids manual UI setup and supports repeatable disaster recovery. If you rebuild the host, your datasource comes back immediately.

cat > /opt/loki-stack/grafana/provisioning/datasources/loki.yml <<'YAML'
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true
YAML


6) Compose stack and environment variables

Use a dedicated .env file and do not commit it to source control. Rotate passwords during handoff. The compose file pins explicit image versions for reproducibility.

cat > /opt/loki-stack/.env <<'ENV'
DOMAIN=logs.yourdomain.com
GF_ADMIN_USER=admin
GF_ADMIN_PASSWORD=replace-with-strong-secret
ENV

cat > /opt/loki-stack/docker-compose.yml <<'YAML'
services:
  loki:
    image: grafana/loki:3.0.0
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - ./loki/loki-config.yml:/etc/loki/local-config.yaml:ro
      - /var/lib/loki:/var/loki
    restart: unless-stopped

  promtail:
    image: grafana/promtail:3.0.0
    command:
      - -config.file=/etc/promtail/config.yml
      - -config.expand-env=true
    environment:
      - HOSTNAME=${HOSTNAME:-loki-host}
    volumes:
      - ./promtail/promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.1.0
    environment:
      - GF_SECURITY_ADMIN_USER=${GF_ADMIN_USER}
      - GF_SECURITY_ADMIN_PASSWORD=${GF_ADMIN_PASSWORD}
      - GF_SERVER_ROOT_URL=https://${DOMAIN}
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    restart: unless-stopped

  caddy:
    image: caddy:2.8
    ports:
      - "80:80"
      - "443:443"
    environment:
      - DOMAIN=${DOMAIN}
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data
      - caddy-config:/config
    depends_on:
      - grafana
    restart: unless-stopped

volumes:
  grafana-data:
  caddy-data:
  caddy-config:
YAML


7) Configure Caddy for TLS and secure headers

Caddy removes most certificate management toil. It requests and renews TLS certificates automatically while preserving a clean reverse proxy setup.

cat > /opt/loki-stack/caddy/Caddyfile <<'CADDY'
{$DOMAIN} {
  encode gzip zstd
  reverse_proxy grafana:3000
  header {
    Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
    X-Content-Type-Options nosniff
    X-Frame-Options SAMEORIGIN
    Referrer-Policy no-referrer-when-downgrade
  }
}
CADDY


8) Launch and inspect health

Bring the stack up, verify container status, and stream logs for the first startup cycle. First boot often surfaces permission or DNS issues early.

cd /opt/loki-stack
docker compose pull
docker compose up -d

docker compose ps
docker compose logs -f --tail=100 caddy grafana loki promtail


Configuration and secrets handling

Secrets discipline is where many "working" deployments fail compliance and operational readiness checks. Use these controls from day one:

  • Store GF_ADMIN_PASSWORD in a secret manager; avoid plaintext in shell history.
  • Restrict SSH with key auth only and enforce least-privilege sudo policies.
  • Use an allowlist for admin access if your team operates from fixed IP ranges.
  • Consider SSO (OIDC/SAML) in Grafana if multiple operators need access.
  • Disable unnecessary debug logging on application workloads to reduce sensitive leakage.

If your security baseline requires at-rest encryption, layer host-level disk encryption and controlled backup key handling into your runbook.
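A minimal sketch of that discipline in practice, assuming the file layout created earlier in this guide: generate the admin secret with openssl rather than typing it, and keep the .env file readable only by its owner.

```shell
# Generate a strong secret for GF_ADMIN_PASSWORD (store it in your vault first)
openssl rand -base64 32

# Restrict the .env file to the owner; path assumes the layout from this guide
ENV_FILE=/opt/loki-stack/.env
if [ -f "$ENV_FILE" ]; then
  chmod 600 "$ENV_FILE"
fi
```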

Verification checklist

Run these checks before declaring the rollout complete. This is the minimum proof that the stack is healthy and usable under incident pressure.

source /opt/loki-stack/.env
curl -s http://127.0.0.1:3100/ready && echo
curl -sI "https://$DOMAIN" | head -n 5

# Query a recent stream from Loki API
curl -G -s "http://127.0.0.1:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="syslog"}' \
  --data-urlencode 'limit=5' | jq '.status, (.data.result | length)'


  • Grafana login works over HTTPS with no certificate warning.
  • Loki /ready endpoint responds successfully.
  • At least one stream appears in Explore within 2–5 minutes.
  • Queries complete within acceptable latency for your operations SLA.
  • Disk growth rate is measured and recorded for capacity planning.
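For the last checklist item, a small cron-friendly snippet can record growth over time. The data path is the one used in this guide; the CSV location is an assumption, so adjust it to taste.

```shell
# Append a timestamped byte count of Loki's data directory to a CSV for trending
DATA_DIR=${DATA_DIR:-/var/lib/loki}
LOG_FILE=${LOG_FILE:-/var/log/loki-disk-usage.csv}
echo "$(date -u +%FT%TZ),$(sudo du -sb "$DATA_DIR" | cut -f1)" | sudo tee -a "$LOG_FILE"
```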

Common issues and fixes

No logs visible in Grafana Explore

Check Promtail target paths and container mounts first. The most common root cause is incorrect path mapping or permissions on /var/lib/docker/containers.
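A quick triage sequence, run from /opt/loki-stack and assuming the mounts from the compose file in this guide:

```shell
# Can Promtail see the files it is supposed to tail?
docker compose exec promtail sh -c 'ls /var/lib/docker/containers | head -n 5'
docker compose exec promtail sh -c 'ls /var/log/*.log | head -n 5'

# Any permission or push errors in Promtail's own logs?
docker compose logs --tail=50 promtail | grep -iE 'error|denied' || echo "no recent errors"
```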

Certificates fail to issue

Confirm DNS points to the right public IP and that ports 80/443 are open. Temporary ACME failures usually resolve quickly after DNS propagation.
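Before retrying, confirm what the record resolves to and compare it with the host's public address. getent avoids needing dig; ifconfig.me is just one public IP echo service and is an assumption here, not a requirement.

```shell
DOMAIN=${DOMAIN:-logs.yourdomain.com}

# What does this host resolve the record to?
getent hosts "$DOMAIN" | awk '{print $1}'

# What is this host's public IP? The two should match for ACME issuance.
curl -s https://ifconfig.me && echo

# Caddy logs the exact ACME failure reason
docker compose logs --tail=50 caddy | grep -i acme
```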

Queries are slow during incidents

Reduce overly broad label selectors, tighten time windows, and remove high-cardinality labels from Promtail configs. Slow queries are often a data modeling problem, not compute shortage.
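You can inspect cardinality directly from the Loki label API; a label whose distinct value count keeps climbing into the hundreds is usually the culprit.

```shell
# List all label names Loki has indexed
curl -s "http://127.0.0.1:3100/loki/api/v1/labels" | jq -r '.data[]'

# Count the distinct values of one label (here: host, from the Promtail config)
curl -s "http://127.0.0.1:3100/loki/api/v1/label/host/values" | jq '.data | length'
```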

Loki disk usage grows too fast

Lower retention, reduce noisy log levels, and archive old data externally if needed. Also verify that retention deletes are enabled and actually running.

Grafana admin lockout

Reset credentials from inside the container and rotate immediately in your secret store.

# Rotate Grafana admin password
cd /opt/loki-stack
docker compose exec -T grafana grafana cli admin reset-admin-password 'new-strong-password'

# Backup key assets
tar czf /root/loki-stack-backup-$(date +%F).tgz \
  /opt/loki-stack /var/lib/loki


FAQ

Can I run this stack on one VM in production?

Yes, for many SMB and mid-size workloads this is a practical starting point. The key is disciplined retention, backups, and monitoring so you can scale before stress becomes outage.

What retention period should I start with?

Seven days is a reasonable baseline when incidents are frequent and disk is limited. Increase retention gradually after you observe actual query behavior and storage growth.

Should I ship application logs directly to Loki instead of files?

File tailing with Promtail is simpler operationally and easier to debug. Direct push can work, but it increases coupling and usually complicates failure handling.

How do I secure Grafana for a larger team?

Integrate SSO, enforce role-based access controls, and disable local admin where possible. Also add audit logging and periodic permission reviews.

When should I move from filesystem to object storage for Loki?

Move when disk growth, backup windows, or query latency begin to create operational risk. Object storage plus a distributed Loki topology is the standard next step.

How often should I back up this stack?

At minimum, daily backups of configuration and Loki data, with tested restore drills. Critical environments often use more frequent snapshots plus off-site retention.
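One way to schedule that baseline, assuming root's crontab and the backup command from the troubleshooting section above (note that % must be escaped in cron):

```
# m h dom mon dow  command
0 2 * * * tar czf /root/loki-stack-backup-$(date +\%F).tgz /opt/loki-stack /var/lib/loki
```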


Talk to us

If you want support designing or hardening your observability platform, we can help with architecture, migration planning, and production readiness.

