Logs are the first place teams look when production behavior changes, but many environments still rely on fragmented SSH sessions and ad-hoc grep commands under pressure. This guide shows how to deploy Grafana Loki with Docker Compose + Caddy on Ubuntu so your operations team gets centralized logs, encrypted access, and a workflow that scales from one VM to multiple services.
The real-world use case is straightforward: your apps are running in Docker on one or a few Linux hosts, incidents happen outside business hours, and you need consistent evidence quickly. By the end of this guide you will have Loki for log storage and query, Promtail for shipping host and container logs, Grafana for search and dashboards, and Caddy for automatic HTTPS at the edge.
Architecture and flow overview
This stack is intentionally pragmatic. Loki stores compressed log chunks on local disk and indexes labels for fast retrieval. Promtail tails host and Docker log files, labels them, and pushes them to Loki. Grafana queries Loki and gives your team a clean user interface for ad-hoc search, filters, and incident timelines. Caddy sits in front of Grafana, terminates TLS automatically, and applies baseline security headers.
For small and medium production workloads, this architecture is resilient enough when you combine it with retention limits, controlled label cardinality, and regular backups. If your volume grows significantly, you can move Loki to object storage and add read/write components later without rewriting your operational model.
- Ingress: Caddy handles public HTTPS and reverse proxies to Grafana.
- Visualization: Grafana is pre-provisioned with a Loki datasource.
- Collection: Promtail tails system and container log files.
- Storage: Loki stores data with 7-day retention in this baseline.
Prerequisites
Before deployment, confirm the following:
- Ubuntu 22.04/24.04 server with at least 4 vCPU, 8 GB RAM, and 80+ GB SSD.
- A DNS record such as logs.yourdomain.com pointing to your server IP.
- Docker Engine and Docker Compose plugin installed.
- Ports 80/443 reachable from the internet for automatic certificate issuance.
- A credential vault or secret manager for Grafana admin and integration credentials.
Capacity note: if your estate writes verbose JSON logs, size the disk with growth headroom. Even a 7-day retention can balloon quickly if noisy debug logs are left enabled in production.
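As a rough sizing sketch, multiply hosts by compressed log volume per day by retention days, then add headroom. The figures below are illustrative assumptions, not measurements; substitute your own observed daily volume:

```shell
# Back-of-envelope disk estimate (illustrative numbers, not measurements):
# hosts * compressed MB/day/host * retention days, plus ~30% headroom.
HOSTS=3
MB_PER_HOST_PER_DAY=500
RETENTION_DAYS=7
TOTAL_MB=$((HOSTS * MB_PER_HOST_PER_DAY * RETENTION_DAYS))
WITH_HEADROOM_MB=$((TOTAL_MB * 13 / 10))
echo "base: ${TOTAL_MB} MB, with headroom: ${WITH_HEADROOM_MB} MB"
```

Re-run the arithmetic after a week of real ingestion to replace the guesses with measured growth.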
Step-by-step deployment
1) Prepare the host and baseline security
Start with package updates, a minimal firewall policy, and a known timezone. Even if this is "just" observability, this VM will hold sensitive operational data. Treat it like a production asset.
sudo apt update && sudo apt -y upgrade
sudo apt -y install ca-certificates curl gnupg ufw jq
sudo timedatectl set-timezone UTC
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw --force enable
2) Create directories and data paths
Separate configuration under /opt and persistent Loki data under /var/lib. This keeps backups and upgrades predictable.
sudo mkdir -p /opt/loki-stack/{loki,promtail,grafana/provisioning/datasources,caddy}
sudo mkdir -p /var/lib/loki/{chunks,index,cache,wal,compactor}
sudo chown -R $USER:$USER /opt/loki-stack
sudo chown -R 10001:10001 /var/lib/loki
3) Configure Loki with retention controls
Use a simple single-node filesystem backend first. The key production decision is retention: keep enough history to investigate incidents without letting storage grow unbounded.
cat > /opt/loki-stack/loki/loki-config.yml <<'YAML'
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /var/loki
  storage:
    filesystem:
      chunks_directory: /var/loki/chunks
      rules_directory: /var/loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /var/loki/index
    cache_location: /var/loki/cache
  filesystem:
    directory: /var/loki/chunks

limits_config:
  retention_period: 168h

compactor:
  working_directory: /var/loki/compactor
  retention_enabled: true
  delete_request_store: filesystem
YAML
Why this matters in practice: long retention with high-cardinality labels can degrade query latency and create hidden cost pressure. Start conservative, then tune after measuring real usage.
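To see why cardinality matters: Loki creates one stream per unique combination of label values, so the stream count is roughly the product of each label's distinct values. A sketch with made-up numbers shows how one dynamic label ruins the index:

```shell
# Streams = product of distinct values per label (illustrative values).
JOBS=5
HOSTS=10
LEVELS=4
SAFE_STREAMS=$((JOBS * HOSTS * LEVELS))           # a manageable index
REQUEST_IDS=1000000                               # a dynamic ID used as a label...
EXPLODED_STREAMS=$((SAFE_STREAMS * REQUEST_IDS))  # ...multiplies every stream
echo "safe: ${SAFE_STREAMS} streams, exploded: ${EXPLODED_STREAMS} streams"
```

High-cardinality values belong in the log line itself, where LogQL can still filter on them at query time.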
4) Configure Promtail ingestion jobs
Promtail collects from host logs and Docker JSON logs. Keep labels small and stable. Avoid adding request IDs, session IDs, or dynamic user IDs as labels; they explode index cardinality.
cat > /opt/loki-stack/promtail/promtail-config.yml <<'YAML'
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: varlogs
    static_configs:
      - targets: [localhost]
        labels:
          job: syslog
          host: ${HOSTNAME}
          __path__: /var/log/*.log
  - job_name: docker
    static_configs:
      - targets: [localhost]
        labels:
          job: docker
          host: ${HOSTNAME}
          __path__: /var/lib/docker/containers/*/*-json.log
YAML
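If verbose application logs are your main growth driver, Promtail pipeline stages can trim them before they ever reach Loki. A minimal sketch, assuming your apps emit JSON with a level field (the field name and pattern are assumptions; adjust to your log format), to be attached under the relevant scrape job:

```yaml
# Sketch: parse JSON app logs and drop debug-level lines before shipping.
# "level" is an assumed field name; match it to your application's format.
pipeline_stages:
  - json:
      expressions:
        level: level
  - drop:
      source: level
      expression: "(?i)debug"
```

Dropping at the agent is cheaper than filtering at query time and keeps retention math honest.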
5) Provision Grafana datasource automatically
Provisioning avoids manual UI setup and supports repeatable disaster recovery. If you rebuild the host, your datasource comes back immediately.
cat > /opt/loki-stack/grafana/provisioning/datasources/loki.yml <<'YAML'
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    isDefault: true
YAML
6) Compose stack and environment variables
Use a dedicated .env file and do not commit it to source control. Rotate passwords during handoff. The compose file pins explicit image versions for reproducibility.
cat > /opt/loki-stack/.env <<'ENV'
DOMAIN=logs.yourdomain.com
GF_ADMIN_USER=admin
GF_ADMIN_PASSWORD=replace-with-strong-secret
ENV
cat > /opt/loki-stack/docker-compose.yml <<'YAML'
services:
  loki:
    image: grafana/loki:3.0.0
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - ./loki/loki-config.yml:/etc/loki/local-config.yaml:ro
      - /var/lib/loki:/var/loki
    restart: unless-stopped

  promtail:
    image: grafana/promtail:3.0.0
    # expand-env lets the config resolve ${HOSTNAME}; the environment entry
    # falls back to a static name if HOSTNAME is not exported in your shell.
    command: -config.file=/etc/promtail/config.yml -config.expand-env=true
    environment:
      - HOSTNAME=${HOSTNAME:-ubuntu-host}
    volumes:
      - ./promtail/promtail-config.yml:/etc/promtail/config.yml:ro
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    restart: unless-stopped

  grafana:
    image: grafana/grafana:11.1.0
    environment:
      - GF_SECURITY_ADMIN_USER=${GF_ADMIN_USER}
      - GF_SECURITY_ADMIN_PASSWORD=${GF_ADMIN_PASSWORD}
      - GF_SERVER_ROOT_URL=https://${DOMAIN}
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    restart: unless-stopped

  caddy:
    image: caddy:2.8
    ports:
      - "80:80"
      - "443:443"
    environment:
      - DOMAIN=${DOMAIN}
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
      - caddy-data:/data
      - caddy-config:/config
    depends_on:
      - grafana
    restart: unless-stopped

volumes:
  grafana-data:
  caddy-data:
  caddy-config:
YAML
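Optionally, let Compose track Loki's readiness with a healthcheck. This is a sketch, assuming the grafana/loki image ships busybox wget (the Alpine-based images do); merge it into the loki service definition:

```yaml
  loki:
    # ...existing loki configuration...
    healthcheck:
      test: ["CMD-SHELL", "wget -q --spider http://127.0.0.1:3100/ready || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 5
      start_period: 30s
```

With this in place, `docker compose ps` shows health status instead of just "running", which shortens triage during incidents.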
7) Configure Caddy for TLS and secure headers
Caddy removes most certificate management toil. It requests and renews TLS certificates automatically while preserving a clean reverse proxy setup.
cat > /opt/loki-stack/caddy/Caddyfile <<'CADDY'
{$DOMAIN} {
    encode gzip zstd
    reverse_proxy grafana:3000
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options nosniff
        X-Frame-Options SAMEORIGIN
        Referrer-Policy no-referrer-when-downgrade
    }
}
CADDY
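If your operators work from fixed IP ranges, Caddy can also enforce an allowlist at the edge. A sketch using the documentation range 203.0.113.0/24 as a placeholder; replace it with your real ranges before use:

```
{$DOMAIN} {
    @outside not remote_ip 203.0.113.0/24
    respond @outside "Forbidden" 403
    encode gzip zstd
    reverse_proxy grafana:3000
}
```

Keep the header block from the main Caddyfile alongside this; the matcher only adds a gate in front of the existing proxy.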
8) Launch and inspect health
Bring the stack up, verify container status, and stream logs for the first startup cycle. First boot often surfaces permission or DNS issues early.
cd /opt/loki-stack
docker compose pull
docker compose up -d
docker compose ps
docker compose logs -f --tail=100 caddy grafana loki promtail
Configuration and secrets handling
Secrets discipline is where many "working" deployments fail compliance and operational readiness checks. Use these controls from day one:
- Store GF_ADMIN_PASSWORD in a secret manager; avoid plaintext in shell history.
- Restrict SSH to key-based auth only and enforce least-privilege sudo policies.
- Use an allowlist for admin access if your team operates from fixed IP ranges.
- Consider SSO (OIDC/SAML) in Grafana if multiple operators need access.
- Disable unnecessary debug logging on application workloads to reduce sensitive leakage.
If your security baseline requires at-rest encryption, layer host-level disk encryption and controlled backup key handling into your runbook.
Verification checklist
Run these checks before declaring the rollout complete. This is the minimum proof that the stack is healthy and usable under incident pressure.
# Loki readiness
curl -s http://127.0.0.1:3100/ready && echo
# TLS and headers at the edge (export DOMAIN first, e.g. from /opt/loki-stack/.env)
curl -Ik "https://${DOMAIN}" | head -n 5
# Query a recent stream from the Loki API
curl -G -s "http://127.0.0.1:3100/loki/api/v1/query_range" \
  --data-urlencode 'query={job="syslog"}' \
  --data-urlencode 'limit=5' | jq '.status, (.data.result | length)'
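To prove end-to-end ingestion, you can hand-craft an entry for Loki's push API (the smoke-test job name is arbitrary). The payload is built locally; uncomment the curl line to send it against the live stack and then search for it in Explore:

```shell
# Build a synthetic entry for Loki's push API; values are pairs of
# [nanosecond epoch timestamp, log line].
TS="$(date +%s)000000000"
PAYLOAD=$(printf '{"streams":[{"stream":{"job":"smoke-test"},"values":[["%s","hello from the verification checklist"]]}]}' "$TS")
echo "$PAYLOAD"
# Against a live stack, push it (uncomment to run):
# curl -s -H 'Content-Type: application/json' -X POST --data "$PAYLOAD" \
#   http://127.0.0.1:3100/loki/api/v1/push
```

If the pushed line appears under {job="smoke-test"} in Explore, the write path, storage, and query path are all working.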
- Grafana login works over HTTPS with no certificate warning.
- Loki /ready endpoint responds successfully.
- At least one stream appears in Explore within 2-5 minutes.
- Queries complete within acceptable latency for your operations SLA.
- Disk growth rate is measured and recorded for capacity planning.
Common issues and fixes
No logs visible in Grafana Explore
Check Promtail target paths and container mounts first. The most common root cause is incorrect path mapping or permissions on /var/lib/docker/containers.
Certificates fail to issue
Confirm DNS points to the right public IP and that ports 80/443 are open. Temporary ACME failures usually resolve quickly after DNS propagation.
Queries are slow during incidents
Reduce overly broad label selectors, tighten time windows, and remove high-cardinality labels from Promtail configs. Slow queries are often a data modeling problem, not compute shortage.
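For example, here is the difference between a broad and a narrow LogQL query (the host value web-01 is a hypothetical label from the Promtail config above):

```
{job="docker"}
{job="docker", host="web-01"} |= "error"
```

The first scans every Docker stream in the time window; the second restricts the stream set by label before applying a line filter, which is usually far cheaper.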
Loki disk usage grows too fast
Lower retention, reduce noisy log levels, and archive old data externally if needed. Also verify that retention deletes are enabled and actually running.
Grafana admin lockout
Reset credentials from inside the container and rotate immediately in your secret store.
# Rotate Grafana admin password
cd /opt/loki-stack
docker compose exec -T grafana grafana cli admin reset-admin-password 'new-strong-password'
# Backup key assets
tar czf /root/loki-stack-backup-$(date +%F).tgz /opt/loki-stack /var/lib/loki
FAQ
Can I run this stack on one VM in production?
Yes, for many SMB and mid-size workloads this is a practical starting point. The key is disciplined retention, backups, and monitoring so you can scale before stress becomes an outage.
What retention period should I start with?
Seven days is a reasonable baseline when incidents are frequent and disk is limited. Increase retention gradually after you observe actual query behavior and storage growth.
Should I ship application logs directly to Loki instead of files?
File tailing with Promtail is simpler operationally and easier to debug. Direct push can work, but it increases coupling and usually complicates failure handling.
How do I secure Grafana for a larger team?
Integrate SSO, enforce role-based access controls, and disable local admin where possible. Also add audit logging and periodic permission reviews.
When should I move from filesystem to object storage for Loki?
Move when disk growth, backup windows, or query latency begin to create operational risk. Object storage plus a distributed Loki topology is the standard next step.
How often should I back up this stack?
At minimum, daily backups of configuration and Loki data, with tested restore drills. Critical environments often use more frequent snapshots plus off-site retention.
Related guides
If you are building a complete observability and platform operations baseline, these guides are good next steps:
- https://sysbrix.com/blog/guides-3/production-guide-deploy-apache-superset-with-docker-compose-nginx-postgresql-redis-on-ubuntu-189
- https://sysbrix.com/blog/guides-3/how-to-deploy-kestra-with-docker-compose-and-caddy-for-production-workflows-185
- https://sysbrix.com/blog/guides-3/open-webui-setup-guide-deploy-your-own-private-chatgpt-with-local-models-and-team-access-184
Talk to us
If you want support designing or hardening your observability platform, we can help with architecture, migration planning, and production readiness.