Teams that centralize workflow orchestration often hit the same wall: ad hoc scripts spread across CI runners, cron jobs, and serverless glue code eventually become hard to audit, hard to retry, and risky to scale. Kestra is a strong option for consolidating jobs, scheduling, event handling, and operational visibility into one platform that engineering and operations teams can run themselves.
This guide walks through a production-ready deployment of Kestra using Docker Compose + Traefik + PostgreSQL on Ubuntu. You'll set up isolated services, persistent storage, TLS termination, secret management, health checks, and a repeatable validation workflow. The focus is practical operations: what to configure, how to verify it, and what usually breaks first in real environments.
Architecture and flow overview
The deployment follows a layered pattern that keeps responsibilities clear and reduces blast radius during incidents:
- Application layer: Kestra web/API and background workers.
- Data layer: PostgreSQL for durable metadata and execution state.
- Edge layer: Traefik for TLS termination, routing, and optional middleware controls.
- Host layer: Ubuntu hardening, UFW policy, log rotation, and backup jobs.
Operationally, users and API clients reach Kestra via HTTPS through Traefik. Internal traffic between services stays on a private Docker network. Backups run from the host, and recovery drills validate that restored state can replay scheduled workloads cleanly.
Prerequisites
- Ubuntu 22.04/24.04 VM with at least 4 vCPU, 8 GB RAM, and fast SSD storage.
- DNS A record for kestra.example.com pointing to your server.
- Docker Engine + Docker Compose plugin installed.
- Ports 22, 80, and 443 allowed from trusted networks.
- A secure mailbox for TLS/Let's Encrypt notifications.
- Time sync enabled (chrony/systemd-timesyncd) to avoid certificate and scheduling drift.
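Before touching the stack, it helps to confirm the prerequisites mechanically rather than by memory. The sketch below is a hypothetical preflight helper (the `check` function and the default `DOMAIN` value are assumptions, not part of any tool); adapt it to your environment.

```shell
#!/usr/bin/env bash
# Preflight checks (a sketch; set DOMAIN for your environment).
set -u
DOMAIN="${DOMAIN:-kestra.example.com}"
fail=0

check() {
  # check <description> <command...> — run the command, report pass/fail.
  local desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $desc"
  else
    echo "FAIL $desc"
    fail=1
  fi
}

check "docker binary present"   command -v docker
check "compose plugin responds" docker compose version
check "curl present"            command -v curl
check "DNS record resolves"     getent hosts "$DOMAIN"

if [ "$fail" -eq 0 ]; then
  echo "preflight passed"
else
  echo "fix the FAIL lines before deploying"
fi
```

Run it once before step 1 and again after any host rebuild; a FAIL line here is far cheaper to fix than a half-deployed stack.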
Step-by-step deployment
1) Prepare host and base packages
Start by updating packages and installing baseline tooling for diagnostics, backups, and TLS lifecycle operations.
sudo apt update && sudo apt -y upgrade
sudo apt -y install ca-certificates curl jq unzip ufw fail2ban
sudo timedatectl set-timezone America/Chicago
sudo mkdir -p /opt/kestra/{traefik,postgres,data,backup}
sudo chown -R $USER:$USER /opt/kestra
2) Create environment file for secrets
Keep credentials in a tightly permissioned environment file (readable only by the deploy user) and never commit this file to version control.
cat >/opt/kestra/.env <<'EOF'
POSTGRES_DB=kestra
POSTGRES_USER=kestra
POSTGRES_PASSWORD=replace-with-strong-random-password
KESTRA_BASIC_AUTH_USER=opsadmin
KESTRA_BASIC_AUTH_PASSWORD=replace-with-long-passphrase
[email protected]
DOMAIN=kestra.example.com
EOF
chmod 600 /opt/kestra/.env
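Rather than inventing passwords by hand, generate them from a CSPRNG. This sketch assumes `openssl` is installed (any CSPRNG-backed generator works equally well):

```shell
# Generate strong random values before filling in .env.
PG_PASS=$(openssl rand -base64 32)
UI_PASS=$(openssl rand -base64 32)
printf 'POSTGRES_PASSWORD=%s\n' "$PG_PASS"
printf 'KESTRA_BASIC_AUTH_PASSWORD=%s\n' "$UI_PASS"
```

Paste the printed values over the placeholders in /opt/kestra/.env, then re-check file permissions with `ls -l /opt/kestra/.env`.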
3) Create Docker Compose stack
This compose file separates edge, database, and application services with explicit restart behavior and health checks.
cat >/opt/kestra/docker-compose.yml <<'EOF'
services:
  traefik:
    image: traefik:v3.1
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.email=${LETSENCRYPT_EMAIL}
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.le.acme.tlschallenge=true
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /opt/kestra/traefik:/letsencrypt
    restart: unless-stopped

  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - /opt/kestra/postgres:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 12
    restart: unless-stopped

  kestra:
    image: kestra/kestra:latest  # pin to a tested release tag in production
    command: server standalone
    depends_on:
      postgres:
        condition: service_healthy
    env_file: /opt/kestra/.env
    environment:
      KESTRA_CONFIGURATION: |
        datasources:
          postgres:
            url: jdbc:postgresql://postgres:5432/${POSTGRES_DB}
            driverClassName: org.postgresql.Driver
            username: ${POSTGRES_USER}
            password: ${POSTGRES_PASSWORD}
        kestra:
          repository:
            type: postgres
          queue:
            type: postgres
          storage:
            type: local
            local:
              basePath: /app/storage
          server:
            basicAuth:
              enabled: true
              username: ${KESTRA_BASIC_AUTH_USER}
              password: ${KESTRA_BASIC_AUTH_PASSWORD}
    volumes:
      - /opt/kestra/data:/app/storage
    labels:
      - traefik.enable=true
      - traefik.http.routers.kestra.rule=Host(`${DOMAIN}`)
      - traefik.http.routers.kestra.entrypoints=websecure
      - traefik.http.routers.kestra.tls.certresolver=le
      - traefik.http.services.kestra.loadbalancer.server.port=8080
    restart: unless-stopped
EOF
4) Start stack and verify health
cd /opt/kestra
docker compose --env-file .env up -d
docker compose ps
docker compose logs --tail=100 postgres
docker compose logs --tail=100 kestra
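On first boot, Kestra runs database migrations before it starts answering requests, so naive health checks tend to flake. A small retry helper keeps verification scripts patient; the commented curl example assumes DNS and TLS already work for your domain.

```shell
# Retry a command until it succeeds or attempts run out.
wait_for() {
  # wait_for <attempts> <delay-seconds> <command...>
  local tries="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= tries; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# On the server: wait up to ~2.5 minutes for the UI to answer over HTTPS.
# wait_for 30 5 curl -fsS -o /dev/null https://kestra.example.com
```

The same helper is reusable in later smoke tests and restart drills.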
5) Harden firewall and brute-force protection
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw --force enable
sudo systemctl enable --now fail2ban
Configuration and secret-handling best practices
For production, treat orchestration credentials as high-value assets. Keep secrets in a dedicated secret manager where possible (Vault, AWS Secrets Manager, or SOPS-encrypted files), and inject them at runtime rather than baking them into images. If you must use .env, limit file permissions and rotate values on a regular schedule.
Use separate credentials for database access, API automation, and human administration. Avoid sharing admin credentials across teams. Add an approval policy for high-impact changes to flows, and define break-glass access with auditing enabled.
For change safety, deploy updates with a staging environment first, validate flow execution semantics, and then promote to production. Keep Compose and image tags explicit to avoid accidental major-version jumps that alter behavior.
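One low-effort way to keep image tags explicit is a Compose override file. The version numbers below are placeholders, not recommendations; substitute whichever releases you have actually validated in staging.

```yaml
# docker-compose.override.yml — explicit image pins (placeholder versions)
services:
  traefik:
    image: traefik:v3.1.4
  postgres:
    image: postgres:16.4
  kestra:
    image: kestra/kestra:v0.19.0
```

Docker Compose merges docker-compose.override.yml automatically on `docker compose up -d`, so upgrades become a one-line diff you can review and roll back.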
Operations, backups, and lifecycle management
Running orchestration in production is less about first deployment and more about disciplined lifecycle management. Define service-level objectives for workflow success rate, median execution latency, and queue drain time during incident scenarios. These objectives help your team decide when to scale vertically, when to split workloads, and when to tune retry behavior for noisy dependencies.
Backups should include both PostgreSQL and the local workflow storage directory. Keep at least one off-host encrypted copy so host compromise or disk failure does not remove both primary and backup data. A practical baseline is daily full backups retained for 14 to 30 days, plus weekly immutable snapshots for compliance and post-incident forensics.
Patch management matters for reliability and security. Track upstream releases for Kestra, PostgreSQL, and Traefik, then test upgrades against a staging clone using representative flows. Establish a predictable maintenance window and communicate expected impact to stakeholders in advance. After upgrades, rerun smoke tests and compare execution metrics against pre-upgrade baselines.
For governance, limit who can create production schedules and who can edit secrets. Add audit review checkpoints for privileged changes and require peer review for workflow definitions that touch billing, identity systems, or customer-facing APIs.
cat >/opt/kestra/backup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
source /opt/kestra/.env
STAMP=$(date +%F-%H%M)
mkdir -p /opt/kestra/backup/$STAMP
cd /opt/kestra
docker compose exec -T postgres pg_dump -U "$POSTGRES_USER" "$POSTGRES_DB" > /opt/kestra/backup/$STAMP/kestra.sql
tar -czf /opt/kestra/backup/$STAMP/storage.tar.gz -C /opt/kestra data
find /opt/kestra/backup -maxdepth 1 -type d -mtime +21 -exec rm -rf {} +
EOF
chmod 700 /opt/kestra/backup.sh
sudo ln -sf /opt/kestra/backup.sh /etc/cron.daily/kestra-backup
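A backup you have never restored is a hope, not a plan. The helper below picks the newest timestamped directory created by backup.sh; the commented restore commands are a sketch and should run against a scratch database or staging clone, never the live instance.

```shell
# Print the newest timestamped backup directory under the given root.
latest_backup() {
  ls -1d "$1"/*/ 2>/dev/null | sort | tail -n 1
}

# Restore drill sketch (run on a scratch/staging host, not production):
# DIR=$(latest_backup /opt/kestra/backup)
# docker compose exec -T postgres psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" < "${DIR}kestra.sql"
# tar -xzf "${DIR}storage.tar.gz" -C /opt/kestra
```

The timestamp format from backup.sh (`%F-%H%M`) sorts lexicographically, which is why a plain `sort | tail` reliably returns the newest run.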
Verification checklist
- HTTPS endpoint resolves and returns valid TLS chain.
- Login succeeds with basic auth and expected role permissions.
- Database health remains healthy under normal load.
- A sample scheduled workflow executes and writes expected logs.
- Restart drill confirms service recovery and job state persistence.
curl -I https://kestra.example.com
cd /opt/kestra && docker compose ps
cd /opt/kestra && set -a && . ./.env && set +a && docker compose exec -T postgres psql -U "$POSTGRES_USER" -d "$POSTGRES_DB" -c "select now();"
cd /opt/kestra && docker compose logs --tail=200 kestra
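For the scheduled-workflow check, a minimal flow like the one below is enough. The plugin type identifiers shown match recent Kestra releases but have changed across versions, so confirm them against your instance's documentation before saving.

```yaml
# Minimal scheduled smoke-test flow (type identifiers vary by Kestra version)
id: smoke_test
namespace: ops.checks
tasks:
  - id: hello
    type: io.kestra.plugin.core.log.Log
    message: "smoke test ran"
triggers:
  - id: every_15_min
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "*/15 * * * *"
```

Save it via the UI editor, then confirm an execution with a success state appears within one schedule interval.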
Common issues and fixes
Traefik certificate not issuing
Check DNS propagation first, then ensure port 443 is reachable from the public internet. If ACME storage file permissions are wrong, Traefik cannot persist cert state. Confirm /opt/kestra/traefik/acme.json exists and is writable by the container.
Application starts but cannot reach PostgreSQL
Most failures come from wrong credentials, a typo in the DB host, or a race condition before the database is healthy. Keep depends_on with condition: service_healthy and validate the pg_isready health check.
Slow UI during peak execution windows
Profile queue depth and DB I/O. Move to larger instance class, tune PostgreSQL shared buffers, and split worker-heavy flows if a single host is saturating CPU or disk throughput.
Unexpected workflow failures after image update
Pin image tags and review release notes before upgrades. Run a smoke-test suite of representative workflows in staging, then roll forward in production during a low-risk maintenance window.
FAQ
Can I run Kestra behind Cloudflare instead of direct public ingress?
Yes. Keep origin locked down to Cloudflare egress ranges, enforce Full (Strict) TLS, and still maintain host-level firewall controls. Validate websocket behavior for interactive UI features.
How often should I back up PostgreSQL and workflow storage?
At minimum, run nightly full backups plus more frequent WAL or incremental snapshots for lower RPO. Test restore monthly and after major schema changes.
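For lower RPO than nightly dumps, WAL archiving is PostgreSQL's standard mechanism. The excerpt below follows the example in the PostgreSQL continuous-archiving documentation; the /backup/wal path is a placeholder you must create, mount into the container, and copy off-host.

```yaml
# postgresql.conf excerpt: continuous WAL archiving for point-in-time recovery
# (archive path is a placeholder; ensure it exists and is backed up off-host)
# wal_level = replica
# archive_mode = on
# archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f'
```

Pair this with periodic base backups so archived WAL segments have a starting point to replay from.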
Is Docker Compose enough for enterprise production?
For many teams, yes, especially for single-region moderate workloads. Once you need multi-zone HA, autoscaling workers, and policy-heavy governance, evaluate Kubernetes.
What is the safest way to rotate credentials?
Create new credentials, deploy updated secrets, verify service health, then revoke old secrets. Avoid in-place overwrite without rollback checkpoints.
How do I monitor failed workflows proactively?
Export logs/metrics to your monitoring stack (Prometheus/Grafana, ELK, or OpenTelemetry pipelines), then alert on failure rate spikes, queue growth, and latency percentiles.
Can I use external managed PostgreSQL instead of a containerized DB?
Absolutely. Managed PostgreSQL can improve durability and reduce operational load. Ensure private networking, TLS connections, and version compatibility before migration.
Related guides
- Production Guide: Deploy Outline with Docker Compose, Caddy, PostgreSQL, and Redis on Ubuntu
- Deploy Grafana with Docker Compose and Traefik on Ubuntu: Production-Ready Observability Guide
- Deploy Nextcloud with Docker Compose, Nginx, and Redis on Ubuntu (Production Guide)
Talk to us
Need help deploying a production-ready Kestra orchestration platform, integrating SSO, or building secure backup and upgrade runbooks for your team? We can help with architecture, hardening, migration, and operational readiness.