Windmill Self-Host Setup: High Availability, Multi-Workspace Governance, Custom Runtimes, and Disaster Recovery

Architect a production-grade Windmill deployment with replicated servers, isolated workspaces, custom worker runtimes, and a bulletproof backup strategy.

Running Windmill on a single VPS gets you started, but production workloads demand more. You need redundancy when a node fails, governance when multiple teams share the same instance, and flexibility when default runtimes do not fit your use case. This guide covers the advanced patterns that separate a hobby deployment from infrastructure you can bet your operations on.

We will walk through high-availability server configuration, multi-workspace governance, custom runtime environments, and disaster recovery planning. If you are still setting up your first instance, start with our quick-start guide. For Git sync and enterprise workflow patterns, see our Git Sync deep dive. For scripting fundamentals, check Running Scripts and Automating Workflows.

What You Need Before Starting

These patterns assume you have a running Windmill instance and understand its core components. Before proceeding, confirm you have:

  • A running Windmill instance deployed via Docker Compose or Kubernetes
  • Access to an external PostgreSQL database (managed or self-hosted)
  • A reverse proxy or load balancer capable of health checks and failover
  • Shared storage or S3-compatible object storage for backup targets
  • Familiarity with Windmill worker groups, environment variables, and the admin panel

The examples below use Docker Compose for clarity, but all concepts translate directly to the Windmill Helm chart for Kubernetes deployments.

High Availability Server Configuration

Windmill servers are stateless. All state lives in PostgreSQL, which means you can run multiple server containers behind a load balancer without worrying about session stickiness or shared caches.

Scaling the Server Tier

Increase the replica count in your Compose file and let your load balancer distribute traffic:

  windmill_server:
    image: ${WM_IMAGE}
    pull_policy: always
    deploy:
      replicas: 3
      restart_policy:
        condition: any
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - MODE=server
    networks:
      - windmill
    depends_on:
      db:
        condition: service_healthy

Each server container handles API requests and serves the frontend. If one dies, the others keep accepting traffic. There is no leader election or clustering protocol to manage.

Health Checks for Load Balancers

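Configure your load balancer to probe the health endpoint at /api/health. A 200 response means the server is ready; any other response, or a timeout, should take that server out of rotation. The active health_check directive shown here is an NGINX Plus feature and requires the upstream to define a shared-memory zone. On open-source NGINX, rely on passive checks instead by adding max_fails and fail_timeout to each server line:

# NGINX Plus upstream with active health check
upstream windmill {
    zone windmill 64k;   # shared memory zone, required by health_check
    server 10.0.1.10:8000;
    server 10.0.1.11:8000;
    server 10.0.1.12:8000;
}

server {
    location / {
        proxy_pass http://windmill;
        # Mark a server down after 3 failed probes, up again after 2 passes
        health_check uri=/api/health interval=5s fails=3 passes=2;
    }
}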

For zero-downtime deployments, use a rolling update strategy. Update one server at a time, verifying health before moving to the next.
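
The "verify health before moving to the next" step can be scripted. The following is a minimal sketch, assuming curl is available on the machine driving the rollout; probe_status and wait_healthy are hypothetical helper names, and the pass and retry counts are illustrative defaults that mirror the load balancer settings above.

```shell
#!/usr/bin/env bash
# Sketch: gate each step of a rolling update on consecutive healthy probes.
set -eu

# Print the HTTP status code for a URL ("000" when the request fails).
probe_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 3 "$1" || true
}

# wait_healthy URL [PASSES] [MAX_TRIES] [PAUSE_SECONDS]
# Succeeds after PASSES consecutive 200 responses, fails after MAX_TRIES probes.
wait_healthy() {
  local url="$1" passes="${2:-2}" max_tries="${3:-30}" pause="${4:-5}"
  local ok=0 try=0
  while [ "$try" -lt "$max_tries" ]; do
    try=$((try + 1))
    if [ "$(probe_status "$url")" = "200" ]; then
      ok=$((ok + 1))
      if [ "$ok" -ge "$passes" ]; then
        return 0
      fi
    else
      ok=0   # a failed probe resets the streak
    fi
    sleep "$pause"
  done
  return 1
}
```

After updating each host, call wait_healthy "http://10.0.1.10:8000/api/health" (substituting that host's address) and halt the rollout if it returns non-zero, so a bad image never reaches more than one server.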

Multi-Workspace Governance

Workspaces in Windmill are isolated environments with their own scripts, variables, and permissions. On a shared instance, governance means controlling who can create workspaces, what resources they access, and how secrets are managed.

Workspace Creation Policies

By default, any admin can spin up a new workspace. Lock this down from the instance settings panel or by setting environment variables:

# .env
CREATE_WORKSPACE_REQUIRE_SUPERADMIN=true
CREATE_WORKSPACE_ALLOWED_DOMAINS=company.com

This ensures only instance superadmins can create workspaces, and only users from approved email domains get invited.

Secret Encryption Per Workspace

Windmill encrypts workspace secrets with a unique key per workspace. For additional isolation, rotate encryption keys periodically from the workspace settings. If a key is compromised, only that workspace is affected.

Audit Logging

Enable audit logging to track who ran what, when, and with what parameters. Windmill can stream audit events to an external SIEM or webhook endpoint:

# .env
AUDIT_LOG_ENABLED=true
AUDIT_LOG_WEBHOOK=https://siem.company.com/windmill-audit

Store these logs immutably. They are your evidence trail for compliance reviews and incident response.

Custom Runtimes and Worker Groups

Default workers handle Python, TypeScript, Go, and Bash. When you need a custom environment (a specific Python version, system libraries, or proprietary tools), build a custom worker image.

Building a Custom Worker Image

Create a Dockerfile that extends the official worker image and installs your dependencies:

# Pin a release tag instead of :main for production builds
FROM ghcr.io/windmill-labs/windmill:main

# Install custom system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    libxml2-dev \
    libxslt1-dev \
    poppler-utils \
    && rm -rf /var/lib/apt/lists/*

# Install Python packages globally (quote extras so the shell does not treat [cv] as a glob)
RUN pip install --no-cache-dir \
    "camelot-py[cv]" \
    pdfplumber \
    transformers

Build and push the image to your registry, then reference it in a new worker service:

  windmill_worker_pdf:
    image: registry.company.com/windmill-pdf-worker:latest
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '2'
          memory: 4096M
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - MODE=worker
      - WORKER_GROUP=pdf-processing
    networks:
      - windmill

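Tag your scripts with pdf-processing and Windmill routes them only to workers that listen for that tag. Confirm in the worker group settings that the pdf-processing group accepts the tag and that no other group lists it. Other jobs then never touch this pool, so resource contention stays predictable.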
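
The build-and-push step can be sketched as follows. This assumes Docker is installed and already logged in to the registry; image_ref and build_and_push are hypothetical helpers, and the version-plus-revision tag scheme is one possible convention for avoiding a floating :latest tag, not something Windmill requires.

```shell
#!/usr/bin/env bash
# Sketch: build and push the custom worker image under an immutable tag.
set -eu

# image_ref REGISTRY NAME WINDMILL_VERSION REVISION
# Compose a reference such as registry.company.com/windmill-pdf-worker:1.352-ab12cd3.
image_ref() {
  printf '%s/%s:%s-%s\n' "$1" "$2" "$3" "$4"
}

# build_and_push REF: build from the Dockerfile in the current directory, then push.
build_and_push() {
  docker build --pull -t "$1" .
  docker push "$1"
}

# Example:
# ref="$(image_ref registry.company.com windmill-pdf-worker 1.352 "$(git rev-parse --short HEAD)")"
# build_and_push "$ref"
```

Referencing the resulting tag in the worker service, rather than :latest, makes it clear which build each replica is running and lets you roll back by changing one line.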

Init Scripts for Dynamic Setup

For lighter customization, use init scripts that run when a worker starts. Mount an init script and set the path via environment variable:

# docker-compose.yml
  windmill_worker:
    volumes:
      - ./init.sh:/init.sh
    environment:
      - INIT_SCRIPT=/init.sh
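
A possible init.sh is sketched below. It assumes a Debian-based worker image and a worker process allowed to run apt-get; INIT_TOOLS is a hypothetical variable introduced here for illustration (space-separated command:package pairs), not a Windmill setting.

```shell
#!/bin/sh
# init.sh: hypothetical worker init script; runs when the worker starts.
set -eu

# Succeeds when COMMAND is missing from PATH, i.e. an install is needed.
needs_install() {
  ! command -v "$1" >/dev/null 2>&1
}

# ensure_tool COMMAND PACKAGE
# Install a Debian package only if its command is absent, so restarts stay fast.
ensure_tool() {
  if needs_install "$1"; then
    apt-get update
    apt-get install -y --no-install-recommends "$2"
    rm -rf /var/lib/apt/lists/*
  fi
}

# Example value: INIT_TOOLS="pdftotext:poppler-utils xmllint:libxml2-utils"
# With INIT_TOOLS unset or empty, the loop does nothing.
for pair in ${INIT_TOOLS:-}; do
  ensure_tool "${pair%%:*}" "${pair#*:}"
done
```

Because every worker start repeats this work, init scripts suit small additions; anything slow to install belongs in a custom image instead.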

Disaster Recovery and Backup Strategy

Your disaster recovery plan is only as good as your last verified restore. Windmill makes this simple because everything lives in PostgreSQL.

Automated Database Backups

Run continuous archiving with WAL-G or pgBackRest for point-in-time recovery (WAL-E, the older tool, is no longer maintained). For simpler setups, schedule logical dumps:

#!/bin/bash
# /opt/windmill/backup.sh
set -euo pipefail

BACKUP_DIR="/backups/windmill"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
S3_BUCKET="s3://company-backups/windmill"
DUMP_FILE="$BACKUP_DIR/windmill_$TIMESTAMP.dump"
mkdir -p "$BACKUP_DIR"

# Full pg_dump in custom format, run inside the database container
docker exec windmill-db-1 pg_dump -Fc -U postgres windmill > "$DUMP_FILE"

# Upload to S3
aws s3 cp "$DUMP_FILE" "$S3_BUCKET/"

# Cleanup local files older than 7 days
find "$BACKUP_DIR" -name '*.dump' -mtime +7 -delete

# Weekly sanity check (Sundays): confirm the archive's table of contents is
# readable. This catches truncated dumps but does not replace a full test
# restore. Requires pg_restore on the host.
if [ "$(date +%u)" -eq 7 ]; then
  pg_restore -l "$DUMP_FILE" > /dev/null
  echo "Backup TOC verified: $TIMESTAMP"
fi

Schedule this with cron at least daily, and test restores monthly. A backup you cannot restore is not a backup.
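
A monthly restore test could look like the sketch below. It assumes the PostgreSQL client tools are installed where the script runs and can reach a scratch server as the postgres user; restore_drill, run, the DRY_RUN switch, and the windmill_restore_test database name are all hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a restore drill: load a dump into a scratch database,
# print how many tables arrived, then drop the scratch database.
set -eu

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# restore_drill DUMP_FILE [SCRATCH_DB]
restore_drill() {
  local dump="$1" scratch="${2:-windmill_restore_test}"
  run createdb -U postgres "$scratch"
  # --exit-on-error stops at the first failure instead of continuing silently
  run pg_restore -U postgres --no-owner --exit-on-error -d "$scratch" "$dump"
  run psql -U postgres -d "$scratch" -At \
    -c "SELECT count(*) FROM information_schema.tables WHERE table_schema = 'public'"
  run dropdb -U postgres "$scratch"
}
```

Running DRY_RUN=1 restore_drill /backups/windmill/some.dump prints the commands without touching a database, which is a cheap way to review the drill before scheduling it. A table count of zero, or any pg_restore error, means the backup should be treated as failed.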

Cross-Region Replication

For true resilience, replicate your database to a secondary region. PostgreSQL streaming replication or tools like Bucardo keep a hot standby ready. If your primary region fails, promote the standby and repoint your load balancer.
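
Before promoting a standby, it helps to know how far behind it is, because that lag is the data you are about to lose. The sketch below assumes streaming replication on PostgreSQL 12 or later and psql on the operator's machine; STANDBY_URL and the function names are hypothetical, and the lag budget is yours to choose.

```shell
#!/usr/bin/env bash
# Sketch: check standby replay lag before promoting it during a failover.
set -eu

# Replay lag in whole seconds, measured on the standby.
# STANDBY_URL is a hypothetical variable holding the standby's connection URI.
standby_lag_seconds() {
  psql -At -c \
    "SELECT COALESCE(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())::int, 0)" \
    "${STANDBY_URL:?set STANDBY_URL}"
}

# Promote the standby to primary (pg_promote() exists in PostgreSQL 12+).
promote_standby() {
  psql -At -c "SELECT pg_promote()" "${STANDBY_URL:?set STANDBY_URL}"
}

# promote_if_fresh MAX_LAG_SECONDS
# Promote only when the standby is within the lag budget; otherwise refuse.
promote_if_fresh() {
  local max="$1" lag
  lag="$(standby_lag_seconds)"
  if [ "$lag" -le "$max" ]; then
    promote_standby
  else
    echo "standby is ${lag}s behind (budget ${max}s); not promoting" >&2
    return 1
  fi
}
```

Note that this lag figure also grows while the primary is simply idle, since no new transactions arrive to replay, so treat a refusal as a prompt for a human decision rather than a hard block.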

Exporting Workspace Code to Git

Windmill supports Git sync, which pushes workspace scripts and flows to a repository. This gives you version control, code review, and an additional recovery path. Configure it from the workspace settings or via the API. For detailed patterns, see our Git Sync guide.

Tips and Troubleshooting

Server replicas cause stale frontend assets

If you run multiple servers, ensure all containers use the same Windmill image version. Mismatched versions can serve conflicting frontend bundles. Pin your image tag instead of using :main:

WM_IMAGE=ghcr.io/windmill-labs/windmill:1.352
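
A quick consistency check can catch mismatches before users do. This is a minimal sketch with hypothetical helper names; it assumes you can list the image references of the running server containers, for example with docker ps on each host.

```shell
#!/usr/bin/env bash
# Sketch: detect mixed or floating image tags across server replicas.
set -eu

# Read image references on stdin; succeed only when they are all identical.
images_consistent() {
  [ "$(sort -u | grep -c .)" -eq 1 ]
}

# is_pinned IMAGE_REF
# Fail for a missing tag or a floating tag such as :main or :latest.
is_pinned() {
  local last="${1##*/}"   # drop registry and path so a registry port is not read as a tag
  case "$last" in
    *:main | *:latest) return 1 ;;
    *:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# docker ps --filter name=windmill_server --format '{{.Image}}' | images_consistent \
#   || echo "server replicas are running different images" >&2
```

Running is_pinned "$WM_IMAGE" in a deployment script is a cheap guard against a floating tag slipping back into the .env file.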

Custom workers fail to start

Check that your custom image includes the windmill binary in the expected path and that the MODE=worker environment variable is set. The entrypoint must match the official image.

Database replication lag breaks job scheduling

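If you use read replicas, point DATABASE_URL on every Windmill server and worker at the primary. The job queue depends on writes and row locks, so a lagging replica can return stale queue state. Reserve replicas for failover and for external, strictly read-only queries such as reporting.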

Audit logs grow without bound

Set a retention policy on your audit log storage. Windmill does not prune old audit events automatically. Use log rotation or lifecycle rules on your S3 bucket.

Workspace isolation feels too restrictive

Use folders and granular permissions within a workspace before creating new workspaces. Workspaces are hard boundaries; folders are flexible. Start with folders, escalate to workspaces when teams truly need isolation.

Next Steps

You now have the architecture for a resilient, governed, and flexible Windmill deployment. High-availability servers eliminate single points of failure. Multi-workspace policies keep teams isolated and compliant. Custom runtimes let you run exactly the code you need. And a tested backup strategy ensures you sleep through the night.

For related guidance, explore our other Windmill guides:

  • Windmill Self-Host Setup: Deploy Your Own Workflow Engine in Under 10 Minutes. Get a production-ready Windmill instance running on your own infrastructure with Docker Compose, custom workers, and secure domain configuration.

Need help designing your HA topology, implementing compliance controls, or scaling to thousands of daily jobs? Contact our team for enterprise Windmill architecture consulting and managed infrastructure services.