
Windmill Self-Host Setup: Git Sync, Worker Groups, App Builder, and Enterprise Workflow Patterns

Go beyond basic Windmill deployment — learn how to implement GitOps for your scripts, configure dedicated worker groups for different workloads, build internal tools with the App Builder, and design enterprise workflow patterns that handle failures, approvals, and complex branching.

The first Windmill guide covered deployment, basic scripts, and simple workflows. This guide covers the operational depth that engineering teams actually need: Git sync that makes your Windmill workspace reproducible from a repository, worker groups that route expensive jobs to high-memory machines and fast jobs to lightweight ones, the App Builder that turns scripts into internal tools anyone can use, and the workflow patterns — approval gates, error handling, retries, and complex branching — that you need before you can put Windmill into production for anything important.


Prerequisites

  • A running Windmill instance with PostgreSQL — see our getting started guide
  • Windmill version 1.280+ — Git sync and App Builder features require recent releases
  • Admin access to the Windmill instance and workspace
  • The Windmill CLI (wmill) installed — npm install -g windmill-cli (note: the PyPI package wmill is the Python client SDK, not the CLI)
  • A Git repository (GitHub, GitLab, or Gitea) for workspace sync
  • At least 4GB RAM on the host for production use with multiple worker groups

Verify your Windmill version and CLI connection:

# Check Windmill version (the server exposes it via the API):
curl -s https://windmill.yourdomain.com/api/version

# Configure CLI to your instance:
# wmill workspace add <local_name> <workspace_id> <remote_url>
wmill workspace add my-workspace my-workspace \
  https://windmill.yourdomain.com \
  --token YOUR_WINDMILL_TOKEN
wmill workspace switch my-workspace

# Verify connection:
wmill user whoami

# List current scripts and flows:
wmill script list | head -10
wmill flow list | head -10

Git Sync: Version Control for Your Windmill Workspace

Without Git sync, your Windmill scripts and flows live only in the database. One accidental deletion, one catastrophic upgrade, one database failure — and your automation infrastructure is gone. Git sync solves this by continuously mirroring your workspace to a Git repository, giving you version history, code review, rollback capability, and reproducible workspace reconstruction.

Setting Up Git Sync in Windmill

Go to Workspace Settings → Git Sync. Configure:

  • Git Repository URL: your repo URL (SSH or HTTPS)
  • Branch: the branch to sync to (e.g., main)
  • Deploy from Git: enable to make the Git repo the source of truth (pushes to Git trigger deploys)
  • SSH Key or Token: credentials for Windmill to push to your repo

Directory Structure After Sync

# After enabling Git sync, your repository will have this structure:
# (Windmill creates this automatically)

your-windmill-workspace/
├── u/                          # User-owned scripts and flows
│   └── admin/
│       ├── send_slack_notification.py
│       ├── send_slack_notification.yaml   # Script metadata
│       ├── daily_report_flow.yaml         # Flow definition
│       └── rotate_db_password.sh
├── f/                          # Folder-organized resources
│   ├── devops/
│   │   ├── deploy_service.py
│   │   └── restart_container.sh
│   └── data-team/
│       ├── sync_warehouse.py
│       └── generate_report.sql
├── g/                          # Global resources (shared across workspace)
│   └── all/
│       ├── slack_webhook.resource.yaml    # Encrypted resource definition
│       └── prod_db.resource.yaml
└── wmill-lock.yaml             # Lock file (don't edit manually)

# Pull your workspace to local files:
wmill sync pull --yes
ls -la  # See the synced directory structure

# Make a change locally:
sed -i 's/channel: "#general"/channel: "#alerts"/' \
  u/admin/send_slack_notification.py

# Push changes back to Windmill:
wmill sync push --yes

# Or let Git trigger a deploy:
git add -A && git commit -m "Update notification channel" && git push
# Windmill polls the repo and deploys changes automatically

CI/CD Pipeline for Windmill Workspace

# .github/workflows/windmill-deploy.yml
# Deploy Windmill scripts and flows on push to main

name: Deploy to Windmill

on:
  push:
    branches: [main]
    paths:
      - 'windmill/**'  # Only trigger when Windmill files change

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2  # Need history for diff

      - name: Install Windmill CLI
        run: npm install -g windmill-cli

      - name: Configure workspace
        env:
          WINDMILL_URL: ${{ secrets.WINDMILL_URL }}
          WINDMILL_TOKEN: ${{ secrets.WINDMILL_TOKEN }}
        run: |
          wmill workspace add prod prod "$WINDMILL_URL" --token "$WINDMILL_TOKEN"
          wmill workspace switch prod

      - name: Validate scripts before deploying
        run: |
          cd windmill
          # Run syntax checks on Python scripts
          find . -name '*.py' -exec python3 -m py_compile {} \;
          echo "Python syntax check passed"

      - name: Deploy to Windmill
        run: |
          cd windmill
          wmill sync push --yes --skip-variables --skip-resources
          echo "Deployment complete"

      - name: Smoke test critical flows
        env:
          WINDMILL_URL: ${{ secrets.WINDMILL_URL }}
          WINDMILL_TOKEN: ${{ secrets.WINDMILL_TOKEN }}
        run: |
          # Run a quick test flow to verify deployment health
          wmill flow run f/devops/health_check \
            --data '{"environment": "production"}' \
            --timeout 60
          echo "Health check passed"
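
The syntax-check step can also run as a standalone script locally or in a pre-commit hook; a minimal sketch, assuming your repo keeps Windmill files under a windmill/ directory:

```python
import pathlib
import py_compile

def check_python_sources(root: str) -> list[str]:
    """Byte-compile every .py file under root and return the paths
    that fail, so CI can reject a push before it reaches Windmill."""
    failures = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        try:
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError:
            failures.append(str(path))
    return failures

if __name__ == "__main__":
    bad = check_python_sources("windmill")
    if bad:
        raise SystemExit("Syntax errors in: " + ", ".join(bad))
    print("Python syntax check passed")
```

This catches the same errors as the find/py_compile CI step but gives you one list of failing files instead of stopping at the first.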

Worker Groups: Routing Jobs to the Right Resources

By default, all Windmill jobs run on the default worker group. This means a 10-minute data processing job and a 2-second API call compete for the same workers. Worker groups solve this by letting you create specialized worker pools and route specific scripts or flows to them based on requirements.

Worker Group Architecture

  • default — handles all unassigned jobs. Keep lightweight for responsiveness.
  • heavy — high-memory workers for data processing, ML inference, large file operations
  • gpu — GPU-enabled workers for image processing, ML training
  • native — for scripts requiring specific system dependencies
  • report — dedicated to scheduled reporting jobs that can be slower

Configuring Multiple Worker Groups with Docker Compose

# docker-compose.yml — multi-worker-group setup
version: '3.8'

services:
  db:
    image: postgres:15-alpine
    container_name: windmill_db
    restart: unless-stopped
    environment:
      POSTGRES_DB: windmill
      POSTGRES_USER: windmill
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - windmill_net

  windmill_server:
    image: ghcr.io/windmill-labs/windmill:main
    container_name: windmill_server
    restart: unless-stopped
    ports:
      - "8000:8000"
    environment:
      DATABASE_URL: postgres://windmill:${POSTGRES_PASSWORD}@db/windmill
      BASE_URL: https://windmill.yourdomain.com
      WINDMILL_OAUTH_CLIENTS: '{}'
      MODE: server   # Server-only mode — doesn't execute jobs
    depends_on:
      - db
    networks:
      - windmill_net

  # Fast default worker — handles quick jobs
  worker_default:
    image: ghcr.io/windmill-labs/windmill:main
    restart: unless-stopped
    environment:
      DATABASE_URL: postgres://windmill:${POSTGRES_PASSWORD}@db/windmill
      MODE: worker
      WORKER_GROUP: default
      NUM_WORKERS: 8          # Many workers for fast parallel execution
      SLEEP_QUEUE: 50
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 1G
    depends_on:
      - windmill_server
    networks:
      - windmill_net

  # Heavy worker — for resource-intensive jobs
  worker_heavy:
    image: ghcr.io/windmill-labs/windmill:main
    restart: unless-stopped
    environment:
      DATABASE_URL: postgres://windmill:${POSTGRES_PASSWORD}@db/windmill
      MODE: worker
      WORKER_GROUP: heavy
      NUM_WORKERS: 2          # Fewer workers — each gets more resources
      SLEEP_QUEUE: 100
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 8G          # Large memory limit for data processing
    depends_on:
      - windmill_server
    networks:
      - windmill_net

  # Report worker — dedicated to scheduled reporting
  worker_report:
    image: ghcr.io/windmill-labs/windmill:main
    restart: unless-stopped
    environment:
      DATABASE_URL: postgres://windmill:${POSTGRES_PASSWORD}@db/windmill
      MODE: worker
      WORKER_GROUP: report
      NUM_WORKERS: 3
      SLEEP_QUEUE: 200
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    depends_on:
      - windmill_server
    networks:
      - windmill_net

volumes:
  postgres_data:

networks:
  windmill_net:

Assigning Scripts to Worker Groups

# Assign a script to a worker group in its YAML metadata:
# This metadata file lives alongside the script file in Git

# u/data-team/sync_data_warehouse.yaml
cat > u/data-team/sync_data_warehouse.yaml << 'EOF'
summary: Sync data from production to warehouse
description: Full ETL sync — typically takes 10-30 minutes
schema:
  $schema: http://json-schema.org/draft-07/schema#
  type: object
  properties:
    table_filter:
      type: string
      description: "Optional table name filter (leave empty for all tables)"
  required: []
tag: heavy              # Route this script to the "heavy" worker group's tag
EOF

# Or set the tag via the Windmill UI:
# Script → Settings → Worker group tag → heavy

# Monitor worker activity in the Windmill UI:
# Workers page shows each worker's group, last ping, and jobs handled

# Check which tag (worker group) handled recent jobs via the API:
curl -s -H "Authorization: Bearer $WINDMILL_TOKEN" \
  "https://windmill.yourdomain.com/api/w/my-workspace/jobs/list?per_page=20" \
  | jq '[.[] | {script: .script_path, tag: .tag}]'

The App Builder: Turning Scripts Into Internal Tools

The App Builder is where Windmill shifts from a script runner into a genuine internal tooling platform. You build a drag-and-drop UI — buttons, forms, tables, charts, text inputs — and wire each element to a Windmill script or flow. The result is a usable internal tool that anyone on your team can run without any technical knowledge.

Building a Database Management Tool

A practical example: an internal tool for running database maintenance tasks. Non-technical team members can trigger it with the right parameters without SSH access.

# First, create the backend scripts that the App UI will call:

# Script 1: Get table sizes (READ operation)
# u/devops/get_table_sizes.py

# requirements:
# psycopg2-binary

import psycopg2
from typing import TypedDict

class postgresql(TypedDict):
    host: str
    port: int
    user: str
    password: str
    dbname: str

def main(db: postgresql, schema: str = "public") -> list[dict]:
    """
    Get table sizes and row counts for a schema.
    Returns sorted list of tables by size (largest first).
    """
    conn = psycopg2.connect(**db)
    cur = conn.cursor()

    cur.execute("""
        SELECT
            schemaname,
            tablename,
            pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
            pg_total_relation_size(schemaname||'.'||tablename) as size_bytes,
            (SELECT c.reltuples::bigint FROM pg_class c
             JOIN pg_namespace n ON n.oid = c.relnamespace
             WHERE c.relname = tablename
               AND n.nspname = schemaname) as row_estimate
        FROM pg_tables
        WHERE schemaname = %s
        ORDER BY size_bytes DESC
    """, (schema,))

    columns = [desc[0] for desc in cur.description]
    rows = cur.fetchall()
    conn.close()

    return [dict(zip(columns, row)) for row in rows]

# Script 2: Run VACUUM ANALYZE (WRITE operation — requires approval flow)
# u/devops/vacuum_table.py

# requirements:
# psycopg2-binary

import psycopg2
from typing import TypedDict

class postgresql(TypedDict):
    host: str
    port: int
    user: str
    password: str
    dbname: str

def main(db: postgresql, table_name: str, analyze: bool = True) -> dict:
    """
    Runs VACUUM (and optionally ANALYZE) on a specific table.
    WARNING: Can be slow on large tables.
    """
    # Validate table name to prevent SQL injection
    if not table_name.replace('_', '').replace('.', '').isalnum():
        raise ValueError(f"Invalid table name: {table_name}")

    conn = psycopg2.connect(**db)
    conn.autocommit = True  # VACUUM can't run in a transaction
    cur = conn.cursor()

    command = f"VACUUM {'ANALYZE' if analyze else ''} {table_name}"
    cur.execute(command)
    conn.close()

    return {"status": "complete", "command": command, "table": table_name}
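
The inline validation above works, but a stricter standalone check is easier to reuse and test; a sketch that accepts only a plain identifier or schema.identifier (the function name is illustrative):

```python
import re

# Accept "table" or "schema.table" built from plain SQL identifiers only.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")

def validate_table_name(name: str) -> str:
    """Raise ValueError unless name is a safe, unquoted SQL identifier."""
    if not _IDENTIFIER.match(name):
        raise ValueError(f"Invalid table name: {name}")
    return name
```

Safer still is building the statement with psycopg2's sql.Identifier, which quotes identifiers properly instead of validating strings by hand.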

App Builder Configuration

# App Builder layout (configure visually in Windmill UI):
# Apps → Create App → Drag and drop components

# Component 1: Header
# - Text: "Database Maintenance Tool"
# - Subtext: "For production DB maintenance operations"

# Component 2: Select input for database environment
# - Label: "Environment"
# - Options: ["production", "staging"]
# - Variable: selected_env

# Component 3: Select input for schema
# - Label: "Schema"
# - Default: "public"

# Component 4: Button — "Load Tables"
# - Onclick: run script u/devops/get_table_sizes
# - Inputs: db = $res:prod_database_resource, schema = components.schema_select.value

# Component 5: Table display
# - Data source: output of get_table_sizes script
# - Columns: tablename, size, row_estimate
# - Row selection enabled

# Component 6: Text display — show selected table
# - Content: "Selected: " + components.table.selectedRow.tablename

# Component 7: Button — "VACUUM ANALYZE" (dangerous action)
# - Style: danger (red)
# - Onclick: run script u/devops/vacuum_table with approval
# - Inputs: db = resource, table_name = components.table.selectedRow.tablename
# - Confirm modal: "This will VACUUM ANALYZE the selected table. Continue?"

# App permissions — who can use this tool:
# App → Settings → Visibility → Team (specific groups)
# Set read-only users and operator users separately

# Export the app definition to Git:
wmill sync pull --yes  # App definitions are included in sync

Enterprise Workflow Patterns

Production workflows need more than happy-path execution. They need to handle failures gracefully, require human approval before destructive operations, retry transient errors with backoff, and branch based on conditions. Windmill's flow engine handles all of these — here's how to implement them.

Pattern 1: Approval Gates for Destructive Operations

# Flow YAML for a deployment with approval gate:
# f/devops/deploy_with_approval.yaml

summary: Deploy service with human approval gate
description: >
  Deploys a service to production. Requires approval from a
  DevOps team member before the actual deployment runs.
schema:
  type: object
  properties:
    service:
      type: string
      description: Service name to deploy
    version:
      type: string
      description: Docker image tag to deploy
    environment:
      type: string
      enum: [staging, production]
modules:

  # Step 1: Validate inputs and check prerequisites
  - id: validate
    value:
      type: rawscript
      language: python3
      content: |
        def main(service: str, version: str, environment: str) -> dict:
            # Check the image exists before asking for approval
            import requests
            resp = requests.head(
                f"https://registry.yourdomain.com/v2/{service}/manifests/{version}",
                headers={"Authorization": "Bearer TOKEN"}
            )
            if resp.status_code != 200:
                raise Exception(f"Image {service}:{version} not found in registry")

            return {
                "service": service,
                "version": version,
                "environment": environment,
                "image_confirmed": True
            }

  # Step 2: Approval gate — in OpenFlow this is a step that sends
  # resume/cancel links to approvers, then suspends until resumed
  - id: approval
    value:
      type: rawscript
      language: python3
      content: |
        import wmill

        def main(service: str, version: str, environment: str) -> dict:
            # Generate the resume/cancel URLs for this suspended step
            urls = wmill.get_resume_urls()
            # Send them to the devops-team channel with the details:
            #   f"Deploy request: {service}:{version} -> {environment}"
            return urls
    suspend:
      required_events: 1   # One approval resumes the flow
      timeout: 3600        # 1 hour — auto-cancel if no response

  # Step 3: Notify that approval was received
  - id: notify_approved
    value:
      type: rawscript
      language: python3
      content: |
        def main(service: str, version: str, environment: str) -> str:
            import requests
            requests.post(
                SLACK_WEBHOOK,
                json={"text": f"✅ Deploy approved: {service}:{version} → {environment}"}
            )
            return "notified"

  # Step 4: Execute the actual deployment
  - id: deploy
    value:
      type: script
      path: f/devops/deploy_service
    depends_on: [approval]

  # Step 5: Verify deployment health
  - id: health_check
    value:
      type: rawscript
      language: bash
      content: |
        #!/bin/bash
        for i in {1..12}; do
          STATUS=$(curl -sf "https://$SERVICE/health" | jq -r .status)
          [ "$STATUS" = "ok" ] && echo "healthy" && exit 0
          sleep 10
        done
        echo "UNHEALTHY" && exit 1
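
The same polling logic can live in a Python step instead; this sketch takes the probe and sleep functions as parameters so the retry timing is testable without a real service (the 12-attempts, 10-second schedule mirrors the bash loop above):

```python
import time
from typing import Callable

def wait_until_healthy(
    probe: Callable[[], bool],
    attempts: int = 12,
    interval_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` up to `attempts` times, waiting `interval_s`
    between tries. Return True as soon as the service is healthy."""
    for i in range(attempts):
        if probe():
            return True
        if i < attempts - 1:
            sleep(interval_s)
    return False
```

In a real step, probe would wrap a requests.get of the /health endpoint and check the returned status field.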

Pattern 2: Retry with Exponential Backoff

# Flow with retry configuration and error handling:
# f/data-team/sync_external_api.yaml

modules:
  # API sync with retry built into the step:
  - id: fetch_data
    value:
      type: rawscript
      language: python3
      content: |
        import requests
        import time

        def main(api_endpoint: str, api_key: str) -> dict:
            """Fetch data from external API — may fail transiently."""
            response = requests.get(
                api_endpoint,
                headers={"Authorization": f"Bearer {api_key}"},
                timeout=30
            )
            # Will raise on 4xx/5xx — triggers retry
            response.raise_for_status()
            return response.json()
    # Retry configuration:
    retry:
      constant:
        attempts: 5        # Try up to 5 times
        seconds: 30        # Wait 30 seconds between retries
    # Or use exponential backoff instead:
    # retry:
    #   exponential:
    #     attempts: 5
    #     multiplier: 2
    #     seconds: 10      # Delay grows each retry: 10s, 20s, 40s, 80s (4 waits between 5 attempts)

  # Process data (no retry — if this fails, it's not transient)
  - id: process
    value:
      type: rawscript
      language: python3
      content: |
        def main(data: dict) -> dict:
            # Transform and validate the data
            return {"processed": True, "count": len(data.get("items", []))}

# Error handler: OpenFlow defines this as a top-level failure_module,
# a sibling of `modules:` that runs only when a step fails
failure_module:
  id: failure
  value:
    type: rawscript
    language: python3
    content: |
      def main(error: dict) -> None:
          # `error` carries the failed step's error payload
          import requests
          requests.post(SLACK_WEBHOOK, json={
              "text": f"❌ Sync failed: {error}",
              "channel": "#data-alerts"
          })
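
To reason about how long a retried step can take, the exponential schedule can be computed explicitly; a sketch assuming the delay starts at seconds and is multiplied by multiplier after each failed attempt:

```python
def backoff_delays(attempts: int, base_s: float, multiplier: float) -> list[float]:
    """Delay before each retry. With 5 attempts there are 4 waits
    between tries, e.g. attempts=5, base_s=10, multiplier=2
    gives delays of 10, 20, 40, 80 seconds."""
    return [base_s * multiplier ** i for i in range(attempts - 1)]
```

Summing the list tells you the worst-case time a step spends retrying, which is what you need when picking the flow's overall timeout.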

Pattern 3: Fan-Out with Parallel Processing

# Process multiple items in parallel using Flow's loop feature:
# f/data-team/process_all_regions.yaml

modules:
  # Step 1: Get list of regions to process
  - id: get_regions
    value:
      type: rawscript
      language: python3
      content: |
        def main() -> list[str]:
            return ["us-east", "eu-west", "ap-south", "us-west", "eu-central"]

  # Step 2: Process each region in parallel
  - id: process_regions
    value:
      type: forloopflow
      iterator:
        type: javascript
        expr: 'results.get_regions'  # Iterate over the list from step 1
      parallel: true                  # Run all iterations simultaneously
      parallelism: 5                  # Max 5 parallel workers
      modules:
        # These modules run for each region:
        - id: fetch_region_data
          value:
            type: rawscript
            language: python3
            content: |
              def main(iter: dict) -> dict:
                  region = iter["value"]  # Current iteration value
                  # fetch_data_for_region(region)
                  return {"region": region, "records": 1000}

        - id: aggregate_region
          value:
            type: rawscript
            language: python3
            content: |
              def main(region: str, records: int) -> dict:
                  # aggregate(region, records)
                  return {"region": region, "status": "done"}

  # Step 3: Collect all results after parallel processing completes
  - id: summarize
    value:
      type: rawscript
      language: python3
      content: |
        def main(region_results: list) -> dict:
            total = sum(r.get("records", 0) for r in region_results)
            return {
                "regions_processed": len(region_results),
                "total_records": total,
                "status": "complete"
            }
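
The same fan-out/fan-in shape is easy to prototype outside Windmill with concurrent.futures; this sketch mirrors the flow above (the per-region fetch is a placeholder, not a real data source):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_region_data(region: str) -> dict:
    # Placeholder for the real per-region fetch
    return {"region": region, "records": 1000}

def process_all_regions(regions: list[str], parallelism: int = 5) -> dict:
    """Fan out over regions with bounded parallelism, then fan in."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        results = list(pool.map(fetch_region_data, regions))
    return {
        "regions_processed": len(results),
        "total_records": sum(r["records"] for r in results),
        "status": "complete",
    }
```

The parallelism cap plays the same role as the flow's parallelism field: it bounds how many iterations run at once regardless of how many regions there are.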

Monitoring and Observability

Job Monitoring and Alerting

#!/usr/bin/env python3
# windmill-monitor.py
# Monitors Windmill job health and alerts on failures or long-running jobs

import requests
import os
from datetime import datetime, timezone

WINDMILL_URL = os.environ["WINDMILL_URL"]
WINDMILL_TOKEN = os.environ["WINDMILL_TOKEN"]
SLACK_WEBHOOK = os.environ.get("SLACK_WEBHOOK")
WORKSPACE = "my-workspace"

HEADERS = {"Authorization": f"Bearer {WINDMILL_TOKEN}"}

def _ts(job: dict, field: str) -> float:
    """Parse an RFC3339 timestamp field into a unix timestamp (0 if absent)."""
    raw = job.get(field)
    if not raw:
        return 0.0
    return datetime.fromisoformat(raw.replace("Z", "+00:00")).timestamp()

def get_failed_jobs(since_minutes: int = 60) -> list:
    """Get jobs that failed in the last N minutes."""
    resp = requests.get(
        f"{WINDMILL_URL}/api/w/{WORKSPACE}/jobs/list",
        headers=HEADERS,
        params={"running": "false", "success": "false", "per_page": 50}
    )
    resp.raise_for_status()
    jobs = resp.json()

    # Filter to recent failures only (API timestamps are RFC3339 strings)
    cutoff = datetime.now(timezone.utc).timestamp() - (since_minutes * 60)
    return [
        j for j in jobs
        if _ts(j, "created_at") > cutoff
        and j.get("type") == "CompletedJob"
        and not j.get("success")
    ]

def get_long_running_jobs(threshold_minutes: int = 30) -> list:
    """Get jobs running longer than the threshold."""
    resp = requests.get(
        f"{WINDMILL_URL}/api/w/{WORKSPACE}/jobs/list",
        headers=HEADERS,
        params={"running": "true", "per_page": 50}
    )
    resp.raise_for_status()
    jobs = resp.json()

    now = datetime.now(timezone.utc).timestamp()
    threshold_s = threshold_minutes * 60
    return [
        j for j in jobs
        if j.get("started_at") and (now - _ts(j, "started_at")) > threshold_s
    ]

def send_alert(message: str):
    if SLACK_WEBHOOK:
        requests.post(SLACK_WEBHOOK, json={"text": message})
    print(message)

# Run checks:
failed = get_failed_jobs(60)
if failed:
    scripts = [j.get("script_path", "unknown") for j in failed]
    send_alert(f"🔴 Windmill: {len(failed)} failed jobs in last hour\n"
               f"Scripts: {', '.join(set(scripts))}")

long_running = get_long_running_jobs(30)
if long_running:
    now = datetime.now(timezone.utc).timestamp()
    jobs = [
        f"{j.get('script_path')} ({(now - _ts(j, 'started_at')) / 60:.0f}m)"
        for j in long_running
    ]
    send_alert(f"⚠️ Windmill: {len(long_running)} jobs running >30min:\n" + "\n".join(jobs))

if not failed and not long_running:
    print("Windmill health check: OK")

# Add to crontab:
# */10 * * * * python3 /opt/scripts/windmill-monitor.py >> /var/log/windmill-monitor.log 2>&1

Tips, Gotchas, and Troubleshooting

Git Sync Conflicts and Recovery

# If Git sync gets out of sync (edited in UI and in Git simultaneously):

# Option 1: Take UI as source of truth — pull from Windmill to Git:
wmill sync pull --yes
git add -A && git commit -m "Sync from Windmill: resolve conflict"
git push

# Option 2: Take Git as source of truth — push Git to Windmill:
# Force-push to Windmill (WARNING: overwrites UI changes)
wmill sync push --yes --skip-variables

# Option 3: Inspect the differences before picking a side:
wmill sync pull --yes   # Overwrite local files with the current Windmill state
git diff                # Compare against your last commit to see every divergence
# Resolve file by file, then either commit (Option 1) or push back (Option 2)

# Prevent conflicts: enforce Git-only changes in team settings
# Workspace Settings → Git Sync → Require Deploy from Git
# This makes the UI read-only for scripts managed in Git

# Debug sync issues:
wmill sync status
# Shows which files differ between Git repo and Windmill instance

Worker Group Jobs Not Being Picked Up

# Check if the worker group is running:
docker compose ps | grep worker

# Check worker logs for connection errors:
docker compose logs worker_heavy --tail 30

# Verify the worker is registering with the correct group name:
docker compose logs worker_heavy | grep -i 'worker_group\|WORKER_GROUP\|registered'

# Check if jobs are stuck in queue for that group:
# Windmill Admin → Workers → check the heavy group queue depth

# Common issues:
# 1. The script's tag doesn't match any tag the worker group accepts.
#    Assign tags to the group in the UI (Workers → group config) or via a
#    WORKER_TAGS env var on the worker container (e.g. WORKER_TAGS=heavy)

# 2. No workers running for that group:
#    docker compose ps | grep worker_heavy
#    If missing, check docker-compose.yml and restart

# 3. Worker group name with spaces or special chars — use only alphanumeric + hyphen/underscore

# Test by manually running a job and watching which worker picks it up:
wmill script run u/devops/test_script --tag heavy
docker compose logs worker_heavy --follow  # Should show the job being picked up

App Builder Components Not Updating

# App Builder component outputs not refreshing when inputs change:

# Check the trigger configuration:
# Component → Runnable → Trigger settings
# Options:
#   - Manual (button click only)
#   - On change (re-runs when any input changes)
#   - On load (runs once when app loads)

# If a table isn't showing new data after a button click:
# The table's data source must reference the script output, not a static value
# Data source should be: runnables.load_tables.result
# NOT: [{"tablename": "static", ...}]

# Debug script execution in App Builder:
# Click the play button on the runnable
# Click "Logs" to see execution output
# If the script errors, you'll see the Python/TS traceback here

# Resource connection issues in App Builder:
# If script uses db: postgresql resource, verify:
# 1. Resource exists: Windmill → Resources → check prod_database exists
# 2. User has access: Settings → Permissions → can the app user access this resource?
# 3. Resource type matches: the TypedDict in script must match the resource schema

Pro Tips

  • Use Windmill's built-in secret management instead of env vars — store sensitive values as Windmill Variables (encrypted secrets) rather than Docker environment variables. Scripts reference them as $var:SECRET_NAME — they're version-controlled (the name, not the value) and access-controlled per user/group.
  • Build a script library for common operations — create reusable scripts for things every flow needs: send Slack notification, create Gitea issue, log to database, query PostgreSQL. Reference them as flow steps rather than copy-pasting code between flows. When the Slack format changes, you fix it once.
  • Use approval steps to enforce four-eyes principle on production changes — any flow that modifies production data, rotates credentials, or deploys services should have an approval gate. This is a policy decision enforced at the workflow level, not dependent on individual scripts being careful.
  • Export App definitions with wmill sync pull — Apps are included in the Git sync output as YAML. Treat them like code: review changes in pull requests, roll back if an App update breaks something, and deploy through CI/CD just like scripts.
  • Set realistic timeouts for each script based on observed p95 runtime — Windmill jobs have a default timeout that terminates long-running jobs. For data processing scripts that legitimately take 20 minutes, set the timeout to 30 minutes in the script metadata. For quick API calls, set 30 seconds to fail fast.
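
Scripts read those Variables through the wmill Python client. A small sketch (the path and the env-var fallback convention are illustrative) that also works when you run a script locally during development:

```python
import os

def get_secret(path: str) -> str:
    """Read a Windmill variable (e.g. "u/admin/slack_webhook") when
    running inside Windmill; fall back to an environment variable
    named after the last path segment (uppercased) elsewhere."""
    try:
        import wmill  # available inside Windmill's Python runtime
        return wmill.get_variable(path)
    except Exception:
        return os.environ[path.rsplit("/", 1)[-1].upper()]
```

This keeps the secret out of the script body in both environments: Windmill serves it from its encrypted store, and local runs read it from the shell.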

Wrapping Up

Advanced Windmill self-host setup transforms it from a script runner into a proper internal platform. Git sync means your automation infrastructure is version-controlled and reproducible. Worker groups ensure expensive jobs don't starve fast ones. The App Builder means non-technical team members can trigger operations safely without CLI access. And enterprise workflow patterns — approval gates, parallel processing, retry with backoff, and error handling — are what make Windmill suitable for automations your business actually depends on.

Together with the foundational Windmill guide covering deployment, basic scripts, and simple workflows, these two guides give you a complete picture of running Windmill as serious infrastructure rather than an experiment.


Need a Production Internal Tooling Platform Built for Your Team?

Designing Windmill with proper Git sync, a sound worker group architecture, App Builder tools for your team's specific workflows, and enterprise approval patterns for production operations takes real platform engineering — the sysbrix team builds internal tooling platforms that engineering organizations can rely on for day-to-day operations.

Talk to Us →