Windmill Self-Host Setup: AI-Powered Workflows, Native Integrations, Audit Logging, and Scaling to Production
The first two guides in this series covered the foundational patterns: deploying Windmill and building scripts and basic workflows, then Git sync, worker groups, the App Builder, and enterprise workflow patterns. This third guide goes to production depth: AI-powered workflow steps that call LLMs for classification, extraction, and generation tasks; native Windmill integrations that connect your SaaS tools without writing boilerplate; compliance-grade audit logging for regulated environments; and the performance and scaling patterns that keep Windmill responsive when multiple teams are running jobs simultaneously.
Prerequisites
- A production Windmill instance with PostgreSQL and worker groups configured — see our advanced configuration guide
- Windmill version 1.290+ — AI workflow features covered here require recent releases
- Ollama running for local LLM access, or an OpenAI/LiteLLM API key for cloud models
- At least 4 vCPU and 8GB RAM for production use with AI workflow steps
- Admin access and at least 10 workflows already built — this guide assumes operational familiarity
Check your Windmill deployment is ready for the patterns in this guide:
# Verify Windmill version:
docker exec windmill_server windmill --version
# Check worker groups are running:
docker compose ps | grep worker
# Should show: worker_default, worker_heavy (or your named groups)
# Verify PostgreSQL connection:
docker exec windmill_server bash -c \
  'psql $DATABASE_URL -c "SELECT COUNT(*) AS completed_jobs FROM v_completed_job;"'
# Check Ollama is reachable from Windmill containers:
docker exec windmill_worker_default \
curl -s http://172.17.0.1:11434/api/tags | jq '[.models[].name]'
# List all workspaces:
wmill workspace list
AI-Powered Workflows: LLMs as Workflow Steps
Windmill treats LLM calls like any other script — a function that takes structured input and returns structured output. This means you can chain AI steps with database queries, API calls, and human approval gates in a single workflow. The result is AI automation that's auditable, retryable, and inspectable rather than a black box.
Structured LLM Output with Instructor
The hardest part of using LLMs in automation isn't the API call — it's getting reliable structured output. The instructor library wraps OpenAI (and OpenAI-compatible APIs) to guarantee Pydantic-validated responses:
# u/ai-team/classify_support_ticket.py
# requirements:
#     openai>=1.0.0
#     instructor>=0.5.0
#     pydantic>=2.0.0
from pydantic import BaseModel, Field
from typing import Literal
from enum import Enum
import instructor
from openai import OpenAI

# Use LiteLLM proxy URL or direct OpenAI:
DEFAULT_BASE_URL = "http://172.17.0.1:4000/v1"  # LiteLLM proxy

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class SupportTicketClassification(BaseModel):
    category: Literal["billing", "technical", "account", "feature_request", "other"]
    priority: Priority
    sentiment: Literal["positive", "neutral", "frustrated", "angry"]
    requires_human: bool = Field(
        description="True if this needs human attention, False if AI can handle it"
    )
    summary: str = Field(max_length=200, description="One sentence summary")
    suggested_response: str = Field(
        description="Draft response to send to the customer",
        max_length=500
    )

def main(
    ticket_text: str,
    customer_email: str,
    api_key: str = "$var:OPENAI_API_KEY",
    model: str = "gpt-4o-mini",
    base_url: str = DEFAULT_BASE_URL
) -> dict:
    """
    Classifies a support ticket and generates a draft response.
    Returns structured data safe for downstream workflow steps.
    """
    client = instructor.from_openai(
        OpenAI(api_key=api_key, base_url=base_url)
    )
    result = client.chat.completions.create(
        model=model,
        response_model=SupportTicketClassification,
        messages=[
            {
                "role": "system",
                "content": "You are a support ticket classifier for Acme Corp. "
                           "Classify tickets accurately and draft helpful responses."
            },
            {
                "role": "user",
                "content": f"Customer: {customer_email}\n\nTicket:\n{ticket_text}"
            }
        ],
        max_retries=3  # instructor retries on validation failure
    )
    return {
        "category": result.category,
        "priority": result.priority.value,
        "sentiment": result.sentiment,
        "requires_human": result.requires_human,
        "summary": result.summary,
        "suggested_response": result.suggested_response,
        "customer_email": customer_email
    }
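Because the classifier returns plain structured data, downstream steps can branch on it with ordinary code. The hypothetical `route_ticket` helper below is purely illustrative — it mirrors the branch predicates the triage flow expresses in YAML, which can make the routing rules easier to unit-test:

```python
def route_ticket(classification: dict) -> list[str]:
    """Return which handlers should fire for a classified ticket.

    Mirrors the triage flow's branch predicates: urgent tickets open a
    Jira issue, anything flagged for a human pings Slack, and the rest
    get the AI-drafted auto-response.
    """
    handlers = []
    if classification["priority"] in ("critical", "high"):
        handlers.append("create_urgent_ticket")
    if classification["requires_human"]:
        handlers.append("notify_team")
    else:
        handlers.append("send_response")
    return handlers

# Example: a high-priority ticket that also needs a human
print(route_ticket({"priority": "high", "requires_human": True}))
# → ['create_urgent_ticket', 'notify_team']
```

Note that an urgent ticket can trigger two handlers at once — that is why the flow uses branches that can all fire rather than a single if/else chain.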
Complete AI Support Triage Flow
# f/support/ai_triage_flow.yaml — Complete AI-powered support triage
# This flow: receives ticket → AI classifies → routes to appropriate handler
summary: AI Support Ticket Triage
schema:
  type: object
  properties:
    ticket_text:
      type: string
      description: Full text of the support ticket
    customer_email:
      type: string
      format: email
    ticket_id:
      type: string
  required: [ticket_text, customer_email, ticket_id]
modules:
  # Step 1: AI classification with retry
  - id: classify
    value:
      type: script
      path: u/ai-team/classify_support_ticket
    retry:
      constant:
        attempts: 3
        seconds: 10
  # Step 2: Branch based on AI classification
  - id: route
    value:
      type: branchall  # Run ALL matching branches
      branches:
        # Branch A: High priority → create urgent ticket in Jira
        - summary: Critical/High Priority
          skip_failure: false
          expr: '"critical" === results.classify.priority || "high" === results.classify.priority'
          modules:
            - id: create_urgent_ticket
              value:
                type: script
                path: hub/jira/create_issue  # From Windmill Hub
                input_transforms:
                  project_key: {type: static, value: "SUPPORT"}
                  issue_type: {type: static, value: "Bug"}
                  summary:
                    type: javascript
                    expr: 'results.classify.summary'
                  priority:
                    type: javascript
                    expr: 'results.classify.priority.toUpperCase()'
                  description:
                    type: javascript
                    expr: '`Customer: ${flow_input.customer_email}\n\n${flow_input.ticket_text}`'
        # Branch B: Requires human → slack alert
        - summary: Needs Human Review
          expr: 'results.classify.requires_human === true'
          modules:
            - id: notify_team
              value:
                type: rawscript
                language: python3
                content: |
                  import requests

                  # The $var: default is resolved by Windmill at call time
                  def main(classify_result: dict, ticket_id: str, customer_email: str,
                           slack_webhook: str = "$var:SLACK_WEBHOOK") -> str:
                      requests.post(
                          slack_webhook,
                          json={
                              "text": f"🎫 *Support ticket needs review* (#{ticket_id})\n"
                                      f"Customer: {customer_email}\n"
                                      f"Category: {classify_result['category']} | "
                                      f"Priority: {classify_result['priority']}\n"
                                      f"Summary: {classify_result['summary']}"
                          }
                      )
                      return "notified"
        # Branch C: Auto-handle → send AI draft response
        - summary: Auto-respond
          expr: 'results.classify.requires_human === false'
          modules:
            - id: send_response
              value:
                type: rawscript
                language: python3
                content: |
                  def main(classify_result: dict, customer_email: str) -> dict:
                      # Send the AI-drafted response via email
                      # send_email(to=customer_email, body=classify_result['suggested_response'])
                      return {"sent_to": customer_email, "auto_responded": True}
RAG-Powered Scripts with Local Ollama
# u/ai-team/answer_from_docs.py
# RAG script using local Ollama — no data leaves your infrastructure
# requirements:
#     requests
#     chromadb
#     ollama
import ollama
import chromadb

# ChromaDB stores embeddings on disk — persist across Windmill job runs
DB_PATH = "/tmp/chroma_db"  # Or use a shared volume for persistence

def embed_text(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Generate embedding using local Ollama."""
    response = ollama.embed(model=model, input=text)
    return response["embeddings"][0]

def main(
    question: str,
    collection_name: str = "company-docs",
    top_k: int = 4,
    model: str = "llama3.1:8b"
) -> dict:
    """
    Answer a question using RAG with local embeddings and LLM.
    Requires documents to be pre-indexed in the collection.
    """
    # Connect to ChromaDB:
    client = chromadb.PersistentClient(path=DB_PATH)
    try:
        collection = client.get_collection(collection_name)
    except Exception:
        return {"error": f"Collection '{collection_name}' not found. Run indexing first."}
    # Retrieve relevant chunks:
    query_embedding = embed_text(question)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        include=["documents", "metadatas", "distances"]
    )
    # Filter by relevance threshold (cosine distance < 0.5 = relevant):
    relevant_chunks = [
        (doc, meta, dist)
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0]
        )
        if dist < 0.5
    ]
    if not relevant_chunks:
        return {
            "answer": "I couldn't find relevant information to answer this question.",
            "sources": [],
            "question": question
        }
    # Build context from retrieved chunks:
    context = "\n\n".join([
        f"Source: {meta.get('source', 'unknown')}\n{doc}"
        for doc, meta, _ in relevant_chunks
    ])
    # Generate answer with local LLM:
    response = ollama.chat(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "Answer the question using ONLY the provided context. "
                           "If the context doesn't contain the answer, say so clearly."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return {
        "answer": response["message"]["content"],
        "sources": list({meta.get("source", "unknown") for _, meta, _ in relevant_chunks}),
        "question": question,
        "chunks_used": len(relevant_chunks)
    }
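The script above assumes the collection is already populated. A companion indexer might look like the following sketch — the path `u/ai-team/index_docs.py` is hypothetical, and the fixed-size `chunk_text` helper is a deliberately naive strategy (swap in a sentence- or heading-aware splitter for production). It uses the same `ollama.embed` and ChromaDB calls as the answering script:

```python
# u/ai-team/index_docs.py — hypothetical companion indexer (sketch)
# requirements:
#     chromadb
#     ollama
DB_PATH = "/tmp/chroma_db"  # Must match the path in answer_from_docs.py

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with overlap (naive strategy)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def main(
    documents: list[dict],  # [{"source": "handbook.md", "text": "..."}]
    collection_name: str = "company-docs",
    embed_model: str = "nomic-embed-text",
) -> dict:
    # Imports deferred so chunk_text can be unit-tested without the deps:
    import ollama
    import chromadb
    client = chromadb.PersistentClient(path=DB_PATH)
    collection = client.get_or_create_collection(collection_name)
    total = 0
    for doc in documents:
        for i, chunk in enumerate(chunk_text(doc["text"])):
            embedding = ollama.embed(model=embed_model, input=chunk)["embeddings"][0]
            collection.add(
                ids=[f"{doc['source']}-{i}"],  # stable ID → re-runs overwrite
                embeddings=[embedding],
                documents=[chunk],
                metadatas=[{"source": doc["source"]}],
            )
            total += 1
    return {"collection": collection_name, "chunks_indexed": total}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of slightly more storage.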
Native Windmill Integrations
The Windmill Hub (hub.windmill.dev) provides pre-built, tested integration scripts for dozens of services — GitHub, Slack, Jira, Stripe, Postgres, S3, and more. These are production-tested scripts you can call from your flows without writing the API integration yourself.
Using Hub Scripts in Flows
# Reference hub scripts directly in your flow YAML:
# Format: hub/{service}/{operation}
# Examples of useful hub scripts:
# hub/github/create_issue
# hub/github/list_pull_requests
# hub/slack/send_message
# hub/jira/create_issue
# hub/jira/transition_issue
# hub/stripe/create_customer
# hub/sendgrid/send_email
# hub/postgres/run_query
# hub/s3/upload_file
# hub/notion/create_page
# Flow using hub scripts for a complete GitHub PR notification workflow:
# f/devops/notify_pr_created.yaml
modules:
  # Get PR details from GitHub:
  - id: get_pr
    value:
      type: script
      path: hub/github/get_pull_request
      input_transforms:
        token: {type: static, value: '$var:GITHUB_TOKEN'}
        owner: {type: javascript, expr: 'flow_input.repo_owner'}
        repo: {type: javascript, expr: 'flow_input.repo_name'}
        pull_number: {type: javascript, expr: 'flow_input.pr_number'}
  # Run automated review with AI:
  - id: ai_review
    value:
      type: script
      path: u/ai-team/review_pull_request
      input_transforms:
        title: {type: javascript, expr: 'results.get_pr.title'}
        description: {type: javascript, expr: 'results.get_pr.body'}
        changed_files: {type: javascript, expr: 'results.get_pr.changed_files'}
  # Post AI review as GitHub comment:
  - id: post_comment
    value:
      type: script
      path: hub/github/create_issue_comment
      input_transforms:
        token: {type: static, value: '$var:GITHUB_TOKEN'}
        owner: {type: javascript, expr: 'flow_input.repo_owner'}
        repo: {type: javascript, expr: 'flow_input.repo_name'}
        issue_number: {type: javascript, expr: 'flow_input.pr_number'}
        body: {type: javascript, expr: '"## AI Review\n\n" + results.ai_review.feedback'}
  # Notify team in Slack:
  - id: slack_notify
    value:
      type: script
      path: hub/slack/send_message
      input_transforms:
        token: {type: static, value: '$var:SLACK_BOT_TOKEN'}
        channel: {type: static, value: '#code-review'}
        text:
          type: javascript
          expr: |
            `New PR ready for review: <${results.get_pr.html_url}|${results.get_pr.title}>\n` +
            `Author: ${results.get_pr.user.login}\n` +
            `AI Assessment: ${results.ai_review.recommendation}`
Building a Reusable Integration Library
# Create a folder of thin wrapper scripts that standardize how your team
# interacts with external services — consistent error handling, logging, retry
# f/integrations/send_notification.py
# A unified notification script that routes to the right channel
# requirements:
#     requests
import requests
from typing import Literal, Optional

def main(
    message: str,
    title: str = "",
    channel: Literal["slack", "telegram", "email", "auto"] = "auto",
    severity: Literal["info", "warning", "error", "critical"] = "info",
    slack_webhook: str = "$var:SLACK_OPS_WEBHOOK",
    telegram_token: str = "$var:TELEGRAM_BOT_TOKEN",
    telegram_chat_id: str = "$var:TELEGRAM_OPS_CHAT_ID",
    recipient_email: Optional[str] = None
) -> dict:
    """
    Unified notification sender. Automatically routes critical alerts
    to Telegram (immediate), warnings to Slack, info to email.
    """
    results = []
    # Determine channels based on severity if auto:
    channels = [channel] if channel != "auto" else {
        "info": ["email"],
        "warning": ["slack"],
        "error": ["slack", "telegram"],
        "critical": ["slack", "telegram"]
    }.get(severity, ["slack"])
    emoji = {"info": "ℹ️", "warning": "⚠️", "error": "🔴", "critical": "🚨"}[severity]
    full_message = f"{emoji} {'**' + title + '**' + chr(10) if title else ''}{message}"
    for ch in channels:
        try:
            if ch == "slack":
                resp = requests.post(slack_webhook, json={"text": full_message})
                resp.raise_for_status()
                results.append({"channel": "slack", "status": "sent"})
            elif ch == "telegram":
                resp = requests.post(
                    f"https://api.telegram.org/bot{telegram_token}/sendMessage",
                    json={"chat_id": telegram_chat_id, "text": full_message, "parse_mode": "Markdown"}
                )
                resp.raise_for_status()
                results.append({"channel": "telegram", "status": "sent"})
            elif ch == "email":
                # Stub: wire this to your email-sending script using recipient_email
                results.append({"channel": "email", "status": "skipped",
                                "error": "email delivery not implemented"})
        except Exception as e:
            results.append({"channel": ch, "status": "failed", "error": str(e)})
    all_sent = all(r["status"] == "sent" for r in results)
    return {
        "delivered": all_sent,
        "channels": results,
        "message_preview": full_message[:100]
    }
Compliance Audit Logging
For teams in regulated industries or with internal compliance requirements, every Windmill job execution needs to be auditable: who triggered it, when, with what inputs, what the outcome was, and how long it took. Windmill stores this in PostgreSQL — the compliance work is extracting it systematically and ensuring it's retained appropriately.
Querying the Windmill Audit Trail
-- Windmill audit queries (run against the PostgreSQL database)
-- These queries cover the common compliance evidence requirements

-- 1. All job executions for a specific user in the last 30 days:
SELECT
    j.id,
    j.created_by,
    j.script_path,
    j.flow_path,
    j.created_at,
    j.started_at,
    j.duration_ms,
    j.success,
    j.workspace_id
FROM v_completed_job j
WHERE
    j.created_by = '[email protected]'
    AND j.created_at > NOW() - INTERVAL '30 days'
ORDER BY j.created_at DESC;

-- 2. All executions of a specific script (change audit):
SELECT
    j.created_by,
    j.created_at,
    j.duration_ms,
    j.success,
    j.result->'error' as error_message
FROM v_completed_job j
WHERE
    j.script_path = 'u/devops/deploy_service'
    AND j.created_at > NOW() - INTERVAL '90 days'
ORDER BY j.created_at DESC;

-- 3. Failed jobs with error details for incident investigation:
SELECT
    j.script_path,
    j.flow_path,
    j.created_by,
    j.created_at,
    j.result->>'error' as error,
    j.args as input_parameters
FROM v_completed_job j
WHERE
    j.success = false
    AND j.created_at > NOW() - INTERVAL '7 days'
ORDER BY j.created_at DESC
LIMIT 100;

-- 4. Script version history (what changed and when):
SELECT
    s.path,
    s.created_by as changed_by,
    s.created_at as changed_at,
    s.description,
    length(s.content) as script_size_chars
FROM script s
WHERE s.workspace_id = 'my-workspace'
ORDER BY s.path, s.created_at DESC;

-- 5. Monthly usage summary per user (for cost allocation):
SELECT
    created_by as user,
    COUNT(*) as total_jobs,
    SUM(duration_ms) / 1000.0 as total_seconds,
    COUNT(*) FILTER (WHERE success = false) as failed_jobs,
    COUNT(DISTINCT script_path) as distinct_scripts_used
FROM v_completed_job
WHERE created_at >= DATE_TRUNC('month', NOW())
GROUP BY created_by
ORDER BY total_jobs DESC;
Automated Compliance Report Generator
# u/compliance/generate_monthly_audit_report.py
# Generates a monthly Windmill audit report for compliance teams
# Schedule: 0 8 1 * * (1st of each month at 8am)
# requirements:
#     psycopg2-binary
#     python-dateutil
import psycopg2
from datetime import datetime, date
from dateutil.relativedelta import relativedelta

def main(
    database_url: str = "$var:WINDMILL_DATABASE_URL",
    workspace: str = "my-workspace",
    report_month_offset: int = 1  # 1 = last month, 0 = current month
) -> dict:
    """
    Generates a compliance audit report for the specified month.
    Returns structured data suitable for email or storage.
    """
    today = date.today()
    report_start = (today - relativedelta(months=report_month_offset)).replace(day=1)
    report_end = report_start + relativedelta(months=1)
    period_label = report_start.strftime("%B %Y")
    conn = psycopg2.connect(database_url)
    cur = conn.cursor()
    # Total execution summary:
    cur.execute("""
        SELECT
            COUNT(*) as total,
            SUM(CASE WHEN success THEN 1 ELSE 0 END) as successful,
            SUM(CASE WHEN NOT success THEN 1 ELSE 0 END) as failed,
            ROUND(AVG(duration_ms)) as avg_duration_ms,
            COUNT(DISTINCT created_by) as active_users
        FROM v_completed_job
        WHERE workspace_id = %s
          AND created_at >= %s AND created_at < %s
    """, (workspace, report_start, report_end))
    summary = dict(zip([d[0] for d in cur.description], cur.fetchone()))
    # Top 10 most-used scripts:
    cur.execute("""
        SELECT script_path, COUNT(*) as runs,
               SUM(CASE WHEN NOT success THEN 1 ELSE 0 END) as failures
        FROM v_completed_job
        WHERE workspace_id = %s AND script_path IS NOT NULL
          AND created_at >= %s AND created_at < %s
        GROUP BY script_path
        ORDER BY runs DESC LIMIT 10
    """, (workspace, report_start, report_end))
    cols = [d[0] for d in cur.description]
    top_scripts = [dict(zip(cols, row)) for row in cur.fetchall()]
    # Scripts modified during the period (change log):
    cur.execute("""
        SELECT DISTINCT ON (path)
            path, created_by as modified_by,
            created_at as modified_at, description
        FROM script
        WHERE workspace_id = %s
          AND created_at >= %s AND created_at < %s
        ORDER BY path, created_at DESC
    """, (workspace, report_start, report_end))
    cols = [d[0] for d in cur.description]
    script_changes = [dict(zip(cols, row)) for row in cur.fetchall()]
    # Users who executed sensitive scripts (flag based on path prefix).
    # Note the doubled %% — psycopg2 requires literal % to be escaped
    # whenever parameters are passed to execute():
    cur.execute("""
        SELECT created_by, script_path, created_at, success
        FROM v_completed_job
        WHERE workspace_id = %s
          AND (script_path LIKE '%%/deploy%%' OR script_path LIKE '%%/delete%%'
               OR script_path LIKE '%%/rotate%%' OR script_path LIKE '%%/admin%%')
          AND created_at >= %s AND created_at < %s
        ORDER BY created_at DESC
    """, (workspace, report_start, report_end))
    cols = [d[0] for d in cur.description]
    sensitive_ops = [dict(zip(cols, row)) for row in cur.fetchall()]
    conn.close()
    report = {
        "report_type": "Windmill Monthly Audit Report",
        "period": period_label,
        "workspace": workspace,
        "generated_at": datetime.utcnow().isoformat(),
        "execution_summary": summary,
        "top_scripts": top_scripts,
        "script_changes": script_changes,
        "sensitive_operations": sensitive_ops,
        "sensitive_ops_count": len(sensitive_ops)
    }
    return report
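To turn that dict into something a compliance officer can actually read, a small formatter can render the report as Markdown for email or Slack delivery. A sketch, assuming the report shape produced above — `format_report_md` is a hypothetical helper, not a Windmill built-in:

```python
def format_report_md(report: dict) -> str:
    """Render the audit report dict as a Markdown summary."""
    s = report["execution_summary"]
    lines = [
        f"# {report['report_type']}: {report['period']}",
        f"Workspace: `{report['workspace']}` | Generated: {report['generated_at']}",
        "",
        "## Execution summary",
        f"- Total jobs: {s['total']} ({s['failed']} failed)",
        f"- Active users: {s['active_users']}",
        f"- Avg duration: {s['avg_duration_ms']} ms",
        "",
        f"## Sensitive operations ({report['sensitive_ops_count']})",
    ]
    # Cap the listing so a noisy month doesn't produce an unreadable email:
    for op in report["sensitive_operations"][:20]:
        lines.append(
            f"- {op['created_at']}: `{op['script_path']}` by {op['created_by']}"
            f" ({'ok' if op['success'] else 'FAILED'})"
        )
    return "\n".join(lines)
```

Chain it as a second flow step after the generator, then hand the string to a hub email script or the unified notification script from earlier.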
Scaling Windmill for Production Workloads
Database Performance Tuning for High Job Volume
# PostgreSQL tuning for Windmill at scale (100+ jobs/hour)
# Add to postgresql.conf or docker-compose environment:
# For 4GB RAM PostgreSQL dedicated to Windmill:
# shared_buffers = 1GB
# effective_cache_size = 3GB
# work_mem = 16MB
# maintenance_work_mem = 256MB
# max_connections = 200
# wal_level = minimal            # only valid with max_wal_senders = 0; rules out WAL archiving and replicas
# Monitor Windmill's PostgreSQL usage:
docker exec windmill_db psql -U windmill windmill << 'EOF'
-- Jobs per hour over the last 24 hours:
SELECT
    DATE_TRUNC('hour', created_at) as hour,
    COUNT(*) as jobs,
    AVG(duration_ms)::int as avg_ms
FROM v_completed_job
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;

-- Check job queue depth (jobs waiting for workers):
SELECT tag, COUNT(*) as queued
FROM queue
GROUP BY tag
ORDER BY queued DESC;

-- Identify slowest scripts (p95 by script):
SELECT
    script_path,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) as p95_ms,
    COUNT(*) as runs
FROM v_completed_job
WHERE created_at > NOW() - INTERVAL '7 days'
  AND script_path IS NOT NULL
GROUP BY script_path
HAVING COUNT(*) > 10
ORDER BY p95_ms DESC
LIMIT 20;
EOF
Horizontal Scaling: Running Workers on Multiple Servers
# Running Windmill workers on a separate, more powerful server
# The workers connect to the same PostgreSQL database but run on different hardware
# On the WORKER SERVER (different machine from the Windmill server):
cat > docker-compose.worker.yml << 'EOF'
version: '3.8'
services:
  # Default worker on worker server:
  worker_default_remote:
    image: ghcr.io/windmill-labs/windmill:main
    restart: unless-stopped
    environment:
      DATABASE_URL: postgresql://windmill:${POSTGRES_PASSWORD}@DB_SERVER_IP:5432/windmill
      MODE: worker
      WORKER_GROUP: default
      NUM_WORKERS: 16
      SLEEP_QUEUE: 50
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        limits:
          cpus: '8.0'
          memory: 8G
  # GPU worker for ML workloads (if the worker server has GPU):
  worker_gpu:
    image: ghcr.io/windmill-labs/windmill:main
    restart: unless-stopped
    environment:
      DATABASE_URL: postgresql://windmill:${POSTGRES_PASSWORD}@DB_SERVER_IP:5432/windmill
      MODE: worker
      WORKER_GROUP: gpu
      NUM_WORKERS: 2
      SLEEP_QUEUE: 200
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
# Start workers on the remote server:
docker compose -f docker-compose.worker.yml up -d
# Monitor worker distribution:
wmill run u/admin/worker_health_check
# Should show workers from both servers in the worker list
# In Windmill admin UI, go to:
# Workers → should show workers from both servers
# Jobs → can filter by worker to see distribution
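To check the distribution programmatically, you can query the `worker_ping` table that workers heartbeat into — verify the table and column names against your Windmill version, as the schema evolves between releases. A sketch, with the grouping logic split out so it is easy to test; the script path is hypothetical:

```python
# u/admin/list_workers.py — hypothetical helper; assumes the worker_ping table
# requirements:
#     psycopg2-binary
from collections import defaultdict

def summarize_workers(rows: list[tuple]) -> dict:
    """Group (worker, worker_instance, worker_group) rows by host/instance."""
    by_host = defaultdict(list)
    for worker, instance, group in rows:
        by_host[instance].append({"worker": worker, "group": group})
    return dict(by_host)

def main(database_url: str = "$var:WINDMILL_DATABASE_URL") -> dict:
    import psycopg2  # deferred so summarize_workers has no dependencies
    conn = psycopg2.connect(database_url)
    cur = conn.cursor()
    # Workers that pinged in the last 5 minutes are considered alive:
    cur.execute("""
        SELECT worker, worker_instance, worker_group
        FROM worker_ping
        WHERE ping_at > NOW() - INTERVAL '5 minutes'
    """)
    rows = cur.fetchall()
    conn.close()
    return summarize_workers(rows)
```

If the result shows all workers on one host after you added a second server, the usual culprit is the remote workers failing to reach PostgreSQL — check their logs for connection errors.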
Tips, Gotchas, and Troubleshooting
AI Workflow Steps Failing Inconsistently
# Intermittent failures in AI steps usually have three causes:
# 1. LLM response didn't match expected schema (structured output failures)
# Fix: Use instructor library with max_retries=3
# The library automatically retries with the validation error as feedback
# 2. LLM API rate limit hit
# Symptom: RateLimitError or 429 in job logs
docker compose logs worker_default | grep -i 'rate_limit\|429\|ratelimit'
# Fix: Add retry with backoff in your flow:
#   retry:
#     exponential:
#       attempts: 5
#       multiplier: 2
#       seconds: 5   # First retry: 5s, then 10s, 20s, 40s, 80s
# 3. Ollama timeout for large models
# Symptom: TimeoutError after ~30 seconds
# Fix: Increase timeout in the Ollama HTTP call:
import requests

response = requests.post(
    "http://172.17.0.1:11434/api/chat",
    json={"model": "llama3.1:70b", "messages": messages},
    timeout=300  # 5 minutes for large models
)
# Check Ollama is loaded and responding:
docker exec windmill_worker_default curl -s http://172.17.0.1:11434/api/tags | jq '.models | length'
# Should return number of loaded models; 0 means Ollama is running but no models pulled
Compliance Report Missing Jobs
# If the audit report seems to miss recent jobs:
# 1. Check if jobs are being pruned too aggressively:
docker exec windmill_server env | grep -i 'prune\|retention\|job_retention'
# Windmill retains completed jobs based on configuration
# Default retention is typically 30-90 days depending on version
# For compliance, extend retention:
# In docker-compose.yml environment:
# JOB_RETENTION_SECS=7776000 # 90 days
# Or for Windmill EE:
# JOB_RETENTION_SECS=31536000 # 1 year
# 2. Check the v_completed_job view covers the right time range:
docker exec windmill_db psql -U windmill windmill \
-c "SELECT MIN(created_at), MAX(created_at), COUNT(*) FROM v_completed_job;"
# 3. Ensure the audit query workspace filter matches exactly:
# 'my-workspace' vs 'my_workspace' (case sensitive, no spaces)
docker exec windmill_db psql -U windmill windmill \
-c "SELECT DISTINCT workspace_id FROM v_completed_job;"
# 4. Export to external storage for long-term compliance retention:
# (Windmill's internal DB isn't designed for 5+ year retention)
docker exec windmill_db pg_dump \
-U windmill \
--table v_completed_job \
windmill | gzip > compliance-jobs-$(date +%Y-%m).sql.gz
High Memory Usage from Multiple AI Workers
# AI workflow steps that use large models can leak memory between jobs
# Monitor per-worker memory:
docker stats --no-stream | grep worker | sort -k4 -h
# If a worker is using > 80% of its limit, the job is likely still holding model state
# Solution 1: Restart workers on a schedule (crude but effective):
# Add to crontab: 0 4 * * * docker restart worker_heavy_ai
# Solution 2: Set NODE_OPTIONS memory limit for each worker:
# In docker-compose.yml worker environment:
# NODE_OPTIONS=--max-old-space-size=4096 # Cap Node.js heap at 4GB
# Solution 3: Use a dedicated AI worker with a memory healthcheck:
# (Note: plain docker compose does not restart unhealthy containers on its own —
# pair the healthcheck with an autoheal sidecar or an external supervisor.)
worker_ai:
  image: ghcr.io/windmill-labs/windmill:main
  restart: unless-stopped
  environment:
    MODE: worker
    WORKER_GROUP: ai
    NUM_WORKERS: 2
    # plus DATABASE_URL, as for the other workers
  deploy:
    resources:
      limits:
        memory: 6G
  # Flag the container unhealthy when memory use passes 85%
  # ($$ escapes $ in compose files):
  healthcheck:
    test: ["CMD", "sh", "-c", "free | awk '/Mem:/{if ($$3/$$2 > 0.85) exit 1; exit 0}'"]
    interval: 30s
    timeout: 5s
    retries: 2
    start_period: 30s
Pro Tips
- Use Windmill's built-in AI assist for script writing — the Windmill editor has a built-in AI that generates script boilerplate from a natural language description. It understands Windmill's resource types and input/output patterns. Use it to generate the scaffold, then edit for your specific logic — 10x faster than starting from scratch.
- Build a script testing framework before production — create a test flow that calls each of your critical scripts with known inputs and validates the output format. Run this test flow in CI after every Git push to catch regressions before they hit production workflows.
- Use Windmill's schedule metadata for documentation — every scheduled flow accepts a description and tags. Fill these out: "Runs daily at 2am to sync customer data from Salesforce to the warehouse. Owned by data team. Escalation: @alice." When something breaks at 3am, the on-call doesn't need to find the original developer.
- Separate workspace per team for isolation — in large organizations, create separate Windmill workspaces per team (engineering, data, ops). Teams share the same Windmill instance but have isolated script libraries, secrets, and permissions. Prevents "helpful" cross-team modifications that break each other's flows.
- Monitor the job queue depth as your primary scaling signal — if the Windmill job queue consistently has depth > 0 (jobs waiting for a worker), you need more workers. The queue depth is the most actionable metric for capacity planning: it directly tells you whether your worker capacity matches your job volume.
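That last tip is easy to automate: a small scheduled script can poll the queue and alert when any tag's depth exceeds a threshold. A sketch — the `queue` table matches the monitoring query earlier in this guide, the script path and 5-minute schedule are suggestions, and the threshold logic is split out for testing:

```python
# u/admin/queue_depth_alert.py — hypothetical scheduled check (e.g. */5 * * * *)
# requirements:
#     psycopg2-binary
#     requests

def needs_more_workers(depths: dict[str, int], threshold: int = 5) -> list[str]:
    """Return the worker tags whose queue depth exceeds the threshold."""
    return sorted(tag for tag, depth in depths.items() if depth > threshold)

def main(
    database_url: str = "$var:WINDMILL_DATABASE_URL",
    slack_webhook: str = "$var:SLACK_OPS_WEBHOOK",
    threshold: int = 5,
) -> dict:
    # Imports deferred so the pure threshold logic has no dependencies:
    import psycopg2
    import requests
    conn = psycopg2.connect(database_url)
    cur = conn.cursor()
    cur.execute("SELECT tag, COUNT(*) FROM queue GROUP BY tag")
    depths = dict(cur.fetchall())
    conn.close()
    hot = needs_more_workers(depths, threshold)
    if hot:
        requests.post(slack_webhook, json={
            "text": f"⚠️ Windmill queue backing up for tags: {', '.join(hot)}"
        })
    return {"depths": depths, "over_threshold": hot}
```

If the alert fires repeatedly for the same tag, add workers to that worker group rather than raising the threshold.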
Wrapping Up
This third Windmill guide completes the series. Together they cover the complete arc: deploying Windmill and building your first scripts and flows; Git sync, worker groups, the App Builder, and enterprise workflow patterns; and this guide's AI-powered workflow steps, native integrations, compliance audit logging, and production scaling.
The AI workflow patterns are where Windmill starts to pull ahead of traditional automation platforms — not just running scripts, but orchestrating AI classification, extraction, and generation as first-class workflow steps with all the observability, retry logic, and audit trail that production automation requires. Build the foundation from the first two guides, then layer on these capabilities when your team is ready for them.
Need an Enterprise Internal Tooling Platform Built and Maintained?
Designing Windmill for multiple teams — with proper Git sync, AI workflow integration, compliance audit logging, worker group architecture, and scaling to production job volumes — is a significant undertaking. The sysbrix team builds and maintains internal tooling platforms that engineering organizations can rely on, not just deploy and hope for the best.
Talk to Us →