Windmill Self-Host Setup: AI-Powered Workflows, Native Integrations, Audit Logging, and Scaling to Production
The first two guides in this series covered the foundational patterns: deploying Windmill and building scripts and basic workflows, then Git sync, worker groups, the App Builder, and enterprise workflow patterns. This third guide goes to production depth: AI-powered workflow steps that call LLMs for classification, extraction, and generation tasks; native Windmill integrations that connect your SaaS tools without writing boilerplate; compliance-grade audit logging for regulated environments; and the performance and scaling patterns that keep Windmill responsive when multiple teams are running jobs simultaneously.
Prerequisites
- A production Windmill instance with PostgreSQL and worker groups configured — see our advanced configuration guide
- Windmill version 1.290+ — AI workflow features covered here require recent releases
- Ollama running for local LLM access, or an OpenAI/LiteLLM API key for cloud models
- At least 4 vCPU and 8GB RAM for production use with AI workflow steps
- Admin access and at least 10 workflows already built — this guide assumes operational familiarity
Check your Windmill deployment is ready for the patterns in this guide:
# Verify Windmill version:
docker exec windmill_server windmill --version
# Check worker groups are running:
docker compose ps | grep worker
# Should show: worker_default, worker_heavy (or your named groups)
# Verify PostgreSQL connection:
docker exec windmill_server bash -c \
  'psql $DATABASE_URL -c "SELECT COUNT(*) AS completed_jobs FROM v_completed_job;"'
# Check Ollama is reachable from Windmill containers:
docker exec windmill_worker_default \
curl -s http://172.17.0.1:11434/api/tags | jq '[.models[].name]'
# List all workspaces:
wmill workspace list
AI-Powered Workflows: LLMs as Workflow Steps
Windmill treats LLM calls like any other script — a function that takes structured input and returns structured output. This means you can chain AI steps with database queries, API calls, and human approval gates in a single workflow. The result is AI automation that's auditable, retryable, and inspectable rather than a black box.
Structured LLM Output with Instructor
The hardest part of using LLMs in automation isn't the API call — it's getting reliable structured output. The instructor library wraps OpenAI (and OpenAI-compatible APIs) to guarantee Pydantic-validated responses:
# u/ai-team/classify_support_ticket.py
# requirements:
#     openai>=1.0.0
#     instructor>=0.5.0
#     pydantic>=2.0.0
from pydantic import BaseModel, Field
from typing import Literal
from enum import Enum
import instructor
from openai import OpenAI

# Use LiteLLM proxy URL or direct OpenAI:
DEFAULT_BASE_URL = "http://172.17.0.1:4000/v1"  # LiteLLM proxy

class Priority(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class SupportTicketClassification(BaseModel):
    category: Literal["billing", "technical", "account", "feature_request", "other"]
    priority: Priority
    sentiment: Literal["positive", "neutral", "frustrated", "angry"]
    requires_human: bool = Field(
        description="True if this needs human attention, False if AI can handle it"
    )
    summary: str = Field(max_length=200, description="One sentence summary")
    suggested_response: str = Field(
        description="Draft response to send to the customer",
        max_length=500
    )

def main(
    ticket_text: str,
    customer_email: str,
    api_key: str = "$var:OPENAI_API_KEY",
    model: str = "gpt-4o-mini",
    base_url: str = DEFAULT_BASE_URL
) -> dict:
    """
    Classifies a support ticket and generates a draft response.
    Returns structured data safe for downstream workflow steps.
    """
    client = instructor.from_openai(
        OpenAI(api_key=api_key, base_url=base_url)
    )
    result = client.chat.completions.create(
        model=model,
        response_model=SupportTicketClassification,
        messages=[
            {
                "role": "system",
                "content": "You are a support ticket classifier for Acme Corp. "
                           "Classify tickets accurately and draft helpful responses."
            },
            {
                "role": "user",
                "content": f"Customer: {customer_email}\n\nTicket:\n{ticket_text}"
            }
        ],
        max_retries=3  # instructor retries on validation failure
    )
    return {
        "category": result.category,
        "priority": result.priority.value,
        "sentiment": result.sentiment,
        "requires_human": result.requires_human,
        "summary": result.summary,
        "suggested_response": result.suggested_response,
        "customer_email": customer_email
    }
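Because the classifier returns plain structured data, downstream steps can branch on it with ordinary code. The hypothetical `route_ticket` helper below is purely illustrative — it mirrors the branch predicates the triage flow expresses in YAML, which can make the routing rules easier to unit-test:

```python
def route_ticket(classification: dict) -> list[str]:
    """Return which handlers should fire for a classified ticket.

    Mirrors the triage flow's branch predicates: urgent tickets open a
    Jira issue, anything flagged for a human pings Slack, and the rest
    get the AI-drafted auto-response.
    """
    handlers = []
    if classification["priority"] in ("critical", "high"):
        handlers.append("create_urgent_ticket")
    if classification["requires_human"]:
        handlers.append("notify_team")
    else:
        handlers.append("send_response")
    return handlers

# Example: a high-priority ticket that also needs a human
print(route_ticket({"priority": "high", "requires_human": True}))
# → ['create_urgent_ticket', 'notify_team']
```

Note that an urgent ticket can trigger two handlers at once — that is why the flow uses branches that can all fire rather than a single if/else chain.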
Complete AI Support Triage Flow
# f/support/ai_triage_flow.yaml — Complete AI-powered support triage
# This flow: receives ticket → AI classifies → routes to appropriate handler
summary: AI Support Ticket Triage
schema:
  type: object
  properties:
    ticket_text:
      type: string
      description: Full text of the support ticket
    customer_email:
      type: string
      format: email
    ticket_id:
      type: string
  required: [ticket_text, customer_email, ticket_id]
modules:
  # Step 1: AI classification with retry
  - id: classify
    value:
      type: script
      path: u/ai-team/classify_support_ticket
    retry:
      constant:
        attempts: 3
        seconds: 10
  # Step 2: Branch based on AI classification
  - id: route
    value:
      type: branchall  # Run ALL matching branches
      branches:
        # Branch A: High priority → create urgent ticket in Jira
        - summary: Critical/High Priority
          skip_failure: false
          expr: '"critical" === results.classify.priority || "high" === results.classify.priority'
          modules:
            - id: create_urgent_ticket
              value:
                type: script
                path: hub/jira/create_issue  # From Windmill Hub
                input_transforms:
                  project_key: {type: static, value: "SUPPORT"}
                  issue_type: {type: static, value: "Bug"}
                  summary:
                    type: javascript
                    expr: 'results.classify.summary'
                  priority:
                    type: javascript
                    expr: 'results.classify.priority.toUpperCase()'
                  description:
                    type: javascript
                    expr: '`Customer: ${flow_input.customer_email}\n\n${flow_input.ticket_text}`'
        # Branch B: Requires human → slack alert
        - summary: Needs Human Review
          expr: 'results.classify.requires_human === true'
          modules:
            - id: notify_team
              value:
                type: rawscript
                language: python3
                content: |
                  import requests

                  # The $var: default is resolved by Windmill at call time
                  def main(classify_result: dict, ticket_id: str, customer_email: str,
                           slack_webhook: str = "$var:SLACK_WEBHOOK") -> str:
                      requests.post(
                          slack_webhook,
                          json={
                              "text": f"🎫 *Support ticket needs review* (#{ticket_id})\n"
                                      f"Customer: {customer_email}\n"
                                      f"Category: {classify_result['category']} | "
                                      f"Priority: {classify_result['priority']}\n"
                                      f"Summary: {classify_result['summary']}"
                          }
                      )
                      return "notified"
        # Branch C: Auto-handle → send AI draft response
        - summary: Auto-respond
          expr: 'results.classify.requires_human === false'
          modules:
            - id: send_response
              value:
                type: rawscript
                language: python3
                content: |
                  def main(classify_result: dict, customer_email: str) -> dict:
                      # Send the AI-drafted response via email
                      # send_email(to=customer_email, body=classify_result['suggested_response'])
                      return {"sent_to": customer_email, "auto_responded": True}
RAG-Powered Scripts with Local Ollama
# u/ai-team/answer_from_docs.py
# RAG script using local Ollama — no data leaves your infrastructure
# requirements:
#     requests
#     chromadb
#     ollama
import ollama
import chromadb

# ChromaDB stores embeddings on disk — persist across Windmill job runs
DB_PATH = "/tmp/chroma_db"  # Or use a shared volume for persistence

def embed_text(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Generate embedding using local Ollama."""
    response = ollama.embed(model=model, input=text)
    return response["embeddings"][0]

def main(
    question: str,
    collection_name: str = "company-docs",
    top_k: int = 4,
    model: str = "llama3.1:8b"
) -> dict:
    """
    Answer a question using RAG with local embeddings and LLM.
    Requires documents to be pre-indexed in the collection.
    """
    # Connect to ChromaDB:
    client = chromadb.PersistentClient(path=DB_PATH)
    try:
        collection = client.get_collection(collection_name)
    except Exception:
        return {"error": f"Collection '{collection_name}' not found. Run indexing first."}
    # Retrieve relevant chunks:
    query_embedding = embed_text(question)
    results = collection.query(
        query_embeddings=[query_embedding],
        n_results=top_k,
        include=["documents", "metadatas", "distances"]
    )
    # Filter by relevance threshold (cosine distance < 0.5 = relevant):
    relevant_chunks = [
        (doc, meta, dist)
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0]
        )
        if dist < 0.5
    ]
    if not relevant_chunks:
        return {
            "answer": "I couldn't find relevant information to answer this question.",
            "sources": [],
            "question": question
        }
    # Build context from retrieved chunks:
    context = "\n\n".join([
        f"Source: {meta.get('source', 'unknown')}\n{doc}"
        for doc, meta, _ in relevant_chunks
    ])
    # Generate answer with local LLM:
    response = ollama.chat(
        model=model,
        messages=[
            {
                "role": "system",
                "content": "Answer the question using ONLY the provided context. "
                           "If the context doesn't contain the answer, say so clearly."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {question}"
            }
        ]
    )
    return {
        "answer": response["message"]["content"],
        "sources": list({meta.get("source", "unknown") for _, meta, _ in relevant_chunks}),
        "question": question,
        "chunks_used": len(relevant_chunks)
    }
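The script above assumes the collection is already populated. A companion indexer might look like the following sketch — the path `u/ai-team/index_docs.py` is hypothetical, and the fixed-size `chunk_text` helper is a deliberately naive strategy (swap in a sentence- or heading-aware splitter for production). It uses the same `ollama.embed` and ChromaDB calls as the answering script:

```python
# u/ai-team/index_docs.py — hypothetical companion indexer (sketch)
# requirements:
#     chromadb
#     ollama
DB_PATH = "/tmp/chroma_db"  # Must match the path in answer_from_docs.py

def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with overlap (naive strategy)."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def main(
    documents: list[dict],  # [{"source": "handbook.md", "text": "..."}]
    collection_name: str = "company-docs",
    embed_model: str = "nomic-embed-text",
) -> dict:
    # Imports deferred so chunk_text can be unit-tested without the deps:
    import ollama
    import chromadb
    client = chromadb.PersistentClient(path=DB_PATH)
    collection = client.get_or_create_collection(collection_name)
    total = 0
    for doc in documents:
        for i, chunk in enumerate(chunk_text(doc["text"])):
            embedding = ollama.embed(model=embed_model, input=chunk)["embeddings"][0]
            collection.add(
                ids=[f"{doc['source']}-{i}"],  # stable ID → re-runs overwrite
                embeddings=[embedding],
                documents=[chunk],
                metadatas=[{"source": doc["source"]}],
            )
            total += 1
    return {"collection": collection_name, "chunks_indexed": total}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk, at the cost of slightly more storage.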
Native Windmill Integrations
The Windmill Hub (hub.windmill.dev) provides pre-built, tested integration scripts for dozens of services — GitHub, Slack, Jira, Stripe, Postgres, S3, and more. These are production-tested scripts you can call from your flows without writing the API integration yourself.
Using Hub Scripts in Flows
# Reference hub scripts directly in your flow YAML:
# Format: hub/{service}/{operation}
# Examples of useful hub scripts:
# hub/github/create_issue
# hub/github/list_pull_requests
# hub/slack/send_message
# hub/jira/create_issue
# hub/jira/transition_issue
# hub/stripe/create_customer
# hub/sendgrid/send_email
# hub/postgres/run_query
# hub/s3/upload_file
# hub/notion/create_page
# Flow using hub scripts for a complete GitHub PR notification workflow:
# f/devops/notify_pr_created.yaml
modules:
  # Get PR details from GitHub:
  - id: get_pr
    value:
      type: script
      path: hub/github/get_pull_request
      input_transforms:
        token: {type: static, value: '$var:GITHUB_TOKEN'}
        owner: {type: javascript, expr: 'flow_input.repo_owner'}
        repo: {type: javascript, expr: 'flow_input.repo_name'}
        pull_number: {type: javascript, expr: 'flow_input.pr_number'}
  # Run automated review with AI:
  - id: ai_review
    value:
      type: script
      path: u/ai-team/review_pull_request
      input_transforms:
        title: {type: javascript, expr: 'results.get_pr.title'}
        description: {type: javascript, expr: 'results.get_pr.body'}
        changed_files: {type: javascript, expr: 'results.get_pr.changed_files'}
  # Post AI review as GitHub comment:
  - id: post_comment
    value:
      type: script
      path: hub/github/create_issue_comment
      input_transforms:
        token: {type: static, value: '$var:GITHUB_TOKEN'}
        owner: {type: javascript, expr: 'flow_input.repo_owner'}
        repo: {type: javascript, expr: 'flow_input.repo_name'}
        issue_number: {type: javascript, expr: 'flow_input.pr_number'}
        body: {type: javascript, expr: '"## AI Review\n\n" + results.ai_review.feedback'}
  # Notify team in Slack:
  - id: slack_notify
    value:
      type: script
      path: hub/slack/send_message
      input_transforms:
        token: {type: static, value: '$var:SLACK_BOT_TOKEN'}
        channel: {type: static, value: '#code-review'}
        text:
          type: javascript
          expr: |
            `New PR ready for review: <${results.get_pr.html_url}|${results.get_pr.title}>\n` +
            `Author: ${results.get_pr.user.login}\n` +
            `AI Assessment: ${results.ai_review.recommendation}`
Building a Reusable Integration Library
# Create a folder of thin wrapper scripts that standardize how your team
# interacts with external services — consistent error handling, logging, retry
# f/integrations/send_notification.py
# A unified notification script that routes to the right channel
# requirements:
#     requests
import requests
from typing import Literal, Optional

def main(
    message: str,
    title: str = "",
    channel: Literal["slack", "telegram", "email", "auto"] = "auto",
    severity: Literal["info", "warning", "error", "critical"] = "info",
    slack_webhook: str = "$var:SLACK_OPS_WEBHOOK",
    telegram_token: str = "$var:TELEGRAM_BOT_TOKEN",
    telegram_chat_id: str = "$var:TELEGRAM_OPS_CHAT_ID",
    recipient_email: Optional[str] = None
) -> dict:
    """
    Unified notification sender. Automatically routes critical alerts
    to Telegram (immediate), warnings to Slack, info to email.
    """
    results = []
    # Determine channels based on severity if auto:
    channels = [channel] if channel != "auto" else {
        "info": ["email"],
        "warning": ["slack"],
        "error": ["slack", "telegram"],
        "critical": ["slack", "telegram"]
    }.get(severity, ["slack"])
    emoji = {"info": "ℹ️", "warning": "⚠️", "error": "🔴", "critical": "🚨"}[severity]
    full_message = f"{emoji} {'**' + title + '**' + chr(10) if title else ''}{message}"
    for ch in channels:
        try:
            if ch == "slack":
                resp = requests.post(slack_webhook, json={"text": full_message})
                resp.raise_for_status()
                results.append({"channel": "slack", "status": "sent"})
            elif ch == "telegram":
                resp = requests.post(
                    f"https://api.telegram.org/bot{telegram_token}/sendMessage",
                    json={"chat_id": telegram_chat_id, "text": full_message, "parse_mode": "Markdown"}
                )
                resp.raise_for_status()
                results.append({"channel": "telegram", "status": "sent"})
            elif ch == "email":
                # Stub: wire this to your email-sending script using recipient_email
                results.append({"channel": "email", "status": "skipped",
                                "error": "email delivery not implemented"})
        except Exception as e:
            results.append({"channel": ch, "status": "failed", "error": str(e)})
    all_sent = all(r["status"] == "sent" for r in results)
    return {
        "delivered": all_sent,
        "channels": results,
        "message_preview": full_message[:100]
    }
Compliance Audit Logging
For teams in regulated industries or with internal compliance requirements, every Windmill job execution needs to be auditable: who triggered it, when, with what inputs, what the outcome was, and how long it took. Windmill stores this in PostgreSQL — the compliance work is extracting it systematically and ensuring it's retained appropriately.
Querying the Windmill Audit Trail
-- Windmill audit queries (run against the PostgreSQL database)
-- These queries cover the common compliance evidence requirements

-- 1. All job executions for a specific user in the last 30 days:
SELECT
    j.id,
    j.created_by,
    j.script_path,
    j.flow_path,
    j.created_at,
    j.started_at,
    j.duration_ms,
    j.success,
    j.workspace_id
FROM v_completed_job j
WHERE
    j.created_by = '[email protected]'
    AND j.created_at > NOW() - INTERVAL '30 days'
ORDER BY j.created_at DESC;

-- 2. All executions of a specific script (change audit):
SELECT
    j.created_by,
    j.created_at,
    j.duration_ms,
    j.success,
    j.result->'error' as error_message
FROM v_completed_job j
WHERE
    j.script_path = 'u/devops/deploy_service'
    AND j.created_at > NOW() - INTERVAL '90 days'
ORDER BY j.created_at DESC;

-- 3. Failed jobs with error details for incident investigation:
SELECT
    j.script_path,
    j.flow_path,
    j.created_by,
    j.created_at,
    j.result->>'error' as error,
    j.args as input_parameters
FROM v_completed_job j
WHERE
    j.success = false
    AND j.created_at > NOW() - INTERVAL '7 days'
ORDER BY j.created_at DESC
LIMIT 100;

-- 4. Script version history (what changed and when):
SELECT
    s.path,
    s.created_by as changed_by,
    s.created_at as changed_at,
    s.description,
    length(s.content) as script_size_chars
FROM script s
WHERE s.workspace_id = 'my-workspace'
ORDER BY s.path, s.created_at DESC;

-- 5. Monthly usage summary per user (for cost allocation):
SELECT
    created_by as user,
    COUNT(*) as total_jobs,
    SUM(duration_ms) / 1000.0 as total_seconds,
    COUNT(*) FILTER (WHERE success = false) as failed_jobs,
    COUNT(DISTINCT script_path) as distinct_scripts_used
FROM v_completed_job
WHERE created_at >= DATE_TRUNC('month', NOW())
GROUP BY created_by
ORDER BY total_jobs DESC;
Automated Compliance Report Generator
# u/compliance/generate_monthly_audit_report.py
# Generates a monthly Windmill audit report for compliance teams
# Schedule: 0 8 1 * * (1st of each month at 8am)
# requirements:
#     psycopg2-binary
#     python-dateutil
import psycopg2
from datetime import datetime, date
from dateutil.relativedelta import relativedelta

def main(
    database_url: str = "$var:WINDMILL_DATABASE_URL",
    workspace: str = "my-workspace",
    report_month_offset: int = 1  # 1 = last month, 0 = current month
) -> dict:
    """
    Generates a compliance audit report for the specified month.
    Returns structured data suitable for email or storage.
    """
    today = date.today()
    report_start = (today - relativedelta(months=report_month_offset)).replace(day=1)
    report_end = report_start + relativedelta(months=1)
    period_label = report_start.strftime("%B %Y")
    conn = psycopg2.connect(database_url)
    cur = conn.cursor()
    # Total execution summary:
    cur.execute("""
        SELECT
            COUNT(*) as total,
            SUM(CASE WHEN success THEN 1 ELSE 0 END) as successful,
            SUM(CASE WHEN NOT success THEN 1 ELSE 0 END) as failed,
            ROUND(AVG(duration_ms)) as avg_duration_ms,
            COUNT(DISTINCT created_by) as active_users
        FROM v_completed_job
        WHERE workspace_id = %s
          AND created_at >= %s AND created_at < %s
    """, (workspace, report_start, report_end))
    summary = dict(zip([d[0] for d in cur.description], cur.fetchone()))
    # Top 10 most-used scripts:
    cur.execute("""
        SELECT script_path, COUNT(*) as runs,
               SUM(CASE WHEN NOT success THEN 1 ELSE 0 END) as failures
        FROM v_completed_job
        WHERE workspace_id = %s AND script_path IS NOT NULL
          AND created_at >= %s AND created_at < %s
        GROUP BY script_path
        ORDER BY runs DESC LIMIT 10
    """, (workspace, report_start, report_end))
    cols = [d[0] for d in cur.description]
    top_scripts = [dict(zip(cols, row)) for row in cur.fetchall()]
    # Scripts modified during the period (change log):
    cur.execute("""
        SELECT DISTINCT ON (path)
            path, created_by as modified_by,
            created_at as modified_at, description
        FROM script
        WHERE workspace_id = %s
          AND created_at >= %s AND created_at < %s
        ORDER BY path, created_at DESC
    """, (workspace, report_start, report_end))
    cols = [d[0] for d in cur.description]
    script_changes = [dict(zip(cols, row)) for row in cur.fetchall()]
    # Users who executed sensitive scripts (flag based on path prefix).
    # Note the doubled %% — psycopg2 requires literal % to be escaped
    # whenever parameters are passed to execute():
    cur.execute("""
        SELECT created_by, script_path, created_at, success
        FROM v_completed_job
        WHERE workspace_id = %s
          AND (script_path LIKE '%%/deploy%%' OR script_path LIKE '%%/delete%%'
               OR script_path LIKE '%%/rotate%%' OR script_path LIKE '%%/admin%%')
          AND created_at >= %s AND created_at < %s
        ORDER BY created_at DESC
    """, (workspace, report_start, report_end))
    cols = [d[0] for d in cur.description]
    sensitive_ops = [dict(zip(cols, row)) for row in cur.fetchall()]
    conn.close()
    report = {
        "report_type": "Windmill Monthly Audit Report",
        "period": period_label,
        "workspace": workspace,
        "generated_at": datetime.utcnow().isoformat(),
        "execution_summary": summary,
        "top_scripts": top_scripts,
        "script_changes": script_changes,
        "sensitive_operations": sensitive_ops,
        "sensitive_ops_count": len(sensitive_ops)
    }
    return report
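To turn that dict into something a compliance officer can actually read, a small formatter can render the report as Markdown for email or Slack delivery. A sketch, assuming the report shape produced above — `format_report_md` is a hypothetical helper, not a Windmill built-in:

```python
def format_report_md(report: dict) -> str:
    """Render the audit report dict as a Markdown summary."""
    s = report["execution_summary"]
    lines = [
        f"# {report['report_type']}: {report['period']}",
        f"Workspace: `{report['workspace']}` | Generated: {report['generated_at']}",
        "",
        "## Execution summary",
        f"- Total jobs: {s['total']} ({s['failed']} failed)",
        f"- Active users: {s['active_users']}",
        f"- Avg duration: {s['avg_duration_ms']} ms",
        "",
        f"## Sensitive operations ({report['sensitive_ops_count']})",
    ]
    # Cap the listing so a noisy month doesn't produce an unreadable email:
    for op in report["sensitive_operations"][:20]:
        lines.append(
            f"- {op['created_at']}: `{op['script_path']}` by {op['created_by']}"
            f" ({'ok' if op['success'] else 'FAILED'})"
        )
    return "\n".join(lines)
```

Chain it as a second flow step after the generator, then hand the string to a hub email script or the unified notification script from earlier.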
Scaling Windmill for Production Workloads
Database Performance Tuning for High Job Volume
# PostgreSQL tuning for Windmill at scale (100+ jobs/hour)
# Add to postgresql.conf or docker-compose environment:
# For 4GB RAM PostgreSQL dedicated to Windmill:
# shared_buffers = 1GB
# effective_cache_size = 3GB
# work_mem = 16MB
# maintenance_work_mem = 256MB
# max_connections = 200
# wal_level = minimal            # only valid with max_wal_senders = 0; rules out WAL archiving and replicas
# Monitor Windmill's PostgreSQL usage:
docker exec windmill_db psql -U windmill windmill << 'EOF'
-- Jobs per hour over the last 24 hours:
SELECT
    DATE_TRUNC('hour', created_at) as hour,
    COUNT(*) as jobs,
    AVG(duration_ms)::int as avg_ms
FROM v_completed_job
WHERE created_at > NOW() - INTERVAL '24 hours'
GROUP BY hour
ORDER BY hour;

-- Check job queue depth (jobs waiting for workers):
SELECT tag, COUNT(*) as queued
FROM queue
GROUP BY tag
ORDER BY queued DESC;

-- Identify slowest scripts (p95 by script):
SELECT
    script_path,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) as p95_ms,
    COUNT(*) as runs
FROM v_completed_job
WHERE created_at > NOW() - INTERVAL '7 days'
  AND script_path IS NOT NULL
GROUP BY script_path
HAVING COUNT(*) > 10
ORDER BY p95_ms DESC
LIMIT 20;
EOF
Horizontal Scaling: Running Workers on Multiple Servers
# Running Windmill workers on a separate, more powerful server
# The workers connect to the same PostgreSQL database but run on different hardware
# On the WORKER SERVER (different machine from the Windmill server):
cat > docker-compose.worker.yml << 'EOF'
version: '3.8'
services:
  # Default worker on worker server:
  worker_default_remote:
    image: ghcr.io/windmill-labs/windmill:main
    restart: unless-stopped
    environment:
      DATABASE_URL: postgresql://windmill:${POSTGRES_PASSWORD}@DB_SERVER_IP:5432/windmill
      MODE: worker
      WORKER_GROUP: default
      NUM_WORKERS: 16
      SLEEP_QUEUE: 50
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        limits:
          cpus: '8.0'
          memory: 8G
  # GPU worker for ML workloads (if the worker server has GPU):
  worker_gpu:
    image: ghcr.io/windmill-labs/windmill:main
    restart: unless-stopped
    environment:
      DATABASE_URL: postgresql://windmill:${POSTGRES_PASSWORD}@DB_SERVER_IP:5432/windmill
      MODE: worker
      WORKER_GROUP: gpu
      NUM_WORKERS: 2
      SLEEP_QUEUE: 200
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
# Start workers on the remote server:
docker compose -f docker-compose.worker.yml up -d
# Monitor worker distribution:
wmill run u/admin/worker_health_check
# Should show workers from both servers in the worker list
# In Windmill admin UI, go to:
# Workers → should show workers from both servers
# Jobs → can filter by worker to see distribution
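To check the distribution programmatically, you can query the `worker_ping` table that workers heartbeat into — verify the table and column names against your Windmill version, as the schema evolves between releases. A sketch, with the grouping logic split out so it is easy to test; the script path is hypothetical:

```python
# u/admin/list_workers.py — hypothetical helper; assumes the worker_ping table
# requirements:
#     psycopg2-binary
from collections import defaultdict

def summarize_workers(rows: list[tuple]) -> dict:
    """Group (worker, worker_instance, worker_group) rows by host/instance."""
    by_host = defaultdict(list)
    for worker, instance, group in rows:
        by_host[instance].append({"worker": worker, "group": group})
    return dict(by_host)

def main(database_url: str = "$var:WINDMILL_DATABASE_URL") -> dict:
    import psycopg2  # deferred so summarize_workers has no dependencies
    conn = psycopg2.connect(database_url)
    cur = conn.cursor()
    # Workers that pinged in the last 5 minutes are considered alive:
    cur.execute("""
        SELECT worker, worker_instance, worker_group
        FROM worker_ping
        WHERE ping_at > NOW() - INTERVAL '5 minutes'
    """)
    rows = cur.fetchall()
    conn.close()
    return summarize_workers(rows)
```

If the result shows all workers on one host after you added a second server, the usual culprit is the remote workers failing to reach PostgreSQL — check their logs for connection errors.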
Tips, Gotchas, and Troubleshooting
AI Workflow Steps Failing Inconsistently
# Intermittent failures in AI steps usually have three causes:
# 1. LLM response didn't match expected schema (structured output failures)
# Fix: Use instructor library with max_retries=3
# The library automatically retries with the validation error as feedback
# 2. LLM API rate limit hit
# Symptom: RateLimitError or 429 in job logs
docker compose logs worker_default | grep -i 'rate_limit\|429\|ratelimit'
# Fix: Add retry with backoff in your flow:
#   retry:
#     exponential:
#       attempts: 5
#       multiplier: 2
#       seconds: 5   # First retry: 5s, then 10s, 20s, 40s, 80s
# 3. Ollama timeout for large models
# Symptom: TimeoutError after ~30 seconds
# Fix: Increase timeout in the Ollama HTTP call:
import requests

response = requests.post(
    "http://172.17.0.1:11434/api/chat",
    json={"model": "llama3.1:70b", "messages": messages},
    timeout=300  # 5 minutes for large models
)
# Check Ollama is loaded and responding:
docker exec windmill_worker_default curl -s http://172.17.0.1:11434/api/tags | jq '.models | length'
# Should return number of loaded models; 0 means Ollama is running but no models pulled
Compliance Report Missing Jobs
# If the audit report seems to miss recent jobs:
# 1. Check if jobs are being pruned too aggressively:
docker exec windmill_server env | grep -i 'prune\|retention\|job_retention'
# Windmill retains completed jobs based on configuration
# Default retention is typically 30-90 days depending on version
# For compliance, extend retention:
# In docker-compose.yml environment:
# JOB_RETENTION_SECS=7776000 # 90 days
# Or for Windmill EE:
# JOB_RETENTION_SECS=31536000 # 1 year
# 2. Check the v_completed_job view covers the right time range:
docker exec windmill_db psql -U windmill windmill \
-c "SELECT MIN(created_at), MAX(created_at), COUNT(*) FROM v_completed_job;"
# 3. Ensure the audit query workspace filter matches exactly:
# 'my-workspace' vs 'my_workspace' (case sensitive, no spaces)
docker exec windmill_db psql -U windmill windmill \
-c "SELECT DISTINCT workspace_id FROM v_completed_job;"
# 4. Export to external storage for long-term compliance retention:
# (Windmill's internal DB isn't designed for 5+ year retention)
docker exec windmill_db pg_dump \
-U windmill \
--table v_completed_job \
windmill | gzip > compliance-jobs-$(date +%Y-%m).sql.gz
High Memory Usage from Multiple AI Workers
# AI workflow steps that use large models can leak memory between jobs
# Monitor per-worker memory:
docker stats --no-stream | grep worker | sort -k4 -h
# If a worker is using > 80% of its limit, the job is likely still holding model state
# Solution 1: Restart workers on a schedule (crude but effective):
# Add to crontab: 0 4 * * * docker restart worker_heavy_ai
# Solution 2: Set NODE_OPTIONS memory limit for each worker:
# In docker-compose.yml worker environment:
# NODE_OPTIONS=--max-old-space-size=4096 # Cap Node.js heap at 4GB
# Solution 3: Use a dedicated AI worker with a memory healthcheck:
# (Note: plain docker compose does not restart unhealthy containers on its own —
# pair the healthcheck with an autoheal sidecar or an external supervisor.)
worker_ai:
  image: ghcr.io/windmill-labs/windmill:main
  restart: unless-stopped
  environment:
    MODE: worker
    WORKER_GROUP: ai
    NUM_WORKERS: 2
    # plus DATABASE_URL, as for the other workers
  deploy:
    resources:
      limits:
        memory: 6G
  # Flag the container unhealthy when memory use passes 85%
  # ($$ escapes $ in compose files):
  healthcheck:
    test: ["CMD", "sh", "-c", "free | awk '/Mem:/{if ($$3/$$2 > 0.85) exit 1; exit 0}'"]
    interval: 30s
    timeout: 5s
    retries: 2
    start_period: 30s
Pro Tips
- Use Windmill's built-in AI assist for script writing — the Windmill editor has a built-in AI that generates script boilerplate from a natural language description. It understands Windmill's resource types and input/output patterns. Use it to generate the scaffold, then edit for your specific logic — 10x faster than starting from scratch.
- Build a script testing framework before production — create a test flow that calls each of your critical scripts with known inputs and validates the output format. Run this test flow in CI after every Git push to catch regressions before they hit production workflows.
- Use Windmill's schedule metadata for documentation — every scheduled flow accepts a description and tags. Fill these out: "Runs daily at 2am to sync customer data from Salesforce to the warehouse. Owned by data team. Escalation: @alice." When something breaks at 3am, the on-call doesn't need to find the original developer.
- Separate workspace per team for isolation — in large organizations, create separate Windmill workspaces per team (engineering, data, ops). Teams share the same Windmill instance but have isolated script libraries, secrets, and permissions. Prevents "helpful" cross-team modifications that break each other's flows.
- Monitor the job queue depth as your primary scaling signal — if the Windmill job queue consistently has depth > 0 (jobs waiting for a worker), you need more workers. The queue depth is the most actionable metric for capacity planning: it directly tells you whether your worker capacity matches your job volume.
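That last tip is easy to automate: a small scheduled script can poll the queue and alert when any tag's depth exceeds a threshold. A sketch — the `queue` table matches the monitoring query earlier in this guide, the script path and 5-minute schedule are suggestions, and the threshold logic is split out for testing:

```python
# u/admin/queue_depth_alert.py — hypothetical scheduled check (e.g. */5 * * * *)
# requirements:
#     psycopg2-binary
#     requests

def needs_more_workers(depths: dict[str, int], threshold: int = 5) -> list[str]:
    """Return the worker tags whose queue depth exceeds the threshold."""
    return sorted(tag for tag, depth in depths.items() if depth > threshold)

def main(
    database_url: str = "$var:WINDMILL_DATABASE_URL",
    slack_webhook: str = "$var:SLACK_OPS_WEBHOOK",
    threshold: int = 5,
) -> dict:
    # Imports deferred so the pure threshold logic has no dependencies:
    import psycopg2
    import requests
    conn = psycopg2.connect(database_url)
    cur = conn.cursor()
    cur.execute("SELECT tag, COUNT(*) FROM queue GROUP BY tag")
    depths = dict(cur.fetchall())
    conn.close()
    hot = needs_more_workers(depths, threshold)
    if hot:
        requests.post(slack_webhook, json={
            "text": f"⚠️ Windmill queue backing up for tags: {', '.join(hot)}"
        })
    return {"depths": depths, "over_threshold": hot}
```

If the alert fires repeatedly for the same tag, add workers to that worker group rather than raising the threshold.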
Wrapping Up
This third Windmill guide completes the series. Together they cover the complete arc: deploying Windmill and building your first scripts and flows; Git sync, worker groups, the App Builder, and enterprise workflow patterns; and this guide's AI-powered workflow steps, native integrations, compliance audit logging, and production scaling.
The AI workflow patterns are where Windmill starts to pull ahead of traditional automation platforms — not just running scripts, but orchestrating AI classification, extraction, and generation as first-class workflow steps with all the observability, retry logic, and audit trail that production automation requires. Build the foundation from the first two guides, then layer on these capabilities when your team is ready for them.
Need an Enterprise Internal Tooling Platform Built and Maintained?
Designing Windmill for multiple teams — with proper Git sync, AI workflow integration, compliance audit logging, worker group architecture, and scaling to production job volumes — is a significant undertaking. The sysbrix team builds and maintains internal tooling platforms that engineering organizations can rely on, not just deploy and hope for the best.
Talk to Us →