Headscale self-hosted VPN: Intro and real-world use case
If your team needs secure remote access to private services, a Headscale self-hosted VPN gives you a practical middle ground between consumer VPN subscriptions and heavyweight enterprise overlays. Engineering, DevOps, and support teams need private access to dashboards, staging APIs, SSH bastions, and incident tooling from untrusted networks. A WireGuard-based control plane like Headscale offers modern identity-aware access without expensive appliances or brittle hub-and-spoke constraints.
This guide deploys Headscale in production with a Docker Compose stack, a hardened Caddy reverse proxy, and OIDC authentication so identity policies stay centralized. We also implement split DNS, onboarding keys, backups, and safe upgrades. The goal is first-pass success in a real environment, not a demo-only quickstart.
Architecture and flow overview
The stack has three core services: Headscale API, PostgreSQL state backend, and Caddy edge proxy. End-user devices run Tailscale clients and enroll into your private tailnet using SSO. This preserves enterprise-grade controls while keeping operations lean.
- Headscale API: control plane for nodes, keys, routes, and policy.
- PostgreSQL: durable state and transaction-safe updates.
- Caddy: TLS termination, HTTP security headers, and structured logs.
- OIDC provider: MFA, lifecycle management, and group-based authorization.
- Clients: Linux/macOS/Windows nodes participating in WireGuard mesh networking.
Control-plane traffic is HTTPS to your domain. Data-plane traffic is encrypted WireGuard peer-to-peer when possible. This separation keeps your API surface manageable while maximizing performance.
Prerequisites
- Ubuntu 22.04/24.04 VM (2 vCPU, 4 GB RAM minimum for small teams).
- DNS A record for vpn.example.com pointing to your server.
- OIDC app with client ID/secret and group claims.
- Inbound ports 80/443 and outbound HTTPS to your identity provider.
- Owner for patching, backups, and access governance.
Step-by-step deployment with complete commands
1) Prepare host and Docker runtime
sudo apt update && sudo apt -y upgrade
sudo apt -y install curl ca-certificates gnupg ufw jq git
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
Reconnect your shell after docker group assignment. Validate baseline host hardening before production enrollment.
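The packages above include ufw, but a firewall only helps if it is actually configured. A minimal baseline under the assumption that only SSH and the Caddy edge ports should be reachable (the port list is an assumption; adjust for your bastion setup):

```shell
# Host firewall baseline (sketch). apply() runs the real ufw commands as root;
# the loop below only previews the plan so you can review it first.
ALLOW_PORTS="22/tcp 80/tcp 443/tcp"
apply() {
  for p in $ALLOW_PORTS; do sudo ufw allow "$p"; done
  sudo ufw default deny incoming
  sudo ufw --force enable
}
# Preview the planned rules before calling apply() on the VPN host:
for p in $ALLOW_PORTS; do echo "ufw allow $p"; done
```

Review the preview, then call `apply` once on the host; re-running it is idempotent.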
2) Create project layout and secure environment file
mkdir -p ~/headscale/{config,data,caddy}
cd ~/headscale
cat > .env <<'EOF'
DOMAIN=vpn.example.com
OIDC_ISSUER=https://auth.example.com/realms/main
OIDC_CLIENT_ID=headscale
OIDC_CLIENT_SECRET=replace_me
HEADSCALE_DNS_BASE=corp.example.internal
POSTGRES_PASSWORD=replace_with_32_char_secret
HEADSCALE_NOISE_PRIVATE_KEY=
EOF
chmod 600 .env
The .env file contains sensitive values. Keep permissions strict and use a secrets manager as the source of truth for rotation workflows.
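One way to keep the secrets manager authoritative is to render .env at deploy time instead of hand-editing it. A sketch assuming HashiCorp Vault's KV store; the secret path and field names are assumptions, and a restrictive umask keeps every rendered copy at mode 600:

```shell
# Render .env from a secrets source at deploy time (sketch; the
# "secret/headscale" path and field names are assumptions).
render_env() {
  umask 177   # files created by this function come out 600
  {
    echo "DOMAIN=vpn.example.com"
    echo "OIDC_CLIENT_SECRET=$(vault kv get -field=client_secret secret/headscale)"
    echo "POSTGRES_PASSWORD=$(vault kv get -field=db_password secret/headscale)"
  } > .env
}
# Local check that the umask yields 600 on freshly created files:
umask 177
F=/tmp/env-perm-check.$$
: > "$F"
PERM=$(stat -c %a "$F" 2>/dev/null || stat -f %Lp "$F")
rm -f "$F"
echo "$PERM"
```

Re-running `render_env` on rotation then becomes a safe, repeatable operation rather than a manual edit.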
3) Define production compose stack
version: "3.9"
services:
  postgres:
    image: postgres:16-alpine
    container_name: headscale-postgres
    restart: unless-stopped
    environment:
      POSTGRES_DB: headscale
      POSTGRES_USER: headscale
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U headscale -d headscale"]
      interval: 10s
      timeout: 5s
      retries: 10
  headscale:
    image: headscale/headscale:stable
    container_name: headscale
    restart: unless-stopped
    env_file: .env
    command: serve
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - ./config:/etc/headscale
      - ./data/headscale:/var/lib/headscale
    expose:
      - "8080"
  caddy:
    image: caddy:2
    container_name: headscale-caddy
    restart: unless-stopped
    ports: ["80:80", "443:443"]
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
      - ./data/caddy:/data
      - ./data/caddy_config:/config
    depends_on: [headscale]
PostgreSQL and Caddy run as peer services. For regulated environments, move PostgreSQL to a dedicated managed service and keep only app + proxy on this host.
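To enforce the "expose only Caddy externally" posture at the network layer, the stack above can be segmented with an internal-only compose network. A sketch to merge into docker-compose.yml; the network names are illustrative, and headscale keeps an egress-capable network because it must reach your OIDC issuer:

```yaml
# Network segmentation sketch: postgres is reachable only from peers on the
# internal "backend" network; caddy bridges it to the published host ports.
networks:
  edge: {}
  backend:
    internal: true   # members get no host/outbound routing

services:
  postgres:
    networks: [backend]
  headscale:
    networks: [backend, edge]   # edge needed for outbound OIDC discovery
  caddy:
    networks: [edge, backend]
```

With this in place, a compromised edge container still cannot reach PostgreSQL except through headscale's service path.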
4) Configure Headscale core settings
server_url: https://vpn.example.com
listen_addr: 0.0.0.0:8080
metrics_listen_addr: 0.0.0.0:9090
noise:
  private_key_path: /var/lib/headscale/noise_private.key
database:
  type: postgres
  postgres:
    host: postgres
    port: 5432
    name: headscale
    user: headscale
    # headscale reads this file literally and does not expand ${VAR};
    # render the real value at deploy time (e.g. with envsubst).
    pass: ${POSTGRES_PASSWORD}
prefixes:
  v4: 100.64.0.0/10
dns:
  magic_dns: true
  base_domain: corp.example.internal
  nameservers:
    global: [1.1.1.1, 9.9.9.9]
oidc:
  only_start_if_oidc_is_available: true
  issuer: ${OIDC_ISSUER}
  client_id: ${OIDC_CLIENT_ID}
  client_secret: ${OIDC_CLIENT_SECRET}
  scope: ["openid", "profile", "email", "groups"]
  allowed_groups: ["net-admins"]
Magic DNS improves operator usability dramatically. OIDC group allow-listing keeps enrollment locked to approved personas.
5) Configure Caddy edge and TLS security defaults
cat > caddy/Caddyfile <<'EOF'
vpn.example.com {
	encode zstd gzip
	# Proxy everything: Tailscale clients use /ts2021 (the Noise protocol over
	# an HTTP upgrade) in addition to /register, /oidc, and /key, so a narrow
	# path matcher silently breaks enrollment.
	reverse_proxy headscale:8080
	header {
		Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
		X-Content-Type-Options "nosniff"
		X-Frame-Options "DENY"
		Referrer-Policy "strict-origin-when-cross-origin"
	}
	log {
		output file /data/access.log
		format json
	}
}
EOF
Caddy reduces certificate-ops burden and ships useful logging out of the box. Route logs to your SIEM for retention and alerting.
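Before SIEM integration lands, the JSON access log is already queryable from the host (it is written to /data/access.log in the container, i.e. ./data/caddy/access.log on the host). A triage sketch using jq, installed earlier; the field names follow Caddy's JSON access-log schema:

```shell
# Surface edge errors from Caddy's JSON access log (sketch).
LOG=${1:-./data/caddy/access.log}
triage() {
  jq -r 'select(.status >= 500) | "\(.status) \(.request.uri)"' "$LOG"
}
# Demo the filter on a synthetic line so it is verifiable anywhere:
SAMPLE='{"status":502,"request":{"uri":"/oidc/callback"}}'
RESULT=$(printf '%s\n' "$SAMPLE" | jq -r 'select(.status >= 500) | "\(.status) \(.request.uri)"')
echo "$RESULT"
```

Run `triage` against the real log to spot 5xx bursts, which in this stack usually mean the headscale container restarted.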
6) Bootstrap and launch services
NOISE_KEY=$(docker run --rm headscale/headscale:stable headscale generate private-key)
sed -i "s|^HEADSCALE_NOISE_PRIVATE_KEY=.*|HEADSCALE_NOISE_PRIVATE_KEY=${NOISE_KEY}|" .env
docker compose pull
docker compose up -d
docker compose ps
docker compose logs --tail=100 headscale
Persist noise keys and environment data across restarts. Accidental key rotation can force unnecessary client re-authentication events.
7) Create namespace and issue onboarding keys
# Headscale v0.22+ renamed "namespaces" to "users"; on older releases use
# "namespaces create" and --namespace instead.
docker compose exec headscale headscale users create platform
docker compose exec headscale headscale preauthkeys create --user platform --expiration 24h --reusable=false
docker compose exec headscale headscale nodes list
Short-lived preauth keys limit blast radius. Isolate contractors in dedicated namespaces with strict ACLs.
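The server-side key exists; the matching client step is enrollment from each device. A sketch for a Linux client; the key value is a placeholder for the preauth key issued above, and `--accept-dns` opts the device into the Magic DNS configuration:

```shell
# Enroll a Linux client against the self-hosted control plane (sketch;
# AUTH_KEY is a placeholder for the preauth key issued in step 7).
LOGIN_SERVER="https://vpn.example.com"
AUTH_KEY="REPLACE_WITH_PREAUTH_KEY"
enroll() {
  sudo tailscale up --login-server "$LOGIN_SERVER" --auth-key "$AUTH_KEY" --accept-dns
}
# Preview the exact command before running enroll() on the device:
echo "tailscale up --login-server $LOGIN_SERVER --auth-key $AUTH_KEY --accept-dns"
```

Without `--auth-key`, the same `tailscale up --login-server …` command opens the interactive OIDC login flow instead, which is the right path for regular employee devices.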
Configuration and secret-handling best practices
For self-hosted remote access, operational controls matter as much as deployment mechanics:
- Secrets governance: keep OIDC and DB secrets in vault-backed storage.
- Least privilege scopes: request only claims required for group mapping and audit.
- Network isolation: expose only Caddy externally; keep postgres/headscale internal.
- Key TTL policy: onboarding keys should be time-bound and non-reusable.
- Policy as code: version ACL files and require peer review before rollout.
- Incident readiness: document emergency revocation playbooks for compromised users/devices.
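For the policy-as-code item, a cheap but effective CI gate is rejecting ACL files that do not parse or contain no rules before they ever reach the control plane. A sketch using jq; the file name and the "non-empty acls" invariant are assumptions about your repo layout:

```shell
# Gate ACL rollouts on a syntax/shape check (sketch): reject unparseable JSON
# and empty rule sets in CI before deployment.
validate_acl() {
  local f="$1"
  jq -e '.acls | length > 0' "$f" >/dev/null
}
# Demo against a throwaway policy file:
TMP=$(mktemp)
printf '%s' '{"acls":[{"action":"accept","src":["group:net-admins"],"dst":["*:*"]}]}' > "$TMP"
if validate_acl "$TMP"; then echo "ACL OK"; else echo "ACL REJECTED"; fi
rm -f "$TMP"
```

Pair this with required peer review so that a passing check plus an approval is the only path to rollout.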
At scale, map business units to namespaces and enforce route advertisement policy centrally. This avoids accidental east-west access between unrelated teams and simplifies forensic investigations during security reviews.
When implementing OIDC authentication, validate claim stability with your identity team. Group renames, nested group behavior, and delayed sync windows are frequent failure modes. Build a small runbook for identity incidents: denied access after HR changes, stale token claims, and emergency account disable procedures.
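When debugging claim-stability issues, it helps to see exactly which groups the IdP put in a token. A JWT payload is just base64url-encoded JSON, so it can be inspected locally; a sketch (never paste production tokens into shared terminals):

```shell
# Decode a JWT's payload segment to inspect group claims (sketch).
decode_jwt_payload() {
  local seg
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # restore the base64 padding that base64url strips
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
# Demo on a fabricated header.payload.signature token:
PAYLOAD=$(printf '%s' '{"groups":["net-admins"]}' | base64 | tr -d '=' | tr '/+' '_-')
decode_jwt_payload "x.$PAYLOAD.y"
```

If the decoded `groups` array does not contain a value matching your `allowed_groups` entry, enrollment will be rejected regardless of what the IdP admin console shows.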
Access policy model and ACL example
Use identity groups as the primary boundary and destination tags as your resource boundary. This creates auditable, low-friction policy changes and enables progressive access grants during migrations.
{
  "groups": {
    "group:net-admins": ["[email protected]", "[email protected]"],
    "group:developers": ["[email protected]", "[email protected]"]
  },
  "acls": [
    {"action": "accept", "src": ["group:net-admins"], "dst": ["*:*"]},
    {"action": "accept", "src": ["group:developers"], "dst": ["tag:staging:*"]}
  ]
}
Start permissive in staging to validate paths, then progressively tighten rules in production. Keep an emergency break-glass group limited to two responders and monitor every use with after-action review.
Verification checklist (commands + expected output)
curl -I https://vpn.example.com
# Metrics bind to 9090 inside the headscale container and are intentionally not
# proxied by Caddy; query them across the compose network instead (the caddy
# image ships busybox wget):
docker compose exec caddy wget -qO- http://headscale:9090/metrics | head -n 5
echo | openssl s_client -connect vpn.example.com:443 -servername vpn.example.com 2>/dev/null | openssl x509 -noout -dates -issuer
docker compose exec headscale headscale nodes list | grep -E "online|platform"
- Expected: endpoint returns HTTP/2 200 and valid certificate dates.
- Expected: metrics endpoint emits Prometheus metrics lines.
- Expected: registered node appears online in target namespace.
- Expected: unauthorized OIDC users are rejected.
Common issues and fixes
OIDC login loops forever
Usually redirect URI mismatch or incorrect issuer URL. Confirm callback value exactly and verify issuer metadata endpoint availability from server runtime.
Nodes enroll but cannot reach services
Usually ACL mismatch, blocked UDP, or wrong advertised routes. Validate ACL paths, route table, and local firewall state on both peers.
Certificate renewals fail
Often caused by DNS drift or blocked port 80 challenge path. Confirm DNS records and inspect Caddy ACME logs.
Intermittent 502 from edge
Typically headscale process restarts due to DB config mismatch. Inspect container logs and postgres readiness healthcheck output.
Upgrade causes auth instability
Can happen if noise key or OIDC claims changed unexpectedly. Stage upgrades, preserve keys, and maintain image rollback tags.
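"Maintain image rollback tags" has a simple concrete shape: reference the image through a variable with a `:stable` fallback (e.g. `image: headscale/headscale:${HEADSCALE_TAG:-stable}` in the compose file, an assumed modification), so upgrade and rollback are the same one-line operation:

```shell
# Pin/roll back the control-plane image via a tag variable (sketch; assumes the
# compose file references headscale/headscale:${HEADSCALE_TAG:-stable}).
upgrade_to() {
  HEADSCALE_TAG="$1" docker compose pull headscale
  HEADSCALE_TAG="$1" docker compose up -d headscale
}
# The :-stable fallback keeps unpinned environments working:
unset HEADSCALE_TAG
TAG="${HEADSCALE_TAG:-stable}"
echo "$TAG"
```

Rolling back after a bad upgrade is then `upgrade_to <previous-tag>` rather than an ad-hoc image hunt.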
Operational SLOs, backup, and recovery
Define baseline SLOs before broad adoption: API availability, auth success rate, and enrollment latency. Attach alerting thresholds and an explicit on-call owner. Teams often skip this and discover reliability gaps only during outages. For capacity, monitor node growth, namespace count, and DB size monthly to avoid surprise exhaustion.
Also track certificate expiration lead time and identity-provider error rates. Many production incidents are caused by upstream IdP outages rather than the VPN control plane itself. Add synthetic login checks and tie alerts to a clear escalation matrix.
sudo tee /usr/local/bin/backup-headscale.sh >/dev/null <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
STAMP=$(date +%F-%H%M)
ROOT=/opt/headscale   # adjust to your stack path (this guide used ~/headscale)
OUT=/var/backups/headscale
mkdir -p "$OUT"
docker compose -f "$ROOT/docker-compose.yml" exec -T postgres pg_dump -U headscale headscale | gzip > "$OUT/headscale-db-$STAMP.sql.gz"
tar -C "$ROOT" -czf "$OUT/headscale-config-$STAMP.tgz" config .env caddy
find "$OUT" -type f -mtime +14 -delete
EOF
sudo chmod +x /usr/local/bin/backup-headscale.sh
echo "15 2 * * * root /usr/local/bin/backup-headscale.sh" | sudo tee /etc/cron.d/headscale-backup
Do disaster recovery drills quarterly: restore backup to a clean host, re-point DNS to standby, and validate representative node reconnection from each namespace.
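The backup script covers capture but not replay; a drill needs the restore side too. A sketch that selects the newest dump produced by the script above and replays it into a fresh postgres on the standby host:

```shell
# Restore drill sketch (paths match the backup script above).
latest_dump() {
  ls -1t /var/backups/headscale/headscale-db-*.sql.gz 2>/dev/null | head -n 1
}
restore() {
  local dump; dump=$(latest_dump)
  [ -n "$dump" ] || { echo "no dump found" >&2; return 1; }
  gunzip -c "$dump" | docker compose exec -T postgres psql -U headscale -d headscale
}
# Local check of the newest-first selection on throwaway files:
D=$(mktemp -d)
touch -t 202401010101 "$D/headscale-db-old.sql.gz"
touch -t 202401020101 "$D/headscale-db-new.sql.gz"
ls -1t "$D"/headscale-db-*.sql.gz | head -n 1
```

Run `restore` only against an empty database on the drill host; replaying into a live primary is not part of this sketch.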
docker compose pull
docker compose up -d --no-deps postgres
until docker compose exec -T postgres pg_isready -U headscale -d headscale; do sleep 2; done
docker compose up -d headscale caddy
docker compose logs --since=5m headscale caddy
Compliance, audit evidence, and change control
For security and compliance teams, your implementation should generate evidence automatically rather than relying on manual screenshots. Record change tickets for every ACL update, log who approved each policy change, and retain onboarding/offboarding events tied to identity records. Keep a monthly export of active nodes, namespaces, route advertisements, and key issuance timestamps. This evidence simplifies ISO 27001, SOC 2, and internal risk reviews.
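The monthly export described above can be automated with the headscale CLI's machine-readable output. A sketch; the evidence directory is an assumption, and on pre-v0.22 releases substitute `namespaces` for `users`:

```shell
# Monthly audit-evidence snapshot (sketch): JSON exports of nodes and users,
# date-stamped for retention. EV_DIR is an assumed location.
EV_DIR=/var/evidence/headscale
STAMP=$(date +%Y-%m)
snapshot() {
  mkdir -p "$EV_DIR"
  docker compose exec headscale headscale nodes list -o json > "$EV_DIR/nodes-$STAMP.json"
  docker compose exec headscale headscale users list -o json > "$EV_DIR/users-$STAMP.json"
}
echo "$STAMP"
```

Schedule `snapshot` from the same cron host as the backups so evidence and state captures share a retention policy.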
Integrate logs from Caddy, Headscale, and your identity provider into one timeline view so investigators can correlate authentication events, policy changes, and connection anomalies quickly. Define a formal change window for upgrades, require pre-change backup verification, and keep a tested rollback path documented in the runbook. These practices reduce MTTR and make platform ownership sustainable as the environment grows.
FAQ
1) Can Headscale fully replace managed VPN services?
For many organizations yes, especially where control-plane ownership and compliance evidence are mandatory. You must still own lifecycle operations and observability.
2) Do we need Kubernetes?
No. A disciplined Docker Compose deployment is sufficient for many teams. Move to Kubernetes when standardization or multi-zone requirements justify complexity.
3) Which IdP works best?
Any standards-compliant OIDC provider works. Choose the one your identity/security team already governs to simplify audits and policy reuse.
4) Is Magic DNS required?
Not required, but strongly recommended. It improves day-to-day usability and lowers operational mistakes tied to raw IP addressing.
5) How often should we rotate preauth keys?
Issue per-user/per-device short-lived keys and rotate continuously. Avoid reusable keys unless tightly controlled automation requires it.
6) What backup frequency is safe?
Daily is a baseline. High-churn environments may need twice-daily dumps and stricter retention for compliance windows.
7) How do we onboard contractors safely?
Create a dedicated namespace, enforce restrictive ACL tags, set hard expiration windows, and remove IdP group membership immediately at offboarding.
Suggested internal guides
- Deploy Uptime Kuma with Docker Compose and Caddy on Ubuntu (Production Guide)
- Uptime Kuma Setup: Grafana Integration, Custom Dashboards, Alertmanager, and Enterprise Observability
- Keycloak Docker Setup Guide: User Federation, Custom Themes, Fine-Grained Authorization, and High Availability
Talk to us
If you want support implementing Headscale self-hosted VPN across staging and production, we can help with architecture review, migration sequencing, ACL design, and operational hardening. Share your topology, identity provider, and compliance constraints, and we will propose a phased rollout with measurable risk reduction.
For teams with strict audit obligations, we can also design evidence collection for access approvals, key rotations, and backup validation runs.