Headscale self-hosted VPN: Intro and real-world use case
If your team needs secure remote access to private services, a Headscale self-hosted VPN gives you a practical middle ground between consumer VPN subscriptions and heavyweight enterprise overlays. Engineering, DevOps, and support teams need private access to dashboards, staging APIs, SSH bastions, and incident tooling from untrusted networks. A WireGuard-based control plane like Headscale offers modern identity-aware access without expensive appliances or brittle hub-and-spoke constraints.
This guide deploys Headscale in production with a Docker Compose stack, a hardened Caddy reverse proxy, and OIDC authentication so identity policies stay centralized. We also implement split DNS, onboarding keys, backups, and safe upgrades. The goal is first-pass success in a real environment, not a demo-only quickstart.
Architecture and flow overview
The stack has three core services: Headscale API, PostgreSQL state backend, and Caddy edge proxy. End-user devices run Tailscale clients and enroll into your private tailnet using SSO. This preserves enterprise-grade controls while keeping operations lean.
- Headscale API: control plane for nodes, keys, routes, and policy.
- PostgreSQL: durable state and transaction-safe updates.
- Caddy: TLS termination, HTTP security headers, and structured logs.
- OIDC provider: MFA, lifecycle management, and group-based authorization.
- Clients: Linux/macOS/Windows nodes participating in WireGuard mesh networking.
Control-plane traffic is HTTPS to your domain. Data-plane traffic is encrypted WireGuard peer-to-peer when possible. This separation keeps your API surface manageable while maximizing performance.
Prerequisites
- Ubuntu 22.04/24.04 VM (2 vCPU, 4 GB RAM minimum for small teams).
- DNS A record for vpn.example.com pointing to your server.
- OIDC app with client ID/secret and group claims.
- Inbound ports 80/443 and outbound HTTPS to your identity provider.
- Owner for patching, backups, and access governance.
Step-by-step deployment with complete commands
1) Prepare host and Docker runtime
sudo apt update && sudo apt -y upgrade
sudo apt -y install curl ca-certificates gnupg ufw jq git
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
docker --version
docker compose version
Reconnect your shell after docker group assignment. Validate baseline host hardening before production enrollment.
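The packages above include ufw, but a firewall only helps if it is actually configured. A minimal baseline under the assumption that only SSH and the Caddy edge ports should be reachable (the port list is an assumption; adjust for your bastion setup):

```shell
# Host firewall baseline (sketch). apply() runs the real ufw commands as root;
# the loop below only previews the plan so you can review it first.
ALLOW_PORTS="22/tcp 80/tcp 443/tcp"
apply() {
  for p in $ALLOW_PORTS; do sudo ufw allow "$p"; done
  sudo ufw default deny incoming
  sudo ufw --force enable
}
# Preview the planned rules before calling apply() on the VPN host:
for p in $ALLOW_PORTS; do echo "ufw allow $p"; done
```

Review the preview, then call `apply` once on the host; re-running it is idempotent.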
2) Create project layout and secure environment file
mkdir -p ~/headscale/{config,data,caddy}
cd ~/headscale
cat > .env <<'EOF'
DOMAIN=vpn.example.com
OIDC_ISSUER=https://auth.example.com/realms/main
OIDC_CLIENT_ID=headscale
OIDC_CLIENT_SECRET=replace_me
HEADSCALE_DNS_BASE=corp.example.internal
POSTGRES_PASSWORD=replace_with_32_char_secret
HEADSCALE_NOISE_PRIVATE_KEY=
EOF
chmod 600 .env
The .env file contains sensitive values. Keep permissions strict and use a secrets manager as the source of truth for rotation workflows.
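One way to keep the secrets manager authoritative is to render .env at deploy time instead of hand-editing it. A sketch assuming HashiCorp Vault's KV store; the secret path and field names are assumptions, and a restrictive umask keeps every rendered copy at mode 600:

```shell
# Render .env from a secrets source at deploy time (sketch; the
# "secret/headscale" path and field names are assumptions).
render_env() {
  umask 177   # files created by this function come out 600
  {
    echo "DOMAIN=vpn.example.com"
    echo "OIDC_CLIENT_SECRET=$(vault kv get -field=client_secret secret/headscale)"
    echo "POSTGRES_PASSWORD=$(vault kv get -field=db_password secret/headscale)"
  } > .env
}
# Local check that the umask yields 600 on freshly created files:
umask 177
F=/tmp/env-perm-check.$$
: > "$F"
PERM=$(stat -c %a "$F" 2>/dev/null || stat -f %Lp "$F")
rm -f "$F"
echo "$PERM"
```

Re-running `render_env` on rotation then becomes a safe, repeatable operation rather than a manual edit.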
3) Define production compose stack
version: "3.9"
services:
  postgres:
    image: postgres:16-alpine
    container_name: headscale-postgres
    restart: unless-stopped
    environment:
      POSTGRES_DB: headscale
      POSTGRES_USER: headscale
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U headscale -d headscale"]
      interval: 10s
      timeout: 5s
      retries: 10
  headscale:
    image: headscale/headscale:stable
    container_name: headscale
    restart: unless-stopped
    env_file: .env
    command: serve
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - ./config:/etc/headscale
      - ./data/headscale:/var/lib/headscale
    expose:
      - "8080"
  caddy:
    image: caddy:2
    container_name: headscale-caddy
    restart: unless-stopped
    ports: ["80:80", "443:443"]
    volumes:
      - ./caddy/Caddyfile:/etc/caddy/Caddyfile:ro
      - ./data/caddy:/data
      - ./data/caddy_config:/config
    depends_on: [headscale]
PostgreSQL and Caddy run as peer services. For regulated environments, move PostgreSQL to a dedicated managed service and keep only app + proxy on this host.
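To enforce the "expose only Caddy externally" posture at the network layer, the stack above can be segmented with an internal-only compose network. A sketch to merge into docker-compose.yml; the network names are illustrative, and headscale keeps an egress-capable network because it must reach your OIDC issuer:

```yaml
# Network segmentation sketch: postgres is reachable only from peers on the
# internal "backend" network; caddy bridges it to the published host ports.
networks:
  edge: {}
  backend:
    internal: true   # members get no host/outbound routing

services:
  postgres:
    networks: [backend]
  headscale:
    networks: [backend, edge]   # edge needed for outbound OIDC discovery
  caddy:
    networks: [edge, backend]
```

With this in place, a compromised edge container still cannot reach PostgreSQL except through headscale's service path.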
4) Configure Headscale core settings
server_url: https://vpn.example.com
listen_addr: 0.0.0.0:8080
metrics_listen_addr: 0.0.0.0:9090
noise:
  private_key_path: /var/lib/headscale/noise_private.key
database:
  type: postgres
  postgres:
    host: postgres
    port: 5432
    name: headscale
    user: headscale
    # headscale reads this file literally and does not expand ${VAR};
    # render the real value at deploy time (e.g. with envsubst).
    pass: ${POSTGRES_PASSWORD}
prefixes:
  v4: 100.64.0.0/10
dns:
  magic_dns: true
  base_domain: corp.example.internal
  nameservers:
    global: [1.1.1.1, 9.9.9.9]
oidc:
  only_start_if_oidc_is_available: true
  issuer: ${OIDC_ISSUER}
  client_id: ${OIDC_CLIENT_ID}
  client_secret: ${OIDC_CLIENT_SECRET}
  scope: ["openid", "profile", "email", "groups"]
  allowed_groups: ["net-admins"]
Magic DNS improves operator usability dramatically. OIDC group allow-listing keeps enrollment locked to approved personas.
5) Configure Caddy edge and TLS security defaults
cat > caddy/Caddyfile <<'EOF'
vpn.example.com {
	encode zstd gzip
	# Proxy everything: Tailscale clients use /ts2021 (the Noise protocol over
	# an HTTP upgrade) in addition to /register, /oidc, and /key, so a narrow
	# path matcher silently breaks enrollment.
	reverse_proxy headscale:8080
	header {
		Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
		X-Content-Type-Options "nosniff"
		X-Frame-Options "DENY"
		Referrer-Policy "strict-origin-when-cross-origin"
	}
	log {
		output file /data/access.log
		format json
	}
}
EOF
Caddy reduces certificate-ops burden and ships useful logging out of the box. Route logs to your SIEM for retention and alerting.
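Before SIEM integration lands, the JSON access log is already queryable from the host (it is written to /data/access.log in the container, i.e. ./data/caddy/access.log on the host). A triage sketch using jq, installed earlier; the field names follow Caddy's JSON access-log schema:

```shell
# Surface edge errors from Caddy's JSON access log (sketch).
LOG=${1:-./data/caddy/access.log}
triage() {
  jq -r 'select(.status >= 500) | "\(.status) \(.request.uri)"' "$LOG"
}
# Demo the filter on a synthetic line so it is verifiable anywhere:
SAMPLE='{"status":502,"request":{"uri":"/oidc/callback"}}'
RESULT=$(printf '%s\n' "$SAMPLE" | jq -r 'select(.status >= 500) | "\(.status) \(.request.uri)"')
echo "$RESULT"
```

Run `triage` against the real log to spot 5xx bursts, which in this stack usually mean the headscale container restarted.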
6) Bootstrap and launch services
NOISE_KEY=$(docker run --rm headscale/headscale:stable headscale generate private-key)
sed -i "s|^HEADSCALE_NOISE_PRIVATE_KEY=.*|HEADSCALE_NOISE_PRIVATE_KEY=${NOISE_KEY}|" .env
docker compose pull
docker compose up -d
docker compose ps
docker compose logs --tail=100 headscale
Persist noise keys and environment data across restarts. Accidental key rotation can force unnecessary client re-authentication events.
7) Create namespace and issue onboarding keys
# Headscale v0.22+ renamed "namespaces" to "users"; on older releases use
# "namespaces create" and --namespace instead.
docker compose exec headscale headscale users create platform
docker compose exec headscale headscale preauthkeys create --user platform --expiration 24h --reusable=false
docker compose exec headscale headscale nodes list
Short-lived preauth keys limit blast radius. Isolate contractors in dedicated namespaces with strict ACLs.
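The server-side key exists; the matching client step is enrollment from each device. A sketch for a Linux client; the key value is a placeholder for the preauth key issued above, and `--accept-dns` opts the device into the Magic DNS configuration:

```shell
# Enroll a Linux client against the self-hosted control plane (sketch;
# AUTH_KEY is a placeholder for the preauth key issued in step 7).
LOGIN_SERVER="https://vpn.example.com"
AUTH_KEY="REPLACE_WITH_PREAUTH_KEY"
enroll() {
  sudo tailscale up --login-server "$LOGIN_SERVER" --auth-key "$AUTH_KEY" --accept-dns
}
# Preview the exact command before running enroll() on the device:
echo "tailscale up --login-server $LOGIN_SERVER --auth-key $AUTH_KEY --accept-dns"
```

Without `--auth-key`, the same `tailscale up --login-server …` command opens the interactive OIDC login flow instead, which is the right path for regular employee devices.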
Configuration and secret-handling best practices
For self-hosted remote access, operational controls matter as much as deployment mechanics:
- Secrets governance: keep OIDC and DB secrets in vault-backed storage.
- Least privilege scopes: request only claims required for group mapping and audit.
- Network isolation: expose only Caddy externally; keep postgres/headscale internal.
- Key TTL policy: onboarding keys should be time-bound and non-reusable.
- Policy as code: version ACL files and require peer review before rollout.
- Incident readiness: document emergency revocation playbooks for compromised users/devices.
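For the policy-as-code item, a cheap but effective CI gate is rejecting ACL files that do not parse or contain no rules before they ever reach the control plane. A sketch using jq; the file name and the "non-empty acls" invariant are assumptions about your repo layout:

```shell
# Gate ACL rollouts on a syntax/shape check (sketch): reject unparseable JSON
# and empty rule sets in CI before deployment.
validate_acl() {
  local f="$1"
  jq -e '.acls | length > 0' "$f" >/dev/null
}
# Demo against a throwaway policy file:
TMP=$(mktemp)
printf '%s' '{"acls":[{"action":"accept","src":["group:net-admins"],"dst":["*:*"]}]}' > "$TMP"
if validate_acl "$TMP"; then echo "ACL OK"; else echo "ACL REJECTED"; fi
rm -f "$TMP"
```

Pair this with required peer review so that a passing check plus an approval is the only path to rollout.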
At scale, map business units to namespaces and enforce route advertisement policy centrally. This avoids accidental east-west access between unrelated teams and simplifies forensic investigations during security reviews.
When implementing OIDC authentication, validate claim stability with your identity team. Group renames, nested group behavior, and delayed sync windows are frequent failure modes. Build a small runbook for identity incidents: denied access after HR changes, stale token claims, and emergency account disable procedures.
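When debugging claim-stability issues, it helps to see exactly which groups the IdP put in a token. A JWT payload is just base64url-encoded JSON, so it can be inspected locally; a sketch (never paste production tokens into shared terminals):

```shell
# Decode a JWT's payload segment to inspect group claims (sketch).
decode_jwt_payload() {
  local seg
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # restore the base64 padding that base64url strips
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
# Demo on a fabricated header.payload.signature token:
PAYLOAD=$(printf '%s' '{"groups":["net-admins"]}' | base64 | tr -d '=' | tr '/+' '_-')
decode_jwt_payload "x.$PAYLOAD.y"
```

If the decoded `groups` array does not contain a value matching your `allowed_groups` entry, enrollment will be rejected regardless of what the IdP admin console shows.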
Access policy model and ACL example
Use identity groups as the primary boundary and destination tags as your resource boundary. This creates auditable, low-friction policy changes and enables progressive access grants during migrations.
{
  "groups": {
    "group:net-admins": ["[email protected]", "[email protected]"],
    "group:developers": ["[email protected]", "[email protected]"]
  },
  "acls": [
    {"action": "accept", "src": ["group:net-admins"], "dst": ["*:*"]},
    {"action": "accept", "src": ["group:developers"], "dst": ["tag:staging:*"]}
  ]
}
Start permissive in staging to validate paths, then progressively tighten rules in production. Keep an emergency break-glass group limited to two responders and monitor every use with after-action review.
Verification checklist (commands + expected output)
curl -I https://vpn.example.com
# Metrics bind to 9090 inside the headscale container and are intentionally not
# proxied by Caddy; query them across the compose network instead (the caddy
# image ships busybox wget):
docker compose exec caddy wget -qO- http://headscale:9090/metrics | head -n 5
echo | openssl s_client -connect vpn.example.com:443 -servername vpn.example.com 2>/dev/null | openssl x509 -noout -dates -issuer
docker compose exec headscale headscale nodes list | grep -E "online|platform"
- Expected: endpoint returns HTTP/2 200 and valid certificate dates.
- Expected: metrics endpoint emits Prometheus metrics lines.
- Expected: registered node appears online in target namespace.
- Expected: unauthorized OIDC users are rejected.
Common issues and fixes
OIDC login loops forever
Usually redirect URI mismatch or incorrect issuer URL. Confirm callback value exactly and verify issuer metadata endpoint availability from server runtime.
Nodes enroll but cannot reach services
Usually ACL mismatch, blocked UDP, or wrong advertised routes. Validate ACL paths, route table, and local firewall state on both peers.
Certificate renewals fail
Often caused by DNS drift or blocked port 80 challenge path. Confirm DNS records and inspect Caddy ACME logs.
Intermittent 502 from edge
Typically headscale process restarts due to DB config mismatch. Inspect container logs and postgres readiness healthcheck output.
Upgrade causes auth instability
Can happen if noise key or OIDC claims changed unexpectedly. Stage upgrades, preserve keys, and maintain image rollback tags.
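"Maintain image rollback tags" has a simple concrete shape: reference the image through a variable with a `:stable` fallback (e.g. `image: headscale/headscale:${HEADSCALE_TAG:-stable}` in the compose file, an assumed modification), so upgrade and rollback are the same one-line operation:

```shell
# Pin/roll back the control-plane image via a tag variable (sketch; assumes the
# compose file references headscale/headscale:${HEADSCALE_TAG:-stable}).
upgrade_to() {
  HEADSCALE_TAG="$1" docker compose pull headscale
  HEADSCALE_TAG="$1" docker compose up -d headscale
}
# The :-stable fallback keeps unpinned environments working:
unset HEADSCALE_TAG
TAG="${HEADSCALE_TAG:-stable}"
echo "$TAG"
```

Rolling back after a bad upgrade is then `upgrade_to <previous-tag>` rather than an ad-hoc image hunt.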
Operational SLOs, backup, and recovery
Define baseline SLOs before broad adoption: API availability, auth success rate, and enrollment latency. Attach alerting thresholds and an explicit on-call owner. Teams often skip this and discover reliability gaps only during outages. For capacity, monitor node growth, namespace count, and DB size monthly to avoid surprise exhaustion.
Also track certificate expiration lead time and identity-provider error rates. Many production incidents are caused by upstream IdP outages rather than the VPN control plane itself. Add synthetic login checks and tie alerts to a clear escalation matrix.
sudo tee /usr/local/bin/backup-headscale.sh >/dev/null <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
STAMP=$(date +%F-%H%M)
ROOT=/opt/headscale   # adjust to your stack path (this guide used ~/headscale)
OUT=/var/backups/headscale
mkdir -p "$OUT"
docker compose -f "$ROOT/docker-compose.yml" exec -T postgres pg_dump -U headscale headscale | gzip > "$OUT/headscale-db-$STAMP.sql.gz"
tar -C "$ROOT" -czf "$OUT/headscale-config-$STAMP.tgz" config .env caddy
find "$OUT" -type f -mtime +14 -delete
EOF
sudo chmod +x /usr/local/bin/backup-headscale.sh
echo "15 2 * * * root /usr/local/bin/backup-headscale.sh" | sudo tee /etc/cron.d/headscale-backup
Do disaster recovery drills quarterly: restore backup to a clean host, re-point DNS to standby, and validate representative node reconnection from each namespace.
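The backup script covers capture but not replay; a drill needs the restore side too. A sketch that selects the newest dump produced by the script above and replays it into a fresh postgres on the standby host:

```shell
# Restore drill sketch (paths match the backup script above).
latest_dump() {
  ls -1t /var/backups/headscale/headscale-db-*.sql.gz 2>/dev/null | head -n 1
}
restore() {
  local dump; dump=$(latest_dump)
  [ -n "$dump" ] || { echo "no dump found" >&2; return 1; }
  gunzip -c "$dump" | docker compose exec -T postgres psql -U headscale -d headscale
}
# Local check of the newest-first selection on throwaway files:
D=$(mktemp -d)
touch -t 202401010101 "$D/headscale-db-old.sql.gz"
touch -t 202401020101 "$D/headscale-db-new.sql.gz"
ls -1t "$D"/headscale-db-*.sql.gz | head -n 1
```

Run `restore` only against an empty database on the drill host; replaying into a live primary is not part of this sketch.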
docker compose pull
docker compose up -d --no-deps postgres
until docker compose exec -T postgres pg_isready -U headscale -d headscale; do sleep 2; done
docker compose up -d headscale caddy
docker compose logs --since=5m headscale caddy
Compliance, audit evidence, and change control
For security and compliance teams, your implementation should generate evidence automatically rather than relying on manual screenshots. Record change tickets for every ACL update, log who approved each policy change, and retain onboarding/offboarding events tied to identity records. Keep a monthly export of active nodes, namespaces, route advertisements, and key issuance timestamps. This evidence simplifies ISO 27001, SOC 2, and internal risk reviews.
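The monthly export described above can be automated with the headscale CLI's machine-readable output. A sketch; the evidence directory is an assumption, and on pre-v0.22 releases substitute `namespaces` for `users`:

```shell
# Monthly audit-evidence snapshot (sketch): JSON exports of nodes and users,
# date-stamped for retention. EV_DIR is an assumed location.
EV_DIR=/var/evidence/headscale
STAMP=$(date +%Y-%m)
snapshot() {
  mkdir -p "$EV_DIR"
  docker compose exec headscale headscale nodes list -o json > "$EV_DIR/nodes-$STAMP.json"
  docker compose exec headscale headscale users list -o json > "$EV_DIR/users-$STAMP.json"
}
echo "$STAMP"
```

Schedule `snapshot` from the same cron host as the backups so evidence and state captures share a retention policy.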
Integrate logs from Caddy, Headscale, and your identity provider into one timeline view so investigators can correlate authentication events, policy changes, and connection anomalies quickly. Define a formal change window for upgrades, require pre-change backup verification, and keep a tested rollback path documented in the runbook. These practices reduce MTTR and make platform ownership sustainable as the environment grows.
FAQ
1) Can Headscale fully replace managed VPN services?
For many organizations yes, especially where control-plane ownership and compliance evidence are mandatory. You must still own lifecycle operations and observability.
2) Do we need Kubernetes?
No. A disciplined Docker Compose deployment is sufficient for many teams. Move to Kubernetes when standardization or multi-zone requirements justify complexity.
3) Which IdP works best?
Any standards-compliant OIDC provider works. Choose the one your identity/security team already governs to simplify audits and policy reuse.
4) Is Magic DNS required?
Not required, but strongly recommended. It improves day-to-day usability and lowers operational mistakes tied to raw IP addressing.
5) How often should we rotate preauth keys?
Issue per-user/per-device short-lived keys and rotate continuously. Avoid reusable keys unless tightly controlled automation requires it.
6) What backup frequency is safe?
Daily is a baseline. High-churn environments may need twice-daily dumps and stricter retention for compliance windows.
7) How do we onboard contractors safely?
Create a dedicated namespace, enforce restrictive ACL tags, set hard expiration windows, and remove IdP group membership immediately at offboarding.
Suggested internal guides
- Deploy Uptime Kuma with Docker Compose and Caddy on Ubuntu (Production Guide)
- Uptime Kuma Setup: Grafana Integration, Custom Dashboards, Alertmanager, and Enterprise Observability
- Keycloak Docker Setup Guide: User Federation, Custom Themes, Fine-Grained Authorization, and High Availability
Talk to us
If you want support implementing Headscale self-hosted VPN across staging and production, we can help with architecture review, migration sequencing, ACL design, and operational hardening. Share your topology, identity provider, and compliance constraints, and we will propose a phased rollout with measurable risk reduction.
For teams with strict audit obligations, we can also design evidence collection for access approvals, key rotations, and backup validation runs.