Teams that rely on ad-hoc VPNs eventually hit the same operational wall: hard-to-revoke access, weak visibility over active peers, and onboarding that breaks whenever one engineer leaves or rotates credentials. A self-hosted mesh control plane gives you predictable governance, but only if it is deployed with production guardrails from day one.
This guide shows how to deploy NetBird with Docker Compose and Nginx on Ubuntu for a practical, production-ready setup. The architecture is designed for internal platforms that need secure developer access to private services, CI environments, and operational tooling without exposing these systems directly to the public internet.
Real-world use case: a growing engineering team runs workloads across cloud VMs and on-prem nodes. They need role-aware peer onboarding, consistent control-plane TLS, and clear incident playbooks when connectivity degrades. By centralizing policy and lifecycle management in NetBird, they can reduce VPN sprawl and improve auditability.
Architecture and flow overview
- NetBird Management handles peer identity, setup keys, and control-plane state.
- NetBird Signal coordinates connection signaling required for peer establishment.
- NetBird Dashboard provides admin visibility into users, peers, and posture.
- Nginx terminates TLS and routes the management API and gRPC, signal gRPC, and dashboard paths.
- Client peers use a setup key and the management URL to enroll, then establish WireGuard-encrypted tunnels with each other.
In production, reliability depends on four foundations: DNS correctness, valid TLS certificates, stable persistent volumes, and clean secret rotation policy. If any one of these drifts, onboarding and control-plane sync will fail in ways that look random to users.
Prerequisites
- Ubuntu 22.04+ host with public reachability and DNS control.
- A domain such as netbird.example.com mapped to the target server.
- Ports 80 and 443 open for ACME and HTTPS traffic.
- Docker Engine and Docker Compose plugin installed.
- A policy owner for setup-key issuance and peer approval decisions.
Before deploying, define naming conventions for teams and environments (for example, platform-prod, platform-staging) and decide who has authority to create or revoke setup keys. This governance step prevents most long-term drift and permission confusion.
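A naming convention drifts less when it is checked mechanically wherever names are entered. The sketch below assumes the `<team>-<env>` pattern from the examples above and allows only the environments `prod` and `staging`; both choices are assumptions. `valid_group_name` is a hypothetical helper and is not part of NetBird.

```shell
#!/bin/sh
# Hypothetical convention check: lowercase team slug (hyphens allowed), then -prod or -staging.
# Adjust the allowed environments to whatever your governance owner agrees on.
valid_group_name() {
  printf '%s\n' "$1" | grep -Eqx '[a-z][a-z0-9]*(-[a-z0-9]+)*-(prod|staging)'
}

# Example: reject a name before anyone creates a group or setup key with it.
# valid_group_name platform-prod && echo ok
```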
sudo apt update && sudo apt -y upgrade
sudo apt -y install ca-certificates curl gnupg lsb-release jq ufw openssl
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt update
sudo apt -y install docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo usermod -aG docker "$USER"
# Log out and back in (or run "newgrp docker") so the new group membership applies
mkdir -p ~/netbird-prod/{management,signal,dashboard,nginx/conf.d,secrets,letsencrypt,backups}
cd ~/netbird-prod
# Create the secret files owner-only from the start (umask 077 in a subshell)
(
  umask 077
  openssl rand -base64 48 > secrets/netbird_jwt_secret.txt
  # The data-store encryption key is used as an AES key, so generate exactly 32 random bytes
  openssl rand -base64 32 > secrets/netbird_encryption_key.txt
)
chmod 600 secrets/*.txt
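Before the first `docker compose up`, a quick check that the secret files exist and are locked down catches copy-paste and permission mistakes. This sketch assumes the two file names generated above and GNU `stat` (the version Ubuntu ships). `check_secrets` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical pre-start check: each secret file must exist, be non-empty, and be mode 600.
# Uses GNU stat (stat -c), as shipped on Ubuntu.
check_secrets() {
  local dir="$1" rc=0 f mode
  for f in "$dir"/netbird_jwt_secret.txt "$dir"/netbird_encryption_key.txt; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      rc=1
      continue
    fi
    mode=$(stat -c '%a' "$f")
    if [ "$mode" != "600" ]; then
      echo "expected mode 600, got $mode: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example: check_secrets ~/netbird-prod/secrets && echo "secrets look sane"
```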
Step-by-step deployment
1) Create the core Compose stack
Run management, signal, and dashboard as separate services so logs and failure domains stay understandable during incidents and upgrades.
services:
  netbird-management:
    image: netbirdio/management:latest
    container_name: netbird-management
    restart: unless-stopped
    # Pin the listen port to match the Nginx upstream and log to stdout for "docker compose logs"
    command: ["--port", "33073", "--log-file", "console"]
    volumes:
      - ./management:/var/lib/netbird
      # Management config from step 3; create the file before the first start
      - ./management.json:/etc/netbird/management.json
      - ./secrets:/run/secrets:ro
    expose:
      - "33073"
  netbird-signal:
    image: netbirdio/signal:latest
    container_name: netbird-signal
    restart: unless-stopped
    command: ["--port", "10000", "--log-file", "console"]
    expose:
      - "10000"
  netbird-dashboard:
    image: netbirdio/dashboard:latest
    container_name: netbird-dashboard
    restart: unless-stopped
    # The dashboard also needs its environment (management endpoint, auth settings); see NetBird's docs
    expose:
      - "80"
  nginx:
    image: nginx:stable
    container_name: netbird-nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/conf.d:/etc/nginx/conf.d:ro
      - ./letsencrypt:/etc/letsencrypt:ro
      # ACME webroot served by the port-80 server block
      - ./certbot-www:/var/www/certbot:ro
    depends_on:
      - netbird-management
      - netbird-signal
      - netbird-dashboard
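The stack above tracks floating tags (`latest` for the NetBird images, `stable` for Nginx), which the operations section later warns against. The sketch below flags those references before an upgrade window. `unpinned_images` is hypothetical. It only understands single-line `image:` entries, and it treats `stable` as floating, which is our assumption.

```shell
#!/bin/sh
# Hypothetical helper: print image references in a Compose file that use a floating tag.
# Floating = no tag, :latest, or :stable (an assumed list; extend it as needed).
# Digest-pinned references (name@sha256:...) count as pinned.
unpinned_images() {
  awk -v sq="'" '
    /^[ \t]*image:/ {
      img = $2
      gsub(/"/, "", img)          # strip double quotes around the reference
      gsub(sq, "", img)           # strip single quotes
      n = split(img, parts, "/")  # inspect only the last path segment (skips registry:port)
      last = parts[n]
      if (last !~ /:/ || last ~ /:(latest|stable)$/) print img
    }' "$1"
}
```

Run against the Compose file above (`unpinned_images docker-compose.yml`), it flags all four images. Replace them with explicit version tags or digests recorded in your known-good manifest.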
2) Configure Nginx as the ingress and TLS boundary
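Save this as nginx/conf.d/netbird.conf. Keep path-level routing explicit for management API, management gRPC, and signal gRPC traffic. Unmatched paths fall through to the dashboard, which hides routing mistakes until clients fail to enroll. Nginx refuses to start while the certificate files referenced in the 443 block are missing, so obtain the first certificate (for example with certbot's webroot or standalone mode) before enabling that block.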
server {
    listen 80;
    server_name netbird.example.com;
    location /.well-known/acme-challenge/ { root /var/www/certbot; }
    location / { return 301 https://$host$request_uri; }
}
server {
    listen 443 ssl http2;
    server_name netbird.example.com;
    ssl_certificate     /etc/letsencrypt/live/netbird.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/netbird.example.com/privkey.pem;
    # Management REST API (HTTP/1.1)
    location /api/ {
        proxy_pass http://netbird-management:33073;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
    # Management gRPC, used by clients to enroll and sync
    location /management.ManagementService/ {
        grpc_pass grpc://netbird-management:33073;
        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
    }
    # Signal gRPC; long timeouts keep idle signaling streams open
    location /signalexchange.SignalExchange/ {
        grpc_pass grpc://netbird-signal:10000;
        grpc_set_header X-Forwarded-Proto https;
        grpc_read_timeout 1d;
        grpc_send_timeout 1d;
    }
    location / {
        proxy_pass http://netbird-dashboard:80;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
    }
}
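A missing location block silently falls through to `location /` (the dashboard), so a grep-based check before each reload is cheap insurance. This is a minimal sketch. The route list reflects the gRPC services NetBird clients call as we understand them (`management.ManagementService` and `signalexchange.SignalExchange`), so treat it as an assumption. `missing_routes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: print every expected route that has no matching "location" block.
# Pure text matching: it does not prove the upstreams behind each route are correct.
missing_routes() {
  local conf="$1" route
  for route in /api/ /management.ManagementService/ /signalexchange.SignalExchange/; do
    grep -qF "location $route" "$conf" || echo "$route"
  done
}

# Example, followed by nginx's own syntax check and a graceful reload:
# missing_routes nginx/conf.d/netbird.conf
# docker compose exec nginx nginx -t && docker compose exec nginx nginx -s reload
```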
3) Add management configuration and secure runtime secrets
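Do not hard-code secrets in repository-tracked files. Generate them once, restrict file permissions, and document the rotation procedure. A tracked template with placeholders, filled from the secret files at deploy time, keeps real values out of version control.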
# example management config fragment (illustrative keys: compare them with the management.json
# for your NetBird version; remove this comment line in the real file, since JSON has no comments)
{
  "Stuns": [{"Proto": "udp", "URI": "stun:stun.l.google.com:19302"}],
  "HttpConfig": {"Address": "0.0.0.0:33073"},
  "DataStoreEncryptionKey": "REPLACE_WITH_SECRET_FILE_VALUE",
  "JWTSecret": "REPLACE_WITH_SECRET_FILE_VALUE",
  "DeviceAuthorizationFlow": {"Provider": "hosted"}
}
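One way to keep secrets out of tracked files is to commit only a template with the placeholders shown above and render the real file at deploy time. This sketch assumes the template contains pure JSON (no `#` label line) and uses the secret file names from the setup step. `render_management_config` and any template path you pass it are hypothetical.

```shell
#!/bin/sh
# Hypothetical renderer: replace each placeholder with the matching secret file's value and
# write the result owner-only. Each sed expression targets only the line holding its key.
# Base64 never contains "|" or "&", so the values are safe inside these sed replacements.
render_management_config() {
  local template="$1" secrets="$2" out="$3" jwt enc
  jwt=$(tr -d '\n' < "$secrets/netbird_jwt_secret.txt")
  enc=$(tr -d '\n' < "$secrets/netbird_encryption_key.txt")
  (
    umask 077
    sed -e "/\"JWTSecret\"/s|REPLACE_WITH_SECRET_FILE_VALUE|$jwt|" \
        -e "/\"DataStoreEncryptionKey\"/s|REPLACE_WITH_SECRET_FILE_VALUE|$enc|" \
        "$template" > "$out"
  )
  chmod 600 "$out"   # also tighten a pre-existing output file
}

# Example: render_management_config management.json.tmpl secrets management.json
```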
4) Bring up the stack and validate service health
Confirm container health and clean startup logs before allowing endpoint enrollment. Early validation here saves significant incident time later.
# Initial bring-up
cd ~/netbird-prod
# Fail fast on YAML or interpolation errors
docker compose config --quiet
docker compose up -d
# Validate all containers are running (no healthchecks are defined, so look for "Up", not "Restarting")
docker compose ps
docker compose logs --tail=100 netbird-management
docker compose logs --tail=100 netbird-signal
docker compose logs --tail=100 netbird-dashboard
docker compose logs --tail=100 nginx
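Containers often report "Up" a few seconds before the services behind Nginx answer. A small retry wrapper keeps bring-up scripts from failing on that race. It is a generic sketch, and the name `retry` is hypothetical.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or all attempts are used; returns 1 if it never succeeded.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    # No sleep after the final failed attempt
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Example: wait up to about 30 seconds for the dashboard to answer through Nginx.
# retry 10 3 curl -fsS -o /dev/null https://netbird.example.com/
```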
5) Create setup keys and enroll first peers
Start with canary clients from two different networks (for example, office + home ISP) to surface NAT and routing issues early.
# Issue the first setup key from the admin UI (or the management API)
# On each canary client, enroll with that key:
netbird up --management-url https://netbird.example.com --setup-key <SETUP_KEY>
# On the server side, verify peer registration
docker compose logs --tail=150 netbird-management | grep -iE "peer|register|setup key"
Configuration and secret-handling best practices
Self-hosted network control planes are high-impact systems: compromise here can expose broad infrastructure access. Treat key material and admin workflows as security-critical assets.
- Store long-lived secrets outside source control and restrict on-host permissions.
- Issue setup keys with expiration and scope rules; avoid indefinite global keys.
- Separate production and non-production control planes to reduce blast radius.
- Publish an explicit revocation process for compromised peers or leaked keys.
- Track who can create keys, approve devices, and change policy defaults.
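Expiring, scoped keys are easiest to enforce when key creation is scripted rather than clicked. Below is a sketch of a request body for a short-lived, single-use key. It assumes NetBird's REST API endpoint `POST /api/setup-keys` with personal-access-token auth, and field names that follow the public API reference as we understand it, so verify them for your version. `setup_key_payload` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: JSON body for a one-off setup key that expires and auto-joins one group.
# Field names assume NetBird's REST API (POST /api/setup-keys); confirm against your version.
setup_key_payload() {
  local name="$1" ttl_seconds="$2" group_id="$3"
  # Refuse values that would need JSON escaping instead of emitting broken JSON
  case "$name$group_id" in
    *[!A-Za-z0-9_-]*) echo "unsafe name or group id" >&2; return 1 ;;
  esac
  case "$ttl_seconds" in
    ''|*[!0-9]*) echo "ttl must be a number of seconds" >&2; return 1 ;;
  esac
  printf '{"name":"%s","type":"one-off","expires_in":%d,"auto_groups":["%s"],"usage_limit":1}\n' \
    "$name" "$ttl_seconds" "$group_id"
}

# Example (NB_PAT and <GROUP_ID> are placeholders for an admin token and a team group ID):
# curl -fsS -X POST https://netbird.example.com/api/setup-keys \
#   -H "Authorization: Token $NB_PAT" -H "Content-Type: application/json" \
#   -d "$(setup_key_payload platform-prod-onboarding 86400 <GROUP_ID>)"
```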
For larger teams, move secret values into a proper secret manager and inject at runtime. Even if your initial deployment is small, this pattern makes future compliance and audit requirements far easier to satisfy.
Operationally, document who owns DNS, TLS renewal monitoring, onboarding approvals, and upgrade windows. Clear ownership prevents silent failure and long incident MTTR in distributed teams.
Production operations playbook
After go-live, long-term success comes from disciplined routine operations, not the initial install. Build a lightweight cadence that your team can actually maintain:
- Daily: review failed enrollments and unusual setup-key activity.
- Weekly: verify backup integrity and test one restore path in staging.
- Bi-weekly: apply image updates in a planned window with rollback checkpoints.
- Monthly: remove stale peers, rotate sensitive keys, and audit admin roles.
- Quarterly: run incident drills for DNS breakage, cert expiry, and control-plane outage.
Keep a known-good image manifest and rollback procedure. Teams that skip this often discover too late that "latest" tags can introduce breaking changes in critical networking paths.
Add minimal observability from day one: HTTPS health checks, container restart alerts, and enrollment failure metrics. You do not need a complex observability stack to get meaningful early warning signals.
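Certificate expiry is one of the cheapest early-warning signals to automate. The sketch below relies on `openssl x509 -checkend`, which exits 0 when the certificate will not expire within the given number of seconds. The function name and the 14-day threshold in the example are assumptions.

```shell
#!/bin/sh
# Hypothetical alert check: succeed if the PEM certificate is still valid DAYS days from now.
# With a fullchain file, openssl reads the first (leaf) certificate.
cert_valid_for_days() {
  local cert="$1" days="$2"
  openssl x509 -checkend $((days * 86400)) -noout -in "$cert" > /dev/null
}

# Example against the certificate Nginx serves (may need sudo to read the letsencrypt tree):
# cert_valid_for_days ./letsencrypt/live/netbird.example.com/fullchain.pem 14 \
#   || echo "certificate expires within 14 days" >&2
```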
Verification checklist
- Dashboard and API are reachable via HTTPS with valid certificate chain.
- Signal path negotiates without transport errors.
- New setup key can onboard a client successfully.
- Enrolled peers appear in management view with expected status.
- Service restart preserves state and does not invalidate existing peers unexpectedly.
# HTTPS and API checks
curl -I https://netbird.example.com/
# Should be answered by the management service, not fall through to the dashboard
curl -I https://netbird.example.com/api/
# TLS handshake check; gRPC also needs HTTP/2, so confirm ALPN negotiates h2
openssl s_client -connect netbird.example.com:443 -servername netbird.example.com -alpn h2 </dev/null 2>/dev/null | grep -i "ALPN protocol"
# Check peer state from a client
netbird status
Common issues and fixes
Clients fail to enroll with setup key errors
Check key expiration, key scope, and whether the management URL exactly matches your TLS endpoint. Small URL mismatches are a frequent cause of first-day failures.
Dashboard loads but API calls fail
Usually a reverse-proxy path mapping issue. Verify that /api/ is routed to the management service and that the expected forwarded headers are set.
Signal connection instability across some networks
Validate Nginx gRPC handling and inspect NAT/firewall behavior on affected clients. Canary testing from multiple network types helps surface this early.
TLS renewal or certificate mismatch
Confirm DNS points to the correct host and renewal paths are accessible. If certs are stale, review renewal job logs and reload Nginx safely.
State appears to reset after restart
Inspect volume mounts for management data persistence. Missing bind mounts can create the illusion of healthy services while silently losing configuration state.
Unauthorized peer growth over time
Enforce lifecycle cleanup. Remove stale peers on schedule and require ticketed approvals for long-lived setup keys.
Backup, retention, and host hardening commands:
# Backup key data paths daily (backups include secrets, so keep the directory owner-only)
BACKUP_DIR=~/netbird-prod/backups/$(date +%F)
mkdir -p "$BACKUP_DIR" && chmod 700 ~/netbird-prod/backups
# For a consistent copy of the management store, consider stopping netbird-management briefly first
cp -a ~/netbird-prod/management "$BACKUP_DIR"/
cp -a ~/netbird-prod/secrets "$BACKUP_DIR"/
cp -a ~/netbird-prod/nginx "$BACKUP_DIR"/
# Retain last 30 days
find ~/netbird-prod/backups -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
# crontab entries (paths assume the ubuntu user's home directory)
# Unattended pulls of floating tags conflict with the planned update window above: keep the
# first line only if images are pinned to known-good tags, otherwise run it during the window
0 2 * * * /usr/bin/docker compose -f /home/ubuntu/netbird-prod/docker-compose.yml pull && /usr/bin/docker compose -f /home/ubuntu/netbird-prod/docker-compose.yml up -d
10 2 * * * /usr/bin/find /home/ubuntu/netbird-prod/backups -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
20 2 * * * /usr/bin/docker image prune -af --filter "until=168h"
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
# Note: ports published by Docker (80/443 here) are handled by Docker's own iptables rules
# and bypass ufw, so ufw mainly protects services running directly on the host
# optional
sudo apt -y install fail2ban
sudo systemctl enable --now fail2ban
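The copy-based backup above is easy to read but hard to verify during the weekly integrity check. The sketch below writes a single compressed archive plus a SHA-256 checksum. It assumes GNU tar and coreutils (`sha256sum`). The function names and the `netbird-<date>.tar.gz` naming are hypothetical.

```shell
#!/bin/sh
# backup_netbird ROOT DEST_DIR: archive ROOT/{management,secrets,nginx}, print the archive path.
backup_netbird() {
  local root="$1" dest="$2" name
  name="netbird-$(date +%F).tar.gz"
  (
    umask 077   # the archive contains secrets: owner-only directory and files
    mkdir -p "$dest"
    tar -C "$root" -czf "$dest/$name" management secrets nginx
    cd "$dest" && sha256sum "$name" > "$name.sha256"
  ) || return 1
  echo "$dest/$name"
}

# verify_backup ARCHIVE: the checksum must match and the archive must list cleanly.
verify_backup() {
  local archive="$1"
  ( cd "$(dirname "$archive")" && sha256sum -c --quiet "$(basename "$archive").sha256" ) &&
    tar -tzf "$archive" > /dev/null
}
```

If you schedule this from cron, put it in a script file. A literal `%` in a crontab line (as in `date +%F`) must be escaped as `\%`.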
FAQ
Is NetBird a replacement for every VPN use case?
Not always. It excels for modern peer connectivity and identity-managed private access. Some legacy site-to-site patterns may still require complementary tooling.
Can we start with one server and scale later?
Yes. Start with a single controlled deployment and strong operational hygiene. Scale architecture only when growth signals justify added complexity.
Do we need Kubernetes for production reliability?
No. Docker Compose is a practical and reliable foundation for many teams when backups, monitoring, and upgrade discipline are in place.
How often should setup keys be rotated?
Prefer short-lived keys for onboarding events and regular rotation of longer-lived credentials. Align rotation cadence with your incident-response policy.
What is the safest onboarding process for new teams?
Use team-scoped keys, explicit approvals, and canary rollout first. Document revocation and offboarding so access does not linger after project changes.
How do we handle emergency lockout scenarios?
Prepare a break-glass runbook with rollback images, emergency admin access path, and peer revocation commands. Rehearse this before a real outage.
What logs matter most during incident response?
Prioritize management service logs, reverse-proxy access/error logs, and client status outputs. Correlating these quickly narrows root cause between auth, routing, and transport layers.