Zabbix is still one of the most practical ways to run infrastructure monitoring in small and mid-size environments: it is open-source, supports deep host/service checks, and gives you alerting, dashboards, and historical trends without per-host licensing surprises. In many teams, the gap is not installing Zabbix itself; the gap is shipping it with production defaults—TLS at the edge, clean network boundaries, stable backups, and a repeatable upgrade path.
This guide shows a hardened deployment of Zabbix Server + Frontend + PostgreSQL on a single Ubuntu host using Docker Compose behind Traefik with automatic Let’s Encrypt certificates, with Zabbix Agents on monitored hosts reporting in. The design keeps PostgreSQL on an internal Docker network, publishes only Traefik (80/443) and the Zabbix server port (10051), and puts secrets in a dedicated environment file outside version control. You will also get day-2 checks for alert flow and troubleshooting steps for the most common operational failures.
Architecture and flow overview
High-level request and monitoring flow:
- User opens https://zabbix.example.com in a browser.
- Traefik receives HTTPS traffic on 443, terminates TLS, and routes to the Zabbix web container over the Docker network they share (edge in the Compose file).
- Zabbix web connects to Zabbix server and PostgreSQL over the private network only (no direct internet exposure).
- Zabbix agent on monitored hosts sends metrics to server port 10051 (restricted at firewall level).
- Alerts are triggered through actions/media types and can be sent to email or chat integrations.
Why this pattern works in production: Traefik centralizes certificates and edge controls, Compose keeps deployment simple and auditable, and PostgreSQL data remains in a named volume that can be snapshotted and backed up independently.
Prerequisites
- Ubuntu 22.04/24.04 VM or bare metal host (minimum 4 vCPU, 8 GB RAM, 80+ GB disk for light/medium workloads).
- A DNS A record: zabbix.example.com pointing to your host public IP.
- Outbound internet from host (for container pulls and ACME certificate issuance).
- Open ports: 80/tcp, 443/tcp, and optionally 10051/tcp if remote agents report directly.
- Docker Engine and Docker Compose plugin installed.
- A non-root sudo user on the host.
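The prerequisites above can be checked before any files are written. The following is a minimal sketch, not part of the original procedure: missing_tools and has_compose_plugin are hypothetical helper names, and the check only confirms that commands exist on PATH, not that their versions are suitable.

```shell
# Print each command from the argument list that is not found on PATH.
# Prints nothing when every command is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Return 0 if the Docker Compose v2 plugin responds, non-zero otherwise.
has_compose_plugin() {
  docker compose version >/dev/null 2>&1
}

# Example use on the target host (assumed tool list):
#   missing_tools docker curl openssl
#   has_compose_plugin || echo "Docker Compose plugin not found"
#   getent hosts zabbix.example.com   # should print your host's public IP
```

An empty result from missing_tools means the listed commands are present. The DNS lookup is worth repeating from a machine outside your network, because Let’s Encrypt validates from the public internet.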
Step-by-step deployment
Step 1: Prepare host and directory layout
Create a clean deployment directory with least-privilege permissions. Keep runtime secrets in .env.prod and never commit it.
sudo mkdir -p /opt/zabbix-stack/{traefik,postgres,backups}
sudo chown -R $USER:$USER /opt/zabbix-stack
cd /opt/zabbix-stack
umask 027
Step 2: Create secrets and environment file
Use long random values for database and admin credentials. Do not hardcode secrets in docker-compose.yml.
cat > /opt/zabbix-stack/.env.prod <<'EOF'
DOMAIN=zabbix.example.com
TZ=UTC
POSTGRES_DB=zabbix
POSTGRES_USER=zabbix
POSTGRES_PASSWORD=CHANGE_ME_DB_PASSWORD
ZBX_SERVER_HOST=zabbix-server
ZBX_DB_PASSWORD=CHANGE_ME_DB_PASSWORD
ZBX_ADMIN_USER=Admin
ZBX_ADMIN_PASSWORD=CHANGE_ME_ZABBIX_ADMIN_PASSWORD
EOF
chmod 600 /opt/zabbix-stack/.env.prod
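One way to produce the long random values is to read bytes from /dev/urandom and hex-encode them. This is a sketch under stated assumptions: gen_secret is a hypothetical helper name, and hex output is chosen only because it avoids quoting problems in env files.

```shell
# Print a random hex string built from N bytes of /dev/urandom.
# Default is 24 bytes, which yields 48 hex characters.
gen_secret() {
  od -An -tx1 -N"${1:-24}" /dev/urandom | tr -d ' \n'
}

# Example use (replaces both DB password placeholders with one shared value):
#   db_pw=$(gen_secret)
#   sed -i "s/CHANGE_ME_DB_PASSWORD/${db_pw}/g" /opt/zabbix-stack/.env.prod
```

POSTGRES_PASSWORD and ZBX_DB_PASSWORD carry the same placeholder in the file above, so a single substitution keeps them identical.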
Step 3: Write Docker Compose stack
This compose file pins a known-good major version family, uses health checks, and ensures Traefik labels are attached only to the web container.
cat > /opt/zabbix-stack/docker-compose.yml <<'EOF'
services:
  traefik:
    image: traefik:v3.1
    container_name: traefik
    command:
      - --api.dashboard=true
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      # Replace the placeholder with a real contact address for Let's Encrypt notices
      - --certificatesresolvers.letsencrypt.acme.email=CHANGE_ME@example.com
      - --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.letsencrypt.acme.httpchallenge=true
      - --certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik:/letsencrypt
    restart: unless-stopped
    networks:
      - edge

  postgres:
    image: postgres:16-alpine
    container_name: zabbix-postgres
    env_file: .env.prod
    environment:
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - TZ=${TZ}
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 12
    restart: unless-stopped
    networks:
      - internal

  zabbix-server:
    image: zabbix/zabbix-server-pgsql:alpine-7.0-latest
    container_name: zabbix-server
    env_file: .env.prod
    environment:
      - DB_SERVER_HOST=postgres
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - ZBX_CACHESIZE=256M
      - ZBX_HISTORYCACHESIZE=256M
      - ZBX_TRENDCACHESIZE=128M
      - ZBX_STARTPOLLERS=20
      - TZ=${TZ}
    depends_on:
      postgres:
        condition: service_healthy
    ports:
      - "10051:10051"
    restart: unless-stopped
    networks:
      - internal

  zabbix-web:
    image: zabbix/zabbix-web-nginx-pgsql:alpine-7.0-latest
    container_name: zabbix-web
    env_file: .env.prod
    environment:
      - ZBX_SERVER_HOST=${ZBX_SERVER_HOST}
      - DB_SERVER_HOST=postgres
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - PHP_TZ=${TZ}
    depends_on:
      postgres:
        condition: service_healthy
      zabbix-server:
        condition: service_started
    labels:
      - traefik.enable=true
      # This container sits on two networks; tell Traefik to reach it over edge
      - traefik.docker.network=zabbix-edge
      - traefik.http.routers.zabbix.rule=Host(`${DOMAIN}`)
      - traefik.http.routers.zabbix.entrypoints=websecure
      - traefik.http.routers.zabbix.tls=true
      - traefik.http.routers.zabbix.tls.certresolver=letsencrypt
      - traefik.http.services.zabbix.loadbalancer.server.port=8080
    restart: unless-stopped
    networks:
      - edge
      - internal

volumes:
  pg_data:

networks:
  edge:
    name: zabbix-edge
  internal:
EOF
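Before starting anything, it helps to confirm that no CHANGE_ME placeholder survived in the environment file and that Compose can parse the stack. The sketch below assumes the placeholder naming used in Step 2; check_placeholders is a hypothetical helper name.

```shell
# Return 0 when the file has no leftover CHANGE_ME placeholders.
# Otherwise print the affected variable names (never the values) and return 1.
check_placeholders() {
  if grep -q 'CHANGE_ME' "$1"; then
    grep 'CHANGE_ME' "$1" | cut -d= -f1
    return 1
  fi
  return 0
}

# Example use from /opt/zabbix-stack:
#   check_placeholders .env.prod &&
#     docker compose --env-file .env.prod config --quiet
```

docker compose config --quiet validates the file and prints nothing on success, so any output points at a problem to fix before the first start. Placeholders inside docker-compose.yml itself need a separate look, since this check reads only the file it is given.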
Step 4: Bring up stack and verify health
cd /opt/zabbix-stack
touch traefik/acme.json && chmod 600 traefik/acme.json
docker compose --env-file .env.prod up -d
docker compose --env-file .env.prod ps
Confirm that every container reaches the Up state, and that zabbix-postgres reports healthy, before continuing.
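Waiting for the database health check can be scripted instead of re-running the status command by hand. This is a minimal sketch: retry_until and container_healthy are hypothetical helper names, and only zabbix-postgres is polled because it is the only service with a healthcheck in the Compose file above.

```shell
# Run a command until it succeeds, up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 on success, 1 if every attempt failed.
retry_until() {
  attempts=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds once the named container's health check reports "healthy".
container_healthy() {
  [ "$(docker inspect -f '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}

# Example use: wait up to about 60 seconds for PostgreSQL.
#   retry_until 30 2 container_healthy zabbix-postgres || docker logs zabbix-postgres
```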
Step 5: Initialize and secure Zabbix UI
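Open https://zabbix.example.com. On a fresh database the default login is Admin / zabbix. Change that password immediately after the first login and enforce a strong session policy. As far as the Compose file shows, nothing applies ZBX_ADMIN_USER or ZBX_ADMIN_PASSWORD to Zabbix automatically, so treat those values as the credentials to set by hand in the UI.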
# Optional: set maintenance banner in UI and create named admin account
# Administration -> Users -> Create "ops-admin"
# Disable or rename default Admin account after validation
Configuration and secrets handling best practices
For production, treat monitoring as a privileged system because it sees host metrics, credentials for checks, and incident context:
- Move secrets out of flat files: if your platform supports it, migrate DB/password values into Docker secrets or an external vault and inject at runtime.
- Segment networks: keep PostgreSQL unreachable from public interfaces; only containers in the internal network should access it.
- Constrain agent ingress: if agents report over internet, enforce source IP allowlists and consider VPN overlay (WireGuard/Tailscale) to avoid exposing 10051 broadly.
- Email/webhook integrity: use dedicated sender credentials, SPF/DKIM alignment, and signed webhooks where available.
- Least-privilege DB role: reserve superuser access for maintenance; application role should be scoped to required schema operations only.
Store an encrypted copy of .env.prod in your secrets manager and ensure backup operators can recover it during DR without requiring shell access to the production node.
Verification checklist
Run this checklist before handing the platform to operations:
# Edge and certificate checks
curl -I https://zabbix.example.com
# Expect: HTTP/2 200 and valid TLS chain
# Container health
cd /opt/zabbix-stack
docker compose --env-file .env.prod ps
docker logs --tail=20 zabbix-server
docker logs --tail=20 zabbix-web
# DB connectivity from server container
docker exec -it zabbix-server sh -lc 'nc -zv postgres 5432'
In automation, where an interactive docker exec may be blocked, drop the -it flags and store the command output in the ticket notes.
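The certificate check can go one step further and report how many days remain before expiry. The sketch below assumes GNU date (as shipped on Ubuntu) and the openssl CLI; days_until and cert_end_date are hypothetical helper names.

```shell
# Print the whole number of days from now until the given date string.
# Relies on GNU date (-d).
days_until() {
  end=$(date -d "$1" +%s) || return 1
  now=$(date +%s)
  echo $(( (end - now) / 86400 ))
}

# Print the notAfter date of the certificate served for a host on port 443.
cert_end_date() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -enddate | cut -d= -f2
}

# Example use:
#   days_until "$(cert_end_date zabbix.example.com)"
```

A value that keeps shrinking toward zero suggests renewal is failing; the Traefik container logs are the place to look in that case.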
Inside the Zabbix UI, add one Linux host with active agent checks, trigger a test alert, and verify notification delivery end-to-end (trigger -> action -> media -> recipient). This final loop is where many teams discover SMTP policy or DNS sender issues, so treat it as a release gate, not an optional check.
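For the active-agent test host, the agent needs three settings: Server, ServerActive, and Hostname. The sketch below only prints them; render_agent_conf is a hypothetical helper name, web-01 is an example host name, and the target file path depends on which agent package you install, so merge the output into the existing agent configuration rather than overwriting it.

```shell
# Print a minimal active-check agent configuration.
#   $1 = Zabbix server address reachable from the monitored host
#   $2 = host name exactly as it is entered in the Zabbix UI
render_agent_conf() {
  cat <<EOF
Server=$1
ServerActive=$1:10051
Hostname=$2
EOF
}

# Example use:
#   render_agent_conf zabbix.example.com web-01
```

Hostname must match the host name configured in the Zabbix UI character for character, otherwise active checks are not picked up.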
Common issues and fixes
1) Let’s Encrypt certificate is not issued
Symptom: browser shows default Traefik certificate or TLS warning.
Fix: ensure DNS points to the host, port 80 is reachable, and no other reverse proxy is intercepting ACME HTTP challenge. Confirm traefik/acme.json is writable by Traefik.
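Traefik refuses an ACME storage file whose permissions are wider than 600, so the file mode is worth checking first. This is a small sketch assuming GNU stat (as on Ubuntu); file_mode and acme_store_ok are hypothetical helper names.

```shell
# Print the octal permission bits of a file (GNU stat).
file_mode() {
  stat -c '%a' "$1"
}

# Return 0 only when the file exists and its mode is exactly 600.
acme_store_ok() {
  [ -f "$1" ] && [ "$(file_mode "$1")" = "600" ]
}

# Example use:
#   acme_store_ok /opt/zabbix-stack/traefik/acme.json ||
#     chmod 600 /opt/zabbix-stack/traefik/acme.json
#   docker logs traefik 2>&1 | grep -i acme
```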
2) Zabbix web shows database connection error
Symptom: login page fails or setup wizard cannot connect to DB.
Fix: verify POSTGRES_PASSWORD is identical in all relevant services, and that postgres is healthy before web/server startup. Restart stack after correcting env file.
3) Agents appear unsupported or unreachable
Symptom: host status flips between unknown/unavailable.
Fix: validate hostname/IP consistency, active vs passive mode settings, firewall allowances for 10050/10051 as applicable, and NTP sync across server and hosts.
4) Slow frontend or delayed trigger evaluation
Symptom: dashboard lags or alert latency increases under load.
Fix: raise Zabbix cache sizes, tune poller/trapper counts, and move to external PostgreSQL with faster disks if retention or host count has grown beyond single-node capacity.
5) Upgrade causes schema mismatch warning
Symptom: service starts but reports DB version mismatch.
Fix: snapshot DB volume first, run controlled version jump (e.g., 6.0 LTS -> 7.0 LTS path), and verify release notes for mandatory intermediate steps before production rollout.
FAQ
Can I run this stack on one VM for production?
Yes for small environments, but define clear growth thresholds. Once host count, checks-per-second, or retention requirements climb, split database and move to a multi-node architecture.
Should I expose port 10051 publicly?
Avoid direct public exposure when possible. Prefer VPN/overlay networks or tightly scoped source IP allowlists to reduce attack surface and noisy scans.
How often should I back up the database?
At minimum daily full backup plus WAL/point-in-time strategy for critical environments. Test restores monthly to ensure backup validity, not just backup job success.
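A daily logical dump with simple retention could look like the sketch below. It is an illustration under assumptions, not a complete backup strategy: backup_name and prune_backups are hypothetical helper names, the backups directory is the one created in Step 1, and the database name and user are the values from .env.prod.

```shell
# Print a timestamped dump file name in UTC, such as
# zabbix-<YYYYMMDD>T<HHMMSS>Z.dump
backup_name() {
  printf 'zabbix-%s.dump\n' "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Delete all but the newest KEEP dumps in DIR. The timestamped names sort
# chronologically, so a reverse name sort puts the newest first.
prune_backups() {
  dir=$1 keep=$2
  ls -1 "$dir"/zabbix-*.dump 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do rm -f -- "$old"; done
}

# Example use (custom-format dump, then keep the last 7 files):
#   docker exec zabbix-postgres pg_dump -U zabbix -Fc zabbix \
#     > "/opt/zabbix-stack/backups/$(backup_name)"
#   prune_backups /opt/zabbix-stack/backups 7
```

Custom-format dumps are restored with pg_restore. Copy the files off the host as well, since a dump stored only on the monitored machine does not survive losing that machine.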
What is the safest way to rotate credentials?
Rotate in sequence: create new secret, update dependent service env values, restart affected services, verify logins/checks, then revoke old credentials and document completion.
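For the database password, the "update dependent service env values" step can be done with a small helper that rewrites one key in .env.prod. This is a sketch: set_env_value is a hypothetical name, and it assumes values without backslashes (hex secrets, for example), because awk interprets escape sequences in -v assignments.

```shell
# Set KEY=VALUE in an env file, replacing an existing line or appending one.
# Writes to a temp file first and moves it into place with mode 600.
set_env_value() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  awk -v k="$key" -v v="$value" -F= '
    $1 == k { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && chmod 600 "$tmp" && mv "$tmp" "$file"
}

# Example rotation of the DB password from /opt/zabbix-stack:
#   new_pw=$(od -An -tx1 -N24 /dev/urandom | tr -d ' \n')
#   docker exec zabbix-postgres psql -U zabbix -d zabbix \
#     -c "ALTER ROLE zabbix PASSWORD '${new_pw}'"
#   set_env_value .env.prod POSTGRES_PASSWORD "$new_pw"
#   set_env_value .env.prod ZBX_DB_PASSWORD "$new_pw"
#   docker compose --env-file .env.prod up -d
```

The ALTER ROLE step matters because the PostgreSQL image applies POSTGRES_PASSWORD only when it initializes an empty data directory; changing the env file alone does not change the password of an existing database.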
Can I integrate Zabbix alerts with Slack or Teams?
Yes. Configure media types and actions in Zabbix, then test with non-production triggers first. Include retry strategy and escalation policy to prevent silent failures.
When should I choose Kubernetes instead of Compose?
Choose Kubernetes when you need multi-node scheduling, standardized secret/cert automation, GitOps workflows, and stronger failure-domain isolation. Compose remains excellent for single-host operational simplicity.
Related internal guides
- Deploy Umami with Docker Compose + Traefik + PostgreSQL
- Deploy Mattermost with Docker Compose + Caddy + PostgreSQL
- Deploy Harbor with Kubernetes + Helm + cert-manager
Talk to us
If you want this implemented with hardened defaults, observability, and tested recovery playbooks, our team can help.