Production Guide: Deploy Zabbix with Docker Compose + Traefik + PostgreSQL on Ubuntu

A production-oriented Zabbix deployment with TLS, secret hygiene, verification checks, and day-2 operations.

Zabbix remains one of the most practical ways to run infrastructure monitoring in small and mid-size environments: it is open source, supports deep host and service checks, and provides alerting, dashboards, and historical trends without per-host licensing surprises. For many teams the hard part is not installing Zabbix itself but shipping it with production defaults: TLS at the edge, clean network boundaries, reliable backups, and a repeatable upgrade path.

This guide shows a hardened deployment of Zabbix Server, the web frontend, and PostgreSQL on a single Ubuntu host using Docker Compose behind Traefik with automatic Let’s Encrypt certificates; Zabbix agents run on the monitored hosts. The design keeps the database and the server-to-frontend traffic on private Docker networks, publishes only Traefik (ports 80/443) and the Zabbix trapper port 10051 on the host, and puts secrets in a dedicated environment file outside version control. You will also get day-2 checks for alert flow and troubleshooting steps for the most common operational failures.

Architecture and flow overview

High-level request and monitoring flow:

  1. A user opens https://zabbix.example.com in a browser.
  2. Traefik receives HTTPS traffic on 443, terminates TLS, and routes to the Zabbix web container over the internal Docker network.
  3. Zabbix web connects to Zabbix server and PostgreSQL over private network only (no direct internet exposure).
  4. Zabbix agents on monitored hosts send metrics to the server on port 10051 (restricted at the firewall).
  5. Alerts are triggered through actions/media types and can be sent to email or chat integrations.

Why this pattern works in production: Traefik centralizes certificates and edge controls, Compose keeps deployment simple and auditable, and PostgreSQL data remains in a named volume that can be snapshotted and backed up independently.

Prerequisites

  • Ubuntu 22.04/24.04 VM or bare metal host (minimum 4 vCPU, 8 GB RAM, 80+ GB disk for light/medium workloads).
  • A DNS A record: zabbix.example.com pointing to your host public IP.
  • Outbound internet from host (for container pulls and ACME certificate issuance).
  • Open ports: 80/tcp, 443/tcp, and optionally 10051/tcp if remote agents report directly.
  • Docker Engine and Docker Compose plugin installed.
  • A non-root sudo user on the host.
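
A quick way to confirm the tooling from the list above is in place is a small preflight helper. The following is a minimal sketch; the function name `missing_cmds` is illustrative and not part of Docker or Zabbix.

```shell
# Print every command from the argument list that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_cmds() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Example: check the tools this guide relies on, then the Compose plugin.
# missing_cmds docker curl && docker compose version
```

If the helper prints nothing and `docker compose version` reports a version, the host has what the remaining steps assume.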

Step-by-step deployment

Step 1: Prepare host and directory layout

Create a clean deployment directory with least-privilege permissions. Keep runtime secrets in .env.prod and never commit it.

sudo mkdir -p /opt/zabbix-stack/{traefik,postgres,backups}
sudo chown -R $USER:$USER /opt/zabbix-stack
cd /opt/zabbix-stack
umask 027

Step 2: Create secrets and environment file

Use long random values for database and admin credentials. Do not hardcode secrets in docker-compose.yml.

cat > /opt/zabbix-stack/.env.prod <<'EOF'
DOMAIN=zabbix.example.com
TZ=UTC
POSTGRES_DB=zabbix
POSTGRES_USER=zabbix
POSTGRES_PASSWORD=CHANGE_ME_DB_PASSWORD
ZBX_SERVER_HOST=zabbix-server
ZBX_DB_PASSWORD=CHANGE_ME_DB_PASSWORD
ZBX_ADMIN_USER=Admin
ZBX_ADMIN_PASSWORD=CHANGE_ME_ZABBIX_ADMIN_PASSWORD
EOF
chmod 600 /opt/zabbix-stack/.env.prod

Manual copy fallback: If heredoc paste fails in your terminal, create the file with a text editor and apply the same keys manually.
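
One way to produce the "long random values" for the CHANGE_ME entries is to read bytes from the kernel random source and print them as hex. This is a sketch under that assumption; `gen_secret` is a hypothetical helper name, and any password generator you already trust works just as well.

```shell
# Read N random bytes from /dev/urandom and print them as hex (2*N characters).
# Hex output avoids characters that would need quoting in an env file.
gen_secret() {
  bytes="${1:-24}"
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Example (run once per secret, then paste the value into .env.prod):
# gen_secret 24
```

In the env file above, POSTGRES_PASSWORD and ZBX_DB_PASSWORD carry the same placeholder, so both entries get the same generated value.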

Step 3: Write Docker Compose stack

This compose file pins a known-good major version family, uses health checks, and ensures Traefik labels are attached only to the web container.

cat > /opt/zabbix-stack/docker-compose.yml <<'EOF'
services:
  traefik:
    image: traefik:v3.1
    container_name: traefik
    command:
      - --api.dashboard=true
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.letsencrypt.acme.email=CHANGE_ME@example.com
      - --certificatesresolvers.letsencrypt.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.letsencrypt.acme.httpchallenge=true
      - --certificatesresolvers.letsencrypt.acme.httpchallenge.entrypoint=web
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik:/letsencrypt
    restart: unless-stopped
    networks:
      - edge

  postgres:
    image: postgres:16-alpine
    container_name: zabbix-postgres
    env_file: .env.prod
    environment:
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - TZ=${TZ}
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 12
    restart: unless-stopped
    networks:
      - internal

  zabbix-server:
    image: zabbix/zabbix-server-pgsql:alpine-7.0-latest
    container_name: zabbix-server
    env_file: .env.prod
    environment:
      - DB_SERVER_HOST=postgres
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - ZBX_CACHESIZE=256M
      - ZBX_HISTORYCACHESIZE=256M
      - ZBX_TRENDCACHESIZE=128M
      - ZBX_STARTPOLLERS=20
      - TZ=${TZ}
    depends_on:
      postgres:
        condition: service_healthy
    ports:
      - "10051:10051"
    restart: unless-stopped
    networks:
      - internal

  zabbix-web:
    image: zabbix/zabbix-web-nginx-pgsql:alpine-7.0-latest
    container_name: zabbix-web
    env_file: .env.prod
    environment:
      - ZBX_SERVER_HOST=${ZBX_SERVER_HOST}
      - DB_SERVER_HOST=postgres
      - POSTGRES_DB=${POSTGRES_DB}
      - POSTGRES_USER=${POSTGRES_USER}
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
      - PHP_TZ=${TZ}
    depends_on:
      postgres:
        condition: service_healthy
      zabbix-server:
        condition: service_started
    labels:
      - traefik.enable=true
      - traefik.http.routers.zabbix.rule=Host(`${DOMAIN}`)
      - traefik.http.routers.zabbix.entrypoints=websecure
      - traefik.http.routers.zabbix.tls=true
      - traefik.http.routers.zabbix.tls.certresolver=letsencrypt
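      - traefik.http.services.zabbix.loadbalancer.server.port=8080
      # zabbix-web sits on two networks; tell Traefik to use the one it shares.
      # The name assumes the default Compose project name "zabbix-stack".
      - traefik.docker.network=zabbix-stack_edge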
    restart: unless-stopped
    networks:
      - edge
      - internal

volumes:
  pg_data:

networks:
  edge:
  internal:
EOF

Manual copy fallback: If your browser strips formatting, copy from the first cat line through EOF into a local file.
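
Before the first start it is worth checking that no placeholder secrets remain and that Compose can parse the file. The helper below is a small sketch; `has_placeholders` is an illustrative name, and the CHANGE_ME marker is the one used in the env file from Step 2.

```shell
# Succeeds (exit 0) if the given env file still contains CHANGE_ME placeholders.
has_placeholders() {
  grep -q 'CHANGE_ME' "$1"
}

# Example gate before the first start:
# cd /opt/zabbix-stack
# if has_placeholders .env.prod; then
#   echo "replace the CHANGE_ME values in .env.prod first" >&2
# else
#   docker compose --env-file .env.prod config --quiet
# fi
```

`docker compose config --quiet` only validates the configuration and prints nothing on success, which makes it suitable as a pre-deploy check.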

Step 4: Bring up stack and verify health

cd /opt/zabbix-stack
touch traefik/acme.json && chmod 600 traefik/acme.json
docker compose --env-file .env.prod up -d
docker compose --env-file .env.prod ps

Manual copy fallback: Run each command line-by-line and confirm every container reaches Up state.
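
Containers can show Up before the frontend actually answers, and the first certificate request takes a moment. A retry loop makes that wait explicit. This is a minimal sketch; `retry` is a hypothetical helper, not a Docker or Compose feature.

```shell
# Run a command repeatedly until it succeeds or the attempts are used up.
# usage: retry <attempts> <delay_seconds> <command> [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}

# Example: wait up to about five minutes for the frontend behind Traefik.
# retry 30 10 curl -fsS -o /dev/null https://zabbix.example.com
```

A non-zero exit after the last attempt is the signal to look at the container logs rather than keep waiting.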

Step 5: Initialize and secure Zabbix UI

Open https://zabbix.example.com. The default login is usually Admin / zabbix on first boot if not overridden. Immediately rotate credentials and enforce strong session policy.

# Optional: set maintenance banner in UI and create named admin account
# Administration -> Users -> Create "ops-admin"
# Disable or rename default Admin account after validation

Manual copy fallback: These are UI steps; copy the checklist text into your runbook ticket.

Configuration and secrets handling best practices

For production, treat monitoring as a privileged system because it sees host metrics, credentials for checks, and incident context:

  • Move secrets out of flat files: if your platform supports it, migrate DB/password values into Docker secrets or an external vault and inject at runtime.
  • Segment networks: keep PostgreSQL unreachable from public interfaces; only containers in the internal network should access it.
  • Constrain agent ingress: if agents report over the internet, enforce source IP allowlists and consider a VPN overlay (WireGuard/Tailscale) to avoid exposing 10051 broadly.
  • Email/webhook integrity: use dedicated sender credentials, SPF/DKIM alignment, and signed webhooks where available.
  • Least-privilege DB role: reserve superuser access for maintenance; application role should be scoped to required schema operations only.

Store an encrypted copy of .env.prod in your secrets manager and ensure backup operators can recover it during DR without requiring shell access to the production node.

Verification checklist

Run this checklist before handing the platform to operations:

# Edge and certificate checks
curl -I https://zabbix.example.com
# Expect: HTTP/2 200 and valid TLS chain

# Container health
cd /opt/zabbix-stack
docker compose --env-file .env.prod ps

# docker logs can write to stderr, so merge the streams before any further filtering
docker logs --tail=20 zabbix-server 2>&1
docker logs --tail=20 zabbix-web 2>&1

# DB connectivity from server container
docker exec -it zabbix-server sh -lc 'nc -zv postgres 5432'

Manual copy fallback: If interactive docker exec is blocked in automation, run equivalent non-interactive commands and store output in ticket notes.
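
For automated runs, the edge check can be reduced to a status code that is easy to store in ticket notes. The helper below is a sketch; `status_from_headers` is an illustrative name that simply parses the first header line printed by `curl -I`.

```shell
# Extract the numeric status code from the first line of HTTP response headers
# read on stdin (for example "HTTP/2 200").
status_from_headers() {
  tr -d '\r' | awk 'NR == 1 { print $2 }'
}

# Example:
# curl -sI https://zabbix.example.com | status_from_headers
# Non-interactive variant of the DB connectivity check (no -it flags):
# docker exec zabbix-server sh -c 'nc -zv postgres 5432'
```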

Inside the Zabbix UI, add one Linux host with active agent checks, trigger a test alert, and verify notification delivery end-to-end (trigger -> action -> media -> recipient). This final loop is where many teams discover SMTP policy or DNS sender issues, so treat it as a release gate, not an optional check.

Common issues and fixes

1) Let’s Encrypt certificate is not issued

Symptom: browser shows default Traefik certificate or TLS warning.

Fix: ensure DNS points to the host, port 80 is reachable from the internet, and no other reverse proxy is intercepting the ACME HTTP challenge. Confirm traefik/acme.json is writable by Traefik and has mode 600, then check the Traefik container logs for the ACME error.
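
The permission part of this fix can be checked with a short helper. This is a sketch that assumes GNU or BusyBox `stat` (as on Ubuntu); `file_mode` and `is_private` are illustrative names.

```shell
# Print a file's permission bits in octal, for example 600.
file_mode() {
  stat -c '%a' "$1"
}

# Succeeds only if the path is a regular file with mode 600.
is_private() {
  [ -f "$1" ] && [ "$(file_mode "$1")" = "600" ]
}

# Example:
# is_private /opt/zabbix-stack/traefik/acme.json \
#   || chmod 600 /opt/zabbix-stack/traefik/acme.json
```

The same check applies to .env.prod, which Step 2 also sets to mode 600.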

2) Zabbix web shows database connection error

Symptom: login page fails or setup wizard cannot connect to DB.

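Fix: verify POSTGRES_PASSWORD is identical in all relevant services, and that postgres is healthy before web/server startup. Restart the stack after correcting the env file. Note that the postgres image applies POSTGRES_PASSWORD only when it initializes an empty data directory; changing the value later does not change the role's password in an existing volume.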

3) Agents appear unsupported or unreachable

Symptom: host status flips between unknown/unavailable.

Fix: validate hostname/IP consistency, active vs passive mode settings, firewall allowances for 10050/10051 as applicable, and NTP sync across server and hosts.

4) Slow frontend or delayed trigger evaluation

Symptom: dashboard lags or alert latency increases under load.

Fix: raise Zabbix cache sizes, tune poller/trapper counts, and move to external PostgreSQL with faster disks if retention or host count has grown beyond single-node capacity.

5) Upgrade causes schema mismatch warning

Symptom: service starts but reports DB version mismatch.

Fix: snapshot DB volume first, run controlled version jump (e.g., 6.0 LTS -> 7.0 LTS path), and verify release notes for mandatory intermediate steps before production rollout.

FAQ

Can I run this stack on one VM for production?

Yes for small environments, but define clear growth thresholds. Once host count, checks-per-second, or retention requirements climb, split database and move to a multi-node architecture.

Should I expose port 10051 publicly?

Avoid direct public exposure when possible. Prefer VPN/overlay networks or tightly scoped source IP allowlists to reduce attack surface and noisy scans.

How often should I back up the database?

At minimum daily full backup plus WAL/point-in-time strategy for critical environments. Test restores monthly to ensure backup validity, not just backup job success.
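
A daily logical dump can be sketched as follows. It assumes the container name zabbix-postgres, the database and role name zabbix, and the backups directory from Step 1; `backup_path` and `run_backup` are hypothetical helper names, and a WAL-based point-in-time setup would need additional tooling that is not shown here.

```shell
# Build a timestamped dump path (UTC) inside the given directory.
backup_path() {
  printf '%s/zabbix-%s.dump\n' "$1" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Dump the database in PostgreSQL custom format through the running container.
# The dump is written to a .part file first so a failed run never leaves
# something that looks like a valid backup.
run_backup() {
  out="$(backup_path "${1:-/opt/zabbix-stack/backups}")"
  if docker exec zabbix-postgres pg_dump -U zabbix -Fc zabbix > "$out.part"; then
    mv "$out.part" "$out" && echo "$out"
  else
    rm -f "$out.part"
    return 1
  fi
}

# Example: run_backup
```

For the monthly restore test, loading a dump into a scratch PostgreSQL instance with `pg_restore` exercises the backup far better than checking that the file exists.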

What is the safest way to rotate credentials?

Rotate in sequence: create new secret, update dependent service env values, restart affected services, verify logins/checks, then revoke old credentials and document completion.
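
For the database password, that sequence could look like the sketch below. `set_env_value` is a hypothetical helper that rewrites one KEY=value line in place; it assumes values without backslashes (hex strings are safe). The commented commands assume the container, role, and file names used earlier in this guide.

```shell
# Replace the value of KEY in a KEY=value env file, keeping file permissions.
set_env_value() {
  file=$1
  key=$2
  value=$3
  tmp="$(mktemp)" || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; next }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Rotation order (run from /opt/zabbix-stack):
# new="$(head -c 24 /dev/urandom | od -An -tx1 | tr -d ' \n')"
# docker exec zabbix-postgres psql -U zabbix -d zabbix \
#   -c "ALTER ROLE zabbix PASSWORD '$new'"
# set_env_value .env.prod POSTGRES_PASSWORD "$new"
# set_env_value .env.prod ZBX_DB_PASSWORD "$new"
# docker compose --env-file .env.prod up -d
```

The ALTER ROLE step matters because editing the env file alone does not change the password stored in an already initialized PostgreSQL volume.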

Can I integrate Zabbix alerts with Slack or Teams?

Yes. Configure media types and actions in Zabbix, then test with non-production triggers first. Include retry strategy and escalation policy to prevent silent failures.

When should I choose Kubernetes instead of Compose?

Choose Kubernetes when you need multi-node scheduling, standardized secret/cert automation, GitOps workflows, and stronger failure-domain isolation. Compose remains excellent for single-host operational simplicity.
