Production Guide: Deploy VictoriaMetrics with Docker Compose + NGINX + systemd + UFW on Ubuntu

A production-ready VictoriaMetrics deployment with secure ingress, controlled retention, backups, and operational checks for small platform teams.

If your team needs long-retention metrics without the operational weight of a full observability stack, VictoriaMetrics is a practical choice. In many environments, Prometheus works well for short retention but becomes expensive and awkward when cardinality grows, remote-write targets multiply, and teams need a single place to query months of data quickly. This guide walks through a production deployment of single-node VictoriaMetrics on Ubuntu using Docker Compose + NGINX + systemd + UFW, with TLS termination, authentication boundaries, backup strategy, and day-2 operations checks you can hand to on-call engineers.

The scenario: you run several services and exporters, want to centralize time-series ingestion, and need predictable maintenance. We will deploy VictoriaMetrics behind NGINX, expose only required ports, enforce firewall policy, run the stack under systemd for restart behavior, and validate ingestion/query paths with realistic checks. You will also see practical pitfalls (disk pressure, high-cardinality labels, scrape storms, and query timeouts) and how to fix them quickly.

Architecture and flow overview

This deployment uses a clear separation of concerns:

  • VictoriaMetrics container stores and serves metrics on the host volume.
  • NGINX handles TLS, request limits, and controlled external access.
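  • A systemd unit starts the Compose project at boot and stops it cleanly at shutdown; Docker's restart policy restarts the container if it crashes.
  • UFW allows only SSH/HTTP/HTTPS. Docker-published ports bypass UFW rules, so VictoriaMetrics' internal port 8428 stays private because it is bound to 127.0.0.1; UFW protects everything else on the host.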

Data flow is straightforward: exporters or Prometheus remote-write send metrics to VictoriaMetrics; operators query through the NGINX endpoint; backups snapshot the metrics volume on a schedule. Keep ingestion and query endpoints behind the same reverse proxy policy unless your environment requires split planes.

Exporters/Prometheus --> NGINX:443 --> VictoriaMetrics:8428 --> /srv/victoriametrics/data
Operators -----------> NGINX:443 --> VictoriaMetrics query APIs

Prerequisites

  • Ubuntu 22.04 or 24.04 server with sudo access.
  • A DNS record pointing to your server (for example, metrics.example.com).
  • Docker Engine + Docker Compose plugin installed.
  • NGINX, certbot with its NGINX plugin (python3-certbot-nginx), zstd, and curl installed on the host.
  • At least 2 vCPU / 4 GB RAM / fast SSD for small-to-medium workloads.

Before starting, update the system and create a dedicated service directory:

sudo apt update && sudo apt -y upgrade
sudo mkdir -p /opt/victoriametrics /srv/victoriametrics/data /srv/victoriametrics/backups
sudo chown -R $USER:$USER /opt/victoriametrics /srv/victoriametrics

Step-by-step deployment

1) Create the Compose file. Pin versions in production to avoid surprise behavior changes. The example below adds practical flags: retention period, concurrency limits, and memory allowance tuned for a single node.

services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:v1.102.1
    container_name: victoriametrics
    restart: unless-stopped
    command:
      - "-storageDataPath=/storage"
      - "-retentionPeriod=90d"
      - "-httpListenAddr=:8428"
      - "-search.maxConcurrentRequests=16"
      - "-search.maxQueueDuration=30s"
      - "-memory.allowedPercent=70"
      - "-selfScrapeInterval=10s"
    volumes:
      - /srv/victoriametrics/data:/storage
    ports:
      - "127.0.0.1:8428:8428"
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://127.0.0.1:8428/health"]
      interval: 30s
      timeout: 5s
      retries: 5
      start_period: 20s

Save it:

cat > /opt/victoriametrics/docker-compose.yml <<'EOF'
# paste the compose content above
EOF
cd /opt/victoriametrics
docker compose up -d
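
Because the Compose file is where version pinning and the loopback-only port binding live, a small pre-deploy check can catch the two mistakes that matter most: an unpinned (or :latest) image and an 8428 mapping that is not bound to 127.0.0.1. This is a sketch under stated assumptions: lint_compose is a hypothetical helper, not part of Docker or VictoriaMetrics, and it uses grep rather than a YAML parser, so it assumes the one-entry-per-line layout shown above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy lint for the Compose file above (plain grep, not a YAML parser).
set -euo pipefail

lint_compose() {
  local file="$1" rc=0

  # The VictoriaMetrics image must be pinned to an explicit vX.Y.Z tag (this rules out :latest).
  if ! grep -Eq '^[[:space:]]*image:[[:space:]]*"?victoriametrics/victoria-metrics:v[0-9]+\.[0-9]+\.[0-9]+' "$file"; then
    echo "FAIL: victoria-metrics image is not pinned to a vX.Y.Z tag" >&2
    rc=1
  fi

  # Every mapping of container port 8428 must be published on 127.0.0.1 only.
  # The second grep has no -q, so it reads all input and the first grep never gets SIGPIPE.
  if grep -E '8428:8428' "$file" | grep -v '127\.0\.0\.1:8428:8428' >/dev/null; then
    echo "FAIL: port 8428 is published on a non-loopback address" >&2
    rc=1
  fi

  return "$rc"
}

# Usage: lint_compose /opt/victoriametrics/docker-compose.yml && docker compose up -d
```

Running the lint before every docker compose up keeps an accidental image bump or port change from reaching production silently.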

2) Configure NGINX reverse proxy. Keep the VictoriaMetrics port bound to localhost and expose only HTTPS externally.

server {
    listen 80;
    server_name metrics.example.com;
    location / {
        return 301 https://$host$request_uri;
    }
}

server {
    listen 443 ssl http2;
    server_name metrics.example.com;

    ssl_certificate /etc/letsencrypt/live/metrics.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/metrics.example.com/privkey.pem;

    client_max_body_size 20m;
    proxy_read_timeout 120s;
    proxy_send_timeout 120s;

    location / {
        proxy_pass http://127.0.0.1:8428;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

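Issue the TLS certificate before enabling the site: the HTTPS server block references certificate files that do not exist yet, so nginx -t would fail and the reload would never happen. Use your own contact address for -m. The deploy hook reloads NGINX after each renewal so it serves the new certificate.

sudo certbot certonly --nginx -d metrics.example.com --agree-tos -m you@example.com -n --deploy-hook 'systemctl reload nginx'
sudo tee /etc/nginx/sites-available/victoriametrics >/dev/null <<'EOF'
# paste nginx config above
EOF
sudo ln -sf /etc/nginx/sites-available/victoriametrics /etc/nginx/sites-enabled/victoriametrics
sudo nginx -t && sudo systemctl reload nginx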

3) Add systemd management for the Compose stack. This starts the stack at boot once Docker and the network are up, and gives your team one standard interface (systemctl) to start, stop, and inspect it.

[Unit]
Description=VictoriaMetrics Docker Compose Stack
Requires=docker.service
Wants=network-online.target
After=docker.service network-online.target

[Service]
Type=oneshot
WorkingDirectory=/opt/victoriametrics
ExecStart=/usr/bin/docker compose up -d
ExecStop=/usr/bin/docker compose down
RemainAfterExit=yes
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target

Enable and start:

sudo tee /etc/systemd/system/victoriametrics-compose.service >/dev/null <<'EOF'
# paste unit above
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now victoriametrics-compose.service
sudo systemctl status victoriametrics-compose.service --no-pager

4) Lock down firewall policy with UFW. Only expose what is required to run and manage the service.

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status verbose
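
UFW does not filter ports that Docker publishes, so the firewall status alone cannot prove that 8428 is private; the 127.0.0.1 binding in the Compose file is what does that. A quick confirmation is to inspect listening sockets. The sketch below assumes a hypothetical helper, vm_port_private, that reads ss -lnt output on stdin and fails if anything listens on port 8428 on a non-loopback address.

```bash
#!/usr/bin/env bash
# Hypothetical check: is port 8428 listening on loopback only?
# Reads `ss -lnt` output on stdin; column 4 is the local "address:port".
set -euo pipefail

vm_port_private() {
  awk '
    $4 ~ /:8428$/ && $4 != "127.0.0.1:8428" && $4 != "[::1]:8428" {
      print "exposed: " $4 > "/dev/stderr"
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage on the server:
#   ss -lnt | vm_port_private && echo "8428 is loopback-only"
```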

Configuration and secrets handling best practices

VictoriaMetrics does not require many secrets by default, but production environments still need controls around ingestion endpoints and backup credentials. Keep these principles in place:

  • Do not expose :8428 publicly. Bind it to loopback and force all traffic through NGINX policy.
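  • If Prometheus remote-write uses credentials, keep the password in a file readable only by the Prometheus service user and reference it with password_file under remote_write basic_auth, rather than inlining it in the config or typing it on the command line where it lands in shell history.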
  • For object-storage backups (S3-compatible), use least-privilege IAM keys scoped to one bucket/path and rotate quarterly.
  • Set retention intentionally (for example 30d, 90d, 180d) based on query value and storage cost, not habit.
  • Cap query concurrency and queue duration so expensive dashboards do not starve ingestion.
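
For the credential handling above, the goals are that the secret never appears in shell history or a process list and that the file is readable only by its owner. A minimal sketch, assuming a hypothetical write_secret_file helper that reads the secret from stdin and leaves the file at mode 0600:

```bash
#!/usr/bin/env bash
# Hypothetical helper: store a secret read from stdin in a file only its owner can read.
set -euo pipefail

write_secret_file() {
  local path="$1"
  # Create (or truncate) the file; with umask 077 a new file starts as 0600.
  ( umask 077 && : > "$path" )
  # Tighten an existing file's permissions before any secret is written into it.
  chmod 600 "$path"
  cat > "$path"
}

# Usage as root (the path is an example): prompt without echo, then pipe the value in
# so it never appears in argv or shell history.
#   read -rs -p 'Remote-write password: ' pw; echo
#   printf '%s' "$pw" | write_secret_file /etc/prometheus/remote_write_password
```

Then chown the file to the Prometheus service user and point password_file at it.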

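Example backup script for off-host sync. It asks VictoriaMetrics for an instant snapshot through its /snapshot/create API and archives that snapshot instead of the live data directory, whose files can change mid-copy and produce an inconsistent archive. Run it as root, since the container writes root-owned files:

#!/usr/bin/env bash
set -euo pipefail
STAMP=$(date +%F-%H%M%S)
DATA=/srv/victoriametrics/data
DST=/srv/victoriametrics/backups
VM=http://127.0.0.1:8428
mkdir -p "$DST"
# Create an instant snapshot; the JSON response carries its name in the "snapshot" field.
SNAP=$(curl -fsS "$VM/snapshot/create" | sed -n 's/.*"snapshot":[[:space:]]*"\([^"]*\)".*/\1/p')
[ -n "$SNAP" ] || { echo "snapshot creation failed" >&2; exit 1; }
trap 'curl -fsS "$VM/snapshot/delete?snapshot=$SNAP" >/dev/null' EXIT
# The snapshot directory contains symlinks, so -h archives the files they point to.
tar -I 'zstd -3' -chf "$DST/vm-$STAMP.tar.zst" -C "$DATA/snapshots/$SNAP" .
find "$DST" -type f -name 'vm-*.tar.zst' -mtime +14 -delete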

Run backups under cron or systemd timers, then periodically test restore to a disposable host. Unverified backups are incidents waiting to happen.

Verification checklist

After deployment, run these checks in order:

  1. Container health is passing and restart policy works.
  2. NGINX serves HTTPS correctly and redirects HTTP to HTTPS.
  3. Port 8428 is not reachable from outside the host (it listens only on 127.0.0.1).
  4. Ingestion path accepts sample writes; query endpoints return expected data.
  5. Backups complete and can be extracted without corruption.

docker ps --filter name=victoriametrics
curl -fsS http://127.0.0.1:8428/health
curl -I https://metrics.example.com/health
sudo ss -lntp | grep 8428
sudo ufw status
ls -lh /srv/victoriametrics/backups | tail -n 5
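
The HTTP items in the checklist (the HTTP-to-HTTPS redirect and the HTTPS health endpoint) can be scripted so on-call engineers get a pass/fail answer instead of reading headers. A minimal sketch using only curl; expect_status is a hypothetical helper name, and metrics.example.com is the example hostname used throughout.

```bash
#!/usr/bin/env bash
# Hypothetical checklist helper: compare a URL's HTTP status code with the expected one.
set -euo pipefail

expect_status() {
  local url="$1" want="$2" got
  # -o /dev/null discards the body; -w prints only the status code ("000" if no response).
  got=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url") || true
  if [ "$got" != "$want" ]; then
    echo "FAIL: $url returned ${got:-no response}, expected $want" >&2
    return 1
  fi
  echo "ok: $url -> $got"
}

# Usage against the deployment (curl does not follow redirects without -L):
#   expect_status http://metrics.example.com/ 301
#   expect_status https://metrics.example.com/health 200
```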

For ingestion validation from Prometheus remote-write, monitor VictoriaMetrics internal metrics and watch for sustained queue growth or rejected samples. If your queries are fast but ingestion falls behind, review label cardinality and scrape interval spread.
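
To validate the write and read paths end to end before Prometheus is wired up, you can push one synthetic sample and read it back. This sketch assumes the loopback address from the Compose file and uses VictoriaMetrics' /api/v1/import/prometheus (Prometheus text-format ingestion), /internal/force_flush (makes just-written samples searchable immediately), and /api/v1/export endpoints. The metric name vm_smoke_test and the run label are arbitrary placeholders.

```bash
#!/usr/bin/env bash
# Sketch: write one synthetic sample to VictoriaMetrics and read it back.
set -euo pipefail

vm_smoke_test() {
  local base="${1:-http://127.0.0.1:8428}" run
  # A unique label value identifies exactly this sample on the way back.
  run="smoke-$(date +%s)-$RANDOM"

  # 1) Ingest one sample in Prometheus text exposition format.
  printf 'vm_smoke_test{run="%s"} 1\n' "$run" |
    curl -fsS --max-time 5 --data-binary @- "$base/api/v1/import/prometheus" || return 1

  # 2) Flush in-memory buffers so the new sample is searchable immediately.
  curl -fsS --max-time 5 "$base/internal/force_flush" >/dev/null || return 1

  # 3) Export the series by its label and confirm the label value comes back.
  #    grep without -q reads all input, so curl never hits SIGPIPE under pipefail.
  curl -fsS --max-time 5 -G --data-urlencode "match[]=vm_smoke_test{run=\"$run\"}" \
    "$base/api/v1/export" | grep "$run" >/dev/null || return 1

  echo "ok: sample $run round-tripped"
}

# Usage on the host: vm_smoke_test
# Through the proxy:  vm_smoke_test https://metrics.example.com
```

Each run leaves one tiny series in storage until retention removes it, which is negligible.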

Common issues and fixes

1) 502 Bad Gateway from NGINX

Usually caused by an unhealthy container, a wrong upstream address, or local firewall policy. Check container health first, then run curl http://127.0.0.1:8428/health on the host, and finally validate the NGINX configuration with sudo nginx -t.

2) High disk usage growth

Most common causes are overlong retention and high-cardinality labels (for example per-request IDs). Reduce retention and normalize labels before scaling hardware. Cardinality mistakes can multiply storage and query cost rapidly.
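
Disk growth is easier to catch early than to fix under pressure. A minimal, cron-friendly sketch based on POSIX df -P output; the function names and the 80% threshold in the usage line are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sketch: warn when the filesystem holding the VictoriaMetrics data gets too full.
set -euo pipefail

# Print the used-space percentage (integer, without "%") of the filesystem holding $1.
disk_usage_pct() {
  # -P guarantees one line per filesystem; column 5 is "Capacity", e.g. "42%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 1 (and print a warning) when usage is at or above the threshold percentage.
disk_pressure_check() {
  local path="$1" threshold="$2" used
  used=$(disk_usage_pct "$path")
  if [ "$used" -ge "$threshold" ]; then
    echo "WARN: $path is ${used}% full (threshold ${threshold}%)" >&2
    return 1
  fi
  echo "ok: $path is ${used}% full"
}

# Usage: disk_pressure_check /srv/victoriametrics/data 80
```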

3) Slow dashboard queries during peak writes

Tune query concurrency (-search.maxConcurrentRequests) and queue duration (-search.maxQueueDuration), then stagger the refresh intervals of heavy dashboards. If that is not enough, move to a larger instance with faster NVMe storage and more memory before raising concurrency aggressively.

4) TLS renewal failures

Check certbot timer, DNS drift, and NGINX virtual host precedence. Keep a routine check in monitoring for certificate expiry so you catch this before end users do.
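
For the certificate-expiry check, OpenSSL's x509 -checkend option reports whether a certificate expires within a given number of seconds, which is enough for a cron job or a monitoring probe. A sketch; the helper name and the 14-day window in the usage line are assumptions, and the path is the certbot location from the NGINX config above.

```bash
#!/usr/bin/env bash
# Sketch: warn when a certificate expires within N days.
set -euo pipefail

# Returns 0 and prints a warning if the certificate expires within the window, 1 otherwise.
cert_expires_within_days() {
  local cert="$1" days="$2"
  # -checkend N exits 0 if the certificate is still valid N seconds from now.
  if openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert" >/dev/null; then
    return 1
  fi
  echo "WARN: $cert expires within $days days ($(openssl x509 -enddate -noout -in "$cert"))"
}

# Usage as root (certbot's live directory is root-only):
#   cert_expires_within_days /etc/letsencrypt/live/metrics.example.com/fullchain.pem 14
```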

5) Backup files exist but restore fails

Backups can be incomplete if the script runs during unstable storage conditions or if retention cleanup is overly broad. Add checksum verification and perform a monthly full restore drill.
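
Checksum verification can be as simple as a SHA-256 sidecar file written next to each archive and checked before any restore. A minimal sketch with coreutils sha256sum; write_checksum and verify_checksum are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: write and verify SHA-256 sidecar files for backup archives.
set -euo pipefail

# Create "<archive>.sha256" next to the archive, recording a relative file name.
write_checksum() {
  local dir base
  dir=$(dirname "$1"); base=$(basename "$1")
  ( cd "$dir" && sha256sum -- "$base" > "$base.sha256" )
}

# Succeed only if the archive still matches its recorded checksum.
verify_checksum() {
  local dir base
  dir=$(dirname "$1"); base=$(basename "$1")
  ( cd "$dir" && sha256sum --status -c -- "$base.sha256" )
}

# Usage: write_checksum /srv/victoriametrics/backups/vm-<stamp>.tar.zst right after the
# archive is created, and verify_checksum on the same path before a restore drill.
```

If you adopt sidecars, widen the backup script's find cleanup pattern to 'vm-*.tar.zst*' so they age out with their archives; zstd -t additionally checks the compressed stream itself.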

FAQ

Can I run VictoriaMetrics without NGINX?

Yes, but it is not recommended for internet-facing production. NGINX provides TLS termination, request controls, and a cleaner policy boundary. For private networks only, direct access may be acceptable with strict segmentation.

What retention period should I start with?

For most teams, 60–90 days is a sensible first target. Start with actual query needs, estimate ingestion volume, then adjust. Retention without query value is pure storage cost.
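
To turn "estimate ingestion volume" into a number, multiply active series by the samples each series produces over the retention window, then by an average on-disk size per sample. A rough sketch in shell integer arithmetic; the one-byte-per-sample default is an assumption for illustration only (compression varies with data shape), so calibrate it against the disk growth you observe in the first weeks.

```bash
#!/usr/bin/env bash
# Rough capacity estimate; every input, including bytes per sample, is an assumption to calibrate.
set -euo pipefail

# Convert a retention value like 90d, 12w or 1y to seconds. Only h/d/w/y suffixes are
# handled; bare numbers (months in VictoriaMetrics' -retentionPeriod) are rejected here.
retention_seconds() {
  local r="$1" n="${1%[hdwy]}"
  if ! [[ "$n" =~ ^[0-9]+$ ]]; then
    echo "unsupported retention: $r" >&2
    return 1
  fi
  case "$r" in
    *h) echo $(( n * 3600 )) ;;
    *d) echo $(( n * 86400 )) ;;
    *w) echo $(( n * 604800 )) ;;
    *y) echo $(( n * 365 * 86400 )) ;;   # 365-day year
    *)  echo "unsupported retention: $r" >&2; return 1 ;;
  esac
}

# Estimated bytes on disk: series * (retention / scrape interval) * bytes per sample.
estimate_disk_bytes() {
  local series="$1" interval="$2" retention="$3" bytes_per_sample="${4:-1}" secs
  secs=$(retention_seconds "$retention") || return 1
  # Multiply before dividing so integer division does not truncate early.
  echo $(( series * secs * bytes_per_sample / interval ))
}

# Example: 100k active series scraped every 15s and kept for 90 days:
#   estimate_disk_bytes 100000 15 90d   # 51840000000 bytes, about 52 GB
```

Leave headroom on top of the estimate: VictoriaMetrics' capacity-planning guidance recommends keeping around 20% of the storage disk free.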

Do I need Prometheus if I use VictoriaMetrics?

You can still use Prometheus for scraping and remote-write, while VictoriaMetrics handles durable storage and long-range queries. This is a common migration path that avoids tooling disruption.

How do I secure write endpoints?

Keep write paths behind NGINX, restrict source IPs where possible, and use authenticated ingress if your topology supports it. Never leave write endpoints globally open without controls.
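
One way to add authenticated ingress without new components is NGINX's built-in HTTP basic auth (auth_basic and auth_basic_user_file inside the location / block) over the existing TLS listener; Prometheus remote_write and Grafana data sources both accept basic auth credentials. The sketch below builds an htpasswd-style line with openssl passwd -apr1 (an MD5-based format NGINX accepts), so apache2-utils is not needed. The helper name, user name, and file path are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: build an htpasswd entry for NGINX basic auth without apache2-utils.
set -euo pipefail

# Print "user:$apr1$..." for the password read from stdin (keeps it out of argv and history).
htpasswd_entry() {
  local user="$1" hash
  hash=$(openssl passwd -apr1 -stdin) || return 1
  printf '%s:%s\n' "$user" "$hash"
}

# Usage (placeholders):
#   read -rs -p 'Password: ' pw; echo
#   printf '%s\n' "$pw" | htpasswd_entry remote-writer | sudo tee -a /etc/nginx/.htpasswd-victoriametrics >/dev/null
# Then, inside "location /" of the 443 server block:
#   auth_basic "VictoriaMetrics";
#   auth_basic_user_file /etc/nginx/.htpasswd-victoriametrics;
```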

When should I move from single-node to clustered VictoriaMetrics?

Move when ingestion volume, retention, and SLOs exceed what one node can provide safely, or when you need higher availability than host-level recovery. Clustered mode adds complexity, so move with a clear capacity trigger.

How often should I test recovery?

At least monthly for critical monitoring stacks, and after every major backup-policy change. A documented, repeatable restore runbook is as important as backup schedule success logs.
