Production Guide: Deploy ClickHouse with Docker Compose, NGINX, and Automated Backups on Ubuntu

A production-first ClickHouse deployment pattern with TLS, role-based users, backup automation, and practical ops checks.

ClickHouse is one of the fastest ways to run high-volume analytics without operating a heavyweight data warehouse stack. In real production environments, teams often need to ingest logs/events quickly, run sub-second aggregations, and keep operational overhead low. This guide walks through a practical setup we use for SMB and mid-market deployments: ClickHouse on Ubuntu with Docker Compose, fronted by NGINX for TLS and controlled ingress, with explicit user separation, backup automation, and health verification. The goal is not just β€œit starts,” but a setup you can monitor, recover, and safely hand to on-call engineers.

Use this pattern when you want clear separation between infrastructure and analytics service runtime, predictable upgrades, and strong defaults for security without introducing full Kubernetes complexity.

Architecture and Flow Overview

The deployment has four core layers:

  • Host layer (Ubuntu): base OS hardening, firewall, Docker runtime, persistent volumes.
  • Service layer (Docker Compose): ClickHouse server container with mounted config, users, and durable data path.
  • Edge layer (NGINX): TLS termination, optional IP allowlisting, request routing to ClickHouse HTTP endpoint.
  • Operations layer: scheduled backups, restore testing, health checks, and log inspection workflow.

Data enters over HTTP (or native protocol if you expose it intentionally). NGINX terminates HTTPS and forwards validated traffic to ClickHouse. ClickHouse writes to persistent volumes. Backup jobs snapshot data/config and push to off-host storage. Verification jobs continuously validate connectivity, user permissions, and query latency.

# High-level network flow
Client/BI Tool --HTTPS--> NGINX (443) --HTTP--> ClickHouse (8123)
Optional trusted clients -----------------------> ClickHouse native (9000)

Prerequisites

  • Ubuntu 22.04/24.04 host with at least 4 vCPU, 8 GB RAM, and SSD-backed storage.
  • Domain pointed to server public IP (example: analytics.example.com).
  • Docker Engine + Docker Compose plugin installed.
  • NGINX and Certbot available on host.
  • Outbound access to backup destination (S3-compatible bucket or remote storage server).
  • SSH access with sudo privileges.

Recommended: separate non-root Linux user for ops, UFW enabled, and a cloud snapshot schedule at the VM/disk level for fast disaster recovery.

sudo apt update
sudo apt install -y ca-certificates curl gnupg ufw nginx certbot python3-certbot-nginx
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update
sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo usermod -aG docker $USER  # log out and back in (or run 'newgrp docker') for the group change to apply

Step-by-Step Deployment

1) Create project directories and baseline files

Keep all deployment files in one predictable directory. Separate data, logs, and config so backups and restores are straightforward. We explicitly mount users/config files to avoid accidental in-container edits.

sudo mkdir -p /opt/clickhouse/{data,logs,config,users,backups,scripts}
sudo chown -R $USER:$USER /opt/clickhouse
cd /opt/clickhouse

2) Define Docker Compose service

This Compose file pins a stable official image tag, maps persistent volumes, and enables restart policy. Avoid floating latest tags in production; pin and upgrade intentionally.

cat > /opt/clickhouse/docker-compose.yml << 'EOF'
services:
  clickhouse:
    image: clickhouse/clickhouse-server:24.8
    container_name: clickhouse
    restart: unless-stopped
    ports:
      - "127.0.0.1:8123:8123"
      - "127.0.0.1:9000:9000"
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    volumes:
      - /opt/clickhouse/data:/var/lib/clickhouse
      - /opt/clickhouse/logs:/var/log/clickhouse-server
      - /opt/clickhouse/config/config.xml:/etc/clickhouse-server/config.xml:ro
      - /opt/clickhouse/users/users.xml:/etc/clickhouse-server/users.xml:ro
    healthcheck:
      test: ["CMD", "clickhouse-client", "--query", "SELECT 1"]
      interval: 30s
      timeout: 10s
      retries: 5
EOF

3) Configure users, profiles, and access controls

Create at least three logical users: admin (limited network), ingestion writer, and readonly BI user. Start with least privilege and grow only when required by workload. For internet-facing systems, always pair this with network restrictions and TLS.
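The REPLACE_WITH_SHA256_HEX placeholders below expect a hex-encoded SHA-256 digest of each password. One way to generate it on the host:

```shell
# Generate the value for REPLACE_WITH_SHA256_HEX.
# printf (not echo) avoids hashing a trailing newline.
printf '%s' 'your-strong-password' | sha256sum | awk '{print $1}'
```

Store the plaintext password in your secrets manager; only the hash belongs in users.xml.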

cat > /opt/clickhouse/users/users.xml << 'EOF'
<clickhouse>
  <profiles>
    <default/>
    <readonly>
      <readonly>1</readonly>
    </readonly>
  </profiles>

  <quotas>
    <default/>
  </quotas>

  <users>
    <default>
      <password></password>
      <networks>
        <ip>::/0</ip>
      </networks>
      <profile>default</profile>
      <quota>default</quota>
      <access_management>0</access_management>
    </default>

    <admin>
      <password_sha256_hex>REPLACE_WITH_SHA256_HEX</password_sha256_hex>
      <networks>
        <ip>127.0.0.1</ip>
      </networks>
      <profile>default</profile>
      <quota>default</quota>
      <access_management>1</access_management>
    </admin>

    <ingest_app>
      <password_sha256_hex>REPLACE_WITH_SHA256_HEX</password_sha256_hex>
      <networks>
        <ip>10.0.0.0/8</ip>
      </networks>
      <profile>default</profile>
      <quota>default</quota>
      <access_management>0</access_management>
    </ingest_app>

    <bi_readonly>
      <password_sha256_hex>REPLACE_WITH_SHA256_HEX</password_sha256_hex>
      <networks>
        <ip>10.0.0.0/8</ip>
      </networks>
      <profile>readonly</profile>
      <quota>default</quota>
      <access_management>0</access_management>
    </bi_readonly>
  </users>
</clickhouse>
EOF
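Before (re)starting the container, it is worth failing fast on malformed XML. A small sanity check, assuming python3 is available on the host (check_xml is a helper name of ours, not a standard tool):

```shell
# Fail fast on malformed XML before restarting ClickHouse.
check_xml() {
  python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' "$1"
}

# Example (path from this guide):
# check_xml /opt/clickhouse/users/users.xml && echo "users.xml is well-formed"
```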

4) Start ClickHouse and run baseline checks

Bring up the service and verify both container health and query responsiveness. During first boot, ClickHouse may initialize system tables and take longer than expected; do not proceed to proxy setup until health checks stabilize.

cd /opt/clickhouse
docker compose up -d
docker compose ps
docker logs --tail=100 clickhouse
docker exec -it clickhouse clickhouse-client --query "SELECT version(), uptime()"
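The "do not proceed until health checks stabilize" rule can be scripted as a bounded retry gate. wait_for is a generic helper (the name is ours, not a standard tool), and the commented probe assumes the Compose healthcheck defined above:

```shell
# Bounded "wait until healthy" gate before moving on to proxy setup.
wait_for() {  # usage: wait_for <max_tries> <sleep_secs> <command...>
  wf_max=$1; wf_pause=$2; shift 2
  wf_tries=0
  until "$@"; do
    wf_tries=$((wf_tries + 1))
    if [ "$wf_tries" -ge "$wf_max" ]; then return 1; fi
    sleep "$wf_pause"
  done
}

# Example: give the container up to 5 minutes to report healthy.
# wait_for 60 5 sh -c \
#   'docker inspect -f "{{.State.Health.Status}}" clickhouse | grep -q healthy'
```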

5) Configure NGINX reverse proxy and TLS

Expose only NGINX to the public internet. Keep ClickHouse ports bound to localhost unless you have a strict private-network requirement. Add standard security headers and conservative request limits to reduce abuse risk.

sudo tee /etc/nginx/sites-available/clickhouse.conf > /dev/null << 'EOF'
server {
    listen 80;
    server_name analytics.example.com;

    location / {
        proxy_pass http://127.0.0.1:8123;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 300;
        client_max_body_size 50m;
    }
}
EOF

sudo ln -sf /etc/nginx/sites-available/clickhouse.conf /etc/nginx/sites-enabled/clickhouse.conf
sudo nginx -t && sudo systemctl reload nginx
sudo certbot --nginx -d analytics.example.com --agree-tos -m [email protected] --non-interactive --redirect

6) Create databases, roles, and retention model

Start with explicit schemas and lifecycle policies. For observability workloads, partition by day and keep retention in SQL so storage growth stays predictable. Avoid broad superuser-style grants for ingestion services. Note that ClickHouse cannot apply SQL GRANT statements to users defined in users.xml: run the role grants below against identities created via SQL (as the admin user, which has access_management enabled), or declare the equivalent grants in the XML.

CREATE DATABASE IF NOT EXISTS analytics;

CREATE TABLE IF NOT EXISTS analytics.events
(
  ts DateTime,
  service LowCardinality(String),
  level LowCardinality(String),
  message String,
  trace_id String
)
ENGINE = MergeTree
PARTITION BY toDate(ts)
ORDER BY (service, ts)
TTL ts + INTERVAL 30 DAY DELETE;

CREATE ROLE IF NOT EXISTS role_ingest;
GRANT INSERT, SELECT ON analytics.events TO role_ingest;
GRANT role_ingest TO ingest_app;

CREATE ROLE IF NOT EXISTS role_bi;
GRANT SELECT ON analytics.events TO role_bi;
GRANT role_bi TO bi_readonly;
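Because XML-defined users cannot receive SQL grants, one option is to create the ingest_app and bi_readonly identities via SQL instead of users.xml. A hedged sketch, run as the admin user, with placeholder passwords (remove the matching entries from users.xml if you take this route):

```sql
CREATE USER IF NOT EXISTS ingest_app
  IDENTIFIED WITH sha256_password BY 'REPLACE_ME'
  HOST IP '10.0.0.0/8';

CREATE USER IF NOT EXISTS bi_readonly
  IDENTIFIED WITH sha256_password BY 'REPLACE_ME'
  HOST IP '10.0.0.0/8'
  SETTINGS readonly = 1;
```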

7) Automate backups and rehearse restore

Backups are only useful if restore is tested. At minimum, schedule daily compressed snapshots plus weekly restore verification into a temporary namespace or table, and keep one off-host copy to survive node loss. Be aware that archiving a live data directory can produce an inconsistent snapshot; pause ingestion during the backup window, or use ClickHouse's native BACKUP command when you need point-in-time consistency.

cat > /opt/clickhouse/scripts/backup.sh << 'EOF'
#!/usr/bin/env bash
set -euo pipefail
STAMP=$(date +%F-%H%M%S)
OUT="/opt/clickhouse/backups/clickhouse-${STAMP}.tar.gz"
# NOTE: tar over a live data directory may capture an inconsistent snapshot.
# Pause ingestion first, or use ClickHouse's native BACKUP command when you
# need guaranteed consistency.
tar -czf "$OUT" /opt/clickhouse/data /opt/clickhouse/config /opt/clickhouse/users
find /opt/clickhouse/backups -type f -name 'clickhouse-*.tar.gz' -mtime +7 -delete
# Example off-host sync (replace with your target)
# rclone copy "$OUT" remote:clickhouse-backups/
EOF
chmod +x /opt/clickhouse/scripts/backup.sh
# Install in root's crontab: the data files are owned by the container user,
# and the log path under /var/log requires root.
( sudo crontab -l 2>/dev/null; echo "15 2 * * * /opt/clickhouse/scripts/backup.sh >> /var/log/clickhouse-backup.log 2>&1" ) | sudo crontab -
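A restore drill can be scripted alongside the schedule. The sketch below only verifies that an archive unpacks cleanly and contains an expected path; verify_backup is a helper name of ours, not a standard tool:

```shell
# Restore drill sketch: confirm an archive unpacks and contains an expected path.
verify_backup() {  # usage: verify_backup <archive.tar.gz> <relative-path-expected-inside>
  vb_tmp=$(mktemp -d)
  if ! tar -xzf "$1" -C "$vb_tmp"; then
    rm -rf "$vb_tmp"; return 1
  fi
  if [ -e "$vb_tmp/$2" ]; then vb_rc=0; else vb_rc=1; fi
  rm -rf "$vb_tmp"
  return "$vb_rc"
}

# Example (paths from this guide; GNU tar stores absolute paths without the leading /):
# verify_backup "$(ls -1t /opt/clickhouse/backups/clickhouse-*.tar.gz | head -n1)" \
#   opt/clickhouse/data && echo "restore drill OK"
```

For a full drill, extend this to load the extracted data into a staging table and compare row counts.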

Configuration and Secrets Handling Best Practices

Never hardcode production passwords inside Compose files committed to source control. Store secrets in a managed vault or, if that is not yet available, at least in root-readable files with strict permissions and documented rotation procedures. For ClickHouse specifically:

  • Use SHA256 password hashes in users.xml, not plaintext.
  • Restrict user <networks> aggressively; default-open ranges are risky.
  • Separate ingestion and BI identities for traceable access and controlled grants.
  • Rotate credentials on a fixed cadence (for example every 60–90 days).
  • Back up config and user files alongside data; auth loss during restore is a common outage multiplier.

Also consider host-level controls: disable password SSH, enforce key-based auth, and allow inbound only 22/80/443 unless your architecture requires private-port exposure.
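The inbound policy above can be expressed as a UFW ruleset. This is a policy sketch using the guide's default ports (22/80/443); adjust to your architecture before enabling:

```shell
# Default-deny inbound, allow SSH and web traffic only
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
sudo ufw status verbose
```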

Verification Checklist

  • Container status is healthy for at least 10 minutes after boot.
  • HTTPS endpoint responds and NGINX cert chain is valid.
  • Readonly user cannot write; ingestion user can insert but not alter schema.
  • Retention TTL is visible in table definition and old data expires as expected.
  • Backup job produces archive files and log entries daily.
  • Test restore can load sample data into a staging table.

Practical probe commands:

curl -I https://analytics.example.com
curl -s "https://analytics.example.com/?query=SELECT%201"
docker exec clickhouse clickhouse-client --query "SHOW TABLES FROM analytics"
docker exec clickhouse clickhouse-client --query "SELECT count() FROM analytics.events"
tail -n 100 /var/log/clickhouse-backup.log

Common Issues and Fixes

Issue: NGINX returns 502 Bad Gateway

Cause: ClickHouse container not healthy yet, or proxy target mismatch. Fix: confirm container health, check proxy_pass host/port, and inspect docker logs clickhouse.

Issue: Authentication works locally but fails via proxy

Cause: user network restrictions do not include proxy source path/IP assumptions. Fix: tighten and test allowed networks deliberately; avoid over-broad temporary rules that never get removed.

Issue: Disk growth faster than forecast

Cause: missing TTL, high-cardinality columns in sort key, or no partition pruning. Fix: review table engine design, retention, and cardinality strategy; validate compression and index usage with sample workloads.
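To ground the disk-growth review, per-table on-disk size and compression ratio can be read from the standard system.parts table:

```sql
SELECT
    table,
    formatReadableSize(sum(bytes_on_disk)) AS on_disk,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS compression_ratio
FROM system.parts
WHERE active AND database = 'analytics'
GROUP BY table
ORDER BY sum(bytes_on_disk) DESC;
```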

Issue: Backups exist but restore fails

Cause: backup captured data but omitted compatible config/users, or archive permissions are broken. Fix: version backups with metadata, include config/users, and run scheduled restore drills monthly.

FAQ

1) Why Docker Compose instead of Kubernetes for ClickHouse?

Compose is excellent for teams that need fast delivery and low operational complexity. If you do not need multi-zone orchestration yet, Compose plus strong backup discipline is often sufficient and easier to maintain.

2) Should I expose port 9000 publicly for performance?

No. Keep native protocol private unless absolutely necessary. Public exposure increases risk significantly. Use NGINX + HTTPS for controlled ingress and expose native ports only on private networks with strict firewall policy.

3) How much RAM do I need for a production starter setup?

For light to moderate analytics, 8 GB is workable; 16 GB is more comfortable once query concurrency grows. Monitor memory pressure and query latency early; scale before user-facing dashboards degrade.

4) How do I rotate credentials without downtime?

Create new users/roles first, update clients, validate access, and only then remove old credentials. This phased rotation avoids sudden ingestion outages and gives rollback room.

5) What backup frequency is practical?

Daily full snapshots with 7–14 day retention are a solid baseline for many teams. If your recovery point objective is tighter, add incremental or more frequent exports and test restores against the same objective.

6) Can I use object storage for long-term backup retention?

Yes. S3-compatible object storage is a common pattern. Encrypt archives, store checksums, and enforce lifecycle policies so storage cost remains predictable while preserving compliance requirements.

7) How do I confirm readonly users are truly read-only?

Run explicit negative tests (INSERT/ALTER should fail) as part of deployment verification. Keep these tests in your runbook so they are repeated after upgrades or permission changes.

Talk to us

If you want this deployed with production hardening, monitoring, and backup automation tailored to your environment, our team can help.

Contact Us
