If your team is shipping multiple services and you need fast, cost-efficient log search without committing to a heavyweight observability stack, OpenObserve is a practical option. In this guide, we deploy OpenObserve in a production-oriented setup using Docker Compose, Traefik, and ClickHouse on Ubuntu. The goal is not just to get a green container status, but to build an implementation you can operate confidently: predictable networking, TLS termination, secret handling, retention controls, backups, and recovery checks.
We use a realistic flow: Traefik terminates HTTPS and routes traffic to OpenObserve, OpenObserve persists and indexes data in ClickHouse, and you ingest logs from common agents over OTLP/HTTP. Along the way, we include verification commands and failure-mode troubleshooting so you can move from “it starts” to “it survives production incidents.”
Architecture and flow overview
The deployment uses three layers:
- Edge layer (Traefik): Handles HTTPS certificates, HTTP-to-HTTPS redirects, and upstream routing.
- Application layer (OpenObserve): Ingestion endpoint, stream/query engine, dashboarding, and alerting workflows.
- Data layer (ClickHouse): Durable storage and fast analytical queries for high-cardinality log data.
Request path for the UI is:
Client browser → Traefik (443) → OpenObserve UI/API container → ClickHouse backend
Ingestion path for application logs is:
Application/collector → OpenObserve ingest endpoint (OTLP/HTTP) → ClickHouse storage
Why this layout works well in production:
- Traefik gives a clean TLS and routing boundary with simple label-based config.
- ClickHouse keeps query latency stable as data grows, especially for analytical log queries.
- Docker Compose keeps operational overhead low while preserving clean service separation.
# Expected request flow
# Browser -> Traefik :443 -> OpenObserve :5080 -> ClickHouse :8123/:9000
# Collector -> OpenObserve ingest endpoint -> ClickHouse
Prerequisites
- Ubuntu 22.04+ host with 4 vCPU / 8 GB RAM minimum (8 vCPU / 16 GB recommended for sustained ingest).
- Docker Engine and Docker Compose plugin installed.
- DNS A/AAAA record for your OpenObserve hostname (for example, logs.example.com) pointing to the server.
- Ports 80/443 reachable from the internet.
- A strong admin password for OpenObserve and secure credentials for ClickHouse.
Before deployment, create a dedicated service user (optional but recommended), update base packages, and verify clock sync. Time drift causes confusing ingest/query behavior in log systems.
sudo apt update && sudo apt -y upgrade
sudo timedatectl set-ntp true
sudo timedatectl status
# Docker quick check
docker --version
docker compose version
Step-by-step deployment
1) Create project layout and external network
Use a dedicated directory so config, env files, and Compose definitions stay versionable. We also create a shared Docker network used by Traefik and backend services.
sudo mkdir -p /opt/openobserve/{traefik,clickhouse,data,config}
sudo chown -R $USER:$USER /opt/openobserve
cd /opt/openobserve
docker network inspect edge_net >/dev/null 2>&1 || docker network create edge_net
2) Create environment file
Place secrets in .env and keep the file readable only by privileged operators. Avoid committing it to Git. Rotate secrets on handoff or incident response events.
cat > /opt/openobserve/.env <<'EOF'
DOMAIN=logs.example.com
[email protected]
TRAEFIK_DASH_AUTH=admin:$2y$05$replace_with_htpasswd_hash
[email protected]
OO_ADMIN_PASSWORD=replace-with-strong-password
CLICKHOUSE_DB=openobserve
CLICKHOUSE_USER=oo_user
CLICKHOUSE_PASSWORD=replace-with-long-random-secret
TZ=UTC
EOF
chmod 600 /opt/openobserve/.env
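The TRAEFIK_DASH_AUTH value above expects an htpasswd-style bcrypt hash, and the password fields should be long random values rather than hand-typed strings. One way to generate both (assuming apache2-utils is available for htpasswd):

```shell
# Install htpasswd (ships with apache2-utils on Ubuntu)
sudo apt -y install apache2-utils
# Print a bcrypt user:hash pair suitable for TRAEFIK_DASH_AUTH
htpasswd -nbB admin 'choose-a-strong-dashboard-password'
# Print a long random secret for OO_ADMIN_PASSWORD / CLICKHOUSE_PASSWORD
openssl rand -base64 32
```

Paste the generated values into /opt/openobserve/.env; note that any `$` characters in the bcrypt hash must survive unescaped in the env file.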
3) Create Docker Compose stack
The stack includes Traefik for ingress, ClickHouse for storage, and OpenObserve for ingest/query/UI. Traefik labels define routing and TLS. We intentionally keep each service explicit so debugging is straightforward. Save the file as /opt/openobserve/docker-compose.yml so docker compose picks it up from the project directory.
services:
  traefik:
    image: traefik:v3.1
    container_name: traefik
    command:
      - --api.dashboard=true
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.email=${ACME_EMAIL}
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.le.acme.httpchallenge=true
      - --certificatesresolvers.le.acme.httpchallenge.entrypoint=web
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./traefik:/letsencrypt
    networks: [edge_net]

  clickhouse:
    image: clickhouse/clickhouse-server:24.8
    container_name: clickhouse
    environment:
      - CLICKHOUSE_DB=${CLICKHOUSE_DB}
      - CLICKHOUSE_USER=${CLICKHOUSE_USER}
      - CLICKHOUSE_PASSWORD=${CLICKHOUSE_PASSWORD}
      - TZ=${TZ}
    volumes:
      - ./clickhouse:/var/lib/clickhouse
    ulimits:
      nofile: 262144
    networks: [edge_net]

  openobserve:
    image: public.ecr.aws/zinclabs/openobserve:latest  # pin a specific tag for production
    container_name: openobserve
    depends_on: [clickhouse]
    environment:
      - ZO_ROOT_USER_EMAIL=${OO_ADMIN_EMAIL}
      - ZO_ROOT_USER_PASSWORD=${OO_ADMIN_PASSWORD}
      - ZO_LOCAL_MODE=true  # single-node mode; cluster mode requires a separate meta store
      - ZO_CLICKHOUSE_HOST=clickhouse
      - ZO_CLICKHOUSE_PORT=8123
      - ZO_CLICKHOUSE_USER=${CLICKHOUSE_USER}
      - ZO_CLICKHOUSE_PASSWORD=${CLICKHOUSE_PASSWORD}
      - ZO_CLICKHOUSE_DATABASE=${CLICKHOUSE_DB}
      - TZ=${TZ}
    labels:
      - traefik.enable=true
      - traefik.http.routers.openobserve.rule=Host(`${DOMAIN}`)
      - traefik.http.routers.openobserve.entrypoints=websecure
      - traefik.http.routers.openobserve.tls=true
      - traefik.http.routers.openobserve.tls.certresolver=le
      - traefik.http.services.openobserve.loadbalancer.server.port=5080
      - traefik.http.routers.openobserve-http.rule=Host(`${DOMAIN}`)
      - traefik.http.routers.openobserve-http.entrypoints=web
      - traefik.http.routers.openobserve-http.middlewares=redirect-to-https
      - traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https
    volumes:
      - ./data:/data
      - ./config:/config
    networks: [edge_net]

networks:
  edge_net:
    external: true
4) Launch and check container health
Bring up the stack and verify that all services are running before testing TLS and login.
cd /opt/openobserve
docker compose up -d
docker compose ps
docker compose logs --tail=80 traefik
docker compose logs --tail=80 openobserve
docker compose logs --tail=80 clickhouse
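Before testing TLS and login, it helps to wait until the application actually answers on its internal port. A hedged sketch that polls from a throwaway curl container on the shared network, so no tools need to exist inside the openobserve image (the /healthz path is an assumption; confirm it against your OpenObserve version):

```shell
# Poll OpenObserve's internal health endpoint for up to ~60 seconds
for i in $(seq 1 12); do
  if docker run --rm --network edge_net curlimages/curl -fsS http://openobserve:5080/healthz >/dev/null; then
    echo "openobserve healthy"
    break
  fi
  echo "waiting... ($i/12)"
  sleep 5
done
```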
5) Create a first stream and test ingest/query
After first login, create an organization/stream in OpenObserve. Send sample JSON events via HTTP, then query them in the UI. Validate timestamps, fields, and parsing behavior before connecting real workloads.
set -a; . /opt/openobserve/.env; set +a
curl -u "${OO_ADMIN_EMAIL}:${OO_ADMIN_PASSWORD}" \
  -H "Content-Type: application/json" \
  -X POST "https://${DOMAIN}/api/default/default/_json" \
  -d '[{"level":"info","service":"payments-api","message":"startup ok","env":"prod"}]'
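To exercise parsing with more than one event, you can batch newline-delimited JSON. A sketch using OpenObserve's _multi NDJSON endpoint; the stream name and field values are illustrative, and the .env file is sourced so the variables resolve in your shell:

```shell
# Send three hypothetical events as NDJSON (one JSON object per line)
set -a; . /opt/openobserve/.env; set +a
for i in 1 2 3; do
  printf '{"level":"info","service":"payments-api","message":"test event %s"}\n' "$i"
done | curl -u "${OO_ADMIN_EMAIL}:${OO_ADMIN_PASSWORD}" \
  -H "Content-Type: application/json" \
  -X POST "https://${DOMAIN}/api/default/default/_multi" --data-binary @-
```

Query the stream afterwards and confirm all three events arrived with sensible timestamps before pointing real workloads at the endpoint.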
Configuration and secrets handling best practices
Production reliability depends heavily on safe defaults and disciplined secret handling. Keep these practices in place:
- Least privilege: Use separate credentials for ClickHouse and avoid sharing root-level credentials across systems.
- Network boundaries: Keep ClickHouse private on internal Docker networking; do not publish database ports unless truly needed.
- Secret storage: Store .env in restricted paths with 600 permissions; rotate secrets at regular intervals.
- Data retention: Define stream-level retention early to prevent runaway storage costs.
- Backup policy: Snapshot ClickHouse and OpenObserve data directories on a schedule; test restore monthly.
- Audit and alerting: Create baseline alerts for ingestion drops, query failures, and disk usage thresholds.
For teams with strict compliance requirements, move secrets from local env files into an external secrets manager and inject them at runtime. Even in Compose deployments, this materially lowers leakage risk during handoffs and incident debugging.
# Example backup (local snapshot pattern)
cd /opt/openobserve
sudo tar -czf /var/backups/openobserve-data-$(date +%F).tgz data clickhouse config
# Example restore validation (dry run listing)
tar -tzf /var/backups/openobserve-data-$(date +%F).tgz | head
Verification checklist
- HTTPS certificate is valid and auto-renew path exists in Traefik logs.
- OpenObserve login succeeds with non-default admin credentials.
- Sample ingestion appears in the expected stream within seconds.
- Query latency remains acceptable under representative log volume.
- ClickHouse disk growth aligns with retention expectations.
- Backup archive can be listed and restore drill is documented.
# Endpoint checks (source /opt/openobserve/.env first so ${DOMAIN} is set)
curl -I "https://${DOMAIN}"
# Quick service-level diagnostics
docker compose ps
docker stats --no-stream
docker compose logs --tail=120 openobserve | grep -Ei "error|warn|panic" || true
docker compose logs --tail=120 clickhouse | grep -Ei "error|exception" || true
Common issues and fixes
TLS certificate not issued
Most failures are DNS mismatch, blocked port 80, or stale challenge data. Confirm A/AAAA records, inbound firewall rules, and Traefik ACME settings. Restart Traefik after fixing DNS and watch logs for fresh challenge attempts.
OpenObserve starts but cannot query data
Usually a backend connectivity or credential mismatch to ClickHouse. Validate ZO_CLICKHOUSE_* values and test from the OpenObserve container to ClickHouse over internal DNS.
Ingestion works intermittently
Check timestamp formatting from clients, stream-level schema drift, and backpressure from undersized disk I/O. Add ingest buffering and increase resource reservations if sustained bursts are expected.
Disk fills too quickly
Set stream retention aggressively at first, then relax as you understand actual query needs. Logging everything forever is expensive and rarely useful; archive long-tail data to object storage instead.
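A small guard script can catch disk pressure before ingestion stalls. A sketch with an assumed 85% threshold on the project volume; wire it into cron or your existing alerting:

```shell
# Warn when the volume holding OpenObserve data crosses a usage threshold
THRESHOLD=85  # percent; tune to your growth rate and alerting latency
usage=$(df --output=pcent /opt/openobserve | tail -1 | tr -dc '0-9')
if [ "$usage" -ge "$THRESHOLD" ]; then
  echo "WARNING: /opt/openobserve volume at ${usage}% - review stream retention"
fi
```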
# Network and DNS checks inside containers
docker exec -it openobserve getent hosts clickhouse
docker exec -it openobserve sh -c 'wget -qO- http://clickhouse:8123/ping'
# Confirm domain resolution from host
getent hosts ${DOMAIN}
FAQ
1) Can I run OpenObserve without ClickHouse?
Yes, but for production-scale workloads, a dedicated analytical backend is strongly preferred for predictable query performance and operational headroom.
2) Is Docker Compose enough for production?
For many small and mid-sized teams, yes—if you include proper backups, monitoring, resource planning, and change control. Compose is often operationally simpler than a premature Kubernetes migration.
3) How should I size this stack initially?
Start with ingest rate, retention target, and query concurrency. For pilot deployments, 8 vCPU and 16 GB RAM is a practical baseline, then scale based on real usage metrics.
4) How do I secure admin access?
Use strong unique credentials, restrict dashboard access with IP allowlists/VPN where possible, and rotate secrets after team or environment changes.
5) What is the safest way to onboard application logs?
Use a collector (for example, OpenTelemetry Collector) with buffering and retry enabled. Avoid direct ad-hoc ingestion from every app instance without backpressure controls.
6) How often should I test restore?
At least monthly. Backup files are only useful if you can restore within your recovery objective in a repeatable runbook.
7) Can I migrate this to Kubernetes later?
Yes. Keep your configuration modular, externalize secrets, and document ingress/storage assumptions so migration is incremental rather than disruptive.
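Expanding on the collector question above (FAQ 5), a minimal OpenTelemetry Collector logs pipeline might look like the following sketch. The endpoint path, credentials placeholder, and ports are assumptions to adapt to your deployment:

```yaml
# Sketch: OTLP/HTTP in, OpenObserve out; batching and retry give basic backpressure handling
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch: {}  # groups events into batches before export
exporters:
  otlphttp:
    endpoint: https://logs.example.com/api/default  # org-scoped base path (assumed)
    headers:
      Authorization: "Basic <base64 of email:password>"  # placeholder credential
    retry_on_failure:
      enabled: true  # retry transient failures instead of dropping logs
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```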
Related guides
- Deploy Grafana Loki with Docker Compose + Traefik + S3
- Deploy Redash with Docker Compose + Nginx + PostgreSQL
- Deploy NetBird with Kubernetes + Helm + cert-manager
Talk to us
If you want this implemented with hardened defaults, observability, and tested recovery playbooks, our team can help.