When teams move from early-stage observability to production operations, they often discover that fragmented tooling creates blind spots during incidents. Metrics might show saturation, logs might show errors, and traces might reveal latency, but if those signals live in separate systems with inconsistent naming and retention rules, mean time to resolution rises quickly. SigNoz provides an integrated approach by combining distributed tracing, metrics, and logs into one platform built around OpenTelemetry conventions.
This guide is for engineering teams running customer-facing workloads on Ubuntu-based Kubernetes clusters who need practical, repeatable deployment steps instead of conceptual overviews. We will deploy SigNoz with Helm, front it with ingress-nginx, secure it with cert-manager and Let's Encrypt, and add day-2 safeguards for upgrades, backups, and troubleshooting. By the end, you will have a production-oriented SigNoz environment with TLS, persistent volumes, and a clear runbook for operating it safely.
The walkthrough assumes you already have a functioning Kubernetes control plane and worker nodes, but no prior SigNoz deployment. Every major section includes verification checks so you can catch misconfigurations early rather than discovering them during an outage.
Architecture and flow overview
The production layout in this guide follows a simple but robust flow. Applications emit telemetry with OpenTelemetry SDKs or collectors. Telemetry reaches SigNoz components running in Kubernetes, where traces, metrics, and logs are stored and indexed. The SigNoz frontend and API are exposed through ingress-nginx, and cert-manager provisions and renews TLS certificates automatically. This design minimizes manual certificate handling while keeping ingress policy centralized.
- Ingress layer: ingress-nginx handles HTTPS routing and public entry to SigNoz.
- Certificate automation: cert-manager issues certificates via ACME HTTP-01 challenge.
- Observability stack: SigNoz chart deploys query, frontend, and storage dependencies.
- Persistence: Stateful components use PersistentVolumeClaims sized for retention targets.
- Operations: routine health checks, backup exports, and controlled Helm upgrades.
For teams with strict change control, this architecture is also audit-friendly: ingress, certificate issuer, and Helm values are explicit and versionable in Git.
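On the application side, the flow above is usually wired up with standard OpenTelemetry exporter environment variables. A minimal sketch follows; the collector Service DNS name assumes a Helm release named signoz in the signoz namespace, so adjust it to match your release.

```shell
# Hypothetical app-side OTLP exporter settings. The Service name below is an
# assumption based on a Helm release called "signoz" in the "signoz" namespace.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://signoz-otel-collector.signoz.svc.cluster.local:4317"
export OTEL_EXPORTER_OTLP_PROTOCOL="grpc"
# Resource attributes keep service names consistent across dashboards.
export OTEL_RESOURCE_ATTRIBUTES="service.name=checkout,deployment.environment=production"
echo "$OTEL_EXPORTER_OTLP_ENDPOINT"
```

Setting these once per workload (for example, in a Deployment's env section) keeps exporter configuration out of application code.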
Prerequisites
Before starting, confirm these baseline requirements. Skipping prerequisite validation is the most common reason production installs fail late in the process.
- Ubuntu host with kubectl, helm, and cluster-admin access configured.
- A running Kubernetes cluster (v1.26+ recommended) with a default StorageClass.
- A DNS record such as signoz.example.com pointing to your ingress-nginx external IP.
- Ports 80 and 443 reachable from the public internet for ACME validation.
- A dedicated namespace (we use signoz) and a maintenance window for first deploy.
Use the following checks before you install anything.
Step-by-step deployment
1) Validate cluster connectivity and storage
Start by proving your admin context and storage class are healthy. If these checks fail, stop and fix them first.
kubectl version
kubectl get nodes -o wide
kubectl get storageclass
kubectl auth can-i "*" "*" --all-namespaces
2) Create namespace and install ingress-nginx
Install ingress-nginx in its own namespace so lifecycle management remains clean. Wait for the controller Service to expose an address.
kubectl create namespace ingress-nginx --dry-run=client -o yaml | kubectl apply -f -
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx --namespace ingress-nginx --set controller.replicaCount=2
kubectl -n ingress-nginx get svc ingress-nginx-controller -w
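While you wait for the controller to come up, note that cloud providers report the external address as either an IP or a hostname in the Service status. A small helper can normalize this; the live kubectl query is shown commented, and the sample input is illustrative.

```shell
# Prints whichever of .ip / .hostname the load balancer populated
# (the first non-empty token on stdin).
get_lb_addr() {
  read -r addr _rest
  printf '%s\n' "$addr"
}
# Against a live cluster (uncomment):
# kubectl -n ingress-nginx get svc ingress-nginx-controller \
#   -o jsonpath='{.status.loadBalancer.ingress[0].ip} {.status.loadBalancer.ingress[0].hostname}' \
#   | get_lb_addr
printf '203.0.113.10 \n' | get_lb_addr   # -> 203.0.113.10
```

Use the printed address for the DNS record required in the prerequisites.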
3) Install cert-manager and ACME ClusterIssuer
cert-manager should be installed from the official chart with CRDs enabled. Then define a ClusterIssuer for Let's Encrypt production.
kubectl create namespace cert-manager --dry-run=client -o yaml | kubectl apply -f -
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install cert-manager jetstack/cert-manager --namespace cert-manager --set crds.enabled=true
kubectl -n cert-manager rollout status deploy/cert-manager
kubectl -n cert-manager rollout status deploy/cert-manager-webhook
cat <<'EOF' | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: [email protected]
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
EOF
kubectl get clusterissuer letsencrypt-prod
4) Prepare SigNoz Helm values
Use a committed values file so the deployment is reproducible. Set ingress host, TLS secret, and persistence values based on expected telemetry volume.
cat <<'EOF' > values-signoz.yaml
global:
  storageClass: ""
frontend:
  ingress:
    enabled: true
    className: nginx
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - host: signoz.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts:
          - signoz.example.com
        secretName: signoz-tls
clickhouse:
  persistence:
    enabled: true
    size: 200Gi
zookeeper:
  persistence:
    enabled: true
    size: 20Gi
alertmanager:
  enabled: true
EOF
5) Install SigNoz chart
Install into a dedicated namespace and wait for workloads to stabilize. Initial startup can take several minutes while StatefulSets initialize storage.
kubectl create namespace signoz --dry-run=client -o yaml | kubectl apply -f -
helm repo add signoz https://charts.signoz.io
helm repo update
helm upgrade --install signoz signoz/signoz --namespace signoz -f values-signoz.yaml --wait --timeout 20m
kubectl -n signoz get pods -o wide
6) Confirm DNS and certificate issuance
Once ingress is up, confirm your DNS and certificate objects converge correctly. ACME challenge failures are usually DNS or port exposure issues.
kubectl -n signoz get ingress
kubectl -n signoz describe certificate signoz-tls || true
kubectl -n signoz get certificaterequest,challenge,order
curl -I https://signoz.example.com
Configuration and secrets handling best practices
Production stability depends as much on secret hygiene as on deployment success. Treat values files as non-secret defaults and move credentials into Kubernetes Secrets or external secret managers. Avoid embedding admin credentials, SMTP tokens, or webhook secrets directly into Helm values tracked in source control.
Recommended baseline:
- Store sensitive values in external secret systems (Vault, cloud secret manager) and sync to Kubernetes.
- Scope service accounts with least privilege. Avoid cluster-admin in runtime pods.
- Use network policies to limit lateral movement between namespaces.
- Rotate credentials on a schedule and after every incident response engagement.
- Version and review all Helm value changes through pull requests with peer approval.
If your organization uses sealed secrets, wire that workflow before first production rollout so emergency edits do not bypass controls.
kubectl -n signoz create secret generic signoz-smtp --from-literal=SMTP_USER='[email protected]' --from-literal=SMTP_PASS='REPLACE_ME' --dry-run=client -o yaml | kubectl apply -f -
kubectl -n signoz get secret signoz-smtp -o yaml
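Avoid typing real credentials inline as in the placeholder above, since literals end up in shell history. A safer sketch is to generate or load the value into a variable first; this assumes openssl is installed locally.

```shell
# Generate a throwaway credential locally instead of typing it inline.
# 24 random bytes encode to exactly 32 base64 characters (no padding).
SMTP_PASS="$(openssl rand -base64 24)"
# Then create the Secret from the variable (uncomment against a live cluster):
# kubectl -n signoz create secret generic signoz-smtp \
#   --from-literal=SMTP_USER='[email protected]' \
#   --from-literal=SMTP_PASS="$SMTP_PASS" \
#   --dry-run=client -o yaml | kubectl apply -f -
echo "generated ${#SMTP_PASS}-character password"
```

For real SMTP accounts, load the value from your secret manager rather than generating it, but the same variable-based pattern applies.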
Verification checklist
Do not treat a green Helm install as full success. Verify functional telemetry paths and UI usability with deterministic checks.
- All SigNoz pods are Running and Ready.
- The ingress responds over HTTPS with a valid certificate chain.
- At least one service emits traces and appears in SigNoz service list.
- Metrics dashboards populate without query timeouts.
- Log ingestion appears for selected workloads.
- Alertmanager route test reaches your notification channel.
kubectl -n signoz get pods
kubectl -n signoz get ingress
openssl s_client -connect signoz.example.com:443 -servername signoz.example.com </dev/null 2>/dev/null | openssl x509 -noout -issuer -subject -dates
kubectl -n signoz logs deploy/signoz-frontend --tail=80
For high-confidence validation, instrument a sandbox service with OpenTelemetry and confirm one end-to-end request appears as a trace in the UI.
Common issues and fixes
Certificates stay in Pending
Check DNS A/AAAA records first, then verify ingress class annotations and open ports 80/443. ACME HTTP-01 fails when challenge paths cannot be reached publicly.
Pods restart due to storage pressure
Review PVC sizing and StorageClass throughput. Observability backends can grow quickly under verbose logging or high-cardinality labels.
Queries are slow during peak windows
Reduce high-cardinality tags, tune retention, and scale backend resources in planned increments. Validate each change against representative load.
Ingress works internally but not externally
Verify cloud load balancer security groups, firewall policy, and whether externalTrafficPolicy is affecting source routing in your environment.
Telemetry missing for one service
Confirm OpenTelemetry endpoint configuration, exporter protocol compatibility, and network egress policies from that workload namespace.
FAQ
Can I run SigNoz without cert-manager?
Yes, but you will manually issue and rotate TLS certificates. For production, cert-manager is safer and more maintainable.
How much storage should I allocate initially?
Start with conservative retention and measure daily growth. Many teams begin with 100-200Gi for core telemetry and scale after two weeks of real traffic.
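That guidance turns into quick arithmetic once you have a measured ingest rate. The daily volume and overhead below are assumptions for illustration; substitute your own two-week measurements.

```shell
# Back-of-the-envelope PVC sizing. DAILY_GIB and OVERHEAD_PCT are assumed
# values, not recommendations; measure your own daily growth first.
DAILY_GIB=8          # measured telemetry ingest per day
RETENTION_DAYS=15    # target retention window
OVERHEAD_PCT=30      # headroom for indexes, merges, and spikes
NEEDED_GIB=$(( DAILY_GIB * RETENTION_DAYS * (100 + OVERHEAD_PCT) / 100 ))
echo "provision at least ${NEEDED_GIB}Gi"   # -> provision at least 156Gi
```

Re-run the calculation whenever ingest volume or retention targets change, and resize PVCs before they fill rather than after.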
Should I deploy SigNoz in the same cluster as app workloads?
You can, but isolate namespaces, quotas, and network policies carefully. Larger teams often prefer a dedicated observability cluster for blast-radius control.
How do I handle upgrades safely?
Pin chart versions, test upgrades in staging, snapshot persistent data where possible, and apply upgrades during low-traffic windows with rollback plans.
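A pinned-upgrade sketch follows. The chart version is a deliberate placeholder, not a recommendation; pin whichever version you validated in staging, and note that --atomic rolls the release back automatically if the upgrade fails.

```shell
# Pinned, reviewable upgrade sketch. CHART_VERSION is a placeholder;
# replace it with the exact version tested in staging.
CHART_VERSION="x.y.z"
# Against a live cluster (uncomment):
# helm repo update
# helm -n signoz upgrade signoz signoz/signoz --version "$CHART_VERSION" \
#   -f values-signoz.yaml --atomic --timeout 20m
# helm -n signoz rollback signoz   # if post-upgrade checks fail
echo "target chart version: ${CHART_VERSION}"
```

Commit the pinned version alongside values-signoz.yaml so upgrades stay reviewable in Git.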
Can I integrate existing OpenTelemetry collectors?
Yes. Keep collector configs consistent and validate protocol settings. Standardize resource attributes early to avoid dashboard fragmentation.
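For an existing collector, forwarding usually means adding an OTLP exporter and wiring it into the pipeline. The fragment below is a sketch to merge into your current collector config; the Service DNS name assumes a Helm release named signoz in the signoz namespace, and the otlp receiver and batch processor are assumed to already be defined.

```shell
# Write an exporter fragment for an existing OpenTelemetry Collector.
# The endpoint assumes the default SigNoz chart Service name; TLS is
# disabled here because traffic stays inside the cluster.
cat <<'EOF' > otel-exporter-snippet.yaml
exporters:
  otlp:
    endpoint: signoz-otel-collector.signoz.svc.cluster.local:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
EOF
grep -c 'signoz-otel-collector' otel-exporter-snippet.yaml   # -> 1
```

Merge this into the collector's full configuration rather than replacing it, and mirror the same exporter for metrics and logs pipelines if you forward those too.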
What is the fastest way to improve incident response with this setup?
Define service-level dashboards, alert routes with ownership, and runbook links inside alert messages so responders move from signal to action quickly.
Do I need separate retention policies for logs and traces?
Usually yes. Logs and traces have different diagnostic value horizons and storage costs, so independent retention policies reduce cost without hurting investigations.
Related guides
- Production Guide: Deploy Sentry with Docker Compose + Caddy + PostgreSQL + Redis on Ubuntu
- Production Guide: Deploy OpenObserve with Kubernetes + Helm + cert-manager + ingress-nginx on Ubuntu
- Deploy OpenProject with Docker Compose and Traefik on Ubuntu (Production Guide)
Talk to us
Need help deploying or hardening SigNoz in production? We can help with architecture, security baselines, migration planning, and day-2 operational runbooks tailored to your team.