Temporal is an orchestration engine for durable background workflows: payments that must retry safely, onboarding pipelines with long waits, and incident automations that cannot lose state when a worker restarts. In production, teams usually hit the same pain points: inconsistent retries, duplicate side effects, and manual recovery after partial failures. This guide shows a production-first deployment of Temporal on Kubernetes using Helm, cert-manager TLS, and an external PostgreSQL backend so state survives cluster maintenance and upgrades. The goal is not just to make Temporal start, but to run it with predictable operations, safer secret handling, and verification steps you can hand to an on-call engineer.
Architecture and flow overview
The stack in this guide uses a dedicated Kubernetes namespace, a Helm-managed Temporal release, ingress with TLS certificates from cert-manager, and PostgreSQL hosted outside the cluster. Temporal server components (frontend, history, matching, and worker) run as deployments; visibility and persistence both use PostgreSQL in this setup for operational simplicity. Application workers connect to the Temporal frontend over gRPC, the web UI is exposed through the TLS ingress, and operators use tctl or SDK health checks to verify namespace registration and workflow task processing.
- Kubernetes + Helm for repeatable rollout and upgrades.
- External PostgreSQL for durable persistence independent of node lifecycle.
- cert-manager + Ingress for managed TLS certificates.
- Kubernetes Secrets for credentials and rotation workflows.
Prerequisites
- Ubuntu admin workstation with kubectl and Helm 3 configured to your production cluster.
- DNS record for temporal.example.com pointed at your ingress controller.
- Running cert-manager and a ClusterIssuer (Let's Encrypt or internal PKI).
- External PostgreSQL 14+ with network access from cluster nodes.
- A dedicated database and least-privilege DB user for Temporal.
- Change-management window for first deployment and rollback testing.
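Before you begin, a short preflight run catches most environment problems early. A minimal sketch, assuming the DNS name used throughout this guide; adjust names to your environment.
kubectl config current-context
helm version --short
# cert-manager must be healthy before any Certificate resource will be issued
kubectl -n cert-manager get pods
kubectl get clusterissuer
# confirm the DNS record resolves to your ingress controller
dig +short temporal.example.com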
Step-by-step deployment
1) Create namespace and baseline labels
Keep Temporal isolated from unrelated workloads so resource policies and alerts are easier to reason about.
kubectl create namespace temporal
kubectl label namespace temporal app.kubernetes.io/part-of=temporal
kubectl get ns temporal --show-labels
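If you enforce per-namespace resource policies, a quota keeps a misconfigured rollout from starving neighbors. The numbers below are illustrative placeholders, not tuned recommendations; size them from your own capacity planning.
cat > temporal-quota.yaml <<'YAML'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: temporal-quota
  namespace: temporal
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
YAML
kubectl apply -f temporal-quota.yaml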
2) Create PostgreSQL databases and user
Temporal needs persistence and visibility stores. Using separate databases simplifies backup validation and query tuning.
CREATE USER temporal_app WITH ENCRYPTED PASSWORD 'REPLACE_WITH_STRONG_SECRET';
CREATE DATABASE temporal OWNER temporal_app;
CREATE DATABASE temporal_visibility OWNER temporal_app;
GRANT ALL PRIVILEGES ON DATABASE temporal TO temporal_app;
GRANT ALL PRIVILEGES ON DATABASE temporal_visibility TO temporal_app;
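Before involving Helm, verify the new role can reach and authenticate against both databases from inside the cluster; a throwaway pod keeps the test honest about network policy. This assumes the postgres-prod.internal host used in the values file below.
kubectl -n temporal run pg-check --rm -it --image=postgres:14 --restart=Never -- \
  psql "host=postgres-prod.internal user=temporal_app dbname=temporal sslmode=prefer" -c 'SELECT 1;'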
3) Add Helm repos and pull chart defaults
helm repo add temporal https://go.temporal.io/helm-charts
helm repo update
helm show values temporal/temporal > temporal-values.base.yaml
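Pin the chart version in your deployment scripts so upgrades happen deliberately rather than as a side effect of helm repo update. List the available versions and record the one you pick:
helm search repo temporal/temporal --versions | head -n 5
# pass the pinned version explicitly at install time, e.g. --version <chart-version>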
4) Build production values and DB secret
Store only credentials in Kubernetes secrets; keep non-secret settings in versioned values files for peer review.
kubectl -n temporal create secret generic temporal-db-secret --from-literal=password='REPLACE_WITH_STRONG_SECRET'
cat > temporal-values.prod.yaml <<'YAML'
server:
  replicaCount: 3
  config:
    persistence:
      default:
        driver: sql
        sql:
          driver: postgres12   # Temporal's SQL plugin name for PostgreSQL 12+
          host: postgres-prod.internal
          port: 5432
          database: temporal
          user: temporal_app
          existingSecret: temporal-db-secret
          maxConns: 40
          maxConnLifetime: 1h
      visibility:
        driver: sql
        sql:
          driver: postgres12
          host: postgres-prod.internal
          port: 5432
          database: temporal_visibility
          user: temporal_app
          existingSecret: temporal-db-secret
          maxConns: 20
          maxConnLifetime: 1h
web:
  enabled: true
  replicaCount: 2
  ingress:
    enabled: true
    className: nginx
    hosts:
      - temporal.example.com
    tls:
      - secretName: temporal-tls
        hosts:
          - temporal.example.com
schema:
  setup:
    enabled: true
  update:
    enabled: true
cassandra:
  enabled: false   # the chart bundles Cassandra by default; disable it when using external SQL
elasticsearch:
  enabled: false
prometheus:
  enabled: true
YAML
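Before installing, render the chart locally and confirm the persistence settings landed where you expect; this catches indentation mistakes in the values file without touching the cluster.
helm template temporal temporal/temporal --namespace temporal -f temporal-values.prod.yaml > /tmp/rendered.yaml
grep -n 'postgres' /tmp/rendered.yaml | head -n 20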
5) Configure TLS certificate and ingress policy
Save the manifest below as temporal-certificate.yaml. It requests a certificate from your ClusterIssuer and stores it in the temporal-tls secret that the Helm values reference.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: temporal-cert
  namespace: temporal
spec:
  secretName: temporal-tls
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
  dnsNames:
    - temporal.example.com
kubectl apply -f temporal-certificate.yaml
kubectl -n temporal get certificate temporal-cert
kubectl -n temporal get secret temporal-tls
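Once the Certificate reports Ready, verify the served chain end to end from outside the cluster:
openssl s_client -connect temporal.example.com:443 -servername temporal.example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates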
6) Install Temporal with Helm and wait for readiness
helm upgrade --install temporal temporal/temporal --namespace temporal -f temporal-values.prod.yaml --atomic --timeout 15m
kubectl -n temporal get pods -o wide
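Rather than watching get pods by hand, wait on each server component's rollout. The deployment names below assume the chart's defaults with a release named temporal; adjust if you renamed the release.
for d in frontend history matching worker web; do
  kubectl -n temporal rollout status deploy/temporal-$d --timeout=5m
done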
7) Register namespace and run a smoke workflow
kubectl -n temporal exec deploy/temporal-admintools -- tctl --ns production namespace register --rd 3
kubectl -n temporal exec deploy/temporal-admintools -- tctl --ns production namespace describe
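Namespace registration alone does not prove task processing works. Assuming you have a worker polling a task queue (the smoke-test queue and SmokeWorkflow type here are hypothetical placeholders for your own smoke worker), start one execution and watch it complete:
kubectl -n temporal exec deploy/temporal-admintools -- tctl --ns production workflow start \
  --taskqueue smoke-test --workflow_type SmokeWorkflow --workflow_id smoke-1 --execution_timeout 60
kubectl -n temporal exec deploy/temporal-admintools -- tctl --ns production workflow observe --workflow_id smoke-1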
Configuration and secrets handling best practices
Use a dedicated secret object for database credentials and rotate it on a fixed schedule. Pair rotation with a staged rollout: update secret, restart one Temporal component, verify queue health, then continue. Avoid embedding credentials in Helm command history or CI logs; pass them through your secret manager and inject at deploy time. For network security, limit PostgreSQL ingress to cluster node CIDRs and require TLS where possible. On RBAC, keep write access to the temporal namespace restricted to CI service accounts and cluster operators. Finally, export audit logs for helm upgrades and namespace changes so incident reviews can map config drift to a timeline quickly.
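A minimal rotation sketch, assuming the temporal-db-secret from step 4 and one-at-a-time restarts of the server components:
kubectl -n temporal create secret generic temporal-db-secret \
  --from-literal=password='NEW_STRONG_SECRET' --dry-run=client -o yaml | kubectl apply -f -
for d in frontend history matching worker; do
  kubectl -n temporal rollout restart deploy/temporal-$d
  kubectl -n temporal rollout status deploy/temporal-$d --timeout=5m
  # pause here and verify queue health before moving to the next component
done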
For application teams, define a namespace lifecycle policy: create namespaces through code review, set retention explicitly, and document worker retry semantics before production cutover. This reduces accidental infinite retries and runaway task queues.
Verification checklist
- All Temporal server pods are Ready with no CrashLoopBackOff.
- Ingress endpoint serves valid TLS cert chain for your domain.
- Temporal namespace registration succeeds and can be described via tctl.
- At least one sample workflow completes successfully end-to-end.
- PostgreSQL connections remain below configured maxConns under load test.
- Prometheus metrics are scraped and dashboard panels render latency/error signals.
kubectl -n temporal get pods
kubectl -n temporal logs deploy/temporal-frontend --tail=80
kubectl -n temporal exec deploy/temporal-admintools -- tctl cluster health | cat
curl -I https://temporal.example.com
kubectl -n temporal get ingress
kubectl -n temporal get certificate temporal-cert -o yaml | sed -n '1,40p'
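To check the connection-pool item from the list above, count live connections for the Temporal role on the PostgreSQL side while the load test runs; the total should stay below the configured maxConns for each store.
SELECT datname, count(*) AS connections
FROM pg_stat_activity
WHERE usename = 'temporal_app'
GROUP BY datname;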
Common issues and fixes
Workers time out while server appears healthy
Check task queue names, namespace mismatch, and worker identity configuration. Most startup failures come from environment mismatches between worker deployment and Temporal namespace settings.
Schema update job fails during upgrade
Validate database permissions for schema migrations and ensure no stale locks remain from previous failed releases. Re-run Helm with --atomic so failed upgrades roll back cleanly.
TLS handshake failures at ingress
Confirm certificate secret name matches Helm values and ingress host. If cert-manager issued on a different namespace or issuer, recreate certificate resource and verify events.
High history service latency
Inspect PostgreSQL latency and connection saturation first. Increase pool limits carefully and benchmark with representative workflow volume before scaling server replicas.
Intermittent visibility query slowness
Run index maintenance on visibility tables and keep retention windows realistic. Visibility stores degrade if old execution records accumulate without archival strategy.
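A maintenance sketch for the visibility database, assuming the executions_visibility table from Temporal's standard PostgreSQL visibility schema; verify the table names against your deployed schema version before running.
-- run against temporal_visibility during a low-traffic window
VACUUM (ANALYZE) executions_visibility;
REINDEX TABLE executions_visibility;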
FAQ
Do I need Cassandra or Elasticsearch for production Temporal?
No. Many teams run Temporal successfully on PostgreSQL for both persistence and visibility, especially when operational simplicity matters more than extreme query scale.
How many replicas should I start with?
Start with three server replicas for high availability, then tune by queue depth, workflow latency, and failure-domain requirements.
Can I run Temporal without public ingress?
Yes. Internal-only ingress or service mesh exposure is common. Keep at least one operator path for admin tools and incident triage.
What backup policy is recommended for PostgreSQL?
Use PITR-capable backups plus daily restore tests to a staging environment. A backup is only trusted after repeatable restore validation.
How do I rotate DB credentials safely?
Create a new DB user/password, update Kubernetes secret, roll components one by one, verify health, then retire old credentials after a stability window.
What should I alert on first?
Alert on failed workflow task rate, frontend error spikes, persistence latency, and certificate expiration windows. These catch most production-impacting failures early.
How do I avoid duplicate side effects in activities?
Use idempotency keys in downstream writes and design activity retries as safe replays. Temporal retries are powerful, but external side effects must be deduplicated.
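One concrete pattern: derive a stable idempotency key from the workflow identity and make the downstream write a no-op on replay. A minimal sketch with a hypothetical payments table that has a unique constraint on idempotency_key:
-- the key is built by the activity, e.g. from workflow ID plus step name
INSERT INTO payments (idempotency_key, account_id, amount_cents)
VALUES ('wf-123:charge', 42, 1999)
ON CONFLICT (idempotency_key) DO NOTHING;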
Talk to us
If you want this implemented with hardened defaults, observability, and tested recovery playbooks, our team can help.