Introduction: real-world use case
Workflow orchestration becomes mission-critical once your product depends on long-running business processes: order lifecycles, customer onboarding, payment retries, document pipelines, asynchronous notifications, and AI-powered background jobs. At that scale, flaky retries and ad-hoc cron chains are not enough. Teams need deterministic state, clear failure semantics, replay safety, and operational visibility. Temporal is designed for exactly that: durable execution that survives restarts, network hiccups, and partial outages without losing process state.
This guide walks through a production-oriented deployment of Temporal using Kubernetes, Helm, PostgreSQL, and ingress-nginx. Instead of a demo-first setup, we focus on practical architecture choices you can defend in a design review: namespace isolation, secret handling, TLS exposure patterns, worker connectivity, resource sizing, rollout safety, and on-call runbooks. If your goal is to move critical workflows from brittle queues into a resilient orchestrator, this blueprint gives you a path you can run in staging and then promote to production with confidence.
We will also cover the day-2 details that usually cause painful incidents when skipped: persistence checks, schema compatibility, safe upgrades, queue backlogs, and recovery drills. A durable workflow platform is only valuable when it remains predictable under pressure, including traffic spikes and dependency failures. By the end of this guide, you will have a complete baseline for secure deployment and a concrete checklist for operational readiness.
Architecture and flow overview
The reference architecture uses a dedicated Kubernetes namespace for Temporal services, PostgreSQL as persistence, and ingress-nginx for external access where required (for example, Temporal UI behind controlled exposure). Workers run in separate application namespaces and connect through the cluster network using service DNS. This model keeps orchestration infrastructure separated from business workloads while still allowing low-latency communication.
Core Temporal components include Frontend, History, Matching, and Worker service internals managed by Helm. PostgreSQL stores workflow state and visibility data. ingress-nginx terminates incoming TLS at the edge and routes requests to Temporal UI endpoints; internal gRPC traffic between app workers and Temporal Frontend can remain private behind ClusterIP services. That separation reduces attack surface and helps enforce least privilege at both network and identity layers.
Operationally, request flow is straightforward: an application starts a workflow through SDK calls; Temporal persists state transitions in PostgreSQL; task queues dispatch activities to workers; completion and retry state are tracked durably; and operators inspect execution in Temporal UI and logs/metrics. This architecture removes hidden coupling between job retries and service uptime, making incident recovery far more deterministic.
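The least-privilege boundary described above can be enforced in Kubernetes with a NetworkPolicy that admits gRPC traffic to the Frontend only from approved namespaces. The sketch below is illustrative: it assumes worker namespaces carry a `temporal-client: allowed` label and that Frontend pods use the chart's `app.kubernetes.io/component: frontend` label, so verify both against your release before applying.

```yaml
# Illustrative only - confirm pod labels first with:
#   kubectl -n temporal get pods --show-labels
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-workers-to-frontend
  namespace: temporal
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: frontend   # assumed chart label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              temporal-client: allowed         # label your worker namespaces
      ports:
        - protocol: TCP
          port: 7233                           # Temporal Frontend gRPC
```

This keeps the Frontend reachable for workers while blocking arbitrary in-cluster clients, complementing (not replacing) identity controls such as mTLS.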
Prerequisites
- Kubernetes v1.27+ with a reliable default StorageClass.
- kubectl and Helm 3 configured with admin-level access.
- ingress-nginx installed and reachable from your DNS.
- PostgreSQL endpoint for Temporal persistence (managed DB or in-cluster with backups).
- A DNS record for Temporal UI (for example, temporal.example.com).
- Secret management mechanism (Vault, External Secrets, SOPS, or equivalent).
- Observability stack (Prometheus/Grafana/log aggregation) for day-2 operations.
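A quick preflight sketch to confirm the first few prerequisites before touching Helm (the ingress-nginx namespace name is an assumption; adjust to your install):

```shell
kubectl version
kubectl get storageclass             # confirm a default StorageClass exists
helm version --short
kubectl -n ingress-nginx get pods    # assumes the standard ingress-nginx namespace
```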
Step-by-step deployment
1) Create namespace and baseline labels
Start with explicit tenancy boundaries. Isolating Temporal improves RBAC clarity, quota management, and network policy controls.
kubectl create namespace temporal
kubectl label namespace temporal app=temporal tier=orchestration env=prod
2) Create PostgreSQL and Temporal secrets
Never commit credentials to Git. Inject secrets through your standard platform path and keep rotation ownership explicit.
kubectl -n temporal create secret generic temporal-db \
--from-literal=POSTGRES_SEEDS='postgresql.prod.svc.cluster.local' \
--from-literal=POSTGRES_USER='temporal' \
--from-literal=POSTGRES_PWD='REPLACE_WITH_LONG_RANDOM_SECRET'
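If your platform runs External Secrets Operator, the same Secret can be synced from your backend instead of created by hand, which keeps rotation scriptable. A sketch, assuming a ClusterSecretStore named vault-prod and a backend path of your choosing (both are placeholders):

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: temporal-db
  namespace: temporal
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-prod                     # placeholder store name
  target:
    name: temporal-db                    # Secret consumed by Temporal
  data:
    - secretKey: POSTGRES_PWD
      remoteRef:
        key: prod/temporal/postgres      # placeholder backend path
        property: password
```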
3) Add Helm repository and inspect pinned versions
Pin chart versions to control change risk during maintenance windows.
helm repo add temporal https://go.temporal.io/helm-charts
helm repo update
helm search repo temporal/temporal --versions | head -n 10
4) Create production values file
Use environment-specific values and keep sensitive material externalized. The sample below shows structure, not final production secrets.
server:
  replicaCount: 3
  config:
    persistence:
      default:
        driver: sql
        sql:
          pluginName: postgres
          databaseName: temporal
          connectAddr: postgresql.prod.svc.cluster.local:5432
          user: temporal
          password: REPLACE_WITH_LONG_RANDOM_SECRET
      visibility:
        driver: sql
        sql:
          pluginName: postgres
          databaseName: temporal_visibility
          connectAddr: postgresql.prod.svc.cluster.local:5432
          user: temporal
          password: REPLACE_WITH_LONG_RANDOM_SECRET
web:
  enabled: true
  replicaCount: 2
  ingress:
    enabled: true
    className: nginx
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
    hosts:
      - host: temporal.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: temporal-ui-tls
        hosts:
          - temporal.example.com
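To avoid inlining the database password at all, recent chart versions can read it from an existing Kubernetes Secret. The exact key name and the Secret key the chart expects (often password) vary by chart version, so treat this as a shape to verify against helm show values temporal/temporal for your pinned version:

```yaml
server:
  config:
    persistence:
      default:
        sql:
          existingSecret: temporal-db   # Secret created in step 2; check expected key name
      visibility:
        sql:
          existingSecret: temporal-db
```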
5) Deploy Temporal with Helm
Use helm upgrade --install for idempotent releases, pin the chart version you validated in step 3, and include a reasonable timeout for first-time schema setup.
helm upgrade --install temporal temporal/temporal \
  --namespace temporal \
  --create-namespace \
  --version REPLACE_WITH_PINNED_CHART_VERSION \
  --values values-temporal.yaml \
  --wait --timeout 20m
6) Validate pods, services, ingress, and certificates
Confirm all dependencies are healthy before onboarding applications. Most production issues come from skipping this gate.
kubectl -n temporal get pods -o wide
kubectl -n temporal get svc
kubectl -n temporal get ingress
kubectl -n temporal get certificate
7) Configure worker connectivity and namespace registration
Register workflow namespaces intentionally and avoid running all workloads in the default namespace. Segmentation improves blast-radius control.
kubectl -n temporal exec deploy/temporal-admintools -- \
tctl --ns payments namespace register
kubectl -n temporal exec deploy/temporal-admintools -- \
tctl --ns onboarding namespace register
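Registration is also where workflow history retention is set, which controls how long closed executions remain queryable. The default is short, so set it deliberately; the sketch below uses a placeholder namespace and assumes tctl's --retention flag takes days, which varies slightly across tctl versions (confirm with tctl namespace register --help):

```shell
kubectl -n temporal exec deploy/temporal-admintools -- \
  tctl --ns billing namespace register --retention 7
```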
8) Add autoscaling and operational guardrails
Right-size replicas based on queue depth and worker throughput. Build alerts before incidents force reactive tuning.
kubectl -n temporal autoscale deploy temporal-frontend \
--cpu-percent=70 --min=3 --max=12
kubectl -n temporal autoscale deploy temporal-history \
--cpu-percent=70 --min=3 --max=12
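Autoscaling handles load spikes, but voluntary disruptions (node drains, cluster upgrades) need their own guardrail. A PodDisruptionBudget keeps a quorum of History pods available during maintenance; the selector below is an assumption, so match it to your release's actual pod labels:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: temporal-history-pdb
  namespace: temporal
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/component: history   # verify with: kubectl get pods --show-labels
```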
9) Build backup and restore drills into your runbook
Durability claims are incomplete until restore is tested. Schedule recurring restore rehearsals in non-production.
# Example: logical backup from PostgreSQL
pg_dump -h postgresql.prod.svc.cluster.local -U temporal -d temporal > temporal.sql
pg_dump -h postgresql.prod.svc.cluster.local -U temporal -d temporal_visibility > temporal_visibility.sql
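To make backups repeatable rather than a one-off command, the same dump can run on a schedule inside the cluster. A minimal sketch, assuming the temporal-db Secret from step 2 and a backup PVC of your own (the claim name is a placeholder; shipping dumps to object storage is left to your platform's conventions):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: temporal-pg-backup
  namespace: temporal
spec:
  schedule: "0 2 * * *"               # nightly at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: pg-dump
              image: postgres:16      # match your server's major version
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: temporal-db
                      key: POSTGRES_PWD
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h postgresql.prod.svc.cluster.local -U temporal -d temporal > /backup/temporal-$(date +%F).sql
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: temporal-backup   # placeholder PVC
```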
Configuration and secrets handling best practices
Use separate credentials for Temporal persistence, operator actions, and application workers. Avoid broad shared secrets that let a single compromise affect the full control plane. If your organization supports dynamic secret issuance, prefer short-lived credentials and automatic rotation over static values. Rotation should be scripted and tested, not a one-off human task.
Define environment-specific values files (dev, staging, prod) and enforce policy checks before deployment. Keep all secrets outside the values files themselves. For Kubernetes, combine RBAC restrictions, secret encryption at rest, and network policies so only approved workloads can reach Temporal Frontend and database endpoints.
For high-trust environments, add mTLS or service-mesh identity between workers and Frontend, and keep UI exposure restricted behind identity-aware access controls. A robust workflow platform should make unauthorized access difficult by default, not by convention.
Verification checklist
- All Temporal core pods are Ready with stable restart counts.
- Temporal UI responds over HTTPS with a valid certificate chain.
- A test workflow runs to completion and appears in execution history.
- Task queues drain as expected under synthetic load.
- PostgreSQL latency and connection pools remain within thresholds.
- Alerts trigger for pod failures, certificate expiry, and queue backlog growth.
kubectl -n temporal exec deploy/temporal-admintools -- tctl workflow list
kubectl -n temporal logs deploy/temporal-frontend --tail=200
curl -I https://temporal.example.com
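For the "test workflow runs to completion" check, a synthetic execution can be started from the admin tools pod. This assumes a worker is polling a smoke-test task queue with a SmokeWorkflow type registered; both names are placeholders for your own smoke workload, and flag spellings vary slightly across tctl versions:

```shell
kubectl -n temporal exec deploy/temporal-admintools -- \
  tctl --ns payments workflow run \
    --taskqueue smoke-test \
    --workflow_type SmokeWorkflow \
    --execution_timeout 60
```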
Common issues and fixes
Temporal UI returns 502/503 through ingress
This usually indicates service/port mismatch, failed readiness probes, or ingress class misconfiguration. Check endpoints and ingress events first before editing Helm values.
kubectl -n temporal describe ingress
kubectl -n temporal get svc
kubectl -n temporal get endpoints
Workflow tasks are piling up in a queue
Queue growth often means insufficient worker concurrency, downstream dependency latency, or activity-level retry storms. Increase worker replicas only after identifying bottlenecks.
kubectl -n apps get deploy -l app=my-temporal-worker
kubectl -n apps top pods
kubectl -n temporal exec deploy/temporal-admintools -- tctl taskqueue describe --taskqueue payments
Database connection saturation under peak load
If PostgreSQL max connections are exhausted, tune pooling and reduce unnecessary SDK client churn. Also review visibility query pressure from dashboards.
kubectl -n temporal logs deploy/temporal-history --tail=200
psql -h postgresql.prod.svc.cluster.local -U temporal -c "select count(*) from pg_stat_activity;"
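Connection pressure is often best addressed on the Temporal side before resizing PostgreSQL. Temporal's SQL persistence configuration supports pool limits such as maxConns, maxIdleConns, and maxConnLifetime; how these surface in Helm values depends on the chart version, so treat the snippet below as a shape to verify against helm show values:

```yaml
server:
  config:
    persistence:
      default:
        sql:
          maxConns: 20            # per service instance - multiply by replica count
          maxIdleConns: 20
          maxConnLifetime: "1h"
```

Remember that each Temporal service replica holds its own pool, so the effective ceiling is pool size times replicas, which must fit inside PostgreSQL's max_connections with headroom for workers and dashboards.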
FAQ
Do I need separate databases for default and visibility persistence?
It is strongly recommended for production. Separation improves performance tuning and simplifies maintenance for read-heavy visibility workloads.
Can I expose Temporal UI publicly?
You can, but production best practice is to protect it behind SSO, IP controls, or private network access to reduce attack surface.
How do I safely roll out worker code changes?
Use versioning practices supported by your Temporal SDK and roll workers gradually. Never force all worker versions to switch at once without compatibility checks.
What metrics matter most during early production adoption?
Track task queue backlog, workflow latency percentiles, worker error rates, and persistence latency. These four metrics catch most reliability regressions early.
How often should we run restore drills?
At least quarterly and after major upgrades. Recovery confidence comes from repeated practice, not assumptions.
Can Temporal replace all cron jobs immediately?
No. Migrate high-impact or failure-prone workflows first, then phase out legacy schedulers as your team gains operational maturity.
Is Kubernetes mandatory for Temporal in production?
No, but Kubernetes gives better scaling and operational consistency for most teams. If you run VMs, enforce the same discipline around backups, upgrades, and observability.
Related guides on SysBrix
- Production Guide: Deploy PostHog with Docker Compose + Traefik + PostgreSQL + ClickHouse on Ubuntu
- Production Guide: Deploy Prometheus with Docker Compose + Caddy + Alertmanager on Ubuntu
- Production Guide: Deploy GlitchTip with Docker Compose + Caddy + PostgreSQL on Ubuntu
Talk to us
If you want this implemented with hardened defaults, observability, and tested recovery playbooks, our team can help.