Production Guide: Deploy Temporal on Kubernetes with Helm, cert-manager, and External PostgreSQL

A production-oriented Temporal deployment with secure persistence, TLS ingress, verification steps, and practical troubleshooting.

Temporal is an orchestration engine for durable background workflows: payments that must retry safely, onboarding pipelines with long waits, and incident automations that cannot lose state when a worker restarts. In production, teams usually hit the same pain points: inconsistent retries, duplicate side effects, and manual recovery after partial failures. This guide shows a production-first deployment of Temporal on Kubernetes using Helm, cert-manager TLS, and an external PostgreSQL backend so state survives cluster maintenance and upgrades. The goal is not just to make Temporal start, but to run it with predictable operations, safer secret handling, and verification steps you can hand to an on-call engineer.

Architecture and flow overview

The stack in this guide uses a dedicated Kubernetes namespace, a Helm-managed Temporal release, ingress with TLS certificates from cert-manager, and PostgreSQL hosted outside the cluster. Temporal server components (frontend, history, matching, and worker) run as Deployments; persistence and visibility both use PostgreSQL in this setup for operational simplicity. Application workers connect to the Temporal frontend over gRPC (internally, or through a dedicated gRPC-capable ingress), operators reach the web UI over the HTTPS endpoint, and tctl or SDK health checks verify namespace registration and workflow task processing.

  • Kubernetes + Helm for repeatable rollout and upgrades.
  • External PostgreSQL for durable persistence independent of node lifecycle.
  • cert-manager + Ingress for managed TLS certificates.
  • Kubernetes Secrets for credentials and rotation workflows.

Prerequisites

  • Ubuntu admin workstation with kubectl and Helm 3 configured to your production cluster.
  • DNS record for temporal.example.com pointed to your ingress controller.
  • Running cert-manager and a ClusterIssuer (Let's Encrypt or internal PKI).
  • External PostgreSQL 14+ with network access from cluster nodes.
  • A dedicated database and least-privilege DB user for Temporal.
  • Change-management window for first deployment and rollback testing.

Step-by-step deployment

1) Create namespace and baseline labels

Keep Temporal isolated from unrelated workloads so resource policies and alerts are easier to reason about.

kubectl create namespace temporal
kubectl label namespace temporal app.kubernetes.io/part-of=temporal
kubectl get ns temporal --show-labels

2) Create PostgreSQL databases and user

Temporal needs persistence and visibility stores. Using separate databases simplifies backup validation and query tuning.

CREATE USER temporal_app WITH ENCRYPTED PASSWORD 'REPLACE_WITH_STRONG_SECRET';
CREATE DATABASE temporal OWNER temporal_app;
CREATE DATABASE temporal_visibility OWNER temporal_app;
GRANT ALL PRIVILEGES ON DATABASE temporal TO temporal_app;
GRANT ALL PRIVILEGES ON DATABASE temporal_visibility TO temporal_app;

3) Add Helm repos and pull chart defaults

helm repo add temporal https://go.temporal.io/helm-charts
helm repo update
helm show values temporal/temporal > temporal-values.base.yaml

4) Build production values and DB secret

Store only credentials in Kubernetes secrets; keep non-secret settings in versioned values files for peer review.

kubectl -n temporal create secret generic temporal-db-secret \
  --from-literal=password='REPLACE_WITH_STRONG_SECRET'

cat > temporal-values.prod.yaml <<'YAML'
server:
  replicaCount: 3
  config:
    persistence:
      default:
        driver: sql
        sql:
          pluginName: postgres
          host: postgres-prod.internal
          port: 5432
          database: temporal
          user: temporal_app
          existingSecret: temporal-db-secret
          maxConns: 40
          maxConnLifetime: 1h
      visibility:
        driver: sql
        sql:
          pluginName: postgres
          host: postgres-prod.internal
          port: 5432
          database: temporal_visibility
          user: temporal_app
          existingSecret: temporal-db-secret
          maxConns: 20
          maxConnLifetime: 1h
web:
  enabled: true
  replicaCount: 2
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
    hosts:
      - temporal.example.com
    tls:
      - secretName: temporal-tls
        hosts:
          - temporal.example.com
schema:
  setup:
    enabled: true
  update:
    enabled: true
elasticsearch:
  enabled: false
prometheus:
  enabled: true
YAML

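Before installing, it helps to confirm the values file renders valid manifests. The sketch below assumes the repo alias and file name created in the steps above; render_check is a helper name invented here.

```shell
# Render the chart locally and lint the output without touching the cluster.
# Assumes the "temporal" repo alias and temporal-values.prod.yaml from above.
render_check() {
  helm template temporal temporal/temporal \
    --namespace temporal \
    -f temporal-values.prod.yaml \
    | kubectl apply --dry-run=client -f - \
    && echo "manifests render cleanly"
}
```

Running render_check in CI lets a malformed values change fail review instead of failing the deploy.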

5) Configure TLS certificate and ingress policy

Save the manifest below as temporal-certificate.yaml, matching the file applied in the next command:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: temporal-cert
  namespace: temporal
spec:
  secretName: temporal-tls
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
  dnsNames:
    - temporal.example.com

kubectl apply -f temporal-certificate.yaml
kubectl -n temporal get certificate temporal-cert
kubectl -n temporal get secret temporal-tls

6) Install Temporal with Helm and wait for readiness

helm upgrade --install temporal temporal/temporal \
  --namespace temporal \
  -f temporal-values.prod.yaml \
  --atomic --timeout 15m

kubectl -n temporal get pods -o wide

7) Register namespace and run a smoke workflow

kubectl -n temporal exec deploy/temporal-admintools -- \
  tctl --ns production namespace register --rd 3

kubectl -n temporal exec deploy/temporal-admintools -- \
  tctl --ns production namespace describe

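To go beyond namespace registration, start an actual workflow. This is a sketch: it assumes you have deployed a smoke-test worker polling a task queue named smoke that registers a workflow type named HealthCheck; both names are placeholders for your own worker.

```shell
# Start a short-lived workflow and wait for its result; this fails fast if
# workflow task processing is broken. Queue and type names are placeholders.
smoke_test() {
  kubectl -n temporal exec deploy/temporal-admintools -- \
    tctl --ns production workflow run \
      --taskqueue smoke \
      --workflow_type HealthCheck \
      --execution_timeout 60
}
```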

Configuration and secrets handling best practices

Use a dedicated secret object for database credentials and rotate it on a fixed schedule. Pair rotation with a staged rollout: update secret, restart one Temporal component, verify queue health, then continue. Avoid embedding credentials in Helm command history or CI logs; pass them through your secret manager and inject at deploy time. For network security, limit PostgreSQL ingress to cluster node CIDRs and require TLS where possible. On RBAC, keep write access to the temporal namespace restricted to CI service accounts and cluster operators. Finally, export audit logs for helm upgrades and namespace changes so incident reviews can map config drift to a timeline quickly.
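The rotation flow above can be sketched as a small runbook. gen_password and rotate_db_secret are helper names invented for this sketch, and the deployment name assumes this guide's release; adapt both to your environment.

```shell
# Staged credential rotation: new password, secret update, one component at a time.
gen_password() {  # 32 URL-safe characters derived from the kernel CSPRNG
  head -c 64 /dev/urandom | base64 | tr -d '/+=\n' | head -c 32
}

rotate_db_secret() {  # run during a change window; repeat the rollout per component
  local newpw="$1"
  kubectl -n temporal create secret generic temporal-db-secret \
    --from-literal=password="$newpw" \
    --dry-run=client -o yaml | kubectl apply -f -
  kubectl -n temporal rollout restart deploy/temporal-frontend
  kubectl -n temporal rollout status deploy/temporal-frontend --timeout=5m
}
```

Remember the database side: create the new password there first, and retire the old credential only after the stability window described above.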

For application teams, define a namespace lifecycle policy: create namespaces through code review, set retention explicitly, and document worker retry semantics before production cutover. This reduces accidental infinite retries and runaway task queues.

Verification checklist

  1. All Temporal server pods are Ready with no CrashLoopBackOff.
  2. Ingress endpoint serves valid TLS cert chain for your domain.
  3. Temporal namespace registration succeeds and can be described via tctl.
  4. At least one sample workflow completes successfully end-to-end.
  5. PostgreSQL connections remain below configured maxConns under load test.
  6. Prometheus metrics are scraped and dashboard panels render latency/error signals.

kubectl -n temporal get pods
kubectl -n temporal logs deploy/temporal-frontend --tail=80
kubectl -n temporal exec deploy/temporal-admintools -- tctl cluster health

curl -I https://temporal.example.com
kubectl -n temporal get ingress
kubectl -n temporal get certificate temporal-cert -o yaml | sed -n '1,40p'

Common issues and fixes

Workers time out while server appears healthy

Check task queue names, namespace mismatch, and worker identity configuration. Most startup failures come from environment mismatches between worker deployment and Temporal namespace settings.
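A quick way to check the queue side is to ask the server who is polling. taskqueue_pollers is a helper name invented here; it assumes the admintools deployment and namespace from this guide.

```shell
# List active pollers on a task queue; an empty list means no worker is
# connected to the queue your starters are using.
taskqueue_pollers() {
  kubectl -n temporal exec deploy/temporal-admintools -- \
    tctl --ns production taskqueue describe --taskqueue "$1"
}
```

Compare the poller identities against your worker deployment: a typo'd queue name shows up as one queue with pollers and another without.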

Schema update job fails during upgrade

Validate database permissions for schema migrations and ensure no stale locks remain from previous failed releases. Re-run Helm with --atomic so failed upgrades roll back cleanly.
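When an upgrade fails, read the schema job output before retrying. The job name below is an assumption based on the chart's defaults, so list the jobs first if yours differ.

```shell
# Inspect schema job state, then its logs; adjust the job name to what
# `get jobs` actually shows in your release.
schema_job_logs() {
  kubectl -n temporal get jobs
  kubectl -n temporal logs job/temporal-schema-update --tail=100
}
```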

TLS handshake failures at ingress

Confirm the certificate secret name matches the Helm values and the ingress host. If cert-manager issued the certificate in a different namespace or from a different issuer, recreate the Certificate resource and inspect its events.
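Two checks cover most handshake failures: cert-manager's view of issuance and the chain the ingress actually serves. host_matches_san is a small helper invented here to sanity-check wildcard SAN coverage.

```shell
# Does a SAN entry (exact or single-level wildcard) cover a hostname?
host_matches_san() {
  local host="$1" san="$2"
  case "$san" in
    "$host") return 0 ;;                                   # exact match
    \*.*) [ "${host#*.}" = "${san#\*.}" ] \
            && [ "${host%%.*}" != "$host" ] && return 0 ;; # one-level wildcard
  esac
  return 1
}

check_cert() {  # issuance state, then the chain served at the ingress
  kubectl -n temporal describe certificate temporal-cert
  openssl s_client -connect temporal.example.com:443 \
    -servername temporal.example.com </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -dates
}
```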

High history service latency

Inspect PostgreSQL latency and connection saturation first. Increase pool limits carefully and benchmark with representative workflow volume before scaling server replicas.
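Start with the database's view of the pool. A sketch, assuming the host and role names from this guide and a TEMPORAL_DB_PASSWORD environment variable exported beforehand:

```shell
# Count server-side connections per state for the Temporal role; compare the
# total against the maxConns values in temporal-values.prod.yaml.
pool_saturation() {
  PGPASSWORD="$TEMPORAL_DB_PASSWORD" psql -h postgres-prod.internal -U temporal_app -d temporal \
    -c "SELECT state, count(*) FROM pg_stat_activity WHERE usename = 'temporal_app' GROUP BY state;"
}
```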

Intermittent visibility query slowness

Run index maintenance on visibility tables and keep retention windows realistic. Visibility stores degrade if old execution records accumulate without archival strategy.
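A maintenance sketch, assuming the connection details from this guide; executions_visibility is the main table in Temporal's standard PostgreSQL visibility schema, but verify the table names against your schema version first.

```shell
# Vacuum/analyze the main visibility table, then surface the tables carrying
# the most dead tuples; run during a low-traffic window.
visibility_maintenance() {
  PGPASSWORD="$TEMPORAL_DB_PASSWORD" psql -h postgres-prod.internal -U temporal_app -d temporal_visibility \
    -c 'VACUUM (ANALYZE) executions_visibility;' \
    -c 'SELECT relname, n_dead_tup FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 5;'
}
```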

FAQ

Do I need Cassandra or Elasticsearch for production Temporal?

No. Many teams run Temporal successfully on PostgreSQL for both persistence and visibility, especially when operational simplicity matters more than extreme query scale.

How many replicas should I start with?

Start with three server replicas for high availability, then tune by queue depth, workflow latency, and failure-domain requirements.

Can I run Temporal without public ingress?

Yes. Internal-only ingress or service mesh exposure is common. Keep at least one operator path for admin tools and incident triage.

What backup policy is recommended for PostgreSQL?

Use PITR-capable backups plus daily restore tests to a staging environment. A backup is only trusted after repeatable restore validation.
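The restore-test half is what most teams skip. A minimal sketch using logical dumps and a staging PostgreSQL at postgres-staging.internal (a placeholder hostname); for large datasets, prefer PITR base backups with the same verify loop.

```shell
backup_name() {  # date-stamped artifact so each restore test maps to a specific day
  printf 'temporal-%s.dump' "$(date +%F)"
}

backup_and_verify() {  # dump production, restore into staging, run a sanity query
  pg_dump -h postgres-prod.internal -U temporal_app -Fc temporal > "$(backup_name)"
  pg_restore -h postgres-staging.internal -U temporal_app -d temporal --clean "$(backup_name)"
  psql -h postgres-staging.internal -U temporal_app -d temporal -c 'SELECT count(*) FROM namespaces;'
}
```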

How do I rotate DB credentials safely?

Create a new DB user/password, update Kubernetes secret, roll components one by one, verify health, then retire old credentials after a stability window.

What should I alert on first?

Alert on failed workflow task rate, frontend error spikes, persistence latency, and certificate expiration windows. These catch most production-impacting failures early.

How do I avoid duplicate side effects in activities?

Use idempotency keys in downstream writes and design activity retries as safe replays. Temporal retries are powerful, but external side effects must be deduplicated.
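The key idea can be shown in miniature: derive a stable key from identifiers the workflow already has, so a retried activity sends the same key and the downstream system deduplicates. Function and input names here are illustrative, not a Temporal API.

```shell
# Same workflow id + activity name always yield the same key, so a retried
# activity replays as a no-op against an idempotent downstream endpoint.
idempotency_key() {
  printf '%s:%s' "$1" "$2" | sha256sum | cut -c1-32
}
```

Pass the key as a unique constraint or Idempotency-Key header on the downstream write; the retry then conflicts harmlessly instead of double-charging.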

Talk to us

If you want this implemented with hardened defaults, observability, and tested recovery playbooks, our team can help.

Contact Us
