
Production Guide: Deploy Grafana Loki on Kubernetes with Helm, Ingress NGINX, and Cert-Manager

Run a resilient Loki stack with TLS, S3 object storage, secure secrets, and operational guardrails for real production workloads.

Centralized logs are easy to prototype and surprisingly hard to operate in production. Teams often start with node-level files, add ad hoc shipping, and then discover too late that query latency, retention cost, and noisy high-cardinality labels can make incident response slower instead of faster.

This guide shows a production-first Loki deployment on Kubernetes using Helm, Ingress NGINX, and cert-manager. The target environment is a multi-node Ubuntu cluster with object storage for chunks and indexes, persistent volumes for stateful components, and strict handling for credentials. The same pattern works in managed Kubernetes as long as your ingress class and storage classes are adjusted.

The objective is not just to get Loki running. The objective is to get predictable operations: controlled retention, sane resource isolation, safe upgrades, and fast verification steps that help you trust the platform during a real outage.

Architecture and flow overview

In this architecture, logs are scraped by Promtail from Kubernetes pods and sent to Loki through internal services. Loki stores indexes and chunks in S3-compatible object storage to separate compute from durable log data. Ingress NGINX exposes the query endpoint over HTTPS, with cert-manager issuing and rotating certificates automatically.

The deployment keeps read and write paths explicit. Write components ingest and flush chunks; read components serve queries and cache hot results. This separation helps performance tuning because ingestion bursts and query spikes can be scaled independently. It also reduces noisy-neighbor effects when dashboard users run broad range queries during incidents.

Operationally, you should treat label design as a first-class architecture decision. Over-labeling creates cardinality explosions and can multiply storage costs. Under-labeling makes correlation harder. A practical baseline is namespace, app, pod, and environment labels, with high-cardinality values normalized or dropped before ingestion.
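As a sketch of that baseline, a Promtail scrape config can promote only the low-cardinality Kubernetes metadata to labels and drop dynamic ones before they reach Loki. The label names and the static `environment` value below are illustrative assumptions; adapt them to your chart values:

```yaml
# Illustrative Promtail relabeling: keep namespace/app/pod plus a static
# environment label, and drop per-request identifiers before ingestion.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - target_label: environment
        replacement: prod          # static per-cluster label
    pipeline_stages:
      # Drop high-cardinality labels if an upstream stage promoted them.
      - labeldrop:
          - request_id
```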

Prerequisites

  • Kubernetes cluster (v1.27+) with at least 3 worker nodes and kubectl admin access.
  • Helm 3.13+ installed locally.
  • Ingress NGINX controller installed and set as the active ingress class.
  • cert-manager installed with a ClusterIssuer (for example Let's Encrypt).
  • S3-compatible object storage bucket and IAM/access credentials dedicated to Loki.
  • DNS record for loki.example.com pointed to your ingress load balancer.
  • A dedicated Kubernetes namespace (recommended: observability).

Step-by-step deployment

Step 1: Prepare namespace and baseline policies

Create a dedicated namespace and apply basic quota/limit policies before deploying charts. This prevents surprise resource contention when ingestion spikes.

kubectl create namespace observability
kubectl label namespace observability app.kubernetes.io/part-of=observability

cat <<'EOF' | kubectl apply -n observability -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: observability-quota
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
EOF
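The quota above caps the namespace total; a LimitRange adds per-container defaults so that a single unconfigured pod cannot claim the whole budget. The values here are illustrative starting points, not tuned recommendations:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: observability-limits
spec:
  limits:
    - type: Container
      default:             # applied when a container sets no limits
        cpu: "1"
        memory: 1Gi
      defaultRequest:      # applied when a container sets no requests
        cpu: 250m
        memory: 256Mi
```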


Step 2: Add Helm repositories and pin chart versions

Pinning chart versions protects you from unplanned upstream changes. Track upgrades intentionally in change windows.

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm search repo grafana/loki -l | head -n 20


Step 3: Create Loki values file with production defaults

Use object storage and explicit read/write/backend replicas. Keep retention and ingestion limits in values so they remain auditable in Git.

loki:
  auth_enabled: false  # single-tenant mode; place authentication in front of the gateway before exposing it
  commonConfig:
    replication_factor: 3
  storage:
    type: s3
    bucketNames:
      chunks: loki-prod-chunks
      ruler: loki-prod-ruler
      admin: loki-prod-admin
    s3:
      endpoint: s3.us-east-1.amazonaws.com
      region: us-east-1
      secretAccessKey: ${LOKI_S3_SECRET_KEY}  # expanded from env at startup; see Step 4
      accessKeyId: ${LOKI_S3_ACCESS_KEY}      # expanded from env at startup; see Step 4
      s3ForcePathStyle: false
  schemaConfig:
    configs:
      - from: "2024-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    ingestion_rate_mb: 10
    ingestion_burst_size_mb: 20
    max_label_value_length: 2048
    retention_period: 744h
  compactor:
    retention_enabled: true       # without this, retention_period is not enforced
    delete_request_store: s3

singleBinary:
  replicas: 0

backend:
  replicas: 3
read:
  replicas: 3
write:
  replicas: 3

gateway:
  enabled: true
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - host: loki.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: loki-tls
        hosts:
          - loki.example.com
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod

monitoring:
  serviceMonitor:
    enabled: true


Step 4: Handle secrets safely

Never hardcode S3 keys in Helm values. Store them in Kubernetes secrets (or External Secrets) and reference them through environment variables. Rotate keys on a fixed schedule.

kubectl -n observability create secret generic loki-s3 \
  --from-literal=LOKI_S3_ACCESS_KEY='REPLACE_ME' \
  --from-literal=LOKI_S3_SECRET_KEY='REPLACE_ME'

kubectl -n observability get secret loki-s3 -o yaml
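The ${LOKI_S3_*} placeholders in the values file only resolve if the secret is injected as environment variables and Loki runs with env expansion enabled. One way to wire this up is a values fragment along the following lines; the exact key names vary by chart version, so verify them against `helm show values grafana/loki` before applying:

```yaml
# Hedged sketch: inject the loki-s3 secret into each Loki component and
# tell Loki to expand ${VAR} references in its rendered config.
write:
  extraArgs:
    - -config.expand-env=true
  extraEnvFrom:
    - secretRef:
        name: loki-s3
read:
  extraArgs:
    - -config.expand-env=true
  extraEnvFrom:
    - secretRef:
        name: loki-s3
backend:
  extraArgs:
    - -config.expand-env=true
  extraEnvFrom:
    - secretRef:
        name: loki-s3
```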


Step 5: Deploy Loki stack

Install into the observability namespace and wait for workloads to become ready before exposing traffic.

helm upgrade --install loki grafana/loki \
  -n observability \
  -f loki-values.yaml

kubectl -n observability rollout status deploy/loki-gateway --timeout=300s
kubectl -n observability get pods -l app.kubernetes.io/name=loki -o wide


Step 6: Deploy Promtail for Kubernetes log collection

Promtail ships pod logs to Loki and should apply relabeling to avoid cardinality blowups. Start with conservative labels and add more only when justified by search needs.

helm upgrade --install promtail grafana/promtail \
  -n observability \
  --set "config.clients[0].url=http://loki-gateway.observability.svc.cluster.local/loki/api/v1/push"

kubectl -n observability get daemonset promtail


Step 7: Verify TLS, ingestion, and query path

Verification should cover both the control plane and the data plane: confirm certificate issuance, endpoint health, and successful log writes and queries.

kubectl -n observability get certificate,challenge,order
curl -I https://loki.example.com/ready

kubectl -n observability run loggen --rm -it --image=busybox --restart=Never -- \
  sh -c 'for i in $(seq 1 50); do echo "loki smoke test $i"; sleep 1; done'

curl -G -s 'https://loki.example.com/loki/api/v1/query_range' \
  --data-urlencode 'query={namespace="observability"} |= "loki smoke test"' | jq .status
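Beyond the pod-based smoke test, you can exercise the push API directly. This builds a payload in Loki's push format (a nanosecond-precision timestamp string plus the log line inside streams[].values) and shows the curl call, left commented because it assumes DNS and TLS are already live:

```shell
# Construct a Loki push-API payload by hand for a manual ingestion test.
TS="$(date +%s%N)"
PAYLOAD='{"streams":[{"stream":{"job":"smoke"},"values":[["'"$TS"'","manual push test"]]}]}'
echo "$PAYLOAD"
# Send it once the endpoint is reachable (assumes this guide's hostname):
# curl -s -XPOST https://loki.example.com/loki/api/v1/push \
#   -H 'Content-Type: application/json' -d "$PAYLOAD"
```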


Step 8: Operational hardening and backups

Production readiness requires disaster recovery drills and budget controls. Test object-store restore, set alerting for ingestion lag, and cap expensive broad queries.

# Example: retention and compactor checks
kubectl -n observability logs statefulset/loki-backend | grep -i -E 'compactor|retention|error' | tail -n 30

# Capacity snapshot
kubectl -n observability top pod | sort -k3 -h

# Suggested alerts:
# - Loki request error ratio > 2%
# - Ingestion rate saturation > 80%
# - Object storage errors > 0
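The suggested alerts can be expressed as a PrometheusRule if you run the Prometheus Operator. The metric below comes from Loki's standard instrumentation, but the threshold, duration, and labels are assumptions to tune for your environment:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: loki-alerts
  namespace: observability
spec:
  groups:
    - name: loki
      rules:
        - alert: LokiRequestErrors
          expr: |
            sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[5m]))
              /
            sum(rate(loki_request_duration_seconds_count[5m])) > 0.02
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: Loki request error ratio above 2%
```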


Configuration and secrets handling best practices

Store chart values in Git, but never commit live credentials. If you use External Secrets (Vault, AWS Secrets Manager, GCP Secret Manager), map secrets into the namespace and rotate without redeploying the full stack.

Use separate credentials per environment (dev/staging/prod) and restrict S3 IAM scope to Loki buckets only. Block wildcard object permissions where possible. Audit access logs regularly; log platform credentials are high-value because they can expose sensitive events from every service.
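As an example of that scoping, an IAM policy restricted to the three Loki buckets could look like the following. Bucket names match this guide's values file; the action list is a minimal assumption and may need additions (for example, multipart-upload permissions) depending on your setup:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::loki-prod-chunks",
        "arn:aws:s3:::loki-prod-ruler",
        "arn:aws:s3:::loki-prod-admin"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": [
        "arn:aws:s3:::loki-prod-chunks/*",
        "arn:aws:s3:::loki-prod-ruler/*",
        "arn:aws:s3:::loki-prod-admin/*"
      ]
    }
  ]
}
```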

For compliance-heavy environments, add namespace network policies so only Promtail and trusted observability components can talk to Loki gateway/service endpoints. Combine this with RBAC restrictions for query access to reduce accidental data exposure.
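A minimal sketch of such a policy, allowing only Promtail pods in the namespace to reach the Loki components. The pod selector labels are assumptions; match them against the labels your chart releases actually generate:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: loki-ingress-allowlist
  namespace: observability
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: loki
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: promtail
```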

Verification checklist

  • Availability: all Loki read/write/backend pods Ready and gateway responding on /ready.
  • Security: TLS certificate valid, auto-renew path healthy, and secret objects restricted to observability operators.
  • Data path: synthetic logs visible within expected latency (typically under 30 seconds).
  • Cost controls: retention period and query limits enforced; no uncontrolled high-cardinality labels entering ingestion.
  • Recovery: documented restore test from object storage performed and timed.

Common issues and fixes

Queries are slow after deployment

Check index/chunk storage latency first. S3 endpoint or DNS misconfiguration often causes read stalls. Validate bucket region, endpoint, and Loki schema settings.

High memory usage on read components

Reduce query parallelism and enforce dashboard query ranges. Add caching where appropriate and watch for wildcard-heavy labels.

No logs from specific namespaces

Inspect Promtail relabel rules and service account permissions. Ensure the DaemonSet can read node log paths and Kubernetes metadata.

TLS certificate not issued

Review cert-manager ClusterIssuer, DNS records, and ingress annotations. Most failures are challenge propagation or wrong ingressClassName.

Unexpected storage growth

Audit label cardinality and retention settings. Drop unneeded labels at ingestion and verify compactor activity.

Intermittent 5xx from gateway

Check upstream service endpoints and resource pressure. Burst ingestion without limits can starve query traffic.

FAQ

Can I run Loki in single-binary mode in production?

For small environments, yes, but distributed mode is recommended once multiple teams or high query concurrency are involved. It scales read/write independently and improves fault tolerance.

Do I need object storage or can I use only PVCs?

Object storage is strongly recommended for durability and operational flexibility. PVC-only designs are harder to scale and recover during node/storage disruptions.

How much retention should I set initially?

Start from compliance and incident-response requirements, then model cost. Many teams begin with 30 days hot retention and export longer-term archives separately.

What labels should I avoid?

Avoid highly dynamic values (request IDs, timestamps, random hashes) as labels. Keep labels low-cardinality and move dynamic fields into log body content.

How do I secure multi-tenant access?

Use network policies, auth in front of gateway, and role-separated dashboards. For strict tenancy, isolate environments or use dedicated Loki tenants with policy controls.

How should I upgrade with minimal risk?

Pin chart versions, test in staging with replayed query load, then roll through production during low-traffic windows. Keep rollback values and backup validation ready.

Talk to us

If you want support designing or hardening your observability platform, we can help with architecture, migration planning, and production readiness.

Contact Us
