Introduction: real-world use case
Teams in e-commerce, real estate, healthcare operations, and field services often accumulate millions of media files over time. Consumer sync services are simple at first, but they become expensive and hard to govern when compliance, retention, and identity controls matter. Immich is compelling because it combines a modern photo UX with self-hosted ownership. You can keep images close to your applications, define your own backup and recovery policy, and avoid sudden platform lock-in.
This production guide focuses on practical operations, not only installation. You will deploy Immich with Docker Compose, place Caddy at the edge for HTTPS, use PostgreSQL for metadata durability, and define operational controls that survive real incidents. We will cover backup design, restore drills, monitoring guardrails, and common fault domains so your stack remains reliable after launch.
The objective is not a demo; it is a stable baseline that a small team can run with confidence. You can start single-node, keep costs reasonable, and still achieve disciplined operations. As usage grows, this baseline also gives you a clear migration path toward stronger isolation and higher availability.
Architecture/flow overview
Client traffic from web and mobile applications reaches Caddy on port 443. Caddy terminates TLS, applies secure headers, and forwards requests to Immich on the internal Docker network. Immich writes metadata to PostgreSQL and stores media files in a mounted uploads volume. The machine-learning service processes indexing and recognition jobs asynchronously so interactive user requests stay responsive.
- Ingress: DNS + Caddy for certificates and HTTP routing
- Core app: immich-server for API and UI
- Background workers: immich-machine-learning for embeddings/inference
- Data plane: PostgreSQL + persistent filesystem media
- Operations: backup scripts, health checks, and audit-friendly runbooks
This split makes troubleshooting easier: if uploads fail, you inspect proxy/body-size and storage permissions; if search quality degrades, you inspect ML worker queue and CPU contention; if login succeeds but media is missing, you inspect volume mounts and database consistency. Clear boundaries reduce mean-time-to-recovery when issues happen at 2 a.m.
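These fault-domain boundaries can be captured in a small triage helper so on-call responders start with the right checks. The symptom names and suggested commands below are our own illustrative convention, not Immich terminology; adapt paths and container names to your deployment.

```shell
#!/bin/sh
# triage.sh -- print the first checks to run for a given symptom class.
# Symptom names (uploads/search/missing) are a local convention.
triage() {
  case "$1" in
    uploads)
      echo "docker logs immich-server | grep -i 413"
      echo "df -h /opt/immich/uploads && ls -ld /opt/immich/uploads"
      ;;
    search)
      echo "docker logs immich-ml"
      echo "docker stats --no-stream immich-ml"
      ;;
    missing)
      echo "docker inspect immich-server --format '{{ .Mounts }}'"
      echo "docker exec immich-postgres pg_isready"
      ;;
    *)
      echo "unknown symptom: $1" >&2
      return 1
      ;;
  esac
}
# Example: triage uploads
```

Printing the commands rather than executing them keeps the helper safe to run anywhere and doubles as living documentation for the runbook.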
Prerequisites
- Ubuntu 22.04 or 24.04 host (recommended 4 vCPU, 8 GB RAM, SSD storage)
- Domain/subdomain (for example photos.example.com) with A/AAAA records
- Open ports 80/443; SSH hardened with key-based auth and restricted source ranges
- Docker Engine + Compose plugin
- Offsite backup destination and encryption key management process
- Basic operational ownership model (who patches, who restores, who approves changes)
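Before the first deployment step, it is worth codifying these prerequisites as a preflight script. This is a minimal sketch; the script name and the exact command list are assumptions to extend for your environment.

```shell
#!/bin/sh
# preflight.sh -- verify required tooling exists before deploying Immich.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

preflight() {
  missing=0
  for cmd in "$@"; do
    if have_cmd "$cmd"; then
      echo "ok: $cmd"
    else
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
# Example: preflight docker curl openssl
```

A non-zero exit makes the check easy to wire into provisioning automation, which fails fast instead of discovering a missing dependency mid-install.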
Step-by-step deployment
1) Prepare host baseline and directories
Create deterministic paths so operations and backups remain predictable.
sudo mkdir -p /opt/immich/{app,postgres,uploads,backups,scripts}
sudo chown -R $USER:$USER /opt/immich
cd /opt/immich
umask 077
touch .env
chmod 600 .env
2) Install Docker runtime and verify daemon health
Use reproducible installation steps and verify both engine and compose plugin versions.
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
systemctl is-active docker
docker --version
docker compose version
3) Create secure environment configuration
Generate long secrets and keep environment files out of source control and shared chat logs.
DB_PASSWORD="$(openssl rand -base64 36 | tr -d '\n')"
cat > /opt/immich/.env <<EOF
TZ=UTC
IMMICH_VERSION=release
DB_HOST=postgres
DB_PORT=5432
DB_USERNAME=immich
DB_PASSWORD=${DB_PASSWORD}
DB_DATABASE_NAME=immich
UPLOAD_LOCATION=./uploads
EOF
chmod 600 /opt/immich/.env
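Because a malformed or incomplete .env file only surfaces as a confusing container crash later, a quick sanity check is cheap insurance. This is a sketch; the helper names are assumptions, and only presence of non-empty keys is checked, not their values.

```shell
#!/bin/sh
# check-env.sh -- confirm required keys exist and are non-empty in an env file.
env_has_key() {
  # $1 = env file, $2 = key name; succeeds only for KEY=non-empty-value lines.
  grep -Eq "^$2=.+" "$1"
}

check_env() {
  file="$1"; shift
  rc=0
  for key in "$@"; do
    env_has_key "$file" "$key" || { echo "missing or empty: $key" >&2; rc=1; }
  done
  return "$rc"
}
# Example: check_env /opt/immich/.env DB_USERNAME DB_PASSWORD DB_DATABASE_NAME
```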
4) Define compose services for app, ML, and PostgreSQL
Pin image families, include health checks, and persist all stateful directories.
cat > /opt/immich/docker-compose.yml <<'YAML'
services:
  postgres:
    # Immich's smart search requires Postgres with vector extensions;
    # the Immich-maintained image ships them preinstalled.
    image: ghcr.io/immich-app/postgres:16-vectorchord0.3.0-pgvectors0.2.0
    container_name: immich-postgres
    restart: unless-stopped
    env_file: .env
    environment:
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: ${DB_DATABASE_NAME}
    volumes:
      - ./postgres:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USERNAME} -d ${DB_DATABASE_NAME}"]
      interval: 10s
      timeout: 5s
      retries: 8

  redis:
    # Immich uses Redis for its job queues; the stack will not start without it.
    # The service name "redis" matches Immich's default Redis hostname.
    image: redis:7
    container_name: immich-redis
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 8

  immich-server:
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION}
    container_name: immich-server
    restart: unless-stopped
    env_file: .env
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    ports:
      # Bind to loopback only; Caddy is the sole public entry point.
      - "127.0.0.1:2283:2283"
    volumes:
      - ./uploads:/usr/src/app/upload

  immich-machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION}
    container_name: immich-ml
    restart: unless-stopped
    env_file: .env
    volumes:
      # The ML service needs a persistent model cache, not the media library.
      - ./model-cache:/cache
YAML
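Before launching, render the manifest with `docker compose config` so variable substitution and YAML errors surface immediately. The grep-based helper below is only a cheap structural pre-check for CI environments without Docker; its two-space indentation assumption is ours, and `docker compose config -q` remains the authoritative validator.

```shell
#!/bin/sh
# check-compose.sh -- cheap structural check that expected services are defined.
compose_defines_service() {
  # $1 = compose file, $2 = service name; assumes services indented two spaces.
  grep -Eq "^[[:space:]]{2}$2:" "$1"
}

check_compose() {
  file="$1"; shift
  rc=0
  for svc in "$@"; do
    compose_defines_service "$file" "$svc" \
      || { echo "service not found: $svc" >&2; rc=1; }
  done
  return "$rc"
}
# On the host itself, prefer:
#   docker compose -f /opt/immich/docker-compose.yml config -q
```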
5) Configure Caddy for HTTPS and secure headers
Caddy automates certificates and keeps edge configuration concise and auditable.
sudo apt-get update
sudo apt-get install -y debian-keyring debian-archive-keyring apt-transport-https
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt-get update && sudo apt-get install -y caddy
sudo tee /etc/caddy/Caddyfile >/dev/null <<'EOF'
photos.example.com {
    encode zstd gzip
    reverse_proxy 127.0.0.1:2283
    header {
        Strict-Transport-Security "max-age=31536000; includeSubDomains; preload"
        X-Content-Type-Options "nosniff"
        X-Frame-Options "SAMEORIGIN"
        Referrer-Policy "strict-origin-when-cross-origin"
        Permissions-Policy "geolocation=(), microphone=(), camera=()"
    }
}
EOF
sudo caddy validate --config /etc/caddy/Caddyfile
sudo systemctl enable --now caddy
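Header misconfigurations tend to regress silently after proxy changes, so verify them explicitly. The sketch below checks a saved `curl -sI` capture so it can also run offline against recorded responses; the script name and header list mirror the Caddyfile above and should track any headers you add later.

```shell
#!/bin/sh
# check-headers.sh -- verify security headers in saved `curl -sI` output.
check_headers() {
  # $1 = file containing raw response headers (case-insensitive match)
  rc=0
  for h in Strict-Transport-Security X-Content-Type-Options X-Frame-Options; do
    grep -iq "^$h:" "$1" || { echo "missing header: $h" >&2; rc=1; }
  done
  return "$rc"
}
# Example:
#   curl -sI https://photos.example.com > /tmp/headers
#   check_headers /tmp/headers
```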
6) Launch services and complete initial setup
Pull images, start containers, and verify first response path before user onboarding.
cd /opt/immich
docker compose pull
docker compose up -d
docker compose ps
curl -I https://photos.example.com
Configuration/secrets handling
Keep your operational model explicit. Store runtime secrets in a dedicated secret vault and synchronize only the values required by deployment automation. Never paste long-lived credentials in ticket comments or chat threads. Limit who can edit compose manifests, and require peer review for changes touching proxy config, auth settings, and backup destinations.
For regulated environments, map each secret to an owner, rotation interval, and verification method. For example, database passwords rotate quarterly with coordinated maintenance and smoke tests; offsite backup keys rotate after personnel changes or security events. Document this in a lightweight runbook so incident response does not rely on memory.
Finally, protect your media path. Use host filesystem permissions, avoid world-readable mounts, and keep backups encrypted in transit and at rest. For any remote sync process, use least-privileged credentials and immutable retention where possible.
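To make "encrypted in transit and at rest" concrete, here is a minimal nightly metadata-backup sketch: dump, compress, and GPG-encrypt before anything leaves the host. The GPG_RECIPIENT variable and output paths are assumptions to adapt to your key-management process, and the `docker exec` step only succeeds on the running stack.

```shell
#!/bin/sh
# backup.sh -- dump Immich metadata, compress, and encrypt for offsite sync.
set -eu

BACKUP_DIR="${BACKUP_DIR:-/opt/immich/backups}"
STAMP="$(date -u +%Y%m%dT%H%M%SZ)"

backup_name() {
  # Deterministic, sortable archive name for a given UTC timestamp.
  echo "immich-db-$1.sql.gz.gpg"
}

run_backup() {
  # GPG_RECIPIENT must name a key in your managed keyring (placeholder).
  docker exec immich-postgres pg_dumpall -U immich \
    | gzip \
    | gpg --encrypt --recipient "$GPG_RECIPIENT" \
    > "$BACKUP_DIR/$(backup_name "$STAMP")"
}
# Media under /opt/immich/uploads should be replicated separately (e.g. with
# restic or rclone) so metadata and media restore points stay paired.
```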
Verification
Verification should include service health, TLS trust, storage write tests, search/index readiness, and restart resilience.
# Runtime health
cd /opt/immich
docker compose ps
docker compose logs --tail=120 immich-server
# TLS + response headers
curl -I https://photos.example.com
# Storage pressure
df -h /opt/immich/uploads
df -i /opt/immich/uploads
# Basic DB visibility
docker exec -it immich-postgres psql -U immich -d immich -c "select now();"
- Upload 20 mixed files (JPEG/HEIC/PNG/MP4), then confirm thumbnails and timeline continuity.
- Search for known tags, locations, and recognized faces to confirm ML pipeline processing.
- Restart all containers and verify no metadata corruption or stuck background jobs.
- Record baseline metrics: p95 API latency, CPU utilization, and upload success rate.
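The restart-resilience check above is easier to automate with a small polling helper that waits for the service to come back within a deadline. This is a sketch; probing the base URL is an assumption, and you may prefer a dedicated health endpoint if your Immich version exposes one.

```shell
#!/bin/sh
# wait-healthy.sh -- poll a probe command until it succeeds, with a deadline.
wait_healthy() {
  # $1 = probe command, $2 = max attempts, $3 = seconds between attempts
  attempts=0
  until sh -c "$1" >/dev/null 2>&1; do
    attempts=$((attempts + 1))
    [ "$attempts" -ge "$2" ] && return 1
    sleep "$3"
  done
  return 0
}
# Example restart drill:
#   docker compose restart
#   wait_healthy "curl -fsS https://photos.example.com" 30 5 \
#     && echo "recovered" || echo "still down after 150s"
```

Recording the elapsed recovery time from each drill gives you the baseline against which real incidents are judged.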
Operational hardening and day-2 runbook
Production success depends on day-2 discipline. Define maintenance windows, monitor runtime saturation, and track capacity trends before users feel pain. Keep a written monthly checklist: patch host packages, update Caddy and container images in staging first, run vulnerability scans, and test partial restore from backups. If you can restore only metadata but not media (or vice versa), your backup policy is incomplete.
Create alert thresholds that match real risk: repeated container restarts, sudden drop in successful uploads, storage utilization above 80%, failed certificate renewal attempts, and background queue growth that does not normalize. Tie each alert to a named runbook step and owner. Operational maturity is less about fancy tooling and more about predictable response paths.
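The 80% storage threshold translates directly into a cron-friendly check. This is a minimal sketch: the script name, the THRESHOLD variable, and the hard-coded uploads path are assumptions, and routing the non-zero exit into your alerting channel is left to your tooling.

```shell
#!/bin/sh
# disk-alert.sh -- flag storage utilization above a threshold (default 80%).
usage_pct() {
  # Percent used for the filesystem backing $1, digits only.
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

over_threshold() {
  # $1 = current percent, $2 = threshold percent
  [ "$1" -ge "$2" ]
}

main() {
  pct="$(usage_pct /opt/immich/uploads)"
  if over_threshold "$pct" "${THRESHOLD:-80}"; then
    echo "ALERT: uploads volume at ${pct}%" >&2
    return 1
  fi
  echo "ok: uploads volume at ${pct}%"
}
# Run main from a cron job or systemd timer; alert on non-zero exit.
```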
For growth, plan migration milestones early. As user count increases, separate PostgreSQL to dedicated storage, introduce object storage tiers for cold media, and add read-heavy caching where it improves experience. Each migration should include rollback checkpoints and clear success criteria. Document all changes in change-control notes to preserve auditability.
Common issues/fixes
Uploads fail with 413 Request Entity Too Large
Caddy does not cap request bodies by default, so check the Caddyfile for an explicit request_body max_size directive, raise any application-side limit, and verify no CDN/WAF upload cap is still active.
Intermittent 502 from reverse proxy
Inspect container restart loops, check port bindings, and ensure Caddy upstream points to the correct local endpoint.
Face recognition queue stalls
Allocate more CPU/RAM to ML service and verify model cache path is writable and persistent.
PostgreSQL corruption concerns after abrupt reboot
Run file-system checks, confirm clean shutdown policy, and restore from last verified backup if integrity checks fail.
Disk growth exceeds forecast
Enable retention policies for duplicates, monitor video-heavy uploads, and tier old data to lower-cost storage.
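When disk growth outpaces the forecast, the first question is where the bytes went. A short report of the largest first-level directories under the uploads path answers it quickly; the helper below is a sketch, and the path and entry count in the example are assumptions.

```shell
#!/bin/sh
# top-dirs.sh -- report the largest first-level directories under a path.
top_dirs() {
  # $1 = root path, $2 = number of entries to show; sizes in kilobytes.
  du -sk "$1"/* 2>/dev/null | sort -rn | head -n "$2"
}
# Example: top_dirs /opt/immich/uploads 10
```

Running this weekly and diffing the output reveals video-heavy users or runaway thumbnail caches before the 80% alert fires.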
FAQ
1) Can Immich replace Google Photos for a team?
For many teams, yes. It offers strong core functionality and data ownership. Validate your exact collaboration and compliance requirements before migration.
2) Is Docker Compose acceptable in production?
Yes for small-to-medium deployments when paired with monitoring, backups, and disciplined change control. It is simple and maintainable for lean teams.
3) When should we move PostgreSQL off-box?
Move when database load competes with media processing, when uptime objectives tighten, or when compliance requires stronger fault isolation.
4) How often should backups run?
Run metadata backups at least daily, media snapshots hourly or daily based on upload volume, and offsite replication with immutable retention.
5) What is the best way to test restores?
Create a scheduled restore drill in a staging environment. Validate random album integrity and search availability, then measure elapsed recovery time.
6) How do we roll out upgrades safely?
Use staging, pin versions, snapshot data before upgrade, run smoke tests, and keep a documented rollback command sequence.
7) Do we need external monitoring if Immich has internal health indicators?
Yes. External uptime, certificate checks, host metrics, and alert routing provide independent visibility that catches failures internal checks can miss.
8) How should we control admin access?
Keep admin accounts minimal, enforce MFA where possible, and track all permission changes through ticketed approvals and periodic audits.
Internal links
- Deploy MinIO on Kubernetes with Helm
- Deploy Uptime Kuma with Docker Compose and Caddy
- Deploy Gitea with Docker Compose and Caddy
Talk to us
If you want support implementing Immich in production, we can help with architecture reviews, security hardening, rollout planning, migration strategy, and operations runbook design.