Apache Airflow is a strong fit when a team has outgrown ad hoc cron jobs but is not ready to hand orchestration to a black-box SaaS platform. A data team may need nightly warehouse loads, application database exports, model refresh jobs, partner file drops, and alerting that tells operators exactly which dependency failed. This guide shows a practical single-server production pattern for Airflow on Ubuntu using Docker Compose, Caddy for HTTPS, PostgreSQL for metadata, Redis for the Celery broker, and dedicated worker containers for tasks.
The deployment is intentionally conservative: every secret lives in an environment file with restricted permissions, the webserver is only exposed through Caddy, persistent state is mounted on the host, and the scheduler, triggerer, webserver, and worker roles are split so each component can be restarted or scaled independently. It is not a multi-region Airflow platform, but it is a reliable baseline for internal automation, analytics pipelines, and operational workflows where ownership, backup, and predictable upgrades matter.
Architecture and flow overview
Requests enter through Caddy on ports 80 and 443. Caddy terminates TLS and proxies the Airflow UI to the internal webserver container on port 8080. Airflow stores DAG state, task metadata, user accounts, and run history in PostgreSQL. Redis acts as the Celery broker so the scheduler can queue work for one or more workers. The scheduler parses DAG files from a shared dags directory, creates task instances, and sends executable work to workers. The triggerer handles deferrable operators without tying up worker slots.
For production use, keep DAG code in Git and deploy it into /opt/airflow/dags with a controlled process. Avoid editing DAGs directly in the container. Logs are written to a persistent host volume so task output survives container restarts. If your jobs need access to private APIs, cloud credentials, or databases, mount only the minimum configuration required and prefer Airflow Connections or environment-backed secrets rather than hard-coded values in Python DAG files.
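One hedged sketch of such a process, assuming the DAG code lives in a Git repository (the URL below is a placeholder) and rsync is installed on the host: check out the pinned branch into a temporary directory, then sync it into the live dags folder so removed files are deleted as well.
git clone --depth 1 --branch main https://git.example.com/data/airflow-dags.git /tmp/airflow-dags
rsync -a --delete --exclude ".git" /tmp/airflow-dags/ /opt/airflow/dags/
rm -rf /tmp/airflow-dags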
Prerequisites
- An Ubuntu 22.04 or 24.04 server with at least 4 vCPU, 8 GB RAM, and 40 GB free disk for a small team.
- A DNS record such as airflow.example.com pointing to the server.
- Docker Engine and the Docker Compose plugin installed.
- Caddy installed on the host, or permission to install it.
- Firewall access for ports 80 and 443 only; do not expose Airflow or PostgreSQL directly.
- A password manager for generated credentials and recovery keys.
Step-by-step deployment
Start by creating a dedicated directory layout. Keeping the deployment under /opt/airflow makes backup rules and operator handoff simple.
sudo mkdir -p /opt/airflow/{dags,logs,plugins,config,postgres}
sudo chown -R $USER:$USER /opt/airflow
cd /opt/airflow
umask 077
openssl rand -hex 32 > config/fernet_key.txt
openssl rand -hex 32 > config/webserver_secret_key.txt
Create the environment file. Replace domains, passwords, and email settings before starting the stack. The Fernet key must never change after connections have been encrypted unless you intentionally rotate it. The sed commands that follow insert the generated keys and set AIRFLOW_UID to your host user so the mounted dags, logs, and plugins directories stay writable from inside the containers.
cat > .env <<'EOF'
AIRFLOW_IMAGE=apache/airflow:2.10.4
AIRFLOW_UID=50000
AIRFLOW_DOMAIN=airflow.example.com
POSTGRES_USER=airflow
POSTGRES_PASSWORD=replace-with-long-random-postgres-password
POSTGRES_DB=airflow
AIRFLOW_ADMIN_USER=admin
AIRFLOW_ADMIN_PASSWORD=replace-with-long-random-admin-password
[email protected]
AIRFLOW_FERNET_KEY=replace-with-output-of-config-fernet-key
AIRFLOW_WEBSERVER_SECRET_KEY=replace-with-output-of-config-webserver-secret-key
AIRFLOW__SMTP__SMTP_HOST=smtp.example.com
[email protected]
AIRFLOW__SMTP__SMTP_PASSWORD=replace-with-smtp-password
[email protected]
EOF
sed -i "s|replace-with-output-of-config-fernet-key|$(cat config/fernet_key.txt)|" .env
sed -i "s|replace-with-output-of-config-webserver-secret-key|$(cat config/webserver_secret_key.txt)|" .env
chmod 600 .env config/*.txt
Now create the Compose file. This stack uses the CeleryExecutor because it behaves well as DAG volume and worker count grow. You can add more workers later without changing the public endpoint. Each Airflow role runs as the AIRFLOW_UID user from .env so every container reads and writes the shared dags, logs, and plugins mounts with consistent ownership.
cat > docker-compose.yml <<'EOF'
services:
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    volumes:
      - ./postgres:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    restart: unless-stopped

  airflow-init:
    image: ${AIRFLOW_IMAGE}
    user: "${AIRFLOW_UID}:0"
    env_file: .env
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    command: bash -c "airflow db migrate && (airflow users create --role Admin --username ${AIRFLOW_ADMIN_USER} --password ${AIRFLOW_ADMIN_PASSWORD} --firstname Ops --lastname Admin --email ${AIRFLOW_ADMIN_EMAIL} || true)"

  webserver:
    image: ${AIRFLOW_IMAGE}
    user: "${AIRFLOW_UID}:0"
    env_file: .env
    depends_on:
      airflow-init:
        condition: service_completed_successfully
    ports:
      - "127.0.0.1:8080:8080"
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    command: webserver
    restart: unless-stopped

  scheduler:
    image: ${AIRFLOW_IMAGE}
    user: "${AIRFLOW_UID}:0"
    env_file: .env
    depends_on:
      airflow-init:
        condition: service_completed_successfully
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    command: scheduler
    restart: unless-stopped

  worker:
    image: ${AIRFLOW_IMAGE}
    user: "${AIRFLOW_UID}:0"
    env_file: .env
    depends_on:
      airflow-init:
        condition: service_completed_successfully
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    command: celery worker
    restart: unless-stopped

  triggerer:
    image: ${AIRFLOW_IMAGE}
    user: "${AIRFLOW_UID}:0"
    env_file: .env
    depends_on:
      airflow-init:
        condition: service_completed_successfully
    volumes:
      - ./dags:/opt/airflow/dags
      - ./logs:/opt/airflow/logs
      - ./plugins:/opt/airflow/plugins
    command: triggerer
    restart: unless-stopped
EOF
Add the Airflow configuration values that are easier to audit as environment settings. They keep the executor, database connection, broker, and public base URL consistent across all containers.
cat >> .env <<'EOF'
AIRFLOW__CORE__EXECUTOR=CeleryExecutor
AIRFLOW__CORE__LOAD_EXAMPLES=False
AIRFLOW__CORE__FERNET_KEY=${AIRFLOW_FERNET_KEY}
AIRFLOW__WEBSERVER__SECRET_KEY=${AIRFLOW_WEBSERVER_SECRET_KEY}
AIRFLOW__WEBSERVER__BASE_URL=https://${AIRFLOW_DOMAIN}
AIRFLOW__DATABASE__SQL_ALCHEMY_CONN=postgresql+psycopg2://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres/${POSTGRES_DB}
AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres/${POSTGRES_DB}
AIRFLOW__CELERY__BROKER_URL=redis://redis:6379/0
AIRFLOW__LOGGING__REMOTE_LOGGING=False
AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL=60
AIRFLOW__SCHEDULER__CATCHUP_BY_DEFAULT=False
EOF
Configure Caddy on the host. Because Compose publishes Airflow only on 127.0.0.1:8080, the web UI is reachable from the internet only through Caddy and its TLS policy.
sudo tee /etc/caddy/Caddyfile >/dev/null <<'EOF'
airflow.example.com {
    encode zstd gzip
    reverse_proxy 127.0.0.1:8080
    header {
        X-Content-Type-Options nosniff
        Referrer-Policy strict-origin-when-cross-origin
        X-Frame-Options SAMEORIGIN
    }
}
EOF
sudo caddy validate --config /etc/caddy/Caddyfile
sudo systemctl reload caddy
Start the stack and watch initialization. The first run migrates the database and creates the admin account. If it fails, fix the environment file and rerun; do not delete the Fernet key or the database volume casually.
docker compose pull
docker compose up -d postgres redis
docker compose run --rm airflow-init
docker compose up -d webserver scheduler worker triggerer
docker compose ps
docker compose logs --tail=100 scheduler
Configuration and secrets handling best practices
Protect .env, the Fernet key, and database backups as production secrets. Airflow Connections can store encrypted passwords only when the Fernet key is stable and private. Use named service accounts for external systems, scope them to the smallest required permissions, and document which DAGs depend on each account. For cloud credentials, mount a read-only file into workers and reference it through an Airflow Connection extra field or an environment variable.
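One environment-backed option Airflow supports is defining a Connection through a variable named AIRFLOW_CONN_<CONN_ID> that holds a connection URI; the warehouse host, user, and conn id below are placeholders for illustration rather than values this guide requires.
cat >> .env <<'EOF'
AIRFLOW_CONN_WAREHOUSE_DB=postgresql://etl_user:replace-with-warehouse-password@warehouse.internal:5432/analytics
EOF
docker compose up -d webserver scheduler worker triggerer
Connections defined this way do not show up in the UI connection list, but tasks can reference them by conn_id, here warehouse_db.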
Keep custom Python dependencies out of random shell sessions. Build a small derived Airflow image when DAGs need provider packages or internal libraries, then pin versions in a repository. That makes worker restarts reproducible and prevents a scheduler from parsing DAGs with a different dependency set than the worker that executes them.
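A minimal sketch of such a derived image, assuming the DAGs need the Amazon provider and a pinned requests version; substitute your own packages and versions and keep the Dockerfile in the same repository as the DAGs.
cat > Dockerfile <<'EOF'
FROM apache/airflow:2.10.4
RUN pip install --no-cache-dir "apache-airflow-providers-amazon==9.1.0" "requests==2.32.3"
EOF
docker build -t airflow-custom:2.10.4-1 .
Point AIRFLOW_IMAGE in .env at the new tag and run docker compose up -d so the scheduler, workers, and webserver all parse DAGs against the same dependency set.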
Verification checklist
- Open https://airflow.example.com and confirm the login page uses a valid certificate.
- Run docker compose ps and verify webserver, scheduler, worker, triggerer, PostgreSQL, and Redis are healthy or running.
- Create a small test DAG, unpause it, and confirm the task lands on the Celery worker.
- Check /opt/airflow/logs after the task finishes and confirm logs are persistent.
- Trigger a failed task intentionally and confirm email or external alerting reaches the operations channel.
The commands below create a minimal smoke-test DAG and confirm the scheduler picks it up.
cat > dags/sysbrix_smoke_test.py <<'EOF'
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False, tags=["smoke"])
def sysbrix_smoke_test():
    @task
    def hello():
        print("Airflow worker executed the smoke test successfully")

    hello()

sysbrix_smoke_test()
EOF
docker compose logs --tail=50 scheduler
Backups and recovery routine
Back up PostgreSQL, DAGs, plugins, logs required for audit, and the secret files. The database contains operational history and encrypted connection metadata; the DAG directory contains the source of future runs; the Fernet key is required to decrypt existing connection passwords. A restore test should prove that a new server can start Airflow, read the restored metadata database, and decrypt Connections.
mkdir -p /opt/airflow/backups
cd /opt/airflow
BACKUP=/opt/airflow/backups/airflow-$(date +%F-%H%M).sql.gz
docker compose exec -T postgres sh -c 'pg_dump -U "$POSTGRES_USER" "$POSTGRES_DB"' | gzip > "$BACKUP"
tar -czf /opt/airflow/backups/airflow-files-$(date +%F-%H%M).tar.gz dags plugins config docker-compose.yml .env
ls -lh /opt/airflow/backups
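A restore rehearsal on a scratch server can follow the same layout, assuming the two archives above have been copied into /opt/airflow/backups; the file names below are examples and must match your actual backups.
cd /opt/airflow
tar -xzf backups/airflow-files-2025-01-15-0230.tar.gz
docker compose up -d postgres
gunzip -c backups/airflow-2025-01-15-0230.sql.gz | docker compose exec -T postgres sh -c 'psql -U "$POSTGRES_USER" -d "$POSTGRES_DB"'
docker compose up -d webserver scheduler worker triggerer
docker compose exec webserver airflow connections list
If the last command lists connections without decryption errors, the restored database and the Fernet key are consistent.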
Common issues and fixes
The web UI loads but DAGs never run. Check scheduler logs first. Syntax errors, missing provider packages, or a misconfigured AIRFLOW__CORE__EXECUTOR value can keep task instances from being created or dispatched to the Celery workers. Confirm the worker is also running and connected to Redis.
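The checks below usually narrow it down quickly: scan the scheduler output for tracebacks, then ask Airflow itself which DAG files failed to import.
docker compose logs --tail=200 scheduler | grep -iE "error|traceback"
docker compose exec scheduler airflow dags list-import-errors
docker compose exec scheduler airflow dags list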
Tasks stay queued. The worker may not be registered, Redis may be unavailable, or worker concurrency may be exhausted. Run docker compose logs worker redis and scale workers with docker compose up -d --scale worker=2 if the host has capacity.
Connections cannot be decrypted after restore. The Fernet key changed. Restore the original config/fernet_key.txt and environment value before starting Airflow against the restored database.
Caddy returns 502. Confirm Compose publishes 127.0.0.1:8080:8080 and that the webserver container is healthy. Caddy cannot proxy to a container-only expose port unless Caddy is inside the same Docker network.
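To see which hop fails, probe the published port directly on the host before changing the Caddy configuration; the webserver serves an unauthenticated /health endpoint.
curl -s http://127.0.0.1:8080/health
docker compose ps webserver
sudo journalctl -u caddy --since "10 minutes ago" --no-pager | tail -n 20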
FAQ
Can this run production workloads for a small company?
Yes, if the workload fits a single server and you have backups, monitoring, and a tested restore path. For heavy parallelism or strict high availability, move workers and the metadata database to managed or clustered infrastructure.
Should DAGs be edited through the Airflow UI?
No. Treat DAGs as code. Store them in Git, review changes, and deploy them into the dags directory with a repeatable process so rollback is possible.
How do I add Python packages for DAGs?
Build a derived image from the pinned Airflow image and install packages there. Avoid installing packages interactively in running containers because the change disappears on rebuild and may not be present on every worker.
Can I use NGINX or Traefik instead of Caddy?
Yes. The important pattern is the same: expose Airflow only on localhost or an internal network, terminate TLS at the proxy, and keep PostgreSQL and Redis private.
How often should backups run?
Run database backups at least daily for low-change environments and more often if Airflow stores business-critical run history or Connections. Always test restoration after changing backup tooling.
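A scheduling sketch, assuming the backup commands from the routine above are wrapped in a script saved as /opt/airflow/backups/run-backup.sh, a name chosen here for illustration:
sudo tee /etc/cron.d/airflow-backup >/dev/null <<'EOF'
30 2 * * * root /opt/airflow/backups/run-backup.sh >> /var/log/airflow-backup.log 2>&1
EOF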
What should I monitor first?
Monitor scheduler heartbeat, failed task count, queued task age, worker container restarts, PostgreSQL disk usage, Redis availability, and certificate renewal. These signals catch most operational incidents early.
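Scheduler heartbeat is the easiest of these to probe first because the webserver already exposes it; a minimal check against the domain used in this guide:
curl -s https://airflow.example.com/health | python3 -m json.tool
A healthy instance should report the metadatabase and scheduler components as healthy, with a recent latest_scheduler_heartbeat timestamp.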
Internal links
- Deploy Healthchecks with Docker Compose + Caddy + PostgreSQL for external job monitoring.
- Deploy Grafana + Prometheus for observability patterns.
- Deploy Vikunja with Docker Compose + Caddy + PostgreSQL for another PostgreSQL-backed internal tool.
Talk to us
If you want this implemented with hardened defaults, observability, and tested recovery playbooks, our team can help.