LiteLLM Proxy Setup: One Gateway to Rule Every LLM in Your Stack
The moment you start using more than one LLM provider, things get messy fast. Different SDKs, different auth patterns, different rate limits, no unified view of what anything costs. LiteLLM fixes this by sitting between your apps and every LLM provider in existence — OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Azure, Bedrock, and local models via Ollama — behind a single OpenAI-compatible API. One endpoint, one API key format, full cost tracking, model fallbacks, and per-team budgets. This guide walks you through a complete LiteLLM proxy setup from scratch.
Prerequisites
- A Linux server or local machine (Ubuntu 20.04+ recommended)
- Docker Engine and Docker Compose v2 installed
- API keys for at least one LLM provider (OpenAI, Anthropic, etc.) or a local Ollama instance
- At least 512MB RAM free — LiteLLM is lightweight
- Port 4000 available (or any custom port you prefer)
- Basic familiarity with YAML config files
Confirm Docker is ready:
docker --version
docker compose version
# Verify your OpenAI key works before proxying it
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY" | jq '.data[0].id'
What Is LiteLLM and Why Run It as a Proxy?
LiteLLM has two modes: a Python library you import directly, and a proxy server you deploy and point your apps at. This guide focuses on the proxy — which is almost always the right choice for teams.
What the Proxy Gives You
- Unified OpenAI-compatible API — any app or library using the OpenAI SDK can point at LiteLLM with zero code changes. Just change the base URL.
- Provider abstraction — swap GPT-4o for Claude 3.5 Sonnet or Gemini Pro with a one-line config change. Your app code never changes.
- Virtual API keys — issue keys to teams or apps. Revoke them without rotating provider credentials. Track spend per key.
- Budget limits — set hard spend caps per key, per team, or globally. LiteLLM blocks requests when budgets are hit.
- Model fallbacks — if GPT-4 rate-limits, automatically retry with Claude or a local model. Zero downtime from provider outages.
- Load balancing — spread requests across multiple deployments of the same model (useful with Azure OpenAI regional endpoints).
- Logging and observability — built-in cost tracking per model, with integrations for Langfuse, Helicone, and more.
The bottom line: one LiteLLM proxy instance lets your entire organization talk to every LLM through a single, controlled, observable gateway.
Quick Start: Running LiteLLM Locally
Install and Run with pip (Fastest)
If you want to test LiteLLM before committing to a Docker deployment, run it directly with pip:
# Install LiteLLM with proxy extras
pip install 'litellm[proxy]'
# Run with an inline model config (assumes OPENAI_API_KEY is exported)
litellm --model gpt-4o --port 4000
# Or with multiple models via config file
litellm --config config.yml --port 4000
The proxy is now running at http://localhost:4000. Test it immediately:
curl http://localhost:4000/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-anything' \
-d '{
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Say hello in one sentence"}]
}' | jq .choices[0].message.content
That sk-anything key works because no master key is configured yet, so auth is disabled. You'll lock that down shortly.
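The same request can be made from Python with nothing but the standard library. This is a sketch that assumes the quick-start proxy is still listening on localhost:4000; the `build_chat_request` helper is ours, not part of LiteLLM:

```python
import json
import urllib.request

def build_chat_request(model, prompt, base_url="http://localhost:4000",
                       api_key="sk-anything"):
    """Build an OpenAI-compatible chat completion request for the proxy."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json",
                 # Any key works until a master key is configured
                 "Authorization": f"Bearer {api_key}"},
    )

req = build_chat_request("gpt-4o", "Say hello in one sentence")
print(req.get_full_url())  # http://localhost:4000/v1/chat/completions

# With the proxy running, send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the proxy speaks the OpenAI wire format, this exact request shape works no matter which provider ends up serving the model.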
Production Deployment with Docker Compose
The LiteLLM Config File
The config file is the heart of LiteLLM. It defines which models are available, how they map to providers, fallback chains, and global settings. Create config.yml:
# config.yml
model_list:
  # OpenAI models
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
  # Anthropic
  - model_name: claude-3-5-sonnet
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  # Local model via Ollama
  - model_name: llama3.2
    litellm_params:
      model: ollama/llama3.2
      api_base: http://host.docker.internal:11434
  # Fallback group — try gpt-4o first, fall back to claude
  - model_name: best-available
    litellm_params:
      model: gpt-4o
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      id: best-available-primary

router_settings:
  fallbacks:
    - {"best-available": ["claude-3-5-sonnet", "llama3.2"]}
  retry_after: 5
  num_retries: 3

litellm_settings:
  success_callback: []
  failure_callback: []
  request_timeout: 600
  set_verbose: false

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL
  store_model_in_db: true
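The core idea of model_list is a mapping: clients send a public model_name, and the proxy forwards the request using the matching litellm_params. This toy sketch illustrates that resolution — it is not LiteLLM's actual router code, just a way to picture what the config defines:

```python
# Toy illustration of model_list resolution (NOT LiteLLM's implementation).
# Clients only ever see model_name; litellm_params stays server-side.
MODEL_LIST = [
    {"model_name": "gpt-4o",
     "litellm_params": {"model": "openai/gpt-4o"}},
    {"model_name": "claude-3-5-sonnet",
     "litellm_params": {"model": "anthropic/claude-3-5-sonnet-20241022"}},
    {"model_name": "llama3.2",
     "litellm_params": {"model": "ollama/llama3.2",
                        "api_base": "http://host.docker.internal:11434"}},
]

def resolve(model_name):
    """Return the provider-side params for a public model name."""
    for entry in MODEL_LIST:
        if entry["model_name"] == model_name:
            return entry["litellm_params"]
    # The real proxy rejects unknown names rather than passing them through
    raise ValueError(f"Model '{model_name}' not in model_list")

print(resolve("llama3.2")["model"])  # ollama/llama3.2
```

This is also why renaming a model in config is safe: the public name is the contract, and everything under litellm_params can change freely.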
Docker Compose with PostgreSQL
LiteLLM uses a database to persist virtual keys, usage data, and budgets. PostgreSQL is the right choice for production:
# docker-compose.yml
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm
    restart: unless-stopped
    ports:
      - "4000:4000"
    volumes:
      - ./config.yml:/app/config.yaml:ro
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - LITELLM_MASTER_KEY=${LITELLM_MASTER_KEY}
      - DATABASE_URL=postgresql://litellm:${POSTGRES_PASSWORD}@postgres:5432/litellm
      - STORE_MODEL_IN_DB=true
    command: ["--config", "/app/config.yaml", "--port", "4000", "--num_workers", "4"]
    depends_on:
      postgres:
        condition: service_healthy
    networks:
      - litellm_net

  postgres:
    image: postgres:15-alpine
    container_name: litellm_db
    restart: unless-stopped
    environment:
      - POSTGRES_DB=litellm
      - POSTGRES_USER=litellm
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U litellm"]
      interval: 5s
      timeout: 5s
      retries: 5
    networks:
      - litellm_net

volumes:
  postgres_data:

networks:
  litellm_net:
Create your .env file with real values — never commit this:
# .env
OPENAI_API_KEY=sk-proj-your-openai-key
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
POSTGRES_PASSWORD=a-strong-db-password
LITELLM_MASTER_KEY=sk-litellm-your-master-key-here
# Generate a strong master key:
# openssl rand -hex 20 | sed 's/^/sk-litellm-/'
Start the stack:
docker compose up -d
docker compose logs -f litellm
Wait for the startup line announcing the proxy is listening on port 4000. Then verify the health endpoint:
curl http://localhost:4000/health/liveliness
# Returns: "I'm alive!"
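If you script your deployments, a small readiness poll beats guessing when the proxy is up. A minimal stdlib sketch, assuming the liveliness endpoint above; the `wait_for_proxy` helper is ours, not part of LiteLLM:

```python
import time
import urllib.request

def wait_for_proxy(url="http://localhost:4000/health/liveliness",
                   timeout=60.0, interval=2.0):
    """Poll the liveliness endpoint until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # not up yet (connection refused / DNS / timeout), keep polling
        time.sleep(interval)
    return False

# With the stack up:
# if wait_for_proxy():
#     print("proxy is ready")
```

Drop this at the top of a deploy script, or adapt the same loop into a shell `until curl` one-liner if you prefer.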
Managing Virtual Keys, Budgets, and Teams
Creating Virtual API Keys
The master key lets you create virtual keys for teams and applications. These virtual keys are what you hand out — your real provider API keys never leave the proxy:
# Create a key for a specific team with a monthly budget
curl -X POST http://localhost:4000/key/generate \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-litellm-your-master-key-here' \
-d '{
"key_alias": "team-backend",
"models": ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"],
"max_budget": 50.00,
"budget_duration": "1mo",
"metadata": {"team": "backend", "project": "search-api"}
}' | jq .key
The response is a virtual key like sk-litellm-xyz123.... That's what the backend team puts in their app. If they exceed $50 in a month, requests are automatically blocked until the budget resets. Your OpenAI bill doesn't accumulate unchecked.
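The enforcement model is simple to reason about: spend accumulates against the key, and once it would exceed max_budget, requests are rejected. This toy class mirrors that behavior for intuition only — it is not how LiteLLM implements budgets:

```python
class VirtualKey:
    """Toy model of per-key budget enforcement (illustrative, not LiteLLM internals)."""

    def __init__(self, alias, max_budget):
        self.alias = alias
        self.max_budget = max_budget  # dollars
        self.spend = 0.0

    def charge(self, cost):
        """Record a request's cost; return False if the budget blocks it."""
        if self.spend + cost > self.max_budget:
            return False  # blocked until the budget_duration resets
        self.spend += cost
        return True

key = VirtualKey("team-backend", max_budget=50.00)
assert key.charge(49.50)        # fine: $49.50 of $50 used
assert not key.charge(1.00)     # blocked: would exceed the cap
print(f"{key.alias}: ${key.spend:.2f} of ${key.max_budget:.2f}")
```

In the real proxy, spend is computed from per-model token pricing and persisted to PostgreSQL, so the cap survives restarts.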
Restricting Models Per Key
# Create a restricted key — only cheap models, low budget
curl -X POST http://localhost:4000/key/generate \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-litellm-your-master-key-here' \
-d '{
"key_alias": "intern-dev",
"models": ["gpt-4o-mini", "llama3.2"],
"max_budget": 5.00,
"budget_duration": "1mo",
"tpm_limit": 10000,
"rpm_limit": 60
}' | jq .key
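The rpm_limit caps requests per minute per key. A sliding-window counter is one simple way to picture that kind of limit — again a toy sketch for intuition, not LiteLLM's actual rate limiter:

```python
from collections import deque
import time

class RpmLimiter:
    """Toy sliding-window requests-per-minute limiter (illustrative only)."""

    def __init__(self, rpm_limit):
        self.rpm_limit = rpm_limit
        self.timestamps = deque()  # monotonic times of allowed requests

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop requests older than the 60-second window
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.rpm_limit:
            return False
        self.timestamps.append(now)
        return True

limiter = RpmLimiter(rpm_limit=60)
print(all(limiter.allow(now=0.0) for _ in range(60)))  # True: first 60 pass
print(limiter.allow(now=1.0))                          # False: 61st in window
print(limiter.allow(now=61.0))                         # True: window slid past
```

tpm_limit works the same way but counts tokens instead of requests, which is why it is the better lever for expensive models.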
Viewing Usage and Spend
# Get spend data for all keys
curl http://localhost:4000/key/info \
-H 'Authorization: Bearer sk-litellm-your-master-key-here' | jq '.info[] | {alias: .key_alias, spend: .spend, budget: .max_budget}'
# Get model-level spend breakdown
curl http://localhost:4000/spend/logs \
-H 'Authorization: Bearer sk-litellm-your-master-key-here' | jq '.[-5:]'
LiteLLM also ships with a built-in UI at http://localhost:4000/ui. Log in with your master key to get a dashboard showing spend by model, key, and team — useful for monthly cost reviews without writing queries.
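If you would rather script a cost report than click through the UI, aggregating the spend logs is a few lines. Note that the `model` and `spend` field names below are assumptions about the log shape — verify them against a real `/spend/logs` response before relying on this:

```python
from collections import defaultdict

def spend_by_model(logs):
    """Sum spend per model from a list of spend-log entries.

    NOTE: 'model' and 'spend' are assumed field names; check an
    actual /spend/logs response for the exact schema.
    """
    totals = defaultdict(float)
    for entry in logs:
        totals[entry["model"]] += entry["spend"]
    return dict(totals)

# Example entries in the assumed shape:
logs = [
    {"model": "gpt-4o", "spend": 0.12},
    {"model": "gpt-4o-mini", "spend": 0.01},
    {"model": "gpt-4o", "spend": 0.30},
]
print(spend_by_model(logs))
```

Pipe the curl output from the spend endpoint into a script like this and you have a monthly cost review in cron.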
Connecting Apps to LiteLLM
Drop-in OpenAI SDK Replacement
Any app using the OpenAI Python SDK works with LiteLLM immediately — just change the base URL:
from openai import OpenAI
# Before: direct to OpenAI
# client = OpenAI(api_key="sk-proj-...")
# After: through LiteLLM proxy
client = OpenAI(
base_url="http://localhost:4000/v1",
api_key="sk-litellm-team-backend-key"
)
# Everything else stays exactly the same
response = client.chat.completions.create(
model="gpt-4o", # Or "claude-3-5-sonnet", "llama3.2"
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Summarize this in bullet points: ..."}
],
temperature=0.3,
max_tokens=500
)
print(response.choices[0].message.content)
Connecting Dify, n8n, or Open WebUI
Any tool with an OpenAI-compatible endpoint setting works. In Dify: go to Settings → Model Provider → OpenAI-Compatible → Custom and set:
- API Base URL: http://litellm:4000/v1 (or your server IP)
- API Key: your virtual key
- Model Name: gpt-4o, claude-3-5-sonnet, or whatever you configured
From that point, Dify routes every LLM call through LiteLLM — you get full cost tracking and can swap models without touching Dify's config.
Tips, Gotchas, and Troubleshooting
Requests Failing with 401 Unauthorized
Once a master key is set, all requests need a valid key — including calls using the master key itself. Confirm the key is in the Authorization: Bearer header, not a custom header:
# Correct format
curl http://localhost:4000/v1/models \
-H 'Authorization: Bearer sk-litellm-your-master-key-here'
# Check that the key exists in the DB
curl 'http://localhost:4000/key/info?key=sk-litellm-team-key' \
  -H 'Authorization: Bearer sk-litellm-your-master-key-here'
Model Not Found Error
The model name in your API call must exactly match a model_name entry in config.yml. LiteLLM doesn't pass unknown model names through — it rejects them. List available models to confirm:
curl http://localhost:4000/v1/models \
-H 'Authorization: Bearer sk-litellm-your-master-key-here' | jq '[.data[].id]'
Ollama Not Reachable from Container
On Linux, host.docker.internal doesn't resolve by default. Find the Docker bridge IP and use it instead:
# Get Docker bridge IP
ip addr show docker0 | grep 'inet ' | awk '{print $2}' | cut -d/ -f1
# Usually 172.17.0.1
# Use in config.yml:
# api_base: http://172.17.0.1:11434
High Latency on First Request
LiteLLM loads model configs and validates provider connectivity on startup. If a provider is unreachable at startup (e.g., Ollama isn't running), it logs a warning but continues. First requests to a cold model may be slower as the connection is established. Check startup logs:
docker logs litellm 2>&1 | grep -E 'ERROR|WARNING|model'
# Run a health check per model
curl http://localhost:4000/health \
-H 'Authorization: Bearer sk-litellm-your-master-key-here' | jq .
Updating LiteLLM
docker compose pull litellm
docker compose up -d litellm
docker compose logs -f litellm
Your PostgreSQL volume persists — all virtual keys, usage history, and budgets survive the update. LiteLLM runs database migrations automatically on startup.
Pro Tips
- Set fallbacks on every production model — provider outages happen. A fallback chain of [gpt-4o, claude-3-5-sonnet, llama3.2] means your app keeps working even when OpenAI has an incident.
- Use tpm_limit and rpm_limit on keys to enforce rate limits per team before they hit provider limits — it's a better signal than raw 429 errors from upstream.
- Add Langfuse for full observability — set success_callback: ["langfuse"] in litellm_settings and add your Langfuse keys as environment variables. Every LLM call gets logged with input, output, latency, and cost.
- Use model aliases for portability — name your models fast, smart, and local instead of provider-specific names. Swap the underlying model in config without touching any app code.
- Put LiteLLM behind Traefik for HTTPS and domain routing — the same pattern used for any other service in your stack applies here. Treat it as a first-class internal API.
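Here is the alias tip as a config.yml fragment. The underlying model choices are just examples — the point is that apps only ever see fast, smart, and local:

```yaml
# Example aliases — swap the underlying models freely; app code never changes
model_list:
  - model_name: fast
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
  - model_name: smart
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20241022
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: local
    litellm_params:
      model: ollama/llama3.2
      api_base: http://host.docker.internal:11434
```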
Wrapping Up
A complete LiteLLM proxy setup transforms how your team consumes LLMs. Instead of every developer managing their own API keys, every app hardcoding a specific provider, and no one knowing what anything costs — you get one controlled gateway, full cost visibility, automatic fallbacks, and the freedom to swap models without touching application code.
Start with the Docker Compose stack in this guide, wire up your first two providers, and issue virtual keys to your apps. Once the proxy is running and you can see spend data flowing in, add fallback chains and budget limits. The whole setup takes an afternoon and pays off every month when you actually know what your LLM spend looks like — and can do something about it.
Need an Enterprise-Grade LLM Gateway?
If you're rolling out LiteLLM across a larger team — with SSO, audit logging, multi-region failover, or integration into existing infrastructure — the sysbrix team can design and deploy it. We build AI infrastructure that's production-ready, not just proof-of-concept.
Talk to Us →