Why Self-Host Your AI Chat Interface?
ChatGPT is convenient. It's also a black box that logs your conversations, rate-limits your free tier, and charges per token the moment you build anything serious. If you're working with sensitive data, building internal tools, or just want full control over your AI stack, there's a better path.
Open WebUI is the most polished self-hosted ChatGPT alternative available today. It gives you a clean, feature-rich chat interface that connects to local models via Ollama — or to any OpenAI-compatible API. You get conversation history, document uploads with RAG, image generation, web search, multi-user support, and custom tools. All running on hardware you own.
This Open WebUI setup guide walks you through the complete path: prerequisites, Docker deployment, Nginx with HTTPS, model management, and the actual troubleshooting fixes you'll need. Not the happy path. The real path.
Prerequisites
Sort these before you start. Each missing piece causes a different failure mode that wastes time.
Hardware
- CPU-only: Any x86-64 server or VPS with 8 GB+ RAM. Inference is slow — usable for light personal use, not teams.
- GPU (recommended): NVIDIA GPU with 8 GB+ VRAM for comfortable 7B model inference. 16 GB+ VRAM opens up 13B models. Requires CUDA 12.x drivers.
- Disk: 30 GB free minimum. A 7B Q4 model is ~4 GB; a 13B is ~8 GB. Plan accordingly.
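The sizes quoted above follow from simple arithmetic: parameter count times bits per weight, divided by eight. Here is a rough sketch, assuming about 4.5 bits per weight for a Q4-style quantization. That figure is an approximation, not an official Ollama number, and inference needs extra memory for context on top of it.

```shell
# Rough on-disk size of a quantized model, in GB.
# Usage: estimate_model_gb <params_in_billions> <bits_per_weight>
# The 4.5 bits/weight used below is an assumed average for Q4-style
# quantization, not an official figure.
estimate_model_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_model_gb 7 4.5    # prints 3.9
estimate_model_gb 13 4.5   # prints 7.3
```

Treat the result as a floor when sizing disk and VRAM, not an exact download size.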
Software
- Ubuntu 22.04 or 24.04 (this guide uses Ubuntu)
- Docker Engine 24+ and Docker Compose v2
- Ollama (installed below)
- A domain name with an A record pointed at your server — needed for HTTPS
- Ports 3000 (Open WebUI), 11434 (Ollama), 80 and 443 (Nginx) available
Verify Docker
docker --version
docker compose version
# Both should return version strings — Docker 24+, Compose v2+
If Docker isn't installed, follow the official Docker install guide first.
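If you want the "Docker 24+" check to be scriptable rather than eyeballed, a small helper can parse the version string. This is a sketch that assumes the usual "Docker version X.Y.Z, build ..." output format. The function names are made up for illustration.

```shell
# Extract the major version from `docker --version` style output on stdin,
# e.g. "Docker version 24.0.7, build afdd53b" -> 24
docker_major() {
  sed -n 's/^Docker version \([0-9][0-9]*\)\..*/\1/p'
}

# Succeed when the major version read from stdin is at least $1.
docker_at_least() {
  major="$(docker_major)"
  [ -n "$major" ] && [ "$major" -ge "$1" ]
}
```

Typical use: docker --version | docker_at_least 24 && echo "Docker OK". An unparseable version string counts as a failure.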
Step 1 — Install Ollama
Ollama is the local model runtime that Open WebUI talks to. It downloads and serves pre-quantized models and exposes an inference API on port 11434.
Install with the Official Script
curl -fsSL https://ollama.com/install.sh | sh
The script detects your GPU and installs the appropriate CUDA/ROCm runtime. Once complete, Ollama runs as a systemd service.
Verify and Start Ollama
systemctl status ollama
# If not running:
sudo systemctl enable --now ollama
# Confirm the API is live
curl http://localhost:11434/api/version
You should see a JSON response with the version string. If the service fails to start, check journalctl -u ollama -n 50 for errors.
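Right after the service starts, the API may take a moment to come up, so a single curl can fail spuriously. A generic retry helper smooths that over. This is a sketch, and the retry function is not part of Ollama.

```shell
# Run a command until it succeeds, up to <tries> attempts, sleeping <delay>
# seconds between attempts. Returns 0 on the first success, 1 if every
# attempt fails.
# Usage: retry <tries> <delay_seconds> <command> [args...]
retry() {
  tries="$1"; delay="$2"; shift 2
  attempt=1
  while :; do
    "$@" && return 0
    [ "$attempt" -ge "$tries" ] && return 1
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, retry 30 1 curl -fsS http://localhost:11434/api/version waits up to roughly 30 seconds for the API before giving up.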
Pull Your First Models
Pull at least one model before connecting Open WebUI — it avoids an empty model list on first login:
# Fast, lightweight — good starting point
ollama pull llama3.2
# Better reasoning, needs ~8 GB VRAM
ollama pull llama3.1:8b
# Required for RAG / document search
ollama pull nomic-embed-text
# Confirm downloads
ollama list
Allow Remote Access (If Ollama Runs on a Separate Machine)
If Ollama and Open WebUI are on different servers, Ollama needs to listen on all interfaces:
sudo systemctl edit ollama
# Add these lines in the editor:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
# Save and reload
sudo systemctl daemon-reload
sudo systemctl restart ollama
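If both services run on the same host, you'll point Open WebUI at host.docker.internal instead. On Linux that name resolves to the Docker bridge gateway rather than 127.0.0.1, so if Ollama listens only on loopback (its default) the container will be refused. In that case apply the same override, and keep port 11434 closed to the public internet, since the Ollama API has no authentication.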
Step 2 — Deploy Open WebUI with Docker
Open WebUI ships as a single Docker image. The right flags matter.
Same-Host Setup (Most Common)
Use --add-host=host.docker.internal:host-gateway so the container can resolve the host by name and reach Ollama there:
docker run -d \
-p 3000:8080 \
--add-host=host.docker.internal:host-gateway \
-e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
Remote Ollama Setup
Point directly at the remote machine's IP:
docker run -d \
-p 3000:8080 \
-e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
-v open-webui:/app/backend/data \
--name open-webui \
--restart always \
ghcr.io/open-webui/open-webui:main
First Login
Open http://your-server-ip:3000 in your browser. The first account created becomes admin — do this immediately. Whoever registers first owns the instance.
Step 3 — Production Docker Compose Setup
The single docker run command is fine for testing. For a setup you'll maintain, use Docker Compose with version-controlled config.
Project Directory and Compose File
mkdir ~/open-webui && cd ~/open-webui
Create docker-compose.yml:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "127.0.0.1:3000:8080" # localhost only; Nginx handles public traffic
    volumes:
      - open-webui-data:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - CORS_ALLOW_ORIGIN=https://ai.yourdomain.com
      # Uncomment to add OpenAI models alongside local ones:
      # - OPENAI_API_KEY=${OPENAI_API_KEY}
volumes:
  open-webui-data:
Create the .env file:
# .env — keep this out of version control
WEBUI_SECRET_KEY=replace-this-with-a-long-random-string
# OPENAI_API_KEY=sk-...
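The WEBUI_SECRET_KEY placeholder should be replaced with a value that is actually random. One way to produce it, sketched with standard tools only (gen_secret is a made-up helper name):

```shell
# Print 32 random bytes as 64 lowercase hex characters.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}
```

Paste the output of gen_secret into .env as the WEBUI_SECRET_KEY value, then run chmod 600 .env so other users on the server cannot read it. If OpenSSL is installed, openssl rand -hex 32 gives an equivalent result.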
docker compose up -d
docker compose logs -f open-webui
Step 4 — HTTPS with Nginx and Let's Encrypt
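HTTP is fine on localhost. Anything internet-facing needs HTTPS: without TLS, passwords and chat content cross the network in plaintext. Open WebUI also uses WebSockets for streaming, and a page served over HTTPS can only open secure (wss://) WebSocket connections, so the proxy has to forward WebSocket upgrades correctly.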
Install Nginx and Certbot
sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx
Nginx Site Configuration
Create /etc/nginx/sites-available/open-webui and replace ai.yourdomain.com with your actual domain:
server {
listen 80;
server_name ai.yourdomain.com;
location / {
proxy_pass http://127.0.0.1:3000;
proxy_http_version 1.1;
# Required for WebSocket streaming
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Generous timeouts — local inference can be slow
proxy_read_timeout 300s;
proxy_send_timeout 300s;
# Allow large file uploads for RAG documents
client_max_body_size 50M;
}
}
sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
# Issue TLS certificate and auto-update Nginx config
sudo certbot --nginx -d ai.yourdomain.com
After Certbot completes, your instance is live at https://ai.yourdomain.com with auto-renewing certificates.
Important: proxy_read_timeout 300s is not optional. CPU inference on large prompts can take minutes. Without a generous timeout, Nginx terminates the connection mid-response and the user sees a blank or partial reply.
Step 5 — Models, RAG, and Admin Configuration
With the stack running, here's what to configure in the UI to get real value.
Managing Models from the Admin Panel
Navigate to Admin Settings → Connections → Ollama → Manage (the wrench icon). You can search the Ollama model library and pull models directly from the UI — no terminal needed after the initial setup.
Recommended models by use case:
llama3.2:3b — Fast responses, low VRAM, good for general Q&A
llama3.1:8b — Stronger reasoning; needs ~8 GB VRAM
mistral:7b — Excellent at code and structured output
deepseek-r1:8b — Strong reasoning model with visible thinking traces
nomic-embed-text — Embedding model for RAG; pull this regardless of other choices
Enabling RAG (Document Search)
RAG lets users upload PDFs, Word docs, and web pages and ask questions against them in chat. To configure it:
- Go to Admin Settings → Documents
- Set Embedding Engine to Ollama
- Set Embedding Model to nomic-embed-text
- Save — RAG is now fully local
Connecting OpenAI Alongside Local Models
Open WebUI can proxy OpenAI models in the same interface as local ones. Go to Admin Settings → Connections → OpenAI API and enter your API key. Set the base URL to https://api.openai.com/v1.
User Access Control
By default, anyone who reaches the sign-up page can create an account. For a private deployment, go to Admin Settings → Users → Default User Role and set it to Pending. New registrations require manual admin approval.
Updating Open WebUI
The project ships updates frequently. Pull the latest image — your data persists in the named volume:
docker compose pull
docker compose up -d
Step 6 — Troubleshooting and Production Tips
These are the actual issues people hit. Most are fast fixes once you know the cause.
Problem: "Could Not Connect to Ollama" in the UI
The container can't reach the host's loopback. It has its own network namespace — localhost inside the container is not the host's localhost.
Fix: Confirm the run command includes --add-host=host.docker.internal:host-gateway and the env var is set to http://host.docker.internal:11434. Verify from inside the container:
docker exec -it open-webui curl http://host.docker.internal:11434/api/version
# Should return {"version":"..."}
# If this fails, Ollama isn't listening or the host mapping is wrong
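One cause worth ruling out on Linux: host.docker.internal maps to the Docker bridge gateway, not to 127.0.0.1, so an Ollama that listens only on loopback refuses the container's connection even though curl works from the host. The sketch below classifies how port 11434 is bound, reading the output of ss -ltn. The helper name and its output labels are invented for illustration.

```shell
# Read `ss -ltn` output on stdin and report how port 11434 is bound.
# Prints one of: loopback-only, all-interfaces, specific:<addr>, not-listening
ollama_bind_scope() {
  awk '
    $1 == "LISTEN" && $4 ~ /:11434$/ {
      addr = $4
      sub(/:11434$/, "", addr)
      if (addr == "127.0.0.1" || addr == "[::1]") scope = "loopback-only"
      else if (addr == "0.0.0.0" || addr == "*" || addr == "[::]") scope = "all-interfaces"
      else scope = "specific:" addr
      # Prefer the widest binding if several sockets are listed
      if (result == "" || scope == "all-interfaces") result = scope
    }
    END { print (result == "" ? "not-listening" : result) }
  '
}
```

Run it as ss -ltn | ollama_bind_scope on the host. If it prints loopback-only, the OLLAMA_HOST override from Step 1 is the likely fix, combined with a firewall rule that keeps 11434 off the public internet.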
Problem: Responses Cut Off Mid-Stream
This is almost always a proxy timeout. Local inference is slow — 300 words of output from a 7B model on CPU can take 2–3 minutes.
Fix: Increase proxy_read_timeout in your Nginx config to at least 300s. For very large prompts or slow hardware, go to 600s. Reload Nginx after changing it:
sudo nginx -t && sudo systemctl reload nginx
Problem: WebSocket Errors in the Browser Console
Streaming in Open WebUI uses WebSockets. Missing Upgrade and Connection headers in the Nginx config will break it. The config above includes them — double-check they're present and Nginx was reloaded after the change.
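That double-check can be scripted. The sketch below greps a site file for the three directives WebSocket proxying depends on. It is a plain text match, not a real Nginx parser, so a commented-out directive still counts as present. check_ws_proxy is a made-up name.

```shell
# Check that an Nginx site file contains the directives needed for
# WebSocket proxying. Prints each missing pattern; returns non-zero if any
# are missing.
check_ws_proxy() {
  file="$1"
  missing=0
  for pattern in \
    'proxy_http_version[[:space:]]+1\.1' \
    'proxy_set_header[[:space:]]+Upgrade[[:space:]]+\$http_upgrade' \
    'proxy_set_header[[:space:]]+Connection[[:space:]]+"upgrade"'
  do
    if ! grep -Eq "$pattern" "$file"; then
      echo "missing: $pattern"
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_ws_proxy /etc/nginx/sites-available/open-webui after any edit, including the changes Certbot makes to the file, and before reloading Nginx.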
Problem: CORS Errors After Adding a Domain
When you put Nginx in front and access via a domain, you need to tell Open WebUI what origin is allowed. Set CORS_ALLOW_ORIGIN in your .env or docker-compose.yml environment block to match your public URL exactly (including https://), then restart the container.
Problem: Out of Memory During Inference
Your model is too large for available VRAM or RAM. Options in order of impact:
- Switch to a lower quantization:
ollama pull llama3.1:8b-instruct-q4_0
- Use a smaller base model (3b instead of 8b)
- Reduce context window size in the Ollama Modelfile
- Upgrade GPU VRAM
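For the context-window option, the setting lives in a Modelfile as the num_ctx parameter. This is a minimal sketch: the value 2048 and the derived model name are examples only, and modelfile_with_ctx is a hypothetical helper.

```shell
# Print a Modelfile that derives a smaller-context variant of a base model.
# num_ctx is the context window in tokens; pick a value that fits your memory.
# Usage: modelfile_with_ctx <base_model> <num_ctx>
modelfile_with_ctx() {
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$1" "$2"
}

# Example (the model name llama3.1-8b-ctx2k is made up):
#   modelfile_with_ctx llama3.1:8b 2048 > Modelfile
#   ollama create llama3.1-8b-ctx2k -f Modelfile
```

The new model then shows up in ollama list and in Open WebUI's model picker alongside the original. A smaller context uses less memory, but long conversations and large documents get truncated sooner.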
Tip: Inspect Container Logs Before Googling
Most issues surface immediately in the logs. Check here first:
# Live logs
docker logs open-webui --tail 100 -f
# Or via Compose
docker compose logs open-webui --tail 100 -f
Tip: Back Up Before Every Update
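All conversations, settings, and user accounts live in the Open WebUI data volume. Compose prefixes volume names with the project name, so the Compose setup above creates open-webui_open-webui-data, while the plain docker run setup uses open-webui. Confirm the name with docker volume ls first: if you name a volume that doesn't exist, Docker creates an empty one and backs that up instead. One command backs up everything:
# Volume name assumes the Compose project lives in ~/open-webui
docker run --rm \
-v open-webui_open-webui-data:/data:ro \
-v "$(pwd)/backups":/backup \
alpine tar czf /backup/open-webui-$(date +%Y%m%d).tar.gz -C /data .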
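A backup is only useful if it contains something. An archive of an empty or misnamed volume is still a valid tarball, so check the contents before you update. A small sketch (verify_backup is a made-up helper):

```shell
# Succeed only if the archive is readable and holds more than the bare
# top-level directory entry.
# Usage: verify_backup <path-to-tar.gz>
verify_backup() {
  entries=$(( $(tar tzf "$1" 2>/dev/null | wc -l) ))
  [ "$entries" -gt 1 ]
}
```

You can chain it in front of the update so a bad backup stops the process: verify_backup backups/open-webui-$(date +%Y%m%d).tar.gz && docker compose pull && docker compose up -d.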
Tip: Limit Ollama Concurrency Under Heavy Load
By default, Ollama will try to serve concurrent requests and may keep more than one model in memory, which can exhaust VRAM or RAM. If you're seeing OOM errors under multi-user load, limit both parallel requests and the number of loaded models:
sudo systemctl edit ollama
# Add:
[Service]
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_NUM_PARALLEL=1"
sudo systemctl daemon-reload && sudo systemctl restart ollama
What You've Built
At the end of this Open WebUI setup guide, you have:
- A self-hosted ChatGPT alternative with a polished, full-featured UI
- Local LLM inference via Ollama — prompts never leave your server
- RAG for querying your own documents with local embeddings
- HTTPS via Nginx with Let's Encrypt auto-renewal
- Multi-user support with role-based access control
- Optional OpenAI passthrough — one interface for both local and cloud models
- A Docker Compose setup that's version-controllable and reproducible
This stack runs comfortably on a single mid-range server for personal use or small teams. The inference quality from a locally-run 8B model is genuinely useful for most developer tasks — summarisation, code review, Q&A over docs, and drafting.
Need Enterprise-Grade AI Infrastructure?
A single-server setup has a ceiling. When you're looking at multi-GPU inference, load balancing across model nodes, SSO and LDAP integration, audit logging, or deploying this inside a private cloud with compliance requirements — the architecture changes significantly.
The Sysbrix team has deployed production AI infrastructure across a range of environments. If you're evaluating self-hosted AI at scale and want to skip the trial-and-error phase, we're happy to help you design it right the first time.