Why Run Your Own AI Chat Interface?
ChatGPT is useful. It's also a third-party service that logs your prompts, rate-limits the free tier, and, once you build anything serious on the API, charges per token. If you're working with internal documentation, customer data, or anything sensitive, that's a problem worth solving.
Open WebUI is the cleanest solution in this space right now. It's a polished, feature-complete chat interface — think ChatGPT, but self-hosted — that connects to local models via Ollama or any OpenAI-compatible API. You get conversation history, document uploads, RAG, image generation, web search, and multi-user support. All running on hardware you own.
This Open WebUI setup guide covers the full path: prerequisites, Docker deployment, Nginx with HTTPS, model management, and the actual troubleshooting fixes you'll need — not just the happy path.
Prerequisites
Sort these before you start. Each missing piece causes a different and confusing failure mode.
Hardware Requirements
- CPU-only inference: Any x86-64 server or VPS with 8 GB+ RAM. Inference will be slow — usable for light personal use, not teams.
- GPU inference (recommended): NVIDIA GPU with 8 GB+ VRAM for comfortable 7B model use. 16 GB+ VRAM opens up 13B models; 32B models at Q4 are roughly 20 GB, so they need 24 GB+ VRAM or partial CPU offload. Requires CUDA 12.x drivers.
- Disk: 30 GB free minimum. A single 7B model at Q4 quantization is ~4 GB. A 13B is ~8 GB. Plan accordingly.
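The disk figures above follow from simple arithmetic: parameter count times bits per weight, divided by eight. A rough sketch, assuming about 4.5 effective bits per weight for Q4-style quantization (an assumed average; real files vary with the quantization variant and metadata):

```shell
# Rough on-disk size of a quantized model, in GB.
# $1 = parameters in billions, $2 = effective bits per weight
# (4.5 for Q4 is an assumed average, not an exact figure).
estimate_model_gb() {
  awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

estimate_model_gb 7 4.5    # 7B at Q4
estimate_model_gb 13 4.5   # 13B at Q4
```

These print 3.9 and 7.3, which lines up with the ~4 GB and ~8 GB figures once file overhead is added.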
Software Requirements
- Ubuntu 22.04 or 24.04 (this guide is Linux-first)
- Docker Engine 24+ and Docker Compose v2
- Ollama (installed below)
- A domain name with an A record pointed at your server — needed only if you want HTTPS access beyond localhost
- Ports 3000 (Open WebUI), 11434 (Ollama), 80 and 443 (Nginx) available
Verify Docker Before You Start
docker --version
docker compose version
# Both should return version strings — Docker 24+, Compose v2+
If Docker isn't installed, follow the official Docker install guide before continuing.
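If you'd rather script the version check than read it by eye, something like the following could work. It is a minimal sketch: the helper names are made up for this guide, it assumes `docker --version` prints a line shaped like "Docker version 24.0.7, build afdd53b", and it relies on GNU `sort -V` (present on Ubuntu).

```shell
# True (exit 0) when version $1 is at least version $2.
# Relies on `sort -V` from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Extract "24.0.7" from a line like "Docker version 24.0.7, build afdd53b"
# (assumed output format of `docker --version`).
parse_docker_version() {
  awk '{ gsub(",", "", $3); print $3 }'
}

# Only runs the check when the docker CLI is actually installed.
if command -v docker >/dev/null 2>&1; then
  installed=$(docker --version | parse_docker_version)
  if version_ge "$installed" 24; then
    echo "Docker $installed is new enough"
  else
    echo "Docker $installed is older than 24; upgrade before continuing"
  fi
fi
```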
Step 1 — Install and Configure Ollama
Ollama is the local model runtime. It handles model storage, quantization selection, and exposes an inference API on port 11434 that Open WebUI connects to.
Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
The installer auto-detects NVIDIA or AMD GPUs and installs the correct runtime. Once complete, Ollama runs as a systemd service.
Verify and Enable Ollama
systemctl status ollama
# If it's not running:
sudo systemctl enable --now ollama
# Confirm the API is live
curl http://localhost:11434/api/version
You should get a JSON response with the version string. That confirms Ollama is up and accepting requests.
Pull Your First Model
Pull at least one model before connecting Open WebUI — it avoids an empty model list on first login:
# Lightweight and fast — good starting point
ollama pull llama3.2
# Better reasoning, needs ~8 GB VRAM
ollama pull llama3.1:8b
# Required later for RAG / document search
ollama pull nomic-embed-text
# Confirm downloads
ollama list
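To make the pull step repeatable, you could compare a wanted list against what is already installed and pull only the difference. A small sketch, assuming `ollama list` prints a header row followed by one model per line with the name (including its tag, such as `llama3.2:latest`) in the first column; `missing_models` is a helper name invented here:

```shell
# Print the wanted models that are not installed yet.
# stdin: output of `ollama list`; arguments: wanted model names.
# A name without a tag is treated as "<name>:latest".
missing_models() {
  installed=$(awk 'NR > 1 { print $1 }')
  for want in "$@"; do
    case "$want" in
      *:*) full=$want ;;
      *)   full="$want:latest" ;;
    esac
    printf '%s\n' "$installed" | grep -qxF "$full" || printf '%s\n' "$want"
  done
}

# Usage:
#   ollama list | missing_models llama3.2 llama3.1:8b nomic-embed-text |
#     while read -r model; do ollama pull "$model"; done
```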
Allow Ollama to Listen on All Interfaces (Remote Setup Only)
If Ollama runs on a separate machine from Open WebUI, it needs to bind to all interfaces, not just localhost. Override the systemd unit:
sudo systemctl edit ollama
# Add these lines in the editor:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
# Save and reload
sudo systemctl daemon-reload
sudo systemctl restart ollama
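If Ollama and Open WebUI are on the same host, you'll reach Ollama through host.docker.internal instead. On Linux that name resolves to the Docker bridge gateway rather than 127.0.0.1, so if the container can't connect while Ollama is bound to localhost only, apply the same override and keep port 11434 firewalled from the outside.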
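Whether Ollama is bound to loopback or to all interfaces can be read from its listening socket. The sketch below classifies the output of `ss -ltn`; it assumes the usual column layout (state first, local address and port fourth), and `ollama_bind_scope` is a hypothetical helper name:

```shell
# Classify how a port is bound, from `ss -ltn` output on stdin.
# $1 = port (defaults to 11434, Ollama's port in this guide).
ollama_bind_scope() {
  awk -v port="${1:-11434}" '
    $1 == "LISTEN" && $4 ~ (":" port "$") {
      addr = $4
      sub(":" port "$", "", addr)
      if (addr == "127.0.0.1" || addr == "[::1]")
        print "loopback-only"
      else if (addr == "0.0.0.0" || addr == "*" || addr == "[::]")
        print "all-interfaces"
      else
        print "specific-address " addr
      found = 1
      exit
    }
    END { if (!found) print "not-listening" }
  '
}

# Usage:
#   ss -ltn | ollama_bind_scope
```

A result of loopback-only means other machines, and usually Docker containers on the default bridge, cannot connect.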
Step 2 — Deploy Open WebUI with Docker
Open WebUI ships as a single Docker image. Installation is one command, though the right flags matter.
Same-Host Setup (Ollama and Open WebUI on One Server)
Use --add-host=host.docker.internal:host-gateway so the container can resolve the host machine and reach the Ollama service running on it:
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
Remote Ollama Setup (Ollama on a Separate GPU Server)
Point directly at the remote machine's IP. Ollama must already be configured to listen on all interfaces (see Step 1):
docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
First Login
Open http://your-server-ip:3000 in your browser. The first account created automatically becomes admin — do this immediately. Whoever registers first owns the instance.
Step 3 — Production Setup with Docker Compose
The single docker run command gets you running. For a setup you'll actually maintain — with version-controlled config, environment secrets, and easy restarts — use Docker Compose.
Project Directory and Compose File
mkdir ~/open-webui && cd ~/open-webui
Create docker-compose.yml:
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "127.0.0.1:3000:8080" # localhost only; Nginx handles public traffic
    volumes:
      - open-webui-data:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - CORS_ALLOW_ORIGIN=https://ai.yourdomain.com
      # Uncomment to add OpenAI models alongside local ones:
      # - OPENAI_API_KEY=${OPENAI_API_KEY}
volumes:
  open-webui-data:
Create the .env file:
# .env — keep this out of version control
WEBUI_SECRET_KEY=replace-this-with-a-long-random-string
# OPENAI_API_KEY=sk-...
Start the stack and follow the logs:
docker compose up -d
docker compose logs -f open-webui
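The WEBUI_SECRET_KEY placeholder in .env should be replaced with a genuinely random value. One way to produce one with only standard tools is sketched below; `generate_secret` is a name chosen for this example, and 32 random bytes is an assumed, reasonable length rather than a documented requirement:

```shell
# Print 64 hex characters (32 random bytes) read from /dev/urandom.
generate_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Usage (run once in ~/open-webui; this overwrites any existing .env):
#   printf 'WEBUI_SECRET_KEY=%s\n' "$(generate_secret)" > .env
#   chmod 600 .env
```

Restricting the file to mode 600 keeps the key readable only by the account that runs Compose.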
Step 4 — HTTPS with Nginx and Let's Encrypt
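HTTP is fine on localhost. Anything internet-facing needs HTTPS: without it, logins, prompts, and uploaded documents cross the network in plaintext. Open WebUI also uses WebSockets for streaming, so the proxy must pass the upgrade headers through, and once the page is served over HTTPS the browser only allows secure (wss://) WebSocket connections.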
Install Nginx and Certbot
sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx
Nginx Site Configuration
Create /etc/nginx/sites-available/open-webui and replace ai.yourdomain.com with your actual domain:
server {
    listen 80;
    server_name ai.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;

        # Required for WebSocket streaming
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Generous timeouts: local inference can be slow
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;

        # Allow large file uploads for RAG documents
        client_max_body_size 50M;
    }
}
sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
# Issue TLS certificate and auto-update Nginx config
sudo certbot --nginx -d ai.yourdomain.com
After Certbot completes, your instance is live at https://ai.yourdomain.com with auto-renewing certificates.
Important: proxy_read_timeout 300s is not optional. CPU inference on large prompts can take minutes. Without a generous timeout, Nginx terminates the connection mid-response and the user sees a blank or partial reply.
Step 5 — Models, RAG, and Admin Configuration
With the stack running, here's what to configure in the UI to get real value out of it.
Managing Models from the Admin Panel
Navigate to Admin Settings → Connections → Ollama → Manage (the wrench icon). You can search the Ollama model library and pull models directly from the UI — no terminal needed after the initial setup.
Recommended models by use case:
- llama3.2:3b - Fast responses, low VRAM, good for general Q&A
- llama3.1:8b - Stronger reasoning; needs ~8 GB VRAM
- mistral:7b - Excellent at code and structured output
- deepseek-r1:8b - Strong reasoning model with visible thinking traces
- nomic-embed-text - Embedding model for RAG; pull this regardless of other choices
Enabling RAG (Document Search)
RAG lets users upload PDFs, Word docs, and web pages and ask questions against them in chat. To configure it properly:
- Go to Admin Settings → Documents
- Set Embedding Engine to Ollama
- Set Embedding Model to nomic-embed-text
- Save. RAG is now fully local
Connecting OpenAI Alongside Local Models
Open WebUI can proxy OpenAI models in the same interface as local ones. Users pick from a unified model dropdown that includes both. Go to Admin Settings → Connections → OpenAI API and enter your API key. Set the base URL to https://api.openai.com/v1.
User Access Control
By default, anyone who reaches the sign-up page can create an account. For a private deployment, lock this down immediately: go to Admin Settings → Users → Default User Role and set it to Pending. New registrations require manual admin approval before they can log in.
Updating Open WebUI
The project ships updates frequently. Pull the latest image — your data persists in the named volume:
docker compose pull
docker compose up -d
Step 6 — Troubleshooting and Production Tips
These are the actual issues people hit. Most are fast fixes once you know the cause.
Problem: "Could Not Connect to Ollama" in the UI
The container can't reach the host's loopback. It has its own network namespace — localhost inside the container is not the host's localhost.
Fix: Confirm the run command includes --add-host=host.docker.internal:host-gateway and the env var is set to http://host.docker.internal:11434. Verify connectivity from inside the container:
docker exec -it open-webui curl http://host.docker.internal:11434/api/version
# Should return {"version":"..."}
# If this fails, Ollama isn't listening on an address the container can reach, or the host mapping is wrong
Problem: Responses Cut Off Mid-Stream
This is almost always a proxy timeout. Local inference is slow — 300 words of output from a 7B model on CPU can take 2–3 minutes.
Fix: Increase proxy_read_timeout in your Nginx config to at least 300s. For very large prompts or slow hardware, go to 600s. Reload Nginx after changing it:
sudo nginx -t && sudo systemctl reload nginx
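A quick back-of-the-envelope calculation helps pick a timeout. The sketch below assumes roughly 1.3 tokens per English word and a CPU generation speed of 2.5 tokens per second; both numbers are illustrative assumptions, so measure your own hardware before relying on them.

```shell
# Estimated seconds to generate a reply.
# $1 = words of output, $2 = tokens per second,
# $3 = tokens per word (defaults to an assumed 1.3).
estimate_reply_seconds() {
  awk -v words="$1" -v tps="$2" -v tpw="${3:-1.3}" \
    'BEGIN { printf "%.0f\n", words * tpw / tps }'
}

estimate_reply_seconds 300 2.5    # a 300-word answer
estimate_reply_seconds 1000 2.5   # a long answer at the same speed
```

Under those assumptions the two calls print 156 and 520: the short answer fits inside 300s, while the long one only survives with the 600s setting.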
Problem: WebSocket Errors in the Browser Console
Streaming in Open WebUI uses WebSockets. Missing Upgrade and Connection headers in the Nginx config will break it. The config above includes them — double-check they're present and Nginx was reloaded after the change.
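Checking the config by eye is error-prone after Certbot has rewritten it, so a small grep-based check may help. This is a sketch with a made-up helper name; it does a plain text match, so a commented-out directive would still count as present.

```shell
# Report which WebSocket-related directives are missing from an Nginx config.
# Exit status 0 means all three were found.
check_ws_headers() {
  conf=$1
  status=0
  for pattern in \
    'proxy_http_version[[:space:]]+1\.1' \
    'proxy_set_header[[:space:]]+Upgrade[[:space:]]' \
    'proxy_set_header[[:space:]]+Connection[[:space:]]'
  do
    if ! grep -Eq "$pattern" "$conf"; then
      echo "missing directive matching: $pattern"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   check_ws_headers /etc/nginx/sites-available/open-webui
```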
Problem: CORS Errors After Adding a Domain
When you put Nginx in front and access via a domain, you need to tell Open WebUI what origin is allowed. Set CORS_ALLOW_ORIGIN in your .env or docker-compose.yml environment block to match your public URL exactly (including https://), then restart the container.
Problem: Out of Memory During Inference
Your model is too large for available VRAM or RAM. Options in order of impact:
- Switch to a lower quantization: ollama pull llama3.1:8b-instruct-q4_0
- Use a smaller base model (3b instead of 8b)
- Reduce context window size in the Ollama Modelfile
- Upgrade GPU VRAM
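For the context-window option, a Modelfile needs only a base model and a num_ctx parameter. The sketch below writes one; the helper name, the 2048-token value, and the variant name in the usage comment are illustrative choices, not recommendations from the Ollama project.

```shell
# Write a Modelfile that derives a smaller-context variant of a base model.
# $1 = base model, $2 = context length in tokens, $3 = output path
write_ctx_modelfile() {
  cat > "$3" <<EOF
FROM $1
PARAMETER num_ctx $2
EOF
}

# Usage (the variant name llama3.1-8b-ctx2k is hypothetical):
#   write_ctx_modelfile llama3.1:8b 2048 Modelfile
#   ollama create llama3.1-8b-ctx2k -f Modelfile
```

After creating the variant, select it in Open WebUI like any other model.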
Tip: Inspect Container Logs Before Googling
Most issues surface immediately in the logs. Check here first:
# Live logs
docker logs open-webui --tail 100 -f
# Or via Compose
docker compose logs open-webui --tail 100 -f
Tip: Back Up Before Every Update
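All conversations, settings, and user accounts live in the Docker volume. Compose prefixes volume names with the project name, so the open-webui-data volume from Step 3 is normally open-webui_open-webui-data (confirm with docker volume ls; the plain docker run setup from Step 2 uses open-webui). One command backs up everything:
docker run --rm \
  -v open-webui_open-webui-data:/data:ro \
  -v "$(pwd)/backups":/backup \
  alpine tar czf /backup/open-webui-$(date +%Y%m%d).tar.gz -C /data .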
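A backup is only useful if it can be read back, so it is worth checking the archive before you update. A minimal sketch (the helper name is invented here) that confirms the file is non-empty and that tar can list its contents:

```shell
# Exit 0 when the archive exists, is non-empty, and tar can list its entries.
verify_backup() {
  archive=$1
  if [ ! -s "$archive" ]; then
    echo "empty or missing: $archive" >&2
    return 1
  fi
  if ! listing=$(tar tzf "$archive" 2>/dev/null); then
    echo "unreadable archive: $archive" >&2
    return 1
  fi
  [ -n "$listing" ]
}

# Usage:
#   verify_backup "backups/open-webui-$(date +%Y%m%d).tar.gz" && echo "backup OK"
```

This only proves the archive is structurally readable; a full test is restoring it into a scratch volume.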
Tip: Limit Ollama Concurrency Under Heavy Load
By default, Ollama will try to serve concurrent requests, which can exhaust VRAM or RAM. If you're seeing OOM errors under multi-user load, limit parallel requests via the systemd override:
sudo systemctl edit ollama
# Add:
[Service]
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_NUM_PARALLEL=1"
sudo systemctl daemon-reload && sudo systemctl restart ollama
What You've Built
At the end of this Open WebUI setup guide, you have:
- A self-hosted ChatGPT alternative with a polished, full-featured UI
- Local LLM inference via Ollama — prompts never leave your server
- RAG for querying your own documents with local embeddings
- HTTPS via Nginx with Let's Encrypt auto-renewal
- Multi-user support with role-based access control
- Optional OpenAI passthrough — one interface for both local and cloud models
- A Docker Compose setup that's version-controllable and reproducible
This stack runs comfortably on a single mid-range server for personal use or small teams. The inference quality you get from a locally-run 8B model is genuinely useful for most developer tasks — summarisation, code review, Q&A over docs, and drafting.
Need Enterprise-Grade AI Infrastructure?
A single-server setup has a ceiling. When you're looking at multi-GPU inference, load balancing across model nodes, SSO and LDAP integration, audit logging, or deploying this inside a private cloud with compliance requirements — the architecture changes significantly.
The Sysbrix team has deployed production AI infrastructure across a range of environments. If you're evaluating self-hosted AI at scale and want to skip the trial-and-error phase, we're happy to help you design it right the first time.