Open WebUI Setup Guide: Self-Hosted ChatGPT Alternative

Why Run Your Own AI Chat Interface?

ChatGPT is useful. It's also a third-party service that logs your prompts, rate-limits your free tier, and charges per token the moment you build anything serious on top of it. If you're working with internal documentation, customer data, or anything sensitive, that's a problem worth solving.

Open WebUI is the cleanest solution in this space right now. It's a polished, feature-complete chat interface — think ChatGPT, but self-hosted — that connects to local models via Ollama or any OpenAI-compatible API. You get conversation history, document uploads, RAG, image generation, web search, and multi-user support. All running on hardware you own.

This Open WebUI setup guide covers the full path: prerequisites, Docker deployment, Nginx with HTTPS, model management, and the actual troubleshooting fixes you'll need — not just the happy path.

Prerequisites

Sort these before you start. Each missing piece causes a different and confusing failure mode.

Hardware Requirements

CPU-only inference: Any x86-64 server or VPS with 8 GB+ RAM. Inference will be slow — usable for light personal use, not teams.
GPU inference (recommended): NVIDIA GPU with 8 GB+ VRAM for comfortable 7B model use. 16 GB+ VRAM opens up 13B models; a 32B model at Q4 is roughly 20 GB of weights on its own, so plan on a 24 GB card for that class. Requires CUDA 12.x drivers.
Disk: 30 GB free minimum. A single 7B model at Q4 quantization is ~4 GB. A 13B is ~8 GB. Plan accordingly.
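
The sizes above follow from simple arithmetic: parameters times bits per weight, divided by eight. The sketch below assumes an average of about 4.5 bits per weight for a Q4 quantization, which is an approximation rather than a figure from Ollama; real files vary by quantization scheme, and the context cache needs extra memory on top.

```shell
# Rough size estimate for a quantized model file, in GB.
# $1 = parameters in billions
# $2 = average bits per weight (assumed value, e.g. 4.5 for a Q4 quantization)
estimate_model_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Print estimates for the model classes mentioned above.
for size in 7 13 32; do
  echo "${size}B at ~4.5 bits/weight: ~$(estimate_model_gb "$size" 4.5) GB"
done
```

Treat the result as a lower bound when choosing a GPU or sizing the disk.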

Software Requirements

Ubuntu 22.04 or 24.04 (this guide is Linux-first)
Docker Engine 24+ and Docker Compose v2
Ollama (installed below)
A domain name with an A record pointed at your server — needed only if you want HTTPS access beyond localhost
Ports 3000 (Open WebUI), 11434 (Ollama), 80 and 443 (Nginx) available

Verify Docker Before You Start

docker --version
docker compose version
# Both should return version strings — Docker 24+, Compose v2+

If Docker isn't installed, follow the official Docker install guide before continuing.
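
If you want to script this check, something like the following could work. It is a minimal sketch: major_version and check_docker are hypothetical helper names, and the parsing assumes the usual "... version X.Y.Z" output format of both commands.

```shell
# Extract the major version from a line such as
# "Docker version 24.0.7, build afdd53b" or "Docker Compose version v2.24.5".
major_version() {
  printf '%s\n' "$1" | sed -n 's/.*version v\{0,1\}\([0-9][0-9]*\)\..*/\1/p'
}

# Succeeds only if Docker Engine is 24+ and Compose is v2+.
check_docker() {
  engine=$(major_version "$(docker --version 2>/dev/null)")
  compose=$(major_version "$(docker compose version 2>/dev/null)")
  [ -n "$engine" ] && [ "$engine" -ge 24 ] &&
    [ -n "$compose" ] && [ "$compose" -ge 2 ]
}

# Example:
# check_docker && echo "Docker looks fine" || echo "Install or upgrade Docker first"
```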

Step 1 — Install and Configure Ollama

Ollama is the local model runtime. It handles model storage, quantization selection, and exposes an inference API on port 11434 that Open WebUI connects to.

Install Ollama

curl -fsSL https://ollama.com/install.sh | sh

The installer auto-detects NVIDIA or AMD GPUs and installs the correct runtime. Once complete, Ollama runs as a systemd service.

Verify and Enable Ollama

systemctl status ollama

# If it's not running:
sudo systemctl enable --now ollama

# Confirm the API is live
curl http://localhost:11434/api/version

You should get a JSON response with the version string. That confirms Ollama is up and accepting requests.
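
Right after an install or restart, the service can take a few seconds to start answering. A small retry helper avoids a false alarm in that window. This is a generic sketch; retry is a hypothetical function name, not part of Ollama or Docker.

```shell
# Run a command until it succeeds.
# $1 = maximum attempts, $2 = seconds to sleep between attempts, rest = the command.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  until "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example: wait up to roughly 20 seconds for the Ollama API.
# retry 10 2 curl -fsS http://localhost:11434/api/version
```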

Pull Your First Model

Pull at least one model before connecting Open WebUI — it avoids an empty model list on first login:

# Lightweight and fast — good starting point
ollama pull llama3.2

# Better reasoning, needs ~8 GB VRAM
ollama pull llama3.1:8b

# Required later for RAG / document search
ollama pull nomic-embed-text

# Confirm downloads
ollama list
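
For scripted setups, the same check can be automated by reading the first column of the ollama list output. The helper below is a sketch under that assumption; has_model is a hypothetical name, and it treats a bare model name as matching the implicit :latest tag.

```shell
# Succeed if model $1 appears in `ollama list` output supplied on stdin.
# A bare name such as "llama3.2" also matches "llama3.2:latest".
has_model() {
  awk -v want="$1" '
    NR > 1 && ($1 == want || $1 == want ":latest") { found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Example: report which of the models used in this guide are still missing.
# for m in llama3.2 llama3.1:8b nomic-embed-text; do
#   ollama list | has_model "$m" || echo "missing: $m"
# done
```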

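Allow Ollama to Listen Beyond Localhost

By default, Ollama binds to 127.0.0.1 only. A Docker container on a Linux host reaches the host through the Docker bridge address rather than the host's loopback, so Open WebUI in a container needs Ollama to listen on a non-loopback interface. The same applies when Ollama runs on a separate machine. Override the systemd unit:

sudo systemctl edit ollama

# Add these lines in the editor, then save:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama

The Ollama API has no authentication, so with this setting keep port 11434 closed to the public internet with your firewall. If you run the Open WebUI container with host networking instead of a bridge network, the default localhost binding is enough and you can skip this step.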

Step 2 — Deploy Open WebUI with Docker

Open WebUI ships as a single Docker image. Installation is one command, though the right flags matter.

Same-Host Setup (Ollama and Open WebUI on One Server)

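Use --add-host=host.docker.internal:host-gateway so that host.docker.internal inside the container resolves to the host's address on the Docker bridge. Ollama has to be listening on an address the bridge can reach (OLLAMA_HOST=0.0.0.0 from Step 1), not only on 127.0.0.1: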

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Remote Ollama Setup (Ollama on a Separate GPU Server)

Point directly at the remote machine's IP. Ollama must already be configured to listen on all interfaces (see Step 1):

docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

First Login

Open http://your-server-ip:3000 in your browser. The first account created automatically becomes admin — do this immediately. Whoever registers first owns the instance.

Step 3 — Production Setup with Docker Compose

The single docker run command gets you running. For a setup you'll actually maintain — with version-controlled config, environment secrets, and easy restarts — use Docker Compose.

Project Directory and Compose File

mkdir ~/open-webui && cd ~/open-webui

Create docker-compose.yml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "127.0.0.1:3000:8080"   # localhost only — Nginx handles public traffic
    volumes:
      - open-webui-data:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - CORS_ALLOW_ORIGIN=https://ai.yourdomain.com
      # Uncomment to add OpenAI models alongside local ones:
      # - OPENAI_API_KEY=${OPENAI_API_KEY}

volumes:
  open-webui-data:

Create the .env file:

# .env — keep this out of version control
WEBUI_SECRET_KEY=replace-this-with-a-long-random-string
# OPENAI_API_KEY=sk-...
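
The placeholder above should be replaced with a real random value. One way to produce it with standard tools is sketched below; gen_secret is a hypothetical helper name, and the 32-byte length is an assumption rather than a documented Open WebUI requirement.

```shell
# Print a 64-character hex string (32 random bytes) for use as WEBUI_SECRET_KEY.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Example: write the key into .env and restrict the file to the current user.
# printf 'WEBUI_SECRET_KEY=%s\n' "$(gen_secret)" > .env
# chmod 600 .env
```

If OpenSSL is installed, openssl rand -hex 32 produces an equivalent value.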

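Start the stack and follow the logs. If the container from Step 2 is still running, remove it first with docker rm -f open-webui, since the Compose service reuses the same container name and port:

docker compose up -d
docker compose logs -f open-webui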

Step 4 — HTTPS with Nginx and Let's Encrypt

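HTTP is fine on localhost. Anything internet-facing needs HTTPS so that logins, prompts, and uploads are encrypted in transit. Open WebUI also relies on WebSockets for streaming, so the proxy has to pass the Upgrade headers through, and a page served over HTTPS can only open secure (wss://) WebSocket connections.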

Install Nginx and Certbot

sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx

Nginx Site Configuration

Create /etc/nginx/sites-available/open-webui and replace ai.yourdomain.com with your actual domain:

server {
    listen 80;
    server_name ai.yourdomain.com;

    location / {
        proxy_pass         http://127.0.0.1:3000;
        proxy_http_version 1.1;

        # Required for WebSocket streaming
        proxy_set_header   Upgrade $http_upgrade;
        proxy_set_header   Connection "upgrade";

        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;

        # Generous timeouts — local inference can be slow
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;

        # Allow large file uploads for RAG documents
        client_max_body_size 50M;
    }
}

Enable the site, check the configuration, and reload Nginx:

sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# Issue TLS certificate and auto-update Nginx config
sudo certbot --nginx -d ai.yourdomain.com

After Certbot completes, your instance is live at https://ai.yourdomain.com with auto-renewing certificates.

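Important: proxy_read_timeout 300s is not optional. The directive limits how long Nginx waits between two successive reads from the upstream, and it defaults to 60s. On CPU, a large prompt can take minutes to process before the first token arrives; with the default, Nginx closes the connection and the user sees a blank or partial reply.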

Step 5 — Models, RAG, and Admin Configuration

With the stack running, here's what to configure in the UI to get real value out of it.

Managing Models from the Admin Panel

Navigate to Admin Settings → Connections → Ollama → Manage (the wrench icon). You can search the Ollama model library and pull models directly from the UI — no terminal needed after the initial setup.

Recommended models by use case:

llama3.2:3b — Fast responses, low VRAM, good for general Q&A
llama3.1:8b — Stronger reasoning; needs ~8 GB VRAM
mistral:7b — Excellent at code and structured output
deepseek-r1:8b — Strong reasoning model with visible thinking traces
nomic-embed-text — Embedding model for RAG; pull this regardless of other choices

Enabling RAG (Document Search)

RAG lets users upload PDFs, Word docs, and web pages and ask questions against them in chat. To configure it properly:

Go to Admin Settings → Documents
Set Embedding Engine to Ollama
Set Embedding Model to nomic-embed-text
Save — RAG is now fully local

Connecting OpenAI Alongside Local Models

Open WebUI can proxy OpenAI models in the same interface as local ones. Users pick from a unified model dropdown that includes both. Go to Admin Settings → Connections → OpenAI API and enter your API key. Set the base URL to https://api.openai.com/v1.

User Access Control

Anyone who reaches the sign-up page can create an account. For a private deployment, check this immediately: go to Admin Settings → Users → Default User Role and make sure it is set to Pending, so that new registrations require manual admin approval before they can log in. The exact menu location can differ between releases.

Updating Open WebUI

The project ships updates frequently. Pull the latest image — your data persists in the named volume:

docker compose pull
docker compose up -d

Step 6 — Troubleshooting and Production Tips

These are the actual issues people hit. Most are fast fixes once you know the cause.

Problem: "Could Not Connect to Ollama" in the UI

The container can't reach the host's loopback. It has its own network namespace — localhost inside the container is not the host's localhost.

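Fix: Confirm the run command includes --add-host=host.docker.internal:host-gateway and the env var is set to http://host.docker.internal:11434. On a Linux host, also confirm that Ollama listens on more than 127.0.0.1 (OLLAMA_HOST=0.0.0.0 in the systemd override from Step 1), because host.docker.internal resolves to the Docker bridge address, not the host's loopback. Verify connectivity from inside the container: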

docker exec -it open-webui curl http://host.docker.internal:11434/api/version
# Should return {"version":"..."}
# If this fails, Ollama isn't listening or the host mapping is wrong
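
If the check fails, it helps to see which address Ollama is actually bound to on the host. The sketch below filters ss -ltn output for a given port; listen_addrs is a hypothetical helper name, and it assumes the standard ss column layout where the fourth column is the local address and port.

```shell
# Print the local address(es) listening on TCP port $1,
# given `ss -ltn` output on stdin.
listen_addrs() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")      # the piece after the last ":" is the port
      if (parts[n] == port) print $4
    }
  '
}

# Example, run on the Docker host:
# ss -ltn | listen_addrs 11434
```

A result of 127.0.0.1:11434 means Ollama is reachable from the host only. An address such as 0.0.0.0:11434, *:11434, or [::]:11434 means it is listening on all interfaces.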

Problem: Responses Cut Off Mid-Stream

This is almost always a proxy timeout. Local inference is slow — 300 words of output from a 7B model on CPU can take 2–3 minutes.

Fix: Increase proxy_read_timeout in your Nginx config to at least 300s. For very large prompts or slow hardware, go to 600s. Reload Nginx after changing it:

sudo nginx -t && sudo systemctl reload nginx

Problem: WebSocket Errors in the Browser Console

Streaming in Open WebUI uses WebSockets. Missing Upgrade and Connection headers in the Nginx config will break it. The config above includes them — double-check they're present and Nginx was reloaded after the change.
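
That double-check can be scripted. The following sketch scans a site file for the three directives this guide relies on for streaming and ignores commented-out lines; missing_proxy_directives is a hypothetical helper name, and it only checks that the directives are present, not that their values are right.

```shell
# Report which streaming-related directives are missing from an Nginx site file ($1).
# Prints one line per missing directive and nothing when all are present.
missing_proxy_directives() {
  for directive in 'proxy_set_header Upgrade' 'proxy_set_header Connection' 'proxy_read_timeout'; do
    # Allow any run of whitespace between words so aligned configs still match.
    pattern=$(printf '%s' "$directive" | sed 's/ /[[:space:]]+/g')
    grep -Eq "^[[:space:]]*${pattern}[[:space:]]" "$1" || echo "missing: $directive"
  done
}

# Example:
# missing_proxy_directives /etc/nginx/sites-available/open-webui
```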

Problem: CORS Errors After Adding a Domain

When you put Nginx in front and access via a domain, you need to tell Open WebUI what origin is allowed. Set CORS_ALLOW_ORIGIN in your .env or docker-compose.yml environment block to match your public URL exactly (including https://), then restart the container.

Problem: Out of Memory During Inference

Your model is too large for available VRAM or RAM. Options in order of impact:

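Switch to a lower quantization. The default llama3.1:8b tag is already 4-bit, so check the model's tag list on ollama.com for a 3-bit or 2-bit variant, and expect some loss in output quality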
Use a smaller base model (3b instead of 8b)
Reduce context window size in the Ollama Modelfile
Upgrade GPU VRAM
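
For the context window option, a Modelfile needs only two lines: a FROM line naming the base model and a PARAMETER num_ctx line. The sketch below generates one; write_modelfile is a hypothetical helper, and both the 2048-token value and the derived model name are arbitrary examples.

```shell
# Print a Modelfile that derives from base model $1 with context length $2 (tokens).
write_modelfile() {
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$1" "$2"
}

# Example (llama3.1-8b-ctx2k is an arbitrary name for the derived model):
# write_modelfile llama3.1:8b 2048 > Modelfile
# ollama create llama3.1-8b-ctx2k -f Modelfile
```

The derived model then appears in the Open WebUI model dropdown alongside the original.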

Tip: Inspect Container Logs Before Googling

Most issues surface immediately in the logs. Check here first:

# Live logs
docker logs open-webui --tail 100 -f

# Or via Compose
docker compose logs open-webui --tail 100 -f

Tip: Back Up Before Every Update

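All conversations, settings, and user accounts live in the Docker volume mounted at /app/backend/data. Compose prefixes volume names with the project name, so the Step 3 volume is normally open-webui_open-webui-data (confirm with docker volume ls; the docker run setup from Step 2 uses open-webui). One command backs up everything:

mkdir -p backups
docker run --rm \
  -v open-webui_open-webui-data:/data:ro \
  -v "$(pwd)/backups":/backup \
  alpine tar czf "/backup/open-webui-$(date +%Y%m%d).tar.gz" /data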
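
Backing up before every update fills the disk over time. A pruning step like the one sketched here keeps only the newest archives. prune_backups is a hypothetical helper name, and it relies on the open-webui-YYYYMMDD.tar.gz naming used above, which sorts chronologically.

```shell
# Keep the newest $2 archives in directory $1 and delete the rest.
prune_backups() {
  ls "$1"/open-webui-*.tar.gz 2>/dev/null | sort -r | tail -n +"$(( $2 + 1 ))" |
    while IFS= read -r old; do
      rm -f -- "$old"
    done
}

# Example: keep the seven most recent backups.
# prune_backups ./backups 7
```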

Tip: Limit Ollama Concurrency Under Heavy Load

By default, Ollama will try to serve concurrent requests, which can exhaust VRAM or RAM. If you're seeing OOM errors under multi-user load, limit parallel requests via the systemd override:

sudo systemctl edit ollama

# Add these lines (keep any Environment= lines already in the override, such as OLLAMA_HOST):
[Service]
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_NUM_PARALLEL=1"

sudo systemctl daemon-reload && sudo systemctl restart ollama

What You've Built

At the end of this Open WebUI setup guide, you have:

A self-hosted ChatGPT alternative with a polished, full-featured UI
Local LLM inference via Ollama — prompts never leave your server
RAG for querying your own documents with local embeddings
HTTPS via Nginx with Let's Encrypt auto-renewal
Multi-user support with role-based access control
Optional OpenAI passthrough — one interface for both local and cloud models
A Docker Compose setup that's version-controllable and reproducible

This stack runs comfortably on a single mid-range server for personal use or small teams. The inference quality you get from a locally-run 8B model is genuinely useful for most developer tasks — summarisation, code review, Q&A over docs, and drafting.

Need Enterprise-Grade AI Infrastructure?

A single-server setup has a ceiling. When you're looking at multi-GPU inference, load balancing across model nodes, SSO and LDAP integration, audit logging, or deploying this inside a private cloud with compliance requirements — the architecture changes significantly.

The Sysbrix team has deployed production AI infrastructure across a range of environments. If you're evaluating self-hosted AI at scale and want to skip the trial-and-error phase, we're happy to help you design it right the first time.

