Open WebUI Setup Guide: Self-Hosted ChatGPT

Why Self-Host Your AI Chat Interface?

ChatGPT is convenient. It's also a black box that logs your conversations, rate-limits your free tier, and charges per token the moment you build anything serious. If you're working with sensitive data, building internal tools, or just want full control over your AI stack, there's a better path.

Open WebUI is one of the most polished self-hosted ChatGPT alternatives available today. It gives you a clean, feature-rich chat interface that connects to local models via Ollama, or to any OpenAI-compatible API. You get conversation history, document uploads with RAG, image generation, web search, multi-user support, and custom tools. All running on hardware you own.

This Open WebUI setup guide walks you through the complete path: prerequisites, Docker deployment, Nginx with HTTPS, model management, and the actual troubleshooting fixes you'll need. Not the happy path. The real path.

Prerequisites

Sort these before you start. Each missing piece causes a different failure mode that wastes time.

Hardware

CPU-only: Any x86-64 server or VPS with 8 GB+ RAM. Inference is slow — usable for light personal use, not teams.
GPU (recommended): NVIDIA GPU with 8 GB+ VRAM for comfortable 7B model inference. 16 GB+ VRAM opens up 13B models. Requires CUDA 12.x drivers.
Disk: 30 GB free minimum. A 7B Q4 model is ~4 GB; a 13B is ~8 GB. Plan accordingly.
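
The disk figures above come from simple arithmetic: parameter count times bits per weight, divided by eight. Here is a rough sketch of that calculation. The function name is hypothetical, and the default of 4.5 bits per weight is an assumed average for a 4-bit quantization including its metadata; real files vary by format.

```shell
# Rough on-disk size of a quantized model, in GB (decimal).
# $1 = parameters in billions
# $2 = assumed bits per weight (defaults to 4.5 for a 4-bit quantization)
estimate_model_gb() {
  awk -v p="$1" -v b="${2:-4.5}" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
```

Running estimate_model_gb 7 prints 3.9 and estimate_model_gb 13 prints 7.3, which is in the same range as the sizes quoted above. Leave headroom on top of that for the context cache at inference time.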

Software

Ubuntu 22.04 or 24.04 (this guide uses Ubuntu)
Docker Engine 24+ and Docker Compose v2
Ollama (installed below)
A domain name with an A record pointed at your server — needed for HTTPS
Ports 3000 (Open WebUI), 11434 (Ollama), 80 and 443 (Nginx) available

Verify Docker

docker --version
docker compose version
# Both should return version strings — Docker 24+, Compose v2+

If Docker isn't installed, follow the official Docker install guide first.
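
If you provision servers with a script, the version check can be automated. This is a minimal sketch that assumes the usual "Docker version X.Y.Z, build ..." output format; the function name is hypothetical.

```shell
# Succeeds if the major version in a `docker --version` string is at least $2.
# $1 = version string, $2 = minimum major version (defaults to 24)
docker_major_ok() {
  major=$(printf '%s\n' "$1" | sed -n 's/^Docker version \([0-9][0-9]*\)\..*/\1/p')
  [ -n "$major" ] && [ "$major" -ge "${2:-24}" ]
}
```

Typical use: docker_major_ok "$(docker --version)" || echo "Upgrade Docker before continuing". An unparseable string counts as a failure, so a missing Docker install is caught as well.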

Step 1 — Install Ollama

Ollama is the local model runtime that Open WebUI talks to. It downloads models (usually as pre-quantized builds), runs them, and exposes an inference API on port 11434.

Install with the Official Script

curl -fsSL https://ollama.com/install.sh | sh

The script detects your GPU and installs the appropriate CUDA/ROCm runtime. Once complete, Ollama runs as a systemd service.

Verify and Start Ollama

systemctl status ollama

# If not running:
sudo systemctl enable --now ollama

# Confirm the API is live
curl http://localhost:11434/api/version

You should see a JSON response with the version string. If the service fails to start, check journalctl -u ollama -n 50 for errors.
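
Right after a restart, the API can take a few seconds to come up, so a single curl may fail even though nothing is wrong. The sketch below polls the same /api/version endpoint until it answers. The function name, the retry count, and the one-second pause are illustrative choices, not Ollama defaults.

```shell
# Poll the Ollama version endpoint until it answers or the attempts run out.
# $1 = base URL (defaults to http://localhost:11434)
# $2 = number of attempts (defaults to 10)
wait_for_ollama() {
  url="${1:-http://localhost:11434}/api/version"
  tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f turns HTTP errors into a non-zero exit status
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

For example: wait_for_ollama || journalctl -u ollama -n 50.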

Pull Your First Models

Pull at least one model before connecting Open WebUI — it avoids an empty model list on first login:

# Fast, lightweight — good starting point
ollama pull llama3.2

# Better reasoning, needs ~8 GB VRAM
ollama pull llama3.1:8b

# Required for RAG / document search
ollama pull nomic-embed-text

# Confirm downloads
ollama list

Allow Remote Access (If Ollama Runs on a Separate Machine)

If Ollama and Open WebUI are on different servers, Ollama needs to listen on all interfaces:

sudo systemctl edit ollama
# Add these lines in the editor:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# Save and reload
sudo systemctl daemon-reload
sudo systemctl restart ollama

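On a same-host setup, the Open WebUI container reaches Ollama through host.docker.internal. On Linux that name resolves to the Docker bridge gateway address, not 127.0.0.1, so an Ollama that listens only on loopback (its default) refuses the connection. If the container cannot connect, apply the same OLLAMA_HOST override and block port 11434 from the outside with your firewall.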

Step 2 — Deploy Open WebUI with Docker

Open WebUI ships as a single Docker image. The right flags matter.

Same-Host Setup (Most Common)

Use --add-host=host.docker.internal:host-gateway so the container can resolve the host by name and reach the Ollama instance running there:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Remote Ollama Setup

Point directly at the remote machine's IP:

docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

First Login

Open http://your-server-ip:3000 in your browser. The first account created becomes admin — do this immediately. Whoever registers first owns the instance.

Step 3 — Production Docker Compose Setup

The single docker run command is fine for testing. For a setup you'll maintain, use Docker Compose with version-controlled config.

Project Directory and Compose File

mkdir ~/open-webui && cd ~/open-webui

Create docker-compose.yml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "127.0.0.1:3000:8080"   # localhost only — Nginx handles public traffic
    volumes:
      - open-webui-data:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - CORS_ALLOW_ORIGIN=https://ai.yourdomain.com
      # Uncomment to add OpenAI models alongside local ones:
      # - OPENAI_API_KEY=${OPENAI_API_KEY}

volumes:
  open-webui-data:

Create the .env file:

# .env — keep this out of version control
WEBUI_SECRET_KEY=replace-this-with-a-long-random-string
# OPENAI_API_KEY=sk-...
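
The placeholder secret needs to be replaced with something unguessable. Here is one way to generate it with tools present on a stock Ubuntu install. The function name is hypothetical, and 32 random bytes is an assumed, reasonable length rather than an Open WebUI requirement.

```shell
# Print a 64-character hex string built from 32 random bytes.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}
```

Run gen_secret and paste the output as the value of WEBUI_SECRET_KEY in .env. If OpenSSL is installed, openssl rand -hex 32 produces an equivalent value.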

Start the stack and follow the logs:

docker compose up -d
docker compose logs -f open-webui
  



Step 4 — HTTPS with Nginx and Let's Encrypt
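HTTP is fine on localhost. Anything internet-facing needs HTTPS: without TLS, logins and prompts cross the network in plaintext. Open WebUI also streams over WebSockets, and a page served over HTTPS can only open secure (wss://) WebSocket connections, so the proxy has to terminate TLS and forward the upgrade correctly.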

Install Nginx and Certbot

    
sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx
  

Nginx Site Configuration
Create /etc/nginx/sites-available/open-webui and replace ai.yourdomain.com with your actual domain:

    
server {
    listen 80;
    server_name ai.yourdomain.com;

    location / {
        proxy_pass         http://127.0.0.1:3000;
        proxy_http_version 1.1;

        # Required for WebSocket streaming
        proxy_set_header   Upgrade $http_upgrade;
        proxy_set_header   Connection "upgrade";

        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;

        # Generous timeouts — local inference can be slow
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;

        # Allow large file uploads for RAG documents
        client_max_body_size 50M;
    }
}
  


    
sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# Issue TLS certificate and auto-update Nginx config
sudo certbot --nginx -d ai.yourdomain.com
  

After Certbot completes, your instance is live at https://ai.yourdomain.com with auto-renewing certificates.


  Important: proxy_read_timeout 300s is not optional. CPU inference on large prompts can take minutes. Without a generous timeout, Nginx terminates the connection mid-response and the user sees a blank or partial reply.




Step 5 — Models, RAG, and Admin Configuration
With the stack running, here's what to configure in the UI to get real value.

Managing Models from the Admin Panel
Navigate to Admin Settings → Connections → Ollama → Manage (the wrench icon). You can search the Ollama model library and pull models directly from the UI — no terminal needed after the initial setup.
Recommended models by use case:

  llama3.2:3b — Fast responses, low VRAM, good for general Q&A
  llama3.1:8b — Stronger reasoning; needs ~8 GB VRAM
  mistral:7b — Excellent at code and structured output
  deepseek-r1:8b — Strong reasoning model with visible thinking traces
  nomic-embed-text — Embedding model for RAG; pull this regardless of other choices


Enabling RAG (Document Search)
RAG lets users upload PDFs, Word docs, and web pages and ask questions against them in chat. To configure it:

  Go to Admin Settings → Documents
  Set Embedding Engine to Ollama
  Set Embedding Model to nomic-embed-text
  Save — RAG is now fully local
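
Before relying on RAG, it helps to confirm that the embedding model actually answers. The sketch below sends one sentence to Ollama's /api/embed endpoint and checks that the reply contains an embeddings field. It assumes a reasonably recent Ollama release and the default port; the function names are hypothetical.

```shell
# Build the JSON body for an embedding request.
# Assumes the text contains no double quotes or backslashes.
# $1 = model name, $2 = input text
embed_payload() {
  printf '{"model":"%s","input":"%s"}\n' "$1" "$2"
}

# Send a test sentence to Ollama and succeed if the reply contains embeddings.
# $1 = model (defaults to nomic-embed-text)
# $2 = base URL (defaults to http://localhost:11434)
check_embeddings() {
  curl -fsS "${2:-http://localhost:11434}/api/embed" \
    -d "$(embed_payload "${1:-nomic-embed-text}" "hello world")" |
    grep -q '"embeddings"'
}
```

Run check_embeddings && echo "embedding model OK". A failure usually means the model was never pulled; ollama list shows what is installed.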


Connecting OpenAI Alongside Local Models
Open WebUI can proxy OpenAI models in the same interface as local ones. Go to Admin Settings → Connections → OpenAI API and enter your API key. Set the base URL to https://api.openai.com/v1.

User Access Control
By default, anyone who reaches the sign-up page can create an account. For a private deployment, go to Admin Settings → Users → Default User Role and set it to Pending. New registrations require manual admin approval.

Updating Open WebUI
The project ships updates frequently. Pull the latest image — your data persists in the named volume:

    
docker compose pull
docker compose up -d
  



Step 6 — Troubleshooting and Production Tips
These are the actual issues people hit. Most are fast fixes once you know the cause.

Problem: "Could Not Connect to Ollama" in the UI
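The container has its own network namespace, so localhost inside the container is not the host's localhost.
Fix: Confirm the run command includes --add-host=host.docker.internal:host-gateway and the env var is set to http://host.docker.internal:11434. On Linux, also confirm Ollama is not bound to 127.0.0.1 only: host.docker.internal points at the Docker bridge address, so Ollama has to listen on an address the container can reach (for example OLLAMA_HOST=0.0.0.0, with port 11434 firewalled from the internet). Verify from inside the container: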

    
docker exec -it open-webui curl http://host.docker.internal:11434/api/version
# Should return {"version":"..."}
# If this fails, Ollama isn't listening or the host mapping is wrong
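
To tell "Ollama isn't listening" apart from "Ollama is listening on the wrong address", look at the bind address on the host. This is a small sketch built on ss -ltn, which prints the local address in its fourth column. The function names and the three labels are made up for this example.

```shell
# Classify a local address as printed in the 4th column of `ss -ltn`.
classify_listen() {
  case "$1" in
    127.*:*|"[::1]":*) echo "loopback-only" ;;
    0.0.0.0:*|"*":*|"[::]":*) echo "all-interfaces" ;;
    *) echo "specific-address" ;;
  esac
}

# Report how the listener on a port is bound (prints nothing if none exists).
# $1 = port (defaults to 11434)
ollama_bind() {
  ss -ltn | awk -v port="${1:-11434}" '$4 ~ ":" port "$" { print $4 }' |
    while read -r addr; do
      classify_listen "$addr"
    done
}
```

If ollama_bind prints loopback-only, a container on the Docker bridge cannot reach the service. No output means nothing is listening on that port at all.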
  

Problem: Responses Cut Off Mid-Stream
This is almost always a proxy timeout. Local inference is slow — 300 words of output from a 7B model on CPU can take 2–3 minutes.
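Fix: Increase proxy_read_timeout in your Nginx config to at least 300s. For very large prompts or slow hardware, go to 600s. Nginx applies this timeout to the gap between two successive reads from the backend, so the risk is highest while the model is still processing a long prompt and has not emitted anything yet. Reload Nginx after changing it: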

    
sudo nginx -t && sudo systemctl reload nginx
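
To pick a timeout with some margin, estimate how long a reply takes in the worst case, where the proxy sees no data until generation finishes. This is a back-of-the-envelope sketch: the function name is hypothetical, and the ratio of four tokens per three words is a rough assumption for English text.

```shell
# Estimated seconds to generate a reply, using integer arithmetic.
# $1 = words of output
# $2 = generation speed in tokens per second
estimate_reply_seconds() {
  echo $(( $1 * 4 / 3 / $2 ))
}
```

estimate_reply_seconds 300 3 prints 133, and at 2 tokens per second it prints 200, which matches the 2 to 3 minute range mentioned above. If your own numbers land near the configured timeout, raise it.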
  

Problem: WebSocket Errors in the Browser Console
Streaming in Open WebUI uses WebSockets. Missing Upgrade and Connection headers in the Nginx config will break it. The config above includes them — double-check they're present and Nginx was reloaded after the change.

Problem: CORS Errors After Adding a Domain
When you put Nginx in front and access via a domain, you need to tell Open WebUI what origin is allowed. Set CORS_ALLOW_ORIGIN in your .env or docker-compose.yml environment block to match your public URL exactly (including https://), then restart the container.
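
"Exactly" is where most CORS mistakes hide: a trailing slash or a missing scheme is enough to break the match. A small sanity check for the value, assuming a plain hostname with an optional port (the function name is hypothetical):

```shell
# Succeeds if $1 looks like a bare origin: scheme://host[:port],
# with no path and no trailing slash.
is_bare_origin() {
  printf '%s\n' "$1" | grep -Eq '^https?://[A-Za-z0-9.-]+(:[0-9]+)?$'
}
```

is_bare_origin "https://ai.yourdomain.com" succeeds, while "https://ai.yourdomain.com/" and "ai.yourdomain.com" both fail.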

Problem: Out of Memory During Inference
Your model is too large for available VRAM or RAM. Options in order of impact:

  Switch to a lower quantization: ollama pull llama3.1:8b-instruct-q4_0
  Use a smaller base model (3b instead of 8b)
  Reduce context window size in the Ollama Modelfile
  Upgrade GPU VRAM
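
Reducing the context window means building a variant of the model with a smaller num_ctx parameter. The sketch below writes the two-line Modelfile for that. The function name, the value 2048, and the variant name in the usage note are illustrative assumptions; choose a context length that fits your prompts.

```shell
# Write a Modelfile that derives a smaller-context variant of a base model.
# $1 = base model tag
# $2 = context length in tokens
# $3 = output path (defaults to ./Modelfile)
write_modelfile() {
  cat > "${3:-Modelfile}" <<EOF
FROM $1
PARAMETER num_ctx $2
EOF
}
```

Then build the variant with write_modelfile llama3.1:8b 2048 && ollama create llama3.1-8b-2k -f Modelfile. The new model shows up in ollama list and in the Open WebUI model picker under that name.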


Tip: Inspect Container Logs Before Googling
Most issues surface immediately in the logs. Check here first:

    
# Live logs
docker logs --tail 100 -f open-webui

# Or via Compose
docker compose logs --tail 100 -f open-webui
  

Tip: Back Up Before Every Update
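All conversations, settings, and user accounts live in the Docker volume mounted at /app/backend/data. Compose prefixes volume names with the project name, so the open-webui-data volume from the Compose file is typically listed as open-webui_open-webui-data, while the docker run examples use a volume named open-webui. Confirm the name with docker volume ls, then back up everything with one command:

mkdir -p backups
docker run --rm \
  -v open-webui_open-webui-data:/data:ro \
  -v "$(pwd)/backups:/backup" \
  alpine tar czf "/backup/open-webui-$(date +%Y%m%d).tar.gz" -C /data .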
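
A backup is only useful if you can read it back. The sketch below finds the newest archive and checks that it can be listed, assuming the open-webui-YYYYMMDD.tar.gz naming used above. The function names are hypothetical.

```shell
# Print the newest archive in a backup directory.
# Relies on the YYYYMMDD suffix sorting chronologically.
# $1 = backup directory (defaults to ./backups)
latest_backup() {
  ls -1 "${1:-backups}"/open-webui-*.tar.gz 2>/dev/null | sort | tail -n 1
}

# Succeeds if at least one entry can be listed from the archive.
# $1 = path to a .tar.gz file
verify_backup() {
  [ -n "$1" ] && tar tzf "$1" | grep -q .
}
```

Run verify_backup "$(latest_backup)" && echo "backup readable" before pulling a new image.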
  

Tip: Limit Ollama Concurrency Under Heavy Load
By default, Ollama will try to serve concurrent requests, which can exhaust VRAM or RAM. If you're seeing OOM errors under multi-user load, limit parallel requests:

    
sudo systemctl edit ollama
# Add:
[Service]
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_NUM_PARALLEL=1"

sudo systemctl daemon-reload && sudo systemctl restart ollama
  



What You've Built
At the end of this Open WebUI setup guide, you have:

  A self-hosted ChatGPT alternative with a polished, full-featured UI
  Local LLM inference via Ollama — prompts never leave your server
  RAG for querying your own documents with local embeddings
  HTTPS via Nginx with Let's Encrypt auto-renewal
  Multi-user support with role-based access control
  Optional OpenAI passthrough — one interface for both local and cloud models
  A Docker Compose setup that's version-controllable and reproducible

This stack runs comfortably on a single mid-range server for personal use or small teams. The inference quality from a locally-run 8B model is genuinely useful for most developer tasks — summarisation, code review, Q&A over docs, and drafting.



Need Enterprise-Grade AI Infrastructure?
A single-server setup has a ceiling. When you're looking at multi-GPU inference, load balancing across model nodes, SSO and LDAP integration, audit logging, or deploying this inside a private cloud with compliance requirements — the architecture changes significantly.
The Sysbrix team has deployed production AI infrastructure across a range of environments. If you're evaluating self-hosted AI at scale and want to skip the trial-and-error phase, we're happy to help you design it right the first time.
