Your Own ChatGPT: A Practical Open WebUI Setup Guide for Self-Hosted AI

Deploy a private, self-hosted AI chat interface on your own server in under 30 minutes — full control over your data, models, and conversations.

Why Self-Host Your AI Chat Interface?

ChatGPT is convenient. It's also a black box that logs your conversations, rate-limits your free tier, and charges per token the moment you build anything serious. If you're working with sensitive data, building internal tools, or just want full control over your AI stack, there's a better path.

Open WebUI is one of the most polished self-hosted ChatGPT alternatives available today. It gives you a clean, feature-rich chat interface that connects to local models via Ollama, or to any OpenAI-compatible API. You get conversation history, document uploads with RAG, image generation, web search, multi-user support, and custom tools. All running on hardware you own.

This Open WebUI setup guide walks you through the complete path: prerequisites, Docker deployment, Nginx with HTTPS, model management, and the actual troubleshooting fixes you'll need. Not the happy path. The real path.


Prerequisites

Sort these before you start. Each missing piece causes a different failure mode that wastes time.

Hardware

  • CPU-only: Any x86-64 server or VPS with 8 GB+ RAM. Inference is slow — usable for light personal use, not teams.
  • GPU (recommended): NVIDIA GPU with 8 GB+ VRAM for comfortable 7B model inference. 16 GB+ VRAM opens up 13B models. Requires CUDA 12.x drivers.
  • Disk: 30 GB free minimum. A 7B Q4 model is ~4 GB; a 13B is ~8 GB. Plan accordingly.

Software

  • Ubuntu 22.04 or 24.04 (this guide uses Ubuntu)
  • Docker Engine 24+ and Docker Compose v2
  • Ollama (installed below)
  • A domain name with an A record pointed at your server — needed for HTTPS
  • Ports 3000 (Open WebUI), 11434 (Ollama), 80 and 443 (Nginx) available

Verify Docker

docker --version
docker compose version
# Both should return version strings — Docker 24+, Compose v2+
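
If you'd rather script the check than read version strings by eye, here is a minimal sketch. The helper name version_ge is hypothetical, and it assumes GNU sort (for the -V flag), which Ubuntu ships by default.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on GNU sort -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

if command -v docker >/dev/null 2>&1; then
  # Ask the Docker CLI for its own version string, e.g. 24.0.7
  docker_version="$(docker version --format '{{.Client.Version}}' 2>/dev/null)"
  if version_ge "$docker_version" 24; then
    echo "Docker $docker_version OK"
  else
    echo "Docker '$docker_version' is older than 24 (or could not be read)" >&2
  fi
else
  echo "docker not found on PATH" >&2
fi
```

The comparison is done on the whole dotted version, so 24.0.7 passes against a minimum of 24 while 23.0.1 does not.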

If Docker isn't installed, follow the official Docker install guide first.


Step 1 — Install Ollama

Ollama is the local model runtime that Open WebUI talks to. It handles model downloads, runs quantized models, and exposes an inference API on port 11434.

Install with the Official Script

curl -fsSL https://ollama.com/install.sh | sh

The script detects your GPU and installs the appropriate CUDA/ROCm runtime. Once complete, Ollama runs as a systemd service.

Verify and Start Ollama

systemctl status ollama

# If not running:
sudo systemctl enable --now ollama

# Confirm the API is live
curl http://localhost:11434/api/version

You should see a JSON response with the version string. If the service fails to start, check journalctl -u ollama -n 50 for errors.

Pull Your First Models

Pull at least one model before connecting Open WebUI — it avoids an empty model list on first login:

# Fast, lightweight — good starting point
ollama pull llama3.2

# Better reasoning, needs ~8 GB VRAM
ollama pull llama3.1:8b

# Required for RAG / document search
ollama pull nomic-embed-text

# Confirm downloads
ollama list
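
Before wiring up the UI, it can help to confirm that a pulled model actually answers over the API. The sketch below calls Ollama's /api/generate endpoint with streaming turned off. The function names generate_payload and ask_ollama are hypothetical helpers, and the JSON quoting is deliberately naive: it assumes the model name and prompt contain no double quotes or backslashes.

```shell
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for /api/generate.
# Naive quoting: arguments must not contain double quotes or backslashes.
generate_payload() {
  printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send one non-streaming prompt to Ollama and print the raw JSON reply.
ask_ollama() {
  curl -fsS "$OLLAMA_URL/api/generate" -d "$(generate_payload "$1" "$2")"
}

# Usage (requires a running Ollama with the model already pulled):
#   ask_ollama llama3.2 "Reply with the single word: ready"
```

A JSON object containing a "response" field means the model loaded and generated text. An error here points at Ollama itself rather than at Open WebUI.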

Allow Remote Access (If Ollama Runs on a Separate Machine)

If Ollama and Open WebUI are on different servers, Ollama needs to listen on all interfaces:

sudo systemctl edit ollama
# Add these lines in the editor:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"

# Save and reload
sudo systemctl daemon-reload
sudo systemctl restart ollama

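If both services run on the same host, you'll reach Ollama through host.docker.internal in Step 2. Note that Ollama binds to 127.0.0.1 by default, and on Linux host.docker.internal resolves to the Docker bridge gateway rather than the host's loopback, so you may still need this override if the container can't connect. Whenever Ollama listens on 0.0.0.0, firewall port 11434 from the public internet: the Ollama API has no built-in authentication.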


Step 2 — Deploy Open WebUI with Docker

Open WebUI ships as a single Docker image. The right flags matter.

Same-Host Setup (Most Common)

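Use --add-host=host.docker.internal:host-gateway so the container can resolve the host by name. On Linux this name points at the Docker bridge gateway, not the host's loopback, so Ollama must listen on an address reachable from the bridge (see the OLLAMA_HOST override in Step 1):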

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://host.docker.internal:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

Remote Ollama Setup

Point directly at the remote machine's IP:

docker run -d \
  -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://192.168.1.50:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main

First Login

Open http://your-server-ip:3000 in your browser. The first account created becomes admin — do this immediately. Whoever registers first owns the instance.


Step 3 — Production Docker Compose Setup

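The single docker run command is fine for testing. For a setup you'll maintain, use Docker Compose with version-controlled config. If the test container from Step 2 is still running, remove it first with docker rm -f open-webui, since the Compose service reuses the same container name and host port. Compose also creates its own named volume, so an admin account created during testing does not carry over.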

Project Directory and Compose File

mkdir ~/open-webui && cd ~/open-webui

Create docker-compose.yml:

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: always
    ports:
      - "127.0.0.1:3000:8080"   # localhost only — Nginx handles public traffic
    volumes:
      - open-webui-data:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - CORS_ALLOW_ORIGIN=https://ai.yourdomain.com
      # Uncomment to add OpenAI models alongside local ones:
      # - OPENAI_API_KEY=${OPENAI_API_KEY}

volumes:
  open-webui-data:

Create the .env file:

# .env — keep this out of version control
WEBUI_SECRET_KEY=replace-this-with-a-long-random-string
# OPENAI_API_KEY=sk-...
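
The placeholder secret should be replaced with a real random value. One way to produce it, as a sketch: the helper name gen_secret is hypothetical, and the 32-byte length is an assumption rather than a documented Open WebUI requirement.

```shell
# Print 64 hex characters (32 random bytes).
# Prefers openssl; falls back to /dev/urandom if openssl is not installed.
gen_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

secret="$(gen_secret)"

# Print a ready-to-paste line for the .env file
printf 'WEBUI_SECRET_KEY=%s\n' "$secret"
```

Paste the printed line into .env in place of the placeholder, and keep the file out of version control as noted in its header comment.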

Start the stack and follow the logs:

docker compose up -d
docker compose logs -f open-webui
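
The first start can take a while because the container initializes its data directory. Instead of watching the logs, you can poll until the service responds. This is a minimal sketch: wait_for_http is a hypothetical helper, and it assumes Open WebUI answers on a /health path behind the 127.0.0.1:3000 mapping from the Compose file.

```shell
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Polls URL until it returns a successful HTTP status or attempts run out.
wait_for_http() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "up after $i attempt(s): $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up after $attempts attempt(s): $url" >&2
  return 1
}

# Usage (run after `docker compose up -d`):
#   wait_for_http http://127.0.0.1:3000/health 30 2
```

If the loop gives up, the container logs are the next place to look.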

Step 4 — HTTPS with Nginx and Let's Encrypt

HTTP is fine on localhost. Anything internet-facing needs HTTPS: over plain HTTP, passwords and conversations travel in cleartext, and browser features that require a secure context, such as microphone access for voice input, are unavailable.

Install Nginx and Certbot

sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx

Nginx Site Configuration

Create /etc/nginx/sites-available/open-webui and replace ai.yourdomain.com with your actual domain:

server {
    listen 80;
    server_name ai.yourdomain.com;

    location / {
        proxy_pass         http://127.0.0.1:3000;
        proxy_http_version 1.1;

        # Required for WebSocket streaming
        proxy_set_header   Upgrade $http_upgrade;
        proxy_set_header   Connection "upgrade";

        proxy_set_header   Host $host;
        proxy_set_header   X-Real-IP $remote_addr;
        proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header   X-Forwarded-Proto $scheme;

        # Generous timeouts — local inference can be slow
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;

        # Allow large file uploads for RAG documents
        client_max_body_size 50M;
    }
}

Enable the site and reload Nginx:

sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

# Issue TLS certificate and auto-update Nginx config
sudo certbot --nginx -d ai.yourdomain.com

After Certbot completes, your instance is live at https://ai.yourdomain.com with auto-renewing certificates.

Important: proxy_read_timeout 300s is not optional. CPU inference on large prompts can take minutes. Without a generous timeout, Nginx terminates the connection mid-response and the user sees a blank or partial reply.


Step 5 — Models, RAG, and Admin Configuration

With the stack running, here's what to configure in the UI to get real value.

Managing Models from the Admin Panel

Navigate to Admin Settings → Connections → Ollama → Manage (the wrench icon). You can search the Ollama model library and pull models directly from the UI — no terminal needed after the initial setup.

Recommended models by use case:

  • llama3.2:3b — Fast responses, low VRAM, good for general Q&A
  • llama3.1:8b — Stronger reasoning; needs ~8 GB VRAM
  • mistral:7b — Excellent at code and structured output
  • deepseek-r1:8b — Strong reasoning model with visible thinking traces
  • nomic-embed-text — Embedding model for RAG; pull this regardless of other choices

Enabling RAG (Document Search)

RAG lets users upload PDFs, Word docs, and web pages and ask questions against them in chat. To configure it:

  1. Go to Admin Settings → Documents
  2. Set Embedding Engine to Ollama
  3. Set Embedding Model to nomic-embed-text
  4. Save — RAG is now fully local

Connecting OpenAI Alongside Local Models

Open WebUI can proxy OpenAI models in the same interface as local ones. Go to Admin Settings → Connections → OpenAI API and enter your API key. Set the base URL to https://api.openai.com/v1.

User Access Control

By default, anyone who reaches the sign-up page can create an account. For a private deployment, go to Admin Settings → Users → Default User Role and set it to Pending. New registrations require manual admin approval.

Updating Open WebUI

The project ships updates frequently. Pull the latest image — your data persists in the named volume:

docker compose pull
docker compose up -d

Step 6 — Troubleshooting and Production Tips

These are the actual issues people hit. Most are fast fixes once you know the cause.

Problem: "Could Not Connect to Ollama" in the UI

The container can't reach the host's loopback. It has its own network namespace — localhost inside the container is not the host's localhost.

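Fix: Confirm the run command includes --add-host=host.docker.internal:host-gateway and the env var is set to http://host.docker.internal:11434. If both are correct and the check below still fails, Ollama is probably listening on 127.0.0.1 only; set OLLAMA_HOST=0.0.0.0 as described in Step 1 and restart it. Verify from inside the container: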

docker exec -it open-webui curl http://host.docker.internal:11434/api/version
# Should return {"version":"..."}
# If this fails, Ollama isn't listening or the host mapping is wrong

Problem: Responses Cut Off Mid-Stream

This is almost always a proxy timeout. Local inference is slow — 300 words of output from a 7B model on CPU can take 2–3 minutes.

Fix: Increase proxy_read_timeout in your Nginx config to at least 300s. For very large prompts or slow hardware, go to 600s. Reload Nginx after changing it:

sudo nginx -t && sudo systemctl reload nginx

Problem: WebSocket Errors in the Browser Console

Streaming in Open WebUI uses WebSockets. Missing Upgrade and Connection headers in the Nginx config will break it. The config above includes them — double-check they're present and Nginx was reloaded after the change.

Problem: CORS Errors After Adding a Domain

When you put Nginx in front and access via a domain, you need to tell Open WebUI what origin is allowed. Set CORS_ALLOW_ORIGIN in your .env or docker-compose.yml environment block to match your public URL exactly (including https://), then restart the container.

Problem: Out of Memory During Inference

Your model is too large for available VRAM or RAM. Options in order of impact:

  • Switch to a lower quantization: ollama pull llama3.1:8b-instruct-q4_0
  • Use a smaller base model (3b instead of 8b)
  • Reduce context window size in the Ollama Modelfile
  • Upgrade GPU VRAM
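
For the context-window option, a smaller window can be baked into a derived model with a Modelfile. The sketch below writes one and registers it. The file name Modelfile.ctx2k, the model name llama3.1-8b-ctx2k, and the value 2048 are all illustrative choices, not recommendations from the Ollama project.

```shell
MODELFILE="${MODELFILE:-./Modelfile.ctx2k}"   # illustrative path

# Derive a variant of llama3.1:8b with a smaller context window.
# num_ctx is the context length in tokens; 2048 is an example value.
cat > "$MODELFILE" <<'EOF'
FROM llama3.1:8b
PARAMETER num_ctx 2048
EOF

if command -v ollama >/dev/null 2>&1; then
  # Register the variant under a new (illustrative) name
  ollama create llama3.1-8b-ctx2k -f "$MODELFILE"
else
  echo "ollama not found; Modelfile written to $MODELFILE" >&2
fi
```

The derived model then appears in ollama list and in the Open WebUI model picker alongside the original. A smaller window uses less memory but truncates long conversations and documents sooner.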

Tip: Inspect Container Logs Before Googling

Most issues surface immediately in the logs. Check here first:

# Live logs
docker logs open-webui --tail 100 -f

# Or via Compose
docker compose logs open-webui --tail 100 -f

Tip: Back Up Before Every Update

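All conversations, settings, and user accounts live in the Docker volume. With the Compose setup from Step 3, Compose prefixes the volume name with the project name (the directory name by default), so the volume is open-webui_open-webui-data rather than open-webui-data; the plain docker run setup from Step 2 uses open-webui. Confirm the name with docker volume ls first, because mounting a name that doesn't exist silently creates an empty volume and backs up nothing. One command backs up everything:

docker run --rm \
  -v open-webui_open-webui-data:/data:ro \
  -v "$(pwd)/backups":/backup \
  alpine tar czf "/backup/open-webui-$(date +%Y%m%d).tar.gz" /data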
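
A backup is only useful if the archive can be read back. As a quick sanity check, the following sketch lists the archive and fails if it is missing, empty, or unreadable. verify_backup is a hypothetical helper name.

```shell
# verify_backup ARCHIVE: succeeds if the .tar.gz exists and lists at least one entry.
verify_backup() {
  archive="$1"
  if [ ! -s "$archive" ]; then
    echo "missing or empty: $archive" >&2
    return 1
  fi
  # Count entries; a corrupt archive lists nothing
  entries="$(tar tzf "$archive" 2>/dev/null | wc -l)"
  if [ "$entries" -gt 0 ]; then
    echo "ok: $archive ($entries entries)"
  else
    echo "unreadable archive: $archive" >&2
    return 1
  fi
}

# Usage:
#   verify_backup "backups/open-webui-$(date +%Y%m%d).tar.gz"
```

An archive with only one or two entries is a hint that the wrong (empty) volume was mounted.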

Tip: Limit Ollama Concurrency Under Heavy Load

By default, Ollama will try to serve concurrent requests, which can exhaust VRAM or RAM. If you're seeing OOM errors under multi-user load, limit parallel requests:

sudo systemctl edit ollama
# Add:
[Service]
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_NUM_PARALLEL=1"

sudo systemctl daemon-reload && sudo systemctl restart ollama

What You've Built

At the end of this Open WebUI setup guide, you have:

  • A self-hosted ChatGPT alternative with a polished, full-featured UI
  • Local LLM inference via Ollama — prompts never leave your server
  • RAG for querying your own documents with local embeddings
  • HTTPS via Nginx with Let's Encrypt auto-renewal
  • Multi-user support with role-based access control
  • Optional OpenAI passthrough — one interface for both local and cloud models
  • A Docker Compose setup that's version-controllable and reproducible

This stack runs comfortably on a single mid-range server for personal use or small teams. The inference quality from a locally-run 8B model is genuinely useful for most developer tasks — summarisation, code review, Q&A over docs, and drafting.


Need Enterprise-Grade AI Infrastructure?

A single-server setup has a ceiling. When you're looking at multi-GPU inference, load balancing across model nodes, SSO and LDAP integration, audit logging, or deploying this inside a private cloud with compliance requirements — the architecture changes significantly.

The Sysbrix team has deployed production AI infrastructure across a range of environments. If you're evaluating self-hosted AI at scale and want to skip the trial-and-error phase, we're happy to help you design it right the first time.

Talk to Us About Enterprise AI Deployment →
