Skip to Content

Open WebUI Setup Guide: Run Your Own Private ChatGPT on Any Server

Deploy Open WebUI with Ollama and Docker Compose on your own VPS — and get a fully private, self-hosted AI chat interface running in under 30 minutes.

Why Self-Host an AI Chat Interface?

ChatGPT sends every message you type to OpenAI's servers. For personal projects that's fine. For anything involving client data, proprietary code, internal documents, or regulated information — it's a liability. And even for personal use, you're at the mercy of rate limits, subscription costs, and models you don't control.

Open WebUI solves this. It's a polished, feature-rich chat interface — think ChatGPT UI, but running entirely on your own hardware. Pair it with Ollama, which handles local model management, and you have a private AI stack that rivals any SaaS offering. No data leaves your server. No monthly bill that scales with usage. No vendor lock-in.

This guide walks you through the full setup: Docker Compose, persistent storage, model management, OpenAI API passthrough for cloud fallback, HTTPS behind Traefik, and production hardening.

Prerequisites

Before you start, have the following ready:

  • A server or VPS — Ubuntu 22.04/24.04 LTS recommended. For running local models comfortably:
    • CPU-only: 8+ GB RAM, 40+ GB storage (models are large)
    • GPU-accelerated: An Nvidia or AMD GPU with 8+ GB VRAM; CPU handling is significantly slower for larger models
    • Good CPU-only starting point: Hetzner CX32 (4 vCPU, 8 GB RAM) or equivalent
  • Docker Engine 24+ and Docker Compose v2 — the docker compose plugin (not the legacy docker-compose binary).
  • A domain name with an A record pointed at your server — needed for HTTPS. E.g., chat.yourdomain.com.
  • Ports 80 and 443 open on your firewall for Traefik and Let's Encrypt.
  • Basic Linux and Docker familiarity — you should be comfortable editing YAML and running shell commands.

No GPU? No problem. Ollama runs on CPU. Smaller models (3B–8B parameters) are perfectly usable on a modern CPU server, just slower. For most chat and summarisation tasks, an 8B model on CPU is more than good enough.

Step 1: Install Ollama on Your Server

Ollama is the local model runner that Open WebUI talks to. It handles model downloads, quantization management, and inference. Install it directly on the host first — this keeps the model files on the host filesystem and makes them easy to manage separately from the Open WebUI container.

# Install Ollama (works on Linux, macOS, and WSL2)
curl -fsSL https://ollama.com/install.sh | sh

# Verify installation
ollama --version

# Ollama starts automatically as a systemd service on Linux
# It listens on http://localhost:11434 by default
systemctl status ollama

Now pull your first model. Start with something small enough to run comfortably on your hardware:

# Great starting models for CPU-only servers (8 GB RAM)
ollama pull llama3.2          # Meta's Llama 3.2 3B — fast, capable, great for general chat
ollama pull mistral           # Mistral 7B — excellent reasoning, widely used
ollama pull qwen2.5:7b        # Alibaba Qwen 2.5 7B — strong on code and multilingual

# For servers with 16+ GB RAM or a GPU
ollama pull llama3.1:8b       # Llama 3.1 8B — excellent all-rounder
ollama pull deepseek-r1:8b    # DeepSeek R1 8B — strong reasoning model
ollama pull gemma3:12b        # Google Gemma 3 12B — needs ~8 GB VRAM or 16 GB RAM

# List downloaded models
ollama list

Model files are stored at /usr/share/ollama/.ollama/models on Linux by default. Make sure you have enough disk space before pulling large models — a 7B model typically takes 4–5 GB, a 13B model takes 7–8 GB.

Step 2: Deploy Open WebUI with Docker Compose

With Ollama running on the host, deploy Open WebUI as a Docker container that connects to it.

Project Structure

open-webui/
├── docker-compose.yml
├── .env
└── data/           # Open WebUI persistent data (auto-created)

The .env File

# .env — do NOT commit this file

# The public URL Open WebUI will be served at
WEBUI_URL=https://chat.yourdomain.com

# A long random string — used to sign sessions and JWT tokens
# Generate one with: openssl rand -hex 32
WEBUI_SECRET_KEY=replace_me_with_a_long_random_string

# Ollama runs on the host, so use the Docker host gateway address
OLLAMA_BASE_URL=http://host.docker.internal:11434

# Optional: connect to OpenAI for cloud model fallback
# OPENAI_API_KEY=sk-...
# OPENAI_API_BASE_URL=https://api.openai.com/v1

docker-compose.yml

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    extra_hosts:
      # Lets the container resolve "host.docker.internal" to the host IP
      # so it can reach Ollama running on the host
      - host.docker.internal:host-gateway
    volumes:
      - ./data:/app/backend/data
    environment:
      OLLAMA_BASE_URL: ${OLLAMA_BASE_URL}
      WEBUI_SECRET_KEY: ${WEBUI_SECRET_KEY}
      WEBUI_URL: ${WEBUI_URL}
      # Uncomment to disable public signup (recommended for production)
      # ENABLE_SIGNUP: "false"
      # WEBUI_AUTH: "true"
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.open-webui.rule=Host(`chat.yourdomain.com`)"
      - "traefik.http.routers.open-webui.entrypoints=websecure"
      - "traefik.http.routers.open-webui.tls.certresolver=letsencrypt"
      - "traefik.http.services.open-webui.loadbalancer.server.port=8080"
    networks:
      - traefik-net  # must match your Traefik external network

networks:
  traefik-net:
    external: true

Start it:

docker compose up -d

# Follow startup logs
docker logs open-webui -f --tail 50

# Confirm the container is running
docker ps --filter name=open-webui

Once running, open https://chat.yourdomain.com in your browser. The first account you create becomes the admin. From there you control model access, user signups, and the full admin panel.

CPU-Only Alternative: Bundled Ollama Container

If you'd rather keep everything in Docker without installing Ollama on the host, use the bundled :ollama image tag. This packages Ollama and Open WebUI together in a single container:

# All-in-one: Open WebUI + Ollama in one container (CPU only)
docker run -d \
  --name open-webui \
  -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  -e WEBUI_SECRET_KEY=your_secret_key_here \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

# With Nvidia GPU passthrough
docker run -d \
  --name open-webui \
  --gpus all \
  -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  -e WEBUI_SECRET_KEY=your_secret_key_here \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

Step 3: Connect Models and Configure the Admin Panel

Once you're logged in as admin, head to Admin Panel → Settings to configure your stack.

Verify Ollama Connection

Go to Admin Panel → Settings → Connections. You should see your Ollama instance listed and marked as connected. Any models you've already pulled with ollama pull appear here automatically. You can also pull new models directly from the UI without touching the terminal — go to Admin Panel → Models → Pull a model from Ollama.com and type a model name.

Add OpenAI or Compatible API (Optional)

Open WebUI supports any OpenAI-compatible API alongside local models. This lets you use GPT-4o, Claude (via a proxy), or any other hosted model from the same interface. Add it under Admin Panel → Settings → Connections → OpenAI API:

# Environment variables for OpenAI API fallback
# Set these in your .env and redeploy, or configure via the Admin Panel UI

OPENAI_API_KEY=sk-your-openai-api-key
OPENAI_API_BASE_URL=https://api.openai.com/v1

# You can also point this at any OpenAI-compatible endpoint:
# - Groq:      https://api.groq.com/openai/v1
# - Together:  https://api.together.xyz/v1
# - Ollama:    http://localhost:11434/v1
# - LM Studio: http://localhost:1234/v1

Control User Signups

By default Open WebUI allows anyone who can reach the URL to create an account. For a private deployment, disable public signups from Admin Panel → Settings → General → Enable New Sign Ups. You can then invite users individually. Alternatively, set ENABLE_SIGNUP=false as an environment variable from the start.

Step 4: Key Features Worth Enabling

Open WebUI ships with a serious feature set out of the box. Here are the ones that change how you work with it.

RAG — Chat with Your Documents

Upload PDFs, text files, Word documents, or paste URLs and Open WebUI will chunk, embed, and index them. In any chat, use the 📎 attachment button to attach a file, or type # to reference a document from your library. The model can then answer questions grounded in that content.

For best RAG results, increase the context length on your Ollama model under Admin Panel → Models → [model] → Advanced Parameters → Context Length. Set it to at least 8192 — the default 2048 is too short for document-grounded answers.

Web Search

Enable live web search under Admin Panel → Settings → Web Search. Supports SearXNG (self-hosted), Brave Search API, Google PSE, and others. Once enabled, users can toggle web search on in any conversation with the 🌐 button.

Image Generation

Connect AUTOMATIC1111 or ComfyUI under Admin Panel → Settings → Images to add image generation inside the chat interface. If you don't have a local image generation stack, you can connect OpenAI's DALL-E API instead.

Workspace — Prompts, Models, and Knowledge Bases

The Workspace section (left sidebar) lets you create and share:

  • Custom models — Preconfigured Ollama or API models with system prompts, temperature settings, and a description. Essentially, saved personas.
  • Prompts — Reusable prompt templates accessible with / shortcuts in any chat.
  • Knowledge — Named document collections that can be attached to any model as a persistent knowledge base.

Step 5: Production Hardening

A default Open WebUI install is functional but needs a few changes before it's production-ready.

WebSocket Support

Open WebUI requires WebSocket connections for live streaming responses. Make sure your Traefik configuration doesn't strip upgrade headers. Add this middleware to your Traefik labels if you're seeing streaming issues:

# Add to your Traefik static config or as a dynamic config file
# This ensures WebSocket upgrade headers pass through correctly
http:
  middlewares:
    websocket-headers:
      headers:
        customRequestHeaders:
          X-Forwarded-Proto: https
        # Traefik handles WS upgrade automatically, but
        # ensure your entrypoint isn't stripping Connection headers

Pin the Image Version

Never run :main in production — it's a floating tag and updates can break things silently. Pin to a specific release:

# In docker-compose.yml, pin to a specific version
image: ghcr.io/open-webui/open-webui:v0.9.6

# Or for Nvidia GPU support:
image: ghcr.io/open-webui/open-webui:v0.9.6-cuda

# Check the latest releases at:
# https://github.com/open-webui/open-webui/releases

Production Hardening Checklist

  • Set a strong WEBUI_SECRET_KEY — Generate with openssl rand -hex 32. Changing this after users are logged in will invalidate all sessions.
  • Disable public signups — Set ENABLE_SIGNUP=false or turn it off in the Admin Panel once you've created accounts for your team.
  • Back up ./data — Everything — users, conversations, documents, model configs, knowledge bases — lives in this directory. Back it up regularly. A simple rsync or snapshot of the volume is sufficient.
  • Restrict Ollama to localhost — By default Ollama listens on 127.0.0.1:11434. Keep it that way. Do not expose Ollama directly to the internet.
  • Set resource limits — Add Docker CPU/memory limits to prevent inference from starving other services on the same host.
  • Keep Ollama and Open WebUI updated — Pull the latest images regularly, but always test in staging first when pinning versions.

Tips and Troubleshooting

"Could not connect to Ollama" in the Admin Panel

The most common cause is host.docker.internal not resolving correctly. This special hostname routes from inside the container to the Docker host's network. Verify:

# Test from inside the container
docker exec open-webui curl -s http://host.docker.internal:11434/api/tags

# Should return a JSON list of your models
# If it fails, verify:
# 1. Ollama is running: systemctl status ollama
# 2. Ollama is listening on 0.0.0.0 (or at least 127.0.0.1)
# 3. Your docker-compose.yml has the extra_hosts line
sudo netstat -tlnp | grep 11434

Slow inference on CPU

CPU inference is limited by RAM bandwidth and thread count. Set the OLLAMA_NUM_PARALLEL environment variable to match the number of concurrent users you expect. Setting it too high degrades per-request speed. For a single-user or small-team setup, OLLAMA_NUM_PARALLEL=1 gives the best single-chat performance. Also try smaller quantized models — llama3.2:3b is often fast enough for most tasks and runs well on 4 GB RAM.

Streaming stops mid-response

This is almost always a WebSocket timeout at the proxy layer. Traefik's default timeout is sufficient for most requests, but very long responses can be cut off if a custom timeout is set. Check your Traefik middleware config for any responseHeaderTimeout or readTimeout values that might be too short for slow CPU inference.

Users can't log in after a secret key change

WEBUI_SECRET_KEY is used to sign session tokens. If you rotate this key, all existing sessions are invalidated and users will see a 401 / "invalid session" error. This is expected — they just need to log in again. If it happens unexpectedly, check that the env var is being read correctly in your Compose stack (docker compose config shows the resolved values).

Model doesn't appear in the chat dropdown

After pulling a new model with ollama pull, Open WebUI refreshes the model list automatically on the next page load. If it still doesn't appear, go to Admin Panel → Models and click the refresh icon next to the Ollama connection. Also check that the model wasn't hidden — models can be toggled visible/hidden per-user or globally from the Admin Panel.

Out of disk space — models are huge

LLM weights take serious storage. A 7B model is 4–5 GB, a 13B model is 7–8 GB, and a 70B model can be 40 GB+. Keep an eye on usage:

# Check how much space your models are using
ollama list

# Get the model storage directory size
du -sh /usr/share/ollama/.ollama/models/

# Remove a model you no longer use
ollama rm codellama
ollama rm llama2

# Check container and volume disk usage
docker system df
docker volume ls

HTTPS cert not provisioning / mixed content errors

Check that your DNS A record has propagated, port 80 is open (Let's Encrypt HTTP-01 challenge requires it), and your WEBUI_URL matches your actual domain with https://. A mismatched WEBUI_URL causes Open WebUI to generate internal links using the wrong protocol, which results in mixed-content browser errors.

What to Do Next

You now have a private, self-hosted AI chat interface that doesn't phone home. Here's where to take it from here:

  • Try different models — Ollama's library at ollama.com/library has hundreds of models. DeepSeek R1 for reasoning, Qwen2.5-Coder for code, Phi-4 Mini for lightweight speed — experiment with what fits your use case.
  • Build a knowledge base — Upload your documentation, runbooks, or research papers into Open WebUI's Knowledge feature and query them with any model. It's a genuinely useful internal AI search tool.
  • Add Pipelines for custom logic — Open WebUI's Pipelines feature lets you intercept and transform messages before they hit the model. Use it for content filtering, custom RAG, rate limiting, logging, or function calling.
  • Connect a web search backend — Self-host SearXNG alongside Open WebUI to give your AI access to live web results without routing search queries through any third-party service.
  • Set up multiple users — Open WebUI has full multi-user support with role-based access. Create accounts for your team, assign model visibility per-role, and manage usage from the Admin Panel.
  • Use Open WebUI as an OpenAI-compatible API — Open WebUI exposes its own API that any OpenAI client can talk to. Generate an API key under Settings → Account → API Keys and point your scripts, tools, or CI pipelines at your own instance instead of OpenAI.

Need Help Deploying This at Scale?

Self-hosting Open WebUI for personal use is straightforward. Deploying it for a team — with SSO integration, shared knowledge bases, GPU infrastructure, access controls, and ongoing maintenance — is a different project. If you need enterprise-grade private AI infrastructure, Sysbrix can help you design and operate it.

Talk to our team → We help engineering teams run private, production-ready AI stacks on their own infrastructure — no usage fees, no data leaving your network.

Keycloak Docker Setup Guide: Self-Hosted SSO and Auth That Doesn't Make You Want to Quit
Set up Keycloak on your own server with Docker Compose, PostgreSQL, and a reverse proxy — and walk away with a working SSO stack you actually understand.