
Open WebUI Setup Guide: Deploy a Private ChatGPT Interface That Runs on Your Own Hardware

Learn how to install Open WebUI with Docker, connect it to local Ollama models and cloud APIs, configure multi-user access, and give your team a polished private AI chat platform in under an hour.

Every message you send to ChatGPT leaves your network. For internal documents, customer data, proprietary code, and sensitive business queries — that's a problem. Open WebUI gives your team a full-featured, ChatGPT-quality chat interface that runs entirely on your own server. It works with local models via Ollama, cloud APIs via OpenAI or Anthropic, and any OpenAI-compatible endpoint. This Open WebUI setup guide gets you from zero to a working private AI platform your team can use daily.


Prerequisites

  • A Linux server or local machine (Ubuntu 20.04+ recommended)
  • Docker Engine and Docker Compose v2 installed
  • At least 2GB RAM for Open WebUI itself — add 4GB+ per local model if running Ollama on the same machine
  • Port 3000 available (or any port you choose)
  • Ollama installed for local models, or an API key for a cloud LLM provider
  • A domain name for team access (optional for local use, recommended for production)

Verify Docker is ready and check system resources:

docker --version
docker compose version
free -h
df -h /

# Check if Ollama is running (if using local models):
curl -s http://localhost:11434/api/tags | jq '[.models[].name]'

# Check port 3000 is free:
sudo ss -tlnp | grep 3000

Installing Open WebUI

Option 1: Quick Start with Bundled Ollama

If you want everything in one container — Ollama and the Open WebUI frontend — this is the fastest path to a working setup:

# All-in-one: Open WebUI + Ollama bundled (CPU only)
docker run -d \
  -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama

# With NVIDIA GPU support:
docker run -d \
  --gpus all \
  -p 3000:8080 \
  -v ollama:/root/.ollama \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:cuda

# Open http://localhost:3000 — create your admin account on first visit

Option 2: Docker Compose with External Ollama (Recommended)

If Ollama is already running on your host, connect Open WebUI to it via Compose. This gives each service its own lifecycle and makes upgrades cleaner:

mkdir -p ~/open-webui && cd ~/open-webui

cat > docker-compose.yml << 'EOF'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    environment:
      # Ollama running on host — use bridge IP on Linux
      - OLLAMA_BASE_URL=http://172.17.0.1:11434
      # Optional: pre-configure OpenAI or any compatible API
      - OPENAI_API_BASE_URL=${OPENAI_API_BASE_URL:-}
      - OPENAI_API_KEY=${OPENAI_API_KEY:-}
      # Security
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - WEBUI_NAME=Company AI
      # Allow signup during initial team onboarding:
      - ENABLE_SIGNUP=true
      # Set to false after team registers to lock down:
      # - ENABLE_SIGNUP=false
    volumes:
      - open_webui_data:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  open_webui_data:
EOF

# Create .env file with secrets:
cat > .env << EOF
WEBUI_SECRET_KEY=$(openssl rand -hex 32)
OPENAI_API_KEY=sk-your-openai-key-if-needed
OPENAI_API_BASE_URL=https://api.openai.com/v1
EOF

docker compose up -d
docker compose logs -f open-webui

Watch for "Application startup complete" in the logs. Then open http://localhost:3000. The first account you create is automatically the admin.
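Before inviting anyone, a quick scripted check confirms the backend is actually answering. Open WebUI exposes a lightweight health endpoint in current releases (the path may differ on older versions):

```shell
# Confirm the backend responds — a healthy instance returns a small JSON status body:
curl -s http://localhost:3000/health

# Confirm the container stayed up after startup (no restart loop):
docker ps --filter name=open-webui --format '{{.Names}}: {{.Status}}'
```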


Connecting Models and Providers

Pulling Local Models via Ollama

In the Open WebUI interface, go to Admin Panel → Settings → Models → Pull a model from Ollama.com. Enter a model name and click Pull. The download progress appears in real time — no terminal required.

Alternatively, pull models from the host terminal:

# Pull models directly on the host:
ollama pull llama3.2        # Fast 3B — runs on almost anything
ollama pull llama3.1:8b     # Good quality, needs ~6GB RAM
ollama pull mistral         # Strong for coding and reasoning
ollama pull nomic-embed-text  # Required for document RAG

# Verify models are available:
ollama list

# Check Ollama is reachable from inside the Open WebUI container:
docker exec open-webui curl -s http://172.17.0.1:11434/api/tags | \
  python3 -m json.tool | head -20

# If 172.17.0.1 doesn't work, find the correct bridge IP:
ip addr show docker0 | grep 'inet ' | awk '{print $2}' | cut -d/ -f1

Connecting OpenAI or Any Compatible API

Go to Admin Panel → Settings → Connections → OpenAI API. Add your API base URL and key. Open WebUI immediately lists all available models from that endpoint alongside your Ollama models in the model selector dropdown.

This works with OpenAI, Anthropic (which exposes an OpenAI-compatible endpoint), LiteLLM, and any other compatible endpoint:

# For LiteLLM proxy (routes to multiple providers with one API key):
# API Base URL: http://litellm:4000/v1
# API Key: your-litellm-master-key

# For Anthropic's OpenAI-compatible endpoint:
# API Base URL: https://api.anthropic.com/v1
# API Key: sk-ant-your-key

# For Azure OpenAI:
# API Base URL: https://your-resource.openai.azure.com/openai/deployments/your-deployment
# API Key: your-azure-key

# Test that models are visible after adding the connection:
curl http://localhost:3000/api/models \
  -H 'Authorization: Bearer YOUR_OPEN_WEBUI_API_KEY' | \
  jq '[.data[].id]'

User Management and Access Control

Managing Team Accounts

Go to Admin Panel → Users to manage all registered accounts. Key actions available:

  • Promote to admin — give another user full administrative access
  • Set role — Admin, User, or Pending (requires admin approval before access)
  • Suspend — block access without deleting conversation history
  • Delete — permanently remove account and all associated chats
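For a periodic audit outside the UI, accounts can also be read straight from Open WebUI's SQLite database in the data volume. This is a sketch under assumptions: it assumes the default `webui.db` path and a `user` table with `name`, `email`, and `role` columns — verify the schema on your version before relying on it:

```shell
# Copy the database out of the container (avoids needing sqlite3 inside it):
docker cp open-webui:/app/backend/data/webui.db /tmp/webui.db

# List every registered account and its role:
sqlite3 /tmp/webui.db 'SELECT name, email, role FROM user;'

# Clean up the copy when done:
rm /tmp/webui.db
```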

Locking Down Registration

After your team has signed up, disable open registration so no one else can create accounts:

# Option 1: Disable signup via environment variable (recommended):
# Update docker-compose.yml environment section:
# - ENABLE_SIGNUP=false
docker compose up -d --force-recreate open-webui

# Option 2: Require admin approval for new signups:
# In Admin Panel → Settings → General:
# Set "Default User Role" to "pending"
# New users can register but can't chat until an admin changes their role

# Option 3: Invite-only (admin-created accounts):
# Add to environment:
# - ENABLE_SIGNUP=false
# - ENABLE_LOGIN_FORM=true
# - WEBUI_AUTH=true
# Users are created manually by an admin in Admin Panel → Users — no self-registration

# Verify signup is disabled:
curl -X POST http://localhost:3000/api/auths/signup \
  -H 'Content-Type: application/json' \
  -d '{"name":"test","email":"[email protected]","password":"test"}'
# Should return: 403 Forbidden if signup is disabled

Per-Model Access Control

Admins can control which models are visible to regular users. In Admin Panel → Models, you can hide expensive cloud models from all users except admins, rename models to friendlier names, and set custom descriptions. This prevents junior team members from accidentally running GPT-4o for every chat when GPT-4o-mini would do.
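To verify the restrictions took effect, compare what an admin and a regular user can see through the models API. Generate per-user API keys under Settings → Account; the key values below are placeholders:

```shell
# Admin view — should list every model, hidden or not:
curl -s http://localhost:3000/api/models \
  -H 'Authorization: Bearer ADMIN_API_KEY' | jq '[.data[].id]'

# Regular user view — models hidden from users should be absent:
curl -s http://localhost:3000/api/models \
  -H 'Authorization: Bearer REGULAR_USER_API_KEY' | jq '[.data[].id]'
```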


Document RAG: Querying Your Own Files

Open WebUI includes built-in RAG (Retrieval-Augmented Generation) that lets users upload documents and ask questions about them directly in the chat interface. Files are chunked, embedded into a local vector store, and retrieved contextually on each query.

Configuring the Embedding Model

# RAG requires an embedding model — set one before uploading documents
# In Admin Panel → Settings → Documents → Embedding Model

# Option 1: Local embedding with Ollama (no data leaves your server):
# Pull the embedding model first:
ollama pull nomic-embed-text
# Then in Open WebUI:
# Embedding Model Engine: Ollama
# Embedding Model: nomic-embed-text

# Option 2: OpenAI embeddings (faster, costs per token):
# Embedding Model Engine: OpenAI
# Embedding Model: text-embedding-3-small

# Verify the embedding model is working after setting it:
# In Admin Panel → Documents → click the connection test icon
# Should show a green checkmark

# Recommended RAG settings for most use cases:
# Chunk Size: 1500 characters
# Chunk Overlap: 100 characters
# Top K Results: 5
# Score Threshold: 0.0 (include all chunks above 0 relevance)

# To use RAG in a chat:
# Upload a document using the paperclip icon in the chat input
# Open WebUI chunks, embeds, and stores it automatically
# Ask questions — responses include source citations from your document

Serving Open WebUI Over HTTPS

For team access from multiple devices, put Open WebUI behind Nginx with a proper domain and SSL certificate:

sudo apt install nginx certbot python3-certbot-nginx -y

# Create Nginx config:
sudo tee /etc/nginx/sites-available/open-webui << 'EOF'
server {
    listen 80;
    server_name ai.yourdomain.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name ai.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/ai.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ai.yourdomain.com/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;

    # File uploads for document RAG:
    client_max_body_size 100M;

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        # WebSocket support (required for streaming responses):
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
        # Streaming LLM responses need longer timeouts:
        proxy_read_timeout 300s;
        proxy_buffering off;
    }
}
EOF

sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t

# Get SSL certificate:
sudo certbot --nginx -d ai.yourdomain.com

sudo systemctl reload nginx

# Test HTTPS:
curl -I https://ai.yourdomain.com
# Should return HTTP/2 200

After enabling HTTPS, update Open WebUI's base URL in Admin Panel → Settings → General → WebUI URL to https://ai.yourdomain.com. This ensures invite links, OAuth callbacks, and the PWA manifest all reference the correct URL.


Tips, Gotchas, and Troubleshooting

Ollama Not Reachable from the Container

# Symptom: Models section shows empty, or "Failed to fetch" for Ollama

# Step 1: Find the correct host IP to use:
ip addr show docker0 | grep 'inet ' | awk '{print $2}' | cut -d/ -f1
# Typically: 172.17.0.1

# Step 2: Test reachability from inside the container:
docker exec open-webui curl -s http://172.17.0.1:11434/api/tags | head -20

# If this fails, Ollama is only listening on localhost, not the Docker bridge:
# Fix: make Ollama listen on all interfaces:

# For systemd-managed Ollama:
sudo systemctl edit ollama
# Add:
# [Service]
# Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama

# Verify Ollama is now listening on 0.0.0.0:
ss -tlnp | grep 11434
# Should show: 0.0.0.0:11434 not 127.0.0.1:11434

# Step 3: Update OLLAMA_BASE_URL in docker-compose.yml:
# Use host.docker.internal (with extra_hosts mapping) for a stable hostname:
# - OLLAMA_BASE_URL=http://host.docker.internal:11434
docker compose up -d --force-recreate open-webui

Streaming Responses Cut Off Mid-Generation

# Streaming LLM responses use long-lived HTTP connections
# Nginx default read timeout (60s) kills them for long responses

# Verify the Nginx config has:
# proxy_read_timeout 300s;
# proxy_buffering off;

# Test by requesting a long response:
curl -N -s https://ai.yourdomain.com/api/chat/completions \
  -H 'Authorization: Bearer YOUR_OPEN_WEBUI_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Count from 1 to 100 slowly"}], "stream": true}'
# Should stream the entire output without cutting off

# If still cutting off, check if anything else is between client and container:
# Cloud provider load balancers often have their own timeout settings
# Check AWS ALB (60s default), Cloudflare (100s), etc.

# For Cloudflare: disable proxying (grey cloud) for the AI subdomain
# or upgrade to Cloudflare Business for configurable timeout

RAG Not Finding Information From Uploaded Documents

# Check the embedding model is configured and working:
# Admin Panel → Settings → Documents → click the test icon

# If embeddings are working but retrieval is poor:

# 1. Verify the document was actually indexed:
# In the chat where you uploaded, type # to reference documents
# The uploaded file should appear in the autocomplete list

# 2. Check RAG is enabled for the current chat:
# Look for the document icon in the chat toolbar — should be active

# 3. Check chunk size settings:
# Very large chunks (>2000 chars) can hurt precision
# Very small chunks (<200 chars) lose context
# Default 1500 is usually good for most documents

# 4. Try the hybrid search mode:
# Admin Panel → Settings → Documents → Vector Database
# Enable: Use Hybrid Search (BM25 + vector)
# This combines keyword matching with semantic similarity

# 5. Check if the document format is supported:
# Supported: PDF, DOCX, TXT, MD, HTML, CSV, Excel, PowerPoint
# NOT supported: Images, scanned PDFs without OCR
# For scanned PDFs: enable PDF OCR in Admin Panel → Documents

Updating Open WebUI

cd ~/open-webui

# Pull latest image:
docker compose pull open-webui

# Restart with new image:
docker compose up -d open-webui

# Watch startup — database migrations run automatically:
docker compose logs -f open-webui | grep -E '(startup|migration|error|ready)'

# Verify the new version is running:
docker inspect open-webui | jq -r '.[0].Config.Image'

# All user accounts, conversation history, uploaded documents, 
# and settings persist in the open_webui_data volume across updates

Pro Tips

  • Create Model Presets for common use cases — go to Workspace → Models to create named configurations that combine a base model with a system prompt, temperature, and other settings. Create presets like Code Assistant (GPT-4o, temperature 0.1), Document Summarizer (Claude, focus on brevity), and Brainstorming (GPT-4o, temperature 0.8). Users pick the right tool for the task without tweaking settings manually.
  • Use the Pipelines feature for custom middleware — Open WebUI Pipelines let you intercept every message with custom Python code. Use it for content filtering, automatic prompt enhancement, logging to an external system, or routing specific query types to different models.
  • Install it as a PWA for a native-app feel — from Chrome, click the install icon in the address bar. From iOS Safari, use Share → Add to Home Screen. Users get a full-screen app that looks and feels like a native application, with no browser chrome getting in the way.
  • Back up the data volume weekly — all conversation history, user accounts, uploaded documents, and settings live in the open_webui_data Docker volume. A simple weekly backup: docker run --rm -v open_webui_data:/data -v /backup:/out alpine tar czf /out/owui-$(date +%Y%m%d).tar.gz /data
  • Set WEBUI_SECRET_KEY and never change it — this key signs user session tokens. Changing it invalidates all existing sessions, logging out everyone simultaneously. Generate it once, store it in your password manager or Vaultwarden, and use the same value across restarts.
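The weekly backup tip above can be automated with cron. This sketch assumes a `/backup` directory on the host and the `open_webui_data` volume name from the Compose file; note the crontab-specific `%` escaping, which trips up many date-stamped backup jobs:

```shell
# Create the backup destination once:
sudo mkdir -p /backup

# Append a weekly job (Sundays at 03:00) to the current user's crontab.
# Inside a crontab entry, % is special and must be escaped as \%:
( crontab -l 2>/dev/null; \
  echo '0 3 * * 0 docker run --rm -v open_webui_data:/data -v /backup:/out alpine tar czf /out/owui-$(date +\%Y\%m\%d).tar.gz /data' \
) | crontab -

# Confirm the entry was installed:
crontab -l | grep owui
```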

Wrapping Up

A complete Open WebUI setup gives your team a polished, daily-driver AI chat interface that keeps every conversation on infrastructure you control. Local models via Ollama mean sensitive queries never leave your network. Cloud API integration via OpenAI gives access to the most capable models when local inference isn't enough. Proper user management, HTTPS, and access controls make it something you can actually share across your organization without security concerns.

Deploy it, connect your Ollama models, put it behind Nginx with HTTPS, disable open registration once your team has signed up, and you have a private ChatGPT replacement running on a $10/month VPS. The ongoing operating cost is your server — not a per-seat subscription that scales with headcount.


Need a Private AI Platform Built for Your Organization?

Deploying Open WebUI for an organization — with SSO integration, GPU-accelerated local inference, departmental model access controls, audit logging, and on-premise hardware — requires more than a Docker run command. The sysbrix team designs and deploys private AI platforms for organizations that need production reliability, data privacy, and the performance that enterprise use demands.

Talk to Us →