Open WebUI Setup Guide: Deploy Your Own Private ChatGPT with Local Models and Team Access
Every message you send to ChatGPT is processed on OpenAI's servers, retained under their data policies, and, depending on your plan, may be used for model training. For sensitive work — customer data, proprietary code, confidential documents — that's a real problem. Open WebUI gives you a polished, feature-rich chat interface that runs entirely on your own hardware. Your conversations never leave your server. You choose which models run. Your team gets private AI without the privacy tradeoffs. This Open WebUI setup guide gets you from zero to a fully functional private AI platform in under an hour.
Prerequisites
- A Linux server or powerful workstation (Ubuntu 22.04 LTS recommended)
- Docker Engine and Docker Compose v2 installed
- At least 8GB RAM for running smaller models (16GB+ recommended for 13B+ models)
- For GPU acceleration: an NVIDIA GPU with a recent driver, or an AMD GPU supported by ROCm
- A domain name if you want HTTPS and team access (or use localhost for personal use)
Quick system check before starting:
# Verify Docker is running:
docker --version && docker compose version
# Check available RAM:
free -h
# Check for NVIDIA GPU (optional but recommended for speed):
nvidia-smi 2>/dev/null && echo "GPU available" || echo "CPU-only mode"
# Check available disk space (models are large):
df -h ~
# Need at least 10GB free for a small setup, 50GB+ for multiple models
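If you're unsure which model sizes your machine can handle, a rough rule of thumb helps. This is an approximation, not an official Ollama formula: a Q4-quantized model takes about 0.6GB per billion parameters, plus roughly 2GB of overhead for the runtime and context window.

```shell
# Rough sizing rule of thumb (an assumption, not an official formula):
# Q4-quantized model ≈ 0.6GB per billion parameters + ~2GB overhead
estimate_q4_gb() {
  awk -v p="$1" 'BEGIN { printf "%.1f\n", p * 0.6 + 2 }'
}
estimate_q4_gb 8    # ~6.8GB — tight on 8GB RAM, comfortable on 16GB
estimate_q4_gb 70   # ~44.0GB — needs a large-memory machine or big GPU
```

Compare the result against your `free -h` output before pulling a model — it's much faster than downloading 40GB only to watch it swap.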
Installing Ollama: Your Local Model Engine
Ollama handles the hard parts of running local AI models: downloading them, managing their storage, and serving them via an OpenAI-compatible API. Open WebUI talks to Ollama, which does the actual inference. If you want to connect to cloud providers instead (OpenAI, Anthropic), you can skip this section and configure those API keys directly in Open WebUI.
# Install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
# Verify it's running:
ollama --version
curl http://localhost:11434/api/version # Should return {"version":"..."}
# Pull your first model (choose based on your RAM):
# For 8GB RAM: small models (llama3.2 is 3B)
ollama pull llama3.2
# For 16GB RAM: 7B-8B models run comfortably
ollama pull llama3.1:8b
# For 64GB+ RAM (or a large GPU): 70B models (excellent quality)
ollama pull llama3.1:70b
# List available models:
ollama list
# Quick test before connecting to Open WebUI:
ollama run llama3.2 "Say hello in one sentence"
# Should respond within seconds on GPU, within a minute on CPU
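Because Ollama exposes an OpenAI-compatible endpoint under `/v1`, you can also exercise it directly with curl before wiring up Open WebUI. This assumes `llama3.2` was pulled above and Ollama is listening on its default port:

```shell
# Chat completion via Ollama's OpenAI-compatible endpoint:
curl http://localhost:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Say hello in one sentence"}]
  }'
# The reply comes back as standard OpenAI-style JSON
# (the text is in choices[0].message.content)
```

Any OpenAI client library pointed at `http://localhost:11434/v1` works the same way, which is exactly the mechanism Open WebUI uses to talk to Ollama.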
GPU Acceleration Setup
# Verify Ollama is using your GPU:
ollama run llama3.2 "test" &
sleep 3
nvidia-smi | grep -i ollama
# Should show ollama process using VRAM
# If the GPU is detected but not used:
# Ollama bundles its own CUDA runtime, so what matters is a recent NVIDIA driver:
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# Check Ollama's server log for GPU detection details:
journalctl -u ollama --no-pager | grep -i gpu
# For AMD GPUs (ROCm):
# Ollama auto-detects ROCm on supported cards
# Check: rocm-smi (if ROCm is installed)
# Monitor GPU usage during inference:
watch -n 1 nvidia-smi # Run in a separate terminal
Deploying Open WebUI with Docker
Open WebUI runs as a Docker container. The setup is simpler than you might expect — one docker compose file, a few environment variables, and you have a fully functional AI chat interface.
Basic Single-User Setup (Local Use Only)
# Quick start — for local personal use:
docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main
# Open http://localhost:3000
# First account you create becomes the admin
# Ollama models appear automatically if Ollama is running on the same machine
Production Setup with Docker Compose
mkdir -p ~/open-webui && cd ~/open-webui
# Create docker-compose.yml:
cat > docker-compose.yml << 'EOF'
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      # Connect to Ollama running on the host machine:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434
      # Security — loaded from .env (generated below):
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      # Optional: pre-configure an OpenAI API key
      # - OPENAI_API_KEY=sk-your-openai-key
      # - OPENAI_API_BASE_URL=https://api.openai.com/v1
      # Optional: set your public URL (important for correct redirects):
      # - WEBUI_URL=https://chat.yourdomain.com
      # User registration settings (change DEFAULT_USER_ROLE to 'user'
      # to auto-approve registrations):
      - ENABLE_SIGNUP=true        # Set to false after initial setup
      - DEFAULT_USER_ROLE=pending # New users need admin approval
    extra_hosts:
      - "host.docker.internal:host-gateway"

volumes:
  open_webui_data:
EOF
# Generate a random secret key:
echo "WEBUI_SECRET_KEY=$(openssl rand -hex 32)" >> .env
# Start Open WebUI:
docker compose up -d
# Check it's running:
docker compose logs -f open-webui
# Watch for: "Uvicorn running on http://0.0.0.0:8080"
# Access it:
echo "Open WebUI: http://localhost:3000"
echo "First account = admin"
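Before moving on, a scripted check can confirm the backend is actually serving. Current Open WebUI builds expose a health endpoint; treat the exact path as an assumption for your version:

```shell
# Returns {"status":true} when the backend is ready:
curl -s http://localhost:3000/health
# Or fail loudly in automation:
curl -sf http://localhost:3000/health > /dev/null \
  && echo "Open WebUI is up" \
  || echo "Not responding yet — check docker compose logs"
```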
HTTPS and Team Access
For team access, you need HTTPS (browsers block camera/microphone access without it, and it's just good security practice). The cleanest approach is using Nginx or Traefik as a reverse proxy.
sudo apt install nginx certbot python3-certbot-nginx -y
# Obtain the certificate first — the server block below references it.
# Replace the domain and email with your own:
sudo certbot certonly --nginx -d chat.yourdomain.com --email [email protected] --agree-tos -n
# Create Nginx config:
sudo tee /etc/nginx/sites-available/open-webui << 'EOF'
server {
    listen 80;
    server_name chat.yourdomain.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name chat.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/chat.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/chat.yourdomain.com/privkey.pem;

    # Required for chat streaming to work:
    proxy_read_timeout 300s;
    proxy_buffering off;

    location / {
        proxy_pass http://localhost:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF
sudo ln -s /etc/nginx/sites-available/open-webui /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
# Update your docker-compose.yml to add the public URL:
# environment:
# - WEBUI_URL=https://chat.yourdomain.com
docker compose up -d --force-recreate
Configuration and Model Management
Adding More Models
# Pull models from the command line — they appear automatically in Open WebUI:
# General purpose chat:
ollama pull llama3.2 # 3B — fast, good for quick tasks
ollama pull llama3.1 # 8B — better quality, reasonable speed
# Code assistance:
ollama pull codellama # Specialized for code
ollama pull deepseek-coder-v2 # Excellent for code generation
# Reasoning and complex tasks:
ollama pull qwen2.5:32b # Strong reasoning (needs 20GB+ RAM)
# Document/vision analysis (multimodal):
ollama pull llava # Can analyze images
ollama pull minicpm-v # Efficient vision model
# Or pull directly from Open WebUI:
# Admin Panel → Settings → Models → Pull a model from Ollama.com
# Just type the model name (e.g., "phi3:mini")
# You can also add OpenAI, Anthropic, or any OpenAI-compatible API:
# Admin Panel → Settings → Connections → Add connection
# Works with: OpenAI, Anthropic, Groq, Together.ai, LiteLLM, Mistral, etc.
# List all available models:
ollama list
User Management and Roles
# Open WebUI has three roles:
# admin: Full access, manages users and settings
# user: Standard access to chat and models
# pending: Registered but awaiting approval (when DEFAULT_USER_ROLE=pending)
# Team setup workflow:
# 1. Admin creates their account first
# 2. Team members register at your Open WebUI URL
# 3. Admin approves pending users: Admin Panel → Users → Approve
# Scripted user provisioning via the signup API (requires ENABLE_SIGNUP=true):
curl -X POST http://localhost:3000/api/v1/auths/signup \
  -H 'Content-Type: application/json' \
  -d '{
    "email": "[email protected]",
    "password": "secure-password",
    "name": "Alice Smith"
  }'
# For SSO via OIDC (LDAP is also supported via its own ENABLE_LDAP settings),
# set environment variables in docker-compose.yml:
# - ENABLE_OAUTH_SIGNUP=true
# - OAUTH_CLIENT_ID=your-oauth-client-id
# - OAUTH_CLIENT_SECRET=your-oauth-secret
# - OPENID_PROVIDER_URL=https://your-idp.com/.well-known/openid-configuration
# Works with Keycloak, Auth0, Okta, Google Workspace, Microsoft Entra ID
# Model access control:
# Admin Panel → Models → select a model → set access level
# Options: Everyone, Only users with specific roles, Specific groups
Tips, Gotchas, and Troubleshooting
# PROBLEM: Models not showing up in Open WebUI
# Check Ollama is accessible from the container:
docker exec open-webui curl -s http://host.docker.internal:11434/api/version
# If this fails, the host.docker.internal mapping isn't working
# Alternative: use the Docker bridge gateway IP directly
# Find it:
docker network inspect bridge --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
# Usually 172.17.0.1
# Update docker-compose.yml:
# OLLAMA_BASE_URL=http://172.17.0.1:11434
# PROBLEM: Responses are very slow (CPU-only inference)
# Check if Ollama is actually using the GPU:
nvidia-smi # Run while a model is generating
# If GPU Memory-Usage shows 0MB: CUDA issue or model too large for VRAM
# Use a smaller model or heavier quantization that fits in VRAM:
# RTX 3060 (12GB VRAM): llama3.1 (8B, Q4) fits well
# RTX 4090 (24GB VRAM): ~30B models at Q4 (e.g. qwen2.5:32b) fit;
# 70B models spill into system RAM and slow down sharply
# PROBLEM: Open WebUI won't start
# Check container logs:
docker compose logs open-webui
# Common issue: port 3000 already in use
ss -tlnp | grep 3000
# Change port in docker-compose.yml: "3001:8080" instead
# Common issue: data directory permissions
docker exec open-webui ls -la /app/backend/data
# Should be writable by the app user
# PROBLEM: Streaming responses cut off
# Your Nginx proxy_read_timeout may be too short:
# Add to nginx config:
# proxy_read_timeout 300s;
# proxy_send_timeout 300s;
# proxy_connect_timeout 60s;
# UPDATING OPEN WEBUI:
docker compose pull
docker compose up -d
# Data is preserved in the volume — updates are safe
Pro Tips for Getting the Best Results
- Use quantized models (Q4_K_M or Q5_K_M) for the best quality/speed tradeoff — full-precision models are usually overkill. When pulling from Ollama, tags like `llama3.1:8b-instruct-q4_K_M` give you 90%+ of the quality at roughly half the VRAM and about twice the speed.
- Create system prompts as reusable presets — Open WebUI supports saving prompt templates. Create one for "Code Review", "Document Summarizer", "Customer Email", etc. that you and your team can launch instantly instead of retyping the same instructions.
- Upload documents directly to chat contexts — drag files into the chat window. Open WebUI extracts the text so the model can answer questions about it. For persistent document access, use the Knowledge feature to build searchable RAG document collections.
- Use the API for programmatic access — Open WebUI exposes an OpenAI-compatible API. Any code that works with OpenAI can work with your instance: point the client's base URL at `https://chat.yourdomain.com/api` and use an Open WebUI API key.
- Back up your data volume weekly — all conversations, settings, and user data live in the `open_webui_data` Docker volume. Back it up with: `docker run --rm -v open_webui_data:/data -v /backup:/backup alpine tar czf /backup/open-webui-backup.tar.gz /data`
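As a sketch of the API tip above (the endpoint path follows current Open WebUI docs; the domain, key, and model name are placeholders for your own deployment — API keys are created under Settings → Account → API keys):

```shell
# Chat through Open WebUI's OpenAI-compatible API
# (replace the URL, key, and model with your own):
export OPENWEBUI_API_KEY="sk-your-open-webui-key"
curl https://chat.yourdomain.com/api/chat/completions \
  -H "Authorization: Bearer $OPENWEBUI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Summarize this in one line: Open WebUI setup"}]
  }'
```

Because the request shape matches OpenAI's, existing scripts usually only need the base URL and key changed — model access control still applies, so the key's owner must be allowed to use the requested model.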
Wrapping Up
A self-hosted Open WebUI gives your team a genuinely good AI chat experience — comparable to ChatGPT in UX quality — with complete data privacy. Your conversations stay on your server, you choose your models, and there's no per-seat licensing. For organizations handling sensitive data where every chat message going to OpenAI's servers is a compliance concern, this setup eliminates that risk entirely.
The initial setup investment (30-60 minutes) pays off the first time you have a sensitive conversation you'd hesitate to have through a cloud service. For teams with multiple users, the economics are even more compelling: one server, multiple users, zero per-message costs for local models.
Need a Private AI Platform Built for Your Organization?
Deploying Open WebUI for a team — with SSO integration, GPU infrastructure, fine-tuned models, and the operational practices that keep it all running — is exactly what the sysbrix team does: we build private AI platforms that organizations can rely on for sensitive work.
Talk to Us →