Open WebUI Setup Guide: Your Own ChatGPT Alternative Running Locally

You're tired of paying OpenAI subscriptions. You want your data to stay on your servers. And you want a ChatGPT-like interface that actually looks good. Open WebUI gives you exactly that—a polished, feature-rich web interface for running local large language models through Ollama. This guide gets you from docker compose up to a fully working self-hosted AI chat interface in under 15 minutes.

What You'll Build

By the end of this guide, you'll have:

Open WebUI running in Docker with a persistent database
Ollama serving local LLMs (Llama, Mistral, Gemma, and more)
GPU acceleration configured (if you have an NVIDIA or AMD card)
Multi-user support with authentication enabled
HTTPS access via reverse proxy for production use
A working ChatGPT-style interface accessible from your browser

Prerequisites

Before you start, make sure you have:

Docker 24.0+ and Docker Compose v2 installed
At least 4GB RAM (8GB+ recommended, 16GB+ for larger models)
At least 20GB free disk space for models and containers
A modern CPU with AVX support (most Intel/AMD CPUs from 2011+)
Optional: NVIDIA GPU with a recent driver (Ollama's documentation lists compute capability 5.0+) or AMD GPU with ROCm support for acceleration
Ports 3000 (Open WebUI) and 11434 (Ollama) free on the host
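The checklist above can be verified with a short pre-flight script. This is a minimal sketch for a Linux host, assuming GNU sort (for version comparison) and the usual /proc files; the function names are hypothetical helpers, not part of Docker or Open WebUI.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks for the prerequisites listed above (Linux only).

# Succeeds if dotted version $1 is >= $2 (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints total RAM rounded to whole GB, read from a meminfo-style file.
total_ram_gb() {
  awk '/^MemTotal:/ { printf "%.0f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeeds if a cpuinfo-style file lists the avx CPU flag.
has_avx() {
  grep -qw avx "${1:-/proc/cpuinfo}"
}

# Prints one status line per prerequisite.
check_prereqs() {
  local docker_version
  docker_version="$(docker version --format '{{.Server.Version}}' 2>/dev/null)" || docker_version=""
  if [ -n "$docker_version" ] && version_ge "$docker_version" 24.0; then
    echo "ok: Docker $docker_version"
  else
    echo "missing: Docker 24.0+ (found: ${docker_version:-none})"
  fi
  if docker compose version >/dev/null 2>&1; then
    echo "ok: Docker Compose v2"
  else
    echo "missing: Docker Compose v2"
  fi
  if [ "$(total_ram_gb)" -ge 4 ]; then echo "ok: RAM"; else echo "low: less than 4GB RAM"; fi
  if has_avx; then echo "ok: AVX"; else echo "missing: AVX"; fi
}
```

Source the file and run check_prereqs; each line of output maps to one item in the list above.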

Step 1: Quick Start with Docker Run

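The fastest way to test Open WebUI is a single Docker command. This pulls the latest image and starts the container with persistent storage. The --add-host flag lets the container resolve host.docker.internal on Linux, where that name is not defined by default:

docker run -d \
  -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  ghcr.io/open-webui/open-webui:main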

Visit http://localhost:3000 and you'll see the Open WebUI login screen. The first user to register becomes the admin automatically.

Note: This quick start assumes Ollama is running separately on your host at http://host.docker.internal:11434. If you don't have Ollama installed yet, skip to Step 2 for the full Docker Compose setup.

Step 2: Production Docker Compose with Ollama

For a proper setup, run both Ollama and Open WebUI together via Docker Compose. This ensures they can communicate over an internal Docker network and both have persistent storage.

Directory Structure

open-webui-project/
├── docker-compose.yml
└── .env

The docker-compose.yml File

services:
  ollama:
    image: ollama/ollama:${OLLAMA_DOCKER_TAG:-latest}
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
    environment:
      - OLLAMA_ORIGINS=*
    restart: unless-stopped
    networks:
      - ai-net

  open-webui:
    image: ghcr.io/open-webui/open-webui:${WEBUI_DOCKER_TAG:-main}
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    ports:
      - "3000:8080"
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=${WEBUI_SECRET_KEY}
      - WEBUI_AUTH=True
      - ENABLE_SIGNUP=True
      - DEFAULT_MODELS=llama3.2
      - DEFAULT_USER_ROLE=user
    depends_on:
      - ollama
    restart: unless-stopped
    networks:
      - ai-net

volumes:
  ollama:
  open-webui:

networks:
  ai-net:
    driver: bridge

The .env File

# Generate a secret key: openssl rand -hex 32
WEBUI_SECRET_KEY=your-64-character-hex-secret-key-here

# Optional: pin specific versions
OLLAMA_DOCKER_TAG=latest
WEBUI_DOCKER_TAG=main
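Rather than pasting the key by hand, the .env file can be generated. This is a minimal sketch: generate_secret and write_env_file are hypothetical helper names, and the /dev/urandom fallback is an assumption for hosts without openssl.

```shell
#!/usr/bin/env bash
# Prints a 64-character hex secret; prefers openssl, falls back to /dev/urandom.
generate_secret() {
  if command -v openssl >/dev/null 2>&1; then
    openssl rand -hex 32
  else
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Appends WEBUI_SECRET_KEY to an env file (default ./.env) unless one is already set.
# A newly created file gets owner-only permissions through the umask.
write_env_file() {
  local env_file="${1:-.env}"
  if [ -f "$env_file" ] && grep -q '^WEBUI_SECRET_KEY=' "$env_file"; then
    echo "WEBUI_SECRET_KEY already set in $env_file; leaving it unchanged"
    return 0
  fi
  ( umask 077; printf 'WEBUI_SECRET_KEY=%s\n' "$(generate_secret)" >> "$env_file" )
}
```

Running write_env_file from the project directory writes the key once; later runs leave an existing key alone, so active sessions are not invalidated by accident.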

Deploy the Stack

cd open-webui-project
docker compose up -d

Docker pulls both images on the first run, then starts the containers. Open WebUI reaches Ollama over the internal Docker network at http://ollama:11434.
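Containers can show as started before the services inside them accept requests. The sketch below polls until an endpoint answers. It assumes curl is installed on the host and that the port mappings match the Compose file above; wait_for_http is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Polls URL until it returns a successful HTTP status.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 2).
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out: $url" >&2
  return 1
}

# Usage, assuming the default ports from the Compose file:
#   wait_for_http http://localhost:11434/api/version   # Ollama
#   wait_for_http http://localhost:3000/health         # Open WebUI
```

Open WebUI can take a while to become ready on first start, so a generous attempt count is a reasonable default.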

Step 3: Download Your First Model

With Ollama running, pull a model and start chatting. Here are some popular options:

# Llama 3.2 (3B params, fast, good for most tasks)
docker exec ollama ollama pull llama3.2

# Mistral 7B (strong reasoning)
docker exec ollama ollama pull mistral

# Gemma 2 9B (Google's model, great performance)
docker exec ollama ollama pull gemma2:9b

# DeepSeek Coder (for coding tasks)
docker exec ollama ollama pull deepseek-coder:6.7b

Model sizes vary from 2GB to 30GB+. Check your disk space before pulling large models. Once downloaded, they appear automatically in the Open WebUI model selector.
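A disk check before each pull avoids a half-finished download. This is a minimal sketch, assuming Docker stores volumes under /var/lib/docker (the Linux default; adjust it if your data root differs). The helper names and the size passed in are illustrative.

```shell
#!/usr/bin/env bash
# Succeeds if the filesystem holding PATH has at least NEEDED_GB gigabytes free.
has_free_gb() {
  local needed_gb="$1" path="${2:-/}" free_kb
  free_kb="$(df -Pk "$path" | awk 'NR == 2 { print $4 }')"
  [ "$free_kb" -ge $((needed_gb * 1024 * 1024)) ]
}

# Pulls MODEL into the `ollama` container only if enough disk space is free.
# DOCKER_ROOT is assumed to be /var/lib/docker unless a third argument is given.
pull_if_space() {
  local model="$1" needed_gb="$2" docker_root="${3:-/var/lib/docker}"
  if has_free_gb "$needed_gb" "$docker_root"; then
    docker exec ollama ollama pull "$model"
  else
    echo "not enough free space for $model (need ${needed_gb}GB)" >&2
    return 1
  fi
}

# Usage: pull_if_space gemma2:9b 10
```

After a pull, docker exec ollama ollama list shows the installed models and their sizes on disk.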

Step 4: Configure GPU Acceleration

Running LLMs on CPU works but is slow. If you have a GPU, enable acceleration for dramatically faster responses.

NVIDIA GPU (CUDA)

First, install the NVIDIA Container Toolkit on your host:

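# Ubuntu/Debian: add NVIDIA's signed apt repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
# Install the toolkit, register it as a Docker runtime, and restart Docker
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker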

The standard ollama/ollama image already includes CUDA support, so the only Compose change is a device reservation that passes the GPU through:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
    environment:
      - OLLAMA_ORIGINS=*
    restart: unless-stopped
    networks:
      - ai-net
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
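Once the stack is recreated, it is worth confirming that Ollama actually uses the GPU. This sketch assumes the container is named ollama and that llama3.2 has been pulled; reports_gpu and check_gpu are hypothetical helpers that look at the PROCESSOR column of ollama ps.

```shell
#!/usr/bin/env bash
# Succeeds if `ollama ps`-style output on stdin shows a model at least partly on the GPU.
reports_gpu() {
  grep -q 'GPU'
}

# Checks device passthrough, loads a model, then reports where Ollama placed it.
check_gpu() {
  # nvidia-smi inside the container confirms the device was passed through.
  docker exec ollama nvidia-smi || return 1
  # A short prompt forces the model to load so that `ollama ps` has something to list.
  docker exec ollama ollama run llama3.2 "Say hi" >/dev/null || return 1
  if docker exec ollama ollama ps | reports_gpu; then
    echo "model is running on the GPU"
  else
    echo "model is running on the CPU; check the driver and toolkit installation"
  fi
}
```

If nvidia-smi fails inside the container, the problem is on the host side (driver or toolkit) rather than in Ollama.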

AMD GPU (ROCm)

Use the ROCm-enabled Ollama image and pass the /dev/kfd and /dev/dri devices into the container:

services:
  ollama:
    image: ollama/ollama:rocm
    container_name: ollama
    volumes:
      - ollama:/root/.ollama
    ports:
      - "11434:11434"
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    restart: unless-stopped
    networks:
      - ai-net

Step 5: Secure Your Instance

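Before exposing Open WebUI to the internet, lock it down properly. Ollama's API has no authentication of its own, so keep port 11434 off the public internet and expose only Open WebUI through the reverse proxy.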

Enable Authentication

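Authentication is enabled by default (WEBUI_AUTH=True). The first user to register becomes the admin. To disable signups after creating your admin account, set the variables below and recreate the container with docker compose up -d. Open WebUI persists some settings in its database after the first launch, so on an existing install the variable may not take effect; in that case, turn signups off in the admin settings instead.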

environment:
  - ENABLE_SIGNUP=False
  - WEBUI_AUTH=True

Set a Persistent Secret Key

Without a fixed WEBUI_SECRET_KEY, a new key is generated whenever the container is recreated (for example, after an update), which invalidates every session and logs all users out. Generate one:

openssl rand -hex 32

Copy the output into your .env file as WEBUI_SECRET_KEY.

HTTPS with Reverse Proxy

For production, put Open WebUI behind Nginx, Traefik, or Caddy with valid TLS certificates. Here's a minimal Nginx config:

server {
  listen 443 ssl http2;
  server_name ai.yourdomain.com;

  ssl_certificate /etc/letsencrypt/live/ai.yourdomain.com/fullchain.pem;
  ssl_certificate_key /etc/letsencrypt/live/ai.yourdomain.com/privkey.pem;

  location / {
    proxy_pass http://localhost:3000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
  }

  # WebSocket support required for Open WebUI
  location /ws {
    proxy_pass http://localhost:3000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
  }
}

Step 6: Connect External APIs (Optional)

Open WebUI isn't limited to local models. You can also connect OpenAI, Anthropic, or other API providers alongside your local Ollama models.

Add OpenAI API as a Backend

In the Open WebUI admin panel, go to Admin Settings → Connections and add your OpenAI API key. Your users can then choose between local and cloud models in the chat interface.

Or set it via environment variable:

environment:
  - OPENAI_API_KEY=sk-your-openai-key-here
  - OPENAI_API_BASE_URL=https://api.openai.com/v1

Tips and Troubleshooting

WebSocket connection errors

Open WebUI requires WebSocket support. If you see connection errors behind a reverse proxy, ensure your proxy forwards WebSocket upgrade headers:

proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";

Ollama connection refused

If Open WebUI can't reach Ollama, verify the container can communicate:

docker exec open-webui curl http://ollama:11434/api/tags

If this fails, check that both containers are on the same Docker network and Ollama is healthy.
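To check network membership directly, list the containers attached to the Compose network. This sketch assumes the network is named open-webui-project_ai-net, which is the Compose project directory name followed by the network name from the file above; network_members and all_present are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Prints the names of containers attached to a Docker network, one per line.
network_members() {
  docker network inspect "$1" --format '{{range .Containers}}{{.Name}}{{"\n"}}{{end}}'
}

# Succeeds if every container name given as an argument appears on stdin.
all_present() {
  local members name
  members="$(cat)"
  for name in "$@"; do
    printf '%s\n' "$members" | grep -qx "$name" || return 1
  done
}

# Usage:
#   network_members open-webui-project_ai-net | all_present ollama open-webui \
#     && echo "both containers share the network"
```

If one container is missing, docker compose up -d from the project directory reattaches it.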

Model downloads are slow

Ollama downloads models from its registry. Speed depends on your internet connection. For air-gapped environments, you can pre-download models and mount them as a volume:

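# On a machine with internet access
docker exec ollama ollama pull llama3.2
# /path/to/backup must not exist yet; otherwise the files land in /path/to/backup/models
docker cp ollama:/root/.ollama/models /path/to/backup

# On the air-gapped machine, mount the copied models in docker-compose.yml (ollama service)
volumes:
  - /path/to/backup:/root/.ollama/models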

Out of memory when loading models

LLMs are memory-hungry. If Ollama crashes on model load:

Use a smaller model (try llama3.2:1b instead of llama3.2)
Enable swap space as a buffer
Close other applications to free RAM
Pick a more heavily quantized variant (Q4, Q5), which uses less memory at the cost of some accuracy
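A rough size estimate helps when choosing between the options above: weight memory is approximately the parameter count times the bits per weight, divided by 8. The sketch below applies only that formula. It ignores the context cache and runtime overhead, and real quantization formats use slightly more than their nominal bit width, so treat the result as a lower bound. estimate_weights_gb is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Prints the approximate weight size in GB.
# Arguments: parameter count in billions, nominal bits per weight.
estimate_weights_gb() {
  LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# Usage:
#   estimate_weights_gb 7 4    # 7B model at 4-bit quantization
#   estimate_weights_gb 7 16   # same model at 16-bit precision
```

By this estimate a 7B model needs about 3.5GB for weights at 4 bits and about 14GB at 16 bits, which is why quantized variants fit on machines that the full-precision model does not.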

Updating Open WebUI

To update to the latest version:

docker compose pull
docker compose up -d

Or run Watchtower once to pull the new image and recreate the container:

docker run --rm \
  --volume /var/run/docker.sock:/var/run/docker.sock \
  nickfedor/watchtower \
  --run-once open-webui
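Whichever update path you use, an archive of the data volume taken first gives you a way back. This is a minimal sketch, assuming the volume is named open-webui-project_open-webui (Compose project directory plus volume name) and that pulling the alpine image is acceptable; backup_name and backup_volume are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Prints a timestamped archive name of the form open-webui-YYYYMMDD-HHMMSS.tar.gz.
backup_name() {
  printf 'open-webui-%s.tar.gz\n' "$(date +%Y%m%d-%H%M%S)"
}

# Archives a named volume into the current directory using a throwaway Alpine container.
# The volume is mounted read-only so the backup cannot modify it.
backup_volume() {
  local volume="${1:-open-webui-project_open-webui}" archive
  archive="$(backup_name)"
  docker run --rm \
    -v "$volume":/data:ro \
    -v "$PWD":/backup \
    alpine tar czf "/backup/$archive" -C /data . \
    && echo "wrote $PWD/$archive"
}

# Usage: backup_volume
```

Stopping the stack with docker compose stop before the backup avoids archiving the database in the middle of a write.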

Reset admin password

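If you lose admin access and no other admin exists, the bluntest fix is to delete the data volume, which erases all users, chats, and settings. The volume name is prefixed with the Compose project name, which defaults to the project directory (open-webui-project here):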

docker compose down
docker volume rm open-webui-project_open-webui
docker compose up -d

Next Steps

You now have a self-hosted ChatGPT alternative running locally. From here, you can:

Experiment with different models (code, reasoning, multilingual)
Set up document RAG by uploading files for context-aware chat
Configure user roles and permissions for team access
Integrate external APIs alongside local models
Enable voice input and image generation pipelines
Fine-tune models on your own data
