Welcome to the OpenWebUI-Ollama single container repository!
This repository lets you build Docker images that combine OpenWebUI with Ollama in a single container for a seamless AI development experience.
I created this container for my own personal use. I wanted a super easy way to build a RAG setup on top of an LLM, so I could create and run my own 'expert' system locally on whatever topic I wanted (e.g. "AC_Equip_expert").
You can already achieve this using separate OpenWebUI (OWUI) and Ollama containers, but that is a hassle. I wanted a single container to use and build from. The workflow is simple: use OWUI to pull an Ollama model, add PDFs to a knowledge base (RAG), then attach that knowledge base to the model.
Voila, instant expert system!
I have already built the images and placed them in my Docker Hub repository (https://hub.docker.com/r/kylefoxaustin/openwebui-ollama) so that you do not have to build them yourself. Just pull an image and run it.
These containers are designed to store, inside the container itself, both the LLM you pull and the RAG data you add to a knowledge base. That way, once the container is exactly how you want it, you can use Docker to commit and push it as a new image to your own Docker Hub account, e.g. "my_expert:latest".
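A minimal sketch of that workflow with standard Docker commands (the container name, image name, and Docker Hub username below are placeholders, not part of this repository):

# Snapshot the running, customized container as a new image
docker commit openwebui my_expert:latest

# Tag and push it to your own Docker Hub account
docker tag my_expert:latest <your-dockerhub-username>/my_expert:latest
docker login
docker push <your-dockerhub-username>/my_expert:latest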
The container runs 100% locally on the host machine. The only internet traffic is when you pull a new model from Ollama.
The container build pulls the latest Ollama and OpenWebUI versions, using ollama.com's install script and OpenWebUI's latest release.
The CPU-only containers (the default) run on either an Intel/AMD (x86_64) or ARM64 CPU. The GPU containers use an NVIDIA GPU installed in the system, whether the host CPU is Intel/AMD or ARM64. Note that if the GPU is not detected, the GPU container falls back to running on the CPU cores.
Lastly, I chose not to build multi-architecture Dockerfiles. Dockerfile.cpu and Dockerfile.gpu are for Intel/AMD, while Dockerfile_ARM64.cpu and Dockerfile_ARM64.gpu are DIFFERENT files. You cannot take the Intel/AMD Dockerfiles and build them on an ARM platform; you must use the _ARM64 files.
Have fun!
openwebui-ollama/
├── Dockerfiles/
│ ├── Dockerfile.cpu # Dockerfile for CPU-only container
│ ├── Dockerfile.gpu # Dockerfile for GPU-enabled container
│ ├── Dockerfile_ARM64.cpu # Dockerfile for ARM64 CPU-only container
│ └── Dockerfile_ARM64.gpu # Dockerfile for ARM64 GPU-enabled container
├── tools/
│ ├── tag_push.sh # Script for tagging and pushing images
│ └── test_script_cpu_gpu_containers.sh # Test script for validating containers
└── README.md # This documentation
- Repository Structure
- Overview
- System Requirements
- Building and Running
- Usage Scenarios
- Environment Variables
- Data Persistence
- Troubleshooting
- Advanced Configuration
- Security Considerations
- Updating
- Performance Tuning
- Testing
- License
These Docker images provide a combined deployment of OpenWebUI and Ollama in a single container, managed by supervisord. This approach offers several advantages over the traditional multi-container setup:
- Simplified deployment - Only one container to manage
- Reduced configuration complexity - No need to configure network communication between containers
- Shared resources - More efficient resource utilization
- Consistent state - Both applications start and stop together
The images are available in both CPU and GPU variants to suit different hardware configurations. The GPU version will automatically fall back to CPU operation if no compatible NVIDIA GPU is detected, making it versatile for different environments.
CPU version (x86_64):
- Architecture: x86_64 only (Intel or AMD CPUs)
- Minimum: 4 CPU cores, 8GB RAM
- Recommended: 8+ CPU cores, 16GB+ RAM
- At least 10GB free disk space (more needed for models)

GPU version (x86_64):
- Architecture: x86_64 only (Intel or AMD CPUs)
- Minimum: NVIDIA GPU with 4GB VRAM, CUDA 11.7+
- Recommended: NVIDIA GPU with 8GB+ VRAM
- NVIDIA drivers 525.60.13 or later
- NVIDIA Container Toolkit installed
- At least 10GB free disk space (more needed for models)
ARM64 versions (CPU and GPU):
- Architecture: ARM64-based device (NVIDIA Jetson, Raspberry Pi 4/5 with 64-bit OS)
- CPU Version:
  - 8GB+ RAM recommended
  - At least 10GB free disk space
- GPU Version (NVIDIA Jetson only):
  - NVIDIA Jetson device (Nano, Xavier, Orin)
  - JetPack 5.1.2 or later (JetPack 6.0 recommended for Orin)
  - At least 8GB RAM (16GB+ recommended for larger models)
  - At least 10GB free disk space
Note: The ARM64 GPU containers have been successfully tested on an NVIDIA Jetson Orin AGX platform with 64GB RAM and a 1TB SSD.
Begin by cloning the repository:
# Clone the repository
git clone https://github.com/kylefoxaustin/openwebui-ollama.git
cd openwebui-ollama

# Build the CPU image
docker build -f Dockerfiles/Dockerfile.cpu -t openwebui:cpu .
# Build the GPU image (requires NVIDIA Container Toolkit)
docker build -f Dockerfiles/Dockerfile.gpu -t openwebui:gpu .

Note: The GPU image will automatically fall back to CPU operation if no compatible NVIDIA GPU is detected or if the proper NVIDIA drivers and container toolkit are not installed.

Run the CPU container:
docker run -d \
--name openwebui \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
openwebui:cpu

Run the GPU container:

docker run -d \
--name openwebui-gpu \
--gpus all \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
openwebui:gpu

# Build the ARM64 CPU image
docker build -f Dockerfiles/Dockerfile_ARM64.cpu -t openwebui:arm64-cpu .
# Build the ARM64 GPU image (Jetson only)
docker build -f Dockerfiles/Dockerfile_ARM64.gpu -t openwebui:arm64-gpu .

Run the ARM64 CPU container:

docker run -d \
--name openwebui-arm \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
kylefoxaustin/openwebui-ollama:arm64-cpu

Run the ARM64 GPU container (Jetson only):

docker run -d \
--name openwebui-arm-gpu \
--runtime nvidia \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
-v /usr/local/cuda:/usr/local/cuda \
-e OLLAMA_HOST=0.0.0.0 \
-e OLLAMA_NUM_PARALLEL=1 \
-e OLLAMA_GPU_LAYERS=20 \
-e OLLAMA_MAX_QUEUE=1 \
kylefoxaustin/openwebui-ollama:arm64-gpu

Access the web interface at: http://localhost:8080
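Once the container is up (any of the variants above), a quick way to confirm that both services are responding is to query them from the host:

# OpenWebUI should answer on port 8080
curl -I http://localhost:8080

# The Ollama API should return the list of installed models (empty at first)
curl http://localhost:11434/api/tags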
If you already have Ollama running on your host machine, you'll need to map the container's Ollama port to a different host port:
docker run -d \
--name openwebui \
-p 8080:8080 \
-p 11435:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
kylefoxaustin/openwebui-ollama:latest

To run both CPU and GPU containers at the same time, use different port mappings:
# CPU Container
docker run -d \
--name openwebui-cpu \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-cpu-data:/root/.ollama \
-v openwebui-cpu-data:/app/backend/data \
kylefoxaustin/openwebui-ollama:latest-cpu
# GPU Container
docker run -d \
--name openwebui-gpu \
--gpus all \
-p 8081:8080 \
-p 11435:11434 \
-v ollama-gpu-data:/root/.ollama \
-v openwebui-gpu-data:/app/backend/data \
kylefoxaustin/openwebui-ollama:latest-gpu

Access the interfaces at:
- CPU version: http://localhost:8080
- GPU version: http://localhost:8081
To use your OpenWebUI image with an external Ollama instance (e.g., running on another server or container):
docker run -d \
--name openwebui-only \
-p 8080:8080 \
-e OLLAMA_BASE_URL=http://<ollama-host>:11434 \
-v openwebui-data:/app/backend/data \
openwebui:cpu

Replace <ollama-host> with the hostname or IP address of your Ollama server.
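Before starting the container, you can optionally confirm that the external Ollama instance is reachable from the Docker host:

curl http://<ollama-host>:11434/api/tags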
| Variable | Description | Default |
|---|---|---|
| OLLAMA_HOST | Host for Ollama to listen on | 0.0.0.0 |
| PORT | Port for OpenWebUI to listen on | 8080 |
| HOST | Host for OpenWebUI to listen on | 0.0.0.0 |
| OLLAMA_BASE_URL | URL for OpenWebUI to connect to Ollama | http://localhost:11434 |
| NVIDIA_VISIBLE_DEVICES | (GPU only) Controls which GPUs are visible | all |
| NVIDIA_DRIVER_CAPABILITIES | (GPU only) Required NVIDIA capabilities | compute,utility |
| OLLAMA_GPU_LAYERS | Number of model layers to offload to GPU | 0 (CPU) or full model (GPU) |
| OLLAMA_NUM_PARALLEL | Concurrent request processing | 1 |
| OLLAMA_MAX_QUEUE | Maximum queued requests | 5 |
| OLLAMA_LOAD_TIMEOUT | Model loading timeout | 5m |
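These variables can be overridden at run time with -e flags. A sketch (the values here are illustrative, not recommendations):

docker run -d \
--name openwebui \
-p 8080:8080 \
-p 11434:11434 \
-e OLLAMA_LOAD_TIMEOUT=10m \
-e OLLAMA_MAX_QUEUE=2 \
-e OLLAMA_NUM_PARALLEL=1 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
openwebui:cpu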
The following volumes are used for data persistence:
- /root/.ollama: Ollama models and configuration
- /app/backend/data: OpenWebUI data (conversations, settings, etc.)
For data backup, you can simply create archives of these volumes:
# Create a backup directory
mkdir -p ~/openwebui-backups
# Backup Ollama data
docker run --rm -v ollama-data:/data -v ~/openwebui-backups:/backup \
ubuntu tar czf /backup/ollama-data-$(date +%Y%m%d).tar.gz -C /data .
# Backup OpenWebUI data
docker run --rm -v openwebui-data:/data -v ~/openwebui-backups:/backup \
ubuntu tar czf /backup/openwebui-data-$(date +%Y%m%d).tar.gz -C /data .
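To restore from one of these archives (a sketch; replace the date stamp with the one in your backup file name):

# Restore Ollama data
docker run --rm -v ollama-data:/data -v ~/openwebui-backups:/backup \
ubuntu tar xzf /backup/ollama-data-YYYYMMDD.tar.gz -C /data

# Restore OpenWebUI data
docker run --rm -v openwebui-data:/data -v ~/openwebui-backups:/backup \
ubuntu tar xzf /backup/openwebui-data-YYYYMMDD.tar.gz -C /data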
- Port Conflicts: If you see "address already in use" errors, you likely have another service using the same port. Use alternative ports as shown in the usage scenarios.
- GPU not detected: Ensure your NVIDIA drivers are properly installed and the NVIDIA Container Toolkit is set up correctly. Test with:
docker run --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
- Container crashes: Check logs with:
docker logs openwebui
For more detailed logs:
# Ollama logs
docker exec -it openwebui cat /var/log/supervisor/ollama.err.log
docker exec -it openwebui cat /var/log/supervisor/ollama.out.log
# OpenWebUI logs
docker exec -it openwebui cat /var/log/supervisor/openwebui.err.log
docker exec -it openwebui cat /var/log/supervisor/openwebui.out.log
# Supervisor logs
docker exec -it openwebui cat /var/log/supervisor/supervisord.log
- Models not loading: The first time you pull a model might take some time. Check the Ollama logs:

docker exec -it openwebui cat /var/log/supervisor/ollama.err.log

You can directly pull models with:
docker exec -it openwebui ollama pull <model-name>
- Web UI not accessible: Make sure that the internal Ollama instance is properly running:

docker exec -it openwebui curl -s http://localhost:11434/api/tags

Check if the OpenWebUI process is running:

docker exec -it openwebui supervisorctl status
- Out of memory errors: Larger models require substantial RAM and VRAM. Try a smaller model or increase your container's memory limit:
docker update --memory 16G --memory-swap 32G openwebui
- Slow model performance: For GPU containers, make sure CUDA is properly detected:
docker exec -it openwebui-gpu nvidia-smi
ARM64-specific issues:

- Package Installation Failures: Some Python packages may not have ARM64 wheels available. If you encounter build errors, try modifying the requirements or building the packages from source.
- Performance Issues: ARM CPUs are typically less powerful than x86_64 CPUs. Consider using smaller models optimized for less powerful hardware.
- GPU Not Detected: Ensure your Jetson device has the proper NVIDIA drivers installed and that you're using the --runtime nvidia flag when running the container.
- Internal Server Errors (HTTP 500): This often indicates that the model is overwhelming the GPU. Solutions include:
  - Reduce GPU layers: Lower the OLLAMA_GPU_LAYERS value to offload fewer layers to the GPU
  - Mount CUDA libraries: Ensure -v /usr/local/cuda:/usr/local/cuda is present
  - Limit parallelism: Use -e OLLAMA_NUM_PARALLEL=1
  - Control queue depth: Add -e OLLAMA_MAX_QUEUE=1
- Slow Model Loading or Timeouts: Jetson devices have limited GPU memory and bandwidth:
  - Use smaller quantized models (e.g., Llama3-8B-Q4, TinyLlama)
  - Increase timeouts with -e OLLAMA_LOAD_TIMEOUT=10m
  - For the Nano, consider sticking with CPU-only mode for larger models
After building your images, you can use Docker Compose for more complex setups. Here's an example configuration:
version: '3.8'

services:
  openwebui:
    image: openwebui:gpu   # Use the image you built
    container_name: openwebui
    restart: unless-stopped
    ports:
      - "8080:8080"
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
      - openwebui-data:/app/backend/data
    environment:
      - OLLAMA_HOST=0.0.0.0
      - PORT=8080
      - HOST=0.0.0.0
      - OLLAMA_BASE_URL=http://localhost:11434
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

volumes:
  ollama-data:
  openwebui-data:

Save this to docker-compose.yml and run with:
docker-compose up -d

To control CPU and memory usage when running your container:
docker run -d \
--name openwebui \
--cpus 4 \
--memory 8G \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
openwebui:cpu

To place your container on a specific network:
# Create a custom network
docker network create ai-network
# Run the container on that network
docker run -d \
--name openwebui \
--network ai-network \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
openwebui:cpu

These containers are designed for development and testing purposes. If deploying in a production environment, consider the following security measures:
- Do not expose the container to the public internet without proper authentication and TLS encryption.
- Use a reverse proxy like Nginx or Traefik with proper SSL/TLS termination.
- Run containers with limited privileges:

docker run -d \
--name openwebui \
--security-opt=no-new-privileges \
--cap-drop=ALL \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
openwebui:cpu

- Consider network isolation using Docker networks to limit container communication.
- Regularly update the images to get the latest security patches.
To update to the latest version:
# Pull the latest repository changes
git pull
# Rebuild the images
docker build -f Dockerfiles/Dockerfile.cpu -t openwebui:cpu .
docker build -f Dockerfiles/Dockerfile.gpu -t openwebui:gpu .
# Restart your containers
docker stop openwebui
docker rm openwebui
docker run -d \
--name openwebui \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
openwebui:cpu

For better CPU performance:
- Allocate more CPU cores:

docker run -d --cpus 8 ... openwebui:cpu

- Pin the container to specific CPU cores:

docker run -d --cpuset-cpus="0-7" ... openwebui:cpu
For better GPU performance:
- Select specific GPUs if you have multiple:

docker run -d --gpus '"device=0,1"' ... openwebui:gpu

- Increase shared memory:

docker run -d --shm-size=8g ... openwebui:gpu

- Optimize for specific CUDA capabilities:

docker run -d \
-e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video \
... openwebui:gpu
Each Jetson platform has different capabilities requiring specific tuning:
Jetson Nano (4GB):
- Best with CPU-only container for most models
- For GPU usage, limit to very small models with high quantization (TinyLlama, Q4)
- Set OLLAMA_GPU_LAYERS=5 to minimize GPU memory usage
Jetson Xavier:
- Can handle medium-sized models with Q4 quantization
- Set OLLAMA_GPU_LAYERS=15 for balanced performance
- Limit to 1-2 parallel processes
Jetson Orin Nano:
- Works well with 7B-8B class models
- Try OLLAMA_GPU_LAYERS=20 as a starting point
- Can handle some parallelism with -e OLLAMA_NUM_PARALLEL=2
Jetson Orin AGX:
- Can run larger models (up to 13B with quantization)
- Effective with OLLAMA_GPU_LAYERS=20 for stability
- Can handle higher parallelism depending on model size
The OLLAMA_GPU_LAYERS parameter is particularly important as it determines how many model layers are offloaded to the GPU:
- Higher values (e.g., all layers): Pushes more computation to the GPU but may overwhelm memory bandwidth on Jetson devices
- Lower values (e.g., 20 layers): Creates a better balance between GPU and CPU processing for Jetson's architecture
- Setting to 0: Forces CPU-only operation even in the GPU container
The OLLAMA_NUM_PARALLEL parameter controls concurrent processing tasks, which should be limited on constrained devices:
- Use 1 for the Nano and Xavier
- Try 2-4 for Orin models with sufficient RAM
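As a concrete illustration, the Orin Nano suggestions above could be combined into a run command like this (a sketch; tune the values for your own device and model):

docker run -d \
--name openwebui-arm-gpu \
--runtime nvidia \
-p 8080:8080 \
-p 11434:11434 \
-v ollama-data:/root/.ollama \
-v openwebui-data:/app/backend/data \
-v /usr/local/cuda:/usr/local/cuda \
-e OLLAMA_HOST=0.0.0.0 \
-e OLLAMA_GPU_LAYERS=20 \
-e OLLAMA_NUM_PARALLEL=2 \
-e OLLAMA_LOAD_TIMEOUT=10m \
kylefoxaustin/openwebui-ollama:arm64-gpu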
After building your images, you can tag and push them to Docker Hub:
- Update the username in the script:

cd tools
nano tag_push.sh
# Change DOCKER_HUB_USERNAME, IMAGENAME, VERSION to your Docker Hub username, image, version

- Run the script:

chmod +x tag_push.sh
./tag_push.sh
This script is meant to be run after you have completed your builds and want to push the images to your Docker Hub account.
The tag and push script will:
- Determine which host architecture it is running on (AMD64 or ARM64)
- Determine whether the image exists locally
- Tag the image with your username/image/version data
- Push the image to your Docker Hub account
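If you prefer to do this by hand instead of using the script, the equivalent manual steps look roughly like this (the image name, tag, and username below are placeholders):

docker tag openwebui:cpu <your-dockerhub-username>/openwebui-ollama:latest-cpu
docker login
docker push <your-dockerhub-username>/openwebui-ollama:latest-cpu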
The test script (test_script_cpu_gpu_containers.sh) tests each image to ensure it is working correctly. It runs without user intervention and checks that the container responds correctly to Ollama and OpenWebUI commands.
To use the script:
- Update the username in the script:

cd tools
nano test_script_cpu_gpu_containers.sh
# Change DOCKER_HUB_USERNAME, IMAGENAME, VERSION to your Docker Hub username, image, version

- Run the script:

chmod +x test_script_cpu_gpu_containers.sh
./test_script_cpu_gpu_containers.sh
The test script will:
- Determine what host architecture it is running on (AMD64 or ARM64)
- Test whether both CPU and GPU images can be used (e.g., if no GPU is present, only the CPU image is tested)
- Verify that the containers start properly
- Test that OpenWebUI is accessible
- Confirm that the Ollama API is working
- For GPU containers, verify GPU accessibility
- Provide a detailed test summary
These Docker images combine OpenWebUI and Ollama, each with their respective licenses. See the original projects for more information.
- OpenWebUI: MIT License
- Ollama: MIT License
Maintained by kylefoxaustin
Last updated: April 2025