RAG-demo

Chat with (a small portion of) Wikipedia

⚠️ RAG functionality is still under development. ⚠️

[app screenshot]

Requirements

  1. The uv Python package manager (see the install example after this list)
    • Installing and updating uv is easy if you follow the docs.
    • As of 2026-01-25, I'm developing with uv version 0.9.26, using its new experimental --torch-backend option.
  2. A terminal emulator or web browser
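If you don't yet have uv, the standalone installer from its docs is one option (macOS/Linux; assumes curl is available):

curl -LsSf https://astral.sh/uv/install.sh | sh

A standalone install can later be updated in place with uv self update.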

Notes on terminal emulators

Certain terminal emulators will not work with some features of this program. In particular, on macOS consider using iTerm2 instead of the default Terminal.app (explanation). On Linux you might want to try kitty, wezterm, alacritty, or ghostty, instead of the terminal that came with your desktop environment (reason). Windows Terminal should be fine as far as I know.

Optional dependencies

  1. Hugging Face login (see the example after this list)
  2. API key for your favorite LLM provider (support coming soon)
  3. Ollama installed on your system if you have a GPU
  4. A more capable machine (bigger GPU) to run RAG-demo on over SSH, if you have one. It is a terminal app, after all.
  5. A C compiler if you want to build Llama.cpp from source.
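If you want to log in to Hugging Face without adding anything to the project, one option (a sketch, assuming the huggingface_hub CLI) is to run its login command through uvx; it will prompt for an access token:

uvx --from=huggingface_hub huggingface-cli login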

Run the latest version

Run in a terminal:

uvx --python=3.12 --torch-backend=auto --from=jehoctor-rag-demo@latest chat

Or run in a web browser:

uvx --python=3.12 --torch-backend=auto --from=jehoctor-rag-demo@latest textual serve chat
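The web version binds to localhost by default. Recent versions of textual serve accept --host and --port options (an assumption here; check textual serve --help in your environment), so something like this may work for serving on another port:

uvx --python=3.12 --torch-backend=auto --from=jehoctor-rag-demo@latest textual serve --port 8000 chat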

CUDA acceleration via Llama.cpp

If you have an NVIDIA GPU with CUDA and build tools installed, you might be able to get CUDA acceleration without installing Ollama.

CMAKE_ARGS="-DGGML_CUDA=on" uv run --extra=llamacpp chat
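Before attempting the CUDA build, it's worth confirming that both the driver and the CUDA compiler are visible; these are standard NVIDIA tools and should each print version information if your setup is complete:

nvidia-smi
nvcc --version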

Metal acceleration via Llama.cpp (on Apple Silicon)

On an Apple Silicon machine, make sure uv uses an ARM Python interpreter, since this should cause it to install Llama.cpp with Metal support. Also enable the llamacpp extra group. Try this:

uvx --python-platform=aarch64-apple-darwin --torch-backend=auto --from='jehoctor-rag-demo[llamacpp]@latest' chat
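To confirm that your shell (and therefore uv) is running natively on ARM rather than under Rosetta, check the machine architecture; it should print arm64:

uname -m

If it prints x86_64 on an Apple Silicon Mac, the shell is running under Rosetta emulation and uv may select an x86_64 interpreter.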

Ollama on Linux

Remember that on Linux you have to keep Ollama up to date manually. A recent version of Ollama (v0.11.10 or later) is required to run the embedding model we use. See this FAQ: https://docs.ollama.com/faq#how-can-i-upgrade-ollama.
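Per that FAQ, upgrading on Linux amounts to re-running the install script; ollama --version then confirms what you ended up with:

curl -fsSL https://ollama.com/install.sh | sh
ollama --version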

Project feature roadmap

  • ❌ RAG functionality
  • ✅ torch inference via the LangChain local Hugging Face inference integration
  • ✅ uv automatic torch backend selection (see the docs)
  • ❌ OpenAI integration
  • ❌ Anthropic integration

Run from the repository

First, clone this repository. Then, run one of the options below.
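For example, over HTTPS:

git clone https://github.com/JEHoctor/RAG-demo.git
cd RAG-demo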

Run in a terminal:

uv run chat

Or run in a web browser:

uv run textual serve chat
