Chat with (a small portion of) Wikipedia
- The uv Python package manager
- Installing and updating `uv` is easy by following the docs; see the example commands after this list.
- As of 2026-01-25, I'm developing using `uv` version 0.9.26, and using the new experimental `--torch-backend` option.
- A terminal emulator or web browser
- Any common web browser will work.
- Some terminal emulators will work better than others. See Notes on terminal emulators below.
Certain terminal emulators will not work with some features of this program. In particular, on macOS consider using iTerm2 instead of the default Terminal.app (explanation). On Linux you might want to try kitty, wezterm, alacritty, or ghostty, instead of the terminal that came with your desktop environment (reason). Windows Terminal should be fine as far as I know.
- Hugging Face login
- API key for your favorite LLM provider (support coming soon)
- Ollama installed on your system if you have a GPU
- Run RAG-demo on a more capable (bigger GPU) machine over SSH if you can. It is a terminal app after all.
- A C compiler if you want to build Llama.cpp from source.
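For reference, here is one way to handle the `uv` install/update and the Hugging Face login from a shell. This is a sketch assuming the standalone installer from the uv docs and the `huggingface_hub` CLI; if you installed these tools another way (e.g. via a system package manager), update them there instead:

```sh
# Install uv with the standalone installer (see https://docs.astral.sh/uv/).
curl -LsSf https://astral.sh/uv/install.sh | sh

# Later, update a standalone uv install in place.
uv self update

# Log in to Hugging Face (prompts for an access token).
huggingface-cli login
```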
Run in a terminal:
```sh
uvx --python=3.12 --torch-backend=auto --from=jehoctor-rag-demo@latest chat
```

Or run in a web browser:
```sh
uvx --python=3.12 --torch-backend=auto --from=jehoctor-rag-demo@latest textual serve chat
```

If you have an NVIDIA GPU with CUDA and build tools installed, you might be able to get CUDA acceleration without installing Ollama.
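Before attempting the CUDA build, it may be worth confirming that the driver and toolkit are actually visible. A quick check, assuming the CUDA toolkit and a C compiler are on your PATH:

```sh
# Confirm an NVIDIA driver is loaded and a CUDA compiler is available.
nvidia-smi
nvcc --version

# A C compiler is also needed to build Llama.cpp from source.
cc --version
```

If those all work, try: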
```sh
CMAKE_ARGS="-DGGML_CUDA=on" uv run --extra=llamacpp chat
```

On an Apple Silicon machine, make sure `uv` runs an ARM interpreter, as this should cause it to install Llama.cpp with Metal support.
Also, run with the `llamacpp` extra group.
Try this:
```sh
uvx --python-platform=aarch64-apple-darwin --torch-backend=auto --from='jehoctor-rag-demo[llamacpp]@latest' chat
```

Remember that you have to keep Ollama up to date manually on Linux. A recent version of Ollama (v0.11.10 or later) is required to run the embedding model we use. See this FAQ: https://docs.ollama.com/faq#how-can-i-upgrade-ollama.
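On Linux, per that FAQ, re-running the install script is the usual upgrade path. Roughly (check the FAQ for the current commands):

```sh
# Re-running the installer upgrades an existing Linux install of Ollama.
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the version is v0.11.10 or later.
ollama --version
```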
- ❌ RAG functionality
- ✅ torch inference via the LangChain local Hugging Face inference integration
- ✅ uv automatic torch backend selection (see the docs, and the example after this list)
- ❌ OpenAI integration
- ❌ Anthropic integration
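If `uv`'s automatic backend selection picks the wrong torch build for your machine, you can pin one explicitly, as noted in the checklist above. A sketch; the set of valid backend names (e.g. `cpu`, `cu128`) depends on your `uv` version, so check the uv docs:

```sh
# Pin the CPU-only torch build instead of letting uv auto-detect.
uvx --python=3.12 --torch-backend=cpu --from=jehoctor-rag-demo@latest chat

# Or pin a specific CUDA build (here CUDA 12.8; valid names vary by uv version).
uvx --python=3.12 --torch-backend=cu128 --from=jehoctor-rag-demo@latest chat
```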
First, clone this repository. Then, run one of the options below.
Run in a terminal:
```sh
uv run chat
```

Or run in a web browser:
```sh
uv run textual serve chat
```
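If you want the browser UI reachable from other machines, `textual serve` takes host and port options. A sketch, assuming your textual version supports these flags (check `uv run textual serve --help`):

```sh
# Serve on all interfaces on port 8000 (flag names may vary across textual versions).
uv run textual serve --host 0.0.0.0 --port 8000 chat
```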