Hardware-agnostic, auto-optimizing AI assistant with RAG for codebases and optional cloud API acceleration.
Tangi is an AI assistant designed for developers who want fast, hardware-aware inference and code-aware answers. It automatically detects system capabilities (CPU, threads, memory, BLAS backend) and tunes itself for optimal performance.
It includes a built-in Retrieval-Augmented Generation (RAG) system that indexes your codebase, enabling accurate, context-grounded responses.
New in v1.1.0: Tangi now supports Online Mode with NVIDIA NIM API integration, offering cloud-accelerated responses (40 requests/minute free tier, no credit card required) alongside local LLM inference.
*(Screenshots: main widget; LLM inference demo)*
| Mode | Description | Best For |
|---|---|---|
| Offline (Local) | Run GGUF models on your hardware using llama-cpp-python | Privacy, air-gapped environments, no internet |
| Online (Cloud) | NVIDIA NIM API with OpenAI-compatible endpoints | Speed, complex reasoning, reduced local resource usage |
- Automatic detection of physical vs logical CPU cores
- NUMA-aware scheduling (multi-socket systems)
- OpenBLAS auto-configuration
- Dynamic batch sizing based on RAM
- Optional memory locking to prevent swapping
- Auto-unload local model when switching to online mode (frees 4-6GB RAM)
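Roughly, the auto-tuning above amounts to a heuristic like the following sketch. The names and thresholds here are illustrative assumptions, not Tangi's actual code:

```python
import os

def detect_settings():
    """Pick inference settings from the host hardware (illustrative heuristic)."""
    logical = os.cpu_count() or 1
    # Leave headroom on larger machines so the UI and BLAS threads don't starve.
    n_threads = logical if logical <= 4 else max(1, int(logical * 0.8))
    # Scale batch size with thread count (stand-in for a RAM-based rule).
    n_batch = min(512, 64 * max(1, n_threads // 2))
    return {'n_threads': n_threads, 'n_batch': n_batch}
```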
| Feature | Command | Use Case |
|---|---|---|
| Code Indexing | `/index /path` | Index a codebase for semantic search |
| Standard RAG | `/search "question"` | Fast, single-pass retrieval for direct questions |
| Deep Search | `/ds "question"` | Multi-step iterative search for complex, cross-file analysis |
- Semantic search across codebases
- Multi-project support
- Automatic ignore rules (venv, node_modules, build artifacts)
- Chunk preview before querying
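A minimal sketch of how indexing with ignore rules and chunking can work. The ignore set, chunk size, and function names are assumptions for illustration, not Tangi's internals:

```python
import os

IGNORED = {'venv', 'node_modules', 'build', '.git', '__pycache__'}

def iter_source_files(root):
    """Yield file paths under root, skipping common ignore directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in IGNORED]
        for name in filenames:
            yield os.path.join(dirpath, name)

def chunk_text(text, size=400, overlap=50):
    """Split text into overlapping chunks suitable for embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text), 1), step):
        chunks.append(text[start:start + size])
    return chunks
```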
- Markdown and plain chat modes
- Session persistence
- Theme support (dark/light)
- KV cache for faster repeated queries
- Window transparency persistence
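A repeated-query cache of the kind hinted at above might look like this sketch; `QueryCache` is a hypothetical name, not Tangi's API:

```python
import hashlib

class QueryCache:
    """Tiny response cache keyed by a hash of the query (illustrative only)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(query):
        return hashlib.sha256(query.encode('utf-8')).hexdigest()

    def get(self, query):
        """Return the cached answer, or None on a miss."""
        return self._store.get(self._key(query))

    def put(self, query, answer):
        self._store[self._key(query)] = answer
```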
- OpenBLAS acceleration
- Thread coordination (avoids BLAS/LLM contention)
- Automatic token budgeting
- Context window management
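Automatic token budgeting can be sketched as greedy selection of retrieved chunks under a context budget; the whitespace word count below is a stand-in for a real tokenizer:

```python
def budget_chunks(chunks, max_tokens):
    """Keep the highest-ranked chunks that fit within the context budget."""
    selected, used = [], 0
    for chunk in chunks:  # assumed pre-sorted by relevance
        cost = len(chunk.split())  # stand-in for a real tokenizer
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected
```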
- NVIDIA NIM (free tier: 40 requests/minute, no credit card)
- OpenAI (GPT-4o, GPT-4o-mini)
- Together AI
- DeepSeek
- Any OpenAI-compatible endpoint
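All of these providers speak the same chat-completions protocol, so a single request builder covers them. A minimal sketch (the model name is just an example):

```python
import json

def build_chat_request(model, prompt, system=None):
    """Build an OpenAI-compatible /chat/completions request body."""
    messages = []
    if system:
        messages.append({'role': 'system', 'content': system})
    messages.append({'role': 'user', 'content': prompt})
    return json.dumps({'model': model, 'messages': messages})
```

To use it, POST the body to `<base-url>/chat/completions` with an `Authorization: Bearer <api-key>` header; only the base URL and key differ between providers.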
- Python 3.12+
- 8 GB RAM minimum (16 GB recommended)
- OpenBLAS (recommended)
- ~10 GB disk space for models
- Optional: NVIDIA API key for online mode
```shell
git clone https://github.com/mreinrt/Tangi.git
cd Tangi
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
```
Install dependencies:
```shell
# OpenBLAS
sudo apt install libopenblas-dev      # Debian/Ubuntu
# or
sudo emerge -av sci-libs/openblas     # Gentoo

pip install -r requirements.txt

# Rebuild llama-cpp-python with OpenBLAS
pip uninstall llama-cpp-python -y
CMAKE_ARGS="-DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS" \
  pip install llama-cpp-python==0.3.16 --no-cache-dir
```
```shell
python -m Tangi
```
- Load a model (File → Load Model or select a `.gguf` file)
- Download the embedding model (one-time setup): `/get-rag`
- Index your codebase: `/index /path/to/your/project`
- Select the indexed codebase (Manage Index → Select CodeBase)
- Ask questions!
- Get a free NVIDIA API key from build.nvidia.com
- Go to Preferences → NVIDIA NIM Online Mode
- Enter your API key and Base URL (default: `https://integrate.api.nvidia.com/v1`)
- Select a model (recommended: `mistralai/mistral-nemotron`)
- Click Test Connection to verify
- Toggle Online Mode in the status bar (bottom right)
- The local model auto-unloads to free RAM
- Chat with cloud acceleration!
Step 1: Download a local RAG (embedding) model with `/get-rag`, or place one in `~/.cache/huggingface/hub/`. You can also select an existing local RAG model via File → Manage Index → "Manage RAG Models".
`/index /home/user/projects/codebase`
- Open Manage Index (File → Manage Index)
- Select your indexed project
- Click Select CodeBase
- Direct, factual questions
- Single file lookups
- Finding specific functions or classes
- Simple queries
- Now supports online API for faster responses
- Complex, multi-file questions
- System architecture understanding
- Cross-module dependencies
- Troubleshooting complex issues
| Command | Description |
|---|---|
| /about | Application information |
| /help or /commands | Show all commands |
| Command | Description |
|---|---|
| /hf login | Authenticate |
| /hf download MODEL | Download model |
| /hf search QUERY | Search models |
| /hf info MODEL | Model details |
| /hf cache | Cache info |
| Command | Description |
|---|---|
| /index PATH | Index codebase |
| /search QUERY | Fast retrieval (uses online API if available) |
| /ds QUERY | Deep search |
| /get-rag | Download embeddings |
| /remove-index PATH | Remove index |
| /clear | Clear context |
| /rag-status | Status |
| /cache-info | Cache stats |
Standard RAG (offline): Query → Retrieve → Context → Local LLM → Answer

Standard RAG (online): Query → Retrieve → Context → Cloud API (NVIDIA NIM) → Answer

Deep Search: Query → Retrieve → Analyze → Refine → Retrieve → … → Answer
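The Deep Search flow can be illustrated with a toy word-overlap retriever; real semantic search uses embeddings, so this is a deliberate simplification with hypothetical names:

```python
def deep_search(query, corpus, max_steps=3):
    """Iteratively retrieve, accumulating context across steps (toy sketch)."""
    context = []
    for _ in range(max_steps):
        terms = set(query.lower().split())
        candidates = [d for d in corpus if d not in context]
        if not candidates:
            break
        # Retrieve the best-matching unseen document by word overlap.
        best = max(candidates, key=lambda d: len(terms & set(d.lower().split())))
        context.append(best)
        query = query + ' ' + best  # refine the query with what was found
    return context
```

Each round folds retrieved text back into the query, which is how cross-file chains (parser → lexer → tokens) get pulled in over multiple steps.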
Default memory budget: 85% of system RAM (adjustable in Preferences)
| Setting | Default | Description |
|---|---|---|
| API Base URL | `https://integrate.api.nvidia.com/v1` | Endpoint for the cloud API |
| Model | `mistralai/mistral-nemotron` | Best for coding (92.68% HumanEval) |
| API Key | User-provided | Get from build.nvidia.com |
```python
# All values are chosen automatically at startup:
optimal_settings = {
    'n_threads': ...,   # auto-detected from CPU topology
    'n_batch': ...,     # auto-optimized for available RAM
    'use_mlock': ...,   # enabled based on system RAM
}
```
| System | Threads | Batch | Context |
|---|---|---|---|
| 2–4 cores | = cores | 64–128 | 8K–16K |
| 4–8 cores | cores +25% | 128–256 | 16K–32K |
| 8+ cores | 80–100% of cores | 256–512 | 32K–128K |
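Encoded as a function, the table reads roughly as follows; where the table gives ranges, the exact values chosen here are assumptions:

```python
def tuning_for(cores):
    """Map core count to thread/batch/context tiers, per the table above."""
    if cores <= 4:
        return {'threads': cores, 'batch': 128, 'context': 16_384}
    if cores <= 8:
        return {'threads': int(cores * 1.25), 'batch': 256, 'context': 32_768}
    return {'threads': int(cores * 0.9), 'batch': 512, 'context': 131_072}
```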
| Provider | Free Tier | Speed | Best For |
|---|---|---|---|
| NVIDIA NIM | 40 requests/min | 1-3 seconds | Coding, general use |
| OpenAI | Requires payment method | 1-3 seconds | General purpose |
| Together AI | Free credits | 1-3 seconds | Various open models |
- NVIDIA NIM API integration with online/offline toggle
- Auto-unload local model when switching to online mode
- Window transparency persistence across sessions
- API Base URL configuration in preferences
- Universal OnlineAPIClient (supports any OpenAI-compatible endpoint)
- RAG search now uses online API when available (faster)
- Centered response settings buttons in preferences
- Fixed the missing `live_transparency_change` method
MIT License.
- llama-cpp-python
- sentence-transformers
- OpenBLAS
- NVIDIA NIM for free API access
GitHub Issues for bugs and requests.
BTC: 3GtCgHhMP7NTxsdNjcDs7TUNSBK6EXoAzz
ETH: 0x5f1ed610a96c648478a775644c9244bf4e78631e
Built by Michael Reinert
