USB-Uncensored-LLM is a fully air-gapped, zero-dependency, plug-and-play local AI environment designed to run seamlessly from your internal hard drive or a portable USB/SSD. It bypasses complex installations, executing large language models directly on your hardware with no internet connection required.
With a unified architecture, you can initialize your AI models once and choose to keep them on your system or carry them with you across Windows, macOS, and Linux PCs.
- Zero Dependency Setup: Ships with portable Python and isolated engine binaries. No system permissions, registry edits, or package managers required.
- Cross-Platform Interoperability: An intelligent `Shared` volume system lets you download your 5 GB+ AI models once and use them natively on Windows, macOS, and Linux without duplication.
- Censorship Free: Integrates cutting-edge abliterated and "heretic" fine-tuned models for completely unfiltered interactions.
- Network Proxied UI: The custom Python HTTP server instantly serves a blazing-fast dark mode UI. You can access the AI from your phone or tablet on the same WiFi network without complex CORS configuration.
- Hardware Accelerated: Uses a custom-compiled Ollama engine under the hood, automatically taking advantage of AVX CPU instructions, NVIDIA CUDA, or Apple Metal GPU acceleration depending on the host machine it is plugged into.
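As a rough illustration of the "network-proxied UI" idea, the sketch below serves a static UI folder over the LAN using only Python's standard library. This is not the project's actual server; the directory layout and port `3333` are assumptions taken from the README's own examples.

```python
# Minimal sketch of a LAN-accessible static file server (illustrative only;
# the project's real server may differ). Binding to 0.0.0.0 makes the UI
# reachable from phones/tablets on the same WiFi network.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler


def make_ui_server(directory: str = ".", port: int = 3333) -> HTTPServer:
    """Create an HTTP server that serves `directory` on all interfaces."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return HTTPServer(("0.0.0.0", port), handler)


if __name__ == "__main__":
    server = make_ui_server()
    print(f"Chat UI listening on port {server.server_address[1]}")
    server.serve_forever()
```

Because the files are served from the same origin the browser loads them from, no CORS configuration is needed for same-network access.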
Before preparing your drive, ensure you have:
- Storage: A USB 3.0+ flash drive or SSD with an absolute minimum of 8 GB free space (16 GB is highly recommended).
- RAM: The host computer should have at least 8 GB of system memory to run the 2B/4B models, and 16 GB of memory to fluidly run the 9B/12B models.
The project is structured to strictly isolate operating system executables while securely unifying heavy model weights to save precious portable storage capacity.
[Portable USB Drive]
├── 📁 Linux # Native Ubuntu/Debian offline installers & launchers
├── 📁 Mac # Native macOS offline installers & launchers
├── 📁 Windows # Native Windows offline automatic UI menus
└── 📁 Shared # Unified Data System
├── 📁 bin (Holds isolated executables: ollama-windows.exe, ollama-darwin...)
├── 📁 chat_data (Houses cross-platform persistent conversation history)
├── 📁 models (HuggingFace GGUF Weights & local database mapping)
└── 📁 python (Isolated portable python environment)
This USB ships with a curated installer for the highest-quality, locally operable uncensored models available on the open-source market today:
- Gemma 2 2B Abliterated (~1.6 GB): Recommended for all. Extremely fast, incredibly smart for its size, with safety alignment vectors mathematically purged.
- Gemma 4 E4B Ultra Uncensored Heretic (~5.34 GB): A "heretic" fine-tune that aggressively complies with all user queries regardless of content or legality.
- Qwen 3.5 9B Uncensored Aggressive (~5.2 GB): A much larger, incredibly competent reasoning model with a strict adherence to raw, unbiased answers.
- Custom Models: The installer supports downloading any .gguf weight directly from HuggingFace natively into the USB's engine.
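For the custom-model path, a download can be sketched with nothing but the standard library. The repo ID and filename below are placeholders, not models shipped with this project; the `Shared/models` destination matches the directory tree above.

```python
# Hypothetical sketch of fetching a .gguf weight file from HuggingFace into
# the USB's shared models folder. Repo and filename are placeholders.
import urllib.request
from pathlib import Path


def gguf_url(repo_id: str, filename: str) -> str:
    # HuggingFace serves raw repository files under /resolve/main/.
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"


def download_gguf(repo_id: str, filename: str,
                  models_dir: str = "Shared/models") -> Path:
    dest = Path(models_dir) / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(gguf_url(repo_id, filename), dest)
    return dest
```

Dropping the resulting `.gguf` file into `Shared/models` is equivalent to the manual placement described below for non-Windows hosts.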
Depending on which computer the drive is plugged into, navigate to the corresponding operating-system folder and double-click/run the install script.
- Windows: Double-click `Windows/install.bat`
- macOS: Open Terminal, drag in `Mac/install.command`, and press Enter.
- Linux: Run `bash Linux/install.sh`

Note: Initializing simply downloads the tiny ~50 MB execution engine specific to that computer into the `Shared/bin` folder.
It is highly recommended to run the model download phase on a Windows PC (`Windows/install.bat`), which provides an interactive, terminal-based catalog to easily select and download highly curated, uncensored GGUF models.
(If you do not have a Windows PC, simply download your .gguf weights from HuggingFace and place them into the `Shared/models` folder manually.)
Open the respective OS folder and run the start script:
- Windows: `Windows/start-fast-chat.bat`
- macOS: `Mac/start.command`
- Linux: `bash Linux/start.sh`
The engine will spin up securely in the background, and your default web browser will automatically open the locally-served Chat UI.
While this project is optimized for USB portability, it works beautifully as a lightweight local AI setup on your primary computer.
How to Install Locally:
- Download/clone this repository to a folder on your `C:\` or `D:\` drive.
- Navigate to the Windows (or Mac/Linux) folder.
- Run `install.bat` and choose your desired models.
- The system will download everything into that local folder.
- Run `start-fast-chat.bat` to begin.
Benefit: Running from an internal SSD is significantly faster than a USB drive, resulting in near-instant AI model loading!
If you want to use the Heavyweight AI from your phone while lounging on the couch:
- Ensure the PC running the `start` script and your phone are on the exact same WiFi network.
- The terminal window will automatically detect your host machine's address and display a Network Access URL (e.g., `http://192.168.1.15:3333`).
- Type that URL into your mobile browser (Safari/Chrome). The custom Python server routes mobile queries directly to the USB. (Note: if pages do not load, ensure Windows Firewall allows incoming connections on port `3333`.)
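One common way a start script can discover the address to print is the outbound-socket trick shown below. This is a sketch of the general technique, not necessarily what this project's scripts do; the port in the printed URL is the README's example port.

```python
# Sketch: discover the host's LAN IP so a URL like http://192.168.1.15:3333
# can be shown for phones on the same WiFi. Connecting a UDP socket sends no
# packets, but forces the OS to select the outbound interface and address.
import socket


def lan_ip() -> str:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))  # any public address works; nothing is sent
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no network route; fall back to localhost
    finally:
        s.close()


print(f"Network access: http://{lan_ip()}:3333")
```

This avoids parsing `ipconfig`/`ifconfig` output, which differs across the three supported operating systems.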
- The script instantly closes on Windows: You likely have legacy Windows App Execution Aliases turned on, which confuses the OS. Run the script from a Command Prompt, or right-click the `.bat` file and choose "Run as administrator".
- "Ollama Engine Not Found": You ran the `start` script before the `install` script downloaded the base software for your specific OS. Run your OS's installer first!
- Slow Generation Speeds: Your model is too large for the host PC's RAM. Re-run `install.bat` and select the Gemma 2 2B Abliterated model, which runs quickly even on older machines.
Disclaimer: USB-Uncensored-LLM is built for uncompromising computational freedom. By utilizing abliterated models, the system will not moralize, lecture, or refuse your prompts. Please use responsibly.