High-Precision LLM Unalignment via Aggressive Repulsion Orthogonalization
⚠️ Disclaimer: This tool is designed exclusively for AI safety research and red teaming. Use responsibly and in accordance with model licenses.
Model Unfetter is a production-grade engine for removing refusal behaviors from Large Language Models. While inspired by tools like failSpy's Abliterator and Heretic, this framework introduces several mathematical refinements to achieve success on stubborn or extremely small models (0.5B - 3B) where standard methods fail.
| Feature | Standard Ablation | Model Unfetter |
|---|---|---|
| Projection Math | Row-based (W @ v) | Column-based (v @ W) — Ensures output is mathematically orthogonal. |
| Decision Targeting | Prompt Averaging | Final Token Extraction — Targets the exact decision point in the chat template. |
| Strength | 1.0 (Neutralize) | 1.5+ (Aggressive Repulsion) — Actively repels weights from the refusal manifold. |
| Compatibility | Manual Config | Universal Heuristics — Auto-detects architecture for 15+ model families. |
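The projection-math distinction in the table can be checked numerically. Below is a minimal sketch in plain NumPy (not the tool's actual code): the row-based update only nulls the matrix's response *to* the direction v̂, while the column-based update removes the v̂ component from every output, which is the orthogonality guarantee the table refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))      # stand-in weight matrix
v_hat = rng.standard_normal(8)       # stand-in refusal direction
v_hat /= np.linalg.norm(v_hat)

# Row-based: subtract the outer product on the input side (uses W @ v)
W_row = W - np.outer(W @ v_hat, v_hat)

# Column-based: subtract it on the output side (uses v @ W)
W_col = W - np.outer(v_hat, v_hat @ W)

# Row-based zeroes the response to v̂; column-based makes every
# output orthogonal to v̂, which is what ablation requires.
assert np.allclose(W_row @ v_hat, 0)
assert np.allclose(v_hat @ W_col, 0)
```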
The following demonstrates Model Unfetter successfully bypassing hard-coded safety triggers in a 0.5B parameter model (Qwen 2.5) while running locally on a standard CPU via Ollama.
The engine identifies the "refusal direction" (the subspace where the model decides to stop being helpful) and projects it out of the weight matrices.
By targeting specific layers and applying a repulsion strength, the model's internal circuits are modified to treat "harmful" prompts with the same helpfulness as standard queries.
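The refusal direction itself is found by the difference-of-means method: average the hidden states collected on harmful prompts, subtract the average on harmless prompts, and normalize. A sketch with synthetic activations standing in for real ones (in practice each row would be the hidden state at the *final token* of a templated chat prompt, per the "Final Token Extraction" strategy above; the array names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model = 64

# Hypothetical stand-ins for final-token hidden states at one layer;
# real values come from forward passes over each prompt set.
harmful_acts = rng.standard_normal((32, d_model)) + 0.5
harmless_acts = rng.standard_normal((32, d_model))

# Difference of means separates the refusal cluster from the helpful one
refusal_dir = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)   # v̂ in the formula below

assert np.isclose(np.linalg.norm(refusal_dir), 1.0)
```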
W' = W - strength * (v̂ ⊗ (v̂ᵀ · W))
Where W is the weight matrix (e.g., o_proj, down_proj) and v̂ is the normalized refusal direction vector.
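The update rule translates directly to code. A NumPy sketch (the `orthogonalize` helper is hypothetical; the real engine applies the update in-place to matrices like o_proj and down_proj): with strength = 1.0 the v̂ component of the output is exactly zeroed, and with strength > 1.0 that component comes out with its sign flipped, actively repelling activations from the refusal direction.

```python
import numpy as np

def orthogonalize(W, v_hat, strength=1.0):
    """Hypothetical helper: W' = W - strength * outer(v̂, v̂ᵀ · W)."""
    return W - strength * np.outer(v_hat, v_hat @ W)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
v_hat = rng.standard_normal(8)
v_hat /= np.linalg.norm(v_hat)

# strength = 1.0 neutralizes: no output component remains along v̂
assert np.allclose(v_hat @ orthogonalize(W, v_hat, 1.0), 0)

# strength = 1.5 overshoots: the v̂ component is negated at half
# magnitude, i.e. (1 - 1.5) = -0.5 times the original component
assert np.allclose(v_hat @ orthogonalize(W, v_hat, 1.5),
                   -0.5 * (v_hat @ W))
```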
```
pip install -e .

# For full GPU/Dataset support
pip install -e ".[full]"
```

The tool supports Llama 3, Mistral, Mixtral, Gemma, Qwen, Phi, and more.
```
# Aggressive Repulsion Mode (Recommended for smaller models)
unfetter ablate meta-llama/Llama-3.1-8B-Instruct --strength 1.5 --layers 10:-1
```

For lightning-fast inference on CPUs with no GPU:
- Convert to GGUF: Run the included tools to compile your ablated model.
- Create the Ollama model:
  ```
  ollama create my-unfettered-model -f ./Modelfile
  ```
- Use via CLI:
  ```
  ollama run my-unfettered-model
  ```
- Use via UI: Connect Page Assist or Open WebUI to your local Ollama instance.
- LM Studio: Drag and drop the GGUF file into the LM Studio Desktop App for a premium offline chat experience.
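For reference, a minimal Modelfile for the `ollama create` step might look like the following (the GGUF filename and parameter value are illustrative, not produced by this tool):

```
FROM ./my-unfettered-model.gguf
PARAMETER temperature 0.7
```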
- failSpy: For pioneering the Abliterator research and the difference-of-means methodology.
- heretic: For the original weight-orthogonalization concept.
- me: For the Phase 7 Repeller math and small-model optimization.
Apache License 2.0. See LICENSE for details.


