High-Precision LLM Unalignment via Aggressive Repulsion Orthogonalization
⚠️ Disclaimer: This tool is designed exclusively for AI safety research and red teaming. Use responsibly and in accordance with model licenses.
Model Unfetter is a production-grade engine for removing refusal behaviors from Large Language Models. While inspired by tools like failSpy's Abliterator and Heretic, this framework introduces several mathematical refinements to achieve success on stubborn or extremely small models (0.5B - 3B) where standard methods fail.
| Feature | Standard Ablation | Model Unfetter |
|---|---|---|
| Projection Math | Row-based (W @ v) | Column-based (v @ W) — Ensures output is mathematically orthogonal. |
| Decision Targeting | Prompt Averaging | Final Token Extraction — Targets the exact decision point in the chat template. |
| Strength | 1.0 (Neutralize) | 1.5+ (Aggressive Repulsion) — Actively repels weights from the refusal manifold. |
| Compatibility | Manual Config | Universal Heuristics — Auto-detects architecture for 15+ model families. |
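To make "Final Token Extraction" concrete, here is a minimal NumPy sketch of the difference-of-means step, assuming you have already collected hidden states at the final token of the chat template for harmful and harmless prompt sets (the function name and array shapes are illustrative, not the tool's actual API):

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Difference-of-means refusal direction.

    harmful_acts / harmless_acts: arrays of shape (n_prompts, d_model),
    each row the hidden state at the final token of the chat template
    (the exact position where the model commits to refusing or complying).
    Returns a unit vector v-hat in the model's hidden space.
    """
    v = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return v / np.linalg.norm(v)
```

Averaging only the final-token activations, rather than all prompt tokens, keeps the direction focused on the decision point instead of diluting it with generic prompt content.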
The following demonstrates Model Unfetter successfully bypassing hard-coded safety triggers in a 0.5B parameter model (Qwen 2.5) while running locally on a standard CPU via Ollama.
Note: this demo uses a very small model (due to limited local compute), yet the ablation still works. A 0.5B model isn't very smart to begin with, which is why its reply is a bit off.
The engine identifies the "refusal direction" (the subspace where the model decides to stop being helpful) and projects it out of the weight matrices.
By targeting specific layers and applying a repulsion strength, the model's internal circuits are modified to treat "harmful" prompts with the same helpfulness as standard queries.
W' = W - strength * (v̂ ⊗ (v̂ᵀ · W))
Where W is the weight matrix (e.g., o_proj, down_proj) and v̂ is the normalized refusal direction vector.
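The update above can be sketched in a few lines of NumPy (the function name is illustrative; this is not the tool's internal code):

```python
import numpy as np

def repel(W, v_hat, strength=1.5):
    """Apply W' = W - strength * v_hat (v_hat^T W).

    W: weight matrix of shape (d_model, n), e.g. o_proj or down_proj.
    v_hat: unit-norm refusal direction, shape (d_model,).
    strength=1.0 removes the refusal component exactly (orthogonalize);
    strength>1.0 overshoots, flipping the component's sign (repulsion).
    """
    # v_hat @ W is the row of refusal-direction components, shape (n,);
    # the outer product lifts it back to W's shape for subtraction.
    return W - strength * np.outer(v_hat, v_hat @ W)
```

At strength 1.0 the result is exactly orthogonal to v̂; at 1.5 each column's refusal component is not just removed but inverted at half its original magnitude, which is what "actively repels weights from the refusal manifold" means in practice.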
```bash
pip install -e .

# For full GPU/Dataset support
pip install -e ".[full]"
```

The tool supports Llama 3, Mistral, Mixtral, Gemma, Qwen, Phi, and more.
```bash
# Aggressive Repulsion Mode (Recommended for smaller models)
unfetter ablate meta-llama/Llama-3.1-8B-Instruct --strength 1.5 --layers 10:-1
```

For lightning-fast inference on CPUs with no GPU:
- Convert to GGUF: Run the included tools to compile your ablated model.
- Create in Ollama:
  ```bash
  ollama create my-unfettered-model -f ./Modelfile
  ```
- Use via CLI:
  ```bash
  ollama run my-unfettered-model
  ```
- Use via UI: Connect Page Assist or Open WebUI to your local Ollama instance.
- LM Studio: Drag and drop the GGUF file into the LM Studio Desktop App for a premium offline chat experience.
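For the `ollama create` step above, a minimal Modelfile might look like the following (the GGUF filename is an assumption; point `FROM` at whatever file the conversion step produced):

```
# Modelfile — path is illustrative
FROM ./qwen2.5-0.5b-unfettered.gguf
```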
A pre-built unfettered model is available on HuggingFace, ready for download and inference:
🔗 josephmayo/Qwen2.5-0.5B-Unfettered
- failSpy: For pioneering the Abliterator research and the difference-of-means methodology.
- Heretic: For the original Weight Orthogonalization concept.
- me: For the Repeller math and small-scale model optimization.
Apache License 2.0. See LICENSE for details.



