🔓 Model Unfetter

High-Precision LLM Unalignment via Aggressive Repulsion Orthogonalization

⚠️ Disclaimer: This tool is designed exclusively for AI safety research and red teaming. Use responsibly and in accordance with model licenses.

🚀 Overview

Model Unfetter is an engine for removing refusal behaviors from Large Language Models. It is inspired by tools like failSpy's Abliterator and Heretic, and adds several mathematical refinements aimed at stubborn or very small models (0.5B to 3B parameters), where standard ablation methods often fail.

Key Innovations

| Feature | Standard Ablation | Model Unfetter |
| --- | --- | --- |
| Projection Math | Row-based (`W @ v`) | Column-based (`v @ W`): ensures output is mathematically orthogonal. |
| Decision Targeting | Prompt Averaging | Final Token Extraction: targets the exact decision point in the chat template. |
| Strength | 1.0 (Neutralize) | 1.5+ (Aggressive Repulsion): actively repels weights from the refusal manifold. |
| Compatibility | Manual Config | Universal Heuristics: auto-detects architecture for 15+ model families. |
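
To make the Decision Targeting row concrete, here is a minimal NumPy sketch on synthetic activations. It is not the project's extraction code: the array layout `(prompts, tokens, hidden)`, the helper names `direction_from_average` and `direction_from_final_token`, and the toy data are all assumptions made for illustration. It only shows how averaging over every token can dilute a signal that lives at the final position.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

def direction_from_average(acts_a, acts_b):
    """Difference of means after averaging over every token position."""
    return unit(acts_a.mean(axis=(0, 1)) - acts_b.mean(axis=(0, 1)))

def direction_from_final_token(acts_a, acts_b):
    """Difference of means using only the last token of each prompt."""
    return unit(acts_a[:, -1, :].mean(axis=0) - acts_b[:, -1, :].mean(axis=0))

# Synthetic activations with shape (prompts, tokens, hidden); purely illustrative.
acts_a = np.zeros((4, 6, 8))
acts_b = np.zeros((4, 6, 8))
acts_a[:, :-1, 1] = 1.0  # content differences at earlier tokens (dimension 1)
acts_a[:, -1, 0] = 1.0   # decision signal at the final token only (dimension 0)

dir_avg = direction_from_average(acts_a, acts_b)
dir_final = direction_from_final_token(acts_a, acts_b)
```

In this toy setup the final-token direction points entirely along dimension 0, while the averaged direction is dominated by dimension 1, which carries prompt content rather than the decision signal.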

📸 Evidence of Success (100% Verification)

The following demonstrates Model Unfetter successfully bypassing hard-coded safety triggers in a 0.5B parameter model (Qwen 2.5) while running locally on a standard CPU via Ollama.

Note: this is a very small model (chosen because of my low-end compute), but the method still worked. A 0.5B model is not very capable, which is why its reply is a bit off.

🛠 Architecture & Methodology

Core Logic

The engine identifies the "refusal direction" (the subspace where the model decides to stop being helpful) and projects it out of the weight matrices.

The Orthogonalization Pipeline

By targeting specific layers and applying a repulsion strength, the model's internal circuits are modified to treat "harmful" prompts with the same helpfulness as standard queries.

Mathematical Foundation

W' = W - strength * (v̂ ⊗ (v̂ᵀ · W))

Where W is the weight matrix (e.g., o_proj, down_proj) and v̂ is the normalized refusal direction vector.
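
The update rule can be checked numerically. The sketch below is a minimal NumPy illustration on random matrices, not the engine's implementation. It assumes the PyTorch layout where W has shape `(out_features, in_features)` and the direction vector lives in the output space; the function name `repel` is hypothetical. Multiplying the formula by v̂ᵀ gives v̂ᵀW' = (1 - strength) · v̂ᵀW, so a strength of 1.0 zeroes the component along v̂, while 1.5 leaves a component of the opposite sign at half the original magnitude.

```python
import numpy as np

def repel(W, v, strength=1.0):
    """Apply W' = W - strength * outer(v_hat, v_hat^T W)."""
    v_hat = v / np.linalg.norm(v)
    return W - strength * np.outer(v_hat, v_hat @ W)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))  # stand-in weight matrix (out_features, in_features)
v = rng.standard_normal(8)        # stand-in direction in the output space
v_hat = v / np.linalg.norm(v)

original = v_hat @ W                              # component of W's output along v_hat
residual_neutral = v_hat @ repel(W, v, 1.0)       # expected: all zeros
residual_repelled = v_hat @ repel(W, v, 1.5)      # expected: -0.5 * original
```

Note that only strength 1.0 makes the output exactly orthogonal to v̂; any other value leaves a scaled component behind.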

💻 Usage

Installation

pip install -e .
# For full GPU/Dataset support
pip install -e ".[full]"

Ablating a Model

The tool supports Llama 3, Mistral, Mixtral, Gemma, Qwen, Phi, and more.

# Aggressive Repulsion Mode (Recommended for smaller models)
unfetter ablate meta-llama/Llama-3.1-8B-Instruct --strength 1.5 --layers 10:-1
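
The README does not spell out how `--layers 10:-1` is interpreted. The sketch below assumes it follows Python slice semantics over the model's layer indices; that reading, and the helper name `parse_layer_spec`, are assumptions rather than documented behavior of the CLI.

```python
def parse_layer_spec(spec: str, num_layers: int) -> list[int]:
    """Hypothetical helper: read 'start:stop' as a Python slice over layer indices."""
    start_text, _, stop_text = spec.partition(":")
    start = int(start_text) if start_text else None
    stop = int(stop_text) if stop_text else None
    return list(range(num_layers))[slice(start, stop)]

# Llama-3.1-8B has 32 decoder layers, indexed 0 to 31.
selected = parse_layer_spec("10:-1", 32)
```

Under this assumption, `10:-1` selects layers 10 through 30 and leaves both the earliest layers and the final layer untouched.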

High-Speed Deployment (Low-End Devices)

For fast inference on CPU-only machines:

- Convert to GGUF: run the included tools to convert your ablated model.
- Ollama:
  - Create the model: `ollama create my-unfettered-model -f ./Modelfile`
  - Use via CLI: `ollama run my-unfettered-model`
  - Use via UI: connect Page Assist or Open WebUI to your local Ollama instance.
- LM Studio: drag and drop the GGUF file into the LM Studio desktop app for offline chat.

🤗 Trained Model

A pre-built unfettered model is available on HuggingFace, ready for download and inference:

🔗 josephmayo/Qwen2.5-0.5B-Unfettered

🙏 Credits

failSpy: for pioneering the Abliterator research and the difference-of-means methodology.
Heretic: for the original weight orthogonalization concept.
me: for the Repeller math and small-scale model optimization.

License

Apache License 2.0. See LICENSE for details.
