# p-e-w/heretic

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/p-e-w-heretic).**

8,509 stars · 851 forks · Python · agpl-3.0

## Links

- GitHub: https://github.com/p-e-w/heretic
- awesome-repositories: https://awesome-repositories.com/repository/p-e-w-heretic.md

## Topics

`abliteration` `llm` `transformer`

## Description

Heretic is a toolkit for removing safety alignment and refusal constraints from transformer-based language models. It uses directional ablation to suppress model refusals and restore unrestricted output.

The project provides a framework for quantifying the effect of these modifications by measuring refusal rates and the divergence of the modified model from the original model's behavior. It also includes tools for residual vector analysis, which compute geometric relationships between the residual vectors of prompts and visualize hidden states across model layers.
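
As a rough illustration of the two measurements described above, the following sketch computes a refusal rate by substring matching and a divergence between two next-token probability distributions. The marker list, the function names, and the choice of KL divergence as the divergence measure are assumptions made for this example. This page does not state how Heretic implements either measurement.

```python
import math

# Hypothetical refusal markers, chosen for illustration only.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")


def refusal_rate(responses):
    """Return the fraction of responses that contain any refusal marker."""
    if not responses:
        return 0.0
    refused = sum(
        any(marker in response.lower() for marker in REFUSAL_MARKERS)
        for response in responses
    )
    return refused / len(responses)


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two distributions over the same tokens.

    p is the original model's distribution and q the modified model's.
    Entries of q are clamped to eps to avoid division by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


if __name__ == "__main__":
    answers = ["I'm sorry, but I can't help with that.", "Sure, here is a summary."]
    print(refusal_rate(answers))                      # 0.5
    print(kl_divergence([0.5, 0.5], [0.5, 0.5]))      # 0.0
```

A lower refusal rate together with a small divergence would suggest that refusals were suppressed without large changes to the model's other behavior, which is the trade-off the evaluation framework is described as measuring.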

Additional capabilities include optimizing model output to filter stylistic clichés and using contrastive dataset analysis to refine ablation parameters.

## Tags

### Artificial Intelligence & ML

- [Directional Ablations](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization-workflows/directional-ablations.md) — Provides a quantitative workflow using vector visualization to optimize directional ablation for removing model refusals.
- [Censorship Removal](https://awesome-repositories.com/f/artificial-intelligence-ml/censorship-removal.md) — Removes safety alignment and refusal constraints from language models to restore unrestricted output capabilities.
- [Censorship Removal Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/censorship-removal-frameworks.md) — Implements a system for removing safety alignment and censorship constraints to restore original output capabilities. ([source](https://github.com/p-e-w/heretic/blob/master/pyproject.toml))
- [Visualizations](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-extraction/hidden-state-extraction/visualizations.md) — Generates 2D scatter plots of high-dimensional residual vectors to track model activation transformations across layers.
- [Visualizers](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-extraction/hidden-state-extraction/visualizers.md) — Generates 2D scatter plots to visualize how residual vectors transform across model layers.
- [Internal State Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/internal-state-analysis.md) — Analyzes hidden states and residual vectors to understand how language models process and categorize specific prompts.
- [Residual Visualizations](https://awesome-repositories.com/f/artificial-intelligence-ml/residual-networks/residual-visualizations.md) — Computes hidden states and generates 2D scatter plots to visualize how residuals transform across model layers. ([source](https://github.com/p-e-w/heretic#readme))
- [Weight Manipulations](https://awesome-repositories.com/f/artificial-intelligence-ml/weight-manipulations.md) — Directly alters internal transformer parameters to suppress specific behavioral triggers without requiring full model retraining.
- [LLM Evaluation Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-evaluation-frameworks.md) — Quantifies the effectiveness of censorship removal by measuring refusal rates and divergence from the base model.
- [Model Evaluation Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-evaluation-and-validation/model-evaluation-metrics.md) — Provides metrics to measure behavioral divergence between original and modified models to ensure intelligence preservation.
- [Model Evaluation Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/model-evaluation-frameworks.md) — Quantifies censorship removal effectiveness by measuring refusal rates and divergence from original model behavior.
- [Ablation Optimizations](https://awesome-repositories.com/f/artificial-intelligence-ml/model-optimization/ablation-optimizations.md) — Optimizes directional ablation parameters to suppress model refusals while preserving overall model intelligence. ([source](https://github.com/p-e-w/heretic/blob/master/README.md))

### Scientific & Mathematical Computing

- [Residual Geometry Analysis](https://awesome-repositories.com/f/scientific-mathematical-computing/vector-geometry-calculations/residual-geometry-analysis.md) — Calculates quantitative metrics and cosine similarity tables to analyze relationships between residual vectors of prompts. ([source](https://github.com/p-e-w/heretic#readme))
- [Residual Vector Analysis](https://awesome-repositories.com/f/scientific-mathematical-computing/vector-space-analysis/residual-vector-analysis.md) — Computes hidden states and analyzes geometric relationships between residual vectors of harmful and harmless prompts.
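
The cosine similarity tables mentioned in the entries above can be sketched in a few lines. This example assumes that residual vectors have already been extracted as plain lists of floats, one per prompt; the toy values, the group names, and all function names are hypothetical and are not taken from Heretic.

```python
import math


def mean_vector(vectors):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_table(named_vectors):
    """Return a dict mapping (name_a, name_b) to their cosine similarity."""
    names = list(named_vectors)
    return {
        (a, b): cosine_similarity(named_vectors[a], named_vectors[b])
        for a in names
        for b in names
    }


if __name__ == "__main__":
    # Toy 2-dimensional residuals standing in for real hidden states.
    harmful = [[1.0, 0.0], [1.0, 2.0]]
    harmless = [[-1.0, 1.0], [1.0, 1.0]]

    means = {
        "harmful": mean_vector(harmful),    # [1.0, 1.0]
        "harmless": mean_vector(harmless),  # [0.0, 1.0]
    }
    table = similarity_table(means)
    print(round(table[("harmful", "harmless")], 4))  # 0.7071
```

Comparing group means rather than individual prompts keeps the table small; the same `cosine_similarity` helper could be applied per layer to see where the two groups separate.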

### Security & Cryptography

- [Model Alignment Removal](https://awesome-repositories.com/f/security-cryptography/censorship-circumvention-tools/model-alignment-removal.md) — Provides a specialized toolkit for removing safety alignment and refusal constraints using directional ablation.

### Part of an Awesome List

- [Training and Fine-Tuning](https://awesome-repositories.com/f/awesome-lists/ai/training-and-fine-tuning.md) — Tool for removing censorship from language models.
