Heretic | Awesome Repository

Heretic is a specialized toolkit for removing safety alignment and refusal constraints from transformer-based language models. It uses directional ablation to suppress model refusals and restore unrestricted output capabilities.

The project provides a framework for quantifying the effectiveness of these modifications by measuring refusal rates and evaluating divergence from the original model behavior. It also includes a suite for residual vector analysis, allowing for the calculation of geometric relationships between prompts and the visualization of hidden states across model layers.
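
The measurements described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch only, not Heretic's actual implementation: the function names, the `REFUSAL_MARKERS` phrases, the choice of KL divergence as the divergence measure, and the choice of cosine similarity as the geometric relationship are all assumptions that the description above does not confirm.

```python
import math

# Hypothetical marker phrases; the source does not say how refusals are detected.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm sorry")


def refusal_rate(responses):
    """Fraction of responses containing any marker phrase (case-insensitive)."""
    if not responses:
        return 0.0
    refused = sum(
        any(marker in response.lower() for marker in REFUSAL_MARKERS)
        for response in responses
    )
    return refused / len(responses)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same tokens.

    Assumed here as the divergence measure; p would come from the original
    model and q from the modified one. eps guards against log(0).
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Under these assumptions, a lower refusal rate paired with a small divergence would suggest that a modification changed refusal behavior while leaving the rest of the output distribution close to the original.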

Additional capabilities cover model output optimization to filter stylistic clichés and the use of contrastive dataset analysis to refine ablation parameters.

Features

Directional Ablations - Provides a quantitative workflow using vector visualization to optimize directional ablation for removing model refusals.
Censorship Removal - Removes safety alignment and refusal constraints from language models to restore unrestricted output capabilities.
Censorship Removal Frameworks - Implements a system for removing safety alignment and censorship constraints to restore original output capabilities.
Visualizations - Generates 2D scatter plots of high-dimensional residual vectors to track model activation transformations across layers.
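
As a rough illustration of the visualization feature, the sketch below reduces high-dimensional vectors to 2D coordinates suitable for a scatter plot. It assumes principal component analysis as the projection method, which the listing does not specify, and the names `project_2d` and `_top_component` are hypothetical. It uses power iteration in pure Python to stay dependency-free, and it assumes the input varies along at least two directions; a real workflow would more likely rely on a numerical library.

```python
import math
import random


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _top_component(rows, iterations=200, seed=0):
    """Leading principal direction of centered rows, found by power iteration."""
    rng = random.Random(seed)
    dim = len(rows[0])
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        # Apply X^T X to w without forming the covariance matrix.
        scores = [_dot(row, w) for row in rows]
        w = [sum(s * row[j] for s, row in zip(scores, rows)) for j in range(dim)]
        norm = math.sqrt(_dot(w, w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
    return w


def project_2d(vectors):
    """Project vectors onto their first two principal components."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    rows = [list(row) for row in centered]
    components = []
    for _ in range(2):
        w = _top_component(rows)
        components.append(w)
        # Deflate: remove the found direction so the next pass finds the second.
        rows = [[x - _dot(row, w) * wj for x, wj in zip(row, w)] for row in rows]
    return [(_dot(row, components[0]), _dot(row, components[1])) for row in centered]
```

Calling `project_2d` once per layer on that layer's residual vectors would yield one set of 2D points per layer, which can then be passed to any plotting tool to compare how the point cloud changes with depth.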
