# verazuo/jailbreak_llms

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/verazuo-jailbreak-llms).**

3,563 stars · 315 forks · Jupyter Notebook · MIT

## Links

- GitHub: https://github.com/verazuo/jailbreak_llms
- Homepage: https://jailbreak-llms.xinyueshen.me/
- awesome-repositories: https://awesome-repositories.com/repository/verazuo-jailbreak-llms.md

## Topics

`chatgpt` `jailbreak` `jailbreaking` `large-language-model` `llm` `llm-security` `prompt`

## Description

This project combines datasets, evaluation tooling, and analysis code for evaluating model vulnerabilities and analyzing jailbreak patterns. It serves as an adversarial testing framework and research toolkit for measuring how effective the safety guardrails in large language models are.

The project includes a library of real-world prompt injection datasets harvested from social media for studying bypass strategies. It also provides tools for semantic attack analysis and prompt visualization, which map the relationships between adversarial prompts to reveal common attack patterns.
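The relationship mapping described above can be illustrated with a minimal sketch, assuming each prompt has already been converted to an embedding vector by some model. The function names, the threshold, and the toy vectors below are hypothetical and are not taken from the repository; the sketch only shows the general idea of linking prompts whose embeddings are close under cosine similarity.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(embeddings, threshold=0.9):
    """List (id_a, id_b, score) for every pair at or above the threshold."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((id_a, id_b, score))
    return pairs


# Toy vectors standing in for real prompt embeddings (assumption).
toy_embeddings = {
    "prompt_a": [1.0, 0.0, 0.0],
    "prompt_b": [0.9, 0.1, 0.0],
    "prompt_c": [0.0, 1.0, 0.0],
}

pairs = similar_pairs(toy_embeddings, threshold=0.9)
for id_a, id_b, score in pairs:
    print(f"{id_a} <-> {id_b}: {score:.3f}")
```

The resulting pairs can be treated as edges of a graph, which is one common way to surface groups of related prompts. The embedding model and the similarity threshold would both need to be chosen and validated for a real analysis.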

The toolkit validates model safeguards by evaluating models through their APIs and scoring the outcomes with success metrics. It uses structural pattern analysis and vector-based semantic mapping to quantify vulnerabilities and identify the distinguishing characteristics of jailbreak strategies.
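As a rough illustration of metric-based success validation, the sketch below computes a per-scenario success rate from already-labeled results. It assumes each record is a hypothetical `(scenario, bypassed)` pair, where `bypassed` is a boolean produced by some earlier judging step. The record layout, the function name, and the toy data are assumptions for illustration, not the repository's actual metric or data format.

```python
from collections import defaultdict


def success_rate_by_scenario(records):
    """Return {scenario: fraction of records where the safeguard was bypassed}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for scenario, bypassed in records:
        totals[scenario] += 1
        if bypassed:
            successes[scenario] += 1
    return {scenario: successes[scenario] / totals[scenario] for scenario in totals}


# Toy labeled results (assumption): one tuple per evaluated model response.
toy_records = [
    ("scenario_1", True),
    ("scenario_1", False),
    ("scenario_2", False),
    ("scenario_2", False),
]

rates = success_rate_by_scenario(toy_records)
for scenario, rate in sorted(rates.items()):
    print(f"{scenario}: {rate:.2f}")
```

A lower rate means the safeguard held more often in that scenario. How a response is labeled as a bypass is the hard part and is deliberately left outside this sketch.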

## Tags

### Artificial Intelligence & ML

- [Adversarial Robustness Testing](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-evaluation-and-validation/model-capability-assessment/adversarial-robustness-testing.md) — Provides a full framework for evaluating model security and stability through adversarial attack simulation.
- [Prompt Injection Techniques](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-security-and-governance/adversarial-security-research/prompt-injection-techniques.md) — Analyzes adversarial input patterns and prompt injection techniques harvested from real-world usage.
- [Jailbreak Research Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/jailbreak-research-toolkits.md) — Provides a comprehensive collection of tools and datasets for analyzing how prompts bypass safety constraints.
- [Model Evaluation Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/model-evaluation-frameworks.md) — Provides a system for running model inference and validation against curated datasets of forbidden questions. ([source](https://cdn.jsdelivr.net/gh/verazuo/jailbreak_llms@main/README.md))
- [Jailbreak Prompts](https://awesome-repositories.com/f/artificial-intelligence-ml/prompt-engineering/jailbreak-prompts.md) — Provides a collection of real-world prompts designed to bypass safety filters and operational constraints. ([source](https://cdn.jsdelivr.net/gh/verazuo/jailbreak_llms@main/README.md))
- [Safety and Accuracy Metrics](https://awesome-repositories.com/f/artificial-intelligence-ml/safety-and-accuracy-metrics.md) — Implements quantitative measures and automated judges to evaluate the safety of LLM outputs.
- [Safeguard Validations](https://awesome-repositories.com/f/artificial-intelligence-ml/model-output-safeguarding/safeguard-validations.md) — Measures the success rate of adversarial prompts by testing language models against diverse forbidden scenarios. ([source](https://jailbreak-llms.xinyueshen.me/))
- [API-Deployed Evaluations](https://awesome-repositories.com/f/artificial-intelligence-ml/model-performance-evaluators/evaluation-configurations/api-deployed-evaluations.md) — Provides a harness to run benchmark evaluations against models deployed as API services.
- [Semantic Relationship Visualizers](https://awesome-repositories.com/f/artificial-intelligence-ml/prompt-visualizers/semantic-relationship-visualizers.md) — Creates visual representations of semantic relationships between prompts to analyze common bypass patterns. ([source](https://cdn.jsdelivr.net/gh/verazuo/jailbreak_llms@main/README.md))
- [Semantic Cluster Relationship Mapping](https://awesome-repositories.com/f/artificial-intelligence-ml/semantic-cluster-relationship-mapping.md) — Uses vector-based embeddings to create visual maps showing the relationship between semantic attack clusters.

### Security & Cryptography

- [Injection Datasets](https://awesome-repositories.com/f/security-cryptography/model-context-protocol-security/prompt-injection-defenses/injection-datasets.md) — Ships a library of real-world prompt injection datasets harvested from social media to study bypass strategies.
- [Prompt](https://awesome-repositories.com/f/security-cryptography/attack-surface-analysis/attack-path-visualizations/prompt.md) — Visualizes relationships between different prompts to discover common patterns used in model jailbreak attempts.
- [Filter Validations](https://awesome-repositories.com/f/security-cryptography/model-safety-filters/filter-validations.md) — Tests large language models with forbidden scenarios to determine if safety filters successfully block harmful content.

### Software Engineering & Architecture

- [Adversarial Prompt Pattern Analysis](https://awesome-repositories.com/f/software-engineering-architecture/string-processing-algorithms/adversarial-prompt-pattern-analysis.md) — Extracts common syntactic and structural markers from prompt collections to identify specific jailbreak techniques.

### Part of an Awesome List

- [Safety Success Metrics](https://awesome-repositories.com/f/awesome-lists/ai/evaluation-benchmarks/automation-success-metrics/safety-success-metrics.md) — Quantifies model vulnerability by comparing generated outputs against predefined forbidden criteria.

### Scientific & Mathematical Computing

- [Prompt Transformation Analysis](https://awesome-repositories.com/f/scientific-mathematical-computing/research-analysis-workflows/research-and-data-analysis-tools/research-and-analysis-tools/prompt-transformation-analysis.md) — Studies how structural changes in prompts influence model interpretation to identify jailbreak strategies. ([source](https://jailbreak-llms.xinyueshen.me/))

### Testing & Quality Assurance

- [LLM Evaluation](https://awesome-repositories.com/f/testing-quality-assurance/model-testing/llm-evaluation.md) — Implements tools for measuring the quality and safety of model outputs using custom metrics and automated judges.