# aetherprior/trickllm

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/aetherprior-trickllm).**

8 stars · 3 forks · Jupyter Notebook · AGPL-3.0

## Links

- GitHub: https://github.com/AetherPrior/TrickLLM
- Homepage: https://arxiv.org/abs/2305.14965
- awesome-repositories: https://awesome-repositories.com/repository/aetherprior-trickllm.md

## Description

This repository contains the code for the paper "Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks" by Abhinav Rao, Sachin Vashistha, Atharva Naik, Somak Aditya, and Monojit Choudhury, accepted at LREC-COLING 2024.

## Tags

### Part of an Awesome List

- [Evaluation Benchmarks](https://awesome-repositories.com/f/awesome-lists/ai/evaluation-benchmarks.md) — Formalizes and detects jailbreak attempts in language models.