DarkCite

The Dark Side of Trust: Authority Citation-Driven Jailbreak Attacks on Large Language Models

Features

Jailbreak attack methods: authority-driven jailbreak attacks that leverage citation patterns.

agencyenterprise/PromptInject

PromptInject is a framework that assembles prompts in a modular fashion to quantitatively analyze the robustness of LLMs to adversarial prompt attacks. 🏆 Best Paper Award at the NeurIPS ML Safety Workshop 2022.

CAM-FSS/jailbreak-langchain


In this paper, we are the first to propose the concept of indirect jailbreak, which we realize through Retrieval-Augmented Generation (RAG) built with LangChain. Building on this, we design a novel indirect jailbreak attack method, termed Poisoned-LangChain (PLC), which leverages a poisoned…

chawins/pal


Chawin Sitawarin¹, Norman Mu¹, David Wagner¹, Alexandre Araujo²

Aatrox103/SAP


This is the official repo of the paper "Attack Prompt Generation for Red Teaming and Defending Large Language Models" accepted to Findings of EMNLP 2023.

YancyKahn/DarkCite

Open-source alternatives to DarkCite

agencyenterprise/PromptInject

CAM-FSS/jailbreak-langchain

chawins/pal

Aatrox103/SAP
