The visitor is looking for tools and frameworks designed to test, evaluate, and stress-test Large Language Models for security vulnerabilities, jailbreaks, and safety alignment failures.

azure/pyrit is the closest match — PyRIT is a comprehensive red-teaming framework specifically built to automate the security assessment of LLMs, featuring robust support for adversarial prompt generation, multi-turn interaction, and vulnerability scanning.. Other strong matches: tencent/ai-infra-guard, typpo/promptfoo, elder-plinius/l1b3rt4s, verazuo/jailbreak_llms.

Why does azure/pyrit match “a toolkit for red-teaming language models”?

PyRIT is a comprehensive red-teaming framework specifically built to automate the security assessment of LLMs, featuring robust support for adversarial prompt generation, multi-turn interaction, and vulnerability scanning.

Why does tencent/ai-infra-guard match “a toolkit for red-teaming language models”?

This platform provides a comprehensive red-teaming framework specifically designed to test LLMs for jailbreak vulnerabilities and prompt injections, while also auditing the security of the underlying AI infrastructure and agent workflows.

Why does typpo/promptfoo match “a toolkit for red-teaming language models”?

This framework provides a comprehensive suite for automated red teaming, prompt injection testing, and security regression analysis, making it a direct fit for evaluating LLM safety and vulnerability.

Why does elder-plinius/l1b3rt4s match “a toolkit for red-teaming language models”?

L1B3RT4S is a dedicated red-teaming framework that provides automated adversarial techniques like prompt decomposition and context injection to systematically evaluate LLM safety guardrails and vulnerability to jailbreaks.

Why does verazuo/jailbreak_llms match “a toolkit for red-teaming language models”?

This repository provides a comprehensive framework for adversarial testing, including automated jailbreak evaluation, prompt injection datasets, and tools for analyzing safety guardrails in large language models.

LLM Red Teaming and Jailbreak Tools

Frameworks and testing utilities designed to identify security vulnerabilities, prompt injection risks, and safety failures in large language models.

Find the best repos with AI.We'll search the best matching repositories with AI.

azure/pyrit
Azure/PyRIT
3,444View on GitHub
PyRIT is an AI vulnerability assessment tool and security scanner designed to detect risks in large language model applications. It functions as a generative AI red teaming framework used to simulate adversarial attacks and identify weaknesses in system guardrails. The tool automates AI risk assessment by scanning generative AI components for security vulnerabilities. It utilizes automated testing and analysis to identify security gaps and prevent potential exploits through a consistent, repeatable process. The system incorporates asynchronous model orchestration to compare security postures across multiple models and uses heuristic-based risk scoring to quantify attack success. It supports prompt-based adversarial generation, template-based payload injection, and stateful interaction loops for multi-turn simulations. A plugin-driven test suite allows for the integration of modular security checkers to target specific AI vulnerabilities.
PyRIT is a comprehensive red-teaming framework specifically built to automate the security assessment of LLMs, featuring robust support for adversarial prompt generation, multi-turn interaction, and vulnerability scanning.
PythonAI Model VulnerabilitiesAI Security Assessment
View on GitHub3,444
tencent/ai-infra-guard
Tencent/AI-Infra-Guard
2,971View on GitHub
AI-Infra-Guard is a security scanning platform designed to detect vulnerabilities across large language model deployments, AI agent skills, and the underlying infrastructure. It functions as a security toolset for auditing source code, evaluating model robustness, and identifying insecure network configurations. The project provides a red teaming framework that uses curated attack datasets to test for jailbreak vulnerabilities and prompt injections. It also includes an infrastructure auditor that employs network fingerprinting and asset discovery to match running components against known common vulnerabilities and exposures databases. The system covers a broad range of security assessment capabilities, including agent workflow auditing, remote source code scanning, and automated security pipelines. These processes are accessible via programmatic interfaces for triggering audits and system integrity checks.
This platform provides a comprehensive red-teaming framework specifically designed to test LLMs for jailbreak vulnerabilities and prompt injections, while also auditing the security of the underlying AI infrastructure and agent workflows.
PythonAI Model VulnerabilitiesLLM Security
View on GitHub2,971
typpo/promptfoo
typpo/promptfoo
22,295View on GitHub
promptfoo is an evaluation framework for measuring the performance of large language model prompts, agents, and retrieval augmented generation pipelines. It provides a suite of tools for conducting comparative benchmarking and executing automated quality and security regressions. The system features a benchmarking suite for running identical prompts across different model providers to compare output quality side-by-side. It also includes a dedicated red teaming tool for identifying security vulnerabilities and prompt injection risks through automated penetration testing. The framework supports declarative evaluation pipelines and metric-based scoring to quantify model reliability. These capabilities are designed for integration into continuous integration and deployment workflows to prevent regressions in model behavior. Results can be visualized in shared reports to facilitate team reviews of performance data and security findings.
This framework provides a comprehensive suite for automated red teaming, prompt injection testing, and security regression analysis, making it a direct fit for evaluating LLM safety and vulnerability.
TypeScriptLLM EvaluationAI Red Teaming
View on GitHub22,295
elder-plinius/l1b3rt4s
elder-plinius/L1B3RT4S
20,033View on GitHub
L1B3RT4S is an adversarial machine learning toolkit designed for red teaming and evaluating the robustness of large language models. It provides a research framework for investigating how safety alignment mechanisms and content moderation systems respond to sophisticated input strategies. The project focuses on identifying vulnerabilities in model guardrails by employing techniques such as adversarial narrative framing, dynamic context injection, and latent space steering. It utilizes multi-agent prompt decomposition and recursive text transformation to analyze how structural changes to input queries influence the output restrictions of language models. This utility supports systematic research into adversarial prompt engineering and the effectiveness of safety filters. It allows users to probe model behavior through payload fragmentation and various linguistic cues, facilitating the study of how alignment mechanisms interpret and respond to complex, non-standard instructions.
L1B3RT4S is a dedicated red-teaming framework that provides automated adversarial techniques like prompt decomposition and context injection to systematically evaluate LLM safety guardrails and vulnerability to jailbreaks.
AI Model Vulnerabilities
View on GitHub20,033
verazuo/jailbreak_llms
verazuo/jailbreak_llms
3,563View on GitHub
This project is a comprehensive ecosystem of frameworks, toolkits, and datasets designed to evaluate model vulnerabilities and analyze jailbreak patterns. It serves as an adversarial testing framework and research toolkit for measuring the effectiveness of safety guardrails in large language models. The system includes a library of real-world prompt injection datasets harvested from social media to study bypass strategies. It provides specialized tools for semantic attack analysis and prompt visualization, allowing for the mapping of relationships between adversarial prompts to discover common attack patterns. The toolkit covers model safeguard validation through API-based evaluations and metric-based success validation. It employs structural pattern analysis and vector-based semantic mapping to quantify vulnerabilities and identify unique characteristics within jailbreak strategies.
This repository provides a comprehensive framework for adversarial testing, including automated jailbreak evaluation, prompt injection datasets, and tools for analyzing safety guardrails in large language models.
Jupyter NotebookLLM EvaluationJailbreak Prompts
View on GitHub3,563
promptfoo/promptfoo
promptfoo/promptfoo
10,529View on GitHub
Promptfoo is an evaluation framework designed for testing, benchmarking, and red-teaming language models and agentic workflows. It provides a unified environment to run prompts against multiple providers, allowing developers to systematically validate model outputs against objective assertions, semantic similarity metrics, and custom grading rubrics. The platform distinguishes itself through a provider-agnostic execution layer and a stateful orchestrator capable of simulating multi-turn conversations and complex tool-use trajectories. It includes a dedicated adversarial mutation pipeline that automates security vulnerability scanning, enabling teams to probe for jailbreaks, prompt injections, and safety policy violations using systematic attack strategies. Beyond core testing, the project supports comprehensive quality assurance through retrieval-augmented generation assessment, synthetic dataset generation, and prompt performance optimization. It offers extensive extensibility through a plugin-based architecture, allowing for custom logic, Python-based testing extensions, and integration with external version control and observability platforms. The system utilizes a declarative configuration schema to manage test cases and environment settings, supporting both self-hosted and managed infrastructure deployments. Results are consolidated into structured reports with interactive visualizations to facilitate collaborative review and integration into continuous integration pipelines.
Promptfoo is a comprehensive evaluation and red-teaming framework that provides automated adversarial testing, jailbreak detection, and systematic vulnerability scanning for LLMs, directly addressing all the core requirements for model security assessment.
TypeScriptLLM EvaluationPrompt Injection TestingAI Red Teaming
View on GitHub10,529
internlm/opencompass
InternLM/opencompass
7,096View on GitHub
OpenCompass is a comprehensive evaluation platform, benchmarking suite, and distributed model evaluator designed to measure the performance and accuracy of large language models. It provides a framework for benchmarking both open-source and API-based models against diverse datasets using standardized metrics and reproducible pipelines. The project features an automated judging framework that uses language models as judges to score and verify the quality of generated text. It includes a performance leaderboard system for comparing the relative capabilities of various models across industry-standard benchmarks. The platform covers a broad range of capabilities, including multimodal model assessment, mathematical reasoning verification, and model robustness assessment. It manages the full evaluation lifecycle through dataset acquisition, experiment management, and the application of various prompting paradigms. To handle large-scale assessments, the system utilizes distributed evaluation workloads and GPU hardware scaling to process billion-scale models across computing clusters.
OpenCompass is a comprehensive evaluation platform that includes adversarial robustness testing and model-based scoring, making it a capable framework for assessing LLM safety and security alongside general performance benchmarks.
PythonLLM EvaluationLLM-As-A-Judge Scoring
View on GitHub7,096
raga-ai-hub/ragaai-catalyst
raga-ai-hub/RagaAI-Catalyst
16,150View on GitHub
RagaAI-Catalyst is a suite of software implementation tools providing an SDK, dashboard, and platform for monitoring, debugging, red-teaming, and evaluating agentic AI workflows. It serves as an observability framework for tracing the execution paths of large language models and multi-agent systems. The project distinguishes itself through a security suite for automated red-teaming and vulnerability scanning to detect biases, alongside a centralized prompt registry that decouples templates from application code. It further provides an evaluation platform that combines synthetic data generation with custom metric frameworks to quantify model accuracy and reliability. The system covers broad operational domains including agent behavioral observability, prompt lifecycle management, and the application of output guardrails to block undesirable content. Its monitoring capabilities include trace-based execution graphing, timeline-based event sequencing, and diagnostic tools for analyzing multi-agent interaction flows. The core functionality is delivered via a Python library for recording tool calls and decision-making processes.
This is a comprehensive platform for LLM and agentic workflow evaluation that includes dedicated modules for automated red-teaming, vulnerability scanning, and safety alignment testing.
PythonLLM EvaluationAI Red Teaming
View on GitHub16,150
confident-ai/deepeval
confident-ai/deepeval
13,733View on GitHub
Deepeval is a framework for testing and evaluating large language model applications. It provides a suite of tools for executing automated regression tests, validating model output quality against defined standards, and tracing the execution of complex agent workflows. By integrating these capabilities into development pipelines, the platform ensures consistent performance and reliability throughout the software lifecycle. The platform distinguishes itself through its focus on programmatic validation and observability. It utilizes secondary language models to score output quality and employs assertion-driven checks to verify performance thresholds. Beyond standard evaluation, it includes specialized utilities for generating synthetic test data to simulate edge cases and performing security red teaming to identify potential vulnerabilities before deployment. The system covers a broad range of operational needs, including the management of structured evaluation datasets and the instrumentation of multi-step agent interactions for debugging. It supports automated quality gates that can block deployments based on performance metrics, facilitating continuous integration and deployment workflows for intelligent systems.
This framework provides a comprehensive suite for LLM evaluation and includes specific utilities for AI red teaming and vulnerability identification, making it a direct fit for testing and securing language models.
PythonLLM EvaluationAI Red Teaming
View on GitHub13,733
evidentlyai/evidently
evidentlyai/evidently
7,137View on GitHub
Evidently is an AI observability platform and evaluation framework designed to quantify the performance of machine learning models and large language models. It functions as a monitoring tool for detecting data drift and quality degradation in tabular datasets, while providing a specialized analyzer for the faithfulness and correctness of retrieval augmented generation systems. The project distinguishes itself through an evaluation framework that utilizes judge models and custom rubrics to score language model outputs. It includes tools for iterative prompt optimization and the generation of synthetic test datasets, including adversarial inputs for risk and brand safety testing. The platform covers a broad range of capabilities including real-time telemetry tracing for AI workflows, automated quality assurance via CI/CD integration, and performance trend tracking. It provides visual dashboards for reporting and a threshold-based alerting system to notify users when quality metrics cross predefined limits. Users can deploy a local workspace to manage projects and reports or use a no-code interface to configure evaluation workflows.
Evidently is an evaluation and observability framework that includes specific capabilities for adversarial safety testing and synthetic dataset generation, making it a relevant tool for assessing LLM security and alignment.
Jupyter NotebookLLM EvaluationLLM-As-A-Judge Scoring
View on GitHub7,137
open-compass/opencompass
open-compass/opencompass
6,678View on GitHub
OpenCompass is an open-source framework for standardized benchmarking of large language models. It provides a configurable evaluation pipeline that supports both objective and subjective assessment, using a dual-engine architecture to handle closed-form answer comparison and open-ended response rating. The framework is designed as a modular platform where datasets, models, and metrics are composed through declarative YAML configuration files. The framework distinguishes itself through its extensible model integration layer, which supports custom models, HuggingFace models, and third-party API services through a common subclassing interface. It includes an automated judge system that delegates subjective scoring to a separate LLM evaluator, enabling quality assessment of open-ended outputs. A single-command benchmark suite runner allows executing predefined evaluation sets against any integrated model. The evaluation surface covers multiple capability dimensions, including examination, knowledge, reasoning, understanding, language, and safety. Specific assessment areas include agentic tool use, code generation, mathematical ability, instruction following, and language proficiency. Each dataset declares its own scoring function and post-processing steps, allowing per-task custom metrics. The framework supports evaluating base models, chat models, and API-deployed models through its configurable harness.
OpenCompass is a comprehensive evaluation framework that includes safety and alignment assessment modules, making it a suitable tool for measuring model performance and identifying potential safety failures.
PythonLLM-As-A-Judge ScoringAutomated Model Judges
View on GitHub6,678
llm-attacks/llm-attacks
llm-attacks/llm-attacks
4,509View on GitHub
This repository provides tools and methodologies for studying adversarial attacks on large language models. It focuses on understanding how carefully crafted inputs can manipulate or bypass the safety mechanisms of LLMs, enabling researchers to probe model vulnerabilities and improve their robustness. The project covers techniques for generating adversarial prompts, evaluating model responses under attack conditions, and analyzing the effectiveness of different attack strategies.
This framework provides a comprehensive suite of tools for generating adversarial prompts and executing white-box attacks to evaluate the safety and robustness of large language models against jailbreak attempts.
PythonAdversarial Input GenerationAdversarial AttacksAdversarial Robustness Testing
View on GitHub4,509
ibm/mcp-context-forge
IBM/mcp-context-forge
3,310View on GitHub
mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs into a single consistent interface for discovery and execution. It acts as a centralized proxy that aggregates multiple servers and APIs, allowing AI agents to access and invoke a unified set of tools, prompts, and resources. The project distinguishes itself through a multi-protocol translation bridge that converts communication between standard I/O, SSE, gRPC, and REST to enable interoperability between disparate tool servers. It includes a comprehensive LLM evaluation framework for assessing model output quality, safety, and grounding, alongside an AI tool governance platform that enforces role-based access control and content guardrails. The system provides a broad surface of capabilities including AI agent observability via OpenTelemetry, enterprise identity integration through OIDC and SAML, and secure code execution within sandboxed environments. It also features extensive content management utilities for processing documents, spreadsheets, and code, as well as traffic management tools such as circuit breakers and rate limiting. The project can be deployed using Helm charts for Kubernetes or via Docker Compose, with support for air-gapped installations.
This project provides a comprehensive LLM evaluation framework for assessing safety and output quality alongside its primary role as an MCP federation gateway, making it a relevant tool for evaluating model alignment and security.
PythonLLM EvaluationLLM-As-A-Judge Scoring
View on GitHub3,310
googlecloudplatform/generative-ai
GoogleCloudPlatform/generative-ai
12,700View on GitHub
This project is a development platform for managing the lifecycle of generative artificial intelligence models. It provides a unified environment for accessing, fine-tuning, and deploying large language models, serving as an orchestrator that handles the integration of diverse models into custom applications. The platform distinguishes itself by offering a managed infrastructure for hosting and scaling models, which removes the requirement for manual server maintenance or configuration. It includes integrated tools for supervised fine-tuning and vector embedding optimization, allowing for the refinement of model performance to meet specialized domain requirements. The framework incorporates comprehensive capabilities for monitoring and governance, including automated quality evaluation services that use programmatic rubrics to assess output accuracy. It also enforces responsible artificial intelligence standards through policy-driven content filtering, ensuring that generated responses remain aligned with established safety and ethical guidelines. The repository provides a collection of Jupyter Notebooks that serve as documentation and implementation guides for these development and deployment workflows.
This repository is a comprehensive development and orchestration platform for building and deploying generative AI applications, but it lacks the specialized red-teaming and automated adversarial attack generation tools required for dedicated LLM security stress-testing.
Jupyter NotebookAutomated Output EvaluationLLM Evaluation
View on GitHub12,700
comet-ml/comet-llm
comet-ml/comet-llm
19,673View on GitHub
Comet LLM is an observability platform and evaluation framework designed for large language model applications and agentic workflows. It functions as a system for tracing, monitoring, and debugging execution flows while providing tools for prompt optimization and the enforcement of AI safety guardrails. The platform distinguishes itself through a combination of model-based scoring and heuristic metrics to quantify output quality and detect hallucinations. It includes a dedicated prompt and agent optimizer with an interactive playground for refining templates and tool configurations. For retrieval-augmented generation, it provides specific monitoring and evaluation tools to identify bottlenecks in document retrieval and synthesis. Broad capabilities cover production monitoring via token usage and feedback dashboards, detailed execution tracing through span recording, and automated performance evaluations integrated into continuous delivery pipelines. The system also implements safety profiles to constrain model outputs and ensure compliant behavior. The platform can be deployed via cloud-hosted workspaces or self-hosted on Kubernetes using Helm charts.
This is an observability and monitoring platform for LLM performance and quality, which serves as a building block for tracking model behavior rather than a dedicated red-teaming or automated vulnerability testing framework.
PythonLLM EvaluationLLM-As-A-Judge ScoringAutomated Model Judges
View on GitHub19,673
zjunlp/easyedit
zjunlp/EasyEdit
2,718View on GitHub
EasyEdit is a framework and toolkit designed for updating, inserting, or erasing specific factual information within large language models without requiring full retraining. It functions as a parameter modifier and knowledge editing system capable of performing targeted weight updates across diverse model architectures. The project distinguishes itself by supporting both text-based and multimodal model editing, allowing for knowledge updates across image and text modalities. It provides utilities for model steering to adjust personality and reasoning patterns in real time via activation interventions, as well as specialized tools for AI safety alignment to remove toxic behaviors. The framework covers a broad range of capabilities, including batch and sequential knowledge editing, prompt-based modifications, and memory-based insertion. It includes a comprehensive evaluation suite to measure the reliability, locality, and generalization of edits through impact analysis and performance metrics.
EasyEdit is a framework for modifying and steering model weights to correct factual knowledge and remove toxic behaviors, which serves as a specialized tool for safety alignment and model unlearning.
Jupyter NotebookGeneral Knowledge ModificationParameter-Based Knowledge EditingActivation Steering Vectors
View on GitHub2,718

LLM Red Teaming and Jailbreak Tools

Azure/PyRIT

Tencent/AI-Infra-Guard

typpo/promptfoo

elder-plinius/L1B3RT4S

verazuo/jailbreak_llms

promptfoo/promptfoo

InternLM/opencompass

raga-ai-hub/RagaAI-Catalyst

confident-ai/deepeval

evidentlyai/evidently

open-compass/opencompass

llm-attacks/llm-attacks

IBM/mcp-context-forge

GoogleCloudPlatform/generative-ai

comet-ml/comet-llm

zjunlp/EasyEdit