30 open-source projects similar to llm-attacks/llm-attacks, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best llm-attacks alternative.
Giskard is an evaluation framework, testing library, and quality monitoring system for large language models and AI agents. It serves as a toolkit for quantifying model performance and reliability, providing specialized capabilities for validating retrieval-augmented generation pipelines. The project distinguishes itself through an automated red teaming tool and security scanner designed to identify vulnerabilities, prompt injections, and safety risks. It utilizes adversarial probing and synthetic edge case generation to quantify model robustness and detect information disclosure. The platfo
OpenCompass is a comprehensive evaluation platform, benchmarking suite, and distributed model evaluator designed to measure the performance and accuracy of large language models. It provides a framework for benchmarking both open-source and API-based models against diverse datasets using standardized metrics and reproducible pipelines. The project features an automated judging framework that uses language models as judges to score and verify the quality of generated text. It includes a performance leaderboard system for comparing the relative capabilities of various models across industry-sta
PyRIT is an AI vulnerability assessment tool and security scanner designed to detect risks in large language model applications. It functions as a generative AI red teaming framework used to simulate adversarial attacks and identify weaknesses in system guardrails. The tool automates AI risk assessment by scanning generative AI components for security vulnerabilities. It utilizes automated testing and analysis to identify security gaps and prevent potential exploits through a consistent, repeatable process. The system incorporates asynchronous model orchestration to compare security postures
OpenCompass is an open-source framework for standardized benchmarking of large language models. It provides a configurable evaluation pipeline that supports both objective and subjective assessment, using a dual-engine architecture to handle closed-form answer comparison and open-ended response rating. The framework is designed as a modular platform where datasets, models, and metrics are composed through declarative YAML configuration files. The framework distinguishes itself through its extensible model integration layer, which supports custom models, HuggingFace models, and third-party API
Cleverhans is a TensorFlow adversarial machine learning library that serves as an attack framework, a robustness benchmark, and a defense library. It provides a collection of tools to generate adversarial examples, test the security of neural networks, and implement protective mechanisms to increase model resilience against malicious inputs. The project focuses on creating perturbed inputs designed to deceive machine learning models into making incorrect predictions. It enables the evaluation of deep learning model stability and accuracy when subjected to adversarial noise, providing referenc
lmms-eval is a benchmarking system and performance analysis suite designed to measure the capabilities of large multimodal models. It provides a framework for evaluating models across text, image, audio, and video datasets, serving as a multimodal dataset orchestrator and benchmarking tool to quantify accuracy and efficiency. The project distinguishes itself through a unified multimodal message protocol that structures diverse media inputs for consistent model consumption. It features specialized benchmarking for audio, video, visual, document, and spatial reasoning, alongside tools for model
Arize Phoenix is an LLM observability platform and evaluation framework designed to capture execution traces and monitor large language model applications. It serves as a prompt management system for versioning and testing templates, and as a self-hosted AI operations infrastructure for managing telemetry and experiments. The platform differentiates itself through a specialized embedding visualization tool used to detect data drift and optimize vector search. It provides a comprehensive evaluation suite that utilizes judge-based evaluators and ground-truth datasets to score model outputs, and
Superagent is a framework for AI assistant orchestration and agent security. It provides the tools to build intelligent assistants that integrate external APIs and maintain conversation memory to automate complex tasks. The project focuses on AI agent security through adversarial testing, red teaming, and the detection of prompt injections and malicious tool calls. It includes automated vulnerability patching, which scans codebases and configurations for security flaws and generates pull requests with fixes. The platform supports retrieval-augmented generation by connecting language models t
AI-Infra-Guard is a security scanning platform designed to detect vulnerabilities across large language model deployments, AI agent skills, and the underlying infrastructure. It functions as a security toolset for auditing source code, evaluating model robustness, and identifying insecure network configurations. The project provides a red teaming framework that uses curated attack datasets to test for jailbreak vulnerabilities and prompt injections. It also includes an infrastructure auditor that employs network fingerprinting and asset discovery to match running components against known comm
Cleverhans is an adversarial machine learning library and toolkit designed to generate adversarial examples, incorporate them into training loops, and benchmark the resilience of machine learning models. It provides a gradient-based attack framework for constructing both white-box and black-box attacks to identify model misclassifications. The project includes capabilities for model robustness benchmarking, allowing users to evaluate and verify how models resist evasion attacks and malicious input perturbations. It also facilitates adversarial training to increase a model's resistance to pert
The Adversarial Robustness Toolbox (ART) is an open-source library that provides a unified framework for evaluating, defending, and certifying machine learning models against adversarial threats. It wraps models from any framework behind a common estimator interface, enabling composable pipelines for attack generation, defense application, robustness certification, and privacy auditing across evasion, poisoning, and extraction threats. The library distinguishes itself by covering the full adversarial ML security lifecycle within a single toolkit. It supports gradient-based adversarial example
Giskard is an AI quality assurance suite and evaluation framework designed to measure the performance, bias, and security risks of large language models and AI agents. It functions as a vulnerability scanner to detect security flaws and performance regressions. The project provides automated red-teaming and adversarial testing workflows. These tools generate prompt-injection probes and adversarial attacks based on system descriptions to identify security gaps and vulnerabilities. The platform covers AI agent auditing and RAG quality validation, using knowledge-base grounding and synthetic da
mcp-context-forge is a Model Context Protocol federation gateway that unifies diverse AI tool servers and APIs into a single consistent interface for discovery and execution. It acts as a centralized proxy that aggregates multiple servers and APIs, allowing AI agents to access and invoke a unified set of tools, prompts, and resources. The project distinguishes itself through a multi-protocol translation bridge that converts communication between standard I/O, SSE, gRPC, and REST to enable interoperability between disparate tool servers. It includes a comprehensive LLM evaluation framework for
Wandb is a centralized platform for machine learning experiment tracking, model registry management, and workflow orchestration. It provides a comprehensive suite of tools for logging, visualizing, and versioning training metrics, model artifacts, and hyperparameter sweeps to ensure reproducibility across development cycles. The platform also functions as an observability tool for large language model applications, enabling the tracing of execution steps, token usage, and reasoning processes. The project distinguishes itself through its event-driven automation capabilities, which allow users
EasyEdit is a framework and toolkit designed for updating, inserting, or erasing specific factual information within large language models without requiring full retraining. It functions as a parameter modifier and knowledge editing system capable of performing targeted weight updates across diverse model architectures. The project distinguishes itself by supporting both text-based and multimodal model editing, allowing for knowledge updates across image and text modalities. It provides utilities for model steering to adjust personality and reasoning patterns in real time via activation inter
Garak is a suite of tools for measuring AI reliability, scanning for vulnerabilities, and automating security assessments through adaptive probing. It functions as a generative AI vulnerability scanner and evaluation tool designed to identify security gaps, hallucinations, and failure modes in language models. The framework provides a toolkit for red-teaming and safety assessments, utilizing a structured system of probes and detectors to calculate failure rates. It specifically scans for risks such as data leakage and prompt injection by recording model responses to adversarial inputs. The p
Promptify is a suite of tools designed for model evaluation, prompt management, token cost tracking, structured extraction, and unified API gateway access. It provides a standardized interface to manage requests and responses across multiple large language model providers. The project features a prompt management platform for engineering and versioning prompts with structured output validation. It includes a dedicated evaluation framework to measure model performance using precision, recall, and F1 scores against labeled datasets, alongside a token cost tracker to monitor the financial expens
This project is a comprehensive ecosystem of frameworks, toolkits, and datasets designed to evaluate model vulnerabilities and analyze jailbreak patterns. It serves as an adversarial testing framework and research toolkit for measuring the effectiveness of safety guardrails in large language models. The system includes a library of real-world prompt injection datasets harvested from social media to study bypass strategies. It provides specialized tools for semantic attack analysis and prompt visualization, allowing for the mapping of relationships between adversarial prompts to discover commo
Lighteval is an open-source framework for running standardized benchmarks and custom evaluation tasks against language models. It provides a system for defining new evaluation tasks with custom prompts, metrics, and scoring in YAML configuration files, and integrates with the Hugging Face Hub for storing and comparing results. The framework supports evaluating models across multiple inference backends, including transformers, vllm, and custom APIs, through a unified generation and log-probability interface. It includes a pluggable metric registry for built-in and custom scoring, a prediction
Coze-loop is an optimization platform and orchestration management suite for large language model agents. It functions as a comprehensive environment for the development, debugging, evaluation, and monitoring of AI agent performance. The project provides a dedicated prompt engineering playground for real-time iteration and validation of model responses. It includes an evaluation framework that runs automated assessments against datasets to generate performance metrics and verify output accuracy. The system covers observability through real-time execution tracing and historical analysis of ag
Garak is an AI model evaluation tool and vulnerability scanner designed for red teaming large language models and auditing the security of retrieval-augmented generation pipelines. It identifies behavioral weaknesses, such as jailbreaks, hallucinations, and data leakage, by simulating adversarial attacks and executing automated testing vectors. The framework utilizes an adaptive probing loop where prompts can react to previous model behavior and be modified in flight via middleware. To ensure consistent analysis, it employs a provider-agnostic interface to interact with various model APIs and
RagaAI-Catalyst is a toolset comprising an SDK, dashboard, and platform for monitoring, debugging, red-teaming, and evaluating agentic AI workflows. It serves as an observability framework for tracing the execution paths of large language models and multi-agent systems. The project distinguishes itself through a security suite for automated red-teaming and vulnerability scanning to detect biases, alongside a centralized prompt registry that decouples templates from application code. It further provides an evaluation platform that combines synthetic data generatio
Lmnr is an LLM observability platform and evaluation framework designed for tracing, logging, and monitoring language model executions. It provides the tools necessary to debug agent behavior, analyze performance, and identify failure patterns in AI agents. The platform differentiates itself through a trace-to-dataset pipeline that converts production logs into labeled test sets for regression testing. It includes a prompt-variant replay engine to compare different prompts or models side-by-side and a state-cached debugging system to replay agent loops without restarting the process. The sys
PurpleLlama is a collection of security toolsets and frameworks designed to audit large language model vulnerabilities and implement runtime input-output guardrails. It provides a security evaluation framework and benchmark suite to quantify risks associated with prompt injections and the generation of malicious code. The project includes a content moderator and input-output filters that use a standardized taxonomy to identify and block harmful content, jailbreaking attempts, and insecure commands. It also features capabilities for sensitive document classification to prevent the unauthorized
PurpleLlama is a collection of security components and toolkits designed for large language models. It provides specialized systems including a code security scanner, a content moderation system, a prompt injection firewall, and a security assessment toolkit. The project enables the identification and blocking of jailbreaking attempts and malicious prompts during model inference. It includes capabilities for detecting violating content across multiple languages and modalities and scanning generated code for vulnerabilities to prevent the execution of insecure commands. The framework further
This project is a suite of utilities for creating synthetic training data, performing model fine-tuning, and verifying output quality through evaluation frameworks. It provides a toolkit for optimizing pre-trained large language models to improve performance on specific tasks. The system includes a synthetic dataset generator that creates diverse input-output training pairs from task descriptions. It also features a system prompt generator to produce the behavioral constraints and messages required to guide a fine-tuned model. The toolkit covers a complete workflow for model refinement, incl
G0DM0D3 is a static web client and multi-model chat gateway designed for AI research, prompt optimization, and red teaming. It provides a unified interface to query numerous AI models in parallel, allowing for the simultaneous evaluation of different prompt variations and sampling parameters to identify the most successful outputs. The project features specialized tooling for probing safety filters and bypassing model constraints through an input perturbation engine that applies text obfuscation and character substitution. It includes a composite scoring system to rank model performance and a
This project is a language model finetuning framework designed to adapt large language models to specific datasets using supervised fine-tuning and low-rank adaptation. It serves as a distributed training manager that coordinates workloads and synchronizes gradients across multiple processing units to scale performance. The framework includes a specialized toolkit for low-rank adaptation to update a subset of model weights, reducing memory and hardware requirements. It provides capabilities for instruction fine-tuning, domain adaptation, and the optimization of function calling to improve how
Map-anything is a 3D scene reconstruction framework and neural geometry estimator designed to transform two-dimensional images into metric three-dimensional spatial representations using feed-forward neural networks. It provides a specialized toolkit for predicting camera intrinsics and ray directions from single images without requiring external geometric metadata. The project includes a 3D model benchmarking suite that utilizes a unified model wrapper to standardize outputs from diverse reconstruction models. This allows for consistent evaluation and accuracy measurement across various spat
This project is a framework for the iterative optimization and validation of LLM agent skills. It functions as an agent capability orchestrator and prompt optimizer, utilizing an evaluation framework to measure performance through weighted rubrics and automated rewriting. The system distinguishes itself through a closed-loop optimization cycle that employs independent reviewer agents to prevent anchoring effects and a ratchet-based version control mechanism that automatically reverts changes if they fail to improve baseline scores. It also features exploratory structural rewriting to overcome