30 open-source projects similar to protectai/rebuff, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Rebuff alternative.
Garak is a suite of tools for measuring AI reliability, scanning for vulnerabilities, and automating security assessments through adaptive probing. It functions as a generative AI vulnerability scanner and evaluation tool designed to identify security gaps, hallucinations, and failure modes in language models. The framework provides a toolkit for red-teaming and safety assessments, utilizing a structured system of probes and detectors to calculate failure rates. It specifically scans for risks such as data leakage and prompt injection by recording model responses to adversarial inputs. The p
This project is no longer actively maintained. You are welcome to fork and continue its development on your own. Thank you for your interest and support.
NeMo-Guardrails is a toolkit for adding programmable safety constraints and dialogue boundaries to large language model conversational systems. It functions as security middleware that intercepts inputs and outputs to block prompt injections, jailbreaks, and sensitive data leaks, while providing a conversational dialogue manager to define structured interaction flows through configuration files. The framework includes a hallucination filter to screen model outputs for factual accuracy and a specialized modeling language for defining conversational flows and constraints. It provides capabiliti
PyRIT is an AI vulnerability assessment tool and security scanner designed to detect risks in large language model applications. It functions as a generative AI red teaming framework used to simulate adversarial attacks and identify weaknesses in system guardrails. The tool automates AI risk assessment by scanning generative AI components for security vulnerabilities. It utilizes automated testing and analysis to identify security gaps and prevent potential exploits through a consistent, repeatable process. The system incorporates asynchronous model orchestration to compare security postures
LLM Guard is a security firewall and guardrail framework designed to scan and sanitize inputs and outputs for large language models. It functions as a proxy gateway and security layer to block prompt injections, toxicity, and sensitive data leakage while ensuring that model interactions remain compliant with organizational policies. The system distinguishes itself through a modular scanner pipeline that utilizes local model orchestration to eliminate external network dependencies. It supports real-time security filtering via streaming chunk analysis and implements a fail-fast execution model
Garak is an AI model evaluation tool and vulnerability scanner designed for red teaming large language models and auditing the security of retrieval-augmented generation pipelines. It identifies behavioral weaknesses, such as jailbreaks, hallucinations, and data leakage, by simulating adversarial attacks and executing automated testing vectors. The framework utilizes an adaptive probing loop where prompts can react to previous model behavior and be modified in flight via middleware. To ensure consistent analysis, it employs a provider-agnostic interface to interact with various model APIs and
The Adversarial Robustness Toolbox (ART) is an open-source library that provides a unified framework for evaluating, defending, and certifying machine learning models against adversarial threats. It wraps models from any framework behind a common estimator interface, enabling composable pipelines for attack generation, defense application, robustness certification, and privacy auditing across evasion, poisoning, and extraction threats. The library distinguishes itself by covering the full adversarial ML security lifecycle within a single toolkit. It supports gradient-based adversarial example
Superagent is an AI safety platform that protects applications from prompt injections, data leaks, and harmful outputs through built-in guardrails. It functions as a prompt injection detection system, data redaction tool, and red team testing tool, automatically removing personally identifiable information and protected health data from AI inputs and outputs while scanning image uploads with vision AI to detect visual prompt injection attacks before processing. The platform routes every prompt through a sequential pipeline of safety checks including injection detection, data redaction, and co
Learn_Prompting is an educational project focused on prompt engineering, providing the principles and techniques required to craft effective inputs and improve the quality of generative AI outputs. The project covers advanced prompting strategies to enhance reasoning, reliability, and output quality. This includes techniques for task decomposition, chain-of-thought reasoning, and the use of few-shot and zero-shot guidance. It also addresses model security through the study of prompt hacking, vulnerability analysis, and privacy auditing to prevent sensitive data leaks. The scope extends to th
📰 Latest News 📰 - 🗡️ What is HarmBench 🛡️ - 🌐 Overview 🌐 - ☕ Quick Start ☕ - ⚙️ Installation - 🛠️ Running the Evaluation Pipeline - ➕ Using your own models in HarmBench - ➕ Using your own red teaming methods in HarmBench - 🤗 Classifiers - ⚓ Documentation ⚓ - 🌱 HarmBench's Roadmap 🌱 -…
Cleverhans is an adversarial machine learning library and toolkit designed to generate adversarial examples, incorporate them into training loops, and benchmark the resilience of machine learning models. It provides a gradient-based attack framework for constructing both white-box and black-box attacks to identify model misclassifications. The project includes capabilities for model robustness benchmarking, allowing users to evaluate and verify how models resist evasion attacks and malicious input perturbations. It also facilitates adversarial training to increase a model's resistance to pert
Security toolkit for AI agents. Scan your machine for dangerous skills and MCP configs, monitor for supply chain attacks, test prompt injection resistance, and audit live MCP servers for tool poisoning.
A framework for Privacy Preserving Machine Learning
Human-in-the-loop approval system for AI agents. Agents request. Policies decide. Humans approve. Keep humans in control of what AI agents can do.
A Python package to assess and improve fairness of machine learning models.
IATelligence is a Python script that will extract the IAT of a PE file and request GPT to get more information about the API and the ATT&CK matrix related
Guardrails is a Python SDK that wraps calls to large language models with configurable validation pipelines, corrective actions, and structured output generation. It provides a unified API layer that connects to over 100 language models, applying consistent validation, streaming, and error-handling across providers. The framework validates and corrects model responses against safety and quality rules, detecting and mitigating risks in both inputs and outputs using pre-built and custom validators. The project distinguishes itself through a validator-pipeline architecture that sequentially appl
Interpret is an interpretable machine learning library and glassbox model framework. It provides toolkits for training inherently transparent models and applying post-hoc explanation techniques to make machine learning predictions human-understandable. The framework distinguishes itself by integrating differential privacy into the training of interpretable models to prevent sensitive data from leaking through explanations. It also features a visualization tool for rendering interactive decision paths and model behavior. The library covers model explainability through feature importance calcu
k8sgpt is a suite of Kubernetes-focused tools designed for AI-powered debugging, cluster diagnostics, and self-healing. It functions as an automated analyzer and debugger that uses large language models to explain cluster errors, suggest remediation steps, and identify resource failures. The project distinguishes itself through an extensible analysis framework that supports custom diagnostic plugins and a Model Context Protocol server, which exposes cluster diagnostics as tools for AI assistants. It includes a self-healing agent capable of automatically generating and applying fixes for detec
Shannon is an integrated security platform designed for autonomous penetration testing, static and dynamic analysis, and automated vulnerability remediation within self-hosted, private infrastructure. It functions as a unified security suite that orchestrates the entire lifecycle of vulnerability management, from initial discovery and reachability prioritization to the generation and verification of code-level patches. The platform distinguishes itself through its agentic approach to security, deploying autonomous agents to execute both black-box and white-box exploits against running applica