30 open-source projects similar to fairlearn/fairlearn, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Fairlearn alternative.
A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
Algorithms for explaining machine learning models
Source code/webpage/demos for the What-If Tool
Interpret is an interpretable machine learning library and glassbox model framework. It provides toolkits for training inherently transparent models and applying post-hoc explanation techniques to make machine learning predictions human-understandable. The framework distinguishes itself by integrating differential privacy into the training of interpretable models to prevent sensitive data from leaking through explanations. It also features a visualization tool for rendering interactive decision paths and model behavior. The library covers model explainability through feature importance calcu
Training PyTorch models with differential privacy
A framework for Privacy Preserving Machine Learning
Guardrails is a Python SDK that wraps calls to large language models with configurable validation pipelines, corrective actions, and structured output generation. It provides a unified API layer that connects to over 100 language models, applying consistent validation, streaming, and error-handling across providers. The framework validates and corrects model responses against safety and quality rules, detecting and mitigating risks in both inputs and outputs using pre-built and custom validators. The project distinguishes itself through a validator-pipeline architecture that sequentially appl
Garak is a suite of tools for measuring AI reliability, scanning for vulnerabilities, and automating security assessments through adaptive probing. It functions as a generative AI vulnerability scanner and evaluation tool designed to identify security gaps, hallucinations, and failure modes in language models. The framework provides a toolkit for red-teaming and safety assessments, utilizing a structured system of probes and detectors to calculate failure rates. It specifically scans for risks such as data leakage and prompt injection by recording model responses to adversarial inputs. The p
Lightly is a self-supervised learning framework and computer vision data curation tool designed to manage large image datasets and train models on unlabeled data. It functions as a PyTorch vision library and dataset management SDK, providing tools to convert raw images into high-dimensional vectors for similarity search, visualization, and feature extraction. The project implements a variety of self-supervised architectures, including MoCo, SimCLR, VICReg, Barlow Twins, and masked image modeling. It distinguishes itself by combining these learning frameworks with active learning capabilities,
🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
This project is a model-agnostic interpretability framework and explainability tool designed to provide locally interpretable explanations for individual predictions. It fits a local surrogate model that approximates the behavior of any machine learning classifier or regression model to identify the most influential features for a specific instance. Because the framework is model-agnostic, it can explain predictions across tabular, text, and image data regardless of the underlying architecture. It employs local linear approximations and feature importance visualization t
Python package for AutoML on Tabular Data with Feature Engineering, Hyperparameter Tuning, Explanations and Automatic Documentation
NeMo-Guardrails is a toolkit for adding programmable safety constraints and dialogue boundaries to large language model conversational systems. It functions as security middleware that intercepts inputs and outputs to block prompt injections, jailbreaks, and sensitive data leaks, while providing a conversational dialogue manager to define structured interaction flows through configuration files. The framework includes a hallucination filter to screen model outputs for factual accuracy and a specialized modeling language for defining conversational flows and constraints. It provides capabiliti
The evaluation dataset data/samples-1680.jsonl.gz is the test set used in the following paper:
Status: Archive (code is provided as-is, no updates expected)
PySyft is a privacy-preserving machine learning framework and remote computation engine. It functions as a decentralized data analysis orchestrator that allows for the execution of data science workflows on remote servers without requiring the transfer of raw private data from the host device. The platform provides a secure collaboration environment where data owners manage permissions and authorize specific collaborators to run computations. It differentiates its workflow by utilizing mock data for local development and validation before submitting final analysis jobs to private remote serve
LLM Guard is a security firewall and guardrail framework designed to scan and sanitize inputs and outputs for large language models. It functions as a proxy gateway and security layer to block prompt injections, toxicity, and sensitive data leakage while ensuring that model interactions remain compliant with organizational policies. The system distinguishes itself through a modular scanner pipeline that utilizes local model orchestration to eliminate external network dependencies. It supports real-time security filtering via streaming chunk analysis and implements a fail-fast execution model
Captum is an open-source library for explaining model predictions by attributing them to input features, neurons, and layers using gradient-based and perturbation-based methods. It provides a modular framework for implementing, evaluating, and combining a range of explanation techniques, including gradient-based attribution, perturbation-based analysis, game-theoretic Shapley value approximation, and surrogate model explanations, with support for parallelization and noise stabilization. The library distinguishes itself through its breadth of attribution methods and its support for advanced in
SHAP is an explainable AI toolkit that provides a game theoretic framework for interpreting machine learning model predictions. It functions as a feature attribution engine, decomposing model outputs into the sum of individual feature effects to clarify how specific input variables influence a final decision. By assigning importance values to these inputs, the library enables users to understand the logic behind complex predictive models. The project distinguishes itself through its versatility and specialized calculation methods. It operates as a model-agnostic diagnostic library, capable of
SHAP is a machine learning explainer that uses a game-theoretic framework to estimate the contribution of each feature to a model prediction. It provides a set of tools for quantifying how individual input features push a specific output away from a baseline value. The project includes specialized explainers for different architectures, including high-speed implementations for decision trees and ensemble models, linearization algorithms for deep learning networks, and covariance integration for linear models. It also features a model-agnostic interpretability tool that uses a kernel method to
Library for training machine learning models with privacy for training data
The Adversarial Robustness Toolbox (ART) is an open-source library that provides a unified framework for evaluating, defending, and certifying machine learning models against adversarial threats. It wraps models from any framework behind a common estimator interface, enabling composable pipelines for attack generation, defense application, robustness certification, and privacy auditing across evasion, poisoning, and extraction threats. The library distinguishes itself by covering the full adversarial ML security lifecycle within a single toolkit. It supports gradient-based adversarial example
Interpretability and explainability of data and machine learning models
Quantus is an eXplainable AI toolkit for responsible evaluation of neural network explanations
Updated the multilingual model weights used by Detoxify with a model trained on the translated data from the 2nd Jigsaw challenge (as well as the 1st). This model has also been trained to minimise bias and now returns the same categories as the unbiased model. New best AUC score on the test set:…
📰 Latest News 📰 - 🗡️ What is HarmBench 🛡️ - 🌐 Overview 🌐 - ☕ Quick Start ☕ - ⚙️ Installation - 🛠️ Running the Evaluation Pipeline - ➕ Using your own models in HarmBench - ➕ Using your own red teaming methods in HarmBench - 🤗 Classifiers - ⚓ Documentation ⚓ - 🌱 HarmBench's Roadmap 🌱 -…
Perspective is an API that uses machine learning models to score the perceived impact a comment might have on a conversation. See https://developers.perspectiveapi.com for more information.