6 Repos
Comprehensive toolkits for understanding complex model decisions.
Distinguishing note: Broader than specific attribution methods; covers the entire interpretability workflow.
Explore 6 awesome GitHub repositories matching artificial intelligence & ml · Model Interpretability Frameworks.
SHAP is an explainable AI toolkit that provides a game-theoretic framework for interpreting machine learning model predictions. It works as a feature attribution engine: it decomposes each model output into a base value (the expected model output) plus a sum of per-feature contributions (Shapley values), showing how each input variable pushes a particular prediction up or down. By assigning an importance value to every input, the library exposes the reasoning behind complex predictive models. The project stands out for its versatility and its specialized calculation methods. It operates as a model-agnostic diagnostic library, capable of …
Provides a framework for understanding how complex predictive models reach specific decisions.
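To make the "sum of individual feature effects" concrete, here is a minimal sketch that computes exact Shapley values by enumerating every coalition of features. It is not SHAP's API: SHAP uses much faster model-specific or sampling-based estimators, and it typically compares against a background dataset, whereas this sketch assumes a single baseline vector stands in for "absent" features. The function name `exact_shapley` and the toy model are illustrative only.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values for f at x; absent features take baseline values (an assumption)."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Features in the coalition come from x; the rest come from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(z: Sequence[float]) -> float:
    # Toy model with an interaction between features 0 and 1.
    return 2.0 * z[0] + z[1] * z[2] + 0.5 * z[0] * z[1]


x = [1.0, 2.0, 3.0]
base = [0.0, 0.0, 0.0]
phi = exact_shapley(model, x, base)
print(phi, sum(phi), model(x) - model(base))
```

The attributions sum to `model(x) - model(base)` (the efficiency property), and each interaction term is split evenly between the features that participate in it. Enumeration costs O(2^n) model calls, which is why practical libraries approximate.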
pykan is a library for building Kolmogorov-Arnold Networks (KANs), which replace fixed activation functions on nodes with learnable spline functions placed on the network's edges. It serves as an interpretable AI framework and symbolic regression tool for deriving transparent mathematical rules from complex data. The project focuses on converting learned numerical functions into human-readable symbolic expressions by matching them against a library of candidate formulas. It uses additive-compositional topologies and learnable piecewise polynomial segments to approximate nonlinear mappings. The framework …
Provides a toolkit for pruning and sparsification to derive transparent rules from complex models.
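The "library matching" step can be illustrated with a simplified sketch: given samples of a learned one-dimensional edge function, fit y ≈ a·g(x) + b for each candidate g by ordinary least squares and keep the candidate with the highest R². This is not pykan's API, and pykan's own routine is more elaborate; the candidate set `LIBRARY`, the helper names, and the simulated edge samples are all hypothetical.

```python
import math
from typing import Callable, Sequence

# Hypothetical candidate library; pykan ships its own, richer set of primitives.
LIBRARY: dict[str, Callable[[float], float]] = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sin": math.sin,
    "exp": math.exp,
    "tanh": math.tanh,
}


def fit_affine(g_vals: Sequence[float], ys: Sequence[float]) -> tuple[float, float, float]:
    """Least-squares fit ys ≈ a * g_vals + b; returns (a, b, r2)."""
    n = len(ys)
    mg = sum(g_vals) / n
    my = sum(ys) / n
    sgg = sum((g - mg) ** 2 for g in g_vals)
    sgy = sum((g - mg) * (y - my) for g, y in zip(g_vals, ys))
    a = sgy / sgg if sgg > 0 else 0.0
    b = my - a * mg
    ss_res = sum((y - (a * g + b)) ** 2 for g, y in zip(g_vals, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a, b, r2


def snap_to_symbolic(xs: Sequence[float], ys: Sequence[float]) -> tuple[str, float, float, float]:
    """Return (name, a, b, r2) for the candidate that best explains the samples."""
    best = None
    for name, g in LIBRARY.items():
        a, b, r2 = fit_affine([g(x) for x in xs], ys)
        if best is None or r2 > best[3]:
            best = (name, a, b, r2)
    return best


# Simulated samples of a "learned" edge function: 1.5 x^2 - 0.3 plus a small wiggle.
xs = [i / 10 for i in range(-10, 11)]
ys = [1.5 * x * x - 0.3 + 0.01 * math.sin(25 * x) for x in xs]
name, a, b, r2 = snap_to_symbolic(xs, ys)
print(f"{a:.2f} * {name} {b:+.2f}  (R^2 = {r2:.4f})")
```

Scoring candidates by R² keeps the choice scale-invariant, so a learned curve that is merely a stretched and shifted parabola still snaps to `x^2`.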
This project is a model-agnostic interpretability framework and explainability tool that produces local, interpretable explanations for individual predictions. It fits a local surrogate model that approximates the behavior of any machine learning classifier or regressor around a specific instance, identifying the features most influential for that prediction. Because it needs only the model's predictions, it can explain models on tabular, text, and image data regardless of the underlying architecture. It employs local linear approximations and feature importance visualization t…
Provides a comprehensive framework for generating local interpretable explanations for predictions across diverse data types.
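The local-surrogate idea can be sketched as follows: perturb the instance, weight each perturbed sample by its proximity to the original, and fit a weighted linear regression whose coefficients serve as local feature importances. This is a from-scratch illustration under stated assumptions (Gaussian perturbations, an exponential kernel, and the hypothetical parameters `scale` and `kernel_width`), not the project's own sampling scheme or API.

```python
import math
import random
from typing import Callable, Sequence


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def local_surrogate(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    n_samples: int = 500,
    scale: float = 0.5,
    kernel_width: float = 0.75,
    seed: int = 0,
) -> list[float]:
    """Fit a proximity-weighted linear model around x0; returns [intercept, w_1, ..., w_d]."""
    rng = random.Random(seed)
    d = len(x0)
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        # Perturb the instance with Gaussian noise (an assumed sampling scheme).
        z = [xi + rng.gauss(0.0, scale) for xi in x0]
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x0))
        # Exponential kernel: samples close to x0 get more weight.
        w = math.exp(-dist2 / kernel_width ** 2)
        row = [1.0] + z
        y = f(z)
        # Accumulate the weighted normal equations X^T W X and X^T W y.
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    return solve(xtwx, xtwy)


def black_box(z: Sequence[float]) -> float:
    # Nonlinear toy model; its local gradient at (1, 2) is (2, 3).
    return z[0] ** 2 + 3.0 * z[1]


print(local_surrogate(black_box, [1.0, 2.0]))
```

The fitted slopes should land near the model's local gradient (2, 3); how near depends on `scale` and `kernel_width`, which is exactly the locality trade-off such explainers expose.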
Captum is an open-source PyTorch library for explaining model predictions by attributing them to input features, neurons, and layers. It provides a modular framework for implementing, evaluating, and combining explanation techniques, including gradient-based attribution, perturbation-based analysis, game-theoretic Shapley value approximation, and surrogate-model explanations, with support for parallelized computation and noise-based smoothing to stabilize attributions. The library distinguishes itself through its breadth of attribution methods and its support for advanced in…
Provides a generic framework for implementing, benchmarking, and sharing new attribution methods.
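As one example of the gradient-based methods in this family, here is a from-scratch sketch of Integrated Gradients: average the gradient along the straight path from a baseline to the input, then scale by the input-minus-baseline difference. Captum exposes this method as `captum.attr.IntegratedGradients` and computes gradients with PyTorch autograd; the sketch below instead assumes a hand-written analytic gradient and a midpoint Riemann sum, and the toy function `f` is hypothetical.

```python
from typing import Callable, Sequence


def integrated_gradients(
    grad_f: Callable[[Sequence[float]], list[float]],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 50,
) -> list[float]:
    """Midpoint Riemann approximation of Integrated Gradients along baseline -> x."""
    d = len(x)
    totals = [0.0] * d
    for k in range(steps):
        # Midpoint of the k-th sub-interval of [0, 1].
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        for i in range(d):
            totals[i] += g[i]
    # Average gradient times the displacement from the baseline.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def f(z: Sequence[float]) -> float:
    return z[0] ** 2 + 3.0 * z[0] * z[1]


def grad_f(z: Sequence[float]) -> list[float]:
    # Analytic gradient of f.
    return [2.0 * z[0] + 3.0 * z[1], 3.0 * z[0]]


x = [1.0, 2.0]
baseline = [0.0, 0.0]
attr = integrated_gradients(grad_f, x, baseline)
print(attr, sum(attr), f(x) - f(baseline))
```

The attributions add up to `f(x) - f(baseline)` (the completeness property), which is a useful sanity check when benchmarking attribution methods; Captum can report the gap as a convergence delta.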
LIT is a machine learning interpretability framework and model-debugging tool for analyzing model behavior and performance. It serves as an interpretability dashboard for large language models and as a general performance analyzer for text, image, and tabular datasets. The project stands out for its comprehensive suite of tools, including salience maps for feature attribution, generation of synthetic and counterfactual examples to test robustness, and projection of high-dimensional embeddings into low-dimensional views via UMAP or PCA. It further enable…
Implements a comprehensive toolkit for understanding complex model decisions via feature attribution and decision boundary exploration.
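The counterfactual-testing idea can be sketched without the tool itself: generate edited copies of an input by substituting words, rerun the model, and report edits that flip the prediction. The keyword classifier `predict`, the substitution table `SUBSTITUTIONS`, and the helper names are hypothetical stand-ins for illustration only, not LIT's generators or API.

```python
# Hypothetical toy sentiment "model": counts positive minus negative keywords.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "awful", "terrible"}


def predict(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"


# Hypothetical substitution table; an empty string means "delete the word".
SUBSTITUTIONS = {"great": ["good", "terrible"], "not": [""], "movie": ["film"]}


def word_replacement_counterfactuals(text: str) -> list[str]:
    """Generate one counterfactual per (position, substitute) pair."""
    words = text.split()
    out = []
    for i, w in enumerate(words):
        for sub in SUBSTITUTIONS.get(w.lower(), []):
            new_words = words[:i] + ([sub] if sub else []) + words[i + 1:]
            out.append(" ".join(new_words))
    return out


def flips(text: str) -> list[tuple[str, str]]:
    """Return counterfactuals whose prediction differs from the original's."""
    original = predict(text)
    return [
        (cf, predict(cf))
        for cf in word_replacement_counterfactuals(text)
        if predict(cf) != original
    ]


print(predict("This movie is not great"))  # positive
print(flips("This movie is not great"))  # [('This movie is not terrible', 'negative')]
```

The probe surfaces a robustness problem: the model labels "not great" as positive, and deleting "not" leaves the prediction unchanged, so it ignores negation and reacts only to sentiment keywords.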
Transformers-interpret is a diagnostic library built for interpreting Transformer-based machine learning models. It acts as an attribution framework that quantifies how much each input token contributes to a model's final prediction, letting users audit decision patterns and debug NLP tasks. The library uses gradient-based analysis and hook-based introspection to trace how specific input features influence model outputs. By mapping abstract numerical attribution scores onto human-readable linguistic units, it gives a clear view of how a model processes text. The framework supports targeted analyses, allowing users to explain predictions for a specific class or to examine relationships between paired inputs. Beyond core attribution, the tool includes visualization features that render feature importance as graphical and tabular displays. These outputs help verify that a model relies on relevant data rather than unintended patterns, giving deeper insight into model behavior across different Transformer architectures.
Provides a toolkit for analyzing feature importance and decision patterns in deep learning architectures by quantifying token contributions.
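Mapping numerical attributions onto tokens usually requires collapsing a per-embedding-dimension attribution tensor into one score per token. A common convention, assumed here, is to sum over the embedding dimensions and then L2-normalize the resulting vector so scores are comparable across inputs. The function name and the attribution values below are hypothetical; in practice the values would come from a gradient method applied to the embedding layer.

```python
import math


def summarize_token_attributions(
    tokens: list[str], attributions: list[list[float]]
) -> list[tuple[str, float]]:
    """Collapse per-dimension attributions into one normalized score per token."""
    # Sum each token's attribution over its embedding dimensions.
    per_token = [sum(dims) for dims in attributions]
    # L2-normalize; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(sum(s * s for s in per_token)) or 1.0
    return [(tok, s / norm) for tok, s in zip(tokens, per_token)]


tokens = ["[CLS]", "great", "movie", "[SEP]"]
# Hypothetical attributions: one row per token, one column per embedding dimension.
attributions = [
    [0.0, 0.0, 0.0],
    [0.5, 0.4, 0.3],
    [0.2, -0.1, 0.1],
    [0.0, 0.0, 0.0],
]
for tok, score in summarize_token_attributions(tokens, attributions):
    print(f"{tok:>8} {score:+.3f}")
```

Normalizing makes the scores a unit vector, so the display reflects each token's relative share of the evidence rather than the raw gradient magnitude.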