19 Repos
Tools for generating graphical representations of model evaluation metrics, including confusion matrices and loss curves, to diagnose classification performance.
Explore 19 awesome GitHub repositories matching artificial intelligence & ml · Model Performance Visualizations.
This project is a collection of educational examples and code for implementing deep learning architectures using the PyTorch framework. It serves as a tutorial and implementation guide for building various neural network architectures for machine learning tasks. The project provides practical implementations for computer vision, including image classification and neural style transfer, as well as natural language processing examples for building sequence models and language predictors. It also covers generative models using adversarial and variational networks to synthesize or transform visua
Generates graphical representations of loss curves and gradients to diagnose model convergence.
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Generates comprehensive visualizations of model performance metrics, including loss curves and attention heatmaps.
FlameGraph is a performance profiling and visualization toolkit designed to identify bottlenecks in software execution. It functions as a processing engine that transforms raw stack trace samples into interactive, hierarchical diagrams. By representing aggregated execution frequency as nested rectangles, the tool allows developers to visualize hot code paths and analyze system behavior across both kernel and user-space environments. The project distinguishes itself through its ability to perform differential profile analysis, which highlights performance regressions or improvements by compari
Maps complex execution paths through hierarchical diagrams to identify bottlenecks and improve software stack performance.
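As a rough illustration of the aggregation step described above, the sketch below collapses raw stack samples into per-path counts and then derives how many samples each call-path prefix covers, which is the quantity a flame graph draws as rectangle width. It is a minimal Python sketch, not FlameGraph's own code; the function names `fold_stacks` and `frame_widths` and the sample data are hypothetical.

```python
from collections import Counter


def fold_stacks(samples):
    """Aggregate raw stack samples into 'frame;frame;frame' -> sample count."""
    counts = Counter()
    for frames in samples:
        counts[";".join(frames)] += 1
    return counts


def frame_widths(folded):
    """For every call-path prefix, sum the samples that pass through it.

    The total for a prefix corresponds to the width of its rectangle.
    """
    widths = Counter()
    for stack, n in folded.items():
        frames = stack.split(";")
        for depth in range(1, len(frames) + 1):
            widths[";".join(frames[:depth])] += n
    return widths


# Hypothetical samples: each list is one sampled stack, outermost frame first.
samples = [
    ["main", "parse", "read"],
    ["main", "parse", "read"],
    ["main", "render"],
    ["main", "parse"],
]
folded = fold_stacks(samples)
widths = frame_widths(folded)

for path, width in sorted(widths.items()):
    print(f"{path} {width}")
```

A wide rectangle high in the stack points at a hot code path; comparing the widths from two profiles is the basis of a differential view.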
PyCaret is a Python AutoML platform and MLOps lifecycle manager designed to automate machine learning workflows. It functions as a low-code environment that leverages a scikit-learn native engine to execute preprocessing, training, and evaluation for tabular data. The platform distinguishes itself as an LLM-powered ML copilot, using large language model agents to analyze datasets, design experiment configurations, and explain model results. It also serves as a Kubernetes ML orchestrator and model registry, enabling the versioning of trained pipelines and their promotion to production API endp
Generates diagnostic plots like ROC curves and confusion matrices to evaluate model behavior.
CatBoost is a gradient boosting machine learning library used to train decision tree ensembles for regression, classification, and ranking tasks. It functions as a high-performance framework that provides a categorical data processor for transforming non-numeric features, a distributed trainer for large-scale datasets, and GPU acceleration to speed up model construction. The library distinguishes itself through native handling of categorical data and text features, removing the need for manual encoding. It includes a specialized model interpretability tool that leverages SHAP values and featu
Generates graphical representations of the training process and model behavior to diagnose performance.
This is a scikit-learn automated machine learning framework designed to optimize model selection and hyperparameters. It functions as an automated model selector and hyperparameter optimization tool for classification and regression tasks, utilizing an automated ensemble builder to combine high-performing models for increased predictive accuracy. The system features a distributed search engine that uses Dask for parallel machine learning optimization across CPU cores or clusters. It implements a budget-based evaluation strategy through successive halving to prioritize promising model configur
Displays a ranked list of the best performing models discovered during the automation process.
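The successive halving strategy mentioned in the description can be sketched in a few lines. This is a generic illustration under stated assumptions, not the framework's implementation: `successive_halving`, `toy_score`, the `eta` reduction factor, and the candidate learning rates are all hypothetical names and values chosen for the example.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the top 1/eta of candidates each round and multiply the budget by eta.

    Returns the surviving configuration and a history of
    (budget, candidates evaluated, candidates kept) per round.
    """
    survivors = list(configs)
    budget = min_budget
    history = []
    while len(survivors) > 1:
        # Higher score is better; every survivor is scored at the current budget.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        keep = max(1, len(scored) // eta)
        history.append((budget, len(survivors), keep))
        survivors = scored[:keep]
        budget *= eta
    return survivors[0], history


def toy_score(config, budget):
    """Hypothetical objective: best at lr=0.1, with a penalty that shrinks as budget grows."""
    return -abs(config["lr"] - 0.1) * (1 + 1 / budget)


configs = [{"lr": lr} for lr in (0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 1.0, 3.0)]
best, history = successive_halving(configs, toy_score)

print(best)
for budget, evaluated, kept in history:
    print(f"budget={budget} evaluated={evaluated} kept={kept}")
```

Cheap early rounds discard weak configurations, so most of the budget goes to the few candidates that survive.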
OpenCompass is a comprehensive evaluation platform, benchmarking suite, and distributed model evaluator designed to measure the performance and accuracy of large language models. It provides a framework for benchmarking both open-source and API-based models against diverse datasets using standardized metrics and reproducible pipelines. The project features an automated judging framework that uses language models as judges to score and verify the quality of generated text. It includes a performance leaderboard system for comparing the relative capabilities of various models across industry-sta
Features a performance leaderboard system to compare the relative capabilities of open-source and proprietary models.
PostgresML is a machine learning database extension for PostgreSQL that integrates model training and inference directly into the database. It functions as an in-database AI platform and vector database, enabling the execution of large language models and natural language processing tasks on stored records without exporting data to external services. The system distinguishes itself by utilizing GPU acceleration to minimize latency during model predictions and employing a hybrid storage engine that maintains relational data alongside high-dimensional vectors. It allows for the building and fin
Ships a web-based interface for analyzing training data and visualizing model metrics to monitor accuracy.
ClearML is a comprehensive MLOps platform designed to manage the entire machine learning lifecycle. It functions as an experiment tracking tool, a data versioning system, and a pipeline orchestrator, while providing infrastructure for GPU cluster management and model serving. The platform is distinguished by its ability to handle hybrid-cloud compute scheduling and fractional GPU allocation, allowing multiple workloads to share a single hardware accelerator. It employs a metadata-based approach to data versioning, using virtual views to track large datasets and artifacts without duplicating r
Provides visual evaluation tools like confusion matrices and loss curves to analyze model accuracy and data distributions.
MMDetection3D is an open-source toolbox for 3D perception, providing a unified framework for detecting and segmenting objects in three-dimensional environments. It supports a range of core tasks including monocular 3D object detection from single camera images, LiDAR-based 3D object detection from raw point clouds, and multi-modal fusion that combines camera images with LiDAR data. The toolbox also covers point cloud semantic segmentation, assigning class labels to every point in a scan for scene understanding. The project distinguishes itself through a config-driven pipeline that orchestrate
Produces prediction files in the required format for submission to the nuScenes benchmark leaderboard.
This repository is a collection of guided tutorials for building and training machine learning models using the TensorFlow framework. It offers practical instructions and examples for implementing a variety of model architectures to solve data prediction and analysis problems. The guides cover building feedforward, convolutional, and recurrent neural networks for analyzing complex data patterns. They include specific tutorials for unsupervised learning, such as denoising autoencoders and word2vec embeddings, as well as examples of training generative adversarial networks (GANs) to synthesize new data samples. The content also addresses model management, including instructions for saving and restoring network weights to persist training progress. It additionally covers visualizing training metrics and computation graphs for performance monitoring.
Includes tutorials for visualizing model performance metrics and computational graphs.
MMF is a modular framework for building, training, and evaluating vision-and-language models. It provides a configuration-driven experiment system where model, dataset, and training parameters are defined through composable YAML files, alongside a curated model zoo of pretrained checkpoints for state-of-the-art multimodal architectures. The framework includes a multimodal dataset loader that downloads, processes, and batches vision-and-language data, and a vision-language model trainer supporting distributed training, mixed precision, and checkpoint-based resumption. The framework distinguish
Generates JSON-formatted predictions for question answering challenge leaderboard submissions.
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
Implements graphical representations of model evaluation metrics, such as loss curves and confusion matrices, to diagnose performance.
This project is a comprehensive educational program and deep learning framework designed to teach practical deep learning with PyTorch through notebooks and code examples. It serves as a high-level library for building, training, and deploying neural networks, and acts as a model training orchestrator that coordinates PyTorch models, optimizers, and loss functions. The project offers specialized toolkits for computer vision, natural language processing, and tabular data preprocessing. It stands out for advanced training controls such as discriminative learning rates, a two-way callback system for customizing training logic, and a high-level Learner abstraction that automates device placement and training loops. The framework covers a broad range of capabilities, including automated construction of data pipelines, analysis of model architectures, and performance evaluation for classification, regression, and segmentation tasks. It also includes utilities for distributed training across multiple GPUs, mixed-precision training for memory optimization, and specialized support for medical imaging data. The project is delivered as a series of Jupyter notebooks.
Generates graphical representations of loss curves and metrics for analyzing model performance in a dashboard.
This project is a comprehensive deep reinforcement learning course and training platform. It provides a structured educational curriculum that combines theoretical lessons with hands-on tutorials to teach the implementation of neural networks and agent behavior. The platform integrates a model sharing hub where users can upload, download, and version trained machine learning models. It also features a benchmarking system that uses leaderboards to evaluate and compare agent performance against community standards. The educational experience is delivered through interactive notebooks and inclu
Includes ranked leaderboards to track and compare the efficiency of different trained agents.
This project is a comprehensive teaching resource and course on building neural networks with PyTorch. It covers the fundamental building blocks of deep learning, including tensor manipulation, automatic differentiation, and the construction of modular neural network components. The repository serves as a technical guide for several specialized areas. It provides implementation details for computer vision tasks such as image classification, object detection, and semantic segmentation, as well as natural language processing (NLP) workflows with transformers, recurrent networks, and generative models. It also contains a reference for generative AI, focused on image synthesis with diffusion models and adversarial networks. The material extends to model optimization and deployment pipelines. It covers techniques for reducing model size and increasing inference speed through quantization and exporting models to formats such as ONNX and TensorRT. Further areas include data engineering for parallel loading, model evaluation with custom metrics, and deployment of open-source large language models. The project is delivered primarily as a series of Jupyter notebooks.
Includes tools for generating graphical representations of model evaluation metrics, such as confusion matrices and loss curves.
This project is a collection of TensorFlow machine learning examples that provide reference implementations for various neural network paradigms. It covers supervised, unsupervised, reinforcement, and sequential learning models. The repository contains implementations of convolutional neural networks (CNNs) focused on image classification and ranking, as well as recurrent neural networks (RNNs) for time series forecasting and sequence-to-sequence translation. It also offers examples of reinforcement learning agents trained through reward optimization, and unsupervised learning techniques such as autoencoders and self-organizing maps for data clustering. Additional features cover supervised regression and classification, semantic embedding generation, and the use of hidden Markov models for sequential data modeling. The project also includes utilities for managing tensor operations and visualizing model performance via dashboards. The content is delivered as a series of Jupyter notebooks.
Includes utilities for visualizing model evaluation metrics, training curves, and computation graphs via dashboards.
Yellowbrick is a machine learning visualization library and model diagnostic tool designed to analyze feature importance, target distributions, and model error metrics. It serves as a visual toolkit for diagnosing underfitting and overfitting through the use of validation and learning curves. The project provides specialized suites for evaluating predictive models and unsupervised learning. It enables the determination of optimal cluster counts via elbow methods and silhouette coefficients, and assesses classifier and regressor quality through ROC curves, confusion matrices, and residual plot
Generates diagnostic plots for precision, recall, and error metrics to evaluate machine learning estimators.
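The numbers behind such diagnostic plots are straightforward to compute by hand. The sketch below builds a confusion matrix from true and predicted labels and derives per-class precision and recall from it. It is a dependency-free Python illustration, not Yellowbrick's API; the helper names and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def precision_recall(matrix, k):
    """Precision and recall for the class at row/column k."""
    true_positives = matrix[k][k]
    predicted_as_k = sum(row[k] for row in matrix)  # column total
    actually_k = sum(matrix[k])                     # row total
    precision = true_positives / predicted_as_k if predicted_as_k else 0.0
    recall = true_positives / actually_k if actually_k else 0.0
    return precision, recall


# Hypothetical toy data for a two-class problem.
labels = ["cat", "dog"]
y_true = ["cat", "cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]

cm = confusion_matrix(y_true, y_pred, labels)
precision, recall = precision_recall(cm, labels.index("cat"))

for label, row in zip(labels, cm):
    print(label, row)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A visualization library renders the same matrix as a heatmap; the off-diagonal cells show which classes are confused with each other.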
SwanLab is an open-source platform for tracking machine learning experiments and an observability tool. It provides a central dashboard for logging training metrics, hyperparameters, and hardware performance in order to monitor and analyze the training of AI models. The platform stands out for its focus on self-hosted infrastructure, which lets users deploy private instances via Docker or Kubernetes for secure on-premises data control. It also includes specialized utilities for migrating historical experiment logs and synchronizing real-time metrics from external tools such as MLflow. The system covers a broad range of features, including multi-modal media logging for 3D point clouds and audiovisual assets, real-time hardware monitoring for GPUs and CPUs, and comparative analysis through side-by-side visualizations of runs. It supports tracking distributed training across multi-GPU clusters and integrates with frameworks such as PyTorch Lightning, Ray, XGBoost, and LightGBM. Administration is handled through a combination of a web-based dashboard and a command-line interface (CLI) for managing workspaces, projects, and user permissions.
Renders training data through interactive charts and ROC curves to visually evaluate model performance.