30 open-source projects similar to pair-code/lit, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Lit alternative.
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
This repository is a comprehensive educational program and deep learning framework designed to teach practical deep learning using PyTorch through notebooks and code examples. It serves as a high-level library for building, training, and deploying neural networks, acting as a model training orchestrator that coordinates PyTorch models, optimizers, and loss functions. The project provides specialized toolkits for computer vision, natural language processing, and tabular data preprocessing. It distinguishes itself through advanced training controls such as discriminative learning rates, a two-w
This project is a comprehensive educational resource and technical manual focused on interpretable machine learning and explainable AI. It serves as a textbook and reference for implementing techniques that make complex machine learning models transparent and understandable to humans. The resource provides guidance on both building inherently transparent models, such as decision trees and sparse linear models, and applying post-hoc explanation methods to black-box systems. It details specific methodologies for quantifying feature importance, generating rationales for individual predictions, a
This project is a computer vision training pipeline and image classification framework. It provides a workflow for preparing custom image datasets and fine-tuning pre-trained neural networks to recognize user-defined categories. The system includes a model interpretability toolkit that generates saliency maps to highlight influential image regions and uses dimensionality reduction to project high-dimensional semantic features into 2D or 3D visualizations. The framework covers the full lifecycle of model development, including dataset preparation with proportional class splitting, performance
Lightly is a self-supervised learning framework and computer vision data curation tool designed to manage large image datasets and train models on unlabeled data. It functions as a PyTorch vision library and dataset management SDK, providing tools to convert raw images into high-dimensional vectors for similarity search, visualization, and feature extraction. The project implements a variety of self-supervised architectures, including MoCo, SimCLR, VICReg, Barlow Twins, and masked image modeling. It distinguishes itself by combining these learning frameworks with active learning capabilities,
This project is an object detection evaluation library and benchmarking tool designed to calculate precision, recall, and average precision for computer vision models. It provides a suite of utilities for parsing bounding box coordinates from text files and calculating spatial overlap to determine detection accuracy. The toolkit features a command line interface for comparing ground truth files against model predictions. It includes a precision-recall curve generator to visualize the relationship between precision and recall across different confidence thresholds and an intersection over unio
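The spatial-overlap and precision/recall calculations described for this project can be sketched as follows. This is a minimal illustration, not the project's own code: the `(x_min, y_min, x_max, y_max)` box layout and the function names `iou` and `precision_recall` are assumptions made for the example.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (if any).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```

A detection is typically counted as a true positive when its IoU with a ground-truth box meets a chosen threshold; the threshold value is a configuration choice and is not stated in the description above.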
TransformerLens is a library for mechanistic interpretability research designed to reverse engineer the learned algorithms within large language models. It provides a standardized framework for wrapping diverse transformer architectures, allowing researchers to extract, manipulate, and analyze internal activations and weights through a consistent interface. The project distinguishes itself through a comprehensive system of activation hooks that can capture, patch, and ablate internal tensors during the forward pass. It includes specialized utilities for decomposing fused projections, material
This project is a machine learning educational curriculum and learning platform delivered through interactive Jupyter Notebooks. It serves as a comprehensive guide for mastering the Python data science toolkit, providing structured tutorials for numerical computing, tabular data manipulation, and statistical visualization. The curriculum includes specific implementation guides for Scikit-Learn and a practical course on TensorFlow for constructing, training, and deploying neural networks and computer vision models. It covers the end-to-end process of building predictive models, from initial pr
This repository is a collection of implementation references and solved notebooks covering supervised, unsupervised, and reinforcement learning techniques. It provides practical guides for building predictive models, clustering algorithms, and autonomous agents. The project includes specific implementations for neural network architectures, such as multi-layer perceptrons for digit recognition, and recommender systems using collaborative and content-based filtering. It also features reinforcement learning systems that utilize deep Q-learning to optimize decision-making policies. The codebase
SmolLM is a project dedicated to the development of small language models. It focuses on training and fine-tuning compact models that maintain high performance while utilizing fewer parameters. The project emphasizes efficient AI inference and on-device text generation, aiming to enable the deployment of lightweight models on edge devices with limited memory and processing power. It utilizes synthetic data generation to produce artificial datasets that improve the reasoning and training of these AI systems. The system supports a variety of optimization and training capabilities, including we
nlp-recipes is a collection of implementation guides and reference templates for applying natural language processing techniques to real-world tasks. It provides standardized workflows and code examples for developing NLP pipelines, from dataset preparation and model training to performance evaluation. The project focuses on the practical application of transformer-based models, offering patterns for fine-tuning pretrained architectures for tasks such as text classification, named entity recognition, and question answering. It also includes a toolkit for model interpretability, allowing users
Arize Phoenix is an LLM observability platform and evaluation framework designed to capture execution traces and monitor large language model applications. It serves as a prompt management system for versioning and testing templates, and as a self-hosted AI operations infrastructure for managing telemetry and experiments. The platform differentiates itself through a specialized embedding visualization tool used to detect data drift and optimize vector search. It provides a comprehensive evaluation suite that utilizes judge-based evaluators and ground-truth datasets to score model outputs, and
This project serves as an educational and practical resource for mastering machine learning workflows using Python. It provides a comprehensive collection of code examples and exercises designed to guide users through the implementation of predictive systems, ranging from fundamental algorithms to deep learning architectures. The repository distinguishes itself by offering a structured approach to both classical machine learning and neural network training. It covers the full lifecycle of model development, including the orchestration of reusable data transformation pipelines, advanced ensemb
Orange3 is a visual data mining platform that provides an interactive canvas for building data analysis workflows without writing code. At its core, it offers a widget-based visual programming environment where users connect configurable components to perform data preprocessing, machine learning model training, statistical evaluation, and interactive visualization. The platform is built on NumPy-backed data tables with domain descriptors that define variable names, types, and roles, and includes a lazy SQL query proxy for working with database tables without loading all data into memory. The
OpenCompass is an open-source framework for standardized benchmarking of large language models. It provides a configurable evaluation pipeline that supports both objective and subjective assessment, using a dual-engine architecture to handle closed-form answer comparison and open-ended response rating. The framework is designed as a modular platform where datasets, models, and metrics are composed through declarative YAML configuration files. The framework distinguishes itself through its extensible model integration layer, which supports custom models, HuggingFace models, and third-party API
MMF is a modular framework for building, training, and evaluating vision-and-language models. It provides a configuration-driven experiment system where model, dataset, and training parameters are defined through composable YAML files, alongside a curated model zoo of pretrained checkpoints for state-of-the-art multimodal architectures. The framework includes a multimodal dataset loader that downloads, processes, and batches vision-and-language data, and a vision-language model trainer supporting distributed training, mixed precision, and checkpoint-based resumption. The framework distinguish
This project is a model-agnostic interpretability framework and explainability tool designed to provide local interpretable explanations for individual predictions. It fits a local surrogate model that approximates the behavior of any machine learning classifier or regression model to identify the most influential features for a specific instance. Because it is model-agnostic, it can explain predictions across tabular, text, and image data regardless of the underlying architecture. It employs local linear approximations and feature importance visualization t
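The local-surrogate idea in this entry can be sketched in plain Python: perturb the instance, weight each perturbed sample by its proximity to the instance, and fit a weighted linear model whose coefficients serve as local feature importances. This is a minimal sketch under stated assumptions, not the project's API; `local_surrogate`, `solve_linear_system`, the Gaussian perturbation, and the exponential kernel are all choices made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def local_surrogate(
    predict: Callable[[Sequence[float]], float],
    instance: Sequence[float],
    n_samples: int = 500,
    scale: float = 0.5,         # assumed perturbation standard deviation
    kernel_width: float = 1.0,  # assumed proximity kernel width
    seed: int = 0,
) -> List[float]:
    """Return per-feature weights of a linear model fitted around `instance`."""
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the weighted normal equations (X^T W X) beta = X^T W y,
    # with a leading column of ones for the intercept.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]
        dist2 = sum((zi - vi) ** 2 for zi, vi in zip(z, instance))
        w = math.exp(-dist2 / kernel_width ** 2)  # nearer samples count more
        row = [1.0] + z
        y = predict(z)
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    coef = solve_linear_system(xtwx, xtwy)
    return coef[1:]  # drop the intercept, keep the feature weights
```

For example, calling `local_surrogate(lambda z: 3 * z[0] - 2 * z[1] + 1, [0.5, -1.0])` on a function that is already linear should recover weights close to 3 and -2; on a nonlinear model the weights only describe behavior near the chosen instance.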
Interpret is an interpretable machine learning library and glassbox model framework. It provides toolkits for training inherently transparent models and applying post-hoc explanation techniques to make machine learning predictions human-understandable. The framework distinguishes itself by integrating differential privacy into the training of interpretable models to prevent sensitive data from leaking through explanations. It also features a visualization tool for rendering interactive decision paths and model behavior. The library covers model explainability through feature importance calcu
FiftyOne is a visual tool for curating, analyzing, and managing image and video datasets for machine learning model training. It serves as a platform for identifying annotation errors, refining ground truth labels, and evaluating vision model performance by comparing predictions against ground truth to identify failure modes. The system functions as a containerized data platform that supports team collaboration on large-scale visual datasets in a cloud environment. It includes specialized capabilities for exploring high-dimensional embeddings to discover data clusters and retrieve correspondi
This project is a PyTorch person re-identification framework designed for training and evaluating models that identify individuals across different camera views. It provides a complete model training pipeline, a deep learning feature extractor for converting images into numeric vectors, and a suite of computer vision benchmarking tools to measure identity retrieval accuracy. The framework includes a specialized transfer learning toolkit that supports layer freezing, staged learning rate optimization, and differential learning rates for fine-tuning pretrained models. It distinguishes itself th
Evidently is an AI observability platform and evaluation framework designed to quantify the performance of machine learning models and large language models. It functions as a monitoring tool for detecting data drift and quality degradation in tabular datasets, while providing a specialized analyzer for the faithfulness and correctness of retrieval augmented generation systems. The project distinguishes itself through an evaluation framework that utilizes judge models and custom rubrics to score language model outputs. It includes tools for iterative prompt optimization and the generation of
RecBole is a PyTorch-based recommendation framework designed for building, training, and evaluating a wide variety of recommendation algorithms. It serves as a standardized benchmark environment that allows for the comparison of different model architectures using public datasets and consistent evaluation metrics. The project provides specialized toolkits for sequential recommendation and knowledge-graph integration, enabling the prediction of item sequences based on user history or the incorporation of structured external knowledge. It includes a dedicated hyperparameter optimization engine
This project is an automated prompt engineering and optimization tool designed to iteratively create, test, and refine prompts using a language model to improve output quality. It functions as a framework for generating candidate prompts and ranking their performance through correctness matching and ELO-based ratings. The system includes capabilities for model distillation, generating high-quality example pairs from frontier models to create training data for smaller models. It also provides tools to condense prompts for smaller models and transform instruction-tuned prompts into completion-b
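The Elo-based ranking mentioned in this entry can be illustrated with the standard Elo update applied to a head-to-head comparison between two candidate prompts. This is a sketch of the general rating formula, not the project's implementation; the function names, the K-factor of 32, and the starting rating of 1200 are assumptions for the example.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float,
    rating_b: float,
    score_a: float,   # 1.0 if A wins, 0.0 if B wins, 0.5 for a draw
    k: float = 32.0,  # assumed K-factor; larger values move ratings faster
) -> Tuple[float, float]:
    """Return the updated (rating_a, rating_b) after one comparison."""
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


# Hypothetical usage: two prompts start level, and prompt A wins one comparison.
ratings = {"prompt_a": 1200.0, "prompt_b": 1200.0}
ratings["prompt_a"], ratings["prompt_b"] = update_elo(
    ratings["prompt_a"], ratings["prompt_b"], score_a=1.0
)
```

Repeating this update over many pairwise comparisons yields a ranking in which prompts that win more often, and against stronger opponents, end up with higher ratings.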
This is a PyTorch CNN visualization toolkit designed for neural network interpretability. It provides a set of tools to explain model decisions and analyze the internal behavior of convolutional neural networks through the visualization of activations, gradients, and filters. The project implements specialized techniques for synthesizing representative images, including Deep Dream optimizations to amplify patterns and class-specific image generation via input optimization. It also features a saliency map generator that produces gradient-based heatmaps to identify the specific image regions in
Neuralforecast is a neural time series forecasting library designed to predict future values for one or multiple series using deep learning architectures. It functions as a distributed machine learning forecasting framework that enables the training of global models across multiple time series to improve generalization through cross-learning. The project distinguishes itself as a probabilistic forecasting toolkit that produces uncertainty intervals and probability distributions rather than single point estimates. It also includes a hierarchical forecast reconciler to ensure that predictions a
This project is a distributed machine learning platform and sparse deep learning framework designed for training and serving models with high-dimensional sparse data. It functions as an online model serving infrastructure and recommendation system engine, enabling real-time item retrieval and scoring using deep tree matching and neural networks. The system distinguishes itself through a multi-task learning framework that optimizes multiple objective functions within a shared representation space. It features a specialized online serving infrastructure that supports dynamic model hot-loading a
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series seque
Deepagents is an LLM agent orchestration platform and stateful application server designed for deploying and managing AI agents built with computational graphs. It provides a containerized runtime environment that handles agent execution, state persistence, and the versioning of AI assistants. The platform distinguishes itself through deep integration with the Model Context Protocol, allowing agents to function as servers that expose tools and capabilities to external clients. It features a sophisticated observability suite for capturing execution traces, performing LLM-based evaluations agai
Edward is a probabilistic programming language and inference engine designed for building deep generative models and Bayesian neural networks. It utilizes the TensorFlow framework to represent probabilistic models as differentiable computational graphs. The library enables the construction of complex data distributions through Bayesian neural networks, mixture models, and Gaussian processes. It differentiates itself by providing an integrated toolkit for both supervised and unsupervised probabilistic modeling, including the implementation of generative adversarial networks and mixture density