19 Repos
Python-based modules for managing hardware, data, and model execution pipelines.
Distinguishing note: Focuses on the Python ecosystem for machine learning development.
Explore 19 GitHub repositories matching artificial intelligence & ML · Python Machine Learning Libraries.
Modular is a unified machine learning development platform designed for building, compiling, and deploying high-performance neural network models. It provides a comprehensive execution engine that supports both local and production-grade inference, enabling developers to manage the entire model lifecycle from initial architecture definition to scalable, containerized service deployment. The platform distinguishes itself through a hardware-agnostic runtime that abstracts diverse silicon architectures, allowing models to execute efficiently across varied compute environments. It includes a spec
Provides a comprehensive suite of programming modules to manage hardware drivers, inference engines, and neural network layers.
This repository serves as a comprehensive collection of reference implementations for the PyTorch machine learning library. It provides practical examples for building, training, and deploying deep learning models, functioning as a toolkit for developers to explore neural network architectures and training workflows. The project distinguishes itself by offering concrete demonstrations of complex machine learning operations, ranging from computer vision tasks like object detection and depth estimation to the training of large-scale transformer models. These examples illustrate how to implement
Provides a comprehensive collection of reference implementations for building, training, and deploying deep learning models using the PyTorch framework.
Datasets is a library designed for the management, processing, and sharing of large-scale data collections for machine learning workflows. It functions as both a data processing framework and a versioning platform, providing tools to organize, filter, and transform massive datasets while ensuring reproducibility across research and development teams. The library distinguishes itself by enabling the handling of datasets that exceed available system memory. It utilizes memory-mapped file access, disk-based caching, and lazy iterative streaming to maintain performance when working with large-sca
Provides a specialized Python library for accessing, sharing, and processing large-scale machine learning datasets.
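To illustrate the lazy iterative streaming idea mentioned in the description, here is a minimal sketch in plain Python. It does not use the Datasets library or its API; `stream_records`, `keep_even`, and `in_batches` are hypothetical names chosen for illustration. The point is only that generators let records be filtered and batched one at a time, so the full collection never has to sit in memory.

```python
from itertools import islice
from typing import Iterable, Iterator


def stream_records(n: int) -> Iterator[dict]:
    """Hypothetical record source: yields one record at a time."""
    for i in range(n):
        yield {"id": i, "text": f"example {i}"}


def keep_even(records: Iterable[dict]) -> Iterator[dict]:
    """Lazy filter: nothing is evaluated until a consumer asks for it."""
    return (r for r in records if r["id"] % 2 == 0)


def in_batches(records: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Group a stream into lists of at most batch_size records."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Only one batch is materialized at a time while iterating;
# list() is used here just to inspect the result.
batches = list(in_batches(keep_even(stream_records(10)), batch_size=2))
```

The same pipeline shape works for a file or network source: replace `stream_records` with any generator that yields records.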
This project is a Python-based educational framework designed to simulate reinforcement learning algorithms and environments. It serves as a platform for reproducing classic textbook examples, allowing users to study agent behavior, policy improvement, and the fundamental mechanics of decision-making in controlled settings. The library provides implementations for core reinforcement learning concepts, including temporal difference learning, Monte Carlo episode sampling, and tabular value function approximation. It enables the analysis of specific algorithmic behaviors, such as identifying and
Provides a Python-based library for simulating reinforcement learning environments and agent-based decision-making.
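As a rough illustration of the temporal difference learning mentioned above, the sketch below runs tabular TD(0) on a small random walk, a common textbook setting. It is not taken from the repository; the function name, the five-state layout, the step size, and the reward scheme (1 for exiting on the right, 0 otherwise, no discounting) are all assumptions made for this example.

```python
import random


def td0_random_walk(episodes=1000, alpha=0.1, n_states=5, seed=0):
    """Estimate state values with tabular TD(0) on a symmetric random walk."""
    rng = random.Random(seed)
    values = [0.5] * n_states  # initial guess for every non-terminal state
    for _ in range(episodes):
        state = n_states // 2  # each episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:  # fell off the left edge: terminal, reward 0
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= n_states:  # right edge: terminal, reward 1
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD(0) update: move V(s) toward the one-step bootstrapped target.
            values[state] += alpha * (reward + next_value - values[state])
            if done:
                break
            state = next_state
    return values


values = td0_random_walk()
```

Under these assumptions the estimates should rise from left to right, since states nearer the right edge are more likely to end with reward 1.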
dlib is a C++ machine learning toolkit and data analysis framework. It provides a collection of algorithms and utilities for building predictive modeling applications and performing statistical analysis on large datasets within native C++ environments. The project functions as a binding library that wraps low-level C++ machine learning algorithms into high-level Python scripting interfaces. This allows for the integration of high-performance native implementations with Python for machine learning development. The framework covers the implementation of predictive models, the execution of mach
Integrates high-performance C++ implementations into the Python ecosystem for streamlined machine learning development.
This project is an educational resource providing practical code examples and implementations of machine learning algorithms using the Python language. It serves as a guide for constructing predictive pipelines, clustering models, and dimensionality reduction within the Scikit-Learn ecosystem. The repository includes comprehensive demonstrations for supervised and unsupervised learning, as well as detailed examples for implementing neural networks and deep architectures. It also provides practical guidance on exporting model parameters to JSON and wrapping trained models in web APIs for produ
Offers a wide collection of Python-based implementations for supervised and unsupervised learning algorithms.
PRML is a Python machine learning library and statistical learning toolkit. It provides code implementations of supervised and unsupervised learning concepts, including regression, classification, and neural network algorithms for statistical data modeling. The project functions as a pattern recognition toolkit used to identify theoretical structures within numerical datasets. It includes a neural network framework for solving nonlinear data mappings and a linear algebra toolkit that utilizes vectorized operations and matrix calculations. The library covers a broad range of capabilities, inc
Provides a Python-based collection of regression, classification, and neural network algorithms.
This is a Python automated machine learning framework designed to automate the design and optimization of machine learning pipelines. It functions as a genetic programming pipeline optimizer and an automated feature selection tool, using evolutionary search to discover the most effective sequences of data processing and model steps. The project focuses on multi-objective optimization to balance competing performance metrics simultaneously. It employs a genetic selection process to identify impactful variables and remove noise from raw datasets, ensuring the resulting machine learning solution
Provides a Python-based framework for the automated design and optimization of ML pipelines.
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Provides a Python-based framework for building and managing reproducible machine learning pipelines.
This project is a collection of foundational machine learning algorithms and data science tools implemented in Python. It focuses on building the logic of these tools using basic programming primitives rather than relying on specialized libraries. The implementation covers several core domains, including a linear algebra library for matrix and vector operations, a statistical analysis toolkit for probability and hypothesis testing, and a framework for map-reduce distributed processing. It also includes implementations for natural language processing, graph theory for network analysis, and var
Implements foundational machine learning algorithms and data science tools from scratch using Python.
This is a Python machine learning library featuring a collection of core algorithms implemented from scratch to demonstrate foundational AI concepts. It provides a comprehensive toolkit for supervised learning, unsupervised learning, and neural network development. The project is distinguished by its custom implementation of a neural network framework, which includes multi-layer perceptrons with backpropagation, gradient descent, and weight regularization. It also includes a specialized anomaly detection toolkit that identifies outliers and rare events using Gaussian probability distributions
A comprehensive collection of core machine learning algorithms implemented from scratch in Python.
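The Gaussian anomaly detection described above can be sketched in a few lines. This is an illustrative one-dimensional version, not the repository's code: the function names, the sample data, and the threshold `epsilon` are assumptions. The idea is to fit a mean and variance to normal data, then flag points whose probability density falls below the threshold.

```python
import math


def fit_gaussian(samples):
    """Return the maximum-likelihood mean and variance of the samples."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, var


def density(x, mean, var):
    """Gaussian probability density at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def is_anomaly(x, mean, var, epsilon=1e-3):
    """Flag x when its density under the fitted Gaussian is below epsilon."""
    return density(x, mean, var) < epsilon


train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]  # made-up "normal" readings
mean, var = fit_gaussian(train)
```

In practice `epsilon` would be tuned on a labeled validation set, and multi-feature data would use a product of per-feature densities or a full covariance matrix.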
Featuretools is an automated feature engineering library and data transformation framework written in Python. It automatically generates machine learning feature vectors from multi-table datasets by applying synthesis patterns to relational and timestamped data. The system functions as a distributed feature synthesis engine, allowing the process of creating feature vectors to scale across multiple cores or clusters to handle large-scale datasets. The library supports the synthesis of multi-table datasets, time series feature generation, and the creation of custom machine learning primitives
Provides a Python-based toolkit for generating predictive feature vectors from raw timestamped data.
imbalanced-learn is a dataset balancing framework and Python machine learning extension designed to resample training data and reduce the impact of class imbalance. It provides a toolkit of algorithms for adjusting class distributions to improve model performance on minority class prediction. As a scikit-learn resampling library, it extends the ecosystem with specialized tools for balancing datasets through over-sampling and under-sampling techniques. This allows for the correction of skewed class proportions to reduce model bias toward the majority class. The library implements the scikit-l
Extends the scikit-learn ecosystem with advanced sampling methods to mitigate dataset bias.
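To make the over-sampling idea concrete, here is a minimal pure-Python sketch of random over-sampling: minority-class rows are duplicated at random until every class matches the largest one. It is not imbalanced-learn's implementation or API; `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the largest class
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        pool = [x for x, lbl in zip(X, y) if lbl == label]
        for _ in range(target - count):
            X_out.append(rng.choice(pool))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.2], [0.3], [0.4], [5.0]]  # toy features
y = [0, 0, 0, 0, 1]                      # class 1 is the minority
X_res, y_res = random_oversample(X, y)
```

Resampling should be applied to the training split only, so that duplicated rows do not leak into evaluation data.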
Swift for TensorFlow is a custom toolchain that extends the Swift language with first-class automatic differentiation and differentiable types, enabling gradient-based computation directly within the compiler. It integrates the Swift compiler with TensorFlow runtime and XLA backends, allowing tensor operations to be compiled and executed on accelerated hardware for high-performance machine learning. The project distinguishes itself through compiler-integrated automatic differentiation that computes gradients of user-defined functions and types during compilation, eliminating the need
Enables Swift code to call Python ML libraries directly, reusing existing machine learning tooling.
mlxtend is a pure Python machine learning extension library that provides additional tools for association rule mining, ensemble learning, and feature selection. It is built on numpy and pandas, with all data operations accepting and returning pandas DataFrames, and custom estimators inherit from scikit-learn’s base classes to offer a uniform fit-predict interface compatible with grid search. The library implements the Apriori algorithm for mining frequent itemsets from transaction data and generating association rules with confidence and lift metrics. For classification, it combines multiple
Pure Python library extending scikit-learn with tools for association rules, feature selection, and ensemble learning.
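The confidence and lift metrics mentioned above follow directly from itemset support. The sketch below computes them in plain Python for a single rule; it does not use mlxtend, and the function names and transactions are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_a = support(transactions, antecedent)
    s_c = support(transactions, consequent)
    s_both = support(transactions, set(antecedent) | set(consequent))
    confidence = s_both / s_a  # P(consequent | antecedent); assumes s_a > 0
    lift = confidence / s_c    # values above 1 suggest positive association
    return s_both, confidence, lift


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
sup, conf, lift = rule_metrics(transactions, {"bread"}, {"milk"})
```

Here bread and milk each appear in three of four transactions and together in two, so the rule has support 0.5, confidence 2/3, and lift 8/9, meaning the two items co-occur slightly less often than independence would predict.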
Arduino CLI is a command-line interface for compiling, uploading, and managing libraries and board cores for Arduino-compatible hardware. It acts as a toolchain manager for microcontrollers and a hardware programming tool, providing utilities to search for and install the platform definitions and compilers required for different hardware architectures. The project provides an RPC-capable development daemon that lets external programs trigger internal tool operations and manage automated electronics workflows programmatically. It also includes an interactive hardware debugger for real-time troubleshooting of code running on physical devices. The tool covers board and library management, including dependency analysis, index synchronization, and the integration of third-party repositories. Its build and deployment features include source code compilation, binary uploads, and bootloader installation, supported by serial output monitoring and detection of connected hardware. The utility offers shell completion and manages environment settings through a file-based configuration system.
Displays available example sketches for libraries to demonstrate practical hardware implementation.
mlfinlab is a Python library for machine learning in finance, designed for building and validating models in quantitative trading and portfolio management. It provides a toolkit for financial data engineering and a framework for backtesting quantitative strategies, turning raw market data into predictive signals and target classes. The library includes a synthetic financial data generator for creating artificial datasets that mimic the statistical properties of real assets for stress testing. It also offers specialized tools for labeling and sampling financial time series to prevent data leakage in non-stationary markets. The project covers a broad range of quantitative capabilities, including feature engineering, asset codependence analysis for portfolio diversification, and risk-adjusted position sizing for capital allocation. In addition, it provides model optimization utilities based on clustering and cross-validation for assessing the robustness of trading strategies.
Serves as a specialized Python library for building and validating machine learning models tailored to quantitative trading.
This project is a Python machine learning library and data science toolkit designed for building predictive models and analyzing complex datasets. It provides a collection of implementations of common supervised and unsupervised algorithms using the Scikit-Learn framework. The toolkit includes a predictive modeling suite for generating predictions from historical data and a statistical analysis framework for applying Bayesian modeling and causality tests. It also offers a Matplotlib-based data visualization suite for rendering static charts and plots to interpret classifier boundaries and data trends. The project covers data clustering workflows for identifying patterns and segments, exploratory data analysis, and data preprocessing using Pandas and NumPy.
Provides a comprehensive collection of machine learning algorithms and data science tools implemented in Python.
Machine-Learning-From-Scratch is an educational repository that provides implementations of fundamental machine learning models using standard Python logic. It serves as a resource for understanding the internal mechanisms of common statistical and predictive algorithms by constructing them from the ground up rather than relying on high-level machine learning frameworks. The project stands out for prioritizing transparency in algorithmic design, using mathematical primitives and vectorized array computations to expose the underlying calculus and statistical logic. By structuring learning techniques as modular, independent components, the repository allows iterative training loops and gradient-based optimization processes to be studied in isolation. The collection covers a broad range of data science techniques and focuses on the manual implementation of core processes and model training procedures. The repository is designed to support skill development in data science by demonstrating how predictive models work through basic programming and analytical practices.
Ships educational implementations of popular algorithms using standard Python logic to explain internal model behavior.