30 open-source projects similar to alirezadir/production-level-deep-learning, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best Production Level Deep Learning alternative.
This repository is a collection of Jupyter notebooks providing reference implementations and templates for building, training, and deploying machine learning models using Amazon SageMaker. It serves as an example library for implementing model architectures and automating the machine learning lifecycle. The library provides practical patterns for machine learning training, data engineering, and model deployment. It includes implementation guides for MLOps, including workflows for model monitoring, lineage tracking, and hyperparameter tuning. The examples cover a broad range of capabilities i
This project is a structured educational program and comprehensive training curriculum designed to teach the end-to-end lifecycle of machine learning models. It serves as a resource for engineers to master the transition of data science projects from development into reliable, production-ready systems. The curriculum focuses on the practical application of engineering best practices, emphasizing the orchestration of complex data processing and training sequences. It provides instruction on building repeatable workflows, managing experiment metadata, and implementing infrastructure automation
Seldon Core is a Kubernetes-based machine learning model server and MLOps inference framework. It functions as a multi-model serving engine and pipeline orchestrator, packaging models as scalable microservices that are exposed via standardized REST and gRPC APIs. The project distinguishes itself through graph-based inference pipelines that chain models and data transformers into sequential workflows. It optimizes hardware utilization via multi-model shared serving and dynamic memory overcommit strategies, while supporting production experimentation through weighted traffic routing, A/B testin
ClearML is a comprehensive MLOps platform designed to manage the end-to-end machine learning lifecycle, from initial experimentation to production deployment. It provides a suite of integrated tools including a pipeline orchestrator for automating workflows, an experiment tracking tool for logging hyperparameters and metrics, and a metadata-driven data versioning system for managing large-scale datasets and model artifacts. The platform is distinguished by its advanced compute management and serving capabilities. It features a GPU compute manager that supports fractional resource slicing and
ai-edu is a comprehensive AI education curriculum and machine learning courseware collection. It provides theoretical tutorials, deep learning lab exercises, and project blueprints designed to teach artificial intelligence fundamentals through a combination of study and practical implementation. The project focuses on a learning-by-doing approach, guiding users from Python programming and neural network basics to advanced topics. It includes specialized instructional content on distributed AI training, MLOps educational guides for model quantization and pruning, and detailed frameworks for im
This is a reference guide for designing, deploying, and maintaining production-ready machine learning systems, grounded in MLOps best practices. It covers the complete machine learning lifecycle, from system design and workflow planning through to deployment and ongoing maintenance, with a focus on reliability, scalability, and maintainability as business requirements evolve. The guide provides an architecture reference for establishing shared ML infrastructure, including model registries and feature stores that standardize asset reuse across teams. It details pipeline automation through conf
h2o-llmstudio is a language model training framework that provides a no-code graphical interface for fine-tuning large language models on custom datasets. It functions as a specialized tool for managing the training lifecycle, from configuring hyperparameters to monitoring performance metrics. The project distinguishes itself through a multi-GPU training orchestrator that distributes workloads via data parallel processing and a low-rank adaptation tool for memory-efficient fine-tuning. It also includes a model evaluation dashboard featuring an interactive chat interface to verify conversation
Cube Studio is a cloud-native MLOps platform and Kubernetes-based AI orchestrator designed for the entire machine learning lifecycle. It provides a distributed training framework for large-scale model fine-tuning, a GPU resource manager for hardware virtualization, and an ML pipeline orchestrator that uses visual directed acyclic graphs to manage end-to-end workflows. The platform distinguishes itself through its specialized LLM inference server, which supports retrieval-augmented generation and the construction of private knowledge bases. It features a dedicated system for supervised fine-tu
This project is a comprehensive framework for the training, fine-tuning, and deployment of large language models. It functions as a distributed deep learning platform that enables users to scale model workflows across multiple hardware nodes while providing tools for model evaluation and performance benchmarking. The platform distinguishes itself by offering specialized utilities for model compression and weight transformation, allowing users to reduce memory footprints and latency through quantization and pruning. It supports the adaptation of large models for consumer-grade hardware, facili
Ludwig is a multimodal machine learning platform and low-code framework designed for building, training, and deploying neural networks. It enables the construction of models that process text, images, audio, and tabular data through a unified interface using declarative configuration files rather than custom code. The system features a specialized low-code framework for large language models, supporting supervised fine-tuning, preference alignment, and a constrained decoding tool to force structured data output via logit extraction. It also includes an automated model architecture search to i
ZenML is an extensible machine learning orchestration framework designed to manage the end-to-end lifecycle of data pipelines and AI agent workflows. It functions as a durable orchestrator that executes machine learning tasks as directed acyclic graphs, ensuring that every step is containerized for consistent performance across local, cloud, and hybrid infrastructure. By decoupling pipeline code from underlying compute and storage backends, the platform allows developers to define infrastructure-agnostic stacks that remain portable across diverse environments. The project distinguishes itself
Apache MXNet is a deep learning framework and distributed machine learning library designed for training and deploying neural networks across distributed systems, mobile devices, and hardware accelerators. It functions as a cross-platform runtime and a dynamic dataflow scheduler that optimizes neural network execution. The framework provides a multi-language API, enabling the development of machine learning models using Python, R, Julia, Scala, Go, and JavaScript. It supports high-performance model training and the scaling of workloads across multiple GPUs and machines. The system covers cap
MLOps-Basics is a collection of implementation guides and blueprints for automating the machine learning lifecycle. It provides practical workflows for managing the transition of models from training to production deployment, focusing on the integration of operational tools into the machine learning pipeline. The project features specific architectural patterns for deploying containerized models using serverless infrastructure and cloud registries. It includes frameworks for tracking large datasets and model artifacts via remote storage, as well as guides for converting models into standardiz
This project is a PyTorch implementation of a text-to-image transformer. It is a generative AI model designed to map discrete text tokens to image pixels using a transformer network to create visual content from textual descriptions. The system utilizes a discrete VAE image encoder to compress visual data into tokens for transformer processing. It supports classifier-free guidance to adjust the influence of text prompts during inference and includes capabilities for ranking generated images based on their similarity to text prompts. The architecture incorporates sparse attention mechanisms a
AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs. The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer mo
This project is a CUDA programming course and technical guide focused on writing and optimizing GPU kernels for hardware acceleration. It provides structured learning resources for using the CUDA platform to execute operations on NVIDIA GPUs. The material covers the optimization of linear algebra kernels and the analysis of machine learning deployment. It includes guidance on identifying acceleration tools, mapping the deep learning ecosystem, and evaluating the frameworks used to move models from research to production environments. The scope extends to GPU performance optimizatio
This project is a PyTorch model serving framework designed to deploy and scale machine learning models in production via scalable network endpoints. It functions as a high-performance inference server, optimizer, and model lifecycle manager that handles model loading, request batching, and hardware acceleration. The system distinguishes itself through advanced orchestration and optimization capabilities, such as chaining multiple models into sequential workflows using execution graphs and employing dynamic batching to improve throughput and latency. It provides specialized support for generat
KServe is an open platform for deploying and serving generative and predictive AI models on Kubernetes. It defines inference services as custom resources with declarative YAML specifications, enabling a Kubernetes-native approach to model deployment and lifecycle management. The platform leverages Knative-based serverless scaling for automatic scale-to-zero and revision management, and supports a pluggable serving runtime architecture that maps model formats to containerized execution environments. KServe distinguishes itself through model-aware autoscaling that scales replicas based on token
Amazon DSSTNE is a machine learning toolkit and sparse tensor network library designed for deep learning models with sparse inputs and outputs. It provides a model-parallel training framework and a GPU-accelerated sparse engine to support memory-intensive networks. The framework is specifically designed for recommendation system training and large-scale sparse learning. It enables the distribution of large weight matrices and embedding tables across multiple GPU devices to handle models that exceed the memory capacity of a single processor. The project covers a broad range of capabilities in
This project is a collection of optimized scripts, deployment patterns, and reference implementations designed for scaling and accelerating state-of-the-art AI models. It serves as a multi-domain model zoo and a distributed training framework, providing PyTorch reference implementations for training and deploying models on GPU-accelerated infrastructure. The repository distinguishes itself through an optimization suite focused on NVIDIA GPU hardware, utilizing automatic mixed precision and specialized math modes to increase training speed and throughput. It provides enterprise deployment patt
DeepSpeed is a distributed deep learning optimization library and framework designed for the training and inference of massive AI models. It serves as a model parallelism orchestrator and a toolkit for scaling large language models across multiple GPUs and compute nodes. The project distinguishes itself through 3D parallelism orchestration, which combines data, pipeline, and tensor parallelism. It utilizes ZeRO-based memory partitioning to eliminate redundant storage and employs CPU-offload memory management to move weights and optimizer states to system RAM. Additionally, it provides special
ColossalAI is a distributed deep learning framework designed for training and deploying massive artificial intelligence models across clusters of hardware accelerators. It functions as a parallel computing engine that partitions model workloads and data across multiple processors to maximize memory efficiency and throughput. The platform distinguishes itself through a comprehensive suite of parallelization strategies, including multi-dimensional tensor parallelism and pipeline-based model parallelism, which segment neural network layers and stages across devices. To support large-scale genera
LiteRT is a runtime and API for executing machine learning and generative AI models on mobile, desktop, and IoT hardware. It consists of an inference engine and a specialized environment for running quantized large language and diffusion models locally on edge hardware. The system includes an ahead-of-time model compiler that translates models into hardware-specific bytecode to reduce startup latency and memory overhead. It provides a unified interface for Neural Processing Units with automatic fallback routing to CPUs or GPUs when specific subgraph support is unavailable. An edge model conve
Metaflow is a Python machine learning framework and MLOps workflow orchestrator designed to manage the lifecycle of data pipelines from local prototyping to production. It serves as a distributed compute manager and an experiment tracking system, enabling the creation of reproducible pipelines that transition between development and high-availability production environments. The framework distinguishes itself through an integrated checkpointing system that automatically persists intermediate data artifacts to remote storage, allowing failed runs to be resumed from the last successful step. It
Anomalib is a PyTorch-based library for visual anomaly detection, offering a modular framework, a comprehensive model zoo, and a benchmarking suite designed for industrial defect detection. It provides a wide range of algorithms—including generative, discriminative, teacher-student, and vision-language approaches—that support unsupervised, few-shot, and zero-shot settings. The library enables deployment through model export to ONNX and OpenVINO for edge devices, and includes a no-code web application for training and inference. It also features a command-line interface for orchestrating multi
MiniCPM-V is a multimodal large language model and vision-language system designed for complex visual and linguistic understanding. It functions as an on-device AI model: a compact neural network that processes text, images, and video. The project is specifically developed as an edge AI framework, utilizing quantization and weight sharding to run on memory-constrained mobile chipsets. This allows for the deployment of multimodal intelligence directly on mobile operating systems for local inference. Its capabilities cover multimodal content analysis of high-resolution im
mistral.rs is an inference engine for large language models that runs locally and exposes models behind OpenAI- and Anthropic-compatible APIs. It serves as a multi-model serving platform, capable of loading several models in a single server process with per-request routing and on-demand loading and unloading. The engine supports multimodal inference, processing text alongside images, video, audio, and speech inputs, and includes a quantized model deployment runtime that reduces memory use and speeds up inference on consumer hardware. The project distinguishes itself through an agentic tool exe
xtuner is a comprehensive training engine for large language models, offering a toolkit for pre-training, supervised fine-tuning, and the optimization of vision-language multimodal models. It serves as a distributed training accelerator and a specialized framework for scaling Mixture-of-Experts models and aligning model behavior through reinforcement learning from human feedback. The project distinguishes itself through advanced memory and compute optimizations, such as sequence parallelism for ultra-long context windows and interleaved pipeline parallelism to reduce GPU idle time. It provide
pysheeet is a technical reference library providing a curated collection of code snippets and implementation patterns for advanced Python development, system integration, and high-performance computing. It serves as a comprehensive guide for implementing low-level network programming, native C extensions, and asynchronous and concurrent programming. The project provides specialized frameworks for the development and deployment of large language models, including tools for distributed GPU inference and high-performance serving. It also includes detailed patterns for high-performance computing