32 repos
Platforms and techniques for deploying, optimizing, and serving machine learning models for production use.
Explore 32 awesome GitHub repositories matching artificial intelligence & ml · Model Inference and Serving. Refine with filters or upvote what's useful.
TensorFlow is a comprehensive machine learning framework designed for the construction, training, and deployment of complex mathematical models. It utilizes a graph-based execution model that represents operations as directed acyclic graphs, enabling automatic differentiation and efficient parallel processing. The syst
Optimizes execution performance by setting specific model weights to zero through target-aware authoring and specialized kernels.
Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering
Extends standard library functionality with specialized loaders for device mapping, quantization, and custom attention backends.
Immich is a self-hosted media management platform designed to provide a centralized, private repository for photos and videos. It functions as a comprehensive system for organizing, backing up, and viewing personal media collections across mobile devices, web browsers, and external storage locations. By maintaining ful
Processes machine learning tasks using externalized models and thread pools to optimize performance for image and text analysis.
OpenCV is a comprehensive computer vision library designed for real-time performance and cross-platform deployment. It provides a native execution environment that leverages multi-threaded operations and automated memory management to handle intensive computational tasks, including image processing and machine learning
Executes pre-trained neural networks to perform classification, detection, and segmentation tasks on visual data.
This project serves as a centralized directory and interoperability hub for the Model Context Protocol, providing a curated collection of standardized service connectors that bridge artificial intelligence models with external software, databases, and APIs. It facilitates the integration of AI agents with diverse ecosy
Delivers standardized interfaces for agents to control desktop environments, manage windows, and simulate user input.
Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to ensure complete data privacy without external network dependencies. By utilizing a hi
Executes deep learning models directly on hardware-specific providers to minimize latency.
The Model Context Protocol is a standardized communication framework designed to connect language models to external data sources, functional tools, and interactive user interfaces. It provides a vendor-neutral interface layer that enables AI hosts to discover and execute capabilities across heterogeneous service envir
Creates a unified interface layer that enables seamless interaction between diverse AI clients and backend service providers.
Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows
Manages settings and parameters for integrating specific generative AI models into browser-based automation workflows.
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a compreh
Executes quantized language models using optimized C++ tensor computation libraries for local CPU and GPU hardware.
Zed is an AI-native, high-performance code editor designed for extreme responsiveness and keyboard-centric workflows. It functions as an extensible text processing workspace that integrates autonomous agents and predictive models directly into the development environment to automate complex engineering tasks, refactori
Bridge local development sessions with remote cloud-based intelligence providers to access sophisticated code completion and analysis capabilities.
The algorithm is a distributed recommendation engine pipeline designed to construct and serve personalized content timelines. It functions as a multi-stage orchestration layer that aggregates candidate content from diverse social graphs and high-dimensional embedding spaces, processing user interaction data to deliver
Deploys predictive models to score content relevance and user engagement probabilities in real-time.
Stable Diffusion is a generative machine learning pipeline that synthesizes high-resolution visual content by performing iterative denoising within a compressed latent space. By mapping natural language embeddings into pixel outputs through conditioned probabilistic processes, the framework enables the generation of im
Coordinates model loading, hardware acceleration, and output processing to streamline production-ready inference.
This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify disco
Facilitates collaborative model training and analytics across decentralized data sources using unified architectural frameworks.
PaddleOCR is a comprehensive optical character recognition framework designed for detecting and transcribing text from images and documents into structured, machine-readable formats. It provides a modular computer vision pipeline that decouples image preprocessing, text detection, and character recognition into indepen
Abstracts execution logic to allow seamless model operation across diverse CPU, GPU, and mobile hardware backends.
vLLM is a high-throughput inference engine designed for the efficient serving and execution of large language models. It functions as a production-ready distributed model server, providing standard API protocols for online serving while also supporting offline batch processing. The system is built to maximize token gen
Maximizes token generation speed and memory efficiency when serving large language models to multiple concurrent users.
This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task
Reviews high-performance infrastructure solutions designed to minimize latency and maximize throughput for model inference.
This project provides a self-hosted, web-based interface designed to integrate large language models into academic and research workflows. It functions as a modular platform for document analysis, literature processing, and data handling, allowing users to maintain full control over their data and model connectivity th
Manage and serve large language models on local hardware through a unified, web-based interface.
OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system
Routes requests to different language models based on performance, cost, or capability requirements to optimize agent execution.
LlamaFactory is a unified framework for fine-tuning and adapting large language models. It provides a comprehensive platform that standardizes training workflows across diverse machine learning architectures, allowing users to execute both full-tuning and parameter-efficient methods through a single interface. The pro
Exposes trained models via standardized network protocols to facilitate scalable and reliable prediction services.
This project provides a unified interface for interacting with a wide range of artificial intelligence services, acting as a central orchestration layer for text and image generation. It standardizes access to diverse AI backends, allowing developers to integrate multiple language and vision models through a single, co
Routes incoming requests through a unified interface that dynamically selects and cycles between multiple third-party service providers.