15 repositories
The process of provisioning cloud infrastructure specifically to host AI models as reachable API endpoints.
Distinct from Cloud Deployment: a specialization of general cloud deployment for the specific purpose of hosting AI models for inference.
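To make "reachable API endpoint" concrete, here is a minimal sketch that uses only the Python standard library. It places a toy scoring function, a hypothetical stand-in for a trained model, behind a single `POST /predict` route. The route name, the JSON field names, and the weights are illustrative assumptions and are not taken from any repository listed here. Production serving adds batching, authentication, and autoscaling on top of this basic pattern.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [0.5, -0.25, 1.0]  # illustrative weights, not from any real model
    return sum(w * x for w, x in zip(weights, features))


class PredictHandler(BaseHTTPRequestHandler):
    """Serves POST /predict with a JSON body like {"features": [..]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            score = predict(payload["features"])
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, a missing key, or non-numeric features.
            self.send_error(400, "expected JSON body with a 'features' list")
            return
        body = json.dumps({"score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def serve(port=0):
    """Start the endpoint on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, posting `{"features": [2, 4, 1]}` to `/predict` returns `{"score": 1.0}`.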
Explore 15 awesome GitHub repositories matching devops & infrastructure · Model Endpoint Deployment.
This repository is a collection of Jupyter notebooks providing reference implementations and templates for building, training, and deploying machine learning models using Amazon SageMaker. It serves as an example library for implementing model architectures and automating the machine learning lifecycle. The library provides practical patterns for machine learning training, data engineering, and model deployment. It includes MLOps implementation guides with workflows for model monitoring, lineage tracking, and hyperparameter tuning. The examples cover a broad range of capabilities…
Hosts trained models as persistent REST endpoints for real-time requests or via large-scale batch transform jobs.
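The two serving modes differ mainly in lifecycle. A real-time endpoint stays up and answers one request per call. A batch transform job processes a whole dataset, writes the results, and exits. The sketch below mimics that difference locally using a hypothetical transaction-flagging model and JSON Lines input. It is not the SageMaker API, and all function names are illustrative.

```python
import json
from typing import Iterable, Iterator


def predict(record: dict) -> dict:
    """Hypothetical model: flags transactions above a fixed threshold."""
    return {"id": record["id"], "flagged": record["amount"] > 100.0}


def handle_realtime(request_body: str) -> str:
    """Real-time mode: one JSON request in, one JSON response out, per call."""
    return json.dumps(predict(json.loads(request_body)))


def batch_transform(lines: Iterable[str]) -> Iterator[str]:
    """Batch mode: stream JSON Lines input and emit one output line per record.

    No persistent endpoint is involved; the job reads the whole dataset,
    yields results, and finishes.
    """
    for line in lines:
        if line.strip():  # skip blank lines
            yield json.dumps(predict(json.loads(line)))
```

Both modes share the same `predict` function. Only the way inputs arrive and outputs leave changes.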
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series sequences…
Provides methods for hosting saved models as reachable API endpoints for external clients.
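TensorFlow Serving is one common way to host a TensorFlow SavedModel. Its REST API accepts `POST /v1/models/<name>:predict` with an `"instances"` list and listens on port 8501 by default. Whether this curriculum uses TensorFlow Serving is an assumption. The sketch below only builds the request and parses a response, and `my_model` is a placeholder name.

```python
import json
import urllib.request


def build_predict_request(host: str, model_name: str, instances: list,
                          port: int = 8501) -> urllib.request.Request:
    """Build (but do not send) a TensorFlow Serving REST predict request.

    The body has the row-oriented form {"instances": [...]}, with one entry
    per input example.
    """
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def parse_predictions(response_body: bytes) -> list:
    """A successful response carries one prediction per instance under "predictions"."""
    return json.loads(response_body)["predictions"]
```

Once a server is running, send the request with `urllib.request.urlopen(req)`.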
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation and employs…
Provisions cloud hosting environments to deploy AI models as accessible endpoints for web API interaction.
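As a Model Context Protocol client, an extension like this one communicates with tool servers over JSON-RPC 2.0. A tool invocation is a `tools/call` request that names the tool and passes its arguments, and the result carries a `content` list. The sketch below only builds and parses those messages. It includes no transport and no `initialize` handshake, and the `read_file` tool is hypothetical.

```python
import itertools
import json
from typing import Optional

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


def call_tool(name: str, arguments: dict) -> str:
    """MCP tools/call request: ask the server to run tool `name` with `arguments`."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


def text_from_result(response: str) -> str:
    """Join the text items of a tools/call result; raise on a JSON-RPC error."""
    msg = json.loads(response)
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    return "".join(item["text"] for item in msg["result"]["content"]
                   if item.get("type") == "text")
```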
Oumi is a comprehensive large language model development platform designed for synthesizing data, fine-tuning models, and running performance evaluations. It serves as a unified environment for the entire model lifecycle, encompassing a training and fine-tuning suite, an evaluation framework, and tools for synthetic data generation and model distillation. The platform is distinguished by its iterative, failure-driven synthesis approach, which analyzes model weaknesses during evaluation to generate targeted training data. It utilizes an LLM-based judge framework to programmatically score responses…
Provides mechanisms to export trained models and provision cloud infrastructure to host them as reachable API endpoints.
Flyte is a Kubernetes-based machine learning orchestrator and containerized pipeline manager designed for coordinating AI workflows and data pipelines. It functions as an engine for defining and executing resilient pipelines, utilizing a data lineage tracker to maintain immutable execution states and ensure reproducible outputs. The platform distinguishes itself by packaging individual tasks into separate containers to ensure dependency isolation and environment consistency. It provides specialized capabilities for machine learning, including the transformation of trained models into scalable…
Transforms trained machine learning models into scalable API endpoints for production serving.
nlp-recipes is a collection of implementation guides and reference templates for applying natural language processing techniques to real-world tasks. It provides standardized workflows and code examples for developing NLP pipelines, from dataset preparation and model training to performance evaluation. The project focuses on the practical application of transformer-based models, offering patterns for fine-tuning pretrained architectures for tasks such as text classification, named entity recognition, and question answering. It also includes a toolkit for model interpretability, allowing users…
Deploys trained NLP models as scalable web services via cloud-hosted API endpoints.
FARA is a visual computer-use agent model that controls a browser by predicting screen coordinates for clicking, typing, and scrolling, without relying on the DOM or accessibility trees. It is designed to automate multi-step web tasks such as searching, form filling, booking, and shopping by reasoning over visual state and decomposing tasks into sequential actions. The model uses a compact 7-billion-parameter decoder-only transformer that can run on consumer GPUs for low-latency on-device inference, or be deployed as a managed endpoint on Azure Foundry for cloud-based inference without local infrastructure…
Deploys a computer-use model via Azure Foundry endpoint without managing infrastructure or downloading weights.
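This project is a series of PyTorch deep learning courses that includes hands-on projects and programming exercises. It focuses on implementing neural network architectures and training models to solve complex data problems. The repository contains a suite of computer vision projects for building image classifiers, autoencoders, and style transfer applications. It also offers a generative adversarial network (GAN) lab for creating synthetic images, as well as transfer learning implementations for adapting pretrained weights to new tasks. The codebase covers sequence data analysis for natural language processing using recurrent neural networks and word embeddings. Other features include image data preprocessing, model performance evaluation, and deploying trained models to cloud infrastructure. The materials are provided as a series of Jupyter Notebooks.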
Provides instructions for hosting trained models as reachable API endpoints on cloud infrastructure.
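Text Embeddings Inference is a high-performance inference server designed to host text embedding and sequence classification models as scalable API endpoints. It provides a vector embedding API that converts text into dense representations, along with a cross-encoder reranking server that scores how relevant each document is to a query. The project features a GPU-accelerated inference engine that uses dynamic batching and specialized kernels to maximize throughput. It also offers a high-performance binary gRPC interface as an alternative to standard HTTP, which reduces network latency and serialization overhead. The system covers a wide range of capabilities, including document similarity ranking, multilingual text reranking, and sequence classification for predicting categories or sentiment. It supports deployment environments ranging from serverless autoscaling containers to air-gapped installations. Hardware acceleration is available for NVIDIA GPUs, AMD GPUs, and Apple Metal.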
Creates hosted environments with specific hardware accelerators and runtime configurations for model inference.
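Dynamic batching lets an inference server like this one keep the GPU busy by grouping requests that arrive close together and running them as a single forward pass. The sketch below illustrates the queueing idea with a background thread, a limit on request count, and a short wait deadline. `toy_embed` is a hypothetical stand-in for the embedding model. The count-based limit is also a simplification: TEI's README describes token-based dynamic batching.

```python
import queue
import threading
import time
from concurrent.futures import Future
from typing import Callable, List


def toy_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical batched model: a 2-d 'embedding' of (characters, words)."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


class DynamicBatcher:
    """Group concurrent requests into one batched model call.

    The worker blocks for the first request, then keeps collecting until
    either `max_batch_size` requests are gathered or `max_wait_s` seconds
    have passed, and finally runs the model once on the whole batch.
    """

    def __init__(self, batch_fn: Callable[[List[str]], List[list]],
                 max_batch_size: int = 8, max_wait_s: float = 0.01):
        self.batch_fn = batch_fn
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.batch_sizes: List[int] = []  # recorded for inspection only
        self._queue = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, text: str) -> Future:
        """Enqueue one request; the returned future resolves to its output."""
        fut = Future()
        self._queue.put((text, fut))
        return fut

    def _worker(self) -> None:
        while True:
            batch = [self._queue.get()]  # wait for at least one request
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            self.batch_sizes.append(len(batch))
            try:
                outputs = self.batch_fn([text for text, _ in batch])
            except Exception as exc:  # report model errors to every caller
                for _, fut in batch:
                    fut.set_exception(exc)
                continue
            for (_, fut), out in zip(batch, outputs):
                fut.set_result(out)
```

Raising `max_wait_s` produces fuller batches and higher throughput. The trade-off is extra latency for the first request in each batch.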
This project is an educational resource and engineering guide for building, deploying, and optimizing large language model applications and production pipelines. It serves as a blueprint for cloud AI infrastructure, providing a framework for orchestrating inference endpoints, data warehouses, and scalable production environments. The repository provides specific implementation patterns for retrieval-augmented generation to ground model responses in external data. It includes a training workflow for crawling, structuring, and processing datasets to facilitate model fine-tuning, alongside an evaluation…
Provides a blueprint for provisioning cloud infrastructure to host AI models as reachable API endpoints.
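Retrieval-augmented generation grounds a response by retrieving relevant passages and placing them in the prompt. The sketch below ranks passages by cosine similarity, using a bag-of-words vector as a hypothetical stand-in for a real embedding model. The prompt template is illustrative and is not taken from the repository.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Ground the model by placing the retrieved passages ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"
```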
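whisper-jax is a high-performance implementation of the Whisper automatic speech recognition model, rewritten with the JAX framework. It is designed for accelerated inference and uses XLA compilation to optimize model execution on hardware accelerators. The project focuses on TPU-optimized transcription for high throughput and speed. It includes a weight conversion pipeline that converts pretrained model parameters from PyTorch into JAX-compatible arrays. The system supports transcribing audio to text, translating speech across multiple languages, and generating audio timestamps. It supports batch audio processing and scales performance through data-parallel batching and model-parallel tensor partitioning. The project provides a way to deploy the transcription model as a remote inference endpoint with a web interface.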
Enables deployment of the transcription model as a remote inference endpoint with a web interface.
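Batch transcription with Whisper-style models begins by cutting audio into the model's fixed 30-second input window, sampled at 16 kHz. The chunks are then grouped into batches. Under data parallelism, each accelerator receives an equal slice of every batch. The pure-Python sketch below illustrates only these steps and does not use whisper-jax's API. Real long-form pipelines typically overlap neighboring chunks, which this sketch omits.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # Whisper models consume 16 kHz audio
CHUNK_SECONDS = 30    # and a fixed 30-second input window


def chunk_audio(samples: Sequence[float],
                chunk_len: int = SAMPLE_RATE * CHUNK_SECONDS) -> List[List[float]]:
    """Split audio into fixed-length chunks, zero-padding the final one."""
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks


def make_batches(chunks: List[List[float]], batch_size: int) -> List[List[List[float]]]:
    """Group chunks so each batch can run as one forward pass."""
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]


def shard(batch: list, num_devices: int) -> List[list]:
    """Data-parallel split of one batch into equal contiguous per-device slices."""
    if len(batch) % num_devices:
        raise ValueError("batch size must be divisible by the device count")
    per_device = len(batch) // num_devices
    return [batch[d * per_device:(d + 1) * per_device] for d in range(num_devices)]
```

The divisibility check reflects the common requirement that each device receive an identically shaped slice.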
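KoboldAI-Client is a web-based interface and toolkit for interacting with large language models. It acts as a local AI text generator for story writing and conversational AI, providing a front end for models hosted on local hardware or in cloud environments. The system includes a character manager that uses external modules and soft prompting to steer AI responses toward specific characters and writing styles. It also provides an API wrapper that exposes a standardized, OpenAI-compatible REST API, allowing external applications to communicate with hosted models. The platform supports writing, gaming, and chatbot interaction modes, and includes sandboxed scripting for automating data processing and filtering model inputs and outputs. Deployment options range from private local execution to containerized cloud GPU environments.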
Provides tools for provisioning cloud infrastructure to host AI models as reachable API endpoints.
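An OpenAI-compatible REST API lets external clients send the same JSON they would send to OpenAI's Completions endpoint and receive a response in the same shape. The sketch below builds that shape around a hypothetical canned `generate` function: `object: "text_completion"` and a `choices` list with `text` and `finish_reason`. Which fields KoboldAI-Client actually populates is an assumption, and the `usage` block is omitted.

```python
import json
import time
import uuid
from typing import Tuple


def generate(prompt: str, max_tokens: int) -> Tuple[str, str]:
    """Hypothetical model: returns a canned continuation and a finish reason.

    Words stand in for tokens here to keep the example small.
    """
    words = ["once", "upon", "a", "time"]
    finish = "length" if max_tokens < len(words) else "stop"
    return " " + " ".join(words[:max_tokens]), finish


def completions_handler(request_body: str) -> str:
    """Map an OpenAI-style completions request to an OpenAI-style response."""
    req = json.loads(request_body)
    text, finish = generate(req["prompt"], int(req.get("max_tokens", 16)))
    return json.dumps({
        "id": f"cmpl-{uuid.uuid4().hex}",
        "object": "text_completion",
        "created": int(time.time()),
        "model": req.get("model", "local-model"),  # placeholder default name
        "choices": [{"index": 0, "text": text, "logprobs": None,
                     "finish_reason": finish}],
    })
```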
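Riffusion-hobby is a generative AI tool that creates music by generating spectrogram images with Stable Diffusion and converting them into playable audio. It acts as a spectrogram audio synthesizer, using deep learning to turn image-based representations of sound frequencies into audio files. The project runs as an AI music inference server, providing web-based API endpoints that generate audio from text prompts and seed images. It also includes a command-line interface for running music generation tasks and configuring diffusion models for automated audio creation, as well as a real-time audio generator for manipulating sound representations. The system covers a wide range of capabilities, including cloud model deployment, remote inference hosting, and digital signal processing for image-to-audio conversion. It also provides an interactive web playground for experimenting with model parameters and exploring music generation settings.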
Provisions cloud infrastructure to host AI models as reachable API endpoints.
This project is an educational course and learning curriculum for implementing and fine-tuning transformer models using the Hugging Face ecosystem. It serves as a structured guide and technical walkthrough for processing multimodal data, adapting pre-trained neural networks, and deploying models. The material includes a guide for managing, versioning, and distributing model weights and datasets through a centralized asset hub. It also provides a practical tutorial on adapting models to specific datasets using parameter-efficient methods and an implementation guide for solving natural language…
Provides instructions on hosting models as reachable API endpoints on optimized infrastructure.
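Parameter-efficient methods pay off because they train far fewer weights. LoRA is one common example, though whether the course uses it is an assumption. LoRA adapts a frozen d × k matrix W as W' = W + (α/r)·BA, where B has shape d × r and A has shape r × k. As a result, only r(d + k) parameters are trained. The sizes in the usage block (d = k = 4096, r = 8) are illustrative.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when a d x k weight matrix is updated directly."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """LoRA keeps W frozen and trains B (d x r) and A (r x k) instead."""
    return r * (d + k)


if __name__ == "__main__":
    d = k = 4096  # illustrative layer size
    r = 8         # illustrative LoRA rank
    print(full_finetune_params(d, k))  # 16777216
    print(lora_params(d, k, r))        # 65536
    print(f"{lora_params(d, k, r) / full_finetune_params(d, k):.2%}")  # 0.39%
```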
SmolLM is a project dedicated to the development of small language models. It focuses on training and fine-tuning compact models that maintain high performance while utilizing fewer parameters. The project emphasizes efficient AI inference and on-device text generation, aiming to enable the deployment of lightweight models on edge devices with limited memory and processing power. It utilizes synthetic data generation to produce artificial datasets that improve the reasoning and training of these AI systems. The system supports a variety of optimization and training capabilities, including…
Deploys, pauses, and deletes model endpoints using managed or custom Docker images.
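Managing an endpoint comes down to a small lifecycle: deploy from an image, pause while keeping the configuration, resume, and delete. The in-memory sketch below models only these state transitions and rejects invalid ones. It is not the Hugging Face or SmolLM tooling, and the endpoint and image names in the test are hypothetical.

```python
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"


class EndpointManager:
    """In-memory sketch of an endpoint lifecycle: deploy, pause, resume, delete.

    Pausing is modeled as a state change that keeps the endpoint's
    configuration, so resuming does not require a redeploy.
    """

    def __init__(self) -> None:
        self._endpoints: Dict[str, dict] = {}

    def deploy(self, name: str, image: str) -> None:
        if name in self._endpoints:
            raise ValueError(f"endpoint {name!r} already exists")
        self._endpoints[name] = {"image": image, "state": State.RUNNING}

    def pause(self, name: str) -> None:
        self._require(name, State.RUNNING)["state"] = State.PAUSED

    def resume(self, name: str) -> None:
        self._require(name, State.PAUSED)["state"] = State.RUNNING

    def delete(self, name: str) -> None:
        self._require(name)  # an endpoint may be deleted from any state
        del self._endpoints[name]

    def state(self, name: str) -> State:
        return self._require(name)["state"]

    def _require(self, name: str, expected: Optional[State] = None) -> dict:
        """Look up an endpoint and optionally check its current state."""
        ep = self._endpoints.get(name)
        if ep is None:
            raise KeyError(f"no endpoint named {name!r}")
        if expected is not None and ep["state"] is not expected:
            raise RuntimeError(
                f"{name!r} is {ep['state'].value}, expected {expected.value}")
        return ep
```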