15 repositories
The process of provisioning cloud infrastructure specifically to host AI models as reachable API endpoints.
Distinct from Cloud Deployment: this category narrows general cloud deployment to the specific purpose of hosting AI models for inference.
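To make the definition concrete, here is a minimal sketch of a model hosted as a reachable API endpoint, using only Python's standard library. The `predict` function is a hypothetical stand-in for a trained model. The `/predict` route, the `{"features": [...]}` request shape, and `start_server` are illustrative assumptions, not the API of any repository listed here.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Hypothetical stand-in for a trained model: sums the input features.
    return sum(features)


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only one illustrative route is exposed.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging to keep the example output clean.
        pass


def start_server(port=0):
    # Port 0 asks the OS for any free port; the bound port is in server_address.
    server = HTTPServer(("127.0.0.1", port), InferenceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    import urllib.request

    server = start_server()
    url = f"http://127.0.0.1:{server.server_address[1]}/predict"
    request = urllib.request.Request(
        url,
        data=json.dumps({"features": [1, 2, 3]}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        print(json.loads(response.read()))  # {'prediction': 6}
    server.shutdown()
    server.server_close()
```

Production endpoints add authentication, autoscaling, and health checks on top of this request/response core.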
The 15 GitHub repositories below match DevOps & Infrastructure · Model Endpoint Deployment.
This repository is a collection of Jupyter notebooks providing reference implementations and templates for building, training, and deploying machine learning models using Amazon SageMaker. It serves as an example library for implementing model architectures and automating the machine learning lifecycle. The library provides practical patterns for machine learning training, data engineering, and model deployment. It includes MLOps implementation guides with workflows for model monitoring, lineage tracking, and hyperparameter tuning. The examples cover a broad range of capabilities…
Hosts trained models as persistent REST endpoints for real-time requests, or processes data through large-scale batch transform jobs.
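The two hosting modes above differ mainly in how requests arrive. A minimal sketch, assuming a hypothetical `toy_model` callable: a real-time endpoint scores one request at a time, while a batch transform job reads a whole dataset and scores it in fixed-size chunks. It illustrates the pattern only and is not the SageMaker API.

```python
from typing import Callable, Iterable, Iterator, List

Record = List[float]
Model = Callable[[List[Record]], List[float]]


def toy_model(batch: List[Record]) -> List[float]:
    # Hypothetical model: scores each record by the mean of its features.
    return [sum(r) / len(r) for r in batch]


def realtime_invoke(model: Model, record: Record) -> float:
    # Real-time endpoint: one request in, one prediction out.
    return model([record])[0]


def chunked(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    # Group an input stream into mini-batches of at most `size` records.
    batch: List[Record] = []
    for r in records:
        batch.append(r)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def batch_transform(model: Model, records: Iterable[Record], batch_size: int = 2) -> List[float]:
    # Batch job: score the whole dataset offline, chunk by chunk, preserving order.
    out: List[float] = []
    for batch in chunked(records, batch_size):
        out.extend(model(batch))
    return out
```

The real-time path optimizes per-request latency; the batch path optimizes total throughput and needs no always-on endpoint.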
This project is a structured learning curriculum and technical reference for mastering deep learning with TensorFlow. It provides a comprehensive guide for building, training, and deploying neural networks, combining theoretical fundamentals with practical implementation examples. The repository distinguishes itself by covering the end-to-end machine learning workflow, from low-level tensor mathematics and linear algebra to the creation of complex model architectures. It includes specific guidance on developing data pipelines for diverse data types, such as images, text, and time-series sequences…
Provides methods for hosting saved models as reachable API endpoints for external clients.
This project is an AI-powered IDE extension and LLM coding assistant that provides a conversational interface for generating, refactoring, and debugging code. It functions as an AI agent framework and a Model Context Protocol client, connecting AI models to external data sources and tools to automate complex development tasks. The system is distinguished by its use of autonomous AI agents capable of multi-step task execution, including the ability to read files, modify code, and run terminal commands iteratively. It supports recursive agent orchestration through subagent delegation…
Provisions cloud hosting environments to deploy AI models as accessible endpoints for web API interaction.
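Acting as a Model Context Protocol client, as described above, comes down to exchanging JSON-RPC 2.0 messages with tool servers. A minimal sketch that builds a `tools/call` request and reads text out of a result. The tool name `read_file` and its `path` argument are hypothetical. The result shape assumed here (a `content` list of typed items) should be checked against the current MCP specification revision.

```python
import itertools
import json
from typing import Any, Dict

# JSON-RPC request ids only need to be unique per connection; a counter suffices.
_next_id = itertools.count(1)


def tools_call_request(name: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


def result_text(response: str) -> str:
    """Join the text items of a tools/call result; raise on a JSON-RPC error."""
    msg = json.loads(response)
    if "error" in msg:
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    items = msg["result"].get("content", [])
    return "\n".join(item["text"] for item in items if item.get("type") == "text")
```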
Oumi is a comprehensive large language model development platform designed for synthesizing data, fine-tuning models, and running performance evaluations. It serves as a unified environment for the entire model lifecycle, encompassing a training and fine-tuning suite, an evaluation framework, and tools for synthetic data generation and model distillation. The platform is distinguished by its iterative, failure-driven synthesis approach, which analyzes model weaknesses during evaluation to generate targeted training data. It utilizes an LLM-based judge framework to programmatically score responses…
Provides mechanisms to export trained models and provision cloud infrastructure to host them as reachable API endpoints.
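The failure-driven synthesis loop described above can be pictured as three steps: score each evaluation response with a judge, keep the failures, and turn them into prompts for targeted data generation. A minimal sketch; `keyword_judge` is a hypothetical substring check standing in for an LLM-based judge, and neither it nor `synthesis_prompts` is Oumi's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalItem:
    question: str
    response: str
    reference: str


def keyword_judge(item: EvalItem) -> float:
    # Hypothetical judge: 1.0 if the reference answer appears in the response.
    return 1.0 if item.reference.lower() in item.response.lower() else 0.0


def synthesis_prompts(
    items: List[EvalItem],
    judge: Callable[[EvalItem], float],
    threshold: float = 0.5,
) -> List[str]:
    # Keep only the items the model failed, and request new training examples for each.
    failures = [it for it in items if judge(it) < threshold]
    return [
        f"Write a new training example similar to: {it.question!r} "
        f"(expected answer: {it.reference!r})"
        for it in failures
    ]
```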
Flyte is a Kubernetes-based machine learning orchestrator and containerized pipeline manager designed for coordinating AI workflows and data pipelines. It functions as an engine for defining and executing resilient pipelines, utilizing a data lineage tracker to maintain immutable execution states and ensure reproducible outputs. The platform distinguishes itself by packaging individual tasks into separate containers to ensure dependency isolation and environment consistency. It provides specialized capabilities for machine learning, including the transformation of trained models into scalable…
Transforms trained machine learning models into scalable API endpoints for production serving.
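One general way to get immutable execution states and reproducible outputs is to key each task's output on a hash of its name, version, and inputs, so an identical rerun reuses the recorded result instead of recomputing. The sketch below shows that content-addressed caching idea in plain Python. It illustrates the general technique, not Flyte's implementation; `cached_task`, `task_version`, and `train` are hypothetical names.

```python
import functools
import hashlib
import json
from typing import Any, Callable, Dict

# In-memory stand-in for a durable, append-only artifact store.
_ARTIFACTS: Dict[str, Any] = {}
# Instrumentation only: counts how often the task body actually executes.
RUN_COUNT = {"train": 0}


def cached_task(task_version: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Reuse a recorded output when task name, version, and inputs are identical."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(**inputs: Any) -> Any:
            # Inputs must be JSON-serializable keyword arguments so the key is deterministic.
            key_source = json.dumps(
                {"task": fn.__name__, "version": task_version, "inputs": inputs},
                sort_keys=True,
            )
            key = hashlib.sha256(key_source.encode("utf-8")).hexdigest()
            if key not in _ARTIFACTS:
                _ARTIFACTS[key] = fn(**inputs)  # written once, never overwritten
            return _ARTIFACTS[key]
        return wrapper
    return decorator


@cached_task(task_version="1")
def train(learning_rate: float, epochs: int) -> Dict[str, float]:
    # Hypothetical, deterministic "training" step.
    RUN_COUNT["train"] += 1
    return {"weight": learning_rate * epochs}
```

Bumping `task_version` changes every key, which forces recomputation after the task's code changes.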
nlp-recipes is a collection of implementation guides and reference templates for applying natural language processing techniques to real-world tasks. It provides standardized workflows and code examples for developing NLP pipelines, from dataset preparation and model training to performance evaluation. The project focuses on the practical application of transformer-based models, offering patterns for fine-tuning pretrained architectures for tasks such as text classification, named entity recognition, and question answering. It also includes a toolkit for model interpretability, allowing users…
Deploys trained NLP models as scalable web services via cloud-hosted API endpoints.
FARA is a visual computer-use agent model that controls a browser by predicting screen coordinates for clicking, typing, and scrolling, without relying on the DOM or accessibility trees. It is designed to automate multi-step web tasks such as searching, form filling, booking, and shopping by reasoning over visual state and decomposing tasks into sequential actions. The model uses a compact 7-billion-parameter decoder-only transformer that can run on consumer GPUs for low-latency on-device inference, or be deployed as a managed endpoint on Azure Foundry for cloud-based inference without local infrastructure…
Deploys a computer-use model via Azure Foundry endpoint without managing infrastructure or downloading weights.
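Whether a 7-billion-parameter model fits on a consumer GPU depends mostly on the precision of its weights. A back-of-the-envelope sketch: weight memory in bytes is parameters × bits per parameter ÷ 8. The estimate ignores activations, KV cache, and runtime overhead, so real usage is higher, and the precisions below are common choices rather than ones the FARA description specifies.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    # Bytes = params * bits / 8, reported in decimal gigabytes (1 GB = 1e9 bytes).
    return num_params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(7e9, bits):.1f} GB")
# 16-bit: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB
```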
This project is a collection of deep learning teaching material with PyTorch, consisting of hands-on projects and programming exercises. It focuses on implementing neural network architectures and training models to solve complex data problems. The repository includes a suite of computer vision projects for building image classifiers, autoencoders, and style transfer applications. It features a generative adversarial network (GAN) lab for creating synthetic images, plus specific implementations of transfer learning that adapt pretrained weights to new tasks. The codebase covers sequential data analysis for natural language processing using recurrent neural networks and word embeddings. Additional capabilities include image data preprocessing, model performance evaluation, and deploying trained models on cloud infrastructure. The materials are presented as a series of Jupyter Notebooks.
Provides instructions for hosting trained models as reachable API endpoints on cloud infrastructure.
Text Embeddings Inference is a high-performance inference server designed to host text embedding and sequence classification models as scalable API endpoints. It provides a vector embedding API for converting text into dense representations and a cross-encoder reranking server for scoring the relevance of document sequences against a query. The project features a GPU-accelerated inference engine that uses dynamic batching and specialized kernels to maximize throughput. It offers a high-performance binary interface via gRPC as an alternative to standard HTTP, reducing network latency and serialization overhead. The system covers a wide range of capabilities, including document similarity ranking, multilingual text reranking, and sequence classification for predicting categories or sentiment. It supports diverse deployment environments, from serverless autoscaling containers to isolated (air-gapped) installations. Hardware acceleration is available for NVIDIA GPUs, AMD GPUs, and Apple Metal.
Creates hosted environments with specific hardware accelerators and runtime configurations for model inference.
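Dynamic batching, mentioned above, groups requests that arrive close together so the accelerator processes them in one forward pass. A simplified, deterministic sketch: a batch closes when it reaches `max_batch_size`, or when a new request arrives after the oldest queued request's `max_wait` deadline has passed. The function and parameter names are illustrative; a production scheduler such as TEI's is more involved.

```python
from typing import List, Tuple

Request = Tuple[float, str]  # (arrival time in seconds, request id)


def dynamic_batches(requests: List[Request], max_batch_size: int, max_wait: float) -> List[List[str]]:
    """Group time-ordered requests into batches by size cap and wait deadline."""
    batches: List[List[str]] = []
    current: List[str] = []
    deadline = 0.0
    for arrival, req_id in sorted(requests):
        # The open batch would already have been dispatched at its deadline.
        if current and arrival > deadline:
            batches.append(current)
            current = []
        if not current:
            deadline = arrival + max_wait
        current.append(req_id)
        # A full batch is dispatched immediately.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches
```

For example, with `max_batch_size=2` and `max_wait=0.05`, requests `a`, `b`, `c`, `d` arriving at 0.00, 0.01, 0.02, and 0.50 s are grouped as `[["a", "b"], ["c"], ["d"]]`.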
This project is an educational resource and engineering guide for building, deploying, and optimizing large language model applications and production pipelines. It serves as a blueprint for cloud AI infrastructure, providing a framework for orchestrating inference endpoints, data warehouses, and scalable production environments. The repository provides specific implementation patterns for retrieval-augmented generation to ground model responses in external data. It includes a training workflow for crawling, structuring, and processing datasets to facilitate model fine-tuning…
Provides a blueprint for provisioning cloud infrastructure to host AI models as reachable API endpoints.
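Retrieval-augmented generation, as described above, grounds the prompt in retrieved text. A minimal sketch of the retrieval-and-prompt step. It assumes a toy word-overlap score in place of the embedding similarity a real pipeline would use, and `retrieve` and `build_prompt` are hypothetical helpers.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric words; a crude stand-in for a real tokenizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    # Rank documents by how many query words they share (toy relevance score).
    q = tokens(query)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    # Place the top-k retrieved passages ahead of the question.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"
```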
whisper-jax is a high-performance implementation of the Whisper automatic speech recognition model, rewritten using the JAX framework. It is designed for accelerated inference and uses XLA compilation to optimize model execution on hardware accelerators. The project focuses on TPU-optimized transcription for high throughput and speed. It includes a weight-conversion pipeline that turns pretrained PyTorch model parameters into JAX-compatible arrays. The system supports audio-to-text transcription, speech translation across multiple languages, and audio timestamp generation. It enables batched audio processing and scales throughput through data-parallel batching and model-parallel tensor partitioning. The project provides a way to deploy the transcription model as a remote inference endpoint with a web interface.
Enables deployment of the transcription model as a remote inference endpoint with a web interface.
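Batched transcription of long audio is commonly done by cutting the waveform into fixed-length, overlapping chunks that can be processed in parallel and then stitched back together. The sketch below computes only the chunk boundaries in samples. The 16 kHz rate and 30-second window match Whisper's input format; the 5-second overlap and the `chunk_bounds` name are illustrative assumptions, not whisper-jax's settings.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16_000,
    chunk_s: float = 30.0,
    overlap_s: float = 5.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping windows covering the audio."""
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    bounds = []
    start = 0
    while True:
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break
        start += step
    return bounds
```

Each window can then be placed in the same batch and transcribed in parallel, with the overlaps used to merge text at chunk edges.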
KoboldAI-Client is a web-based interface and toolkit for interacting with large language models. It functions as a local AI text generator for storytelling and conversational AI, providing a front end for models hosted either on local hardware or within cloud-provisioned environments. The system includes a persona manager that uses external modules and soft-prompting to guide AI responses toward specific characters and writing styles. It also provides an API wrapper that exposes a standardized, OpenAI-compatible REST API, allowing external applications to communicate with the hosted models.
Provides tools for provisioning cloud infrastructure to host AI models as reachable API endpoints.
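Because the API wrapper is described as OpenAI-compatible, a client can use the same request shape as OpenAI's Completions endpoint (`POST /v1/completions` with `model`, `prompt`, and `max_tokens` fields). A minimal standard-library sketch that builds, but does not send, such a request. The base URL, port, and model name are placeholders, and the routes KoboldAI-Client actually exposes should be checked against its own documentation.

```python
import json
import urllib.request


def completion_request(base_url: str, model: str, prompt: str, max_tokens: int = 80) -> urllib.request.Request:
    # OpenAI-style completions body: model, prompt, and a generation-length cap.
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder server address and model name.
req = completion_request("http://localhost:5000", "placeholder-model", "Once upon a time")
# Sending it would be urllib.request.urlopen(req) against a running server.
```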
Riffusion-hobby is a generative AI tool that creates music by producing spectrogram images via Stable Diffusion and converting them into playable audio. It functions as a spectrogram audio synthesizer, utilizing deep learning to transform image-based frequency representations of sound into audio files. The project operates as an AI music inference server, providing a web-based API endpoint to generate audio from text prompts and seed images. It also includes a command-line interface for executing music generation tasks and configuring diffusion models for automated audio creation…
Provisions cloud infrastructure to host AI models as reachable API endpoints.
This project is an educational course and learning curriculum for implementing and fine-tuning transformer models using the Hugging Face ecosystem. It serves as a structured guide and technical walkthrough for processing multimodal data, adapting pre-trained neural networks, and deploying models. The material includes a guide for managing, versioning, and distributing model weights and datasets through a centralized asset hub. It also provides a practical tutorial on adapting models to specific datasets using parameter-efficient methods and an implementation guide for solving natural language…
Provides instructions on hosting models as reachable API endpoints on optimized infrastructure.
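Parameter-efficient fine-tuning, mentioned above, trains far fewer weights than full fine-tuning. Take LoRA as one common example of such a method; the course description here does not name a specific one. LoRA leaves a d_out × d_in weight matrix frozen and learns a low-rank update B·A, where B is d_out × r and A is r × d_in. Trainable parameters therefore drop from d_out·d_in to r·(d_in + d_out).

```python
def full_params(d_in: int, d_out: int) -> int:
    # Every entry of the weight matrix is trainable.
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A is rank x d_in, B is d_out x rank; only these two are trained.
    return rank * (d_in + d_out)


d = 4096
print(full_params(d, d), lora_params(d, d, rank=8))  # 16777216 65536
```

At rank 8 on a 4096 × 4096 layer, that is 256 times fewer trainable parameters.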
SmolLM is a project dedicated to the development of small language models. It focuses on training and fine-tuning compact models that maintain high performance while utilizing fewer parameters. The project emphasizes efficient AI inference and on-device text generation, aiming to enable the deployment of lightweight models on edge devices with limited memory and processing power. It utilizes synthetic data generation to produce artificial datasets that improve the reasoning and training of these AI systems. The system supports a variety of optimization and training capabilities, including…
Deploys, pauses, and deletes model endpoints using managed or custom Docker images.
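The deploy, pause, and delete lifecycle above can be modeled as a small state machine, which makes it clear which operations are valid in each state. A minimal sketch; the state names, allowed transitions, and the `Endpoint` class are assumptions for illustration, not the exact states of any hosting provider.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    DELETED = "deleted"


# Allowed transitions: action -> {from_state: to_state}
TRANSITIONS = {
    "pause": {State.RUNNING: State.PAUSED},
    "resume": {State.PAUSED: State.RUNNING},
    "delete": {State.RUNNING: State.DELETED, State.PAUSED: State.DELETED},
}


class Endpoint:
    def __init__(self, name: str, image: str):
        # Deploying creates a running endpoint from a managed or custom container image.
        self.name = name
        self.image = image
        self.state = State.RUNNING

    def apply(self, action: str) -> State:
        # Reject actions that are not valid from the current state.
        allowed = TRANSITIONS.get(action, {})
        if self.state not in allowed:
            raise ValueError(f"cannot {action} an endpoint that is {self.state.value}")
        self.state = allowed[self.state]
        return self.state
```

Pausing keeps the configuration but stops compute billing on most platforms, while deletion is terminal, so it has no outgoing transitions.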