# ML Model REST API Servers

> Search results for `serve any ML model behind a REST API` on awesome-repositories.com. 120 total matches; showing the first 50.

Explore on the web: https://awesome-repositories.com/q/serve-any-ml-model-behind-a-rest-api

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [this search on awesome-repositories.com](https://awesome-repositories.com/q/serve-any-ml-model-behind-a-rest-api).**

## Results

- [inspirehep/rest-api-doc](https://awesome-repositories.com/repository/inspirehep-rest-api-doc.md) (57 ⭐) — Documentation of the INSPIRE REST API
- [bentoml/openllm](https://awesome-repositories.com/repository/bentoml-openllm.md) (12,115 ⭐) — OpenLLM is a framework for deploying, managing, and scaling open-source large language models
- [danielmiessler/fabric](https://awesome-repositories.com/repository/danielmiessler-fabric.md) (42,408 ⭐) — Fabric is a command-line orchestrator designed to automate complex data processing and content generation tasks by chaining artificial intelligence models with modular prompt templates. It functions as a terminal-based tool that utilizes standard input and output streams, allowing users to pipe data directly into predefined reasoning strategies. By providing a model-agnostic abstraction layer, the system decouples execution logic from specific artificial intelligence vendors, normalizing requests and responses across different service providers.

The platform distinguishes itself through its p…
- [hiyouga/llama-factory](https://awesome-repositories.com/repository/hiyouga-llama-factory.md) (72,241 ⭐) — LLaMA-Factory is a comprehensive suite for dataset preparation, model fine-tuning, memory optimization, and standardized API deployment. It provides a unified platform for the supervised and reward-based fine-tuning of large language models and vision-language models.

The framework includes a specialized toolkit for training vision-language models and a model serving interface that deploys trained models through high-performance APIs. It utilizes precision tuning and quantization techniques to reduce the hardware requirements and memory footprint of large models.

The system covers data pipel…
- [vmware/burp-rest-api](https://awesome-repositories.com/repository/vmware-burp-rest-api.md) (566 ⭐) — REST/JSON API to the Burp Suite security tool.
- [gokumohandas/made-with-ml](https://awesome-repositories.com/repository/gokumohandas-made-with-ml.md) (48,343 ⭐) — Made-With-ML is an open-source course and companion codebase that teaches how to design, develop, deploy, and iterate on production-grade machine learning applications.
- [openbmb/minicpm](https://awesome-repositories.com/repository/openbmb-minicpm.md) (9,464 ⭐) — MiniCPM is a collection of small language models designed for local, on-device deployment in resource-constrained environments. The project focuses on running dense Transformer models on consumer hardware, including GPUs, CPUs, and Apple Silicon, without requiring custom code forks.

The project distinguishes itself through heavy optimization for edge hardware, utilizing quantized weight compression in GGUF and MLX formats to reduce memory overhead. It implements advanced inference techniques such as speculative sampling and radix-tree prefix caching to accelerate generation speed and throughp…
- [fastapi/fastapi](https://awesome-repositories.com/repository/fastapi-fastapi.md) (99,260 ⭐) — FastAPI is a web framework for building APIs with Python. It leverages standard language type hints to provide automatic data validation, request parsing, and interactive API documentation generation. The framework supports asynchronous request handling and manages execution contexts to prevent blocking the main event loop.

The project includes a dependency injection system that allows for the resolution and injection of reusable components into request handlers. This system supports request-scoped caching, lifecycle management, and integration with security mechanisms like OAuth2 and JSON We…
- [optiv/rest-api-goat](https://awesome-repositories.com/repository/optiv-rest-api-goat.md) (88 ⭐) — This is a "Goat" project so you can get familiar with REST API testing. There is an included Postman project so you can see how everything is meant to be called. If you encounter any components of the API which don't work correctly, please create an Issue for them.
- [formbricks/formbricks](https://awesome-repositories.com/repository/formbricks-formbricks.md) (12,391 ⭐) — Formbricks is an open-source survey and feedback platform designed to help teams capture and analyze user insights through targeted, in-app, and website-based interactions. It functions as a comprehensive customer experience analytics system that allows organizations to maintain full control over their data, user attributes, and survey workflows.

The platform distinguishes itself through its event-driven architecture, which enables precise behavioral targeting by triggering surveys based on specific user actions or application events. It supports deep integration with external ecosystems by a…
- [phalcon/rest-api](https://awesome-repositories.com/repository/phalcon-rest-api.md) (93 ⭐) — Sample API using Phalcon
- [opennmt/ctranslate2](https://awesome-repositories.com/repository/opennmt-ctranslate2.md) (4,319 ⭐) — CTranslate2 is a C++ inference engine and runtime for Transformer models, designed to execute models on both CPU and GPU with optimizations for speed and memory efficiency. It functions as a model format converter, quantization tool, and REST API server, enabling deployment of neural machine translation, automatic speech recognition, and text generation models.

The engine distinguishes itself through a suite of runtime optimizations including layer fusion, weight-matrix quantization, batch-by-length grouping, and a caching allocator that reuses GPU memory. It supports tensor-parallel model di…
- [dusty-nv/jetson-containers](https://awesome-repositories.com/repository/dusty-nv-jetson-containers.md) (4,386 ⭐) — Jetson Containers is a container management system that builds and runs GPU-accelerated Docker images for machine learning workloads on ARM64 edge hardware. It functions as a CUDA container orchestrator, automatically detecting the host's CUDA toolkit version and GPU capabilities to ensure container compatibility at runtime, while selecting the correct container image by matching the host's JetPack or L4T version at launch time.

The project delivers pre-configured containers for executing quantized large language models and retrieval-augmented generation pipelines optimized for edge devices, …
- [elastic/elasticsearch](https://awesome-repositories.com/repository/elastic-elasticsearch.md) (77,012 ⭐) — Elasticsearch is a distributed search engine and document store designed for the high-performance indexing and retrieval of massive volumes of unstructured data. It functions as a centralized analytics platform, providing a schema-flexible architecture that organizes information into searchable indices while maintaining global cluster state through a distributed consensus mechanism.

The platform distinguishes itself through its integrated approach to observability, security, and advanced analytics. It combines full-text, vector, and hybrid search capabilities with machine learning-driven insi…
- [qiangxue/go-rest-api](https://awesome-repositories.com/repository/qiangxue-go-rest-api.md) (1,693 ⭐) — An idiomatic Go REST API starter kit (boilerplate) following the SOLID principles and Clean Architecture
- [crossref/rest-api-doc](https://awesome-repositories.com/repository/crossref-rest-api-doc.md) (796 ⭐) — Documentation for Crossref's REST API. For questions or suggestions, see https://community.crossref.org/
- [capsoftware/cap](https://awesome-repositories.com/repository/capsoftware-cap.md) (17,026 ⭐) — Cap is a self-hosted screen recording and video collaboration platform designed for teams to replace synchronous meetings with asynchronous video updates. It provides a comprehensive suite for capturing high-resolution desktop activity, including system audio, microphone input, and camera overlays, which are then processed through an integrated post-production workflow.

The platform distinguishes itself by offering full data sovereignty through containerized deployment and object storage abstractions, allowing users to host their media assets on private infrastructure or S3-compatible buckets …
- [sjtu-ipads/powerinfer](https://awesome-repositories.com/repository/sjtu-ipads-powerinfer.md) (9,568 ⭐) — PowerInfer is an inference engine and serving framework designed to run large language models on local hardware. It combines a hybrid CPU-GPU offloader, a quantization tool, and a sparse model optimizer to enable the execution of high-parameter models on consumer-grade devices.

The system distinguishes itself through neuron-activation-based offloading, using a predictor model to preload frequent neurons into VRAM while keeping rare neurons in system memory. This hybrid execution model balances workloads between the GPU and CPU based on input patterns to optimize memory access and increase tok…
- [modular/modular](https://awesome-repositories.com/repository/modular-modular.md) (26,357 ⭐) — Modular is a unified machine learning development platform designed for building, compiling, and deploying high-performance neural network models. It provides a comprehensive execution engine that supports both local and production-grade inference, enabling developers to manage the entire model lifecycle from initial architecture definition to scalable, containerized service deployment.

The platform distinguishes itself through a hardware-agnostic runtime that abstracts diverse silicon architectures, allowing models to execute efficiently across varied compute environments. It includes a spec…
- [flarum/framework](https://awesome-repositories.com/repository/flarum-framework.md) (6,727 ⭐) — This project is a self-hosted forum software and extensible community platform designed to facilitate online discussions and member engagement. It functions as a REST API discussion engine, providing a backend that manages community interactions and forum data via a standardized JSON interface for external applications.

The platform is distinguished by a modular architecture that allows for deep customization through a package-based extension system and an interface extension framework. It employs an extender-based customization model, enabling external modules to modify internal system behav…
- [invictify/jupter-notebook-rest-api](https://awesome-repositories.com/repository/invictify-jupter-notebook-rest-api.md) (166 ⭐) — Run your Jupyter notebooks as a REST API endpoint. This is not a Jupyter server, just a way to run your notebooks as a REST API endpoint.
- [openmoss/moss](https://awesome-repositories.com/repository/openmoss-moss.md) (12,140 ⭐) — MOSS is a conversational AI API server and framework designed to manage stateful multi-turn dialogues via session identifiers for remote interaction. It functions as a tool-augmented language model framework and a quantized inference engine.

The project integrates external plugins, such as search engines and calculators, to provide factual and computed data within model responses. It also includes a supervised fine-tuning toolkit for adapting base language models to specific conversational datasets and behavioral instructions.

The system supports inference optimization through 4-bit and 8-bi…
- [flarum/core](https://awesome-repositories.com/repository/flarum-core.md) (6,729 ⭐) — This project is a self-hosted community engine and forum software designed for hosting threaded discussions. It functions as a JSON API community platform, exposing all data and functionality through a standardized interface to support a single-page application architecture. The system is built to be a multi-language discussion board with integrated localization and language pack support.

The platform is defined by a modular architecture that allows for extensive customization through an extension-based plugin system. This extensibility enables the modification of core behavior, the addition …
- [public-transport/hafas-rest-api](https://awesome-repositories.com/repository/public-transport-hafas-rest-api.md) (25 ⭐) — Expose a hafas-client@6 instance as an HTTP REST API.
- [andrewsuzuki/elm-todo-rest-api](https://awesome-repositories.com/repository/andrewsuzuki-elm-todo-rest-api.md) (105 ⭐) — Modular, heavily documented Elm todo app with a JSON REST API
- [oobabooga/text-generation-webui](https://awesome-repositories.com/repository/oobabooga-text-generation-webui.md) (47,323 ⭐) — This project is a comprehensive platform for hosting and interacting with large language models directly on local hardware. It provides a web-based graphical interface that allows users to manage model loading, configure generation parameters, and execute text or chat interactions entirely offline. By running models locally, the software ensures complete data privacy and eliminates reliance on external cloud services for generative tasks.

Beyond basic inference, the platform functions as a versatile workbench for generative AI development. It includes an integrated pipeline for fine-tuning mo…
- [clearml/clearml](https://awesome-repositories.com/repository/clearml-clearml.md) (6,740 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the end-to-end machine learning lifecycle, from initial experimentation to production deployment. It provides a suite of integrated tools including a pipeline orchestrator for automating workflows, an experiment tracking tool for logging hyperparameters and metrics, and a metadata-driven data versioning system for managing large-scale datasets and model artifacts.

The platform is distinguished by its advanced compute management and serving capabilities. It features a GPU compute manager that supports fractional resource slicing and …
- [allegroai/clearml](https://awesome-repositories.com/repository/allegroai-clearml.md) (6,733 ⭐) — ClearML is a comprehensive MLOps platform designed to manage the entire machine learning lifecycle. It functions as an experiment tracking tool, a data versioning system, and a pipeline orchestrator, while providing infrastructure for GPU cluster management and model serving.

The platform is distinguished by its ability to handle hybrid-cloud compute scheduling and fractional GPU allocation, allowing multiple workloads to share a single hardware accelerator. It employs a metadata-based approach to data versioning, using virtual views to track large datasets and artifacts without duplicating r…
- [esri-es/arcgis-rest-api](https://awesome-repositories.com/repository/esri-es-arcgis-rest-api.md) (75 ⭐) — Postman collection for ArcGIS REST API
- [microsoft/ai-edu](https://awesome-repositories.com/repository/microsoft-ai-edu.md) (14,065 ⭐) — ai-edu is a comprehensive AI education curriculum and machine learning courseware collection. It provides theoretical tutorials, deep learning lab exercises, and project blueprints designed to teach artificial intelligence fundamentals through a combination of study and practical implementation.

The project focuses on a learning-by-doing approach, guiding users from Python programming and neural network basics to advanced topics. It includes specialized instructional content on distributed AI training, MLOps educational guides for model quantization and pruning, and detailed frameworks for im…
- [nvidia/triton-inference-server](https://awesome-repositories.com/repository/nvidia-triton-inference-server.md) (10,756 ⭐) — Triton Inference Server is a high-performance AI model inference server and multi-framework model runtime designed for deploying machine learning models across cloud, data center, and embedded edge infrastructure. It serves as an execution engine that allows for the concurrent running of models from various frameworks to optimize hardware utilization.

The project features a dynamic batching inference engine that groups individual requests into larger batches to increase total processing throughput. It also provides a model ensemble pipeline, which enables the chaining of multiple models toget…
- [hoppscotch/hoppscotch](https://awesome-repositories.com/repository/hoppscotch-hoppscotch.md) (79,618 ⭐) — Hoppscotch is an open-source API development ecosystem designed for building, testing, and debugging REST, GraphQL, and real-time APIs. It provides a unified platform that functions across web browsers, desktop applications, and command-line interfaces, allowing developers to manage the entire API lifecycle from a single environment.

The platform distinguishes itself through a highly interactive, command-driven interface that utilizes a global spotlight palette and keyboard shortcuts to streamline complex workflows. It supports advanced request manipulation and validation by executing JavaScr…
- [hasib32/rest-api-with-lumen](https://awesome-repositories.com/repository/hasib32-rest-api-with-lumen.md) (485 ⭐) — REST API boilerplate for the Lumen micro-framework.
- [openvla/openvla](https://awesome-repositories.com/repository/openvla-openvla.md) (5,305 ⭐) — OpenVLA is a vision-language-action model and framework designed for general-purpose robotic manipulation. It provides a robotic policy training framework and a control inference engine that map visual and textual inputs to robotic control actions, enabling zero-shot instruction following on hardware.

The project includes a robotics dataset pipeline for standardizing diverse trajectory data and managing dataset mixtures. It supports large-scale model training through distributed GPU compute and sharded data parallelism, alongside parameter-efficient adaptation for fine-tuning models to new ta…
- [tensorflow/serving](https://awesome-repositories.com/repository/tensorflow-serving.md) (6,351 ⭐) — TensorFlow Serving is a high-performance machine learning inference server designed to deploy TensorFlow models to production environments. It functions as a complete serving system that executes predictions on input data through a graph executor, providing network endpoints that eliminate the need for a separate runtime environment for client applications.

The system is distinguished by its model version manager, which organizes and selects specific model versions within a directory hierarchy. It uses a filesystem watcher to detect new model versions and trigger automatic updates without int…
- [apache/gravitino](https://awesome-repositories.com/repository/apache-gravitino.md) (2,866 ⭐) — Gravitino is a federated metadata lake and unified data catalog designed to manage tables, files, and AI models across diverse data sources and cloud storage. It serves as a centralized interface for governing schemas, access controls, and tagging across relational databases, messaging queues, and object stores.

The project distinguishes itself by unifying the management of AI assets, such as machine learning models and their version lineages, alongside traditional tabular data. It also implements the Iceberg REST specification to provide a standardized metadata server and proxy for lakehouse …
- [rebeyond/behinder](https://awesome-repositories.com/repository/rebeyond-behinder.md) (6,133 ⭐)
- [huggingface/transformers](https://awesome-repositories.com/repository/huggingface-transformers.md) (161,630 ⭐) — Transformers is a comprehensive library for machine learning that provides a unified interface for training, fine-tuning, and deploying transformer-based models. It supports a wide range of tasks, including text classification, language modeling, question answering, and sequence-to-sequence translation, while offering specialized architectures for both text and vision processing. The framework includes tools for managing the entire model lifecycle, from data preprocessing and tokenization to distributed training and inference.

The library features extensive support for model optimization and …
- [breezedeus/pix2text](https://awesome-repositories.com/repository/breezedeus-pix2text.md) (3,012 ⭐) — Pix2Text is an optical character recognition system and document conversion tool designed to transform images and PDFs into Markdown. It functions as a multilingual OCR engine supporting over 80 languages, a LaTeX formula recognizer for mathematical notations, and a parser integrated with vision language models.

The project utilizes a hybrid pipeline to separate plain text from mathematical formulas and tabular structures within a single pass. It converts recognized formulas into LaTeX expressions and transforms detected tables and layouts into structured Markdown formatting.

- [pytorch/serve](https://awesome-repositories.com/repository/pytorch-serve.md) (4,354 ⭐) — Serve, optimize and scale PyTorch models in production
- [pycaret/pycaret](https://awesome-repositories.com/repository/pycaret-pycaret.md) (9,811 ⭐) — PyCaret is a Python AutoML platform and MLOps lifecycle manager designed to automate machine learning workflows. It functions as a low-code environment that leverages a scikit-learn native engine to execute preprocessing, training, and evaluation for tabular data.

The platform distinguishes itself as an LLM-powered ML copilot, using large language model agents to analyze datasets, design experiment configurations, and explain model results. It also serves as a Kubernetes ML orchestrator and model registry, enabling the versioning of trained pipelines and their promotion to production API endp…
- [encoredev/encore](https://awesome-repositories.com/repository/encoredev-encore.md) (12,049 ⭐) — Encore is a distributed systems framework designed to unify backend development, infrastructure provisioning, and observability. It functions as an infrastructure-as-code platform that allows developers to define cloud resources, databases, and messaging topics directly within their application code. By analyzing these declarations at compile-time, the system automatically manages the deployment of cloud resources and security policies, ensuring parity between local development and production environments.

The platform distinguishes itself through its integrated development experience, which …
- [qwenlm/qwen2.5](https://awesome-repositories.com/repository/qwenlm-qwen2-5.md) (27,307 ⭐) — Qwen2.5 is a suite of foundation large language models designed for natural language generation, code production, and complex mathematical reasoning. The project encompasses a multilingual language model capable of processing dozens of languages and a specialized code generation model for technical problem solving and debugging.

The framework is distinguished by its long context capabilities, enabling the analysis of massive inputs ranging from 256K up to 1 million tokens. It further functions as an agentic framework, utilizing standardized templates and parsers to execute autonomous wo…
- [avaloniaui/avalonia](https://awesome-repositories.com/repository/avaloniaui-avalonia.md) (30,986 ⭐) — Avalonia is a cross-platform desktop framework that enables the creation of native-feeling applications for Windows, macOS, and Linux from a single codebase. It functions as a declarative UI toolkit, allowing developers to define complex visual hierarchies and interface structures using a markup-based syntax that maps directly to underlying object properties. By utilizing the Model-View-ViewModel architectural pattern, the framework facilitates a clean separation between application logic and user interface layout, which simplifies unit testing and component maintenance.

- [jrosebr1/simple-keras-rest-api](https://awesome-repositories.com/repository/jrosebr1-simple-keras-rest-api.md) (372 ⭐) — This repository contains the code for Building a simple Keras + deep learning REST API, published on the Keras.io blog.
- [encode/django-rest-framework](https://awesome-repositories.com/repository/encode-django-rest-framework.md) (30,083 ⭐) — Django REST Framework is a toolkit for building standards-compliant web services that map complex data models to structured HTTP responses. It provides a modular architecture for handling the request lifecycle, including authentication, permission checks, and content negotiation. The framework is designed to facilitate the development of robust APIs by transforming complex data types into native formats and validating incoming request payloads against defined schemas.

The project distinguishes itself through a highly modular, class-based design that allows developers to build complex views an…
- [kaede-no-ki/otakudesu-rest-api](https://awesome-repositories.com/repository/kaede-no-ki-otakudesu-rest-api.md) (103 ⭐) — Unofficial REST API for https://otakudesu.tv/
- [zai-org/chatglm3](https://awesome-repositories.com/repository/zai-org-chatglm3.md) (13,764 ⭐) — ChatGLM3 is a comprehensive framework for deploying, fine-tuning, and serving large language models. It functions as a high-performance inference engine designed to support conversational AI, enabling developers to build interactive agents capable of multi-turn dialogue, autonomous code execution, and structured tool invocation.

The project distinguishes itself through its focus on hardware-agnostic deployment and resource optimization. It supports distributed model parallelism across multiple graphics cards, paged key-value caching for concurrent request processing, and weight quantization t…
- [containers/ramalama](https://awesome-repositories.com/repository/containers-ramalama.md) (2,605 ⭐) — Ramalama is a containerized runtime and management tool for large language models. It functions as an OCI AI model manager and registry client, allowing users to package, distribute, and execute AI models as standardized container images.

The project differentiates itself by using OCI-compliant distribution for models and retrieval augmented generation assets, enabling the packaging of vector databases into immutable container images. It features hardware-aware image selection that automatically detects GPU or CPU capabilities to pull the most optimized image for the host environment.

- [eugeneyan/applied-ml](https://awesome-repositories.com/repository/eugeneyan-applied-ml.md) (29,783 ⭐) — This project is a comprehensive, curated knowledge base designed to support the development and maintenance of production-grade machine learning systems. It serves as a centralized repository of industry-standard technical literature, engineering case studies, and research papers, providing a structured reference for practitioners navigating the complexities of modern data science and machine learning engineering.

The resource distinguishes itself through a cross-domain approach that bridges the gap between academic research and practical implementation. By synthesizing proven industry archit…
