# bentoml/openllm

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/bentoml-openllm).**

12,115 stars · 798 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/bentoml/OpenLLM
- Homepage: https://bentoml.com
- awesome-repositories: https://awesome-repositories.com/repository/bentoml-openllm.md

## Topics

`bentoml` `fine-tuning` `llama` `llama2` `llama3-1` `llama3-2` `llama3-2-vision` `llm` `llm-inference` `llm-ops` `llm-serving` `llmops` `mistral` `mlops` `model-inference` `open-source-llm` `openllm` `vicuna`

## Description

OpenLLM is a framework for deploying, managing, and scaling open-source large language models

## Tags

### Artificial Intelligence & ML

- [Serving Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/serving-and-runtime/large-language-model-optimization/serving-frameworks.md) — Provides a platform for deploying, managing, and scaling open-source large language models as standardized API endpoints for production applications.
- [Large Language Models](https://awesome-repositories.com/f/artificial-intelligence-ml/large-language-models.md) — Deploys and hosts open-source language models as standardized API endpoints to integrate artificial intelligence into production applications.
- [Model Inference Servers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/engines-runtimes-servers/model-inference-servers.md) — Provides a system for hosting machine learning models with automated infrastructure provisioning, health monitoring, and elastic resource scaling.
- [Model Gateways](https://awesome-repositories.com/f/artificial-intelligence-ml/model-gateways.md) — Provides a centralized interface for routing requests across multiple language models to simplify performance optimization and cost tracking.
- [AI Infrastructure Managers](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-infrastructure-managers.md) — Automates the deployment, scaling, and monitoring of machine learning models across cloud environments to ensure reliable performance.
- [AI Workflow Orchestrators](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-workflow-orchestrators.md) — Connects multiple language models to build complex automated systems like retrieval-augmented generation pipelines. ([source](https://www.bentoml.com/))
- [LLM Application Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-tooling/language-model-integrations/llm-application-orchestration.md) — Chains multiple language models together to build complex automated pipelines and multi-step reasoning tasks.
- [Model Serving APIs](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serving-apis.md) — Exposes large language models through standard interface specifications to ensure seamless compatibility with development tools. ([source](https://www.bentoml.com/blog/from-ollama-to-openllm-running-llms-in-the-cloud))
- [Open Models](https://awesome-repositories.com/f/artificial-intelligence-ml/open-models.md) — Hosts popular open-source large language models as ready-to-use endpoints with pre-configured settings. ([source](https://www.bentoml.com/))
- [Retrieval Augmented Generation Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/retrieval-augmented-generation-pipelines.md) — Connects multiple language models and data sources to build complex automated reasoning systems and advanced information retrieval workflows.
- [Model Registries](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/model-integration-serving/ai-model-orchestration/model-provider-integrations/model-registries.md) — Maintains a searchable catalog of model definitions that allows for the hot-swapping and versioning of inference services at runtime.
- [Local Model Inference Servers](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/local-model-inference-servers.md) — Runs large language models as local servers that provide standard-compliant APIs for easy integration. ([source](https://cdn.jsdelivr.net/gh/bentoml/OpenLLM@main/README.md))
- [Reasoning Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/reasoning-models/reasoning-pipelines.md) — Connects multiple model endpoints into sequential execution chains to facilitate complex tasks like retrieval-augmented generation and multi-step reasoning.
- [Custom Model Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-inference-serving/engines-runtimes-servers/custom-model-architectures.md) — Packages and hosts fine-tuned or custom model architectures using a standardized serving interface. ([source](https://www.bentoml.com/))
- [Model Configuration](https://awesome-repositories.com/f/artificial-intelligence-ml/model-configuration.md) — Uses structured metadata and engine configurations to package and deploy new open-source language models as standardized services. ([source](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md))
- [Model Packaging](https://awesome-repositories.com/f/artificial-intelligence-ml/model-packaging.md) — Uses structured metadata files to define model configurations and dependencies for consistent deployment across diverse infrastructure environments.
- [Chat Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/chat-interfaces.md) — Provides a web-based environment for interacting with hosted models and managing concurrent conversation threads. ([source](https://cdn.jsdelivr.net/gh/bentoml/OpenLLM@main/README.md))
- [Model Repositories](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/model-architecture-evaluation/model-repositories.md) — Connects external version control repositories containing model definitions to extend the local library with custom collections. ([source](https://github.com/bentoml/OpenLLM/blob/main/DEVELOPMENT.md))
- [Model Management](https://awesome-repositories.com/f/artificial-intelligence-ml/model-management.md) — Maintains a searchable registry of available language models and supports custom repositories to expand the collection of runnable software. ([source](https://cdn.jsdelivr.net/gh/bentoml/OpenLLM@main/README.md))

### DevOps & Infrastructure

- [Cloud Deployment](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure/cloud-computing-serverless/development-deployment-environments/cloud-deployment.md) — Automates the transfer of hosted language models to managed cloud infrastructure to ensure scalable inference and reliable performance. ([source](https://cdn.jsdelivr.net/gh/bentoml/OpenLLM@main/README.md))
- [Container Orchestrators](https://awesome-repositories.com/f/devops-infrastructure/container-orchestration/container-orchestration-interfaces/container-orchestrators.md) — Deploys model services as isolated containers that scale automatically based on incoming request volume and resource utilization metrics.
- [Private Cloud Deployments](https://awesome-repositories.com/f/devops-infrastructure/deployment-management-strategies/execution-platforms-and-targets/deployment-environments/private-cloud-deployments.md) — Automates the setup of cloud-based inference environments with autoscaling and monitoring to support both fully-managed and private infrastructure. ([source](https://www.bentoml.com/blog/from-ollama-to-openllm-running-llms-in-the-cloud))
- [Inference Scaling Services](https://awesome-repositories.com/f/devops-infrastructure/inference-scaling-services.md) — Adjusts compute capacity through elastic auto-scaling and cross-region orchestration to optimize performance for production AI workloads. ([source](https://www.bentoml.com/))
- [Model Deployment Management](https://awesome-repositories.com/f/devops-infrastructure/model-deployment-management.md) — Controls versioning, rollbacks, and traffic shifting strategies like canary testing to ensure safe and reliable updates for production services. ([source](https://www.bentoml.com/))

### Part of an Awesome List

- [Artificial Intelligence](https://awesome-repositories.com/f/awesome-lists/ai/artificial-intelligence.md) — API endpoint provider for running open-source LLMs.
- [Inference and Serving](https://awesome-repositories.com/f/awesome-lists/ai/inference-and-serving.md) — Platform for serving and deploying open-source LLMs in production.
- [Infrastructure and Gateways](https://awesome-repositories.com/f/awesome-lists/ai/infrastructure-and-gateways.md) — Platform for operating LLMs in production.
- [Large Language Models](https://awesome-repositories.com/f/awesome-lists/ai/large-language-models.md) — Platform for serving, deploying, and monitoring LLMs in production.
- [MLOps Platforms](https://awesome-repositories.com/f/awesome-lists/ai/mlops-platforms.md) — Operates and deploys large language models in production.
- [Model Deployment and Platforms](https://awesome-repositories.com/f/awesome-lists/ai/model-deployment-and-platforms.md) — Open platform for operating large language models in production.
- [Model Serving](https://awesome-repositories.com/f/awesome-lists/ai/model-serving.md) — Platform for operating and deploying large language models.
- [Model Serving and Inference](https://awesome-repositories.com/f/awesome-lists/ai/model-serving-and-inference.md) — Tool for serving open-source LLMs as API endpoints.
- [Model Serving & Deployment](https://awesome-repositories.com/f/awesome-lists/ai/model-serving-deployment.md) — Runs open-source LLMs as OpenAI-compatible APIs.
- [Inference Frameworks](https://awesome-repositories.com/f/awesome-lists/devtools/inference-frameworks.md) — Deployment framework supporting multiple adapters and LangChain.

### System Administration & Monitoring

- [LLM Performance Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/llm-performance-monitoring.md) — Tracks system performance, compute utilization, and model-specific metrics for production AI services. ([source](https://www.bentoml.com/))

### Web Development

- [Third-party API Clients](https://awesome-repositories.com/f/web-development/api-management-tools/api-development-management/api-infrastructure/third-party-api-clients.md) — Exposes model functionality through common interface specifications to ensure compatibility with existing development tools and third-party applications.