# h2oai/h2ogpt

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/h2oai-h2ogpt).**

12,016 stars · 1,313 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/h2oai/h2ogpt
- Homepage: http://h2o.ai
- awesome-repositories: https://awesome-repositories.com/repository/h2oai-h2ogpt.md

## Topics

`ai` `chatgpt` `embeddings` `fedramp` `generative` `gpt` `gpt4all` `llama2` `llm` `mixtral` `pdf` `private` `privategpt` `vectorstore`

## Description

h2oGPT is a self-hosted platform for running large language models and retrieval-augmented generation (RAG) workflows locally. Its web interface lets users index private document collections into searchable databases, enabling context-aware question answering and summarization without exposing sensitive data to external services.

The platform's modular architecture supports both local model execution and connections to external inference servers. It also supports building autonomous agents that carry out multi-step tasks by delegating actions to tools and models. Beyond chat, the system can fine-tune models on local hardware and manage the full lifecycle of predictive assets, from data ingestion and feature engineering to model deployment and performance monitoring.

The software also addresses enterprise requirements, including document intelligence for extracting structured data from unstructured files, multi-GPU training support, and access control. It provides tools for model explainability, compliance tracking, and collaborative experiment management to support transparency and reproducibility in machine learning workflows.

The project is designed for containerized deployment, using standard Docker Compose configuration files for consistent execution across local and cloud environments.
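
The index-then-answer workflow described above can be illustrated with a minimal, dependency-free sketch. This is not h2oGPT's API or implementation: the function names (`build_index`, `retrieve`, `build_prompt`), the sample documents, and the bag-of-words similarity are all assumptions chosen for illustration. A real deployment would typically use learned embeddings and a vector store instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Index step: store each document next to its token-count vector."""
    return [(doc, Counter(tokenize(doc))) for doc in documents]


def retrieve(index, question, top_k=1):
    """Retrieval step: return the top_k documents most similar to the question."""
    query = Counter(tokenize(question))
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def build_prompt(question, context_docs):
    """Augmentation step: place the retrieved text ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical private documents; everything stays inside the local process.
docs = [
    "Invoices are archived for seven years in the finance share.",
    "The VPN requires a hardware token for remote access.",
]
index = build_index(docs)
question = "How long are invoices archived?"
prompt = build_prompt(question, retrieve(index, question))
print(prompt)  # this prompt would then be passed to a locally hosted model
```

The sketch keeps indexing, retrieval, and prompt construction as separate steps so that each could be swapped out independently, for example by replacing the token-count vectors with embeddings while leaving the prompt assembly unchanged.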

## Tags

### Artificial Intelligence & ML

- [Generative AI Dashboards](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-dashboards.md) — Provides a comprehensive dashboard for interacting with models, managing documents, and executing RAG workflows.
- [Retrieval Augmented Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-orchestration/retrieval-augmented-generation.md) — Connects private document collections to language models for context-aware question answering and summarization.
- [Local Model Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/local-model-execution.md) — Executes large language models directly on local hardware to maintain data privacy and control. ([source](https://h2o.ai/platform/ai-cloud/hybrid))
- [Self-Hosted AI Platforms](https://awesome-repositories.com/f/artificial-intelligence-ml/self-hosted-ai-platforms.md) — Offers a self-hosted interface for running large language models locally to perform document analysis.
- [Document Indexing](https://awesome-repositories.com/f/artificial-intelligence-ml/document-indexing.md) — Indexes local files into searchable databases to enable context-aware retrieval-augmented generation. ([source](https://github.com/h2oai/h2ogpt/blob/main/docs/README_DOCKER.md))
- [RAG Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/rag-frameworks.md) — Indexes private documents into searchable databases to enable context-aware responses without external data exposure.
- [Agentic Workflow Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-workflow-orchestration.md) — Coordinates multi-step tasks by delegating actions to external tools and models within a unified execution environment.
- [Autonomous Agent Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/autonomous-agent-orchestration.md) — Orchestrates autonomous agents that combine reasoning with external tools to execute complex multi-step workflows.
- [Generative Content Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-content-tools.md) — Generates situation-specific responses and summaries based on private document collections. ([source](https://h2o.ai/))
- [Intelligent Document Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/intelligent-document-processing.md) — Extracts structured data and insights from unstructured files using automated processing and intelligent character recognition.
- [Remote Inference Offloaders](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/local-model-inference-servers/remote-inference-offloaders.md) — Offloads model execution to third-party engines while keeping centralized control over application logic and request routing across the infrastructure. ([source](https://github.com/h2oai/h2ogpt/blob/main/docs/README_DOCKER.md))
- [Model Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/fine-tuning-and-customization/model-fine-tuning.md) — Enables customization and training of open-source models on local infrastructure for domain-specific tasks.
- [Multi-GPU Training Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/multi-gpu-training-utilities.md) — Distributes intensive deep learning workloads across multiple graphics processing units to shorten model training times for large datasets. ([source](https://h2o.ai/platform/ai-cloud/make/hydrogen-torch/))
- [Agentic Workflow Automation](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-workflow-automation.md) — Connects automated AI capabilities to enterprise platforms and document repositories to perform complex searches and execute tasks across existing business processes. ([source](https://h2o.ai/))
- [Autonomous Task Execution](https://awesome-repositories.com/f/artificial-intelligence-ml/autonomous-task-execution.md) — Performs multi-step tasks by delegating actions to external models and tools within an experimental environment for testing complex logic. ([source](https://cdn.jsdelivr.net/gh/h2oai/h2ogpt@main/README.md))
- [Inference Backends](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-backends.md) — Supports interchangeable model execution engines to allow flexible switching between local hardware and external API providers.
- [Model Deployment Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/deployment-pipelines-and-endpoints/model-deployment-pipelines.md) — Deploys models to production via real-time or batch endpoints with support for testing and updates. ([source](https://h2o.ai/platform/ai-cloud/make/h2o-driverless-ai/))
- [No-Code Training Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/no-code-training-interfaces.md) — Builds and tunes models for text, image, video, and audio data through a no-code interface and expert-curated training techniques. ([source](https://h2o.ai/platform/ai-cloud/make/hydrogen-torch/))
- [AI Monitoring](https://awesome-repositories.com/f/artificial-intelligence-ml/ai-monitoring.md) — Evaluates model outputs through automated testing and risk monitoring to ensure safety and compliance. ([source](https://h2o.ai/))
- [Automated Machine Learning Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/automated-machine-learning-tools.md) — Performs machine learning tasks and creates deep learning models using specialized engines that remove the need for manual coding or complex configuration. ([source](https://h2o.ai/platform/ai-cloud/))
- [Automated Lineage Capturers](https://awesome-repositories.com/f/artificial-intelligence-ml/data-lineage/automated-lineage-capturers.md) — Captures metadata and event logs throughout the machine learning lifecycle to ensure traceability and reproducibility. ([source](https://h2o.ai/platform/ai-cloud/operate/h2o-mlops/))
- [Model Lineage Trackers](https://awesome-repositories.com/f/artificial-intelligence-ml/data-lineage/model-lineage-trackers.md) — Records comprehensive event logs and versioning data throughout the machine learning lifecycle to ensure reproducibility and compliance.
- [Data Preparation](https://awesome-repositories.com/f/artificial-intelligence-ml/data-preparation.md) — Processes diverse unstructured data types like text, images, and audio to prepare them for predictive modeling. ([source](https://h2o.ai/platform/ai-cloud/make/hydrogen-torch/))
- [Dataset Integration](https://awesome-repositories.com/f/artificial-intelligence-ml/dataset-integration.md) — Pulls data from various external storage systems and cloud repositories to prepare information for model training and analysis. ([source](https://h2o.ai/platform/ai-cloud/make/h2o-driverless-ai/))
- [External Model Connectors](https://awesome-repositories.com/f/artificial-intelligence-ml/external-model-connectors.md) — Centralizes machine learning models from various frameworks to simplify the deployment, tracking, and monitoring of predictive assets. ([source](https://h2o.ai/platform/ai-cloud/operate/h2o-mlops/))
- [Feature Stores](https://awesome-repositories.com/f/artificial-intelligence-ml/feature-stores.md) — Integrates with pipelines to reduce redundant data ingestion while providing metadata-driven recommendations for feature discovery and generation. ([source](https://h2o.ai/platform/ai-cloud/make/feature-store/))
- [Information Extraction](https://awesome-repositories.com/f/artificial-intelligence-ml/information-extraction.md) — Identifies and isolates key entities like names and invoice numbers from documents using natural language processing. ([source](https://h2o.ai/platform/ai-cloud/make/document-ai/))
- [Model Deployment Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/model-deployment-pipelines.md) — Exports and deploys finished models to external environments for production serving. ([source](https://h2o.ai/platform/ai-cloud/make/hydrogen-torch/))
- [Model Interpretability Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/model-interpretability-tools.md) — Provides visual and analytical tools to interpret and explain machine learning model decisions. ([source](https://h2o.ai/platform/ai-cloud/make/h2o-driverless-ai/))
- [Model Validation Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/model-validation-tools.md) — Verifies model performance through backtesting and drift monitoring to maintain reliability. ([source](https://h2o.ai/platform/ai-cloud/make/feature-store/))
- [Human-in-the-Loop Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/human-in-the-loop-systems.md) — Integrates human-in-the-loop evaluation to refine model performance based on real-world outcomes. ([source](https://h2o.ai/platform/ai-cloud/make/feature-store/))
- [Inference Latency Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/inference-latency-optimizers.md) — Minimizes data transfer delays by placing models near core application logic. ([source](https://h2o.ai/platform/ai-cloud/hybrid))
- [Predictive Machine Learning Analytics](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/algorithms/predictive-machine-learning-analytics.md) — Selects, parameterizes, and trains machine learning algorithms based on user-defined target variables and specific business requirements. ([source](https://h2o.ai/platform/ai-cloud/make/h2o-driverless-ai/))
- [Model Training Optimizers](https://awesome-repositories.com/f/artificial-intelligence-ml/model-training-optimizers.md) — Automates parameter adjustment and result analysis to improve model precision through repeated testing. ([source](https://h2o.ai/platform/ai-cloud/make/hydrogen-torch/))

### Security & Cryptography

- [Local Language Model Hosting](https://awesome-repositories.com/f/security-cryptography/privacy-data-protection/local-only-data-processing/local-language-model-hosting.md) — Hosts generative AI models on local hardware to ensure data privacy and full control over sensitive information.
- [Compliance and Governance](https://awesome-repositories.com/f/security-cryptography/governance-policy-frameworks/compliance-governance.md) — Automates data versioning and lineage tracking to ensure adherence to privacy and regulatory standards. ([source](https://h2o.ai/platform/ai-cloud/make/feature-store/))
- [Identity and Access Management](https://awesome-repositories.com/f/security-cryptography/identity-and-access-management.md) — Controls user and group access to projects and model artifacts to ensure secure collaboration. ([source](https://h2o.ai/platform/ai-cloud/operate/h2o-mlops/))

### DevOps & Infrastructure

- [Private Cloud Hosting](https://awesome-repositories.com/f/devops-infrastructure/private-cloud-hosting.md) — Hosts end-to-end private AI platforms on-premises or in private clouds to ensure data sovereignty. ([source](https://h2o.ai/platform/ai-cloud/))
- [Model Deployment Management](https://awesome-repositories.com/f/devops-infrastructure/model-deployment-management.md) — Manages the full lifecycle of document processing models including deployment, monitoring, and logging. ([source](https://h2o.ai/platform/ai-cloud/make/document-ai/))
- [Managed Cloud Deployments](https://awesome-repositories.com/f/devops-infrastructure/deployment-management/deployment-strategies/managed-cloud-deployments.md) — Provides managed infrastructure for hosting and scaling machine learning environments in the cloud. ([source](https://h2o.ai/platform/ai-cloud/))
- [AI Application Deployment Platforms](https://awesome-repositories.com/f/devops-infrastructure/ai-application-deployment-platforms.md) — Shares and deploys artificial intelligence applications across an organization using a centralized marketplace to accelerate internal innovation. ([source](https://h2o.ai/platform/ai-cloud/))
- [Cloud Infrastructure Deployment](https://awesome-repositories.com/f/devops-infrastructure/cloud-infrastructure-deployment.md) — Supports containerized deployment across diverse cloud and on-premises environments to ensure portability. ([source](https://h2o.ai/platform/ai-cloud/hybrid))
- [Container Deployment Configurations](https://awesome-repositories.com/f/devops-infrastructure/container-deployment-configurations.md) — Simplifies platform deployment and dependency management using standard Docker Compose configurations. ([source](https://github.com/h2oai/h2ogpt/blob/main/docs/README_DOCKER.md))

### Part of an Awesome List

- [Language Model Development](https://awesome-repositories.com/f/awesome-lists/ai/language-model-development.md) — Open-source generative AI for private model ownership.

### Data & Databases

- [Document and Unstructured Extraction](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing/document-unstructured-extraction.md) — Extracts structured data from unstructured documents using optical character recognition and machine learning. ([source](https://h2o.ai/platform/ai-cloud/make/document-ai/))
- [Feature Engineering Tools](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing/ml-data-pipelines/feature-engineering-tools.md) — Automates the identification and calculation of data features to enhance predictive model performance. ([source](https://h2o.ai/platform/ai-cloud/make/h2o-driverless-ai/))
- [Tabular Predictive Models](https://awesome-repositories.com/f/data-databases/tabular-data-frameworks/tabular-predictive-models.md) — Enables automated tabular predictions without requiring manual model training or complex infrastructure. ([source](https://h2o.ai/))

### System Administration & Monitoring

- [LLM Performance Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/observability-platforms/metric-performance-monitors/llm-performance-monitoring.md) — Tracks accuracy, bias, and data drift in production to ensure models remain effective. ([source](https://h2o.ai/platform/ai-cloud/operate/h2o-mlops/))

### Business & Productivity Software

- [Team Collaboration Management](https://awesome-repositories.com/f/business-productivity-software/team-collaboration-management.md) — Offers collaborative tools for data science teams to version, compare, and register machine learning models. ([source](https://h2o.ai/platform/ai-cloud/operate/h2o-mlops/))

### Development Tools & Productivity

- [RESTful APIs](https://awesome-repositories.com/f/development-tools-productivity/api-development-sdks/restful-apis.md) — Provides access to document processing, model training, and scoring capabilities through a standard interface to connect services with existing applications. ([source](https://h2o.ai/platform/ai-cloud/make/document-ai/))
