Deployable open-source frameworks for building and hosting complete retrieval-augmented generation pipelines on your own infrastructure.
Quivr is a framework for building retrieval-augmented generation pipelines that connect large language models to custom knowledge bases. It serves as a generative AI integration layer that abstracts the process of transforming diverse document sources into searchable context for AI responses. The project orchestrates the end-to-end flow between document ingestion, vector storage management, and model provider interfaces. It features a vector-store-agnostic retrieval system and a modular API layer that allows for flexible switching between different generative model providers. The system covers document parsing for various file formats, embedding-based semantic search, and the integration of external internet search results to augment retrieval accuracy. It provides the infrastructure to manage embeddings and perform semantic searches across different database backends.
Quivr is a comprehensive self-hosted RAG application that provides a complete chat interface, document ingestion pipeline, and support for multiple LLM providers and vector databases to enable conversational interaction with your knowledge base.
Quivr is a retrieval-augmented generation platform designed to transform raw documents into searchable knowledge bases. It functions as a centralized environment where users can ingest files, index them into vector databases, and interact with language models to receive contextually relevant, data-backed responses. The platform distinguishes itself through an agentic workflow orchestrator that sequences retrieval tasks, tool execution, and model interactions to resolve complex, multi-step queries. This engine is entirely configuration-driven, allowing users to define document ingestion, chunking parameters, and workflow node sequences through structured schemas. By maintaining a unified knowledge management interface, the system tracks chat history alongside file storage, ensuring that interactions remain context-aware across diverse local and remote backends. Beyond its core orchestration, the system provides a comprehensive pipeline for document processing, including parsing for various file formats and asynchronous task execution to maintain responsiveness during data ingestion. It supports the development of specialized chatbots, including voice-enabled interfaces, by integrating speech-to-text and text-to-speech capabilities with its underlying retrieval systems. The project utilizes strict base classes to enforce configuration integrity, ensuring consistent data processing across all application settings.
Quivr is a comprehensive, self-hostable RAG platform that provides a complete document ingestion pipeline, vector database integration, and a chat interface for interacting with your knowledge base.
pdfGPT is a retrieval-augmented generation application and chatbot designed to analyze PDF documents. It functions as a document analyzer and vector search interface, using large language models to answer questions grounded in the content of uploaded files. The system implements a pipeline that extracts text from PDFs, splits content into overlapping segments, and uses vector-based semantic search to retrieve relevant context. This process allows the application to provide responses with verifiable source citations, including page number references to the original document. The project also includes session-based conversation memory and chat history tracking to maintain context across multi-turn dialogues.
This application is a dedicated RAG tool designed specifically for document analysis, featuring a complete pipeline for PDF ingestion, vector-based semantic search, and a chat interface with source citations.
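The splitting step described above can be illustrated with a minimal sketch. It is not pdfGPT's actual code: the `Chunk` class, the `split_pages` function, and the character-based `size` and `overlap` parameters are hypothetical names chosen for illustration. It assumes the PDF text has already been extracted into one string per page, and it keeps the page number with every segment so that an answer can later cite its source page.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # 1-based page number the segment was taken from
    text: str  # the segment itself


def split_pages(pages: list[str], size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split each page into character windows that share `overlap` characters.

    Hypothetical helper: the window size and overlap are illustrative defaults,
    not values taken from any real project.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # distance between the starts of consecutive windows
    chunks: list[Chunk] = []
    for page_no, text in enumerate(pages, start=1):
        for start in range(0, max(len(text), 1), step):
            piece = text[start:start + size]
            if piece.strip():  # skip empty or whitespace-only windows
                chunks.append(Chunk(page=page_no, text=piece))
            if start + size >= len(text):
                break  # this window already reached the end of the page
    return chunks


# Example input: a 250-character page followed by a 100-character page.
sample_pages = [
    "".join(chr(97 + i % 26) for i in range(250)),
    "b" * 100,
]
sample_chunks = split_pages(sample_pages, size=200, overlap=50)
```

Splitting per page rather than across the whole document keeps every segment attributable to exactly one page, at the cost of never joining a sentence that straddles a page break.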
This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to provide context-aware responses for chat and completion requests. The system distinguishes itself through a database-agnostic abstraction layer that supports various storage backends, ranging from local disk storage to enterprise-grade vector databases. It offers flexible deployment options, enabling users to run language models entirely on private hardware or connect to external cloud-based providers through a unified interface. To improve the quality of generated output, the engine incorporates reranking logic that refines retrieved document chunks before they are processed by the language model. The platform includes a comprehensive suite of tools for managing document intelligence pipelines, including automated parsing, text chunking, and embedding generation. Users can configure the system through environment-based profiles to match specific hardware capabilities, such as CPU or GPU-accelerated setups, and stream responses in real time to reduce latency. The application is configured via runtime settings files and environment variables, with support for building custom container images to suit specific deployment requirements.
This is a comprehensive, self-hostable RAG application that provides a complete pipeline for document ingestion, vector database integration, and a chat interface for context-aware responses.
Khoj is a self-hosted artificial intelligence platform designed for personal knowledge management and semantic information retrieval. It functions as a private assistant that indexes your local documents, notes, and external workspaces, allowing you to interact with your data through natural language queries and conversational chat. By maintaining a local-first architecture, the system ensures that your information remains under your control while providing context-aware responses grounded in your personal knowledge base. The platform distinguishes itself through a modular, cross-platform integration layer that embeds intelligent search and chat capabilities directly into your existing workflows. Whether you are working within text editors, web browsers, or mobile messaging applications, Khoj provides a unified interface to your data. It supports advanced retrieval strategies, such as dual-model architectures for semantic mapping and real-time internet grounding, which allow the assistant to synthesize private notes with external information while providing clear source citations. Beyond its core retrieval capabilities, the system offers a comprehensive suite of tools for data orchestration and research automation. It includes a pluggable ingestion pipeline for diverse file formats, automated query scheduling, and the ability to execute code or generate visual content directly within the chat interface. Users can configure custom agents, manage model routing, and secure their deployments with multi-user authentication, making it suitable for both individual use and enterprise-grade environments.
Khoj is a self-hosted AI platform that provides a complete RAG pipeline, including document ingestion, vector-based semantic search, and a chat interface for interacting with your local knowledge base.
This platform serves as a comprehensive environment for managing private language models, document knowledge bases, and automated agent workflows within secure local infrastructure. It functions as a document-aware workspace that enables users to ingest diverse file formats into searchable repositories, ensuring that all data processing and model inference remain within private, local environments to maintain data sovereignty. The system distinguishes itself through a modular agentic engine that allows for the definition of custom skills and external tool execution. By utilizing a multi-model abstraction layer, it normalizes interactions across various local and cloud-based providers, while workspace-scoped management ensures that system prompts and knowledge bases remain isolated to meet specific operational requirements. Beyond core orchestration, the platform includes a document-parsing pipeline that converts files into structured text for semantic retrieval via local vector indexing. Users can further extend functionality through command-line triggers and persistent system instructions, standardizing how artificial intelligence behaves across different business contexts.
This platform is a complete, self-hostable RAG application that provides a built-in document ingestion pipeline, vector database integration, and a chat interface for interacting with your local knowledge base.
Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document processing, retrieval-augmented generation, and multi-step reasoning. It provides a comprehensive platform for developing document-based question answering systems, allowing users to chain language models, prompt templates, and external tools into complex, automated pipelines. The system distinguishes itself through a highly modular architecture that emphasizes component-based composition and schema-driven data exchange. It supports autonomous agents capable of decomposing complex queries through iterative processing and tool-calling, while its hybrid retrieval orchestration combines vector similarity and full-text search with re-ranking to improve the accuracy of retrieved context. The framework also features event-driven streaming, which delivers incremental results from long-running pipelines to the user interface in real time. Beyond its core reasoning capabilities, the platform includes a suite of functional modules for the entire lifecycle of document-based applications. This includes multi-modal parsing for extracting text, tables, and visual elements from diverse file formats, as well as administrative tools for managing document collections, vector stores, and multi-user access. The system is designed to be interface-agnostic, allowing developers to wrap third-party libraries and external services into standardized, reusable processing units. The project provides a web-based user interface for interactive querying and configuration, and it supports deployment of private, isolated instances through predefined templates.
Kotaemon is a comprehensive, self-hostable RAG platform that includes a built-in chat interface, document ingestion pipelines, and advanced hybrid retrieval orchestration, making it a complete solution for document-based question answering.
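One common way to merge a vector-similarity ranking with a full-text ranking is reciprocal rank fusion. The sketch below uses it purely as an illustration of hybrid retrieval. Whether Kotaemon fuses its results this way is not stated in the description above, and the function name, the document IDs, and the constant `k = 60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in, so
    documents ranked well by more than one retriever rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers for the same query.
vector_hits = ["d1", "d2", "d3"]
keyword_hits = ["d3", "d1", "d4"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

Because the fusion works on ranks rather than raw scores, it needs no calibration between the vector and keyword scorers. A re-ranking model, where one is used, would then reorder only the top of the fused list.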
This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasoning workflows. By integrating document intelligence with advanced retrieval pipelines, the platform enables the creation of grounded, verifiable responses supported by traceable citations. The platform distinguishes itself through deep document understanding and sophisticated knowledge orchestration. It supports complex document parsing, including the extraction of tables and images, and utilizes graph-based indexing to enhance reasoning over large document collections. Users can configure multiple recall strategies and fused re-ranking to optimize retrieval accuracy, while the system maintains context through multi-turn dialogue management and flexible tool-use frameworks. The architecture is built on a modular, containerized microservice foundation that supports both local inference engines and external language model APIs. It includes asynchronous task processing for document ingestion and indexing, ensuring system responsiveness during heavy workloads. The platform also provides a standardized interface for model abstraction, allowing for seamless integration with existing language model ecosystems. Developers can interact with the platform through a comprehensive suite of RESTful endpoints and Python client libraries, which cover the full lifecycle of agents, datasets, and knowledge graphs. The system is designed for flexible deployment, offering configurable environment settings and support for custom containerized environments to facilitate local development and infrastructure portability.
This is a comprehensive, self-hostable RAG platform that includes a built-in document ingestion pipeline, vector database integration, support for various LLM providers, and a dedicated chat interface for interacting with your knowledge base.
mgrep is an LLM-powered semantic search engine and local file indexer designed to retrieve information from local directories and web content using natural language queries. It functions as a semantic document retriever that uses meaning and context rather than exact keyword matches to locate relevant data. The tool distinguishes itself by combining local file indexing with real-time web content retrieval to synthesize comprehensive answers. It employs retrieval-augmented generation to transform retrieved snippets from both local and remote sources into direct, concise responses. The system includes capabilities for semantic file indexing, iterative query refinement to resolve complex information needs, and automatic synchronization of local file changes to a remote storage backend.
This tool functions as a self-hosted semantic search and retrieval-augmented generation engine that processes local files and web content to provide synthesized answers, fitting the core requirements for a document-based RAG application.
h2oGPT is a self-hosted platform designed for running large language models and executing retrieval-augmented generation workflows locally. It provides a comprehensive web interface that allows users to index private document collections into searchable databases, enabling context-aware question answering and summarization without exposing sensitive data to external services. The platform distinguishes itself by offering a modular architecture that supports both local model execution and connections to external inference servers. It facilitates the development of autonomous agents capable of performing multi-step tasks by delegating actions to various tools and models. Beyond simple chat, the system includes capabilities for fine-tuning models on local hardware and managing the full lifecycle of predictive assets, from data ingestion and feature engineering to model deployment and performance monitoring. The software covers a broad range of enterprise-grade requirements, including document intelligence for extracting structured data from unstructured files, multi-GPU training support, and robust access control mechanisms. It provides tools for model explainability, compliance tracking, and collaborative experiment management to ensure transparency and reproducibility in machine learning workflows. The project is designed for containerized deployment, utilizing standard configuration files to ensure consistent execution across local and cloud environments.
This is a comprehensive, self-hostable platform that natively integrates document ingestion, vector database management, and a chat interface to facilitate retrieval-augmented generation workflows.
GPT4All is a cross-platform runtime environment designed to execute large language models directly on local consumer hardware. By leveraging an optimized C++ inference backend, it enables private, offline AI interactions without requiring an internet connection or external cloud services. The project provides a comprehensive ecosystem for managing the entire model lifecycle, including discovery, downloading, and configuration of local weights. What distinguishes the platform is its integrated retrieval-augmented generation engine, which allows users to index local documents into semantic vector spaces. This capability enables context-aware chat sessions where the model can reference private files, notes, and spreadsheets to provide grounded, relevant responses. The system also features a local HTTP server that exposes an OpenAI-compatible API, allowing developers to integrate these private, self-hosted models into existing applications and workflows. Beyond its core inference and retrieval capabilities, the project includes a graphical desktop interface for end-user interaction and a Python software development kit for programmatic access. These tools support advanced configuration of model parameters, performance monitoring, and the management of local embedding pipelines for custom semantic search tasks. The software is distributed as a unified application package, with documentation available to guide users through installation and local environment setup.
This is a self-hostable application that provides a complete RAG pipeline, including local document indexing, semantic search, and a chat interface for interacting with your private files.
53AIHub is a centralized orchestration platform for deploying and managing AI agents and prompts across multiple large language model providers. It functions as a multi-model AI gateway and an operation portal for AI services, providing a unified interface to coordinate agents and prompts from various external platforms. The project distinguishes itself as a white-label AI portal designed for self-hosted infrastructure, allowing for full control over operational data on private servers or containers. It includes a comprehensive AI SaaS administration layer with a multi-tenant subscription engine, payment gateway integration, and customizable branding for enterprise clients. The platform covers a broad capability surface including retrieval-augmented generation through a dedicated knowledge base manager and vector database pipelines. It also provides identity management via single sign-on integration, conversation history storage, and operational monitoring tools to evaluate response accuracy and user behavior. The system is delivered as a containerized deployment and is configured via environment variables for runtime setup and database connectivity.
This platform provides a comprehensive self-hosted environment for managing AI agents and includes a dedicated knowledge base manager with vector database pipelines to support retrieval-augmented generation.
This platform provides a visual workflow builder for creating AI agents with built-in RAG capabilities, document ingestion, and support for local LLMs via Ollama, making it a functional tool for building and hosting your own RAG applications.
This project is a LangChain-based framework for building retrieval-augmented generation systems, autonomous agents, and multimodal chatbots. It functions as an open-source orchestrator that connects local inference engines and online APIs to manage various large language model deployments. The system distinguishes itself by providing specialized interfaces for local knowledge bases, allowing the loading and vectorization of private documents to create context-aware assistants. It also supports multimodal capabilities, enabling the processing of both text and image inputs through vision-capable models. The platform covers a broad range of capabilities, including autonomous agent orchestration with tool-calling loops, vector-database embedding for semantic search, and the integration of external data querying from search engines and databases. It includes a web-based user interface for managing conversations and configuring system prompts.
This project provides a comprehensive framework for building RAG systems with document ingestion, vector database support, and a chat interface, though it is designed as an orchestrator for developers rather than a pre-packaged, ready-to-deploy application.
SillyTavern is a comprehensive interface and orchestration platform designed for immersive AI roleplay and interactive chat experiences. It functions as a unified gateway that connects users to a wide array of local and cloud-based large language models, providing a centralized environment to manage complex character personas, narrative context, and model-driven interactions. The platform distinguishes itself through its advanced prompt engineering and automation capabilities. It utilizes a sophisticated macro-based templating engine and vector-database retrieval to dynamically inject lore, character traits, and historical context into conversations. Users can orchestrate complex workflows through a command-based scripting engine, enabling autonomous objectives, automated task execution, and the integration of external tools that allow models to perform actions or retrieve live information during a session. Beyond text generation, the application supports a rich multimodal experience, including automated image generation, voice synthesis, and character sprite animations that react to the conversation. It provides extensive administrative controls, including multi-user isolation, secure remote access via reverse-proxy routing, and a modular extension system that allows for deep customization of both the interface and backend functionality. The project is built as a web-based application that supports persistent data management, including automated backups and structured history exports. It offers granular control over model parameters, sampling, and context window management to ensure consistent and tailored performance across diverse generation environments.
SillyTavern is a feature-rich chat interface that includes vector-database retrieval and document-based context injection, making it a capable tool for RAG-based interactions despite its primary focus on roleplay and character orchestration.
Cognita is a retrieval-augmented generation orchestration framework used to build pipelines that connect document stores and language models to provide grounded answers. It functions as a document ingestion pipeline and a vector database integrator, managing the process of loading, parsing, and indexing files into a searchable knowledge base. The system includes a language model gateway proxy that provides a unified API to interact with multiple model providers. This routing layer decouples the application from specific vendors, allowing requests to be proxied through a provider-agnostic interface. The framework covers contextual information retrieval through similarity search and reranking to generate responses with source citations. It supports incremental document indexing to process new or updated files without re-indexing entire datasets and allows for the integration of various vector store implementations.
This is a framework for building RAG pipelines rather than a ready-to-use, self-hostable chat application with a built-in user interface.
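Incremental indexing of the kind mentioned above is often implemented by comparing content hashes between runs. The following is a minimal sketch under that assumption, not Cognita's actual mechanism. The function `plan_incremental_index` and its arguments are hypothetical, and documents are modelled as a plain mapping from ID to text.

```python
import hashlib


def plan_incremental_index(
    documents: dict[str, str],
    indexed_hashes: dict[str, str],
) -> tuple[list[str], list[str], dict[str, str]]:
    """Decide which documents need (re)indexing and which should be removed.

    `documents` maps a document ID to its current content; `indexed_hashes`
    maps a document ID to the SHA-256 digest recorded at the last indexing run.
    """
    to_index: list[str] = []
    new_hashes: dict[str, str] = {}
    for doc_id, content in documents.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        new_hashes[doc_id] = digest
        if indexed_hashes.get(doc_id) != digest:
            to_index.append(doc_id)  # new document or changed content
    # Anything indexed earlier but no longer present should be dropped.
    to_delete = [doc_id for doc_id in indexed_hashes if doc_id not in documents]
    return to_index, to_delete, new_hashes


# First run: nothing has been indexed yet, so every document is new.
first_index, first_delete, state = plan_incremental_index({"a": "x", "b": "y"}, {})

# Second run: "a" was removed, "b" changed, and "c" was added.
second_index, second_delete, state2 = plan_incremental_index({"b": "z", "c": "w"}, state)
```

Only the IDs in the first returned list would be re-parsed and re-embedded, which is what keeps an update cheap when most of the dataset is unchanged.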
This project is a retrieval-augmented generation pipeline designed for building custom ChatGPT plugins that allow language models to query private or professional documents. It implements a full retrieval workflow, from processing and indexing document chunks to retrieving relevant context for natural language queries. The system distinguishes itself through a hybrid retrieval approach that combines dense vector embeddings with sparse keyword matching, further refined by a two-stage semantic re-ranking process. It includes specialized data privacy tools for screening personally identifiable information and secures private data stores using OAuth-based user authentication. The capability surface covers multi-format file indexing for PDF, DOCX, and PPTX files, alongside document ingestion from JSON and ZIP archives. It supports multiple vector storage backends, including PostgreSQL with pgvector, Redis, and cloud-native services. The architecture is designed for containerized deployment via Docker and includes tools for metadata extraction and real-time data synchronization through webhooks. The project provides a local development server with pre-configured routing and security to verify plugin functionality before deployment.
This project is a backend plugin framework designed to extend ChatGPT with document retrieval capabilities rather than a standalone, user-facing chat application for RAG.
This project is a retrieval-augmented generation framework designed to build pipelines that connect unstructured data and knowledge graphs with large language models. It functions as a vector database orchestrator for indexing text and multimodal content, as well as a system for translating natural language queries into structured database commands. The framework integrates a hybrid retrieval engine that combines dense vector search with sparse keyword matching to increase the precision of retrieved contexts. It further enhances reasoning and relationship mapping through a graph-augmented retrieval system. The system includes a toolkit for measuring the quality of retrieval and generation processes using standardized metrics. It also provides mechanisms to enforce predefined schemas and patterns on model responses to ensure consistent output for downstream applications. The project is implemented in Python.
This is a comprehensive RAG framework that provides the necessary pipelines, vector database orchestration, and hybrid retrieval capabilities to build a document-chat system, though it functions more as a developer-focused toolkit than a pre-built, ready-to-deploy chat application.
Chatbox is a cross-platform desktop application that provides a unified interface for interacting with a wide range of artificial intelligence models. It functions as a model-agnostic client, allowing users to connect to various third-party AI providers or execute open-source models directly on their own hardware. By centralizing these diverse services into a single workspace, the application enables users to manage multiple chat sessions, adjust model parameters, and switch between different AI backends with ease. The project distinguishes itself through a local-first architecture that prioritizes data privacy and user control. All conversation logs, settings, and uploaded documents are stored directly on the local device, ensuring that sensitive information remains private and accessible offline. Furthermore, the application features a built-in vector-based knowledge retrieval system that parses and indexes local files, allowing the AI to reference private documents during chat sessions to provide context-aware responses. Beyond its core chat capabilities, the application includes tools for productivity and workflow management. It supports real-time web search integration, image generation, and the ability to render professional content like formulas and charts. Users can navigate the interface efficiently using global keyboard shortcuts and automate the configuration of external services through deep-link injection, which simplifies the process of importing provider settings and credentials. The application is distributed as a native desktop shell that wraps web-based interface components to provide system-level window management. It is designed to be installed and run on standard desktop operating systems.
This is a desktop-based AI client that includes a built-in vector-based knowledge retrieval system for local documents, fulfilling the core requirements for a self-hosted RAG-capable interface.
Bisheng is an enterprise AI framework and LLM DevOps platform designed to manage the full lifecycle of large language models. It provides a unified system for dataset curation, supervised fine-tuning, model versioning, and performance evaluation. The platform features a visual workflow orchestrator for building retrieval-augmented generation pipelines and complex task sequences using flowcharts with conditional logic and human intervention points. It also includes an AI agent framework that uses a specialized guidance language to embed domain expertise and professional business logic into autonomous agents. The system covers comprehensive enterprise AI governance through role-based access control, single sign-on, and integrated observability tools for monitoring system health and traffic. Additional capabilities include layout-aware document parsing for extracting text and tables from printed or handwritten sources and high-availability infrastructure deployment.
Bisheng is an enterprise-grade RAG and AI workflow orchestration platform that provides the necessary document ingestion, vector integration, and chat-based agent capabilities to build and deploy custom RAG applications.