These open-source libraries and tools enable developers to build custom retrieval-augmented generation pipelines for document-based AI applications.
Verba is a retrieval-augmented generation interface and chatbot that uses Weaviate to provide factual answers based on private datasets. It pairs a vector-database knowledge base and hybrid search engine with an orchestration interface that connects various large language model providers and embedding services. The system differentiates itself through a RAG pipeline manager for adjusting text chunking rules and retrieval settings, alongside a 3D vector space visualization tool for analyzing the spatial organization and clustering of high-dimensional embeddings. It employs a modular provider system that allows for swapping between different local and cloud text generation and embedding services. Its ingestion pipeline handles multiple source types, turning unstructured documents, audio transcriptions, web crawls, and version control repositories into a searchable knowledge base. Retrieval combines semantic and keyword search to extract relevant context from the vector store, with configurable text chunking to tune retrieval precision.
Verba is a comprehensive RAG framework that provides a complete pipeline for document ingestion, chunking, and LLM orchestration, while offering a dedicated interface for querying private datasets via hybrid search.
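The configurable chunking mentioned for Verba can be illustrated with a minimal sketch. This is not Verba's own code: the function name `chunk_text` and the parameters `chunk_size` and `overlap` are hypothetical, and splitting on whitespace is an assumption made to keep the example short.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Smaller chunks tend to make retrieval more precise but carry less context per hit, and the overlap reduces the chance that a relevant sentence is cut in half at a chunk boundary.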
This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to provide context-aware responses for chat and completion requests. The system distinguishes itself through a database-agnostic abstraction layer that supports various storage backends, ranging from local disk storage to enterprise-grade vector databases. It offers flexible deployment options, enabling users to run language models entirely on private hardware or connect to external cloud-based providers through a unified interface. To improve the quality of generated output, the engine incorporates reranking logic that refines retrieved document chunks before they are processed by the language model. The platform includes a comprehensive suite of tools for managing document intelligence pipelines, including automated parsing, text chunking, and embedding generation. Users can configure the system through environment-based profiles to match specific hardware capabilities, such as CPU or GPU-accelerated setups, and stream responses in real time to reduce latency. The application is configured via runtime settings files and environment variables, with support for building custom container images to suit specific deployment requirements.
This project is a comprehensive RAG backend that provides document ingestion, vector database abstraction, and LLM orchestration, making it a complete solution for querying private document collections locally.
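A database-agnostic abstraction layer of the kind described for this backend might look roughly like the sketch below. The names `VectorStore`, `InMemoryStore`, and `retrieve` are hypothetical and not taken from the project; the in-memory backend stands in for whichever local or enterprise store a deployment would actually use.

```python
import math
from typing import Dict, List, Protocol, Sequence, Tuple


class VectorStore(Protocol):
    """Hypothetical interface that every storage backend would implement."""

    def add(self, doc_id: str, vector: Sequence[float]) -> None: ...

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]: ...


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Toy backend: keeps vectors in a dict and scans them on every search."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, doc_id: str, vector: Sequence[float]) -> None:
        self._vectors[doc_id] = list(vector)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        scored = [(doc_id, _cosine(query, vec)) for doc_id, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


def retrieve(store: VectorStore, query: Sequence[float], k: int = 3) -> List[str]:
    """Caller code depends only on the interface, not on a concrete backend."""
    return [doc_id for doc_id, _score in store.search(query, k)]
```

Because `retrieve` only relies on `add` and `search`, swapping the toy store for a real vector database would not require changes to the calling code.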
This project is a retrieval-augmented generation pipeline designed for building custom ChatGPT plugins that allow language models to query private or professional documents. It implements a full retrieval workflow, from processing and indexing document chunks to retrieving relevant context for natural language queries. The system distinguishes itself through a hybrid retrieval approach that combines dense vector embeddings with sparse keyword matching, further refined by a two-stage semantic re-ranking process. It includes specialized data privacy tools for screening personally identifiable information and secures private data stores using OAuth-based user authentication. The capability surface covers multi-format file indexing for PDF, DOCX, and PPTX files, alongside document ingestion from JSON and ZIP archives. It supports multiple vector storage backends, including PostgreSQL with pgvector, Redis, and cloud-native services. The architecture is designed for containerized deployment via Docker and includes tools for metadata extraction and real-time data synchronization through webhooks. The project provides a local development server with pre-configured routing and security to verify plugin functionality before deployment.
This project provides a comprehensive RAG pipeline for document ingestion, vector storage, and retrieval, though it is specifically architected as a plugin backend for ChatGPT rather than a general-purpose RAG framework.
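One common way to merge a dense ranking with a sparse keyword ranking is reciprocal rank fusion. The description above does not say which fusion method this project uses, so the sketch below illustrates the general idea rather than the project's implementation; the function name and the constant `k = 60` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

In a pipeline like the one described, the fused list would then be passed to a re-ranking stage before the top chunks are handed to the language model.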
This platform serves as a comprehensive environment for managing private language models, document knowledge bases, and automated agent workflows within secure local infrastructure. It functions as a document-aware workspace that enables users to ingest diverse file formats into searchable repositories, ensuring that all data processing and model inference remain within private, local environments to maintain data sovereignty. The system distinguishes itself through a modular agentic engine that allows for the definition of custom skills and external tool execution. By utilizing a multi-model abstraction layer, it normalizes interactions across various local and cloud-based providers, while workspace-scoped management ensures that system prompts and knowledge bases remain isolated to meet specific operational requirements. Beyond core orchestration, the platform includes a document-parsing pipeline that converts files into structured text for semantic retrieval via local vector indexing. Users can further extend functionality through command-line triggers and persistent system instructions, standardizing how artificial intelligence behaves across different business contexts.
This platform is a comprehensive, self-hostable RAG solution that integrates document parsing, vector database management, and LLM orchestration into a single, privacy-focused workspace.
Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document processing, retrieval-augmented generation, and multi-step reasoning. It provides a comprehensive platform for developing document-based question answering systems, allowing users to chain language models, prompt templates, and external tools into complex, automated pipelines. The system distinguishes itself through a highly modular architecture that emphasizes component-based composition and schema-driven data exchange. It supports autonomous agents capable of decomposing complex queries through iterative processing and tool-calling, while its hybrid retrieval orchestration combines vector similarity and full-text search with re-ranking to improve the accuracy of retrieved context. The framework also features event-driven streaming, which delivers incremental results from long-running pipelines to the user interface in real-time. Beyond its core reasoning capabilities, the platform includes a suite of functional modules for the entire lifecycle of document-based applications. This includes multi-modal parsing for extracting text, tables, and visual elements from diverse file formats, as well as administrative tools for managing document collections, vector stores, and multi-user access. The system is designed to be interface-agnostic, allowing developers to wrap third-party libraries and external services into standardized, reusable processing units. The project provides a web-based user interface for interactive querying and configuration, and it supports deployment of private, isolated instances through predefined templates.
Kotaemon is a comprehensive RAG orchestration platform that provides built-in document parsing, hybrid retrieval, and LLM workflow management, making it a complete solution for querying private document collections.
llmware is a Python framework for AI agent orchestration and model management, designed to coordinate multi-model workflows and autonomous agents. It provides a unified model catalog and standardized interface to execute specialized language models for complex research, analysis, and structured data generation. The project distinguishes itself through its heavy emphasis on local execution and quantized inference, allowing models to run on private infrastructure using CPU, GPU, and NPU acceleration via runtimes like ONNX and OpenVINO. It features a specialized ability to translate natural language queries into structured SQL or CSV formats by analyzing database schemas. The framework covers a broad range of capabilities including end-to-end retrieval-augmented generation pipelines, hybrid search engines, and multimodal content processing for PDFs, Office documents, audio, and images. It also incorporates tools for structured function calling, named entity recognition, and text risk classification to detect toxicity and prompt injections. The system integrates with various SQL and vector database backends to manage knowledge collection indexing and document embeddings.
This framework provides a comprehensive suite for building RAG pipelines, including built-in document parsing, vector database integration, and LLM orchestration designed specifically for local, private infrastructure.
Quivr is a retrieval-augmented generation platform designed to transform raw documents into searchable knowledge bases. It functions as a centralized environment where users can ingest files, index them into vector databases, and interact with language models to receive contextually relevant, data-backed responses. The platform distinguishes itself through an agentic workflow orchestrator that sequences retrieval tasks, tool execution, and model interactions to resolve complex, multi-step queries. This engine is entirely configuration-driven, allowing users to define document ingestion, chunking parameters, and workflow node sequences through structured schemas. By maintaining a unified knowledge management interface, the system tracks chat history alongside file storage, ensuring that interactions remain context-aware across diverse local and remote backends. Beyond its core orchestration, the system provides a comprehensive pipeline for document processing, including parsing for various file formats and asynchronous task execution to maintain responsiveness during data ingestion. It supports the development of specialized chatbots, including voice-enabled interfaces, by integrating speech-to-text and text-to-speech capabilities with its underlying retrieval systems. The project utilizes strict base classes to enforce configuration integrity, ensuring consistent data processing across all application settings.
Quivr is a comprehensive RAG platform that provides built-in document parsing, vector database integration, and LLM orchestration, making it a complete solution for building searchable knowledge bases from private documents.
langchaingo is an LLM application framework for Go, used to build language model-powered applications and autonomous agents. It serves as an orchestration library and tool integration framework that allows developers to link prompt sequences and model calls into complex, multi-step workflows. The project provides a toolkit for implementing retrieval-augmented generation pipelines by processing unstructured documents and retrieving relevant context via vector search. It includes a dedicated integration layer for indexing high-dimensional embeddings and performing similarity searches across various vector database backends. Its broader capabilities cover AI workflow automation, the creation of autonomous agents that use reasoning to execute external tools, and the management of conversation state to maintain context across multi-turn dialogues. The framework also supports integrating external search tools, executing database queries, and triggering third-party workflows.
This Go-based framework provides the necessary orchestration, document loading, and vector database integration tools to build custom RAG pipelines for your private document collections.
SurfSense is a self-hosted platform designed for building retrieval-augmented generation pipelines and managing private knowledge bases. It functions as a containerized research stack that allows users to index diverse data sources and query them using language models, ensuring that all information retrieval is grounded in specific source citations. The platform distinguishes itself through its modular architecture, which supports the integration of custom tools and diverse language models via a unified abstraction layer. It facilitates secure, collaborative research environments by implementing role-based access control for shared knowledge bases, while also providing built-in text-to-speech capabilities to convert chat logs and documents into audio content. Beyond its core retrieval functions, the system includes comprehensive support for data ingestion from various file formats and web sources. It utilizes vector-database-backed indexing to maintain high-dimensional search capabilities and employs asynchronous background processing to handle resource-intensive tasks like media transcoding and document indexing without interrupting system responsiveness.
SurfSense is a self-hosted RAG platform that provides a complete pipeline for indexing private documents, managing vector-based knowledge bases, and orchestrating LLM interactions with source-grounded citations.
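Grounding answers in source citations usually starts with how the prompt is assembled. The sketch below is an assumption about the general pattern, not SurfSense code; `build_cited_prompt` and the prompt wording are hypothetical.

```python
from typing import List, Tuple


def build_cited_prompt(question: str, chunks: List[Tuple[str, str]]) -> str:
    """Build a prompt whose retrieved chunks are numbered so the model can cite them.

    `chunks` is a list of (source_label, text) pairs from the retriever.
    """
    lines = ["Answer using only the numbered sources and cite them like [1]."]
    for index, (source, text) in enumerate(chunks, start=1):
        lines.append(f"[{index}] ({source}) {text}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

Keeping the numbering stable lets the application map each `[n]` in the model's answer back to the document it came from.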
This project is an artificial intelligence application starter kit and cloud deployment framework. It provides a pre-configured foundation for building AI applications, featuring integrated authentication, orchestration, and vector database connectivity. The framework serves as an implementation template for retrieval-augmented generation systems. It includes a pipeline that converts Markdown documents into vector embeddings and stores them in a database to enable question-and-answer functionality. A centralized orchestration layer coordinates communication between user interfaces and backend AI services. The project also includes configurations for cloud hosting, incorporating secret management and instance scaling to maintain availability.
This repository provides a comprehensive starter kit and orchestration template specifically designed for building RAG applications, including the necessary pipelines for document ingestion, vector database integration, and LLM coordination.
This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasoning workflows. By integrating document intelligence with advanced retrieval pipelines, the platform enables the creation of grounded, verifiable responses supported by traceable citations. The platform distinguishes itself through deep document understanding and sophisticated knowledge orchestration. It supports complex document parsing, including the extraction of tables and images, and utilizes graph-based indexing to enhance reasoning over large document collections. Users can configure multiple recall strategies and fused re-ranking to optimize retrieval accuracy, while the system maintains context through multi-turn dialogue management and flexible tool-use frameworks. The architecture is built on a modular, containerized microservice foundation that supports both local inference engines and external language model APIs. It includes asynchronous task processing for document ingestion and indexing, ensuring system responsiveness during heavy workloads. The platform also provides a standardized interface for model abstraction, allowing for seamless integration with existing language model ecosystems. Developers can interact with the platform through a comprehensive suite of RESTful endpoints and Python client libraries, which cover the full lifecycle of agents, datasets, and knowledge graphs. The system is designed for flexible deployment, offering configurable environment settings and support for custom containerized environments to facilitate local development and infrastructure portability.
This platform provides a complete, self-hostable environment for RAG, featuring built-in document parsing, vector and graph-based indexing, and LLM orchestration to support complex knowledge-based AI applications.
DeepLake is AI data infrastructure consisting of a multimodal data lake, a hybrid search engine, and a serverless vector database. It provides a PostgreSQL-based AI data runtime that combines multimodal storage with streaming pipelines to load and shuffle datasets from cloud storage directly into deep learning training pipelines. The system utilizes lazy indexing to store and slice images, audio, and video without loading entire files into memory. It enables retrieval-augmented generation by persisting high-dimensional embeddings in a serverless vector store and implementing hybrid search that combines vector similarity with full-text keyword matching. The project covers a broad capability surface including structured metadata indexing for numeric and JSON fields, cloud-local data synchronization, and visualization tools for inspecting dataset annotations such as bounding boxes and masks.
DeepLake provides a serverless vector database and hybrid search engine specifically designed to support retrieval-augmented generation workflows, though it functions primarily as the data infrastructure layer rather than a full-stack LLM orchestration framework.
LightRAG is a graph-based retrieval framework designed to build retrieval-augmented generation pipelines. It structures unstructured text into knowledge graphs, enabling multi-hop reasoning and complex query synthesis across large document collections. By integrating dense vector embeddings with structured knowledge graphs, the system facilitates both similarity-based and relationship-aware information retrieval. The framework distinguishes itself through a dual-level retrieval strategy that combines low-level keyword matching with high-level semantic graph traversal to capture both specific facts and broad thematic context. It supports incremental knowledge management, allowing the underlying graph structure to be updated dynamically as new data arrives without requiring a full re-indexing of the dataset. Additionally, the system functions as a multimodal information extractor, processing both text and visual data to create unified, searchable knowledge bases. The platform provides modular, prompt-driven pipeline orchestration to coordinate document parsing, knowledge extraction, and language model generation. These automated workflows allow for the synthesis of information across interconnected documents to provide context-aware responses to nuanced, multi-step inquiries.
LightRAG is a comprehensive RAG framework that integrates vector embeddings with knowledge graphs to provide advanced document parsing, LLM orchestration, and multi-hop reasoning for private data collections.
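The incremental update and multi-hop lookup described for LightRAG can be sketched with a toy graph structure. This is an illustration under stated assumptions, not LightRAG's API: the `KnowledgeGraph` class, the triple format, and the method names are hypothetical, and a real system would extract triples with a language model.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class KnowledgeGraph:
    """Toy directed graph that accepts new triples without being rebuilt."""

    def __init__(self) -> None:
        self.edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def insert(self, triples: Iterable[Triple]) -> int:
        """Merge triples into the existing graph; return how many edges were new."""
        added = 0
        for subj, rel, obj in triples:
            if (rel, obj) not in self.edges[subj]:
                self.edges[subj].add((rel, obj))
                added += 1
        return added

    def neighbors(self, entity: str, hops: int = 1) -> Set[str]:
        """Entities reachable from `entity` by following at most `hops` edges."""
        frontier, seen = {entity}, {entity}
        for _step in range(hops):
            reached = {obj for e in frontier for _rel, obj in self.edges.get(e, ())}
            frontier = reached - seen
            seen |= frontier
        return seen - {entity}
```

A second call to `insert` only adds edges that are not already present, which mirrors the idea of extending the graph as new documents arrive rather than re-indexing the whole dataset.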
GraphRAG is a data processing pipeline and retrieval engine designed to transform unstructured text into interconnected knowledge graphs. By utilizing language models to extract entities and relationships, it builds structured representations of information that enable context-aware retrieval for downstream applications. The system distinguishes itself through hierarchical graph clustering and large-scale data synthesis, which organize massive document corpora into multi-level structures. This approach allows for both vector-based semantic searches and graph-based traversals, providing a comprehensive method for navigating complex datasets and identifying hidden connections between concepts. The platform includes a modular orchestration pipeline that manages the entire lifecycle of information, from initial ingestion and indexing to query execution. Users can refine the synthesis and retrieval processes by adjusting prompt templates and configuration arguments to align with specific data characteristics.
This framework provides a comprehensive pipeline for building knowledge graphs from unstructured documents to enable advanced, context-aware retrieval, making it a powerful tool for implementing sophisticated RAG systems.
LlamaIndex is a comprehensive development framework designed to connect private or external data sources to large language models. It functions as a data-centric toolkit that enables the construction of retrieval-augmented generation systems, allowing developers to build applications that provide context-aware answers based on specific organizational information. The project distinguishes itself through a robust agentic orchestration engine that supports the creation of autonomous agents capable of multi-step reasoning, memory management, and complex tool execution. Beyond simple retrieval, it provides a flexible, event-driven architecture for composing modular pipelines, enabling developers to chain data ingestion, transformation, and retrieval steps into sophisticated, multi-agent systems that can coordinate tasks and hand off control between individual agents. The platform covers the entire lifecycle of language model applications, including advanced document processing for parsing and structuring complex file formats, and a diagnostic layer for observability that tracks execution traces and performance metrics. It also includes a suite of evaluation tools for measuring retrieval effectiveness and response quality, alongside mechanisms for query routing and custom post-processing to ensure high-precision information delivery.
LlamaIndex is a comprehensive framework specifically built for RAG, offering extensive tools for document ingestion, vector database integration, and LLM orchestration to connect private data to language models.
Ruoyi AI is a multi-agent orchestration platform that coordinates specialized AI agents through a supervisor-based delegation pattern, allowing complex requests to be broken into subtasks that are assigned, executed, and merged under centralized control. It provides a unified abstraction layer that connects multiple AI model providers behind a single interface, so switching between providers requires no application code changes. The platform also includes a retrieval-augmented generation engine that indexes internal documents into vector stores and retrieves relevant context at query time to ground generative responses in proprietary data. What distinguishes the platform is its combination of visual workflow design and structured tool-calling in a single system. A drag-and-drop canvas lets operators construct multi-step AI pipelines from components and execute them with real-time streaming output, while a typed tool-calling protocol defines how agents invoke external functions with parameter validation and result parsing. The platform also provides a framework for defining custom tools that agents can call when interacting with external systems and data sources. Supporting capabilities include building and querying knowledge bases, integrating third-party AI platforms, and automating workflows that chain tool calls, agents, and conditional logic into repeatable sequences.
This platform provides a comprehensive RAG engine for indexing and querying private documents alongside multi-agent orchestration, making it a complete framework for building complex, document-grounded AI workflows.
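The supervisor-based delegation pattern can be reduced to a small sketch: a central object assigns subtasks to named agents, runs them, and merges the results. The `Supervisor` class and the plain-function agents below are hypothetical and are not the platform's API; in a real system the plan would typically be produced by a language model rather than passed in by hand.

```python
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]  # an agent maps a subtask description to a result


class Supervisor:
    """Central controller that delegates subtasks and merges their outputs."""

    def __init__(self, agents: Dict[str, Agent]) -> None:
        self.agents = agents

    def run(self, plan: List[Tuple[str, str]]) -> str:
        """Execute (agent_name, subtask) pairs in order and join the results."""
        results: List[str] = []
        for agent_name, subtask in plan:
            if agent_name not in self.agents:
                raise KeyError(f"no agent registered for {agent_name!r}")
            results.append(self.agents[agent_name](subtask))
        return "\n".join(results)
```

Keeping the merge step inside the supervisor is what gives the pattern its centralized control: individual agents never talk to each other directly.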
LangBot is an orchestration platform designed for building, managing, and deploying AI agents. It functions as a comprehensive framework for integrating large language models with custom workflows, enabling developers to connect intelligent agents to various messaging platforms and external tools. The platform distinguishes itself through a modular, plugin-based architecture that allows for the extension of agent capabilities via custom tools and file parsers. It features a secure, sandbox-isolated runtime environment that executes untrusted code and plugin logic within resource-constrained containers, ensuring system stability and security. Additionally, it provides a robust retrieval-augmented generation pipeline that handles document ingestion, semantic indexing, and vector-based knowledge retrieval to ground AI responses in private data. Beyond its core orchestration capabilities, the system supports multi-platform bot management, allowing for centralized configuration and deployment across services like Slack, Discord, Telegram, and WeChat. It includes extensive tooling for pipeline automation, event-driven message processing, and observability, providing visibility into agent reasoning and tool execution. The platform is designed for containerized deployment and includes built-in support for managing public webhooks and service proxies to simplify external connectivity.
LangBot is a comprehensive orchestration platform that natively includes a RAG pipeline for document ingestion, semantic indexing, and vector-based retrieval, making it a complete solution for querying private document collections.
FastGPT is a comprehensive platform for building, deploying, and managing context-aware artificial intelligence applications. It provides a unified environment that integrates custom data sources with language models, utilizing a retrieval-augmented generation engine to ground responses in accurate, domain-specific information. The system is designed for enterprise-scale use, featuring multi-tenant architecture, administrative controls, and secure authentication protocols including OAuth 2.0 and custom single sign-on integration. The platform distinguishes itself through a visual, node-based workflow orchestrator that allows users to design complex business logic and automated task sequences without manual coding. It offers sophisticated knowledge base management, supporting multi-vector data mapping, hybrid search fusion, and automated website content synchronization. To ensure high-quality outputs, the system includes tools for search query optimization, result reranking, and automated performance evaluation, allowing developers to score and analyze the accuracy of their applications across multiple iterations. Beyond its core generation and retrieval capabilities, the platform provides extensive utilities for data handling and organizational management. This includes intelligent parsing of complex document formats, flexible search modes, and granular access controls for team management. Users can also leverage secure, sandboxed rendering for rich content and export cited documents for offline review, ensuring a complete lifecycle for production-ready AI services.
FastGPT is a comprehensive RAG platform that provides built-in document parsing, vector database integration, and a visual workflow orchestrator for managing LLM-based interactions with private data.
Vespa is a distributed search engine, vector database, and machine learning ranking engine. It serves as an AI search platform designed to handle large-scale document indexing and complex query processing across a cluster of nodes, combining keyword retrieval with high-dimensional embedding storage for semantic similarity search. The platform distinguishes itself by integrating machine learning models directly into the search pipeline to perform real-time inference and ranking. It converts these models into ranking expressions to score and order results based on relevance, while providing a specialized big data indexing pipeline to transform and cleanse raw documents. The system covers a broad surface of capabilities, including linguistic text analysis, distributed data indexing, and automated cluster management. It utilizes a modular runtime to coordinate application components and a subscription-based distribution system to synchronize configuration and feature flags across the cluster. The project is implemented primarily in Java and provides tools for packaging code into modular deployment bundles.
Vespa is a high-performance distributed search and vector database engine that provides the core infrastructure for indexing, semantic search, and ranking required to build a robust RAG pipeline.
Rig is a framework for building large language model applications, featuring a multi-provider client and a workflow builder for retrieval-augmented generation systems. It serves as an orchestrator for creating autonomous agents that can maintain conversation state and execute complex tasks through custom prompting and plugins. The project provides standardized interfaces for both completion and embedding model providers, allowing for unified request and response patterns across different engines. It also includes a vector database integration layer that defines a common interface for indexing and retrieving high-dimensional embeddings across various storage backends. Its broader capabilities cover generative AI workflows for multimedia content production and tools for unstructured data extraction, including sentiment analysis and text classification. The framework supports modular composition, enabling the integration of third-party plugins and custom provider implementations.
Rig is a Rust-based framework that provides the necessary abstractions for LLM orchestration, vector database integration, and retrieval-augmented generation pipelines, making it a suitable tool for building custom RAG systems.