Ragflow

This project is a comprehensive retrieval-augmented generation platform designed for building, managing, and deploying knowledge-based AI applications. It provides a unified environment for organizing datasets, configuring conversational chat assistants, and developing autonomous agents that execute multi-step reasoning workflows. By integrating document intelligence with advanced retrieval pipelines, the platform enables the creation of grounded, verifiable responses supported by traceable citations.

The platform distinguishes itself through deep document understanding and sophisticated knowledge orchestration. It supports complex document parsing, including the extraction of tables and images, and utilizes graph-based indexing to enhance reasoning over large document collections. Users can configure multiple recall strategies and fused re-ranking to optimize retrieval accuracy, while the system maintains context through multi-turn dialogue management and flexible tool-use frameworks.

The architecture is built on a modular, containerized microservice foundation that supports both local inference engines and external language model APIs. It includes asynchronous task processing for document ingestion and indexing, ensuring system responsiveness during heavy workloads. The platform also provides a standardized interface for model abstraction, allowing for seamless integration with existing language model ecosystems.

Developers can interact with the platform through a comprehensive suite of RESTful endpoints and Python client libraries, which cover the full lifecycle of agents, datasets, and knowledge graphs. The system is designed for flexible deployment, offering configurable environment settings and support for custom containerized environments to facilitate local development and infrastructure portability.

Features

Autonomous Agents - Integrates large language models with custom knowledge bases and external tools to execute complex, multi-step autonomous workflows.
Chat Assistants - Exposes API endpoints for creating and managing conversational AI assistant instances.
Retrieval-Augmented Generation Platforms - Delivers a comprehensive environment for building, managing, and deploying knowledge-based AI applications with advanced document parsing and retrieval capabilities.
Grounded Answer Generation - Generates responses with traceable citations and visual chunking to reduce hallucinations and facilitate human verification of content.
RAG Pipelines - Coordinates multi-stage recall, re-ranking, and citation-based generation to produce grounded, verifiable responses from indexed datasets.
Conversational and Voice Interaction - Enables the deployment of conversational agents that leverage indexed knowledge bases to provide context-aware, human-like interactions.
Agent Management APIs - Handles the lifecycle of autonomous agents through dedicated API endpoints for listing, managing, and interacting with system entities.
Document Knowledge Extraction - Processes unstructured data using deep document understanding to extract structured knowledge for high-quality information retrieval.
Knowledge Dataset Managers - Organizes knowledge by uploading, parsing, and indexing documents into structured datasets for retrieval-augmented generation.
Semantic Search Engines - Executes semantic searches across indexed datasets to retrieve relevant information and document snippets for answering complex queries.
Agentic Tool-Use Frameworks - Empowers agents to perform multi-step reasoning tasks by bridging internal memory with external tools and knowledge sources.
OpenAI-Compatible APIs - Standardizes HTTP endpoints for chat completions to ensure compatibility with common AI model integration interfaces.
Chat Assistant Management APIs - Provides API endpoints for listing, filtering, and retrieving metadata about configured chat assistants.
Knowledge Graph Construction - Automates the construction of knowledge graph structures from datasets via dedicated API endpoints.
Document Chunking Strategies - Segments source documents using explainable templates to optimize retrieval accuracy during knowledge base indexing.
Chat and API Access - Maintains multi-turn dialogue context and streams model outputs via interactive chat interfaces and programmatic endpoints.
Graph-Based Knowledge Indexers - Builds hierarchical summaries and multi-layered knowledge graphs to enhance reasoning over extensive document collections.
Local LLM Configurations - Configures local inference engines and external model providers through a unified interface for seamless deployment.
Knowledge Graph APIs - Facilitates the retrieval and management of dataset-specific knowledge graph structures through dedicated API endpoints.
Document Parsing Services - Offers programmatic methods to asynchronously parse and extract content from various document types for further processing.
AI-Powered Extraction Engines - Employs machine learning to accurately isolate structured data, tables, and text from complex document layouts for retrieval.
Source-Based Execution Environments - Supports running the platform directly from source code to facilitate real-time debugging and local development testing.
Configuration Management - Adjusts environment parameters to manage application behavior, resource allocation, and backend operations.
System Service Configurations - Utilizes YAML templates to define core platform service dependencies, including database connections, object storage, and authentication providers.
Orchestration and Multi-Agent Systems - Deploys autonomous agents that leverage memory, tools, and knowledge to complete complex, multi-step reasoning workflows.
Development Frameworks and Tools - Open-source RAG (Retrieval-Augmented Generation) workflow platform.
Document & Data Assistants - RAG engine based on deep document understanding.
Knowledge Retrieval - RAG engine based on deep document understanding.
RAG and Data Pipelines - RAG engine fusing retrieval with agentic capabilities.
RAG Applications - Open-source RAG engine based on deep document understanding.
Retrieval Augmented Generation - Engine focused on deep document understanding for retrieval.
Databases and RAG - RAG engine based on deep document understanding.
RAG Frameworks - Deep document parsing and RAG engine with multi-path retrieval.
Knowledge Graph Orchestrators - Combines relational data structures with vector-based search to improve context-aware response generation.
Document Parsing Pipelines - Parses diverse file formats into structured text chunks using advanced layout analysis and OCR for downstream analysis.
OpenAI-Compatible Inference Servers - Provides an API layer compatible with standard interfaces to ensure interoperability with existing large language model ecosystems.
Dataset Management APIs - Modifies dataset configurations programmatically through dedicated administrative endpoints.
RESTful APIs - Exposes core platform functionality through a comprehensive suite of HTTP interfaces for external application integration.
Knowledge Graph Deletion - Purges relational knowledge graph structures associated with specific datasets via targeted API calls.
Semantic Parsing Tools - Interprets complex documents to produce traceable, cited answers that reduce hallucinations in retrieval tasks.
Document Management APIs - Updates document configurations and storage settings programmatically through a structured management interface.
Document Retrieval APIs - Queries and lists documents within datasets using flexible filtering, pagination, and sorting parameters.
Document Deletion APIs - Deletes stored documents programmatically to maintain clean and updated knowledge repositories.
Document Parsing Controls - Terminates active document parsing tasks through specific API endpoints to manage resource utilization.
Automated Document Ingestion - Uploads and transforms various file types into structured knowledge base entries through automated ingestion routines.
Python SDKs - Offers a native library for Python developers to interact with platform services and manage retrieval workflows.
RAG Pipeline Optimizers - Tunes batch processing, OCR engines, and parsing services to minimize latency in retrieval-augmented generation pipelines.
Chat Management APIs - Manages chat assistant lifecycles, including creation, updates, and deletion, via dedicated API endpoints.
Dataset Management - Removes datasets from the system by their unique identifiers using simple administrative commands.

Star history

infiniflowragflow

Name: infiniflow/ragflow
Author: infiniflow

View on GitHub

82,922 stars9,577 forksPythonApache-2.021 viewsragflow.io

Ragflow

Features

Autonomous Agents - Integrates large language models with custom knowledge bases and external tools to execute complex, multi-step autonomous workflows.
Chat Assistants - Exposes API endpoints for creating and managing conversational AI assistant instances.
Retrieval-Augmented Generation Platforms - Delivers a comprehensive environment for building, managing, and deploying knowledge-based AI applications with advanced document parsing and retrieval capabilities.
Grounded Answer Generation - Generates responses with traceable citations and visual chunking to reduce hallucinations and facilitate human verification of content.
RAG Pipelines - Coordinates multi-stage recall, re-ranking, and citation-based generation to produce grounded, verifiable responses from indexed datasets.
Conversational and Voice Interaction - Enables the deployment of conversational agents that leverage indexed knowledge bases to provide context-aware, human-like interactions.
Agent Management APIs - Handles the lifecycle of autonomous agents through dedicated API endpoints for listing, managing, and interacting with system entities.
Document Knowledge Extraction - Processes unstructured data using deep document understanding to extract structured knowledge for high-quality information retrieval.
Knowledge Dataset Managers - Organizes knowledge by uploading, parsing, and indexing documents into structured datasets for retrieval-augmented generation.
Semantic Search Engines - Executes semantic searches across indexed datasets to retrieve relevant information and document snippets for answering complex queries.
Agentic Tool-Use Frameworks - Empowers agents to perform multi-step reasoning tasks by bridging internal memory with external tools and knowledge sources.
OpenAI-Compatible APIs - Standardizes HTTP endpoints for chat completions to ensure compatibility with common AI model integration interfaces.
Chat Assistant Management APIs - Provides API endpoints for listing, filtering, and retrieving metadata about configured chat assistants.
Knowledge Graph Construction - Automates the construction of knowledge graph structures from datasets via dedicated API endpoints.
Document Chunking Strategies - Segments source documents using explainable templates to optimize retrieval accuracy during knowledge base indexing.
Chat and API Access - Maintains multi-turn dialogue context and streams model outputs via interactive chat interfaces and programmatic endpoints.
Graph-Based Knowledge Indexers - Builds hierarchical summaries and multi-layered knowledge graphs to enhance reasoning over extensive document collections.
Local LLM Configurations - Configures local inference engines and external model providers through a unified interface for seamless deployment.
Knowledge Graph APIs - Facilitates the retrieval and management of dataset-specific knowledge graph structures through dedicated API endpoints.
Document Parsing Services - Offers programmatic methods to asynchronously parse and extract content from various document types for further processing.
AI-Powered Extraction Engines - Employs machine learning to accurately isolate structured data, tables, and text from complex document layouts for retrieval.
Source-Based Execution Environments - Supports running the platform directly from source code to facilitate real-time debugging and local development testing.
Configuration Management - Adjusts environment parameters to manage application behavior, resource allocation, and backend operations.
System Service Configurations - Utilizes YAML templates to define core platform service dependencies, including database connections, object storage, and authentication providers.
Orchestration and Multi-Agent Systems - Deploys autonomous agents that leverage memory, tools, and knowledge to complete complex, multi-step reasoning workflows.
Development Frameworks and Tools - Open-source RAG (Retrieval-Augmented Generation) workflow platform.
Document & Data Assistants - RAG engine based on deep document understanding.
Knowledge Retrieval - RAG engine based on deep document understanding.
RAG and Data Pipelines - RAG engine fusing retrieval with agentic capabilities.
RAG Applications - Open-source RAG engine based on deep document understanding.
Retrieval Augmented Generation - Engine focused on deep document understanding for retrieval.
Databases and RAG - RAG engine based on deep document understanding.
RAG Frameworks - Deep document parsing and RAG engine with multi-path retrieval.
Knowledge Graph Orchestrators - Combines relational data structures with vector-based search to improve context-aware response generation.
Document Parsing Pipelines - Parses diverse file formats into structured text chunks using advanced layout analysis and OCR for downstream analysis.
OpenAI-Compatible Inference Servers - Provides an API layer compatible with standard interfaces to ensure interoperability with existing large language model ecosystems.
Dataset Management APIs - Modifies dataset configurations programmatically through dedicated administrative endpoints.
RESTful APIs - Exposes core platform functionality through a comprehensive suite of HTTP interfaces for external application integration.
Knowledge Graph Deletion - Purges relational knowledge graph structures associated with specific datasets via targeted API calls.
Semantic Parsing Tools - Interprets complex documents to produce traceable, cited answers that reduce hallucinations in retrieval tasks.
Document Management APIs - Updates document configurations and storage settings programmatically through a structured management interface.
Document Retrieval APIs - Queries and lists documents within datasets using flexible filtering, pagination, and sorting parameters.
Document Deletion APIs - Deletes stored documents programmatically to maintain clean and updated knowledge repositories.
Document Parsing Controls - Terminates active document parsing tasks through specific API endpoints to manage resource utilization.
Automated Document Ingestion - Uploads and transforms various file types into structured knowledge base entries through automated ingestion routines.
Python SDKs - Offers a native library for Python developers to interact with platform services and manage retrieval workflows.
RAG Pipeline Optimizers - Tunes batch processing, OCR engines, and parsing services to minimize latency in retrieval-augmented generation pipelines.
Chat Management APIs - Manages chat assistant lifecycles, including creation, updates, and deletion, via dedicated API endpoints.
Dataset Management - Removes datasets from the system by their unique identifiers using simple administrative commands.

Open-source alternatives to Ragflow

Similar open-source projects, ranked by how many features they share with Ragflow.

cinnamon/kotaemon
Cinnamon/kotaemon
25,139View on GitHub
Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document processing, retrieval-augmented generation, and multi-step reasoning. It provides a comprehensive platform for developing document-based question answering systems, allowing users to chain language models, prompt templates, and external tools into complex, automated pipelines. The system distinguishes itself through a highly modular architecture that emphasizes component-based composition and schema-driven data exchange. It supports autonomous agents capable of decomposing complex q
Pythonchatbotllmsopen-source
View on GitHub25,139
marker-inc-korea/autorag
Marker-Inc-Korea/AutoRAG
4,833View on GitHub
AutoRAG is an automation layer and optimization tool for retrieval-augmented generation. It provides a framework for measuring pipeline performance through an evaluation system and an automated search strategy that identifies the most effective combinations of retrieval and generation modules. The system distinguishes itself through AutoML-style optimization, using hyperparameter grid searches and automated trials to find the highest performing architectural configuration for a specific dataset. It includes a specialized dataset generator that creates synthetic question-answer pairs and groun
Python
View on GitHub4,833
llmware-ai/llmware
llmware-ai/llmware
14,838View on GitHub
llmware is a Python framework for AI agent orchestration and model management, designed to coordinate multi-model workflows and autonomous agents. It provides a unified model catalog and standardized interface to execute specialized language models for complex research, analysis, and structured data generation. The project distinguishes itself through its heavy emphasis on local execution and quantized inference, allowing models to run on private infrastructure using CPU, GPU, and NPU acceleration via runtimes like ONNX and OpenVino. It features a specialized ability to translate natural lang
Python
View on GitHub14,838
mastra-ai/mastra
mastra-ai/mastra
21,221View on GitHub
Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention. The framework distinguishes itself through its focus on observability and secure, isolated execut
TypeScriptagentsaichatbots
View on GitHub21,221

See all 30 alternatives to Ragflow

Frequently asked questions

What does infiniflow/ragflow do?

What are the main features of infiniflow/ragflow?

The main features of infiniflow/ragflow are: Autonomous Agents, Chat Assistants, Retrieval-Augmented Generation Platforms, Grounded Answer Generation, RAG Pipelines, Conversational and Voice Interaction, Agent Management APIs, Document Knowledge Extraction.

What are some open-source alternatives to infiniflow/ragflow?

Open-source alternatives to infiniflow/ragflow include: cinnamon/kotaemon — Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document… marker-inc-korea/autorag — AutoRAG is an automation layer and optimization tool for retrieval-augmented generation. It provides a framework for… llmware-ai/llmware — llmware is a Python framework for AI agent orchestration and model management, designed to coordinate multi-model… mastra-ai/mastra — Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and… mintplex-labs/anything-llm — This platform serves as a comprehensive environment for managing private language models, document knowledge bases,… microsoft/graphrag — GraphRAG is a data processing pipeline and retrieval engine designed to transform unstructured text into…

Ragflow

Features

Star history

Ragflow

Features

Open-source alternatives to Ragflow

Cinnamon/kotaemon

Marker-Inc-Korea/AutoRAG

llmware-ai/llmware

mastra-ai/mastra

Frequently asked questions

Star history

Frequently asked questions

Open-source alternatives to Ragflow

Cinnamon/kotaemon

Marker-Inc-Korea/AutoRAG

llmware-ai/llmware

mastra-ai/mastra