# zilliztech/deep-searcher

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/zilliztech-deep-searcher).**

7,899 stars · 763 forks · Python · Apache-2.0

## Links

- GitHub: https://github.com/zilliztech/deep-searcher
- Homepage: https://zilliztech.github.io/deep-searcher/
- awesome-repositories: https://awesome-repositories.com/repository/zilliztech-deep-searcher.md

## Topics

`agent` `agentic-rag` `claude` `deep-research` `deepseek` `deepseek-r1` `grok` `grok3` `llama4` `llm` `milvus` `openai` `qwen3` `rag` `reasoning-models` `vector-database` `zilliz`

## Description

Deep Searcher is an open-source retrieval-augmented generation (RAG) engine that indexes private documents into a vector database and uses large language models to answer complex questions with cited reasoning. It is available as both a command-line tool and a web API, letting users load data and generate comprehensive reports that combine indexed private information with LLM-powered analysis.
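The index-then-retrieve flow described above can be illustrated with a minimal, self-contained sketch. It is not Deep Searcher's code: the `embed` function is a toy bag-of-words stand-in for a real embedding model, `InMemoryIndex` is a hypothetical in-memory substitute for a vector database such as Milvus, and `build_prompt` only shows how retrieved chunks might be numbered so that an LLM can cite them.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Hypothetical in-memory substitute for a vector database."""

    def __init__(self):
        self.entries = []  # list of (source, chunk, vector)

    def add(self, source, chunk):
        self.entries.append((source, chunk, embed(chunk)))

    def search(self, query, top_k=2):
        """Return the top_k (source, chunk) pairs most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(source, chunk) for source, chunk, _ in ranked[:top_k]]


def build_prompt(question, hits):
    """Number the retrieved chunks so an LLM answer can cite them."""
    context = "\n".join(
        f"[{i}] ({source}) {chunk}" for i, (source, chunk) in enumerate(hits, 1)
    )
    return f"Answer using only the numbered sources.\n{context}\nQuestion: {question}"


index = InMemoryIndex()
index.add("notes.txt", "milvus stores vectors for similarity search")
index.add("pets.txt", "cats like fish")
hits = index.search("where are vectors stored", top_k=1)
prompt = build_prompt("where are vectors stored", hits)
```

In a real deployment the prompt would be sent to an LLM provider and the numbered sources would back the citations in the generated report.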

Its distinguishing feature is a plugin-based provider architecture in which embedding models, LLM providers, vector databases, and file loaders are interchangeable components. It also offers multi-LLM orchestration, coordinating several large language model services by routing requests and aggregating responses, together with configurable embedding pipelines and vector database retrieval for similarity search.
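One common way to build such a plugin-based provider layer is a registry keyed by component kind and provider name. The sketch below is an illustration of that pattern only, under stated assumptions: `register`, `create`, `EchoLLM`, `UpperLLM`, and the `{"provider": ..., "config": ...}` dictionary shape are all hypothetical names chosen for this example and are not taken from Deep Searcher's source.

```python
from typing import Callable, Dict

# kind -> provider name -> class; every name here is hypothetical.
_REGISTRY: Dict[str, Dict[str, Callable[..., object]]] = {}


def register(kind, name):
    """Class decorator that records a provider under (kind, name)."""
    def decorator(cls):
        _REGISTRY.setdefault(kind, {})[name] = cls
        return cls
    return decorator


@register("llm", "echo")
class EchoLLM:
    """Dummy LLM provider that echoes the prompt with a prefix."""

    def __init__(self, prefix="echo: "):
        self.prefix = prefix

    def chat(self, prompt):
        return self.prefix + prompt


@register("llm", "upper")
class UpperLLM:
    """Dummy LLM provider that upper-cases the prompt."""

    def chat(self, prompt):
        return prompt.upper()


def create(kind, config):
    """Instantiate the provider named in config, passing its options through."""
    name = config["provider"]
    try:
        cls = _REGISTRY[kind][name]
    except KeyError:
        raise ValueError(f"unknown {kind} provider: {name!r}") from None
    return cls(**config.get("config", {}))


# Swapping providers only requires changing the configuration entry.
llm = create("llm", {"provider": "echo", "config": {"prefix": "> "}})
reply = llm.chat("hi")
```

The same registry could hold `embedding`, `vector_db`, and `file_loader` kinds, so each component is selected by name in configuration rather than hard-coded.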

The project includes CLI-driven data ingestion for local documents, including PDFs and text files, and for web content fetched through web crawlers. Configuration options let users select and authenticate with embedding, LLM, vector database, file loader, and web crawler providers, and the web API service layer exposes the query and data loading functions as HTTP endpoints for programmatic access.

## Tags

### Data & Databases

- [Private Data Querying](https://awesome-repositories.com/f/data-databases/custom-data-fields/metadata-querying/conversational/private-data-querying.md) — Retrieves relevant information from enterprise internal documents by indexing them into a vector database and querying with natural language. ([source](https://cdn.jsdelivr.net/gh/zilliztech/deep-searcher@master/README.md))
- [Document Loading Commands](https://awesome-repositories.com/f/data-databases/cli-data-inspection/document-loading-commands.md) — Loads documents into a vector database by running commands in a terminal. ([source](https://cdn.jsdelivr.net/gh/zilliztech/deep-searcher@master/README.md))
- [Local Document Ingestion](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/data-ingestion/local-document-ingestion.md) — Reads files from local storage, including PDFs and text files, to ingest content into the searchable vector database. ([source](https://cdn.jsdelivr.net/gh/zilliztech/deep-searcher@master/README.md))
- [Vector-Database-Backed Retrievals](https://awesome-repositories.com/f/data-databases/database-management-systems/database-engines/vector-databases/vector-database-backed-retrievals.md) — Indexes document embeddings in a vector database and retrieves relevant chunks via similarity search for LLM context.
- [Vector-Store Augmented Generation](https://awesome-repositories.com/f/data-databases/in-memory-data-stores/vector-stores/vector-store-augmented-generation.md) — Stores document embeddings in Milvus and queries them with LLMs for grounded, cited answers.
- [Document and Web Content Indexers](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing/command-line-indexers/document-and-web-content-indexers.md) — Ingests documents and crawls web content via CLI commands, then indexes them into the vector database for later querying.
- [Vector Databases](https://awesome-repositories.com/f/data-databases/vector-databases.md) — Indexes document embeddings in Milvus for similarity search and LLM context retrieval. ([source](https://cdn.jsdelivr.net/gh/zilliztech/deep-searcher@master/README.md))
- [Web Content Fetching](https://awesome-repositories.com/f/data-databases/remote-data-fetching/cms-content-fetching/web-content-fetching.md) — Fetches and indexes content from specified URLs using configurable web crawlers for inclusion in the knowledge base. ([source](https://cdn.jsdelivr.net/gh/zilliztech/deep-searcher@master/README.md))
- [Embedding Provider Configurations](https://awesome-repositories.com/f/data-databases/vector-search/vector-embedding-indexes/embedding-provider-configurations.md) — Switches the text embedding backend to any supported provider by setting its name and model in the configuration. ([source](https://zilliztech.github.io/deep-searcher/configuration/embedding/))

### Software Engineering & Architecture

- [Research Engines](https://awesome-repositories.com/f/software-engineering-architecture/automated-refactoring-engines/llm-refactoring-engines/research-engines.md) — An open-source engine that indexes private documents and uses LLMs to answer complex questions with cited reasoning.
- [Document Format Loaders](https://awesome-repositories.com/f/software-engineering-architecture/file-based-configuration-loaders/document-format-loaders.md) — Sets the file loader provider and its options to ingest documents from various formats. ([source](https://zilliztech.github.io/deep-searcher/configuration/))
- [Plugin-Based Architectures](https://awesome-repositories.com/f/software-engineering-architecture/software-architecture/architectural-patterns/plugin-module-systems/modular-plugin-architectures/plugin-based-architectures/plugin-based-architectures.md) — Loads embedding, LLM, vector database, and file loader implementations as interchangeable plugins through a unified configuration interface.

### Artificial Intelligence & ML

- [CLI-Based Model Querying](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-querying/cli-based-model-querying.md) — Asks questions in plain language from the command line and receives answers based on indexed private data. ([source](https://cdn.jsdelivr.net/gh/zilliztech/deep-searcher@master/README.md))
- [LLM API Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-api-integrations.md) — Integrates multiple large language models such as DeepSeek and OpenAI to answer questions and generate content. ([source](https://zilliztech.github.io/deep-searcher/))
- [LLM Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-orchestration.md) — Coordinates multiple large language model providers to answer queries by routing requests and aggregating responses.
- [LLM-Powered Search Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-powered-search-interfaces.md) — Provides LLM-powered search over private documents using retrieval-augmented generation.
- [Retrieval-Augmented Report Generators](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-report-generators/retrieval-augmented-report-generators.md) — Produces comprehensive written reports by combining retrieved private data with LLM reasoning to answer complex questions. ([source](https://cdn.jsdelivr.net/gh/zilliztech/deep-searcher@master/README.md))
- [Embedding Model Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/embedding-model-configurations.md) — Converts text into vector representations using pluggable embedding model services with configurable authentication and model selection.
- [Query and Load API Serving](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serving-apis/query-and-load-api-serving.md) — Serves query and data loading functions through a web API for programmatic access. ([source](https://cdn.jsdelivr.net/gh/zilliztech/deep-searcher@master/README.md))
- [Vector Database Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-database-configurations.md) — Selects and connects to a supported vector database backend for storing and searching document embeddings. ([source](https://zilliztech.github.io/deep-searcher/configuration/))

### Part of an Awesome List

- [LLM Provider Configurations](https://awesome-repositories.com/f/awesome-lists/ai/llm-providers-and-models/llm-provider-configurations.md) — Selects and authenticates with a large language model service to power question answering and content generation. ([source](https://zilliztech.github.io/deep-searcher/configuration/))
- [Private Document Search Engines](https://awesome-repositories.com/f/awesome-lists/data/vector-databases-and-search/private-document-search-engines.md) — Enables natural language search and reasoning over private enterprise documents using RAG.
- [Multi-LLM Orchestrators](https://awesome-repositories.com/f/awesome-lists/ai/question-answering/multi-llm-orchestrators.md) — Coordinates multiple LLMs with indexed private data to produce comprehensive, cited reports.

### Web Development

- [RAG Pipeline API Services](https://awesome-repositories.com/f/web-development/api-management-tools/api-development-management/web-apis/api-service-generators/rag-pipeline-api-services.md) — Exposes query and data loading functions as HTTP endpoints for programmatic access to the RAG pipeline.
- [Research APIs](https://awesome-repositories.com/f/web-development/research-apis.md) — Exposes deep research capabilities through a CLI and web API for loading data and generating reports.
