# future-house/paper-qa

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/future-house-paper-qa).**

8,161 stars · 825 forks · Python · apache-2.0

## Links

- GitHub: https://github.com/Future-House/paper-qa
- Homepage: https://futurehouse.gitbook.io/futurehouse-cookbook
- awesome-repositories: https://awesome-repositories.com/repository/future-house-paper-qa.md

## Topics

`ai` `rag` `science` `search`

## Description

paper-qa is a retrieval-augmented generation (RAG) system for question answering over scientific literature and technical documents. It acts as an LLM-powered research assistant that extracts grounded, cited answers and summaries from a document library.

An agentic RAG orchestrator iteratively refines search queries and gathers evidence through multi-step tool calling. A multimodal document parser extracts text, tables, and images from PDFs, and a vector-based indexer embeds and caches document libraries for efficient semantic search.
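The caching idea in the indexer can be illustrated with a minimal sketch. This is not paper-qa's implementation: `toy_embed` and `EmbeddingCache` are hypothetical names, and the hashed bag-of-words function stands in for a real embedding model. The point is only that keying embeddings by a content hash lets unchanged text skip re-embedding.

```python
import hashlib
from typing import Callable


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words."""
    vec = [0.0] * dims
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dims] += 1.0
    return vec


class EmbeddingCache:
    """Caches embeddings by content hash so identical text is embedded once."""

    def __init__(self, embed_fn: Callable[[str], list[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: dict[str, list[float]] = {}
        self.misses = 0  # how many times the embedding function was called

    def get(self, text: str) -> list[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


cache = EmbeddingCache(toy_embed)
chunks = [
    "protein folding dynamics",
    "graph neural networks",
    "protein folding dynamics",  # duplicate chunk, served from the cache
]
vectors = [cache.get(chunk) for chunk in chunks]
print(cache.misses)  # 2: only the unique chunks were embedded
```

A real index would persist the store to disk and would also need to invalidate entries when the embedding model changes; both are left out of this sketch.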

Capabilities include contradiction detection across multiple papers, automated bibliographic metadata retrieval, and integration with locally hosted language models. The project manages the end-to-end workflow, from multi-format document ingestion through two-stage vector retrieval to grounded answer generation.
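Two-stage retrieval can be sketched as follows, under stated assumptions. The function `retrieve_then_rerank`, the hand-written 2-D vectors, and the term-overlap scorer are all illustrative and are not taken from paper-qa; in a real system the second stage would be a more expensive model-based relevance scorer. The sketch shows the general pattern: a cheap vector search narrows the pool, then a second scorer reorders only the surviving candidates.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query_vec: list[float],
    query_terms: set[str],
    passages: list[tuple[str, list[float]]],
    k: int = 3,
    top_n: int = 2,
) -> list[str]:
    # Stage 1: cheap vector search keeps the k nearest passages.
    candidates = sorted(
        passages, key=lambda p: cosine(query_vec, p[1]), reverse=True
    )[:k]

    # Stage 2: re-score only those candidates.
    # Toy scorer (assumption): number of query terms present in the passage.
    def rerank_score(text: str) -> int:
        return len(set(text.lower().split()) & query_terms)

    reranked = sorted(candidates, key=lambda p: rerank_score(p[0]), reverse=True)
    return [text for text, _ in reranked[:top_n]]


passages = [
    ("aspirin reduces fever in adults", [0.9, 0.1]),
    ("aspirin dosage for fever and pain", [0.8, 0.2]),
    ("galaxy formation in the early universe", [0.1, 0.9]),
    ("fever is a common symptom", [0.7, 0.3]),
]
top = retrieve_then_rerank([1.0, 0.0], {"aspirin", "dosage", "fever"}, passages)
print(top)
# ['aspirin dosage for fever and pain', 'aspirin reduces fever in adults']
```

Note that the passage ranked second by vector similarity ends up first after re-ranking, which is the reason for having a second stage at all.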

Configuration options cover provider-agnostic model routing, prompt template customization, and rate limit management for API calls.
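As a rough illustration of what rate limit management involves, the sketch below implements a generic sliding-window limiter. It is an assumption-laden example, not paper-qa's mechanism or configuration format: `SlidingWindowLimiter` and its parameters are hypothetical, and the clock is injectable so the behaviour can be shown without real waiting.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allows at most `max_calls` within any window of `period` seconds."""

    def __init__(
        self,
        max_calls: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._stamps: deque[float] = deque()  # timestamps of accepted calls

    def try_acquire(self) -> bool:
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False


fake_now = [0.0]  # mutable fake clock for a deterministic demonstration
limiter = SlidingWindowLimiter(max_calls=2, period=60.0, clock=lambda: fake_now[0])

results = [limiter.try_acquire() for _ in range(3)]  # third call exceeds the budget
fake_now[0] = 61.0  # advance past the window
results.append(limiter.try_acquire())
print(results)  # [True, True, False, True]
```

A caller that receives `False` would typically sleep and retry, or queue the request; which policy fits depends on the provider's limits.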

## Tags

### Artificial Intelligence & ML

- [RAG Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-orchestration/retrieval-augmented-generation/rag-pipelines.md) — Orchestrates an agentic RAG pipeline that iteratively refines queries and gathers evidence across document libraries.
- [Academic Paper Summarizations](https://awesome-repositories.com/f/artificial-intelligence-ml/academic-paper-summarizations.md) — Generates contextual summaries of research papers by using re-ranking to identify the most relevant information. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Agentic Orchestrators](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-orchestrators.md) — Manages an iterative workflow that refines search queries and executes tools for high-accuracy information synthesis.
- [Local Document Indexing](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-rag-development/knowledge-base-retrieval/local-document-indexing.md) — Processes local files into searchable vector stores to enable fast semantic retrieval for RAG workflows.
- [Iterative Refinement Workflows](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/agentic-workflows/iterative-refinement-workflows.md) — Implements iterative feedback loops between agents to refine search queries and improve the accuracy of retrieved evidence.
- [Document Indexing](https://awesome-repositories.com/f/artificial-intelligence-ml/document-indexing.md) — Builds and stores searchable indexes of document directories to accelerate repeated queries in RAG workflows. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Grounded Answer Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/generative-ai/grounded-answer-generation.md) — Generates accurate responses supported by traceable in-text citations and source verification from a document library. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Multimodal Document Processing](https://awesome-repositories.com/f/artificial-intelligence-ml/multimodal-document-processing.md) — Provides a multimodal processing pipeline to extract text, tables, and images from PDFs for LLM consumption.
- [Question Answering Systems](https://awesome-repositories.com/f/artificial-intelligence-ml/question-answering-systems.md) — Provides a system for retrieving relevant information from scientific literature to generate high-accuracy answers. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Automated Research Paper Analysis](https://awesome-repositories.com/f/artificial-intelligence-ml/research-papers/automated-research-paper-analysis.md) — Extracts grounded answers and summaries from research papers with citations to ensure technical accuracy.
- [Retrieval Re-ranking](https://awesome-repositories.com/f/artificial-intelligence-ml/retrieval-re-ranking.md) — Employs a two-stage process that first retrieves candidate passages via vectors and then applies a re-ranking model for precision.
- [Model Provider Adapters](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-tooling/language-model-integrations/model-provider-adapters.md) — Provides unified interfaces and adapters to decouple reasoning logic from specific LLM and embedding API providers.
- [Cross-Document Contradiction Detection](https://awesome-repositories.com/f/artificial-intelligence-ml/cross-document-contradiction-detection.md) — Identifies conflicting claims across multiple research papers by evaluating specific statements against the literature. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Local Model Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/local-model-integrations.md) — Integrates with locally hosted language models for text generation and summarization to ensure data privacy.
- [Model Provider Configurations](https://awesome-repositories.com/f/artificial-intelligence-ml/model-provider-configurations.md) — Allows configuration of specific model and embedding providers for reasoning, summarization, and vectorization tasks. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Prompt Templates](https://awesome-repositories.com/f/artificial-intelligence-ml/prompt-templates.md) — Enables overriding of default system prompts and processing steps to customize output style and logic. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))

### Part of an Awesome List

- [Question Answering](https://awesome-repositories.com/f/awesome-lists/ai/question-answering.md) — Provides a retrieval-augmented generation system that extracts grounded answers and citations from scientific literature and technical documents.

### Business & Productivity Software

- [AI-Powered Research Assistants](https://awesome-repositories.com/f/business-productivity-software/knowledge-content-creation/research-assistance-tools/ai-powered-research-assistants.md) — Functions as an AI research assistant that summarizes academic papers and detects contradictions across documents.

### Data & Databases

- [Multimodal Document Ingestion](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/data-ingestion/multimodal-document-ingestion.md) — Uses vision-based processing to parse complex layouts, tables, and figures from PDFs for better retrieval.
- [Multi-Format Document Ingestion](https://awesome-repositories.com/f/data-databases/multi-format-document-ingestion.md) — Ingests and normalizes various file types including PDFs, text, markdown, and office documents for AI analysis. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Local Knowledge Base Indexers](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing/local-knowledge-base-indexers.md) — Parses local files and technical documents into a searchable database for semantic retrieval. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Vector Indexing](https://awesome-repositories.com/f/data-databases/vector-indexing.md) — Implements a vector-based indexing system for embedding and caching document libraries to enable semantic search.
- [Vector Embedding Indexes](https://awesome-repositories.com/f/data-databases/vector-search/vector-embedding-indexes.md) — Creates searchable representations of text using embedding models to enable efficient semantic retrieval. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Embedding Caches](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/caching-performance/caching-strategies/query-result-caching/method-result-caches/embedding-caches.md) — Caches processed document embeddings and parsed text in local files to reduce redundant computation and API costs.
- [Bibliographic Metadata Retrievers](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/bibliographic-metadata-retrievers.md) — Automatically fetches bibliographic data and citation counts from external providers to enrich the document library. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))
- [Metadata Aggregators](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-information-retrieval/bibliographic-metadata-retrievers/metadata-aggregators.md) — Combines bibliographic data and citation licenses from multiple external sources for specific publications. ([source](https://cdn.jsdelivr.net/gh/future-house/paper-qa@main/README.md))