awesome-repositories.com
© 2026 Bringes Technology SRL · VAT RO45896025 · hello@bringes.io
Document Parsers · Awesome GitHub Repositories

3 repos

Tools and utilities for extracting and chunking text content from various file formats for indexing.

Distinguishing note: Focuses on the extraction and chunking phase of data ingestion, distinct from general file storage.

Explore 3 awesome GitHub repositories in Data & Databases · Document Parsers.

Awesome Document Parsers GitHub Repositories

  • run-llama/llama_index

    47,075 · View on GitHub ↗

    LlamaIndex is a comprehensive development framework designed to connect private or external data sources to large language models. It functions as a data-centric toolkit that enables the construction of retrieval-augmented generation systems, allowing developers to build applications that provide context-aware answers based on specific organizational information. The project distinguishes itself through a robust agentic orchestration engine that supports the creation of autonomous agents capable of multi-step reasoning, memory management, and complex tool execution. Beyond simple retrieval, …

    LlamaIndex parses spreadsheet files into structured table regions and metadata by uploading files, initiating extraction jobs, and downloading the resulting data files.

    Python · agents · application · data
  • zhayujie/chatgpt-on-wechat

    41,334 · View on GitHub ↗

    This project is an autonomous agent framework designed to integrate large language models with popular messaging platforms. It functions as a middleware platform that enables automated, multimodal interactions by decomposing complex user goals into sequential plans, executing them through external tools, and maintaining persistent context across sessions. The framework distinguishes itself through a modular skill architecture and a hybrid memory system. Users can extend system capabilities by installing custom logic modules from community hubs or generating them through natural language. The …

    The agent framework gives access to text, images, and PDF documents, supplying the context needed for system tasks and user queries.

    Python · ai · ai-agent · chatgpt
  • QuivrHQ/quivr

    38,938 · View on GitHub ↗

    Quivr is a retrieval-augmented generation platform designed to transform raw documents into searchable knowledge bases. It functions as a centralized environment where users can ingest files, index them into vector databases, and interact with language models to receive contextually relevant, data-backed responses. The platform distinguishes itself through an agentic workflow orchestrator that sequences retrieval tasks, tool execution, and model interactions to resolve complex, multi-step queries. This engine is entirely configuration-driven, allowing users to define document ingestion, chunk…

    Quivr converts PDF files into smaller, manageable text chunks using dedicated processors, which facilitates efficient indexing and retrieval within the system.

    Python · ai · api · chatbot
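
The chunking step this category describes (splitting extracted text into smaller pieces for indexing) can be illustrated with a minimal sketch. This is not the implementation used by any repository listed here; the function name `chunk_text` and its `chunk_size` and `overlap` parameters are hypothetical, and the fixed-size character window is only one of several possible chunking strategies.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of at most chunk_size.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one neighbouring chunk.
    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    # Text already extracted from a document by some parser (assumed).
    sample = "abcdefghij"
    print(chunk_text(sample, chunk_size=4, overlap=1))
```

A real ingestion pipeline would usually split on sentence or token boundaries rather than raw characters, but the overlap idea carries over unchanged.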