awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Data Extraction Pipelines · Awesome GitHub Repositories

1 repo


Frameworks for transforming unstructured text and documents into structured formats suitable for machine learning models.

Distinguishing note: Focuses on the extraction of structured information from unstructured sources, distinct from general data ingestion.
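To make the category concrete, here is a minimal sketch of what "unstructured text in, structured record out" can look like. It uses only the Python standard library and a few hand-written regular expressions. The `Invoice` schema, the `PATTERNS` table, the `extract_invoice` helper, and the sample text are all hypothetical and chosen for illustration. This is not the API of any listed repository; LLM-based frameworks typically replace the regex step with a model call while keeping the same schema-in, record-out shape.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Invoice:
    """Hypothetical target schema for the extracted record."""
    invoice_id: Optional[str]
    total: Optional[float]
    due_date: Optional[str]


# Hypothetical patterns; a real pipeline would need far more robust rules
# (or a model) to cope with varied phrasing.
PATTERNS = {
    "invoice_id": re.compile(r"Invoice\s+#?(\w[\w-]*)", re.IGNORECASE),
    "total": re.compile(r"total\s+of\s+\$([\d,]+\.\d{2})", re.IGNORECASE),
    "due_date": re.compile(r"due\s+(?:on\s+)?(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
}


def extract_invoice(text: str) -> Invoice:
    """Pull each field out of free text; missing fields become None."""

    def first(name: str) -> Optional[str]:
        match = PATTERNS[name].search(text)
        return match.group(1) if match else None

    raw_total = first("total")
    return Invoice(
        invoice_id=first("invoice_id"),
        # Strip thousands separators before converting to a number.
        total=float(raw_total.replace(",", "")) if raw_total else None,
        due_date=first("due_date"),
    )


raw = (
    "Hi team, Invoice #INV-2041 has a total of $1,250.00 "
    "and is due on 2026-03-15. Thanks!"
)
record = extract_invoice(raw)
print(json.dumps(asdict(record)))
```

The structured record, rather than the raw text, is what a downstream model or pipeline would consume. Fields that cannot be found are left as `None` instead of being guessed.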

Explore 1 awesome GitHub repository matching Artificial Intelligence & ML · Data Extraction Pipelines.


Awesome Data Extraction Pipelines GitHub Repositories

  • run-llama/llama_index


    47,075 · View on GitHub↗

    LlamaIndex is a comprehensive development framework designed to connect private or external data sources to large language models. It functions as a data-centric toolkit that enables the construction of retrieval-augmented generation systems, allowing developers to build applications that provide context-aware answers based on specific organizational information. The project distinguishes itself through a robust agentic orchestration engine that supports the creation of autonomous agents capable of multi-step reasoning, memory management, and complex tool execution. Beyond simple retrieval, i

    LlamaIndex pulls specific information from unstructured documents using programmatic interfaces or web tools to convert raw text into organized formats for automated pipelines.

    Python · agents · application · data