awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Data Parsing and Extraction · Awesome GitHub Repositories

5 repos

Tools focused on identifying, isolating, and converting raw or unstructured input into structured, schema-validated formats.

Explore 5 awesome GitHub repositories matching data & databases · Data Parsing and Extraction. Refine with filters or upvote what's useful.

Awesome Data Parsing and Extraction GitHub Repositories

  • Significant-Gravitas/AutoGPT

    181,891 · GitHub

    AutoGPT is an orchestration platform designed for building, managing, and deploying autonomous agents. It provides a visual canvas-based environment where users can assemble agents by connecting modular blocks that represent actions, data flows, and conditional logic. The platform supports the entire agent lifecycle, i

    Python · ai · artificial-intelligence · autonomous-agents
  • firecrawl/firecrawl

    84,034 · GitHub

    Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveragi

    TypeScript · ai · ai-agents · ai-crawler
  • browser-use/browser-use

    78,576 · GitHub

    Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows

    Python · ai-agents · ai-tools · browser-automation
  • junegunn/fzf

    77,987 · GitHub

    This project is a general-purpose command-line filter that provides an interactive interface for processing standard input streams. It enables real-time fuzzy searching, data selection, and transformation, allowing users to navigate complex information or file systems directly within their terminal. By utilizing a pipe

    Go · bash · cli · fish
  • hoppscotch/hoppscotch

    77,888 · GitHub

    Hoppscotch is an open-source API development ecosystem designed for building, testing, and debugging REST, GraphQL, and real-time APIs. It provides a unified platform that functions across web browsers, desktop applications, and command-line interfaces, allowing developers to manage the entire API lifecycle from a sing

    TypeScript · api · api-client · api-rest

Explore sub-tags

  • Delimiter-based Parsers: Parsers that process data chunks by utilizing specific characters or bytes as delimiters.
  • Field Extractors: Tools that extract specific fields from data items using index expressions.
  • LLM-Driven Data Extractors: Extractors that leverage large language models to transform unstructured content into structured formats.
  • Schema Parsers: Parsers that normalize diverse external API definitions into a consistent internal representation.
  • Typed Data Extraction: Utilities for parsing unstructured inputs into specific, typed data fields.
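
To make three of the sub-tags above concrete, here is a minimal Python sketch of how delimiter-based parsing, index-based field extraction, and typed extraction can chain together. It is illustrative only and is not taken from any repository listed on this page; the `Reading` type, the helper names, and the sample input are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    # Hypothetical target schema for the typed-extraction step.
    sensor: str
    celsius: float
    ok: bool


def split_records(raw: str, delimiter: str = ";") -> list[str]:
    """Delimiter-based parsing: cut the input into chunks at each delimiter."""
    return [chunk.strip() for chunk in raw.split(delimiter) if chunk.strip()]


def extract_fields(record: str, indexes: list[int], sep: str = ",") -> list[str]:
    """Field extraction: pick columns by position (negative indexes count from the end)."""
    columns = [c.strip() for c in record.split(sep)]
    return [columns[i] for i in indexes]


def to_reading(fields: list[str]) -> Reading:
    """Typed extraction: convert raw strings to typed fields; float() raises on bad input."""
    sensor, celsius, status = fields
    return Reading(sensor=sensor, celsius=float(celsius), ok=(status.lower() == "ok"))


# Made-up sample input: records separated by ';', columns separated by ','.
raw = "a1, 2024-01-01, 21.5, OK; b2, 2024-01-01, 19.0, FAIL;"
readings = [to_reading(extract_fields(r, [0, 2, -1])) for r in split_records(raw)]
print(readings)
```

Keeping the three steps as separate functions means each one can be swapped independently, for example replacing `split_records` with a byte-level splitter without touching the typing step.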