awesome-repositories.com
Data Extraction and Synthesis Tools · Awesome GitHub Repositories

High-level systems designed for parsing, structuring, and interpreting web content for automated data collection or research.

This page lists 4 GitHub repositories under Web Development · Data Extraction and Synthesis Tools.

Awesome Data Extraction and Synthesis Tools GitHub Repositories

  • openclaw/openclaw

    211,971 · GitHub

    Openclaw is a platform for managing agent execution environments, providing the infrastructure to control agent lifecycles, session state, and workspace persistence. It features a centralized gateway that handles model loops, tool invocation, and streaming events, while supporting multi-agent routing and persistent mem

    TypeScript · ai · assistant · crustacean
  • firecrawl/firecrawl

    84,034 · GitHub

    Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveragi

    TypeScript · ai · ai-agents · ai-crawler
  • microsoft/playwright

    82,810 · GitHub

    Playwright is a comprehensive browser automation framework designed for end-to-end testing and web workflow automation. It provides a unified API to drive web applications across multiple browser engines, enabling developers to simulate complex user interactions, perform web scraping, and validate application behavior

    TypeScript · automation · chrome · chromium
  • browser-use/browser-use

    78,576 · GitHub

    Browser-use is a framework for building autonomous agents that navigate, interact with, and extract data from web interfaces using natural language instructions. By acting as an orchestration layer between large language models and browser automation protocols, it enables the execution of complex, multi-step workflows

    Python · ai-agents · ai-tools · browser-automation

Explore sub-tags

  • Autonomous Research Agents: Automated agents that retrieve and synthesize structured data from web sources based on provided prompts and output schemas.
  • Browser Snapshotting Systems: Systems that generate actionable references for browser elements to facilitate the identification and capture of web page components.
  • Crawl Error Handling: Utilities designed to identify, manage, and resolve issues encountered during automated web crawling processes.
  • DOM Serialization Tools: Tools that convert complex web page structures into simplified text formats for easier data processing.
  • Full Page Screenshots: Utilities that capture full-length visual representations of scrollable web pages as single image files.