awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
LLM Data Preparation Tools · Awesome GitHub Repositories



Tools that convert raw web and unstructured content into clean, structured formats suitable for large language model ingestion.

Explore 2 GitHub repositories in the Data & Databases › LLM Data Preparation Tools category.
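To make "clean, structured formats suitable for LLM ingestion" concrete, here is a minimal sketch using only Python's standard-library `html.parser`. It is not how Firecrawl or Crawl4AI work internally: `MarkdownExtractor` and `html_to_markdown` are hypothetical names, only headings, paragraphs, list items and links are handled, and treating `<script>`, `<style>`, `<nav>` and `<footer>` as boilerplate to drop is an assumption.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Hypothetical, minimal HTML-to-markdown converter (illustration only)."""

    # Assumption: these elements are page chrome, not content.
    SKIP = {"script", "style", "nav", "footer"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.lines = []      # finished markdown blocks
        self.current = []    # text fragments of the block being built
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.href = None     # target of the currently open <a>, if any

    def _flush(self):
        # Collapse whitespace and emit the current block if it has text.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS:
            self._flush()
            self.current.append(self.HEADINGS[tag])
        elif tag == "li":
            self._flush()
            self.current.append("- ")
        elif tag == "p":
            self._flush()
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.current.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag in self.HEADINGS or tag in ("li", "p"):
            self._flush()
        elif tag == "a":
            if self.href:
                self.current.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)

    def markdown(self):
        self._flush()
        return "\n\n".join(self.lines)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return parser.markdown()


sample = """<html><head><style>body{}</style></head><body>
<nav><a href="/">Home</a></nav>
<h1>Firecrawl</h1>
<p>Turns pages into <a href="https://example.com/md">markdown</a>.</p>
<ul><li>Crawl</li><li>Scrape</li></ul>
<footer>© 2026</footer>
</body></html>"""

print(html_to_markdown(sample))
```

The navigation link, inline CSS and footer are dropped, while the heading, paragraph link and list survive as markdown. Production tools also have to deal with JavaScript-rendered pages, tables and much messier markup, which is where dedicated projects like the ones below come in.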

Awesome LLM Data Preparation Tools GitHub Repositories

  • firecrawl/firecrawl

    84,034 stars on GitHub

    Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats such as markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data-gathering tasks. By leveraging…

    TypeScript · ai · ai-agents · ai-crawler
  • unclecode/crawl4ai

    60,452 stars on GitHub

    Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs.

    Python
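Firecrawl's description mentions "mapping entire domains" and "performing recursive navigation". The general idea can be sketched as a breadth-first traversal that only follows links on the starting host. This is a hedged illustration, not Firecrawl's API or algorithm: `PAGES` is a hypothetical in-memory link graph standing in for live HTTP fetching and link parsing, and `map_domain` and its `max_pages` cap are names invented for this example.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical site: page URL -> hrefs found on that page.
# A real crawler would fetch each URL and extract its links instead.
PAGES = {
    "https://example.com/": ["/docs", "/blog", "https://other.org/"],
    "https://example.com/docs": ["/docs/api", "/"],
    "https://example.com/blog": ["/blog/post-1"],
    "https://example.com/docs/api": [],
    "https://example.com/blog/post-1": ["/docs#intro"],
}


def map_domain(start_url: str, max_pages: int = 100) -> list[str]:
    """Return same-host URLs reachable from start_url, in BFS order."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in PAGES.get(url, []):
            # Resolve relative links and drop any #fragment.
            absolute = urljoin(url, href).split("#", 1)[0]
            # Stay on the starting host and never enqueue a URL twice.
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


for page in map_domain("https://example.com/"):
    print(page)
```

The `seen` set prevents revisiting pages that link back to each other, and `max_pages` bounds the crawl; a real crawler would typically also respect robots.txt, rate limits and redirects.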