awesome-repositories.com
© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Data Processing Frameworks · Awesome GitHub Repositories

6 repos

Software libraries and platforms providing structured environments for parsing, transforming, and managing data flows.

Explore 6 awesome GitHub repositories matching Data & Databases · Data Processing Frameworks.

Awesome Data Processing Frameworks GitHub Repositories

  • unclecode/crawl4ai

    60,452 · GitHub

    Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs.

    Python
  • pathwaycom/pathway

    59,684 · GitHub

    Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with

    Python · batch-processing · data-analytics · data-pipelines
  • pathwaycom/llm-app

    56,311 · GitHub

    This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transfo

    Jupyter Notebook · chatbot · hugging-face · llm
  • opendatalab/MinerU

    54,523 · GitHub

    MinerU is a document parsing pipeline designed to transform unstructured files into machine-readable, structured data. It utilizes deep learning models to perform layout analysis, identifying document regions and extracting complex content such as mathematical expressions. By combining these neural network inferences w

    Python · ai4science · document-analysis · extract-data
  • docling-project/docling

    53,584 · GitHub

    Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing

    Python · ai · convert · document-parser
  • WerWolv/ImHex

    52,656 · GitHub

    ImHex is a professional-grade hex editor and binary data analysis platform designed for inspecting, modifying, and reverse engineering raw file contents. It functions as a schema-driven engine that interprets complex binary structures by applying custom definitions to map and visualize byte-level data. The platform di

    C++ · analyzer · binary-analysis · c-plus-plus

Explore sub-tags

  • Binary Data Parsers: Engines that interpret and map complex binary file structures based on defined schemas.
  • Markdown Converters: Tools that transform web content or structured data into clean Markdown format for documentation or language model ingestion.
  • Stream Processing Engines: Systems that perform continuous computation on real-time data streams with low latency.
  • Structured Data Extractors: Tools that identify and transform unstructured document content into standardized, machine-readable formats.
  • Unified Batch and Stream Processing Engines: Programming frameworks that unify the processing of static historical records and live incoming data streams.