awesome-repositories.com
Data Processing · Awesome GitHub Repositories

27 repos

Tools and frameworks that perform computational operations, transformations, and analysis on raw data sets.

Explore 27 awesome GitHub repositories matching Data & Databases · Data Processing.

Awesome Data Processing GitHub Repositories

  • vinta/awesome-python

    283,687 · GitHub

    This project is a comprehensive, community-curated directory that organizes a vast landscape of Python software libraries, frameworks, and tools. It serves as a centralized knowledge base designed to facilitate ecosystem navigation and accelerate developer discovery across the entire software development lifecycle. Th

    Python · awesome · collections · python
  • macrozheng/mall

    82,926 · GitHub

    This project is an enterprise-grade Java framework designed for building scalable, full-stack e-commerce applications. It provides a comprehensive foundation for microservice-based distributed architectures, enabling the development of complex retail platforms that include product management, order processing, and secu

    Java · docker · elasticsearch · elk
  • elastic/elasticsearch

    76,163 · GitHub

    Elasticsearch is a distributed search engine and document store designed for the high-performance indexing and retrieval of massive volumes of unstructured data. It functions as a centralized analytics platform, providing a schema-flexible architecture that organizes information into searchable indices while maintainin

    Java · elasticsearch · java · search-engine
  • abi/screenshot-to-code

    71,707 · GitHub

    This project is an artificial intelligence-powered frontend generator that translates visual design inputs into functional source code. It functions as a workflow engine that interprets graphical user interfaces, mapping layout structures and styling rules to structured markup and programming language syntax. The tool

    TypeScript
  • josephmisiti/awesome-machine-learning

    71,702 · GitHub

    This project is a comprehensive, community-driven directory of machine learning resources, software libraries, and educational materials. It serves as a centralized knowledge base for developers and researchers, organizing tools and frameworks by their primary programming language and technical domain to simplify disco

    Python
  • fffaraz/awesome-cpp

    69,832 · GitHub

    This project is a comprehensive, curated directory of high-quality libraries, tools, and educational resources for C and C++ development. It serves as an ecosystem discovery index, helping developers navigate the vast landscape of third-party components, frameworks, and technical documentation available for the languag

    awesome · awesome-list · c
  • scikit-learn/scikit-learn

    65,178 · GitHub

    Scikit-learn is a machine learning library for predictive data analysis that provides a collection of algorithms for supervised and unsupervised learning. It functions as a comprehensive toolkit for data preprocessing, dimensionality reduction, and model selection, allowing users to classify data objects, predict conti

    Python · data-analysis · data-science · machine-learning
  • sindresorhus/awesome-nodejs

    65,038 · GitHub

    This project is a community-driven directory that aggregates essential software projects and educational content for the Node.js ecosystem. It functions as a centralized knowledge base and discovery index, designed to simplify the navigation of a fragmented technical landscape by providing a structured collection of hi

    awesome · awesome-list · javascript
  • keras-team/keras

    63,858 · GitHub

    Keras is a high-level deep learning framework designed for constructing and training neural networks through the composition of modular, functional layers. It serves as a comprehensive modeling toolkit that provides standardized procedures for defining, evaluating, and deploying complex architectures. By utilizing a di

    Python · data-science · deep-learning · jax
  • xingshaocheng/architect-awesome

    60,831 · GitHub

    This project serves as a comprehensive knowledge base and reference for distributed systems engineering and enterprise software architecture. It provides a structured collection of technical resources, design patterns, and methodologies intended to assist in the design, maintenance, and scaling of complex, high-perform

  • OpenBB-finance/OpenBB

    60,502 · GitHub

    OpenBB is a financial data platform and investment research terminal designed to aggregate, normalize, and distribute market data across analytical workflows. It functions as a comprehensive ecosystem that bridges disparate financial data providers with custom applications, spreadsheets, and internal modeling infrastru

    Python · ai · crypto · derivatives
  • unclecode/crawl4ai

    60,452 · GitHub

    Crawl4AI is an AI-powered web crawling and data extraction engine designed to transform complex web content into structured formats. It functions as a headless browser orchestrator, enabling the navigation of dynamic websites, the execution of custom scripts, and the capture of visual assets like screenshots and PDFs.

    Python
  • pathwaycom/pathway

    59,684 · GitHub

    Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with

    Python · batch-processing · data-analytics · data-pipelines
  • PlexPt/awesome-chatgpt-prompts-zh

    58,347 · GitHub

    This project is a community-driven library of structured text inputs designed to guide large language models into specific roles, behaviors, and operational modes. It functions as a comprehensive repository of prompt engineering resources, providing reusable templates that allow users to override default model tendenci

    chat-gpt · chatgpt · chatgpt3
  • ultralytics/yolov5

    56,830 · GitHub

    YOLOv5 is a comprehensive computer vision framework designed for end-to-end deep learning, specializing in real-time object detection, image classification, and instance segmentation. It provides a unified toolkit that manages the entire lifecycle of a model, from initial dataset configuration and hyperparameter tuning

    Python · coreml · deep-learning · ios
  • pathwaycom/llm-app

    56,311 · GitHub

    This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transfo

    Jupyter Notebook · chatbot · hugging-face · llm
  • meilisearch/meilisearch

    55,992 · GitHub

    Meilisearch is a Rust-based search engine providing typo-tolerant full-text and vector-based semantic search with real-time conversational capabilities.

    Rust · ai · api · app-search
  • rclone/rclone

    55,637 · GitHub

    This project is a command-line storage manager that provides a unified interface for performing file operations across local filesystems and diverse cloud storage providers. It functions as a cross-platform storage abstraction, utilizing a modular backend architecture to map heterogeneous cloud storage APIs into a stan

    Go · azure-blob · azure-blob-storage · azure-files
  • tiimgreen/github-cheat-sheet

    55,238 · GitHub

    This project is a community-driven knowledge base that serves as a comprehensive reference guide for Git and GitHub. It functions as both a command-line cheat sheet for terminal-based version control operations and a collaborative workflow resource detailing platform-specific conventions for managing repositories, issu

    awesome · awesome-list · git
  • RVC-Boss/GPT-SoVITS

    55,111 · GitHub

    GPT-SoVITS is a text-to-speech synthesis engine and voice cloning toolkit designed for generating natural-sounding human speech. It functions as a neural audio processing pipeline that maps input text to high-fidelity audio waveforms, utilizing conditional variational autoencoders and flow-based decoders to ensure expr

    Python · text-to-speech · tts · vits

Explore sub-tags

  • Data Normalization and Schema Enforcement (2 sub-tags): Utilities that standardize heterogeneous data inputs into consistent schemas or unified formats for downstream analysis.
  • Data Serialization and Parsing (2 sub-tags): Tools for converting between raw formats, binary representations, and structured objects for transmission or storage.
  • Dataset Formats: Standardized structures and schemas for organizing training data used in model development.
  • Distributed Processing Frameworks (3 sub-tags): Systems designed for parallel execution and large-scale batch or event-driven data computation across clusters.
  • Document and Unstructured Extraction (3 sub-tags): Automated processes for parsing unstructured text, documents, or web content into structured, machine-readable formats.
  • General Data Utilities (3 sub-tags): Low-level functional libraries for mathematical, string, or compression operations on raw data.
  • Machine Learning Data Pipelines (5 sub-tags): Specialized workflows for preparing, augmenting, and streaming datasets specifically for model training and feature engineering.
  • Multi-Modal Data Processors: Systems that extract information from combined visual and temporal data sources.
  • Object-Based Pipelines: Data processing chains that pass structured objects between commands.
  • Search Engines (8 sub-tags): Distributed platforms that provide full-text indexing, advanced filtering, and fast query capabilities for large datasets.
  • Search Filters: Mechanisms for narrowing down search results based on specific criteria.