What are the best Processing Pipelines repositories on GitHub?

End-to-end workflows that automate the movement and sequential processing of data from source to destination. Explore 64 awesome GitHub repositories matching data & databases · Processing Pipelines. Refine with filters or upvote what's useful. Top picks: safishamsi/graphify, egonex-ai/understand-anything, keras-team/keras, scrapy/scrapy, docling-project/docling, corentinj/real-time-voice-cloning, ultralytics/ultralytics, pmndrs/zustand, rclone/rclone, zylon-ai/private-gpt.

Why is safishamsi/graphify a recommended Processing Pipelines repository?

Uses graph data to perform lookups on node neighbors and shortest paths to analyze how code changes affect the system.

Why is egonex-ai/understand-anything a recommended Processing Pipelines repository?

Traces connectivity paths from modified files to identify affected downstream architectural components.

Why is keras-team/keras a recommended Processing Pipelines repository?

Streams large datasets into training loops by handling batching, shuffling, and preprocessing tasks automatically.

Why is scrapy/scrapy a recommended Processing Pipelines repository?

Processes individual data items through a sequential chain of validation, cleaning, and storage handlers before persistence.

Why is docling-project/docling a recommended Processing Pipelines repository?

Automates the ingestion, parsing, and structuring of unstructured files through a modular pipeline for downstream data analysis.

Why is corentinj/real-time-voice-cloning a recommended Processing Pipelines repository?

Structures speech synthesis into distinct, swappable encoder and decoder stages for modular performance optimization.

Why is ultralytics/ultralytics a recommended Processing Pipelines repository?

Adapts various dataset structures and annotation formats on-the-fly to feed training pipelines without requiring manual pre-conversion.

Why is pmndrs/zustand a recommended Processing Pipelines repository?

Manages sequential data processing and API workflows to update the user interface once background tasks complete.

Why is rclone/rclone a recommended Processing Pipelines repository?

Orchestrates complex file operations across multiple storage platforms through a unified command interface.

Why is zylon-ai/private-gpt a recommended Processing Pipelines repository?

Standardizes the ingestion, parsing, and vectorization of files to facilitate semantic search across internal knowledge bases.

64 repositories

safishamsi/graphify
67,973 · View on GitHub
Graphify is a knowledge retrieval system that transforms directories of source code and documentation into structured, queryable project maps. It utilizes a code-to-graph parser to extract technical metadata and system connectivity, converting a mix of code, SQL schemas, and documentation into a unified graph structure. The project distinguishes itself by integrating these knowledge graphs with AI coding assistants through a Model Context Protocol server and dedicated tool hooks. This allows AI agents to perform lookups and impact analysis on node neighbors and shortest paths to understand ho
Uses graph data to perform lookups on node neighbors and shortest paths to analyze how code changes affect the system.
Python · antigravity, claude-code, codex
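The neighbor lookups and shortest-path queries mentioned for Graphify can be illustrated with a small, generic graph sketch. This is not Graphify's code or API: the `GRAPH` contents and the `neighbors`, `shortest_path`, and `impacted` helpers are hypothetical names chosen for illustration, and an edge is assumed to mean "the target depends on the source".

```python
from collections import deque

# Hypothetical dependency graph: an edge A -> B means "B depends on A".
GRAPH = {
    "db/schema.sql": ["models/user.py"],
    "models/user.py": ["api/users.py", "jobs/cleanup.py"],
    "api/users.py": ["web/app.py"],
    "jobs/cleanup.py": [],
    "web/app.py": [],
}


def neighbors(graph, node):
    """Return the direct dependents of a node."""
    return list(graph.get(node, []))


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def impacted(graph, changed):
    """Return every node reachable from the changed node, excluding itself."""
    seen = {changed}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {changed}
```

Calling `impacted(GRAPH, "models/user.py")` walks the graph outward from the changed file, which is the basic shape of an impact analysis; a real tool would attach richer metadata to nodes and edges.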
Egonex-AI/Understand-Anything
66,456 · View on GitHub
Understand-Anything is a codebase architecture visualization tool that transforms source code and documentation into interactive knowledge graphs. It maps files, functions, and classes into a node-edge model to visualize architectural dependencies and project structures. The project provides specialized workflows for impact analysis, tracing connectivity paths from code modifications to identify affected downstream components. It also enables technical onboarding through automated architecture tours and the conversion of technical documentation into navigable networks of interconnected ideas.
Traces connectivity paths from modified files to identify affected downstream architectural components.
TypeScript · antigravity-skills, business-knowledge, claude-code
keras-team/keras
64,094 · View on GitHub
Keras is a high-level deep learning framework designed for constructing and training neural networks through the composition of modular, functional layers. It serves as a comprehensive modeling toolkit that provides standardized procedures for defining, evaluating, and deploying complex architectures. By utilizing a directed acyclic graph approach, the framework allows users to build intricate models with multiple inputs, outputs, and shared layers, ensuring consistent numerical execution through functional state management. The project distinguishes itself as a multi-backend machine learning
Streams large datasets into training loops by handling batching, shuffling, and preprocessing tasks automatically.
Python · data-science, deep-learning, jax
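To make "batching, shuffling, and preprocessing" concrete, here is a minimal pure-Python sketch of a streaming batch generator. It deliberately does not call Keras; the `batches` function and its parameters are hypothetical, and a real training loop would more likely rely on the framework's own dataset utilities.

```python
import random


def batches(samples, batch_size, preprocess=lambda x: x, shuffle=True, seed=None):
    """Yield preprocessed batches lazily, optionally in shuffled order."""
    order = list(range(len(samples)))
    if shuffle:
        # A dedicated Random instance keeps shuffling reproducible per seed.
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        chunk = order[start:start + batch_size]
        # Preprocessing happens per batch, so nothing is materialized up front.
        yield [preprocess(samples[i]) for i in chunk]
```

Because the function is a generator, a loop such as `for batch in batches(data, 32, preprocess=normalize)` only holds one batch in memory at a time (`normalize` here stands for any user-supplied function).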
scrapy/scrapy
62,274 · View on GitHub
Scrapy is a comprehensive framework designed for automated web data extraction and large-scale crawling. It operates on an asynchronous, event-driven engine that manages non-blocking network requests and data processing tasks, allowing for the efficient retrieval of structured information from web documents using path-based selectors. The system distinguishes itself through a highly modular architecture that supports complex data collection workflows. Users can implement custom middleware and signal handlers to intercept and modify request flows, while a priority-based scheduler manages concu
Processes individual data items through a sequential chain of validation, cleaning, and storage handlers before persistence.
Python · crawler, crawling, framework
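The "sequential chain of validation, cleaning, and storage handlers" can be sketched as follows. The example is loosely modelled on the item-pipeline idea but imports nothing from Scrapy: `DropItem` here is a local stand-in class, and `ValidateStage`, `CleanStage`, `StoreStage`, and `run_pipeline` are hypothetical names.

```python
class DropItem(Exception):
    """Raised by a stage to discard an item (local stand-in, not Scrapy's class)."""


class ValidateStage:
    def process_item(self, item):
        # Reject items without a usable title.
        if not item.get("title"):
            raise DropItem("missing title")
        return item


class CleanStage:
    def process_item(self, item):
        # Return a cleaned copy rather than mutating the input.
        return {**item, "title": item["title"].strip()}


class StoreStage:
    def __init__(self):
        self.saved = []  # stands in for a database or file

    def process_item(self, item):
        self.saved.append(item)
        return item


def run_pipeline(items, stages):
    """Pass each item through every stage in order; skip dropped items."""
    for item in items:
        try:
            for stage in stages:
                item = stage.process_item(item)
        except DropItem:
            continue
```

Each stage only sees items that survived the stages before it, so validation failures never reach storage.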
docling-project/docling
61,674 · View on GitHub
Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing diverse input formats into a consistent internal representation, the library enables uniform processing across various document types. The project distinguishes itself through a schema-driven approach that maps document regions to strongly-typed objects, ensuring data accuracy t
Automates the ingestion, parsing, and structuring of unstructured files through a modular pipeline for downstream data analysis.
Python · ai, convert, document-parser
CorentinJ/Real-Time-Voice-Cloning
59,918 · View on GitHub
This project is a neural text-to-speech engine and voice cloning toolkit designed to generate synthetic speech that mimics the vocal characteristics of a target speaker. It functions as a real-time audio synthesizer, utilizing a deep learning pipeline to convert written text into high-fidelity speech output with minimal latency. The system employs a transfer learning framework that leverages pre-trained speaker verification models to adapt synthesis to new, unseen vocal identities. By using an encoder-based speaker embedding process, the toolkit maps variable-length audio samples into a laten
Structures speech synthesis into distinct, swappable encoder and decoder stages for modular performance optimization.
Python · deep-learning, python, pytorch
ultralytics/ultralytics
58,468 · View on GitHub
Ultralytics is a comprehensive computer vision framework designed for training, validating, and deploying deep learning models across a wide range of visual recognition tasks. It provides a unified interface for core operations including object detection, instance segmentation, pose estimation, and image classification. By utilizing a modular architecture, the platform allows users to swap model components to balance inference speed and accuracy requirements for diverse applications. The framework distinguishes itself through its support for real-time processing and flexible deployment. It in
Adapts various dataset structures and annotation formats on-the-fly to feed training pipelines without requiring manual pre-conversion.
Python · cli, computer-vision, deep-learning
pmndrs/zustand
58,371 · View on GitHub
Zustand is a state management library that provides a centralized store for managing shared application data. It functions as a reactive container that connects application state to components, allowing them to subscribe to specific slices of data and trigger updates automatically. By utilizing selector-based data access and immutable state updates, the library ensures that components only re-render when their observed data changes, maintaining a predictable and efficient data flow. The library distinguishes itself through a pluggable, middleware-based architecture that allows for the extensi
Manages sequential data processing and API workflows to update the user interface once background tasks complete.
TypeScript · hacktoberfest, hooks, react
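The selector-based subscription model described for Zustand can be sketched in a few lines. Zustand itself is a JavaScript/TypeScript library; the Python `Store` class below is a hypothetical illustration of the idea (subscribers are notified only when their selected slice changes), not a port of its API.

```python
class Store:
    """Minimal store: subscribers watch a slice chosen by a selector."""

    def __init__(self, state):
        self._state = dict(state)
        self._subs = []  # each entry: [selector, callback, last_selected_value]

    def get_state(self):
        return self._state

    def subscribe(self, selector, callback):
        self._subs.append([selector, callback, selector(self._state)])

    def set_state(self, partial):
        # Build a new dict instead of mutating the old one (immutable update).
        self._state = {**self._state, **partial}
        for sub in self._subs:
            selector, callback, last = sub
            current = selector(self._state)
            if current != last:
                sub[2] = current
                callback(current)
```

A subscriber registered with `lambda s: s["count"]` is not called when only `user` changes, which mirrors how components avoid re-rendering for unrelated updates.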
rclone/rclone
57,877 · View on GitHub
This project is a command-line storage manager that provides a unified interface for performing file operations across local filesystems and diverse cloud storage providers. It functions as a cross-platform storage abstraction, utilizing a modular backend architecture to map heterogeneous cloud storage APIs into a standard set of file system operations. This allows for consistent data management and movement regardless of the underlying storage service. The tool serves as a network data transfer engine designed for automated data migration and cloud storage synchronization. It distinguishes i
Orchestrates complex file operations across multiple storage platforms through a unified command interface.
Go · azure-blob, azure-blob-storage, azure-files
zylon-ai/private-gpt
57,278 · View on GitHub
This project is a privacy-first backend service designed to facilitate retrieval-augmented generation by processing local documents into searchable vector representations. It provides a modular architecture that allows users to ingest diverse file formats, manage document metadata, and perform semantic searches to provide context-aware responses for chat and completion requests. The system distinguishes itself through a database-agnostic abstraction layer that supports various storage backends, ranging from local disk storage to enterprise-grade vector databases. It offers flexible deployment
Standardizes the ingestion, parsing, and vectorization of files to facilitate semantic search across internal knowledge bases.
Python
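The ingest, vectorize, and search flow can be shown with a toy example. This sketch assumes word counts as a stand-in for learned embeddings and a plain list as a stand-in for a vector database; `embed`, `ingest`, `cosine`, and `search` are hypothetical helpers and are unrelated to the project's actual code.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(text.lower().split())


def ingest(documents):
    """Store each document alongside its vector."""
    return [(doc, embed(doc)) for doc in documents]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index, query, k=1):
    """Return the k stored documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

In a retrieval-augmented setup, the documents returned by `search` would be passed to a language model as context for the answer.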
deepfakes/faceswap
55,289 · View on GitHub
Faceswap is a comprehensive framework for automated media manipulation and neural face synthesis. It provides a modular pipeline that manages the entire lifecycle of facial feature extraction, deep learning model training, and image conversion. By coordinating complex computer vision workflows, the system enables users to map facial identities between source and destination datasets while maintaining structural alignment and lighting consistency across video frames. The project distinguishes itself through a highly extensible plugin-based architecture that handles hardware-accelerated process
Supervises the runtime execution and monitoring of data extraction tasks across the processing pipeline.
Python · deep-face-swap, deep-learning, deep-neural-networks
ZhuLinsen/daily_stock_analysis
42,741 · View on GitHub
Daily stock analysis is an automated research platform that utilizes large language models to process financial market data. The system functions as an investment analyst, transforming raw market feeds into structured reports to generate actionable trading insights. The platform distinguishes itself through a modular orchestration pipeline that allows users to integrate various artificial intelligence backends. By utilizing a provider-agnostic interface, the system enables the selection of preferred language models to interpret complex financial information according to user-defined parameter
Decomposes financial data processing into modular, swappable stages orchestrated by language models to generate insights.
Python · agent, ai, aigc
facebook/rocksdb
31,767 · View on GitHub
RocksDB is a high-performance, embeddable persistent key-value library and storage engine based on Log-Structured Merge-trees. It is designed to provide durable storage for large-scale datasets, integrating directly into applications to manage data on flash and RAM-based hardware. The engine is distinguished by its focus on minimizing read and write amplification through multi-threaded compaction and custom memory allocators. It features specialized optimizations for flash storage, including support for zoned block devices, and provides the ability to extend store behavior via external plugin
Automatically removes old data based on a configurable time-to-live (TTL) threshold.
C++ · database, storage-engine
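Time-to-live expiry can be illustrated with a small in-memory sketch. This is not RocksDB's API or storage format: `TTLStore`, its methods, and the injectable `clock` parameter are hypothetical, and the example assumes that expired entries are hidden on read and physically removed in a separate `compact` pass.

```python
import time


class TTLStore:
    """Key-value store whose entries expire ttl_seconds after being written."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so tests can control time
        self._data = {}     # key -> (value, written_at)

    def put(self, key, value):
        self._data[key] = (value, self.clock())

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, written_at = entry
        # Expired entries are treated as absent even before compaction.
        if self.clock() - written_at >= self.ttl:
            return None
        return value

    def compact(self):
        """Physically remove expired entries; return how many were dropped."""
        now = self.clock()
        expired = [k for k, (_, t) in self._data.items() if now - t >= self.ttl]
        for key in expired:
            del self._data[key]
        return len(expired)
```

Separating the read-time check from the cleanup pass keeps `get` cheap while letting space reclamation run in the background.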
fyne-io/fyne
27,941 · View on GitHub
Fyne is a cross-platform graphical user interface toolkit for the Go programming language. It provides a comprehensive framework for building native applications that run on desktop, mobile, and web environments from a single codebase. The toolkit centers on a canvas-based rendering engine and a device-independent layout engine, ensuring that visual elements maintain consistent dimensions and behavior across diverse operating systems and screen densities. The project distinguishes itself through a reactive data-binding system that automatically synchronizes application state with interface co
Registers callback functions to automatically react to state changes in bound data items.
Go · android, cross-platform, fyne
invoke-ai/InvokeAI
27,500 · View on GitHub
InvokeAI is a self-hosted, professional-grade platform designed for managing generative models and performing complex image synthesis. It provides a local application environment that allows users to execute diffusion models directly on their own hardware, ensuring data privacy and complete ownership of all generated assets. The platform distinguishes itself through a node-based workflow system that enables the construction of reproducible and automated image generation pipelines. By chaining modular functional units into directed acyclic graphs, users can automate intricate production tasks
Enables construction of custom generation pipelines by connecting modular processing nodes.
TypeScript · ai-art, artificial-intelligence, generative-art
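Chaining modular nodes into a directed acyclic graph comes down to running each node after its inputs are ready. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to do that; the `run_graph` helper and the node names are hypothetical and do not reflect InvokeAI's actual node system.

```python
from graphlib import TopologicalSorter


def run_graph(nodes, inputs_of):
    """Execute nodes in dependency order.

    nodes:     name -> function taking its upstream results as positional args
    inputs_of: name -> list of upstream node names, in argument order
    """
    results = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(inputs_of).static_order():
        args = [results[dep] for dep in inputs_of.get(name, [])]
        results[name] = nodes[name](*args)
    return results


# Hypothetical four-node workflow.
NODES = {
    "prompt": lambda: "a cat",
    "seed": lambda: 7,
    "generate": lambda prompt, seed: f"image({prompt}, seed={seed})",
    "upscale": lambda image: f"upscaled[{image}]",
}
INPUTS = {"generate": ["prompt", "seed"], "upscale": ["generate"]}
```

Because the order is derived from the declared inputs, the same graph definition produces the same execution sequence constraints each time, which is what makes such workflows reproducible. `TopologicalSorter` raises `graphlib.CycleError` if the graph is not acyclic.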
deepset-ai/haystack
24,253 · View on GitHub
Haystack is an orchestration framework designed for building complex search and generative AI pipelines. It functions as an agentic workflow engine, enabling the construction of automated sequences that allow AI agents to perform multi-step reasoning and data analysis. The framework utilizes a modular, component-based architecture that connects processing steps into directed acyclic graphs. By employing a provider-agnostic integration layer, it decouples core logic from specific external AI services and vector databases, allowing for the flexible exchange of underlying technologies. This desi
Orchestrates modular processing steps into automated sequences for LLM-based agentic tasks.
MDX · agent, agents, ai
pubkey/rxdb
23,048 · View on GitHub
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
Provides reactive streams for monitoring and responding to local document modifications in real time.
TypeScript · angular, browser-database, couchdb
charmbracelet/gum
22,814 · View on GitHub
Gum is a toolkit for building interactive, visually styled command-line interfaces and prompts directly within shell scripts. It functions as a library of modular components that allow developers to enhance terminal workflows by adding structured layouts, formatted text, and user-input widgets to standard command-line operations. The project distinguishes itself by providing a suite of specialized utilities for common shell tasks, such as fuzzy-matched selection menus, interactive file system navigation, and confirmation dialogs. It translates high-level styling and layout instructions into t
Processes and displays text using templates to inject dynamic values into command-line output.
Go · bash, shell
forem/forem
22,726 · View on GitHub
Forem is an open-source platform designed for building and managing technical communities. It functions as a social publishing engine that enables members to share long-form content, participate in threaded discussions, and engage through social interactions. The platform provides tools for organizations to maintain branded profiles, host community hackathons, and facilitate collaborative learning through structured educational tracks. Beyond its social features, Forem integrates advanced capabilities for AI agent workflow orchestration and codebase knowledge graphing. It allows developers to
Identifies which architectural components are affected by code modifications to assist in impact assessment.
Ruby · community, discussion, feedback
vectordotdev/vector
22,071 · View on GitHub
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Adjusts data processing configurations in real time without requiring a service restart to apply changes to the active pipeline.
Rust · events, forwarder, hacktoberfest