202 repository-uri
Architectures optimized for continuous, low-latency data ingestion and real-time event handling, distinct from batch-oriented processing.
Explore 202 awesome GitHub repositories matching data & databases · Stream Processing Systems. Refine with filters or upvote what's useful.
Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to ensure complete data privacy without external network dependencies. By utilizing a high-performance processing pipeline, the application enables live face swapping and interactive video modifications during active streaming sessions or on pre-recorded media. The system distinguishes itself through a hardware-abstraction execution layer that dynamically routes co
Processes video input as discrete sequential frames for real-time manipulation.
This project is a general-purpose command-line filter that provides an interactive interface for processing standard input streams. It enables real-time fuzzy searching, data selection, and transformation, allowing users to navigate complex information or file systems directly within their terminal. By utilizing a pipe-oriented architecture, it integrates into existing shell pipelines and workflows to facilitate efficient data exploration. What distinguishes this tool is its highly extensible, event-driven design that allows for deep integration with external processes. It supports asynchrono
Captures data from standard input streams to allow interactive selection and processing of piped content.
This project is a comprehensive Java backend engineering guide and technical reference focused on high-concurrency design, distributed systems, and microservices architecture. It provides detailed strategies for decomposing monolithic applications, managing service discovery, and implementing the architectural patterns required for scalable backend environments. The repository distinguishes itself through an extensive collection of big data algorithmic references and database scaling strategies. It covers memory-efficient techniques for analyzing massive datasets, such as Top-K element extrac
Provides algorithms for calculating the median of massive numeric sets and continuous streams.
OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system enables agents to autonomously navigate codebases, implement features, and resolve issues through iterative reasoning and tool execution. The platform distinguishes itself through a model-agnostic orchestrator that connects diverse language models to a unified tool registry. It
Outputs system events as structured JSON lines to facilitate real-time integration with external monitoring and logging pipelines.
NocoDB is a visual platform that transforms relational databases into collaborative, spreadsheet-style workspaces. By acting as a headless database backend, it provides a unified environment for designing database structures, managing record relationships, and interacting with data without requiring manual SQL queries. The platform normalizes interactions across various SQL and NoSQL data sources, allowing users to manage complex datasets through a centralized interface. The project distinguishes itself by automatically generating RESTful and GraphQL APIs from existing database schemas, enabl
Monitors database state changes to trigger external HTTP requests based on user-defined rules.
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
Transitions batch processing workflows into continuous, real-time streaming operations while preserving core transformation logic.
FFmpeg is a cross-platform multimedia framework designed for the recording, conversion, and streaming of audio and video content. It functions as a comprehensive toolkit that provides both a command-line utility for direct media manipulation and a collection of low-level libraries for integration into custom applications. At its core, the project utilizes a packet-based stream engine and a format-agnostic abstraction layer to handle diverse media standards, containers, and network protocols. The framework distinguishes itself through a modular, graph-based filter execution model that allows f
Manages media data as discrete, timestamped packets to facilitate efficient routing and manipulation across the pipeline.
This project is a command-line storage manager that provides a unified interface for performing file operations across local filesystems and diverse cloud storage providers. It functions as a cross-platform storage abstraction, utilizing a modular backend architecture to map heterogeneous cloud storage APIs into a standard set of file system operations. This allows for consistent data management and movement regardless of the underlying storage service. The tool serves as a network data transfer engine designed for automated data migration and cloud storage synchronization. It distinguishes i
Manages high-performance data transport by handling bandwidth, authentication, and secure connections between storage endpoints.
This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It functions by parsing page metadata and utilizing modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media for offline use. The tool distinguishes itself through its ability to handle authenticated content, allowing users to inject browser-stored session cookies to access restricted or private media. It also supports real-time media streaming by piping remote content directly into ext
Streams binary media data into external processes to enable continuous, real-time consumption.
Requests is a high-level HTTP client library designed to simplify web communication and API integration. It provides an intuitive, human-readable interface for performing standard network operations, including request execution, connection pooling, and stateful session management. By encapsulating raw network data into structured objects, the library automates the complexities of headers, cookies, and payload transmission. The library distinguishes itself through a modular transport adapter layer that allows for custom protocol handling and extensible authentication hooks. It supports a wide
Minimizes memory usage by deferring the retrieval of large response bodies until they are explicitly accessed.
RxJava is a reactive stream processing framework and JVM reactive extensions library. It serves as an asynchronous dataflow orchestrator used to compose event-based programs by transforming, combining, and consuming real-time data flows on the Java Virtual Machine. The project distinguishes itself through integrated backpressure flow control, which manages the emission rate between producers and consumers to prevent memory exhaustion. It further provides mechanisms for concurrent thread management and parallel data processing to offload blocking operations and maintain application responsiven
Provides a framework for the continuous ingestion, transformation, and analysis of high-velocity data streams.
MediaPipe is a cross-platform machine learning framework designed for building and deploying pipelines that process live and streaming media. It provides a system for connecting processing components into custom machine learning chains to analyze real-time audio and video streams. The framework includes a suite of pre-trained models for tasks such as hand, face, and pose tracking, along with tools for retraining and customizing these models with specific datasets. It also features a dedicated benchmarker for measuring the execution speed and accuracy of machine learning models directly within
Uses typed data packets to communicate between pipeline components, ensuring synchronized media frames and timestamps.
This project is a low-dependency engine designed for training large language models using native C and CUDA. It provides a bare-metal environment for tensor computation, allowing for the execution of neural network operations directly on hardware accelerators without the overhead of high-level software abstractions. The framework distinguishes itself by implementing manual gradient backpropagation and custom hardware-specific kernels, providing granular control over memory mapping and computational precision. It supports distributed training across multiple graphics processors and compute nod
Reads pre-processed tokenized data directly from disk into memory to bypass input-output bottlenecks during training.
Facefusion is a modular framework designed for automated image and video manipulation, specializing in tasks such as face swapping, enhancement, and restoration. It functions as a computer vision processing pipeline that chains independent machine learning modules to perform complex transformations, including facial animation, age modification, and lip synchronization. The system is built to handle both real-time interactive feeds and large-scale batch processing tasks. The platform distinguishes itself through a highly extensible architecture that supports custom processing modules and inter
Processes video inputs as discrete sequential frames for real-time manipulation.
Hyperframes is an HTML-to-video rendering engine and composition tool that transforms web layouts and CSS into encoded video files. It functions as a headless browser video pipeline and a distributed video rendering framework, allowing users to create seekable animations and programmatic motion designs using HTML, CSS, and JavaScript. The project differentiates itself as an AI agent video orchestrator, enabling the automation of video scripts and compositions through natural language prompts. It supports distributed video encoding by splitting rendering tasks across multiple serverless functi
Distributes frame processing across multiple worker processes to increase overall encoding speed.
Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transformations. The system is distinguished by its ability to manage persistent operator state to ensure exactly-once processing guarantees and consistency during failures. It features specialized capabilities for complex event processing to detect temporal patterns and handles out-of-order events using eve
Provides a distributed framework for the continuous ingestion, transformation, and analysis of high-velocity data streams.
Toon is a data serialization library and toolkit designed to convert complex objects into compact, human-readable formats optimized for large language models. By focusing on token efficiency, the library minimizes the context window footprint of structured data through techniques like key folding and tabular layout optimization. It provides a streaming-capable processor that handles the encoding and decoding of hierarchical data while maintaining structural integrity. The project distinguishes itself through its path-aware transformation pipeline and configurable serialization logic, which al
Handles massive datasets through event-driven stream processing to ensure memory efficiency.
MiniCPM-o is a multimodal large language model designed to function as a real-time conversational assistant on edge devices. By mapping text, image, video, and audio inputs into a unified latent space, the system enables simultaneous cross-modal reasoning and full-duplex interaction. It is built as an edge-side inference engine, utilizing quantized model weights to maintain high-performance processing on consumer hardware. The system distinguishes itself through its integrated speech synthesis and voice cloning capabilities, which allow for the generation of expressive, personalized vocal out
Utilizes concurrent input and output buffers to enable full-duplex, real-time conversational stream processing.
This project is an educational platform and tutorial series designed to teach the Go programming language through the practice of test-driven development. It provides a structured path for developers to master language fundamentals, concurrency, and standard library usage by building functional applications in small, verifiable increments. The core methodology centers on the test-driven development cycle, where failing tests are written before implementation to define requirements and ensure code correctness. This approach is applied across a wide range of practical scenarios, including the c
Move the read pointer back to the beginning of a data stream to allow multiple read operations on the same source.
simdjson is a high-performance, header-only C++ library designed for parsing, querying, and serializing JSON data with minimal memory overhead. It functions as a hardware-aware data processing engine that leverages vector instructions to achieve gigabyte-per-second parsing speeds. By detecting host processor capabilities at runtime, the library automatically selects the most efficient instruction sets to accelerate structural analysis and validation. The library distinguishes itself through a focus on extreme efficiency and resource management. It utilizes memory mapping and padded buffer ali
Handling continuous streams or concatenated JSON documents incrementally to maintain low memory usage while maintaining high throughput.