95 Repos
Architectures and frameworks designed for the continuous ingestion, transformation, and analysis of high-velocity data streams.
95 GitHub repositories matching Data & Databases · Stream Processing.
Deep-Live-Cam is a generative video transformation tool designed for real-time facial manipulation and cinematic enhancement. It functions as a local-first AI runtime, performing all media processing directly on the user's hardware to ensure complete data privacy without external network dependencies. By utilizing a high-performance processing pipeline, the application enables live face swapping and interactive video modifications during active streaming sessions or on pre-recorded media. The system distinguishes itself through a hardware-abstraction execution layer that dynamically routes co
Processes video input as discrete sequential frames for real-time manipulation.
This project is a comprehensive Java backend engineering guide and technical reference focused on high-concurrency design, distributed systems, and microservices architecture. It provides detailed strategies for decomposing monolithic applications, managing service discovery, and implementing the architectural patterns required for scalable backend environments. The repository distinguishes itself through an extensive collection of big data algorithmic references and database scaling strategies. It covers memory-efficient techniques for analyzing massive datasets, such as Top-K element extrac
Provides algorithms for calculating the median of massive numeric sets and continuous streams.
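The listing does not show which median algorithm the guide describes. A common technique for a running median over a stream is the two-heap approach, sketched below under that assumption. The class name `StreamingMedian` and its methods are hypothetical names chosen for illustration.

```python
import heapq


class StreamingMedian:
    """Running median using two heaps (illustrative sketch, not the guide's code)."""

    def __init__(self):
        self._low = []   # max-heap of the smaller half, stored as negated values
        self._high = []  # min-heap of the larger half

    def add(self, x):
        # Push onto the lower half, then move its largest element up,
        # so every element in _low is <= every element in _high.
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so _low holds the same number of items as _high, or one more.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("median of an empty stream is undefined")
        if len(self._low) > len(self._high):
            return float(-self._low[0])
        return (-self._low[0] + self._high[0]) / 2.0
```

Each insertion costs O(log n) and the median is read in O(1). This version keeps every value in memory, so for sets too large to hold, an approximate or partition-based method would be needed instead.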
RxJava is a reactive stream processing framework and JVM reactive extensions library. It serves as an asynchronous dataflow orchestrator used to compose event-based programs by transforming, combining, and consuming real-time data flows on the Java Virtual Machine. The project distinguishes itself through integrated backpressure flow control, which manages the emission rate between producers and consumers to prevent memory exhaustion. It further provides mechanisms for concurrent thread management and parallel data processing to offload blocking operations and maintain application responsiven
Provides a framework for the continuous ingestion, transformation, and analysis of high-velocity data streams.
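The backpressure idea described for RxJava, limiting how fast a producer may emit relative to its consumer, can be illustrated with a bounded buffer. The sketch below is a blocking analogue written with the Python standard library, not RxJava's API, which uses non-blocking demand signalling. The function name `run_pipeline` and the `capacity` parameter are assumptions made for the example.

```python
import queue
import threading


def run_pipeline(items, capacity=2):
    """Double each item while a bounded buffer throttles the producer."""
    buffer = queue.Queue(maxsize=capacity)  # at most `capacity` items in flight
    sentinel = object()                     # marks the end of the stream
    results = []

    def producer():
        for item in items:
            buffer.put(item)  # blocks while the buffer is full: this is the backpressure
        buffer.put(sentinel)

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        item = buffer.get()
        if item is sentinel:
            break
        results.append(item * 2)
    worker.join()
    return results
```

Because the buffer never holds more than `capacity` items, memory stays bounded however fast the producer could otherwise run.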
Facefusion is a modular framework designed for automated image and video manipulation, specializing in tasks such as face swapping, enhancement, and restoration. It functions as a computer vision processing pipeline that chains independent machine learning modules to perform complex transformations, including facial animation, age modification, and lip synchronization. The system is built to handle both real-time interactive feeds and large-scale batch processing tasks. The platform distinguishes itself through a highly extensible architecture that supports custom processing modules and inter
Processes video inputs as discrete sequential frames for real-time manipulation.
Hyperframes is an HTML-to-video rendering engine and composition tool that transforms web layouts and CSS into encoded video files. It functions as a headless browser video pipeline and a distributed video rendering framework, allowing users to create seekable animations and programmatic motion designs using HTML, CSS, and JavaScript. The project differentiates itself as an AI agent video orchestrator, enabling the automation of video scripts and compositions through natural language prompts. It supports distributed video encoding by splitting rendering tasks across multiple serverless functi
Distributes frame processing across multiple worker processes to increase overall encoding speed.
Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transformations. The system is distinguished by its ability to manage persistent operator state to ensure exactly-once processing guarantees and consistency during failures. It features specialized capabilities for complex event processing to detect temporal patterns and handles out-of-order events using eve
Provides a distributed framework for the continuous ingestion, transformation, and analysis of high-velocity data streams.
simdjson is a high-performance, header-only C++ library designed for parsing, querying, and serializing JSON data with minimal memory overhead. It functions as a hardware-aware data processing engine that leverages vector instructions to achieve gigabyte-per-second parsing speeds. By detecting host processor capabilities at runtime, the library automatically selects the most efficient instruction sets to accelerate structural analysis and validation. The library distinguishes itself through a focus on extreme efficiency and resource management. It utilizes memory mapping and padded buffer ali
Handles continuous streams or concatenated JSON documents incrementally, keeping memory usage low while sustaining high throughput.
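Document-at-a-time iteration over concatenated JSON can be sketched with Python's standard `json` module. This is a stand-in to show the idea, not simdjson's C++ API, and the function name `iter_json_documents` is hypothetical.

```python
import json


def iter_json_documents(text):
    """Yield each JSON document found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index just past it.
        document, pos = decoder.raw_decode(text, pos)
        yield document
```

For example, iterating over `'{"a": 1} [2, 3]'` yields the object and then the array. The sketch assumes the whole input string is already in memory; only the parsed documents are produced one at a time.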
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Executes continuous stream processing workflows to derive real-time insights.
RocketMQ is a cloud-native distributed messaging platform and streaming engine. It functions as a distributed transactional queue that ensures atomicity between local transactions and message delivery, and serves as an MQTT IoT message broker to bridge lightweight device traffic into high-performance data streams. The system is distinguished by a Kubernetes-native architecture that decouples compute from storage to allow independent scaling of traffic and data retention. It utilizes a tiered storage model to offload older data to remote storage and employs quorum-based replication and automat
Manages high-throughput data streams with strict message ordering, delayed delivery, and historical retrieval.
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
Provides architectures and frameworks designed for the continuous ingestion, transformation, and analysis of high-velocity data streams.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Manages high-volume observability data streams with buffering, backpressure, and reliable delivery guarantees.
This project is a comprehensive framework for building AI-powered applications, providing a unified toolkit for orchestrating language models, autonomous agents, and interactive user interfaces. It serves as a central library for managing the entire lifecycle of AI interactions, from initial prompt generation and model provider abstraction to complex, multi-step reasoning and tool execution. The framework distinguishes itself through its deep integration with frontend development, specifically by enabling generative user interfaces that render dynamic components directly from model outputs. I
Transforms raw message chunks into structured streams for real-time AI response processing.
Crush is a framework designed to orchestrate and secure the execution of external tools invoked by large language models. It functions as a middleware layer that manages the flow of agentic tool calls, providing a controlled environment for terminal-based automation and task processing. The project distinguishes itself by implementing a policy-driven security layer that intercepts, validates, and modifies tool execution requests. By wrapping command calls within a process-boundary layer, it allows for the automated approval of specific operations and the dynamic injection of contextual metada
Pipes structured data through modular command-line filters for automated data processing.
NATS Server is a high-performance, lightweight messaging system designed for cloud-native applications, edge computing, and distributed microservices. It functions as a distributed publish-subscribe broker that routes messages using hierarchical, dot-separated subject strings, enabling decoupled communication between services without requiring centralized broker lookups. The system supports core messaging patterns including asynchronous publish-subscribe, request-reply, and load-balanced queue processing. The platform distinguishes itself through a decentralized architecture that eliminates t
Interleaves multiple data streams with support for subject-based transformations and filtering.
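Subject-based filtering over dot-separated subjects can be sketched as a token-by-token match. The example assumes NATS's documented wildcard conventions, where `*` matches exactly one token and `>` matches one or more trailing tokens. The function `subject_matches` is a hypothetical helper, not part of any NATS client.

```python
def subject_matches(pattern, subject):
    """Return True if `subject` matches the wildcard `pattern`."""
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for i, token in enumerate(pattern_tokens):
        if token == ">":
            # ">" is only valid as the last token and needs at least one token to match.
            return i == len(pattern_tokens) - 1 and len(subject_tokens) > i
        if i >= len(subject_tokens):
            return False  # the subject ran out of tokens before the pattern did
        if token != "*" and token != subject_tokens[i]:
            return False
    # Without ">", both sides must have the same number of tokens.
    return len(pattern_tokens) == len(subject_tokens)
```

Under these rules `orders.*.created` matches `orders.eu.created`, and `orders.>` matches any subject with at least one token after `orders`.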
DeOldify is a deep learning system and a set of pre-trained computer vision models designed to apply realistic colors to grayscale photographs and video footage. It functions as a neural media restoration tool that uses trained networks to estimate original hues for black-and-white media and remove glitches and artifacts from aged images and film. The project employs a NoGAN colorization technique that removes the GAN discriminator during training to prevent artifacts and avoid over-saturation of pixels. For cinematic sequences, it applies temporal frame consistency to maintain color stabilit
Applies temporal stability constraints to prevent color flickering across consecutive video frames.
CodeFormer is a deep learning framework designed for the restoration and enhancement of facial images and video sequences. It functions as a comprehensive processing engine capable of reconstructing high-quality facial features from degraded, blurry, or damaged inputs, while also providing tools for image upscaling and generative inpainting to fill missing or corrupted regions. The system distinguishes itself by utilizing a codebook-based quantization approach that maps input patches to high-quality facial representations, supported by transformer-based global modeling to ensure structural co
Applies frame-to-frame constraints to minimize flickering and maintain stability across video sequences.
OpenObserve is a unified observability data platform designed to ingest, store, and analyze logs, metrics, and traces. It functions as a cloud-native monitoring tool that centralizes telemetry from diverse sources, including standard collectors and cloud service providers, into a single, scalable system. By utilizing a columnar storage engine backed by object storage, the platform enables efficient long-term data retention and high-performance analytical querying. The platform distinguishes itself through deep integration with artificial intelligence, allowing users to query data using natura
Transforms, filters, and enriches telemetry data in real-time during ingestion before persisting it to the underlying storage layer.
Nodemailer is a comprehensive library for Node.js applications designed to handle the composition, signing, and transmission of email messages. It provides a robust framework for constructing MIME-compliant content, managing complex attachments, and routing messages through various delivery channels, including standard SMTP servers, local mail transfer agents, and cloud-based email services. The library distinguishes itself through a modular, plugin-based transport architecture that allows for custom delivery mechanisms and environment-specific configurations. It includes advanced features fo
Processes email data as continuous streams to efficiently manage large attachments and complex structures without memory exhaustion.
This project is a comprehensive technical interview preparation resource and computer science interview guide. It serves as an educational reference for developers to study core software engineering fundamentals and common coding patterns required for employment screenings. The repository provides detailed guides and references covering data structures and algorithms, networking and security, operating systems, and web development. It specifically focuses on the implementation and complexity analysis of sorting, searching, and graph algorithms. The material encompasses a wide breadth of comp
Uses lazy evaluation and internal iteration to filter and aggregate data sequences.
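Lazy filtering and aggregation can be sketched with Python generators, which compute each element only when a consumer asks for it. The function names below are hypothetical and the example is not taken from the repository.

```python
from itertools import count, islice


def evens_squared(numbers):
    """Lazily keep the even numbers and square them; nothing runs until consumed."""
    evens = (n for n in numbers if n % 2 == 0)
    return (n * n for n in evens)


def sum_of_first(numbers, how_many):
    """Aggregate only the first `how_many` results, leaving the rest unevaluated."""
    return sum(islice(evens_squared(numbers), how_many))
```

Because evaluation is lazy, the source may even be unbounded: `sum_of_first(count(1), 3)` reads just enough of the infinite sequence to add up 4, 16, and 36.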
AISystem is a comprehensive AI full-stack infrastructure project covering the entire pipeline from AI chip architecture to high-level training frameworks. It encompasses the development of AI compiler frameworks, inference engines, and distributed training orchestrators designed to coordinate workloads across a heterogeneous compute stack of CPUs, GPUs, and NPUs. The project focuses on the deep integration of software and hardware, employing software-hardware co-design to align tensor layouts with physical memory structures. It provides specialized capabilities for accelerating Transformer mo
Divides large matrices into smaller blocks to balance memory bandwidth and maximize hardware compute utilization.
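Blocking (tiling) a matrix multiplication can be sketched in plain Python. This shows only the loop structure; real kernels choose tile sizes to fit caches or on-chip memory, and the `block` parameter and function name here are assumptions made for illustration.

```python
def matmul_blocked(a, b, block=2):
    """Multiply matrices given as lists of rows, one block x block tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile rows of the output
        for j0 in range(0, m, block):      # tile columns of the output
            for k0 in range(0, k, block):  # tiles along the shared dimension
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + block, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

Each tile of `a` and `b` is reused for a whole output tile before moving on, which is what reduces traffic to slower memory on real hardware. The result is identical to an unblocked multiplication.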