97 repositories
Technologies and architectures that facilitate the continuous, real-time flow and processing of data records.
Explore 97 awesome GitHub repositories matching data & databases · Data Streaming. Refine with filters or upvote what's useful.
This project is a general-purpose command-line filter that provides an interactive interface for processing standard input streams. It enables real-time fuzzy searching, data selection, and transformation, allowing users to navigate complex information or file systems directly within their terminal. By utilizing a pipe-oriented architecture, it integrates into existing shell pipelines and workflows to facilitate efficient data exploration. What distinguishes this tool is its highly extensible, event-driven design that allows for deep integration with external processes. It supports asynchrono
Captures data from standard input streams to allow interactive selection and processing of piped content.
OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system enables agents to autonomously navigate codebases, implement features, and resolve issues through iterative reasoning and tool execution. The platform distinguishes itself through a model-agnostic orchestrator that connects diverse language models to a unified tool registry. It
Outputs system events as structured JSON lines to facilitate real-time integration with external monitoring and logging pipelines.
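As a rough illustration of the JSON Lines pattern described above, the sketch below writes one JSON object per line and flushes after each write so a downstream consumer can read events as they occur. The function names `emit_event` and `read_events` and the event fields are hypothetical; they are not taken from OpenHands.

```python
import io
import json


def emit_event(stream, kind, **fields):
    """Write one event as a single JSON line and flush immediately."""
    record = {"event": kind, **fields}
    stream.write(json.dumps(record) + "\n")
    stream.flush()  # flush so a consumer on the other end sees the event right away


def read_events(stream):
    """Yield one parsed event per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    # Round trip through an in-memory buffer standing in for a pipe or log file.
    buffer = io.StringIO()
    emit_event(buffer, "task_started", task_id=1)
    emit_event(buffer, "task_finished", task_id=1, status="ok")
    buffer.seek(0)
    for event in read_events(buffer):
        print(event)
```

Because every line is a complete JSON document, a monitoring tool can parse the output incrementally without waiting for the stream to end.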
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
Transitions batch processing workflows into continuous, real-time streaming operations while preserving core transformation logic.
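The idea of keeping one transformation for both batch and streaming input can be sketched in plain Python by writing the logic against an iterable. This is a minimal illustration of the concept only; it does not use Pathway's API, and `running_totals` and `live_source` are hypothetical names.

```python
from typing import Iterable, Iterator


def running_totals(amounts: Iterable[float]) -> Iterator[float]:
    """Yield the cumulative sum after each record; works on any iterable."""
    total = 0.0
    for amount in amounts:
        total += amount
        yield total


def live_source() -> Iterator[float]:
    """Hypothetical stand-in for an unbounded source such as a socket or queue."""
    for amount in (5.0, 2.5, 10.0):
        yield amount


# Batch: the whole dataset is already in memory.
batch_result = list(running_totals([5.0, 2.5, 10.0]))

# Streaming: records arrive one at a time and each total is available immediately.
stream_result = [total for total in running_totals(live_source())]
```

The same function body serves both cases; only the source of records changes.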
This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It functions by parsing page metadata and utilizing modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media for offline use. The tool distinguishes itself through its ability to handle authenticated content, allowing users to inject browser-stored session cookies to access restricted or private media. It also supports real-time media streaming by piping remote content directly into ext
Streams binary media data into external processes to enable continuous, real-time consumption.
Requests is a high-level HTTP client library designed to simplify web communication and API integration. It provides an intuitive, human-readable interface for performing standard network operations, including request execution, connection pooling, and stateful session management. By encapsulating raw network data into structured objects, the library automates the complexities of headers, cookies, and payload transmission. The library distinguishes itself through a modular transport adapter layer that allows for custom protocol handling and extensible authentication hooks. It supports a wide
Minimizes memory usage by deferring the retrieval of large response bodies until they are explicitly accessed.
This project is a low-dependency engine designed for training large language models using native C and CUDA. It provides a bare-metal environment for tensor computation, allowing for the execution of neural network operations directly on hardware accelerators without the overhead of high-level software abstractions. The framework distinguishes itself by implementing manual gradient backpropagation and custom hardware-specific kernels, providing granular control over memory mapping and computational precision. It supports distributed training across multiple graphics processors and compute nod
Reads pre-processed tokenized data directly from disk into memory to bypass input-output bottlenecks during training.
Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transformations. The system is distinguished by its ability to manage persistent operator state to ensure exactly-once processing guarantees and consistency during failures. It features specialized capabilities for complex event processing to detect temporal patterns and handles out-of-order events using eve
Manages persistent operator state to ensure exactly-once processing and consistency during failures.
Toon is a data serialization library and toolkit designed to convert complex objects into compact, human-readable formats optimized for large language models. By focusing on token efficiency, the library minimizes the context window footprint of structured data through techniques like key folding and tabular layout optimization. It provides a streaming-capable processor that handles the encoding and decoding of hierarchical data while maintaining structural integrity. The project distinguishes itself through its path-aware transformation pipeline and configurable serialization logic, which al
Handles massive datasets through event-driven stream processing to ensure memory efficiency.
This project is an educational platform and tutorial series designed to teach the Go programming language through the practice of test-driven development. It provides a structured path for developers to master language fundamentals, concurrency, and standard library usage by building functional applications in small, verifiable increments. The core methodology centers on the test-driven development cycle, where failing tests are written before implementation to define requirements and ensure code correctness. This approach is applied across a wide range of practical scenarios, including the c
Moves the read pointer back to the beginning of a data stream to allow multiple read operations on the same source.
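The tutorial in question teaches Go, but the rewind technique itself can be sketched in Python with the standard `io` module. The helper name `read_twice` is hypothetical and assumes a seekable stream.

```python
import io


def read_twice(stream):
    """Read a seekable stream fully, rewind it, and read it again."""
    if not stream.seekable():
        raise io.UnsupportedOperation("stream cannot be rewound")
    first = stream.read()
    stream.seek(0)  # move the read position back to the start
    second = stream.read()
    return first, second


if __name__ == "__main__":
    source = io.BytesIO(b"hello")
    print(read_twice(source))
```

Without the `seek(0)` call, the second `read()` would return an empty value because the position is already at the end of the stream.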
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Details the streaming of event data to enable continuous, low-latency processing.
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
Enables reliable message processing and stream management using Redis consumer groups and event-driven patterns.
Backtrader is a Python backtesting framework and algorithmic trading platform. It provides a toolkit for developing automated trading rules and simulating investment strategies using historical financial time-series data. The system functions as a quantitative analysis tool, combining a simulation engine for testing trading rules with a financial data visualizer that generates price action charts. It allows for the calculation of technical indicators and the evaluation of portfolio performance through risk-adjusted returns. The platform covers live trading integration via brokerage APIs and
Processes financial time-series data as a continuous stream to optimize memory usage.
ThingsBoard is an IoT device management platform designed for provisioning, monitoring, and managing large fleets of hardware devices and assets across multiple customers. It functions as a microservices infrastructure that allows the deployment of data collection and management services as independent containerized units for scaling. The platform includes a rule-based stream processor that transforms incoming device data and triggers alarms using customizable rule chains. It also provides a data visualization suite consisting of dashboards and widgets to display real-time telemetry and syste
Features an event-driven streaming processor that transforms incoming IoT data using customizable rule chains.
This project is a comprehensive framework for building AI-powered applications, providing a unified toolkit for orchestrating language models, autonomous agents, and interactive user interfaces. It serves as a central library for managing the entire lifecycle of AI interactions, from initial prompt generation and model provider abstraction to complex, multi-step reasoning and tool execution. The framework distinguishes itself through its deep integration with frontend development, specifically by enabling generative user interfaces that render dynamic components directly from model outputs. I
Generates and renders complex JSON objects incrementally as they are produced by an AI model, allowing for real-time UI updates based on structured data.
Guidance is a generative AI orchestration framework designed to manage complex interactions with language models by embedding programmatic control directly into the prompt generation process. It functions as a prompt programming environment that allows developers to interleave raw text with executable logic, enabling the construction of sophisticated, multi-step agentic workflows. The framework distinguishes itself through grammar-constrained token sampling and stateful stream interception, which restrict the model's output distribution based on formal language rules. By enforcing these const
Monitors and modifies token generation in real time to enforce logical constraints on model output.
Age is a command-line utility for file encryption that utilizes hybrid cryptography to secure data for multiple recipients. It employs a combination of asymmetric key exchange and symmetric encryption to protect files, supporting access control through public keys, shared passphrases, and hardware-backed identity integration. The tool is designed for memory-efficient operation, utilizing stream-oriented processing to handle large datasets in small, sequential chunks. It features a stanza-based metadata framing system that allows for extensible file headers and supports random-access decryptio
Handles large files by streaming data in small chunks to maintain performance during cryptographic operations.
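Chunked processing of this kind can be sketched with a hash in place of encryption, since the memory behaviour is the same: only one chunk is held at a time. The sketch below is an assumption-laden illustration and does not reproduce age's file format or API; `digest_stream` and `CHUNK_SIZE` are hypothetical names, and the chunk size is illustrative.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # illustrative chunk size chosen for this sketch


def digest_stream(stream, chunk_size=CHUNK_SIZE):
    """Hash a binary stream chunk by chunk without loading it all into memory."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read signals the end of the stream
            break
        hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    data = io.BytesIO(b"x" * 200_000)
    print(digest_stream(data))
```

Peak memory stays close to `chunk_size` regardless of the input length, which is what makes the approach suitable for large files.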
Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention. The framework distinguishes itself through its focus on observability and secure, isolated execut
Delivers structured objects incrementally during generation while maintaining access to the final object.
Letta is a framework for building, deploying, and managing autonomous AI agents that maintain persistent state across long-term interactions. It provides a comprehensive suite of primitives for defining agents with configurable personas, modular memory blocks, and tool-use capabilities, enabling them to retain user preferences and conversation history over extended sessions. The platform distinguishes itself through its advanced memory management and orchestration capabilities. It allows agents to autonomously update their own memory, perform retrieval-augmented generation, and coordinate com
Exchanges JSON messages over standard input and output to allow external programs to control agent behavior and receive responses.
NATS Server is a high-performance, lightweight messaging system designed for cloud-native applications, edge computing, and distributed microservices. It functions as a distributed publish-subscribe broker that routes messages using hierarchical, dot-separated subject strings, enabling decoupled communication between services without requiring centralized broker lookups. The system supports core messaging patterns including asynchronous publish-subscribe, request-reply, and load-balanced queue processing. The platform distinguishes itself through a decentralized architecture that eliminates t
Provides log-structured stream persistence for reliable message replay and durable delivery.
This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management. The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computation
Delivers generated tokens incrementally to the user interface as they are produced to provide real-time feedback.