Why is junegunn/fzf a recommended Data Streaming repository?

Captures data from standard input streams to allow interactive selection and processing of piped content.

Why is OpenHands/OpenHands a recommended Data Streaming repository?

Outputs system events as structured JSON lines to facilitate real-time integration with external monitoring and logging pipelines.

Why is pathwaycom/pathway a recommended Data Streaming repository?

Transitions batch processing workflows into continuous, real-time streaming operations while preserving core transformation logic.

Why is soimort/you-get a recommended Data Streaming repository?

Streams binary media data into external processes to enable continuous, real-time consumption.

Why is psf/requests a recommended Data Streaming repository?

Minimizes memory usage by deferring the retrieval of large response bodies until they are explicitly accessed.

Why is karpathy/llm.c a recommended Data Streaming repository?

Reads pre-processed tokenized data directly from disk into memory to bypass input-output bottlenecks during training.

Why is apache/flink a recommended Data Streaming repository?

Manages persistent operator state to ensure exactly-once processing and consistency during failures.

Why is toon-format/toon a recommended Data Streaming repository?

Handles massive datasets through event-driven stream processing to ensure memory efficiency.

Why is quii/learn-go-with-tests a recommended Data Streaming repository?

Shows how to move the read pointer back to the beginning of a data stream so the same source can be read more than once.

Why is Vonng/ddia a recommended Data Streaming repository?

Details the streaming of event data to enable continuous, low-latency processing.

97 repositories

Awesome GitHub Repositories: Data Streaming

Technologies and architectures that facilitate the continuous, real-time flow and processing of data records.

Explore 97 awesome GitHub repositories matching data & databases · Data Streaming.


junegunn/fzf
81,017 stars
This project is a general-purpose command-line filter that provides an interactive interface for processing standard input streams. It enables real-time fuzzy searching, data selection, and transformation, allowing users to navigate complex information or file systems directly within their terminal. By utilizing a pipe-oriented architecture, it integrates into existing shell pipelines and workflows to facilitate efficient data exploration. What distinguishes this tool is its highly extensible, event-driven design that allows for deep integration with external processes. It supports asynchrono
Captures data from standard input streams to allow interactive selection and processing of piped content.
Go · bash · cli · fish
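The pipe-oriented filtering described above can be sketched in Python. This is a minimal illustration, not fzf's matching algorithm: `is_subsequence` and `filter_stream` are hypothetical names, and the match rule (query characters appearing in order, case-insensitively) is an assumed simplification.

```python
from typing import Iterable, Iterator


def is_subsequence(query: str, text: str) -> bool:
    """Return True if the characters of query appear in text in order."""
    remaining = iter(text.lower())
    # `ch in remaining` advances the iterator, so order is enforced.
    return all(ch in remaining for ch in query.lower())


def filter_stream(lines: Iterable[str], query: str) -> Iterator[str]:
    """Yield each input line that matches query, one line at a time."""
    for line in lines:
        candidate = line.rstrip("\n")
        if is_subsequence(query, candidate):
            yield candidate
```

Because `filter_stream` is a generator, it could be fed `sys.stdin` directly and would emit matches as lines arrive rather than after the whole input has been read.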
OpenHands/OpenHands
77,330 stars
OpenHands is an autonomous agent framework designed for software engineering workflows. It provides a modular platform for orchestrating AI agents that reason, plan, and execute tasks within isolated, containerized development environments. By integrating with standard version control and development tools, the system enables agents to autonomously navigate codebases, implement features, and resolve issues through iterative reasoning and tool execution. The platform distinguishes itself through a model-agnostic orchestrator that connects diverse language models to a unified tool registry. It
Outputs system events as structured JSON lines to facilitate real-time integration with external monitoring and logging pipelines.
Python · agent · artificial-intelligence · chatgpt
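Emitting events as JSON lines might look like the sketch below. It does not reproduce OpenHands' event schema: the helper names and the event fields used in the usage note are assumptions chosen for illustration.

```python
import json
from typing import Any, Dict, Iterator, TextIO


def emit_event(stream: TextIO, event: Dict[str, Any]) -> None:
    """Write one event as a single JSON line and flush it immediately."""
    stream.write(json.dumps(event, separators=(",", ":")) + "\n")
    stream.flush()  # let downstream consumers see the event right away


def read_events(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield events back from a JSON Lines stream, skipping blank lines."""
    for line in stream:
        if line.strip():
            yield json.loads(line)
```

A logging pipeline could call `emit_event(sys.stdout, {"type": "action", "id": 1})` on the producer side and iterate `read_events(sys.stdin)` on the consumer side, since each line is a complete JSON document.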
pathwaycom/pathway
62,959 stars
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
Transitions batch processing workflows into continuous, real-time streaming operations while preserving core transformation logic.
Python · batch-processing · data-analytics · data-pipelines
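The idea of keeping one transformation for both batch and streaming input can be illustrated with a plain Python generator. This is only a sketch of the concept and does not use Pathway's API; `running_total` is a hypothetical example transformation.

```python
from typing import Iterable, Iterator


def running_total(values: Iterable[int]) -> Iterator[int]:
    """Yield the cumulative sum after each input value."""
    total = 0
    for value in values:
        total += value
        yield total


# Batch: the whole dataset is available up front.
batch_result = list(running_total([1, 2, 3]))

# Streaming: values arrive one at a time and results are produced incrementally.
stream = running_total(iter([5, 5]))
first_update = next(stream)
```

The same function body serves both cases because it only assumes that its input can be iterated, not that it is finite or fully loaded.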
soimort/you-get
56,839 stars
This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It functions by parsing page metadata and utilizing modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media for offline use. The tool distinguishes itself through its ability to handle authenticated content, allowing users to inject browser-stored session cookies to access restricted or private media. It also supports real-time media streaming by piping remote content directly into ext
Streams binary media data into external processes to enable continuous, real-time consumption.
Python
psf/requests
54,044 stars
Requests is a high-level HTTP client library designed to simplify web communication and API integration. It provides an intuitive, human-readable interface for performing standard network operations, including request execution, connection pooling, and stateful session management. By encapsulating raw network data into structured objects, the library automates the complexities of headers, cookies, and payload transmission. The library distinguishes itself through a modular transport adapter layer that allows for custom protocol handling and extensible authentication hooks. It supports a wide
Minimizes memory usage by deferring the retrieval of large response bodies until they are explicitly accessed.
Python · client · cookies · forhumans
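Deferred body retrieval is typically used through the `stream=True` argument together with `Response.iter_content`. The sketch below assumes the `requests` package is installed; `copy_body` and `download` are hypothetical helper names, and the chunk size and timeout are arbitrary choices.

```python
from typing import BinaryIO


def copy_body(response, out: BinaryIO, chunk_size: int = 8192) -> int:
    """Write a streamed response body to out chunk by chunk; return bytes written."""
    written = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        if chunk:  # skip empty keep-alive chunks
            out.write(chunk)
            written += len(chunk)
    return written


def download(url: str, path: str) -> int:
    """Download url to path without holding the whole body in memory."""
    import requests  # imported lazily so copy_body works without the package

    # With stream=True the body is not fetched until it is iterated or accessed.
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        with open(path, "wb") as out:
            return copy_body(response, out)
```

Only one chunk is held in memory at a time, which is the point of deferring the body for large responses.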
karpathy/llm.c
30,230 stars
This project is a low-dependency engine designed for training large language models using native C and CUDA. It provides a bare-metal environment for tensor computation, allowing for the execution of neural network operations directly on hardware accelerators without the overhead of high-level software abstractions. The framework distinguishes itself by implementing manual gradient backpropagation and custom hardware-specific kernels, providing granular control over memory mapping and computational precision. It supports distributed training across multiple graphics processors and compute nod
Reads pre-processed tokenized data directly from disk into memory to bypass input-output bottlenecks during training.
Cuda
apache/flink
26,086 stars
Apache Flink is a distributed processing engine designed for both high-throughput, low-latency data streams and finite batch workloads. It functions as a stateful stream processor and a SQL stream processing engine, providing a unified runtime to execute relational queries and event-based transformations. The system is distinguished by its ability to manage persistent operator state to ensure exactly-once processing guarantees and consistency during failures. It features specialized capabilities for complex event processing to detect temporal patterns and handles out-of-order events using eve
Manages persistent operator state to ensure exactly-once processing and consistency during failures.
Java
toon-format/toon
24,642 stars
Toon is a data serialization library and toolkit designed to convert complex objects into compact, human-readable formats optimized for large language models. By focusing on token efficiency, the library minimizes the context window footprint of structured data through techniques like key folding and tabular layout optimization. It provides a streaming-capable processor that handles the encoding and decoding of hierarchical data while maintaining structural integrity. The project distinguishes itself through its path-aware transformation pipeline and configurable serialization logic, which al
Handles massive datasets through event-driven stream processing to ensure memory efficiency.
TypeScript · data-format · llm · serialization
quii/learn-go-with-tests
23,510 stars
This project is an educational platform and tutorial series designed to teach the Go programming language through the practice of test-driven development. It provides a structured path for developers to master language fundamentals, concurrency, and standard library usage by building functional applications in small, verifiable increments. The core methodology centers on the test-driven development cycle, where failing tests are written before implementation to define requirements and ensure code correctness. This approach is applied across a wide range of practical scenarios, including the c
Shows how to move the read pointer back to the beginning of a data stream so the same source can be read more than once.
Go · go · golang · tdd
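Rewinding a stream can be shown in a few lines. The repository teaches this in Go; the sketch below uses Python's `seek` instead, and `read_twice` is a hypothetical helper name.

```python
import io
from typing import BinaryIO, Tuple


def read_twice(stream: BinaryIO) -> Tuple[bytes, bytes]:
    """Read a seekable stream to the end, rewind it, and read it again."""
    first = stream.read()
    stream.seek(0)  # move the read pointer back to the start
    second = stream.read()
    return first, second


source = io.BytesIO(b"hello")
first, second = read_twice(source)
leftover = source.read()  # the pointer is at the end again, so nothing is left
```

Without the `seek(0)` call the second read would return an empty result, because a stream remembers how far it has already been consumed.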
Vonng/ddia
22,648 stars
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Details the streaming of event data to enable continuous, low-latency processing.
Python · book · database · ddia
redis/go-redis
22,159 stars
This project is a feature-rich Go client library designed for interacting with Redis. It serves as a comprehensive interface for managing remote data stores, enabling developers to execute standard database commands, handle complex data structures, and perform asynchronous operations within Go applications. The library distinguishes itself through its support for advanced Redis capabilities, including connection pooling, pipelining, and transactional integrity. It provides specialized primitives for managing distributed clusters, including automated topology updates and request routing to sha
Enables reliable message processing and stream management using Redis consumer groups and event-driven patterns.
Go · go · golang · redis
backtrader/backtrader
22,019 stars
Backtrader is a Python backtesting framework and algorithmic trading platform. It provides a toolkit for developing automated trading rules and simulating investment strategies using historical financial time-series data. The system functions as a quantitative analysis tool, combining a simulation engine for testing trading rules with a financial data visualizer that generates price action charts. It allows for the calculation of technical indicators and the evaluation of portfolio performance through risk-adjusted returns. The platform covers live trading integration via brokerage APIs and
Processes financial time-series data as a continuous stream to optimize memory usage.
Python
thingsboard/thingsboard
21,907 stars
ThingsBoard is an IoT device management platform designed for provisioning, monitoring, and managing large fleets of hardware devices and assets across multiple customers. It functions as a microservices infrastructure that allows the deployment of data collection and management services as independent containerized units for scaling. The platform includes a rule-based stream processor that transforms incoming device data and triggers alarms using customizable rule chains. It also provides a data visualization suite consisting of dashboards and widgets to display real-time telemetry and syste
Features an event-driven streaming processor that transforms incoming IoT data using customizable rule chains.
Java · big-data · cloud · coap-server
vercel/ai
21,885 stars
This project is a comprehensive framework for building AI-powered applications, providing a unified toolkit for orchestrating language models, autonomous agents, and interactive user interfaces. It serves as a central library for managing the entire lifecycle of AI interactions, from initial prompt generation and model provider abstraction to complex, multi-step reasoning and tool execution. The framework distinguishes itself through its deep integration with frontend development, specifically by enabling generative user interfaces that render dynamic components directly from model outputs. I
Generates and renders complex JSON objects incrementally as an AI model produces them, allowing real-time UI updates based on structured data.
TypeScript · anthropic · artificial-intelligence · gemini
guidance-ai/guidance
21,502 stars
Guidance is a generative AI orchestration framework designed to manage complex interactions with language models by embedding programmatic control directly into the prompt generation process. It functions as a prompt programming environment that allows developers to interleave raw text with executable logic, enabling the construction of sophisticated, multi-step agentic workflows. The framework distinguishes itself through grammar-constrained token sampling and stateful stream interception, which restrict the model's output distribution based on formal language rules. By enforcing these const
Monitors and modifies token generation in real time to enforce logical constraints on model output.
Jupyter Notebook
FiloSottile/age
21,369 stars
Age is a command-line utility for file encryption that utilizes hybrid cryptography to secure data for multiple recipients. It employs a combination of asymmetric key exchange and symmetric encryption to protect files, supporting access control through public keys, shared passphrases, and hardware-backed identity integration. The tool is designed for memory-efficient operation, utilizing stream-oriented processing to handle large datasets in small, sequential chunks. It features a stanza-based metadata framing system that allows for extensible file headers and supports random-access decryptio
Handles large files by streaming data in small chunks to maintain performance during cryptographic operations.
Go · age-encryption · built-at-rc
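Chunked processing of a large file can be sketched as follows. This is not age's file format or cipher: SHA-256 hashing stands in for the cryptographic operation, and the helper names and 64 KiB chunk size are assumptions made for the example.

```python
import hashlib
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until the stream ends."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 64 * 1024) -> str:
    """Hash a stream incrementally so memory use stays bounded by chunk_size."""
    digest = hashlib.sha256()
    for chunk in iter_chunks(stream, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()
```

Memory use depends on the chunk size rather than the file size, so the same code handles a few kilobytes or many gigabytes.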
mastra-ai/mastra
21,221 stars
Mastra is an orchestration framework designed for building, deploying, and managing autonomous AI agents and multi-agent systems. It provides a comprehensive suite of primitives for creating resilient AI applications, including durable workflow orchestration, event-driven agent loops, and semantic memory management. By integrating these core components, the platform enables developers to build complex, multi-step processes that can reason about goals and execute tasks without manual intervention. The framework distinguishes itself through its focus on observability and secure, isolated execut
Delivers structured objects incrementally during generation while maintaining access to the final object.
TypeScript · agents · ai · chatbots
letta-ai/letta
21,168 stars
Letta is a framework for building, deploying, and managing autonomous AI agents that maintain persistent state across long-term interactions. It provides a comprehensive suite of primitives for defining agents with configurable personas, modular memory blocks, and tool-use capabilities, enabling them to retain user preferences and conversation history over extended sessions. The platform distinguishes itself through its advanced memory management and orchestration capabilities. It allows agents to autonomously update their own memory, perform retrieval-augmented generation, and coordinate com
Exchanges JSON messages over standard input and output to allow external programs to control agent behavior and receive responses.
Python · ai · ai-agents · llm
nats-io/nats-server
20,076 stars
NATS Server is a high-performance, lightweight messaging system designed for cloud-native applications, edge computing, and distributed microservices. It functions as a distributed publish-subscribe broker that routes messages using hierarchical, dot-separated subject strings, enabling decoupled communication between services without requiring centralized broker lookups. The system supports core messaging patterns including asynchronous publish-subscribe, request-reply, and load-balanced queue processing. The platform distinguishes itself through a decentralized architecture that eliminates t
Provides log-structured stream persistence for reliable message replay and durable delivery.
Go · cloud · cloud-computing · cloud-native
microsoft/onnxruntime
19,347 stars
This project is a cross-platform machine learning inference engine designed to execute pre-trained models across diverse operating systems and hardware environments. It functions as a standardized execution framework that manages the entire lifecycle of model inference, from loading and graph optimization to hardware-accelerated execution and generative sequence management. The runtime distinguishes itself through a highly modular architecture that decouples model logic from hardware-specific kernels. By utilizing an execution provider abstraction, it enables developers to offload computation
Delivers generated tokens incrementally to the user interface as they are produced to provide real-time feedback.
C++ · ai-framework · deep-learning · hardware-acceleration