5 repositories
Engines that process data updates incrementally by propagating changes through a directed graph of operations.
GitHub repositories in Data & Databases · Differential Dataflow Engines.
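To make the category description concrete, here is a minimal sketch of incremental propagation through a small chain of operators. It is not the API of any repository listed here: the names `map_op`, `filter_op`, `CountByKey`, and `run_pipeline` are hypothetical. It also assumes a simplified model in which each change is a `(record, diff)` pair, with `+1` for an insertion and `-1` for a retraction, and it omits the timestamps and ordering that real engines track.

```python
from collections import Counter

# Assumption: a change is a (record, diff) pair; +1 inserts, -1 retracts.

def map_op(fn):
    """Stateless operator: transform each record and keep its diff."""
    def run(changes):
        return [(fn(rec), diff) for rec, diff in changes]
    return run

def filter_op(pred):
    """Stateless operator: drop changes whose record fails the predicate."""
    def run(changes):
        return [(rec, diff) for rec, diff in changes if pred(rec)]
    return run

class CountByKey:
    """Stateful operator: keeps per-key counts and emits only changed rows."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.counts = Counter()

    def __call__(self, changes):
        before = {}  # count of each touched key before this batch
        for rec, diff in changes:
            key = self.key_fn(rec)
            before.setdefault(key, self.counts[key])
            self.counts[key] += diff
        out = []
        for key, old in before.items():
            new = self.counts[key]
            if new == old:
                continue  # net effect was zero, nothing to propagate
            if old != 0:
                out.append(((key, old), -1))  # retract the stale row
            if new != 0:
                out.append(((key, new), +1))  # insert the updated row
        return out

def run_pipeline(operators, changes):
    """Push one batch of changes through the operators in order."""
    for op in operators:
        changes = op(changes)
    return changes

pipeline = [map_op(str.lower), filter_op(lambda w: len(w) > 2), CountByKey(lambda w: w)]

first = run_pipeline(pipeline, [("Graph", +1), ("graph", +1), ("of", +1), ("Delta", +1)])
# first == [(("graph", 2), 1), (("delta", 1), 1)]

second = run_pipeline(pipeline, [("graph", -1)])
# second == [(("graph", 2), -1), (("graph", 1), 1)]
```

The second batch touches only the `graph` key, so the output contains one retraction and one insertion for that key. The `delta` row is never recomputed.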
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
Propagates data updates incrementally through a directed graph of operators to maintain real-time consistency.
This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transformation workflows. The framework distinguishes itself through differential dataflow execution, which propagates only changes through a pipeline rather than recomputing entire datasets. It supports distributed state management across worker nodes and utilizes incremental stream p
Propagates incremental updates through directed graphs to avoid full dataset recomputation during query processing.
Apache MXNet is a deep learning framework and distributed machine learning library designed for training and deploying neural networks across distributed systems, mobile devices, and hardware accelerators. It functions as a cross-platform runtime and a dynamic dataflow scheduler that optimizes neural network execution. The framework provides a multi-language API, enabling the development of machine learning models using Python, R, Julia, Scala, Go, and JavaScript. It supports high-performance model training and the scaling of workloads across multiple GPUs and machines. The system covers cap
Tracks changes to data buffers within the execution graph to incrementally propagate updates and optimize memory efficiency.
Materialize is a streaming SQL database that continuously ingests live data from sources such as Kafka, Redpanda, PostgreSQL, and MySQL, and incrementally maintains materialized views. It provides a PostgreSQL-compatible query engine that accepts standard SQL over the PostgreSQL wire protocol, enabling any existing SQL client or BI tool to query real-time data. The system also includes a Model Context Protocol (MCP) server that exposes live materialized view data to AI agents, providing fresh context without polling. Materialize distinguishes itself through its ability to offer configurable c
Computes incremental view updates by propagating changes through a data-parallel, timely dataflow graph.
This project is a client-side data management library and query orchestrator designed to synchronize remote server state with local client state. It functions as a type-safe state manager and cache orchestrator that coordinates data loading across diverse backends, including REST, GraphQL, and WebSockets. The system distinguishes itself through a durable workflow engine for executing asynchronous functions with persisted state and deterministic replay. It also provides a standardized AI integration adapter to connect large language models to application data, supporting real-time response str
Implements differential dataflow to trigger updates only for specific data slices that have changed.