Streaming, queues and change data capture

Explore distributed messaging systems, event streaming platforms, and change data capture tools for real-time data pipelines.

supabase/supabase
104,317 stars
This project provides an integrated backend platform built around a relational database. It automatically generates REST and GraphQL APIs from database schemas, allowing for direct data interaction through standard requests and client libraries. The platform includes a comprehensive authentication system that manages user identity, session handling, and fine-grained access control through database-native row-level security policies. Beyond core data management, the platform offers specialized services for object storage, vector data processing for semantic search, and real-time communication
TypeScript · Change Data Capture Streams · GraphQL API Generators · Authentication Strategies

alibaba/canal
29,697 stars
Canal is a database replication middleware that performs change data capture by simulating a database replica. It monitors transaction logs to stream incremental data modifications to downstream systems in real time, acting as an event streaming infrastructure that transforms low-level binary logs into structured, consumable message streams. The project distinguishes itself through a high-throughput architecture that utilizes concurrent multi-threaded parsing and stateful log position tracking to ensure reliable data delivery. It employs a pluggable sink architecture that decouples data extra
Java · Change Data Capture Services · Change Data Capture Tools · Database Change Subscriptions

rethinkdb/rethinkdb
26,996 stars
RethinkDB is a distributed, document-oriented database designed to store and manage JSON-formatted data across scalable clusters. It utilizes a custom log-structured storage engine with B-Tree indexing to ensure high-performance disk I/O and data persistence. The system maintains high availability through automatic sharding and replication, employing a primary-replica voting consensus mechanism to handle node failures and ensure consistent cluster operations. A defining characteristic of the platform is its reactive changefeed engine, which allows applications to subscribe to live data update
C++ · Document Databases · Change Data Capture · Query Builders

pingcap/tidb
40,166 stars
TiDB is a horizontally scalable, distributed SQL database designed to provide consistent transactional storage and high-performance analytical processing within a single unified architecture. It utilizes a decoupled compute-storage design and a distributed key-value storage layer to ensure horizontal scalability and efficient range-based queries. By employing a consensus-based replication algorithm, the system maintains high availability and automatic failover across multiple nodes and geographical regions. The platform distinguishes itself through its hybrid transactional and analytical proc
Go · Analytical Query Engines · Data Manipulation Interfaces · Database Lifecycle Management

aws/aws-cdk
12,817 stars
The AWS Cloud Development Kit is an infrastructure-as-code framework that enables developers to define and provision cloud resources using familiar programming languages. By utilizing construct-based synthesis, it translates high-level, object-oriented code into declarative templates, allowing for the automated management of complex cloud environments through a centralized, code-driven control plane. The framework distinguishes itself through its ability to model infrastructure as a dependency-aware resource graph, ensuring that components are provisioned and updated in the correct order. It
TypeScript · Infrastructure as Code · AWS Provisioners · Cloud Deployment Automation

Permify/permify
5,812 stars
Go · Authorization Services · Attribute-Aware Policy Evaluators · Authorization Mode Managers

apache/kafka
32,846 stars
Kafka is a distributed event streaming platform designed for capturing, storing, and processing real-time data streams across interconnected nodes. It functions as a distributed commit log, providing a fault-tolerant storage mechanism that records state changes sequentially to ensure data consistency and durability across distributed environments. The platform distinguishes itself through a partitioned commit log architecture that enables horizontal scaling and parallel processing of data streams. It integrates a stream processing engine for continuous transformations and aggregations, while
Java · Distributed Event Streaming Platforms · Distributed Commit Logs · Data Streaming Platforms
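As a rough illustration of the partitioned commit log idea described for Kafka, the sketch below models an in-memory log in Python. It is not Kafka's API: the `PartitionedLog` class, its method names, and the CRC32-based key hashing are assumptions made for this example, and Kafka's own partitioner works differently. The sketch only shows how records sharing a key land in one partition and receive sequential offsets.

```python
from zlib import crc32


class PartitionedLog:
    """Toy in-memory partitioned, append-only log (illustrative only)."""

    def __init__(self, num_partitions: int) -> None:
        # One independent, ordered list of records per partition.
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Stable hash (an assumption for this sketch) so that the same key
        # always maps to the same partition.
        return crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str):
        """Append a record and return its (partition, offset)."""
        partition = self.partition_for(key)
        self.partitions[partition].append((key, value))
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int):
        """Return every record in a partition from the given offset onward."""
        return self.partitions[partition][offset:]
```

Because ordering is kept per partition, a consumer that remembers only a partition number and an offset can resume reading where it left off, and separate partitions can be read in parallel.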

debezium/debezium
12,421 stars
Debezium is a distributed change data capture platform that streams row-level database modifications as real-time events. By parsing database transaction logs, the system broadcasts structural and data changes to message brokers, enabling reactive processing and data integration across distributed architectures. The platform utilizes log-based capture to extract modifications directly from transaction logs, ensuring minimal impact on source system performance while maintaining the original commit order of operations. It employs database-specific connector adapters to translate proprietary bin
Java · Change Data Capture · Event Streaming Infrastructure · Change Data Capture Streams
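Log-based capture with position tracking, as described for Debezium and Canal, can be sketched in a few lines of Python. This is a simplified model and not either project's API: the `LogEntry` type, the `capture_changes` function, and the event dictionary layout are hypothetical names chosen for the example. It shows entries being replayed in commit order and anything at or before the last checkpointed position being skipped.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class LogEntry:
    """One record of a hypothetical transaction log."""
    position: int  # monotonically increasing log position
    op: str        # "insert", "update" or "delete"
    table: str
    row: dict


def capture_changes(log: list, last_position: int) -> Iterator[dict]:
    """Yield change events after last_position, in commit order."""
    for entry in sorted(log, key=lambda e: e.position):
        if entry.position <= last_position:
            continue  # already delivered before the last checkpoint
        yield {
            "position": entry.position,
            "op": entry.op,
            "table": entry.table,
            # A delete has no "after" image of the row.
            "after": None if entry.op == "delete" else entry.row,
        }
```

A consumer would persist the `position` of the last event it handled and pass it back as `last_position` after a restart, which is what lets capture resume without re-emitting earlier changes.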

pathwaycom/llm-app
59,341 stars
This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transformation workflows. The framework distinguishes itself through differential dataflow execution, which propagates only changes through a pipeline rather than recomputing entire datasets. It supports distributed state management across worker nodes and utilizes incremental stream p
Jupyter Notebook · Data Processing Frameworks · Differential Dataflow Engines · Distributed State Management

electric-sql/electric
9,909 stars
Electric is a Postgres data synchronization engine and replication proxy designed to enable local-first software. It replicates data from Postgres databases to client-side stores in real time using logical replication, allowing applications to maintain a local embedded database for offline access and low-latency updates. The system distinguishes itself by using shapes to filter and authorize specific subsets of database rows and columns before streaming them to clients or edge workers. It further supports multi-user collaboration by integrating a conflict-free replicated data type framework t
Elixir · Client-Side State Synchronizers · Data Synchronization Engines · Local-First Architectures

pathwaycom/pathway
62,959 stars
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
Python · Data Processing Frameworks · Data Stream Processors · Declarative Pipeline Construction
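The idea of processing updates incrementally, rather than recomputing a whole dataset, can be illustrated with a small Python sketch. It does not use Pathway's API and is far simpler than a differential dataflow engine: the `IncrementalCount` class and its `(key, diff)` input format are assumptions made for this example. Each batch of signed changes updates a running count, and only the keys whose result actually changed are reported downstream.

```python
from collections import Counter


class IncrementalCount:
    """Maintain per-key counts from batches of (key, diff) changes."""

    def __init__(self) -> None:
        self.counts = Counter()

    def apply(self, deltas):
        """Apply a batch of changes; return only the keys whose count changed."""
        before = {}
        for key, diff in deltas:
            # Remember the value each touched key had before this batch.
            before.setdefault(key, self.counts[key])
            self.counts[key] += diff
        return {
            key: self.counts[key]
            for key, old in before.items()
            if self.counts[key] != old
        }
```

A retraction is just a negative `diff`, so an insert followed by a matching delete in the same batch produces no downstream output at all.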

rqlite/rqlite
17,586 stars
rqlite is a distributed relational database that replicates SQLite data across a cluster using the Raft consensus algorithm. It functions as a fault-tolerant storage system that provides high availability and a web API for executing SQL queries and managing relational data without requiring native database drivers. The system distinguishes itself by using an HTTP SQL interface to expose database operations and cluster management. It features a real-time change data capture stream that pushes database mutations to external HTTP endpoints via webhooks and supports the scaling of read throughput
Go · Distributed Relational Databases · Raft Consensus Implementations · Change Data Capture Streams

ReactiveX/rxjs
31,682 stars
RxJS is a library for reactive programming that provides a framework for composing asynchronous and event-based programs. It utilizes observable sequences to model data flows, allowing developers to manage complex sequences of events through a declarative programming interface. The library implements the observer pattern to facilitate decoupled communication between data producers and subscribers. By employing a lazy execution model, streams remain dormant until a consumer explicitly subscribes, at which point data production is triggered. This approach enables the construction of predictable
TypeScript · Reactive Programming Libraries · Asynchronous Streams · Reactive Programming
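The lazy execution model described for RxJS can be illustrated without the library itself. The Python sketch below is not RxJS and does not mirror its API: the `Observable` class and its `subscribe` and `map` methods are a minimal stand-in written for this example. It shows that building a pipeline runs nothing, and that the producer only starts when a consumer subscribes.

```python
class Observable:
    """Minimal lazy stream: the producer runs once per subscription."""

    def __init__(self, producer):
        # The producer is stored, not called, so construction has no effects.
        self._producer = producer

    def subscribe(self, on_next):
        # Subscribing is what triggers data production.
        self._producer(on_next)

    def map(self, fn):
        # Returns a new lazy stream that transforms each emitted value.
        return Observable(
            lambda on_next: self._producer(lambda value: on_next(fn(value)))
        )
```

For example, wrapping a producer that emits `0, 1, 2` and calling `.map(lambda x: x * 10)` does nothing until `subscribe` is called, at which point the subscriber receives `0, 10, 20`.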

greenrobot/EventBus
24,760 stars
EventBus is a publish-subscribe messaging library designed to facilitate decoupled communication between components in Java applications. It functions as a central hub where producers dispatch events that are routed to subscribers based on the class type of the payload. By using annotation-based markers, the system maps event handlers to specific data types, allowing different parts of an application to exchange information without requiring direct references between classes. The library distinguishes itself through a focus on performance and execution control. It utilizes a compile-time inde
Java · Event Bus Systems · Message Buses · Event Messaging Systems
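Routing events to subscribers by the class of the payload can be sketched as follows. This is a Python illustration of the pattern, not the EventBus library's Java API: the `register` and `post` names and the two event classes are hypothetical. The sketch also matches on the exact type only, which is a simplification.

```python
from collections import defaultdict
from dataclasses import dataclass


class EventBus:
    """Toy publish-subscribe hub keyed by the event's class."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def register(self, event_type, handler) -> None:
        self._handlers[event_type].append(handler)

    def post(self, event) -> int:
        """Deliver the event to handlers of its exact type; return how many ran."""
        handlers = list(self._handlers.get(type(event), []))
        for handler in handlers:
            handler(event)
        return len(handlers)


@dataclass
class UserLoggedIn:  # hypothetical event type
    name: str


@dataclass
class OrderPlaced:  # hypothetical event type
    order_id: int
```

The producer calling `post` never references a subscriber directly; it only constructs an event object, which is what keeps the two sides decoupled.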

boto/boto3
9,834 stars
Boto3 is the AWS SDK for Python, providing a programmatic interface for managing and automating AWS cloud infrastructure and services. It serves as a cloud management API client and resource manager for provisioning, configuring, and scaling virtual servers, databases, and storage. The library enables the implementation of infrastructure-as-code through declarative templates and scripts, allowing for the deployment of identical resource stacks across multiple accounts and geographic regions. It also provides a framework for coordinating distributed workflows, serverless functions, and contain
Python · AWS Provisioners · Cloud Provisioning Templates · Cloud Service SDKs

nsqio/nsq
25,738 stars
NSQ is a distributed, brokerless messaging platform designed for high-throughput, fault-tolerant communication. By utilizing a decentralized topology, it eliminates single points of failure and allows for horizontal scaling across clusters. The system organizes message streams into topics and channels, effectively decoupling producers from consumers to support both streaming and job-oriented workloads. The platform distinguishes itself through a lookup-service-based discovery mechanism that enables clients to dynamically locate producers at runtime without requiring centralized coordination.
Go · Distributed Systems · Message Brokers · Connection Management Strategies

pubkey/rxdb
23,048 stars
This project is a reactive, offline-first NoSQL database engine designed for JavaScript applications. It provides a robust framework for managing application state by synchronizing data across browsers, mobile devices, and server-side runtimes. By treating local storage as the primary source of truth, it enables applications to remain functional without network connectivity, automatically reconciling changes with remote backends once a connection is restored. The database distinguishes itself through a modular architecture that supports cross-environment synchronization and high-performance d
TypeScript · NoSQL · Cross-Client Synchronization · Cross-Device Synchronization Engines

celery/celery
28,596 stars
Celery is an asynchronous job processor and distributed task queue designed to offload time-consuming operations to background worker nodes. By utilizing a message-passing architecture, it decouples task producers from consumers, allowing applications to maintain responsiveness while scaling workloads across multiple isolated environments. The system functions as a distributed workload orchestrator that manages the lifecycle of deferred operations through persistent queues. It distinguishes itself by providing a pluggable transport abstraction, which allows the core task logic to remain indep
Python · Distributed Task Queues · Task Queues · Distributed Task Processors

ArroyoSystems/arroyo
4,819 stars
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Rust · Ad Hoc Dataset Querying · Change Data Capture Streams · Checkpoint-Based Recovery

taosdata/TDengine
24,734 stars
TDengine is a distributed time-series database designed for the high-speed ingestion, compression, and retrieval of timestamped metrics and sensor data. It functions as a SQL-compatible analytics engine, allowing users to perform complex operations on massive volumes of time-ordered information using standard relational syntax. The platform is built to serve as a backend foundation for industrial IoT environments, managing real-time data streams and device metadata through a cluster-based architecture. The system distinguishes itself through a distributed sharding architecture that uses consi
C · Analytics Engines · Columnar Storage Engines · Time Series Databases

datahub-project/datahub
12,141 stars
DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations. The platform distinguishes itself through its focus on grounding artificial intelligence and autono
Python · AI Agent Context Enrichers · Business Context Grounding · Context-Aware Retrieval

p0deje/Maccy
18,635 stars
Maccy is a lightweight clipboard manager for macOS that captures and stores text and images copied to the system clipboard. It provides a searchable interface for retrieving historical content, allowing users to access previously copied items through a keyboard-driven workflow. The application distinguishes itself by prioritizing privacy and performance through automated filtering and local data management. It employs pattern matching to identify and exclude sensitive information, such as passwords, from being saved. All history is maintained in a local database, with an in-memory index that
Swift · Clipboard Managers · Clipboard Management

delta-io/delta
8,596 stars
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Scala · Lakehouse Storage Layers · Lakehouse Table Formats · Transaction Management

socketio/socket.io
63,195 stars
Socket.io is a real-time communication engine that enables bidirectional, event-based data exchange between clients and servers. It provides a robust transport-agnostic protocol layer that automatically manages connection lifecycles, including heartbeat signals, automatic reconnection, and seamless fallback between WebSockets and HTTP long-polling. By maintaining persistent links, it ensures reliable messaging across diverse network environments. The project distinguishes itself through a scalable, distributed architecture that supports multi-node synchronization and room-based message routin
TypeScript · Connection Establishment Protocols · Connection Lifecycle Managers · Distributed Pub-Sub Adapters

apache/seatunnel
9,427 stars
SeaTunnel is a distributed data integration engine designed to synchronize structured and unstructured data across diverse sources and sinks. It functions as a multi-engine execution framework that can run data integration tasks across different distributed computing backends to optimize workload performance. The project is distinguished by a visual data pipeline designer for configuring workflows without manual code and a specialized change data capture tool for streaming incremental database updates. It also includes an enrichment pipeline that integrates large language models and embedding
Java · Backend-Agnostic Execution Layers · Distributed Data Engines · CDC Synchronization

tokio-rs/tokio
32,309 stars
Tokio is an asynchronous runtime for the Rust programming language, designed to manage and execute concurrent tasks efficiently. It provides a multi-threaded execution environment that schedules lightweight tasks across available processor cores, utilizing a work-stealing scheduler to balance computational load. By employing a poll-based execution model and waker-based notifications, the runtime drives asynchronous operations forward without requiring active polling loops, ensuring efficient resource utilization. The project distinguishes itself through a comprehensive suite of tools for high
Rust · Runtime Environments · Timer Schedulers · Network Programming Frameworks

redis/redis
74,906 stars
Redis is an in-memory, key-value database designed to provide sub-millisecond latency for read and write operations. It functions as a versatile data platform, serving as a distributed cache, a message broker, a NoSQL document store, and a vector database. The system utilizes an event-driven, single-threaded loop to process requests efficiently, while maintaining data durability through append-only persistence logs and asynchronous snapshotting mechanisms. What distinguishes Redis is its ability to handle complex data structures—including strings, hashes, lists, sets, and sorted sets—alongsid
C · Active-Active Database Clusters · Distributed Caches · Distributed State Management

macrozheng/mall
83,878 stars
This project is an enterprise-grade Java framework designed for building scalable, full-stack e-commerce applications. It provides a comprehensive foundation for microservice-based distributed architectures, enabling the development of complex retail platforms that include product management, order processing, and secure user authentication. By leveraging modular service patterns and centralized API gateways, the framework supports the construction of resilient systems that decompose monolithic business logic into independent, manageable services. The platform distinguishes itself through a r
Java · Full-Stack Frameworks · Full-Stack Web Frameworks · Java Frameworks

grafana/loki
27,640 stars
Loki is a horizontally scalable, highly available log aggregation engine designed to store and query massive volumes of unstructured log data. It functions as a distributed observability platform that correlates logs, metrics, and traces to provide comprehensive visibility into the health and performance of complex infrastructure. The system distinguishes itself through a distributed query execution model that processes large datasets in parallel across cluster nodes. It utilizes label-based stream indexing and a distributed index to map log data to specific chunks, enabling rapid retrieval w
Go · Distributed Observability Systems · Log Storage Engines · Observability Platforms

benthosdev/benthos
8,681 stars
Benthos is a stream processing engine and data integration pipeline used for routing, transforming, and connecting data streams between diverse sources and sinks. It functions as event routing middleware and a change data capture tool, streaming real-time database modifications as discrete events for downstream processing. The system utilizes a declarative pipeline configuration, where data flow and processing logic are defined in a single static file. It features a specialized domain-specific language for mapping, filtering, and enriching data payloads, allowing for complex transformations w
Go · Data Ingestion and Integration · Data Integration Pipelines · Stream Processing Engines

influxdata/influxdb
31,556 stars
InfluxDB is a specialized time series database platform engineered for the high-speed ingestion, compression, and retrieval of timestamped data at scale. It functions as a distributed metrics platform, providing the infrastructure necessary to organize and analyze massive volumes of time-stamped information to identify trends, patterns, and anomalies within complex data streams. The platform distinguishes itself through a functional dataflow engine that utilizes a specialized programming language for complex analytical transformations and automated tasks. This architecture is supported by a p
Rust · Time Series Databases · Domain Specific Languages · Data Ingestion Plugins

airbytehq/airbyte
21,472 stars
Airbyte is a data integration platform designed to synchronize information between diverse applications, databases, and data warehouses. It functions as an extract, transform, and load orchestrator that manages automated data movement workflows across cloud, on-premise, and hybrid environments. The platform provides a standardized interface for connectors, enabling the movement of structured and unstructured data while maintaining stateful checkpoints for reliable incremental syncing. The platform distinguishes itself through a containerized architecture that isolates connectors to prevent de
Python · Data Integration & Synchronization · Enterprise Data Platforms · AI Agent Tool Integrations

python-telegram-bot/python-telegram-bot
29,227 stars
This project is an asynchronous messaging framework designed for building interactive applications on the Telegram platform. It functions as a comprehensive wrapper that maps native platform methods and update types into structured objects, enabling developers to create event-driven services that respond to real-time user input. By integrating with standard event loops, the library facilitates high-throughput communication and non-blocking message processing. The framework distinguishes itself through a sophisticated update-driven dispatcher pattern that routes incoming messages to specific h
Python · Asynchronous Bot Frameworks · Asynchronous Messaging Frameworks · Dispatchers

redpanda-data/connect
8,681 stars
Connect is a Kafka data integration platform and stream processing engine used to build declarative pipelines that move and transform messages between Kafka topics and external sources. It functions as a Kafka Connect framework and a change data capture tool, streaming real-time database modifications to synchronize data across distributed environments. The project differentiates itself through a dedicated mapping language for mutating and reshaping message payloads and the ability to execute custom processing logic within a sandboxed WebAssembly runtime. It also provides an observability pip
Go · Real-Time Data Integration Platforms · Change Data Capture · Change Data Capture Tools

duckdb/duckdb
38,805 stars
DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation. The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adapti
C++ · Analytical Databases · Columnar Engines · Embedded Databases

cockroachdb/cockroach
32,207 stars
Cockroach is a distributed SQL database designed to scale horizontally across multiple nodes while maintaining strict ACID compliance and global data consistency. It functions as a relational database engine that automatically partitions data into ranges, rebalancing them across a cluster to accommodate growing storage and throughput requirements. By utilizing a distributed consensus protocol, the system ensures that all nodes agree on the order of operations, providing fault tolerance and continuous availability even in the event of hardware failures. The system distinguishes itself through
Go · Distributed Relational Databases · Distributed SQL Databases · Distributed SQL Engines

collectiveidea/audited
3,491 stars
Audited is a Ruby on Rails audit log library and change data capture framework. It tracks model changes by recording previous and current attribute values during create, update, and destroy operations to maintain a complete history of database modifications. The system functions as a database versioning tool and user activity tracker. It allows for the retrieval of historical record states by timestamp or index, enables reverting models to previous versions, and associates record modifications with specific user identities and remote IP addresses. The library includes capabilities for sensit
Ruby · Change Data Capture · Database Change Tracking · Activity Auditing

ClickHouse/ClickHouse
48,229 stars
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
C++ · Access Control Systems · Agent Analytics · Agentic Architectures

mongodb/node-mongodb-native
10,180 stars
The MongoDB Node.js Driver is a programmatic interface and NoSQL database client used to manage document storage and execute operations within a MongoDB database. It serves as an asynchronous database interface and connection manager that enables Node.js applications to integrate with MongoDB servers. The project implements client-side field encryption to secure sensitive data and queries locally before transmission. It also provides a BSON serialization library to convert JavaScript objects into a binary format for efficient storage and network transmission. The driver covers a broad range
TypeScript · Database Connectivity · MongoDB Connectors · MongoDB Drivers

novuhq/novu
39,133 stars
This project is a centralized notification infrastructure platform designed to manage multi-channel messaging workflows, delivery routing, and user preference settings through a unified integration layer. It provides a code-first workflow engine that allows engineers to define complex messaging sequences and notification logic as version-controlled code, ensuring consistency across development and deployment pipelines. The platform distinguishes itself by decoupling notification content from application logic, enabling non-technical teams to design and update templates through a visual interf
TypeScript · Messaging Orchestrators · Notification Infrastructure · Code-First Workflow Management

dbt-labs/dbt-core
13,051 stars
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
Rust · Data Pipeline Orchestration · Transformation Frameworks · Business Metric Aggregators

dragonflydb/dragonfly
30,688 stars
Dragonfly is a high-performance, multi-model in-memory data store designed to serve as a drop-in replacement for existing database infrastructures. By utilizing a multi-threaded, shared-nothing architecture and a fiber-based concurrency model, it maximizes CPU utilization and minimizes latency for read and write operations. The system supports a wide range of data structures, including strings, hashes, lists, sets, sorted sets, and JSON documents, while maintaining full compatibility with standard industry wire protocols and client libraries. What distinguishes Dragonfly is its focus on effic
C++ · Access Control Systems · Cluster Management · Concurrency Models

apache/shardingsphere
20,737 stars
ShardingSphere is a distributed SQL database middleware that provides sharding, read-write splitting, and distributed transaction management for relational databases. It functions as a layer that intercepts SQL queries to distribute data across multiple physical database instances for horizontal scaling. The project is distinguished by its ability to operate as either a standalone transparent database proxy or via direct integration as a JDBC driver. It features a SQL dialect translator that parses queries into abstract syntax trees to convert syntax between different database engines, enabli
Java · Database Partitioning and Sharding · Database Sharding · AST-Based Rewriting
grafana/grafana
74,456 stars
Grafana is an observability data platform designed to aggregate metrics, logs, and traces from diverse sources into a unified environment. It functions as a centralized interface for visualizing complex telemetry data, transforming raw streams into interactive dashboards that support real-time system health tracking and performance monitoring. The platform distinguishes itself through a plugin-based modular architecture that integrates disparate databases, cloud services, and monitoring tools via a standardized data abstraction layer. This framework allows for the dynamic loading of external
TypeScript · Observability Data Platforms · Observability Dashboards · Telemetry Collection and Aggregation
valkey-io/valkey
24,875 stars
Valkey is an in-memory, NoSQL database server designed for high-performance data storage and real-time state management. It operates as a distributed key-value store, maintaining datasets entirely within system memory to facilitate sub-millisecond response times for read and write operations. The system distinguishes itself through a single-threaded event loop that utilizes asynchronous I/O multiplexing to ensure high throughput. It supports high availability via master-replica replication and provides a decoupled communication model through a built-in publish-subscribe messaging pattern. To
C · In-Memory Data Stores · In-Memory Databases · Key-Value Stores
pola-rs/polars
38,855 stars
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
Rust · Analytical Data Engines · Columnar Data Processors · Distributed Query Engines
seaweedfs/seaweedfs
32,937 stars
SeaweedFS is a distributed object store and high-performance file system designed to manage massive volumes of unstructured data. It utilizes a decoupled architecture that separates metadata management from raw data storage, allowing for independent scalability and the efficient handling of billions of files. By providing a POSIX-compliant interface, it enables applications to interact with a unified namespace while maintaining the performance characteristics of a distributed object store. The system distinguishes itself through a multi-region data fabric that supports active-active replicati
Go · Object Storage · Distributed Object Stores · High-Performance File Systems
ray-project/ray
42,895 stars
Ray is a distributed computing framework designed to scale Python and Java applications across clusters by abstracting task scheduling and resource management. It functions as a resource-aware execution engine that manages task dependencies, placement, and fault tolerance across networked compute nodes. At its core, the system provides a stateful actor model, allowing developers to define classes that run in dedicated processes to maintain and mutate internal state across remote method calls. The framework distinguishes itself through a robust cross-language interoperability layer, enabling f
Python · Actor Models · Distributed Computing Frameworks · Distributed Datasets
