Why is pola-rs/polars a recommended Data Sinking repository?

Saves large-scale query results directly to cloud storage to support automated data pipelines.

Why is zhisheng17/flink-learning a recommended Data Sinking repository?

Provides interfaces to define how records are written to non-standard external storage systems.

Why is rerun-io/rerun a recommended Data Sinking repository?

Provides mechanisms for directing logged sensor data to various external destinations including gRPC servers and binary files.

Why is apache/seatunnel a recommended Data Sinking repository?

Transfers processed data into target destinations such as databases, object storage, and message queues.

Why is risingwavelabs/risingwave a recommended Data Sinking repository?

Continuously pushes processed data streams into downstream databases, data lakes, or message queues.

Why is hazelcast/hazelcast a recommended Data Sinking repository?

Routes processed data from pipelines to local or remote storage systems using built-in connectors.

Why is apache/flink-cdc a recommended Data Sinking repository?

Maps specific source tables to designated sink tables to organize data distribution across multiple target systems.

Why is rust-lang/futures-rs a recommended Data Sinking repository?

Sends a sequence of values to a destination using the Sink trait and its adapters.

Why is zio/zio a recommended Data Sinking repository?

Provides sink type transformation as part of its stream processing capabilities.

10 Repos

Awesome GitHub Repositories: Data Sinking

Mechanisms for writing large-scale query results to external storage systems.

Distinguishing note: Specifically addresses the output phase of data pipelines into cloud storage.
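As a rough illustration of the sinking idea described above, the following sketch writes query results to storage batch by batch instead of collecting them in memory first. It uses only the Python standard library and a local CSV file as a stand-in for external storage; `query_batches` and `sink_to_csv` are hypothetical names invented for this example and do not correspond to the API of any repository listed here.

```python
import csv
import os
import tempfile
from typing import Iterable, Iterator


def query_batches(n_rows: int, batch_size: int) -> Iterator[list[tuple[int, int]]]:
    """Hypothetical stand-in for a query engine that yields results in batches."""
    for start in range(0, n_rows, batch_size):
        stop = min(start + batch_size, n_rows)
        yield [(i, i * i) for i in range(start, stop)]


def sink_to_csv(batches: Iterable[list[tuple[int, int]]], path: str) -> int:
    """Write each batch to `path` as it arrives; return the number of rows written."""
    rows_written = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "square"])  # header row
        for batch in batches:
            # Only one batch is held in memory at a time.
            writer.writerows(batch)
            rows_written += len(batch)
    return rows_written


# The local temp file is an assumption; a real sink would target cloud or remote storage.
out_path = os.path.join(tempfile.mkdtemp(), "result.csv")
total = sink_to_csv(query_batches(n_rows=10, batch_size=4), out_path)
```

The libraries below offer their own sink mechanisms with different guarantees (for example streaming, exactly-once delivery, or connector-based routing); this sketch only shows the general shape of the output phase.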

Explore 10 awesome GitHub repositories matching data & databases · Data Sinking. Refine with filters or upvote what's useful.


pola-rs/polars
38,855
Polars is a high-performance columnar data processing library designed for efficient analytical workflows. It functions as a structured data library that organizes information into typed columns, utilizing the Apache Arrow memory format to enable zero-copy data sharing and cache-friendly, vectorized operations. The engine is built to handle large-scale tabular datasets, providing both local and distributed analytical runtimes that scale from single-machine environments to multi-node clusters. The project distinguishes itself through a sophisticated lazy query engine that constructs abstract e
Saves large-scale query results directly to cloud storage to support automated data pipelines.
Rust · arrow · dataframe · dataframe-library
zhisheng17/flink-learning
15,071
This project is a collection of educational resources and reference implementations for the Apache Flink stream processing framework. It provides a learning resource focused on mastering distributed stream processing through implementation guides, performance tuning tutorials, and practical examples. The repository features detailed walkthroughs for building real-time data pipelines using the DataStream and Table APIs. It includes specific integration examples for connecting Apache Flink with Kafka brokers and Elasticsearch indices, as well as reference implementations for real-time deduplica
Provides interfaces to define how records are written to non-standard external storage systems.
Java · clickhouse · elasticsearch · flink
rerun-io/rerun
10,214
Rerun is a multimodal data visualizer and robotics data logger designed for rendering synchronized streams of 3D spatial data, images, and time-series metrics. It functions as a tool for capturing high-frequency sensor data and AI outputs into a queryable columnar format, providing a dedicated interface for viewing MCAP recording files and analyzing physical environments. The project distinguishes itself as a machine learning dataset streamer, capable of feeding logged recordings directly into GPU buffers and PyTorch training pipelines without intermediate exports. It supports a high-performa
Provides mechanisms for directing logged sensor data to various external destinations including gRPC servers and binary files.
Rust · computer-vision · cpp · multimodal
apache/seatunnel
9,427
SeaTunnel is a distributed data integration engine designed to synchronize structured and unstructured data across diverse sources and sinks. It functions as a multi-engine execution framework that can run data integration tasks across different distributed computing backends to optimize workload performance. The project is distinguished by a visual data pipeline designer for configuring workflows without manual code and a specialized change data capture tool for streaming incremental database updates. It also includes an enrichment pipeline that integrates large language models and embedding
Transfers processed data into target destinations such as databases, object storage, and message queues.
Java · apache · batch · cdc
risingwavelabs/risingwave
9,093
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Continuously pushes processed data streams into downstream databases, data lakes, or message queues.
Rust · apache-iceberg · data-engineering · database
hazelcast/hazelcast
6,570
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Routes processed data from pipelines to local or remote storage systems using built-in connectors.
Java · big-data · caching · data-in-motion
ibis-project/ibis
6,574
Ibis is a portable Python dataframe library and multi-backend query engine that provides a unified interface for executing data transformations across diverse compute engines. It functions as a Python SQL expression compiler and dialect transpiler, allowing users to define data logic once and execute it across cloud warehouses, embedded databases, and distributed clusters without rewriting code. The project distinguishes itself through a database backend abstraction that decouples transformation logic from the underlying execution engine. It enables polyglot data workflows by mixing raw SQL s
Implements mechanisms for writing large-scale query results to external destination systems.
Python · bigquery · clickhouse · database
apache/flink-cdc
6,430
This project is a streaming data integration framework that captures real-time database changes and synchronizes them with downstream systems. It operates as a distributed streaming ETL and database synchronizer, reading database logs and snapshots to propagate row-level modifications to target sinks. The system supports declarative data integration, allowing users to define source-to-sink data flows using SQL or YAML configurations. It distinguishes itself by automating schema evolution to maintain synchronization when source structures change and ensuring exactly-once delivery and processin
Maps specific source tables to designated sink tables to organize data distribution across multiple target systems.
Java · batch · cdc · change-data-capture
rust-lang/futures-rs
5,870
Zero-cost asynchronous programming in Rust
Sends a sequence of values to a destination using the Sink trait and its adapters.
Rust · async-foundations
zio/zio
4,347
ZIO is a functional effect system for the JVM that models asynchronous and concurrent programs as pure, composable values with typed error handling and dependency injection. Its core identity is built on fiber-based concurrency, where lightweight, non-blocking fibers execute millions of concurrent tasks with structured lifecycle management, and a dual-channel error model that separates expected business failures from unexpected system defects at compile time. The system provides effect-typed dependency injection through a layer-based dependency graph, pull-based reactive stream processing with
Provides sink type transformation as part of its stream processing capabilities.
Scala · asynchronicity · asynchronous · asynchronous-programming