Why is pathwaycom/pathway a recommended Stream-Oriented Data Pipelines repository?

It moves batch processing workflows to continuous, real-time streaming operations while preserving the core transformation logic.

Why is soimort/you-get a recommended Stream-Oriented Data Pipelines repository?

It streams binary media data into external processes for continuous, real-time consumption.

Why is openobserve/openobserve a recommended Stream-Oriented Data Pipelines repository?

It parses, filters, and enriches incoming data in real time to convert raw logs into structured insights.

Why is redpanda-data/redpanda a recommended Stream-Oriented Data Pipelines repository?

It provides continuous, real-time streaming pipelines for reliable data movement across distributed environments.

Why is snowplow/snowplow a recommended Stream-Oriented Data Pipelines repository?

It provides streaming pipelines that route processed event data into lakehouses and third-party platforms.

Why is hazelcast/hazelcast a recommended Stream-Oriented Data Pipelines repository?

It reads in-memory map contents as a source for real-time data processing and analytics pipelines.

6 repositories

Awesome GitHub Repositories: Stream-Oriented Data Pipelines

Pipelines that convert batch processing workflows into continuous, real-time streaming operations.

Six GitHub repositories matching Data & Databases · Stream-Oriented Data Pipelines.


pathwaycom/pathway
62,959 · View on GitHub
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
Transitions batch processing workflows into continuous, real-time streaming operations while preserving core transformation logic.
Python · batch-processing · data-analytics · data-pipelines
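The idea of applying identical logic to a static dataset and a continuous stream can be illustrated with a small plain-Python sketch. This is not Pathway's API: the function names `running_word_counts` and `final_counts` and the sample data are hypothetical, and the incremental update is a simple counter, not a differential dataflow engine.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def running_word_counts(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Yield the updated word counts after each input line is absorbed."""
    counts = Counter()
    for line in lines:
        # Incremental update: only the newly arrived line is processed.
        counts.update(line.split())
        yield dict(counts)


def final_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Drain the input and return the last emitted state."""
    result: Dict[str, int] = {}
    for result in running_word_counts(lines):
        pass
    return result


# Hypothetical sample data: a static list stands in for a batch source.
batch = ["a b a", "b c"]


def stream() -> Iterator[str]:
    """A generator stands in for an unbounded event stream."""
    for line in batch:
        yield line


batch_result = final_counts(batch)
stream_result = final_counts(stream())
print(batch_result == stream_result)
```

Because the transformation only depends on an iterable of lines, the same function serves both modes; a streaming consumer would read each intermediate state from `running_word_counts` instead of waiting for the final one.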
soimort/you-get
56,839 · View on GitHub
This project is a command-line utility designed to fetch video, audio, and image content from a wide range of web platforms. It functions by parsing page metadata and utilizing modular, site-specific scripts to extract direct media stream URLs from complex web structures, enabling the local archiving of digital media for offline use. The tool distinguishes itself through its ability to handle authenticated content, allowing users to inject browser-stored session cookies to access restricted or private media. It also supports real-time media streaming by piping remote content directly into ext
Streams binary media data into external processes to enable continuous, real-time consumption.
Python
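Streaming binary data into an external process can be sketched with the standard library alone. The example below is an assumption-laden illustration, not you-get's implementation: `pipe_chunks` is a hypothetical helper, and a tiny Python byte counter stands in for a real media player.

```python
import subprocess
import sys


def pipe_chunks(chunks, command):
    """Feed binary chunks to an external process's stdin and return its stdout."""
    with subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    ) as proc:
        for chunk in chunks:
            # Each chunk is forwarded as soon as it is available,
            # so the consumer never needs the whole file up front.
            proc.stdin.write(chunk)
        proc.stdin.close()  # signal end of stream to the consumer
        return proc.stdout.read()


# Hypothetical consumer: counts the bytes it receives on stdin.
consumer = [
    sys.executable,
    "-c",
    "import sys; print(len(sys.stdin.buffer.read()))",
]

# Hypothetical source: four 1 KiB chunks standing in for downloaded media.
media_chunks = (bytes([i]) * 1024 for i in range(4))

received = int(pipe_chunks(media_chunks, consumer).strip())
print(received)
```

This simple form is safe only when the consumer produces little output while reading; a consumer that writes heavily to stdout while still reading stdin would need concurrent reads to avoid blocking.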
openobserve/openobserve
17,937 · View on GitHub
OpenObserve is a unified observability data platform designed to ingest, store, and analyze logs, metrics, and traces. It functions as a cloud-native monitoring tool that centralizes telemetry from diverse sources, including standard collectors and cloud service providers, into a single, scalable system. By utilizing a columnar storage engine backed by object storage, the platform enables efficient long-term data retention and high-performance analytical querying. The platform distinguishes itself through deep integration with artificial intelligence, allowing users to query data using natura
Parses, filters, and enriches incoming data in real time to convert raw logs into structured insights.
TypeScript · analytics · apm · datadog
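The parse, filter, and enrich stages mentioned above can be pictured as a chain of generators. This is a generic sketch and does not use OpenObserve's pipeline functions: the JSON-lines input format, the `SEVERITY` table, and the `service` field are all assumptions made for illustration.

```python
import json

# Hypothetical mapping from log level to a numeric severity.
SEVERITY = {"debug": 10, "info": 20, "warn": 30, "error": 40}


def parse(lines):
    """Turn raw JSON lines into dicts, skipping anything malformed."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def keep_at_least(records, min_level):
    """Drop records whose level is below min_level or unknown."""
    threshold = SEVERITY[min_level]
    for record in records:
        if SEVERITY.get(record.get("level"), 0) >= threshold:
            yield record


def enrich(records, service):
    """Attach a service name and numeric severity to each record."""
    for record in records:
        yield {**record, "service": service, "severity": SEVERITY[record["level"]]}


raw = [
    '{"level": "debug", "msg": "cache warm"}',
    "not json",
    '{"level": "error", "msg": "disk full"}',
]

structured = list(enrich(keep_at_least(parse(raw), "warn"), service="api"))
print(structured)
```

Each stage consumes one record at a time, so the chain works the same on a finite list and on an unbounded source such as a socket or a tailed file.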
redpanda-data/redpanda
12,248 · View on GitHub
Redpanda is a distributed event streaming engine designed to serve as a high-performance, drop-in replacement for existing event-driven architectures. It provides a foundation for building and scaling applications that require reliable data movement, analytical querying, and strict operational compliance across both cloud and self-managed environments. The platform distinguishes itself through a shared-nothing architecture that utilizes thread-per-core execution and a non-blocking asynchronous input/output engine to maximize throughput. It maintains data consistency through a consensus-based
Provides continuous, real-time streaming pipelines for reliable data movement across distributed environments.
C++ · containers · cpp · event-driven
snowplow/snowplow
7,012 · View on GitHub
Snowplow is a behavioral event data pipeline and customer data infrastructure designed to capture user interactions and transform them into structured events for real-time analysis and long-term storage. It functions as a customer data platform that gathers user signals and enriches them with metadata to create a unified view of customer behavior. The system operates as an event schema validation engine to enforce strict data contracts on incoming streams, preventing data corruption. It further serves as a real-time event router and an event-driven automation platform, triggering proactive bu
Provides streaming pipelines that route processed event data into lakehouses and third-party platforms.
Scala
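Enforcing a data contract on incoming events and routing the results can be sketched as follows. This is a minimal illustration under stated assumptions, not Snowplow's schema technology: the `CONTRACT` fields and the `validate` and `route` helpers are hypothetical.

```python
# Hypothetical contract: required field names and their expected types.
CONTRACT = {"event_id": str, "user_id": str, "timestamp": int}


def validate(event):
    """Return a list of contract violations; an empty list means valid."""
    errors = []
    for field, expected in CONTRACT.items():
        if field not in event:
            errors.append(f"missing {field}")
        elif not isinstance(event[field], expected):
            errors.append(f"{field} is not {expected.__name__}")
    return errors


def route(events):
    """Split events into those that satisfy the contract and those that do not."""
    good, bad = [], []
    for event in events:
        errors = validate(event)
        if errors:
            # Keep the failing event with its errors so it can be inspected later.
            bad.append({"event": event, "errors": errors})
        else:
            good.append(event)
    return good, bad


events = [
    {"event_id": "e1", "user_id": "u1", "timestamp": 1700000000},
    {"event_id": "e2", "timestamp": "yesterday"},
]

good, bad = route(events)
print(len(good), len(bad))
```

Keeping rejected events alongside their error messages, rather than discarding them, lets a downstream consumer repair and replay them without corrupting the validated stream.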
hazelcast/hazelcast
6,570 · View on GitHub
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Reads in-memory map contents as a source for real-time data processing and analytics pipelines.
Java · big-data · caching · data-in-motion
