24 repositories

Awesome GitHub Repositories: Real-Time Data Processors

Processing systems that ingest and transform data streams in real-time for continuous analytics and event handling.

Category: data & databases · Real-Time Data Processors.


sindresorhus/awesome-nodejs
65,973 stars
This project is a community-driven directory that aggregates essential software projects and educational content for the Node.js ecosystem. It functions as a centralized knowledge base and discovery index, designed to simplify the navigation of a fragmented technical landscape by providing a structured collection of high-quality links, tools, and learning materials. The repository distinguishes itself through a decentralized, peer-reviewed curation model. By utilizing standard version control workflows and pull requests, the community ensures that all listed resources undergo human verificati
Identify high-performance frameworks capable of ingesting and transforming data streams in real time.
awesome · awesome-list · javascript
pathwaycom/pathway
62,959 stars
Pathway is a high-performance data processing framework designed for building unified batch and streaming pipelines. It functions as an orchestrator for complex data transformations, utilizing a differential dataflow engine to process updates incrementally. By treating static datasets and continuous event streams with identical logic, the platform ensures exactly-once processing semantics and consistent results across diverse data sources. The framework distinguishes itself through its specialized support for real-time artificial intelligence and retrieval-augmented generation. It features in
Processes continuous data streams in real-time to facilitate immediate event-driven analytics.
Python · batch-processing · data-analytics · data-pipelines
pathwaycom/llm-app
59,341 stars
This project is a data processing engine and AI application platform designed for building production-grade machine learning workflows. It provides a unified programming model that handles both historical batch data and live stream ingestion, enabling the development of real-time ETL pipelines and scalable data transformation workflows. The framework distinguishes itself through differential dataflow execution, which propagates only changes through a pipeline rather than recomputing entire datasets. It supports distributed state management across worker nodes and utilizes incremental stream p
Ingests and processes information from diverse sources in real-time to ensure continuous visibility into changing data.
Jupyter Notebook · chatbot · hugging-face · llm
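The description above mentions propagating only changes through a pipeline instead of recomputing entire datasets. The following is a minimal, standard-library sketch of that general idea, not the project's API: the `IncrementalSum` class, the `recompute` helper, and the `(key, delta)` change format are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class IncrementalSum:
    """Hypothetical operator: keeps per-key totals current from (key, delta) changes."""

    def __init__(self):
        self.totals = defaultdict(int)

    def apply(self, changes):
        """Apply a batch of changes and return only the keys whose totals moved."""
        updated = {}
        for key, delta in changes:
            self.totals[key] += delta
            updated[key] = self.totals[key]
        return updated


def recompute(rows):
    """Baseline for comparison: rebuild every total from the full history."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


agg = IncrementalSum()
first = agg.apply([("eu", 5), ("us", 3)])
second = agg.apply([("eu", 2)])   # only "eu" is touched and reported
third = agg.apply([("us", -3)])   # a negative delta acts as a retraction
```

Each call to `apply` does work proportional to the size of the change batch rather than the size of the history, while `recompute` over the full history yields the same totals.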
apache/spark
43,467 stars
Apache Spark is a unified distributed data processing engine designed for large-scale data analysis and computation graphs. It functions as a distributed machine learning framework, a graph processing system, a real-time stream processor, and a SQL analytics engine. The system enables the execution of distributed SQL querying, large-scale graph analysis, and real-time stream analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and predictive model development on massive datasets. The engine incorporates relational query e
Ships a processing system that ingests and transforms real-time data streams for continuous analytics.
Scala · big-data · java · jdbc
facebook/zstd
27,259 stars
Zstandard is a lossless data compression library and archive format designed for high compression ratios and fast real-time processing. It functions as a real-time data compressor and multi-threaded compression engine capable of distributing workloads across multiple CPU cores to increase throughput. The system features a dictionary-based compressor that trains on sample data to improve the compression ratio and speed of small files. It also provides long distance pattern matching to identify repeated sequences across large files. The library covers a broad range of capabilities including st
Enables high-throughput real-time decompression to restore data quickly for immediate application use.
C
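The description above mentions a dictionary-based compressor that improves results on small files. As a rough illustration of why a shared dictionary helps small payloads, the sketch below uses the preset-dictionary feature of Python's standard `zlib` module as a stand-in. It is not Zstandard and does not train a dictionary from samples; `SHARED_DICT` and the message layout are hypothetical.

```python
import zlib

# Hypothetical shared dictionary: byte patterns expected to recur in small messages.
SHARED_DICT = b'{"user_id": , "event": "page_view", "timestamp": }'


def compress_with_dict(payload: bytes, zdict: bytes) -> bytes:
    """Compress one payload with a preset dictionary known to both sides."""
    compressor = zlib.compressobj(zdict=zdict)
    return compressor.compress(payload) + compressor.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    """Decompress a payload; the same dictionary must be supplied."""
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


message = b'{"user_id": 1234, "event": "page_view", "timestamp": 1700000000}'
plain = zlib.compress(message)                      # no dictionary
primed = compress_with_dict(message, SHARED_DICT)   # with dictionary
restored = decompress_with_dict(primed, SHARED_DICT)
```

A short message has little internal repetition for the compressor to exploit, so most of the gain comes from matches against the dictionary. The receiver needs the identical dictionary bytes to decompress.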
videolan/vlc
18,717 stars
VLC is a cross-platform multimedia player and framework designed to decode and render virtually any audio or video format, network stream, or physical disc without requiring external codecs. It functions as both a standalone application and a portable library, providing a modular architecture that allows developers to integrate playback, filtering, and streaming capabilities into third-party software. The project distinguishes itself through a highly modular plugin-based engine that supports real-time media processing, including format transcoding and the application of audio and video filter
Applies audio and video transformations sequentially to raw data streams before final rendering.
C · c · framework · gplv2
apache/doris
15,526 stars
Doris is a distributed SQL data warehouse designed for high-performance analytical workloads and real-time data processing. It functions as a unified platform that integrates traditional relational warehousing with lakehouse query capabilities, allowing users to execute analytical operations directly against external data lakes without requiring data migration. The system distinguishes itself through a shared-nothing, massively parallel processing architecture that utilizes vectorized query execution and columnar storage to maintain sub-second latency. It supports dynamic schema evolution, en
Supports continuous real-time data ingestion to ensure new information is immediately available for analysis.
Java · agent · ai · bigquery
datahub-project/datahub
12,141 stars
DataHub is a metadata management platform designed to unify technical, operational, and business context across diverse data ecosystems. By utilizing a graph-based metadata model and an event-driven ingestion architecture, it creates a centralized source of truth that maps complex data relationships, lineage, and ownership. This foundational framework enables organizations to maintain a synchronized view of their data landscape, supporting both human-led discovery and automated data operations. The platform distinguishes itself through its focus on grounding artificial intelligence and autono
Processes metadata updates in real-time using an event-driven architecture to maintain current data context.
Python · data-catalog · data-discovery · data-governance
finos/perspective
10,967 stars
Perspective is a columnar data analytics library and streaming data visualization engine. It provides an interactive data grid component and notebook analytics widgets designed for processing high-volume data and rendering interactive charts and grids. The system utilizes a high-performance query engine to enable real-time data analysis and streaming dataset visualization. It supports the creation of customizable dashboards and reports that update automatically as new data arrives without requiring full dataset reloads. The project covers large-scale dataset analytics through a schema-driven
Processes and transforms data streams in real-time to provide continuous analytics and visual updates.
C++
yutiansut/QUANTAXIS
9,955 stars
Quantaxis is a quantitative trading framework designed for building, backtesting, and executing automated strategies across global equities, futures, and cryptocurrencies. It integrates an event-driven backtesting engine, a multi-market execution gateway for order routing, and a quantitative data pipeline for ingesting and storing multi-asset market data. The system features a Rust-accelerated financial library that utilizes Apache Arrow for high-performance technical indicator calculation and zero-copy data processing. It provides a containerized infrastructure model designed for orchestrati
Processes live financial data feeds in real-time to retrieve current prices, spreads, and changes.
Python · quant
risingwavelabs/risingwave
9,093 stars
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Ingests and transforms data streams in real-time using SQL for continuous analytics and event handling.
Rust · apache-iceberg · data-engineering · database
fluent/fluent-bit
7,946 stars
Fluent Bit is a unified, cloud-native log and telemetry collector designed as a resource-efficient data pipeline. It ingests logs, metrics, and traces from multiple sources, processing them in real time before routing the data to external storage backends. The project works as a real-time stream processor and an OpenTelemetry log processor, able to transform and filter data using SQL and conditional logic. It also acts as a distributed tracing agent that can sample traces to reduce data volume while preserving complete request paths. The system provides reliable data delivery through filesystem-based buffering and stateful retry logic to prevent data loss during outages. Its modular architecture supports pluggable input and output plugins, metadata-based routing, and the ability to extend functionality through shared libraries. The software can be deployed as a container across different CPU architectures and operating systems.
Ingests and transforms telemetry data streams in real time using conditional logic for continuous analytics.
C
jagenjo/litegraph.js
7,871 stars
litegraph.js is a JavaScript dataflow framework and visual node graph engine used to define programmable logic and data flow. It provides a node-based visual programming tool for designing complex logic through connected functional blocks. The library allows for the creation of hierarchical logic by nesting multiple nodes into recursive subgraphs. It also supports the development of custom node types with unique inputs and outputs, as well as custom widgets and live views that can hide the underlying graph structure to present a visual interface. The engine enables the execution of logic gra
Executes logic graphs across browser or server environments to process and route data in real-time.
JavaScript · blueprints · canvas2d · editor
apache/incubator-storm
6,683 stars
Apache Storm is a distributed stream processing framework and real-time data processing engine. It functions as a fault-tolerant distributed computing system designed to analyze data in motion across a cluster of machines for continuous stream computation. The system enables the creation of fault-tolerant data pipelines and scalable event processing by distributing workloads across a network of computing nodes. This architecture ensures low latency and high throughput for live data while allowing the system to recover automatically from individual node failures. The framework provides capabi
Ingests and transforms data streams in real-time for continuous analytics and event handling.
Java
apache/storm
6,683 stars
Storm is a distributed stream processing framework designed to execute unbounded computations across a cluster to process real-time data streams. It functions as a data pipeline orchestrator that allows users to define and deploy declarative data flow graphs connecting streaming sources to processing components. The system operates as a multi-tenant distributed compute engine that isolates workloads and limits resource usage across shared clusters using dedicated pools and access control. It is also a secure distributed processing engine that employs encrypted node communication and SSL-secur
Orchestrates declarative data flow graphs that connect streaming sources to processing components.
Java
hazelcast/hazelcast
6,570 stars
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
Handles both endless streams of event data and finite static datasets for unified processing.
Java · big-data · caching · data-in-motion
GreptimeTeam/greptimedb
5,968 stars
GreptimeDB is a distributed, open-source time-series database built for unified observability. It stores and queries metrics, logs, and traces together in a single columnar engine, supporting both SQL and PromQL for analysis. The database is designed as a Kubernetes-native operator with a decoupled compute and storage architecture, enabling horizontal scaling and multi-region deployment. What distinguishes GreptimeDB is its role as a multi-protocol ingestion gateway, accepting data through OpenTelemetry, Prometheus Remote Write, InfluxDB, Loki, Elasticsearch, Kafka, and MQTT protocols without
GreptimeDB processes incoming data incrementally and continuously, updating results as new data arrives for immediate analytics.
Rust · analytics · cloud-native · database
infinyon/fluvio
5,231 stars
Fluvio is a distributed event streaming platform and cloud-native streaming engine designed to collect, persist, and replicate real-time data streams across a distributed cluster. It works as a real-time data pipeline for building stateful workflows that ingest, enrich, and export data between external sources and destinations. The platform stands out for its use of WebAssembly to run compiled modules for inline data transformation and filtering. This allows custom business logic to reshape data in motion without requiring a cluster restart. The system covers a broad range of capabilities, including connector-based data ingestion from external protocols, immutable log-structured storage with zero-copy I/O, and horizontal cluster scaling. It supports building complex event-driven pipelines that use stateful processing, windowed aggregations, and partition-based data distribution. The engine can be deployed as a lightweight binary on various system architectures, including ARM64 IoT devices for edge data processing.
Implements a framework for building stateful workflows that ingest, enrich, and export data.
Rust
ReactiveX/RxPY
5,014 stars
RxPY is a functional reactive programming library and a ReactiveX observables library for Python. It serves as an asynchronous stream processor and event-driven coordination framework, used to build data pipelines that react to state changes or event streams over time. The library provides a toolkit for composing asynchronous and event-based programs using observable sequences and operators. It stands out for its use of configurable schedulers to manage concurrency, timing, and subscription lifecycles. The project covers a broad range of stream processing capabilities, including aggregating, filtering, and combining data. It provides mechanisms for event broadcasting, sequence buffering, and error handling, as well as tools for coordinating observable streams with asynchronous event loops. Testing and quality assurance are supported by virtual-time simulation, marble-diagram modeling, and emission verification.
Processes live data streams in real time by chaining operators to aggregate, buffer, or merge values.
Python
ArroyoSystems/arroyo
4,819 stars
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
An open-source system for building fault-tolerant, stateful pipelines that process millions of events per second with subsecond latency.
Rust · data · data-stream-processing · dev-tools
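The description above refers to windowed aggregations with event-time semantics. The sketch below shows the core idea in plain Python under simplifying assumptions: each event is a hypothetical `(timestamp_seconds, key)` tuple, windows are fixed-size (tumbling), and watermarks, late-data handling, and checkpointing are left out. It is not Arroyo's SQL or API.

```python
from collections import defaultdict


def tumbling_counts(events, window_seconds):
    """Count events per (window_start, key), using each event's own timestamp."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Assign the event to the window that contains its event time,
        # regardless of the order in which it arrived.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Events arrive out of order: timestamp 12 shows up before 7 and 9.
events = [(3, "click"), (12, "click"), (7, "view"), (9, "click"), (14, "view")]
result = tumbling_counts(events, window_seconds=10)
```

Because window assignment depends only on the timestamp carried by the event, late-arriving events still land in the window they belong to. A real stream processor additionally has to decide when a window is complete enough to emit.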
