# nathanmarz/storm

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/nathanmarz-storm).**

8,772 stars · 1,640 forks · Java · Apache-2.0

## Links

- GitHub: https://github.com/nathanmarz/storm
- Homepage: http://storm-project.net
- awesome-repositories: https://awesome-repositories.com/repository/nathanmarz-storm.md

## Description

Storm is a distributed, fault-tolerant stream processing framework for running real-time continuous computations across a cluster of machines. Each computation is packaged as a topology, a graph of spouts (stream sources) and bolts (processing steps). Storm deploys, monitors, and manages these topologies across the cluster and supports stateful stream processing.
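To make the spout-and-bolt data flow concrete, here is a minimal single-process sketch of the classic word-count pipeline. The `Bolt` interface, the class names, and the wiring are hypothetical illustrations, not Storm's API. A real topology would run each component as many parallel tasks spread across machines.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical, Storm-free illustration: these types are NOT Storm's API.
// A bolt receives one tuple (a list of values) and may emit tuples downstream.
interface Bolt {
    void execute(List<Object> tuple, Consumer<List<Object>> emit);
}

// Splits a sentence tuple into one tuple per word.
class SplitSentenceBolt implements Bolt {
    public void execute(List<Object> tuple, Consumer<List<Object>> emit) {
        for (String word : ((String) tuple.get(0)).split("\\s+")) {
            if (!word.isEmpty()) emit.accept(List.of(word));
        }
    }
}

// Keeps a running count per word and emits (word, count).
class WordCountBolt implements Bolt {
    final Map<String, Integer> counts = new HashMap<>();

    public void execute(List<Object> tuple, Consumer<List<Object>> emit) {
        String word = (String) tuple.get(0);
        int count = counts.merge(word, 1, Integer::sum);
        emit.accept(List.of(word, count));
    }
}

class LocalWordCount {
    // Wires spout -> split -> count in a single thread.
    static Map<String, Integer> run(List<String> sentences) {
        SplitSentenceBolt split = new SplitSentenceBolt();
        WordCountBolt count = new WordCountBolt();
        for (String sentence : sentences) { // the "spout": a fixed stream of sentences
            split.execute(List.of(sentence),
                    word -> count.execute(word, out -> { /* end of the pipeline */ }));
        }
        return count.counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("the cow jumped over the moon")));
    }
}
```

Each emitted tuple is handed straight to the next component, which mirrors how tuples flow along the edges of a topology graph.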

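Storm's core guarantees that every tuple is fully processed at least once: it tracks each tuple's processing tree and has the spout replay the tuple on failure or timeout. Transactional topologies build on this to provide exactly-once semantics. Each batch carries a transaction ID that is committed together with the updated state, so a replayed batch is never applied twice. Storm also offers distributed RPC (DRPC) for parallelizing request/response computations, and a multi-language protocol that lets spouts and bolts be written in non-JVM languages.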
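The transaction-ID idea behind exactly-once state updates can be sketched with a counter that stores each value together with the ID of the last batch applied to it. This is a hypothetical illustration of the technique, not Storm's transactional API. It assumes that batches are committed in strictly increasing transaction-ID order, so a repeated ID can only mean a replay.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: state is stored together with the txid of the batch
// that last updated it, so replaying that batch after a failure is a no-op.
class TransactionalCounter {
    // Stored value plus the txid of the batch that last updated it.
    record Entry(long count, long lastTxid) {}

    private final Map<String, Entry> store = new HashMap<>();

    // Apply one batch's partial count for `key`. Assumes commits arrive in txid order.
    void commit(String key, long batchCount, long txid) {
        Entry current = store.get(key);
        if (current != null && current.lastTxid() == txid) {
            return; // replay of a batch already applied: skip to avoid double counting
        }
        long base = (current == null) ? 0 : current.count();
        store.put(key, new Entry(base + batchCount, txid));
    }

    long get(String key) {
        Entry e = store.get(key);
        return (e == null) ? 0 : e.count();
    }
}
```

For example, committing a count of 3 under txid 1 twice and then 2 under txid 2 leaves the stored count at 5, because the second txid 1 commit is recognized as a replay.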

The framework covers distributed stateful computation, cluster resource management, and running topology components as external processes launched through shell commands. It also provides tools for monitoring stream performance, validating topology submissions, and customizing data routing and serialization.
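As an illustration of what validating a topology submission can involve, the following hypothetical checker takes a map from each component ID to the IDs it consumes from, with spouts mapping to an empty list. It reports inputs that reference undeclared components and, following the directed-acyclic-graph view described in the tags below, rejects cycles using Kahn's topological sort. This is not Storm's actual validation code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical pre-submission check; not Storm's validation logic.
class TopologyValidator {
    static List<String> validate(Map<String, List<String>> inputs) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : inputs.entrySet()) {
            for (String src : e.getValue()) {
                if (!inputs.containsKey(src)) {
                    errors.add(e.getKey() + " subscribes to undeclared component " + src);
                }
            }
        }
        // Kahn's algorithm: count declared inputs per component and build downstream edges.
        Map<String, Integer> indegree = new HashMap<>();
        Map<String, List<String>> downstream = new HashMap<>();
        for (String id : inputs.keySet()) {
            indegree.put(id, 0);
            downstream.put(id, new ArrayList<>());
        }
        for (Map.Entry<String, List<String>> e : inputs.entrySet()) {
            for (String src : e.getValue()) {
                if (inputs.containsKey(src)) {
                    indegree.merge(e.getKey(), 1, Integer::sum);
                    downstream.get(src).add(e.getKey());
                }
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        indegree.forEach((id, d) -> { if (d == 0) ready.add(id); });
        int visited = 0;
        while (!ready.isEmpty()) {
            String id = ready.poll();
            visited++;
            for (String next : downstream.get(id)) {
                if (indegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        // Components never reaching in-degree 0 sit on a cycle.
        if (visited < inputs.size()) errors.add("topology contains a cycle");
        return errors;
    }
}
```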

## Tags

### Data & Databases

- [Real-Time Data Streaming](https://awesome-repositories.com/f/data-databases/real-time-data-streaming.md) — Provides a platform for processing and delivering continuous data streams in real-time across a cluster. ([source](https://github.com/nathanmarz/storm/blob/moved-to-apache/dev@storm.incubator.apache.org))
- [Stream Topology Management](https://awesome-repositories.com/f/data-databases/cluster-topology-management/stream-topology-management.md) — Provides a management system for deploying, rebalancing, and monitoring the lifecycle of distributed data flow configurations.
- [Exactly-Once Processing Semantics](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing-frameworks/exactly-once-processing-semantics.md) — Guarantees that every message in a data stream is processed exactly once despite system failures. ([source](https://github.com/nathanmarz/storm/blob/moved-to-apache/CHANGELOG.md))
- [Distributed Processing Frameworks](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing/distributed-processing-frameworks.md) — Provides a framework for executing real-time continuous computations across a cluster of machines with low latency.
- [Distributed Computing](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing/distributed-processing-frameworks/distributed-computing.md) — Executes large-scale stateful computations and real-time distributed queries across a cluster of machines.
- [Streaming State Management](https://awesome-repositories.com/f/data-databases/query-state-management/streaming-state-management.md) — Combines high-volume stream processing with distributed queries to maintain and retrieve real-time state.
- [Stateful Processing Backends](https://awesome-repositories.com/f/data-databases/stateful-processing-backends.md) — Maintains real-time state and distributed queries to enable stateful stream processing. ([source](https://github.com/nathanmarz/storm/blob/moved-to-apache/CHANGELOG.md))
- [Pluggable Serializers](https://awesome-repositories.com/f/data-databases/data-serialization-formats/data-formats/object-serializers/pluggable-serializers.md) — Allows definition of custom serialization and deserialization logic using pluggable factories. ([source](https://github.com/nathanmarz/storm/blob/moved-to-apache/CHANGELOG.md))
- [Stream Routing](https://awesome-repositories.com/f/data-databases/stream-routing.md) — Provides logic for directing real-time data streams across tasks using pluggable grouping strategies. ([source](https://github.com/nathanmarz/storm/blob/moved-to-apache/CHANGELOG.md))
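The pluggable grouping strategies mentioned above decide which downstream task receives each tuple. The sketch below shows two common strategies under hypothetical names, not Storm's grouping API: a fields grouping that hashes selected tuple fields so equal keys always reach the same task, and a shuffle grouping that spreads tuples randomly for load balancing.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical grouping interface; not Storm's grouping API.
interface Grouping {
    // Returns the index (0..numTasks-1) of the downstream task for this tuple.
    int chooseTask(List<Object> tuple, int numTasks);
}

// Fields grouping: tuples that agree on the selected fields always land on the same task.
class FieldsGrouping implements Grouping {
    private final int[] keyIndexes;

    FieldsGrouping(int... keyIndexes) {
        this.keyIndexes = keyIndexes;
    }

    public int chooseTask(List<Object> tuple, int numTasks) {
        Object[] key = new Object[keyIndexes.length];
        for (int i = 0; i < keyIndexes.length; i++) key[i] = tuple.get(keyIndexes[i]);
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(Arrays.hashCode(key), numTasks);
    }
}

// Shuffle grouping: picks a random task for each tuple.
class ShuffleGrouping implements Grouping {
    private final Random random = new Random();

    public int chooseTask(List<Object> tuple, int numTasks) {
        return random.nextInt(numTasks);
    }
}
```

Fields grouping is what makes per-key state, such as a running word count, safe to keep inside a single task.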

### DevOps & Infrastructure

- [Exactly-Once Processing Guarantees](https://awesome-repositories.com/f/devops-infrastructure/fault-tolerance/exactly-once-processing-guarantees.md) — Ensures exactly-once semantics using transactional state management to guarantee reliable message processing.
- [Machine-to-Workload Assignments](https://awesome-repositories.com/f/devops-infrastructure/machine-to-workload-assignments.md) — Assigns specific physical hardware to certain data flows to ensure complete resource isolation. ([source](https://github.com/nathanmarz/storm/blob/moved-to-apache/CHANGELOG.md))
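Dedicating hardware to specific topologies amounts to reserving a set of machines per topology before anything else is scheduled. The following hypothetical sketch, which is not Storm's scheduler code, takes the requested machine count for each isolated topology, hands out dedicated machines first, and leaves the remainder as a shared pool.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical isolation-style assignment; not Storm's scheduler.
class IsolationAssigner {
    // Key used for the pool left over for non-isolated topologies.
    static final String SHARED = "*shared*";

    // `dedicated` maps topology name -> number of machines it must own exclusively.
    static Map<String, List<String>> assign(List<String> machines, Map<String, Integer> dedicated) {
        Deque<String> free = new ArrayDeque<>(machines);
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : dedicated.entrySet()) {
            if (free.size() < e.getValue()) {
                throw new IllegalStateException("not enough machines for " + e.getKey());
            }
            List<String> owned = new ArrayList<>();
            for (int i = 0; i < e.getValue(); i++) owned.add(free.poll());
            plan.put(e.getKey(), owned);
        }
        // Everything not reserved is shared by the remaining topologies.
        plan.put(SHARED, new ArrayList<>(free));
        return plan;
    }
}
```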

### Networking & Communication

- [Cross-Language RPC Frameworks](https://awesome-repositories.com/f/networking-communication/cross-language-rpc-frameworks.md) — Implements a distributed RPC system that enables integration of non-native languages through a standardized communication protocol.
- [Distributed RPC Systems](https://awesome-repositories.com/f/networking-communication/distributed-rpc-systems.md) — Operates as a distributed RPC system for executing remote procedure calls and integrating external languages.
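The core of a distributed RPC call is correlating a blocking client request with a result produced asynchronously elsewhere. The in-process sketch below uses a request ID for that correlation, with a thread pool standing in for the topology. `MiniDrpc` and its methods are hypothetical, not Storm's DRPC server or client API.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical request/response correlation; not Storm's DRPC API.
class MiniDrpc {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final Function<String, String> computation;

    MiniDrpc(Function<String, String> computation) {
        this.computation = computation;
    }

    // Client side: register a request id, hand the args off, wait for the result.
    String execute(String args) {
        long id = nextId.incrementAndGet();
        CompletableFuture<String> result = new CompletableFuture<>();
        pending.put(id, result);
        // Stand-in for feeding (id, args) into a topology.
        workers.submit(() -> {
            try {
                returnResult(id, computation.apply(args));
            } catch (RuntimeException e) {
                result.completeExceptionally(e);
            }
        });
        try {
            return result.get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for request " + id, e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("request " + id + " failed", e);
        } finally {
            pending.remove(id); // never leak entries, even on timeout
        }
    }

    // Topology side: route a finished result back to its caller by request id.
    private void returnResult(long id, String value) {
        CompletableFuture<String> waiting = pending.remove(id);
        if (waiting != null) waiting.complete(value);
    }

    void shutdown() {
        workers.shutdown();
    }
}
```

For instance, `new MiniDrpc(s -> s + "!").execute("hello")` blocks until a worker thread computes and returns `"hello!"`.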

### Software Engineering & Architecture

- [Fault Tolerance](https://awesome-repositories.com/f/software-engineering-architecture/fault-tolerance.md) — Maintains transactional state and ensures resilience across distributed worker nodes to prevent data loss.
- [Clustered Task Distribution](https://awesome-repositories.com/f/software-engineering-architecture/load-balancing-architectures/clustered-task-distribution.md) — Distributes data processing tasks across a cluster of worker machines, coordinated by a master node (Nimbus).
- [Directed Acyclic Graph Pipelines](https://awesome-repositories.com/f/software-engineering-architecture/parallel-processing-pipelines/directed-acyclic-graph-pipelines.md) — Defines data flow topologies as directed acyclic graphs to manage task dependencies and processing sequences.
- [Transactional State Tracking](https://awesome-repositories.com/f/software-engineering-architecture/state-management/object-state-trackers/distributed-state-managers/transactional-state-tracking.md) — Ensures exactly-once processing by tagging each batch with a transaction ID and committing state changes in transaction order across the distributed pipeline.
- [Distributed Cluster Coordination](https://awesome-repositories.com/f/software-engineering-architecture/distributed-cluster-coordination.md) — Implements centralized synchronization and scheduling of data flows across compute nodes.
- [Stream Element Grouping](https://awesome-repositories.com/f/software-engineering-architecture/stream-element-grouping.md) — Routes tuples to downstream tasks using customizable grouping logic, such as grouping on selected key fields.
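Storm's documentation describes how its acker tracks each tuple tree for fault tolerance. The acker keeps a running 64-bit XOR of tuple IDs, and every ID is XORed in once when the tuple is created and once when it is acked, so the value returns to zero exactly when the tree is complete (barring a negligible chance of collision with random IDs). The class below is a hypothetical sketch of that idea, not Storm's acker implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of XOR-based tuple-tree tracking (the idea behind Storm's acker).
class XorAckTracker {
    private final Map<Long, Long> ackVals = new HashMap<>(); // root id -> running XOR

    // Call when a tuple in the tree rooted at rootId is created (anchored).
    // A bolt must report new children before (or together with) acking their input;
    // otherwise the value could hit zero too early.
    void created(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    // Call when a tuple is acked; returns true once the whole tree is complete.
    boolean acked(long rootId, long tupleId) {
        long v = ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);
        if (v == 0) {
            ackVals.remove(rootId);
            return true;
        }
        return false;
    }
}
```

The appeal of this design is constant memory per spout tuple regardless of how large the downstream tree grows.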

### Programming Languages & Runtimes

- [Cross-Language Runtime Integration](https://awesome-repositories.com/f/programming-languages-runtimes/cross-language-runtime-integration.md) — Integrates runtimes from different programming languages to allow the execution of non-native components. ([source](https://github.com/nathanmarz/storm/blob/moved-to-apache/CHANGELOG.md))
- [Language Bridges](https://awesome-repositories.com/f/programming-languages-runtimes/language-bridges.md) — Provides interfaces for translating data types and function calls to execute non-native language components.
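Storm's multi-language protocol, as documented, exchanges JSON messages with a subprocess over standard input and output, each message followed by a line containing `end`. The sketch below frames a minimal `emit` message carrying only a tuple. Other fields a real message may carry (anchors, stream, task) are omitted, and the tiny hand-written JSON writer handles only strings and finite numbers, an assumption made to keep the example dependency-free. A production bridge would use a JSON library.

```java
import java.util.List;

// Hypothetical framing helper for a multi-lang style message; not Storm's code.
class MultiLangFramer {
    // Builds {"command":"emit","tuple":[...]} followed by the "end" terminator line.
    static String emitMessage(List<Object> tuple) {
        StringBuilder json = new StringBuilder("{\"command\":\"emit\",\"tuple\":[");
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) json.append(',');
            Object v = tuple.get(i);
            if (v instanceof Number) json.append(v);          // assumes finite numbers
            else json.append(quote(String.valueOf(v)));
        }
        json.append("]}");
        return json + "\nend\n";
    }

    // Escapes quotes, backslashes and control characters for a JSON string literal.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }
}
```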

### Scientific & Mathematical Computing

- [Stream Processing Resource Managers](https://awesome-repositories.com/f/scientific-mathematical-computing/high-performance-execution-environments/high-performance-and-parallel-computing/high-performance-computing/cluster-resource-managers/stream-processing-resource-managers.md) — Provides mechanisms to assign specific hardware to data flows and monitor throughput for resource isolation.

### System Administration & Monitoring

- [Stream Performance Monitoring](https://awesome-repositories.com/f/system-administration-monitoring/stream-performance-monitoring.md) — Tracks execution latency and data throughput to visualize system capacity and bottlenecks. ([source](https://github.com/nathanmarz/storm/blob/moved-to-apache/CHANGELOG.md))
- [Topology Lifecycle Management](https://awesome-repositories.com/f/system-administration-monitoring/topology-lifecycle-management.md) — Provides tools to control the activation, rebalancing, and termination of running data flows. ([source](https://github.com/nathanmarz/storm/blob/moved-to-apache/CHANGELOG.md))
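One common way to spot a bottleneck from execution latency and throughput is a capacity figure like the one Storm's UI reports for bolts: the fraction of a measurement window a bolt spent executing tuples. The helper below is a hypothetical sketch of that arithmetic. Values approaching 1.0 suggest the bolt needs more parallelism.

```java
// Hypothetical helper: fraction of a time window spent executing tuples.
class CapacityMetric {
    // capacity = (tuples executed * average execute latency) / window length
    static double capacity(long executedCount, double avgExecuteLatencyMs, long windowMs) {
        if (windowMs <= 0) throw new IllegalArgumentException("window must be positive");
        return executedCount * avgExecuteLatencyMs / windowMs;
    }
}
```

For example, 60,000 tuples at an average of 0.5 ms each over a 10-minute (600,000 ms) window gives a capacity of 0.05, so that bolt is busy only 5% of the time.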

### Part of an Awesome List

- [Data Processing and Analytics](https://awesome-repositories.com/f/awesome-lists/data/data-processing-and-analytics.md) — Distributed system for real-time stream processing.
