# arroyosystems/arroyo

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/arroyosystems-arroyo).**

4,819 stars · 345 forks · Rust · apache-2.0

## Links

- GitHub: https://github.com/ArroyoSystems/arroyo
- Homepage: https://arroyo.dev
- awesome-repositories: https://awesome-repositories.com/repository/arroyosystems-arroyo.md

## Topics

`data` `data-stream-processing` `dev-tools` `infrastructure` `kafka` `rust` `sql` `stream-processing` `stream-processing-engine`

## Description

Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers.

The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postgres databases via Debezium. A wide range of source and sink connectors covers systems such as Kinesis, Redis, Delta Lake, Iceberg, MQTT, NATS, and more. SQL pipelines can be defined ad hoc or as derived streams, with support for user-defined functions written in Rust or Python for custom transformation logic. Deployment is managed through a web UI, CLI, and REST API, with options for single-node, multi-node, or Kubernetes clusters using Helm.

Event-time processing includes watermarking to handle out-of-order data and supports tumbling, sliding, and session windows. The engine provides comprehensive SQL functions for string manipulation, timestamp arithmetic, JSON and array operations, data type conversion, and mathematical computations. Additional operational features include anomaly detection by counting events over time windows, synthetic data generation for testing, and authentication and TLS encryption for secure access.

## Tags

### Part of an Awesome List

- [Streaming SQL](https://awesome-repositories.com/f/awesome-lists/data/streaming-sql.md) — Translates SQL queries with streaming extensions into an optimized dataflow plan that runs continuously on unbounded data.
- [Data Formats and Parsing](https://awesome-repositories.com/f/awesome-lists/data/data-formats-and-parsing.md) — Arroyo supports JSON, Avro, Parquet, and raw string formats for reading and writing data through connectors. ([source](https://doc.arroyo.dev/connectors/))
- [Streaming Engines](https://awesome-repositories.com/f/awesome-lists/devtools/streaming-engines.md) — Distributed engine supporting SQL and Rust pipelines.

### Data & Databases

- [Ad Hoc Dataset Querying](https://awesome-repositories.com/f/data-databases/ad-hoc-dataset-querying.md) — Runs ad-hoc SELECT queries on streaming data with projections, filtering, and subqueries. ([source](https://doc.arroyo.dev/sql/))
- [Change Data Capture Streams](https://awesome-repositories.com/f/data-databases/change-data-capture-streams.md) — Processes CDC-formatted streams from Debezium with stateful upsert and delete semantics. ([source](https://doc.arroyo.dev/sql/))
- [Kafka Connectors](https://awesome-repositories.com/f/data-databases/data-ingestion/kafka-connectors.md) — Writes records to Kafka topics with exactly-once or at-least-once guarantees and JSON output. ([source](https://doc.arroyo.dev/connectors/kafka/))
- [Real-Time Data Processors](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing/distributed-processing-frameworks/real-time-data-processors.md) — An open-source system for building fault-tolerant, stateful pipelines that process millions of events per second with subsecond latency.
- [JSON-Schema](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-serialization/json-schema.md) — Arroyo reads and writes structured or unstructured JSON data with configurable schema handling and timestamp or decimal encoding. ([source](https://doc.arroyo.dev/connectors/formats/))
- [Stream Processing](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/stream-processing-systems/stream-processing.md) — A high-performance stream processing engine built in Rust, supporting SQL queries and user-defined functions.
- [Distributed Stream Execution](https://awesome-repositories.com/f/data-databases/distributed-stream-execution.md) — Executes streaming pipelines as a distributed DAG of parallel subtasks for high throughput and fault tolerance. ([source](https://doc.arroyo.dev/concepts/))
- [Event-Time Processing](https://awesome-repositories.com/f/data-databases/event-time-processing.md) — Handles out-of-order events using watermarks and supports tumbling, sliding, and session windows.
- [Time-Window Aggregations](https://awesome-repositories.com/f/data-databases/event-time-processing/time-window-aggregations.md) — Computes running metrics over tumbling, sliding, and session windows based on event-time progress. ([source](https://doc.arroyo.dev/tutorial/first-pipeline/))
- [Low-Latency Stream Output](https://awesome-repositories.com/f/data-databases/low-latency-data-retrieval/low-latency-stream-output.md) — Emits output with subsecond latency by processing data incrementally using the Dataflow model. ([source](https://cdn.jsdelivr.net/gh/arroyosystems/arroyo@master/README.md))
- [Streaming State Management](https://awesome-repositories.com/f/data-databases/query-state-management/streaming-state-management.md) — Maintains state across streaming events to enable windowed aggregations, joins, and other stateful computations. ([source](https://doc.arroyo.dev))
- [Real-Time Analytics](https://awesome-repositories.com/f/data-databases/real-time-analytics.md) — Computes windowed aggregations, joins, and top-N rankings on streaming data with event-time semantics.
- [Streaming SQL Transformations](https://awesome-repositories.com/f/data-databases/streaming-sql-transformations.md) — Defines continuous data transformations and analytics using SQL on unbounded event streams.
- [SQL-Based Pipeline Definitions](https://awesome-repositories.com/f/data-databases/streaming-sql-transformations/sql-based-pipeline-definitions.md) — Defines streaming data pipelines using SQL as the primary language for transformation and analysis logic. ([source](https://doc.arroyo.dev))
- [Kafka Stream Ingestion](https://awesome-repositories.com/f/data-databases/table-data-processing/stream-ingestion/stream-to-online-store-ingestion/kafka-stream-ingestion.md) — Consumes and produces messages from Kafka topics with exactly-once semantics using SQL queries.
- [Streaming Source and Sink Tables](https://awesome-repositories.com/f/data-databases/table-definitions/streaming-source-and-sink-tables.md) — Declares source and sink tables using CREATE TABLE SQL with connector options and WITH clauses. ([source](https://doc.arroyo.dev/sql/))
- [Streaming Connector Abstractions](https://awesome-repositories.com/f/data-databases/unified-storage-interfaces/unified-data-connector-interfaces/streaming-connector-abstractions.md) — Unifies sources and sinks through a common trait interface, supporting Kafka, Kinesis, files, and more with exactly-once guarantees.
- [Watermark-Based Event Tracking](https://awesome-repositories.com/f/data-databases/watermark-based-event-tracking.md) — Uses watermarks derived from event timestamps to handle out-of-order data and trigger window computations consistently.
- [JSON Array Functions](https://awesome-repositories.com/f/data-databases/array-column-operations/json-array-functions.md) — Provides built-in SQL functions for JSONPath extraction and array transformation on streaming data. ([source](https://doc.arroyo.dev/sql/scalar-functions/array/))
- [Change Data Capture](https://awesome-repositories.com/f/data-databases/change-data-capture.md) — Ingests real-time database change events from MySQL and Postgres via Debezium and presents them as queryable streams.
- [Built-in Metadata Columns](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-modeling-schemas/data-schemas/column-definitions/virtual-column-functions/built-in-metadata-columns.md) — Arroyo adds metadata such as partition and offset as columns in source tables for use in SQL queries. ([source](https://doc.arroyo.dev/connectors/))
- [Avro Connector Serializations](https://awesome-repositories.com/f/data-databases/data-serialization/avro-decoding/avro-table-reads-and-writes/avro-connector-serializations.md) — Arroyo reads and writes Avro binary data, supporting Confluent Schema Registry and flexible serialization modes for schema distribution. ([source](https://doc.arroyo.dev/connectors/formats/))
- [Reusable Stream Connections](https://awesome-repositories.com/f/data-databases/data-source-connectivity-tools/reusable-stream-connections.md) — Arroyo creates reusable source and sink definitions through the Web UI or SQL that can be shared across queries. ([source](https://doc.arroyo.dev/connectors/))
- [Named Struct Columns](https://awesome-repositories.com/f/data-databases/data-type-managers/structured-types/anonymous-struct-nesting/named-struct-columns.md) — Supports nested struct columns with named fields and dot-notation access in streaming SQL. ([source](https://doc.arroyo.dev/sql/data-types/))
- [CDC Sources](https://awesome-repositories.com/f/data-databases/database-connectors/cdc-sources.md) — Ingests change data capture streams from MySQL and Postgres databases via Debezium. ([source](https://doc.arroyo.dev/tutorial))
- [Stream Lookup Joins](https://awesome-repositories.com/f/data-databases/database-query-joins/stream-lookup-joins.md) — Enriches streaming data by querying external key-value stores like Redis with lookup joins. ([source](https://doc.arroyo.dev/connectors/redis/))
- [Database Type Support](https://awesome-repositories.com/f/data-databases/database-type-support.md) — Supports array columns as a built-in SQL data type with indexing and unnesting operations. ([source](https://doc.arroyo.dev/sql/data-types/))
- [Horizontal Scaling](https://awesome-repositories.com/f/data-databases/horizontal-scaling.md) — Distributes stream processing across multiple workers to handle high throughput with horizontal scaling. ([source](https://doc.arroyo.dev))
- [Parquet Data Exports](https://awesome-repositories.com/f/data-databases/parquet-data-exports.md) — Arroyo writes data in Parquet columnar format to file system sinks for efficient storage in data lakes. ([source](https://doc.arroyo.dev/connectors/formats/))
- [Protobuf Serialization](https://awesome-repositories.com/f/data-databases/protobuf-serialization.md) — Arroyo reads Protocol Buffers binary data with Confluent Schema Registry or custom schema definitions via the Web UI or API. ([source](https://doc.arroyo.dev/connectors/formats/))
- [Stream Schema Enforcers](https://awesome-repositories.com/f/data-databases/schema-enforcement-tools/stream-schema-enforcers.md) — Arroyo enforces a schema for source data with support for JSON Schema, Avro, Protobuf, and automatic retrieval from Schema Registry. ([source](https://doc.arroyo.dev/connectors/))
- [Custom SQL Functions](https://awesome-repositories.com/f/data-databases/sql-aggregate-functions/custom-sql-functions.md) — Defines custom SQL functions in Rust or Python for use in streaming data pipelines. ([source](https://doc.arroyo.dev/udfs/))
- [Derived Stream Definitions](https://awesome-repositories.com/f/data-databases/streaming-sql-transformations/derived-stream-definitions.md) — Defines new data streams from SQL queries with automatic schema inference from the query result. ([source](https://doc.arroyo.dev/sql/ddl/))
- [Window Functions](https://awesome-repositories.com/f/data-databases/window-functions.md) — Provides SQL window functions for ranking and analytical calculations over partitioned rows in streaming data. ([source](https://doc.arroyo.dev/sql/window-functions/))

### Development Tools & Productivity

- [Checkpoint-Based Recovery](https://awesome-repositories.com/f/development-tools-productivity/crash-recovery-systems/checkpoint-based-recovery.md) — Periodically persists consistent snapshots of operator state to remote storage, enabling exactly-once recovery and rescaling.
- [Pipeline Execution Monitors](https://awesome-repositories.com/f/development-tools-productivity/pipeline-monitors/pipeline-execution-monitors.md) — Starts a streaming computation, shows real-time operator metrics and outputs, and takes periodic state snapshots for recovery. ([source](https://doc.arroyo.dev/tutorial/first-pipeline/))
- [Pipeline Execution Interfaces](https://awesome-repositories.com/f/development-tools-productivity/headless-execution-environments/cli-execution/pipeline-execution-interfaces.md) — Starts a stream processing pipeline directly from the command line, accepting SQL from standard input or as an argument. ([source](https://doc.arroyo.dev/getting-started))

### DevOps & Infrastructure

- [State Checkpointing](https://awesome-repositories.com/f/devops-infrastructure/cicd-pipeline-automation/cicd-pipeline-management/ci-cd-workflows/pull-request-automation-tools/automated-fix-validators/pipeline-failure-recovery/state-checkpointing.md) — Periodically saves pipeline state to remote storage for exactly-once fault tolerance and recovery. ([source](https://doc.arroyo.dev))
- [CLI and Web GUI Operation Interfaces](https://awesome-repositories.com/f/devops-infrastructure/control-planes/cli-and-web-gui-operation-interfaces.md) — Provides a web-based user interface and REST API to create, configure, and monitor real-time data processing pipelines. ([source](https://doc.arroyo.dev/deployment/kubernetes/))
- [Orchestrator-Worker Models](https://awesome-repositories.com/f/devops-infrastructure/worker-node-management/orchestrator-worker-models.md) — Schedules stream processing workers across local processes, dedicated nodes, or Kubernetes pods with configurable resources. ([source](https://doc.arroyo.dev/configuration/))
- [Stream Processing Pipeline Deployments](https://awesome-repositories.com/f/devops-infrastructure/stream-processing-pipeline-deployments.md) — Deploys fault-tolerant stream processors on Kubernetes with checkpointing and a web UI for monitoring.

### Software Engineering & Architecture

- [Visual Dataflow Graph Designers](https://awesome-repositories.com/f/software-engineering-architecture/dataflow-frameworks/visual-dataflow-graph-designers.md) — Executes streaming pipelines as a directed acyclic graph of parallel operators, routing data via forward or shuffle edges across workers.
- [State Persistence](https://awesome-repositories.com/f/software-engineering-architecture/workflow-persistence/state-persistence.md) — Persists pipeline state, schemas, and checkpoint data to SQLite, Postgres, or object stores for fault tolerance. ([source](https://doc.arroyo.dev/deployment))
- [Event Timestamp Accessors](https://awesome-repositories.com/f/software-engineering-architecture/data-exchange-standards/iso-8601-standards/current-timestamp-retrievals/event-timestamp-accessors.md) — Provides SQL functions to access event timestamps and pipeline start times for time-based processing. ([source](https://doc.arroyo.dev/sql/scalar-functions/time-and-date/))
- [Schema Registries](https://awesome-repositories.com/f/software-engineering-architecture/schema-registries.md) — Arroyo fetches and applies JSON and Avro schemas automatically from a centralized registry for both sources and sinks. ([source](https://doc.arroyo.dev/connectors/confluent/))
- [String Manipulation](https://awesome-repositories.com/f/software-engineering-architecture/string-validation-and-normalization/string-encodings/string-manipulation.md) — Offers a comprehensive set of SQL string functions including regex, substring, and transformation operations. ([source](https://doc.arroyo.dev/sql/scalar-functions/string/))

### User Interface & Experience

- [Streaming Operator Execution](https://awesome-repositories.com/f/user-interface-experience/native-widget-toolkits/rust-native-toolkits/streaming-operator-execution.md) — Runs all operators and user-defined functions as compiled native code, providing high throughput and low latency.

### Operating Systems & Systems Programming

- [Timestamp Construction Functions](https://awesome-repositories.com/f/operating-systems-systems-programming/timestamp-formatters/timestamp-construction-functions.md) — Provides SQL functions for constructing, converting, extracting, and truncating timestamps in streaming queries. ([source](https://doc.arroyo.dev/sql/scalar-functions/time-and-date/))

### Programming Languages & Runtimes

- [Primitive Types](https://awesome-repositories.com/f/programming-languages-runtimes/primitive-types.md) — Provides standard primitive SQL types including integers, floats, strings, and timestamps. ([source](https://doc.arroyo.dev/sql/data-types/))
- [Type Conversion and Casting](https://awesome-repositories.com/f/programming-languages-runtimes/type-conversion-and-casting.md) — Provides built-in SQL functions for casting, null substitution, binary encoding, and struct construction. ([source](https://doc.arroyo.dev/sql/scalar-functions/conditional/))

### Scientific & Mathematical Computing

- [SQL Mathematical Functions](https://awesome-repositories.com/f/scientific-mathematical-computing/sql-mathematical-functions.md) — Provides a comprehensive library of arithmetic, trigonometric, logarithmic, and hash functions for SQL queries. ([source](https://doc.arroyo.dev/sql/scalar-functions/math/))

### Security & Cryptography

- [API Authentication](https://awesome-repositories.com/f/security-cryptography/api-authentication.md) — Supports static API keys and mutual TLS for API access control. ([source](https://doc.arroyo.dev/configuration/))

### Testing & Quality Assurance

- [Synthetic Data Generation](https://awesome-repositories.com/f/testing-quality-assurance/synthetic-data-generation.md) — Generates synthetic streaming events at configurable speeds for testing pipeline behavior. ([source](https://doc.arroyo.dev/tutorial/first-pipeline/))

### Web Development

- [Stream Join Operators](https://awesome-repositories.com/f/web-development/asynchronous-api-clients/stream-composition/stream-combinators/stream-join-operators.md) — Joins unbounded streams incrementally, outputting a changelog of inserts, updates, and deletes. ([source](https://doc.arroyo.dev/sql/joins/))
