Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers.
The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postgres databases via Debezium. A wide range of source and sink connectors covers systems such as Kinesis, Redis, Delta Lake, Iceberg, MQTT, NATS, and more. SQL pipelines can be defined ad hoc or as derived streams, with support for user-defined functions written in Rust or Python for custom transformation logic. Deployment is managed through a web UI, CLI, and REST API, with options for single-node, multi-node, or Kubernetes clusters using Helm.
Event-time processing includes watermarking to handle out-of-order data and supports tumbling, sliding, and session windows. The engine provides comprehensive SQL functions for string manipulation, timestamp arithmetic, JSON and array operations, data type conversion, and mathematical computations. Additional operational features include anomaly detection by counting events over time windows, synthetic data generation for testing, and authentication and TLS encryption for secure access.