# starrocks/starrocks

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/starrocks-starrocks).**

11,789 stars · 2,450 forks · Java · Apache-2.0

## Links

- GitHub: https://github.com/StarRocks/starrocks
- Homepage: https://starrocks.io
- awesome-repositories: https://awesome-repositories.com/repository/starrocks-starrocks.md

## Topics

`analytics` `big-data` `cloudnative` `database` `datalake` `delta-lake` `distributed-database` `hudi` `iceberg` `join` `lakehouse` `lakehouse-platform` `mpp` `olap` `real-time-analytics` `real-time-updates` `realtime-database` `sql` `star-schema` `vectorized`

## Description

StarRocks is a distributed SQL OLAP database engine designed for real-time analytics and high-performance multi-dimensional analysis. It also functions as a data lakehouse query engine, running SQL directly against large datasets stored in external open table formats without first importing the data locally.

The system uses a shared-nothing distributed architecture and speaks the MySQL protocol, so business intelligence tools and standard SQL clients can connect to it. A primary key upsert model keeps data consistent as records are updated during ingestion, while vectorized execution and cost-based optimization reduce query response times.
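
The primary key upsert model mentioned above can be illustrated with a minimal Python sketch. It is a toy in-memory model, not StarRocks code: the `PrimaryKeyTable` class, the change-tuple format, and the choice to replace the whole row on upsert are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional, Tuple

Row = Dict[str, Any]
# (operation, primary key, row values or None for deletes) -- hypothetical format
Change = Tuple[str, Any, Optional[Row]]


@dataclass
class PrimaryKeyTable:
    """Toy in-memory table that keeps exactly one row per primary key."""

    rows: Dict[Any, Row] = field(default_factory=dict)

    def apply(self, changes: Iterable[Change]) -> None:
        """Apply a batch of changes in arrival order."""
        for op, key, values in changes:
            if op == "upsert":
                # New key: insert. Existing key: replace the whole row
                # (whole-row replacement is an assumption of this sketch).
                self.rows[key] = dict(values or {})
            elif op == "delete":
                # Deleting a missing key is treated as a no-op.
                self.rows.pop(key, None)
            else:
                raise ValueError(f"unknown operation: {op!r}")


orders = PrimaryKeyTable()
orders.apply([
    ("upsert", 1, {"status": "created", "amount": 40}),
    ("upsert", 2, {"status": "created", "amount": 15}),
    ("upsert", 1, {"status": "paid", "amount": 40}),  # same key: overwrites
    ("delete", 2, None),
])
print(orders.rows)  # {1: {'status': 'paid', 'amount': 40}}
```

Because each key maps to at most one row, a reader of the table sees only the latest version of a record, which is the property that keeps analytics current while data is still arriving.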

Other capabilities include automated materialized views, which reduce the volume of data scanned per query, and multi-tenant resource isolation, which enforces CPU and memory quotas across concurrent workloads. The engine also rebalances resources and recovers data automatically when the cluster is scaled.
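
To show why a materialized view reduces scan volume, the following Python sketch compares answering the same aggregate from raw rows and from a precomputed summary. It is only a conceptual model under stated assumptions: the function names, the `(region, amount)` row layout, and the "rows scanned" counter are hypothetical and do not reflect how StarRocks stores or refreshes views.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BaseRow = Tuple[str, int]  # (region, amount) -- hypothetical schema


def build_materialized_view(base_rows: List[BaseRow]) -> Dict[str, int]:
    """Pre-aggregate total amount per region, once, ahead of query time."""
    totals: Dict[str, int] = defaultdict(int)
    for region, amount in base_rows:
        totals[region] += amount
    return dict(totals)


def total_from_base(base_rows: List[BaseRow], region: str) -> Tuple[int, int]:
    """Answer the query by scanning every base row; returns (total, rows scanned)."""
    total = 0
    scanned = 0
    for row_region, amount in base_rows:
        scanned += 1
        if row_region == region:
            total += amount
    return total, scanned


def total_from_view(view: Dict[str, int], region: str) -> Tuple[int, int]:
    """Answer the same query with a single lookup in the pre-aggregated view."""
    return view.get(region, 0), 1


base_rows = [("eu", 10), ("us", 5), ("eu", 7), ("apac", 3), ("us", 1)]
view = build_materialized_view(base_rows)

print(total_from_base(base_rows, "eu"))  # (17, 5)
print(total_from_view(view, "eu"))       # (17, 1)
```

Both paths return the same total, but the view touches one precomputed entry instead of every base row. The gap grows with the size of the base table.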

## Tags

### Data & Databases

- [Real-Time Analytics](https://awesome-repositories.com/f/data-databases/real-time-analytics.md) — Provides a low-latency real-time analytics engine for processing streaming data and updating business dashboards.
- [Cloud Data Lake Integrations](https://awesome-repositories.com/f/data-databases/data-integration-synchronization/data-integration/cloud-data-lake-integrations.md) — Enables direct SQL access to data stored in open table formats within cloud-native data lakes. ([source](https://github.com/starrocks/starrocks#readme))
- [Data Upsert Operations](https://awesome-repositories.com/f/data-databases/data-upsert-operations.md) — Ensures real-time data consistency using a primary key upsert model to update records during ingestion.
- [Distributed SQL Databases](https://awesome-repositories.com/f/data-databases/distributed-sql-databases.md) — Implements a distributed SQL database architecture that scales horizontally while maintaining data availability.
- [Federated Data Query Engines](https://awesome-repositories.com/f/data-databases/federated-data-query-engines.md) — Functions as a federated query engine that accesses remote lakehouse storage formats without requiring local data imports.
- [Lakehouse Engines](https://awesome-repositories.com/f/data-databases/federated-data-query-engines/lakehouse-engines.md) — Acts as an analytics engine capable of querying external data lakes and open table formats without local imports.
- [Multi-Dimensional Analysis](https://awesome-repositories.com/f/data-databases/multi-dimensional-analysis.md) — Performs complex aggregations and ad-hoc queries across large datasets using vectorized processing.
- [OLAP Database Engines](https://awesome-repositories.com/f/data-databases/olap-database-engines.md) — Provides a distributed database designed for high-performance multi-dimensional analytics and sub-second SQL queries.
- [Cost-Based Optimizers](https://awesome-repositories.com/f/data-databases/query-optimizers/cost-based-optimizers.md) — Implements a cost-based optimizer that determines the most efficient execution plan using table statistics. ([source](https://github.com/starrocks/starrocks#readme))
- [Real-time Data Synchronization](https://awesome-repositories.com/f/data-databases/real-time-data-synchronization.md) — Synchronizes datasets in real-time using primary key upserts and deletes to keep analytics current. ([source](https://github.com/starrocks/starrocks#readme))
- [Vectorized Execution Engines](https://awesome-repositories.com/f/data-databases/vectorized-execution-engines.md) — Utilizes a vectorized execution engine with SIMD instructions to process data in batches for high analytical throughput.
- [Lakehouse Querying](https://awesome-repositories.com/f/data-databases/virtual-table-querying/external-table-querying/direct-path-querying/lakehouse-querying.md) — Runs high-performance SQL queries directly on open table formats in a data lake without requiring file imports.
- [Business Intelligence Connectors](https://awesome-repositories.com/f/data-databases/business-intelligence-connectors.md) — Connects the engine to business intelligence platforms through the standard MySQL protocol and ANSI SQL.
- [Materialized Views](https://awesome-repositories.com/f/data-databases/materialized-views.md) — Provides automated materialized views that pre-calculate result sets to accelerate query response times. ([source](https://github.com/starrocks/starrocks#readme))
- [Materialized View Selectors](https://awesome-repositories.com/f/data-databases/materialized-views/materialized-view-selectors.md) — Implements a cost-based optimizer to automatically route queries to the most efficient pre-calculated materialized views.
- [Multi-Tenant Resource Isolation](https://awesome-repositories.com/f/data-databases/multi-tenant-resource-isolation.md) — Prevents query interference by allocating specific CPU and memory quotas to individual users or workloads within the cluster.
- [MySQL Compatibility](https://awesome-repositories.com/f/data-databases/mysql-integrations/mysql-compatibility.md) — Uses the MySQL protocol to ensure compatibility with standard SQL clients and BI tools.
- [Scalable Database Clusters](https://awesome-repositories.com/f/data-databases/scalable-database-clusters.md) — Provides a scalable database cluster architecture that automatically balances resources during node scaling.
- [SQL Database Connectivity](https://awesome-repositories.com/f/data-databases/sql-database-connectivity.md) — Provides connectivity and compatibility with existing business intelligence tools and database clients via standard MySQL protocols. ([source](https://github.com/starrocks/starrocks#readme))
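
The cost-based optimizer and materialized view selector entries in the list above describe routing a query to the cheapest source that can answer it. A minimal Python sketch of that idea follows. It assumes a deliberately simplified cost model in which cost equals an estimated row count and eligibility means column coverage. The `Source` class, `choose_source` function, and table names are hypothetical, and a real optimizer weighs far more than this.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Source:
    """A base table or materialized view the query could read (hypothetical)."""

    name: str
    columns: FrozenSet[str]  # columns this source can supply
    estimated_rows: int      # table statistic used as the cost in this sketch


def choose_source(needed: FrozenSet[str], sources: Iterable[Source]) -> Optional[Source]:
    """Pick the usable source with the lowest estimated row count."""
    # A source is usable only if it covers every column the query needs.
    usable = [s for s in sources if needed <= s.columns]
    # Lowest estimated rows wins; None if nothing can answer the query.
    return min(usable, key=lambda s: s.estimated_rows, default=None)


sources = [
    Source("orders", frozenset({"day", "region", "customer", "amount"}), 1_000_000),
    Source("mv_daily_region", frozenset({"day", "region", "amount"}), 3_650),
    Source("mv_region", frozenset({"region", "amount"}), 10),
]

print(choose_source(frozenset({"region", "amount"}), sources).name)    # mv_region
print(choose_source(frozenset({"day", "amount"}), sources).name)       # mv_daily_region
print(choose_source(frozenset({"customer", "amount"}), sources).name)  # orders
```

The sketch ignores whether a view's aggregation is compatible with the query and whether the view is fresh, both of which a production selector would need to check before rewriting.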

### Software Engineering & Architecture

- [Shared-Nothing Processing Engines](https://awesome-repositories.com/f/software-engineering-architecture/shared-nothing-architectures/shared-nothing-processing-engines.md) — Employs a shared-nothing distributed processing model where each node manages its own local storage and memory.

### DevOps & Infrastructure

- [Workload Isolation](https://awesome-repositories.com/f/devops-infrastructure/cluster-management/workload-isolation.md) — Implements multi-tenant resource isolation to manage CPU and memory quotas across concurrent workloads. ([source](https://github.com/starrocks/starrocks#readme))
- [Cluster Scaling Orchestrators](https://awesome-repositories.com/f/devops-infrastructure/cluster-scaling-orchestrators.md) — Automatically balances resources and recovers data replicas when adding or removing nodes from the cluster. ([source](https://github.com/starrocks/starrocks#readme))
