# apache/doris

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/apache-doris).**

15,028 stars · 3,713 forks · Java · apache-2.0

## Links

- GitHub: https://github.com/apache/doris
- Homepage: https://doris.apache.org
- awesome-repositories: https://awesome-repositories.com/repository/apache-doris.md

## Topics

`agent` `ai` `bigquery` `database` `dbt` `delta-lake` `elt` `hudi` `iceberg` `lakehouse` `olap` `paimon` `query-engine` `real-time` `redshift` `snowflake` `spark` `sql`

## Description

Doris is a distributed SQL data warehouse designed for high-performance analytical workloads and real-time data processing. It functions as a unified platform that integrates traditional relational warehousing with lakehouse query capabilities, allowing users to execute analytical operations directly against external data lakes without requiring data migration.

The system is built on a shared-nothing, massively parallel processing (MPP) architecture that uses vectorized query execution and columnar storage to keep query latency under a second. It supports dynamic schema evolution, so table structures can be changed while the system is live, and it offers elastic resource scaling by decoupling the compute and storage layers so that each can grow or shrink with workload demand.
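As a rough illustration of why columnar storage and batch-at-a-time execution suit analytical scans, the sketch below pivots a few records into per-column lists and aggregates one column in batches. It is a toy model in plain Python, not Doris's storage format or execution engine; the sample data and the helper names `to_columns` and `sum_in_batches` are hypothetical.

```python
from typing import Dict, List

# Hypothetical sample data: three records stored row by row.
rows = [
    {"user_id": 1, "city": "Berlin", "amount": 10.0},
    {"user_id": 2, "city": "Paris", "amount": 25.5},
    {"user_id": 3, "city": "Berlin", "amount": 4.5},
]


def to_columns(records: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, list] = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_in_batches(values: List[float], batch_size: int) -> float:
    """Sum a column one batch at a time instead of one row at a time."""
    total = 0.0
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        total += sum(batch)  # one tight loop over a contiguous batch
    return total


columns = to_columns(rows)
# The aggregate reads only the "amount" column; "user_id" and "city"
# are never touched, which is the I/O saving a columnar layout gives.
total_amount = sum_in_batches(columns["amount"], batch_size=2)
print(total_amount)
```

A real engine applies the same idea with compiled loops over fixed-width column batches, which is where CPU-level parallelism comes in; the Python version only shows the data layout and control flow.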

Beyond standard analytical processing, the platform incorporates vector database functionality to support artificial intelligence and semantic search applications. It enables hybrid search by combining structured SQL analytics with full-text filtering and vector similarity, so retrieval-augmented generation (RAG) workflows can run within a single environment. The engine is built for high concurrency, serving thousands of queries per second for enterprise-scale operations.
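The hybrid search described above can be pictured as three steps: a structured predicate, a keyword filter, and a ranking by vector similarity. The sketch below shows that flow over a small in-memory list. It is a conceptual model only and does not use Doris's SQL syntax or index structures; the documents, the two-dimensional embeddings, and the `hybrid_search` function are all hypothetical.

```python
import math
from typing import List

# Hypothetical documents with a structured field, text, and a toy embedding.
docs = [
    {"id": 1, "category": "faq", "text": "reset your password from the login page", "vec": [1.0, 0.0]},
    {"id": 2, "category": "faq", "text": "change the billing address", "vec": [0.0, 1.0]},
    {"id": 3, "category": "blog", "text": "password hygiene tips", "vec": [1.0, 0.0]},
    {"id": 4, "category": "faq", "text": "recover a forgotten password", "vec": [0.6, 0.8]},
]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(documents: List[dict], category: str, keyword: str,
                  query_vec: List[float], k: int) -> List[int]:
    """Filter by category and keyword, then rank survivors by similarity."""
    candidates = [
        d for d in documents
        if d["category"] == category and keyword in d["text"].split()
    ]
    ranked = sorted(candidates, key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return [d["id"] for d in ranked[:k]]


result = hybrid_search(docs, category="faq", keyword="password",
                       query_vec=[1.0, 0.0], k=2)
print(result)
```

Filtering before ranking keeps the similarity computation limited to rows that already satisfy the structured and full-text conditions, which is the main appeal of running all three in one query.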

## Tags

### Data & Databases

- [Data Warehousing](https://awesome-repositories.com/f/data-databases/data-warehousing.md) — Handles thousands of analytical queries per second for enterprise-scale workloads.
- [Distributed Data Warehouses](https://awesome-repositories.com/f/data-databases/distributed-data-warehouses.md) — Provides a scalable distributed data warehouse architecture for managing large-scale analytical workloads with real-time ingestion.
- [Real-time Analytics Platforms](https://awesome-repositories.com/f/data-databases/real-time-analytics-platforms.md) — Delivers sub-second analytical query performance on massive datasets using a high-concurrency, distributed columnar engine.
- [SQL Query Interfaces](https://awesome-repositories.com/f/data-databases/sql-query-interfaces.md) — Provides a standard ANSI SQL interface for analytical queries and data management. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Columnar Storage Engines](https://awesome-repositories.com/f/data-databases/columnar-storage-engines.md) — Stores data column by column so that scans read only the columns a query needs, minimizing disk I/O and accelerating analytical scanning.
- [Federated Data Query Engines](https://awesome-repositories.com/f/data-databases/federated-data-query-engines.md) — Executes federated analytical queries directly against external data lake storage without requiring data migration. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Real-Time Analytics](https://awesome-repositories.com/f/data-databases/real-time-analytics.md) — Delivers real-time analytics with sub-second latency for operational dashboards and time-sensitive business operations. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Relational Vector Engines](https://awesome-repositories.com/f/data-databases/vector-databases/relational-vector-engines.md) — Unifies relational SQL analytics with vector similarity search to support RAG workflows and intelligent applications.
- [Data Lake Acceleration](https://awesome-repositories.com/f/data-databases/data-lake-acceleration.md) — Enables direct analysis of external data lakes without requiring data migration. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Real-Time Data Processors](https://awesome-repositories.com/f/data-databases/data-processing-pipelines/data-processing/distributed-processing-frameworks/real-time-data-processors.md) — Supports continuous real-time data ingestion to ensure new information is immediately available for analysis. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Distributed Query Processing](https://awesome-repositories.com/f/data-databases/distributed-query-processing.md) — Executes distributed analytical queries across multiple nodes to optimize performance for massive datasets. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Concurrent Query Processing](https://awesome-repositories.com/f/data-databases/high-performance-ingestion/concurrent-query-processing.md) — Supports high-concurrency analytical query processing, handling thousands of requests per second for enterprise-scale operations. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Hybrid Search](https://awesome-repositories.com/f/data-databases/hybrid-search.md) — Combines structured analytics, full-text filtering, and vector similarity search within a single query for advanced data retrieval. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Parallel Processing](https://awesome-repositories.com/f/data-databases/parallel-processing.md) — Distributes complex analytical workloads across a cluster of nodes for parallel execution.
- [Storage-Compute Architectures](https://awesome-repositories.com/f/data-databases/storage-compute-architectures.md) — Separates compute and storage layers to enable independent resource scaling based on workload demands.
- [Vector Databases](https://awesome-repositories.com/f/data-databases/vector-databases.md) — Integrates vector similarity search directly into the database engine to enable semantic analysis alongside structured relational data.
- [Vectorized Execution Engines](https://awesome-repositories.com/f/data-databases/vectorized-execution-engines.md) — Processes data in batches rather than row by row, making better use of CPU caches and instructions to maximize analytical throughput.
- [Data Ingestion](https://awesome-repositories.com/f/data-databases/data-engineering-infrastructure/data-extraction-ingestion/data-ingestion.md) — Ingests incoming data with second-level latency so that it is available for queries almost immediately.
- [Schema Evolution](https://awesome-repositories.com/f/data-databases/data-governance-modeling/data-modeling-schemas/schema-evolution.md) — Supports real-time updates to table structures without requiring data migration or system downtime.
- [Data Schema Management](https://awesome-repositories.com/f/data-databases/data-schema-management.md) — Manages dynamic data schemas by supporting semi-structured data and rapid modifications. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Resource Scaling Strategies](https://awesome-repositories.com/f/data-databases/horizontal-database-scaling/resource-scaling-strategies.md) — Enables elastic resource scaling by adjusting storage and compute capacity to balance performance requirements. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
- [Search and Indexing](https://awesome-repositories.com/f/data-databases/search-indexing-technologies/search-indexing/search-and-indexing.md) — Combines inverted indexes, bloom filters, and zone maps to prune irrelevant data and accelerate search.
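
The zone-map pruning named in the Search and Indexing entry can be sketched as follows: each block of values records its minimum and maximum, and a range scan skips any block whose bounds cannot overlap the query. This is a minimal model under assumed names (`Block`, `build_blocks`, `scan_range`) and made-up data, not Doris's on-disk index format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """A chunk of column values plus its zone map (min and max)."""
    values: List[int]
    min_value: int
    max_value: int


def build_blocks(values: List[int], block_size: int) -> List[Block]:
    """Split a column into fixed-size blocks and record each block's bounds."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_range(blocks: List[Block], low: int, high: int) -> Tuple[List[int], int]:
    """Return values in [low, high] and the number of blocks actually read."""
    matches: List[int] = []
    scanned = 0
    for block in blocks:
        # Prune: the block's bounds prove it holds nothing in the range.
        if block.max_value < low or block.min_value > high:
            continue
        scanned += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, scanned


blocks = build_blocks([1, 3, 5, 20, 22, 25, 40, 41, 48], block_size=3)
matches, scanned = scan_range(blocks, low=21, high=30)
print(matches, scanned)
```

Pruning of this kind works best when data is sorted or clustered on the filtered column, since that keeps each block's minimum and maximum close together.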

### Software Engineering & Architecture

- [Shared-Nothing Architectures](https://awesome-repositories.com/f/software-engineering-architecture/shared-nothing-architectures.md) — Maintains independent node states to eliminate central bottlenecks and ensure linear scalability.

### Artificial Intelligence & ML

- [Knowledge Base Retrieval](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-rag-development/knowledge-base-retrieval.md) — Facilitates intelligent document retrieval and context-aware responses by storing and querying enterprise knowledge base data. ([source](https://doris.apache.org/docs/4.x/gettingStarted/what-is-apache-doris/))
