# citusdata/citus

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/citusdata-citus).**

12,562 stars · 779 forks · C · AGPL-3.0

## Links

- GitHub: https://github.com/citusdata/citus
- Homepage: https://www.citusdata.com
- awesome-repositories: https://awesome-repositories.com/repository/citusdata-citus.md

## Topics

`citus` `citus-extension` `database` `database-cluster` `distributed-database` `multi-tenant` `postgres` `postgresql` `relational-database` `scale` `sharding` `sql`

## Description

Citus is a PostgreSQL extension that turns a standard database into a distributed system. It functions as a sharding framework and distributed SQL engine, scaling horizontally by partitioning tables across a cluster of nodes. In its coordinator-worker topology, the coordinator manages metadata and routes queries to the appropriate worker nodes, so complex operations run in parallel across the distributed shards.
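
The routing idea described above can be pictured with a small sketch. It is a minimal illustration, not Citus's implementation: the shard count, the worker names, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib

# Hypothetical cluster layout (assumed for illustration only).
SHARD_COUNT = 8
WORKERS = ["worker-1", "worker-2", "worker-3"]

# Assign shards to workers round-robin: shard index -> worker name.
SHARD_TO_WORKER = {
    shard: WORKERS[shard % len(WORKERS)] for shard in range(SHARD_COUNT)
}


def shard_for(distribution_value: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a distribution-column value to a shard index.

    CRC32 is a stand-in hash chosen only because it is deterministic
    and available in the standard library.
    """
    return zlib.crc32(distribution_value.encode("utf-8")) % shard_count


def route(distribution_value: str) -> str:
    """Return the worker holding the shard for this value."""
    return SHARD_TO_WORKER[shard_for(distribution_value)]
```

Because the mapping is deterministic, every row with the same distribution value lands on the same shard, which is what lets a coordinator send a single-tenant query to one worker instead of all of them.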

Citus stands out for its support for multi-tenant architectures and real-time analytical processing. Tenant-based distribution and schema-based sharding isolate tenant data and allow high-volume tenants to be moved to dedicated nodes. For analytical workloads, it combines columnar storage with data compression and supports pre-aggregated rollups, so queries over large datasets stay fast as the cluster grows.
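
A pre-aggregated rollup can be sketched in a few lines. The example below is an assumption-laden illustration in plain Python rather than anything Citus ships: the event shape `(tenant_id, timestamp)` and the one-minute bucket size are made up for the example.

```python
from collections import Counter
from datetime import datetime


def rollup_per_minute(events):
    """Count events per (tenant, minute) bucket.

    `events` is an iterable of (tenant_id, timestamp) pairs.
    """
    counts = Counter()
    for tenant_id, ts in events:
        # Truncate the timestamp to the start of its minute.
        minute = ts.replace(second=0, microsecond=0)
        counts[(tenant_id, minute)] += 1
    return counts


# Hypothetical raw events.
events = [
    ("acme", datetime(2024, 1, 1, 12, 0, 5)),
    ("acme", datetime(2024, 1, 1, 12, 0, 40)),
    ("acme", datetime(2024, 1, 1, 12, 1, 2)),
    ("globex", datetime(2024, 1, 1, 12, 0, 59)),
]
rollup = rollup_per_minute(events)
```

Queries then read the small rollup instead of scanning every raw event, which is the trade the rollup approach makes: a little write-time work for much cheaper reads.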

Beyond its core distribution capabilities, the project provides comprehensive tools for cluster administration and data lifecycle management. It automates shard rebalancing, schema propagation via a two-phase commit protocol, and the maintenance of time-based partitions. The system also includes diagnostic utilities for monitoring query performance, detecting resource contention, and analyzing index usage across the distributed environment.
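
The two-phase commit idea behind schema propagation can be sketched as follows. This is a simplified in-memory model under stated assumptions: the `Worker` class, its methods, and the `propagate` helper are hypothetical names for illustration and do not correspond to Citus internals.

```python
class Worker:
    """Hypothetical in-memory stand-in for a worker node."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.schema = []      # DDL statements that have been committed
        self.pending = None   # DDL staged during the prepare phase

    def prepare(self, ddl):
        """Phase 1: stage the change and vote yes (True) or no (False)."""
        if self.fail_on_prepare:
            return False
        self.pending = ddl
        return True

    def commit(self):
        """Phase 2: make the staged change permanent."""
        self.schema.append(self.pending)
        self.pending = None

    def rollback(self):
        """Discard the staged change."""
        self.pending = None


def propagate(ddl, workers):
    """Apply `ddl` on every worker, or on none of them."""
    prepared = []
    for worker in workers:
        if worker.prepare(ddl):
            prepared.append(worker)
        else:
            # A worker voted no: undo everything staged so far.
            for staged in prepared:
                staged.rollback()
            return False
    # Every worker voted yes, so commit everywhere.
    for worker in workers:
        worker.commit()
    return True
```

The point of the two phases is that no worker makes the change permanent until all of them have agreed they can, so the schema never ends up applied on only part of the cluster.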

## Tags

### Data & Databases

- [Distributed Extensions](https://awesome-repositories.com/f/data-databases/postgresql-extensions/distributed-extensions.md) — Transforms standard PostgreSQL into a distributed system by sharding tables and parallelizing queries across multiple nodes.
- [Row-Based Sharding](https://awesome-repositories.com/f/data-databases/database-sharding/row-based-sharding.md) — Splits tables into shards based on a distribution column to maximize hardware efficiency and tenant density in multi-tenant environments. ([source](https://docs.citusdata.com/en/stable/get_started/concepts.html))
- [Distributed Relational Databases](https://awesome-repositories.com/f/data-databases/distributed-relational-databases.md) — Transforms standard databases into distributed systems to enable horizontal scaling and parallel query processing while maintaining relational compatibility. ([source](https://docs.citusdata.com/en/stable/installation/single_node.html))
- [Table Co-location](https://awesome-repositories.com/f/data-databases/distributed-relational-databases/table-co-location.md) — Places related rows on the same worker node using a shared distribution column to enable efficient parallel processing and cross-table operations for specific users. ([source](https://docs.citusdata.com/en/stable/articles/aggregation.html))
- [Distributed SQL Engines](https://awesome-repositories.com/f/data-databases/distributed-sql-engines.md) — Functions as a distributed SQL engine that enables horizontal scaling and parallel query execution across a cluster.
- [Horizontal Database Scaling](https://awesome-repositories.com/f/data-databases/horizontal-database-scaling.md) — Transforms a standard database into a horizontally scalable system by partitioning tables across multiple nodes for parallel query processing.
- [Multi-Tenant Data Management](https://awesome-repositories.com/f/data-databases/multi-tenant-data-management.md) — Provides structural isolation of tenant data across shards and nodes to ensure consistent performance for large-scale applications.
- [Sharding Orchestration](https://awesome-repositories.com/f/data-databases/sharding-orchestration.md) — Automates data distribution, shard rebalancing, and schema propagation across a fleet of database nodes.
- [Columnar Storage Engines](https://awesome-repositories.com/f/data-databases/columnar-storage-engines.md) — Integrates compressed columnar storage to optimize analytical scan performance and reduce disk storage requirements.
- [Distributed Query Processing](https://awesome-repositories.com/f/data-databases/distributed-query-processing.md) — Routes and executes database operations across multiple nodes simultaneously to accelerate analytical workloads. ([source](https://cdn.jsdelivr.net/gh/citusdata/citus@main/README.md))
- [Schema-Based Sharding](https://awesome-repositories.com/f/data-databases/distributed-relational-databases/schema-based-sharding.md) — Distributes data by assigning individual database schemas to specific nodes, allowing multi-tenant applications to scale without modifying existing query logic. ([source](https://cdn.jsdelivr.net/gh/citusdata/citus@main/README.md))
- [Tenant Request Routing](https://awesome-repositories.com/f/data-databases/multi-tenant-data-management/tenant-request-routing.md) — Migrates specific high-volume tenants to dedicated nodes to ensure consistent performance and resource availability for critical workloads. ([source](https://docs.citusdata.com/en/stable/use_cases/multi_tenant.html))
- [Real-Time Analytics](https://awesome-repositories.com/f/data-databases/real-time-analytics.md) — Accelerates complex analytical queries on massive datasets using parallel execution and columnar storage.
- [Parallel Query Execution](https://awesome-repositories.com/f/data-databases/database-management-systems/database-systems-management/database-operations/sql-query-execution/parallel-query-execution.md) — Decomposes complex SQL statements into fragments and pushes them to worker nodes for concurrent execution to maximize throughput.
- [Online Table Distribution](https://awesome-repositories.com/f/data-databases/distributed-databases/online-table-distribution.md) — Converts existing local tables into distributed ones without blocking application reads or writes during the migration. ([source](https://cdn.jsdelivr.net/gh/citusdata/citus@main/README.md))
- [Distributed Sharding Architectures](https://awesome-repositories.com/f/data-databases/distributed-sharding-architectures.md) — Partitions data into logical shards mapped to specific physical nodes using distribution columns to enable horizontal scaling.
- [Query Optimizers](https://awesome-repositories.com/f/data-databases/query-optimizers.md) — Pushes operations to worker nodes based on distribution columns to minimize data movement and maximize parallel computation. ([source](https://docs.citusdata.com/en/stable/performance/performance_tuning.html))
- [Time-Series Data Modeling](https://awesome-repositories.com/f/data-databases/time-series-data-modeling.md) — Automates the lifecycle of time-based partitions to efficiently store and expire large volumes of historical data.
- [Coordinator-Worker Topologies](https://awesome-repositories.com/f/data-databases/cluster-topology-management/coordinator-worker-topologies.md) — Manages metadata and query routing on a central node while worker nodes execute parallel operations on partitioned shards.
- [Automated Rollup Engines](https://awesome-repositories.com/f/data-databases/data-query-management/automated-rollup-engines.md) — Creates pre-aggregated tables at defined time intervals to speed up analytical queries and reduce storage requirements. ([source](https://docs.citusdata.com/en/stable/use_cases/realtime_analytics.html))
- [Distributed Query Routers](https://awesome-repositories.com/f/data-databases/query-middleware/distributed-query-routers.md) — Identifies the specific node containing required data based on distribution keys to minimize data movement during query execution.
- [Schema Propagation Protocols](https://awesome-repositories.com/f/data-databases/schema-synchronizers/schema-propagation-protocols.md) — Applies schema modifications from the coordinator to all worker nodes automatically, using a two-phase commit protocol to maintain structural consistency. ([source](https://docs.citusdata.com/en/stable/use_cases/multi_tenant.html))
- [Data Compression Algorithms](https://awesome-repositories.com/f/data-databases/data-compression-algorithms.md) — Stores table data in a compressed columnar format to reduce disk usage and speed up scan-heavy analytical queries across large datasets. ([source](https://cdn.jsdelivr.net/gh/citusdata/citus@main/README.md))
- [Data Replication](https://awesome-repositories.com/f/data-databases/data-replication.md) — Synchronizes small, shared lookup tables across all worker nodes to allow local joins without network overhead during distributed queries. ([source](https://docs.citusdata.com/en/stable/get_started/concepts.html))
- [Dataset Aggregations](https://awesome-repositories.com/f/data-databases/dataset-aggregations.md) — Distributes insert-select operations across a cluster to pre-compute summaries from large datasets. ([source](https://docs.citusdata.com/en/stable/articles/aggregation.html))
- [Time-Based Partition Automation](https://awesome-repositories.com/f/data-databases/partitioning-algorithms/time-based-partition-automation.md) — Automates the creation and removal of time-based table partitions using scheduled maintenance tasks to ensure continuous data ingestion and efficient retention. ([source](https://docs.citusdata.com/en/stable/use_cases/timeseries.html))
- [Query Performance Tuning](https://awesome-repositories.com/f/data-databases/query-performance-tuning.md) — Inspects query plans and adjusts worker configurations to ensure optimal resource utilization across the cluster. ([source](https://docs.citusdata.com/en/stable/performance/performance_tuning.html))
- [Reference Table Replication](https://awesome-repositories.com/f/data-databases/table-definitions/object-mappings/table-to-code-mappers/reference-table-replication.md) — Copies small lookup tables to every node in the cluster to allow local joins and eliminate cross-node network traffic.
- [Bulk Data Ingestion](https://awesome-repositories.com/f/data-databases/bulk-data-ingestion.md) — Loads large datasets into distributed tables using parallel connections to achieve high throughput and reduce migration time. ([source](https://docs.citusdata.com/en/stable/performance/performance_tuning.html))
- [Cardinality Estimation](https://awesome-repositories.com/f/data-databases/cardinality-estimation.md) — Calculates approximate distinct counts using probabilistic data structures to minimize memory usage and network traffic during large-scale analytical operations. ([source](https://docs.citusdata.com/en/stable/use_cases/realtime_analytics.html))
- [Uniqueness Enforcement](https://awesome-repositories.com/f/data-databases/data-management/unique-identifier-generators/uniqueness-enforcement.md) — Ensures data integrity by requiring unique constraints to include the distribution column, allowing local enforcement across independent shards. ([source](https://docs.citusdata.com/en/stable/reference/common_errors.html))
- [Database Connection Managers](https://awesome-repositories.com/f/data-databases/database-connection-managers.md) — Configures connection pooling and execution policies to balance parallelism against network overhead for different workload types. ([source](https://docs.citusdata.com/en/stable/performance/performance_tuning.html))
- [Metadata Inspection Tools](https://awesome-repositories.com/f/data-databases/distributed-databases/metadata-inspection-tools.md) — Retrieves configuration details such as distribution columns, shard sizes, and schema locations across the distributed environment. ([source](https://docs.citusdata.com/en/stable/admin_guide/diagnostic_queries.html))
- [Hybrid Partitioning](https://awesome-repositories.com/f/data-databases/partitioning-algorithms/hybrid-partitioning.md) — Combines row-based and columnar storage formats within a single partitioned table to balance performance needs across different data segments. ([source](https://github.com/citusdata/citus/blob/master/src/backend/columnar/README.md))
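
The time-based partition automation listed above boils down to a recurring decision: which partitions to create ahead of incoming data and which to drop once they fall outside the retention window. The sketch below models that decision only. The function name, the daily granularity, and the default windows are assumptions for illustration, not Citus settings.

```python
from datetime import date, timedelta


def plan_partitions(today, existing, create_ahead_days=3, retention_days=7):
    """Return (to_create, to_drop) for daily partitions keyed by start date.

    `existing` is a collection of dates, one per partition that already exists.
    """
    # Partitions that should exist: today plus the look-ahead window.
    wanted = {today + timedelta(days=i) for i in range(create_ahead_days + 1)}
    # Partitions starting before the cutoff have expired.
    cutoff = today - timedelta(days=retention_days)
    to_create = sorted(wanted - set(existing))
    to_drop = sorted(d for d in existing if d < cutoff)
    return to_create, to_drop


today = date(2024, 1, 10)
existing = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 1, 10), date(2024, 1, 11)]
to_create, to_drop = plan_partitions(today, existing)
```

A scheduled job would run a planner like this periodically and then issue the corresponding create and drop statements, so ingestion never hits a missing partition and old data expires without manual cleanup.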

### System Administration & Monitoring

- [Shard Rebalancing](https://awesome-repositories.com/f/system-administration-monitoring/cluster-management/shard-rebalancing.md) — Moves data shards between nodes to equalize storage distribution and maintain performance as the cluster grows or hardware is added. ([source](https://docs.citusdata.com/en/stable/use_cases/multi_tenant.html))
- [Database Performance Monitors](https://awesome-repositories.com/f/system-administration-monitoring/database-performance-monitors.md) — Tracks active queries, wait events, and hit rates across all nodes to optimize workload efficiency. ([source](https://docs.citusdata.com/en/stable/admin_guide/diagnostic_queries.html))
- [Contention Detection Tools](https://awesome-repositories.com/f/system-administration-monitoring/monitoring-and-observability/contention-detection-tools.md) — Identifies open locks and blocking queries across the cluster to diagnose performance bottlenecks. ([source](https://docs.citusdata.com/en/stable/admin_guide/diagnostic_queries.html))
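
Shard rebalancing, as listed above, can be illustrated with a greedy sketch that evens out shard counts. It is a simplification under stated assumptions: it balances by number of shards rather than by size, and the `rebalance` function and its `placement` input are hypothetical rather than Citus's rebalancer.

```python
def rebalance(placement, max_moves=100):
    """Greedily move shards from the fullest node to the emptiest one.

    `placement` maps node name -> list of shard ids and is not modified.
    Returns (new_placement, moves), where each move is (shard, source, target).
    """
    nodes = {node: list(shards) for node, shards in placement.items()}
    moves = []
    for _ in range(max_moves):
        fullest = max(nodes, key=lambda n: len(nodes[n]))
        emptiest = min(nodes, key=lambda n: len(nodes[n]))
        # Stop once shard counts differ by at most one.
        if len(nodes[fullest]) - len(nodes[emptiest]) <= 1:
            break
        shard = nodes[fullest].pop()
        nodes[emptiest].append(shard)
        moves.append((shard, fullest, emptiest))
    return nodes, moves


# Hypothetical cluster: two empty workers were just added.
placement = {"w1": [1, 2, 3, 4, 5, 6], "w2": [], "w3": []}
new_placement, moves = rebalance(placement)
```

Returning the list of moves separately from the final placement mirrors how a rebalancer is typically used: the plan can be inspected before any data is actually copied.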

### Part of an Awesome List

- [Distributed Storage](https://awesome-repositories.com/f/awesome-lists/data/distributed-storage.md) — Extension for scaling PostgreSQL into a distributed database.
- [High Availability and Scaling](https://awesome-repositories.com/f/awesome-lists/devops/high-availability-and-scaling.md) — PostgreSQL extension for distributed queries and data sharding.
- [Extensions and Plugins](https://awesome-repositories.com/f/awesome-lists/devtools/extensions-and-plugins.md) — Scalable cluster extension for real-time workloads.

### DevOps & Infrastructure

- [Database Cluster Orchestration](https://awesome-repositories.com/f/devops-infrastructure/database-cluster-orchestration.md) — Monitors, rebalances, and maintains structural consistency across a cluster of database nodes to ensure high availability and optimal performance.

### Software Engineering & Architecture

- [Deadlock Detection Systems](https://awesome-repositories.com/f/software-engineering-architecture/distributed-transaction-management/deadlock-detection-systems.md) — Identifies and automatically aborts transactions involved in circular dependencies across multiple nodes to prevent stalls. ([source](https://docs.citusdata.com/en/stable/reference/common_errors.html))
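
Detecting a deadlock that spans nodes comes down to finding a cycle in a combined wait-for graph. The sketch below assumes each node reports which transactions wait on which; the graph shape and the `merge_graphs` and `find_cycle` helpers are hypothetical and shown only to illustrate the technique.

```python
def merge_graphs(*local_graphs):
    """Union per-node wait-for edges into one cluster-wide graph."""
    merged = {}
    for graph in local_graphs:
        for txn, blockers in graph.items():
            merged.setdefault(txn, set()).update(blockers)
    return merged


def find_cycle(wait_for):
    """Return a list of transactions forming a cycle, or None.

    `wait_for` maps a transaction id to the ids it is waiting on.
    """
    visiting, done = set(), set()

    def visit(txn, path):
        if txn in done:
            return None
        if txn in visiting:
            # Reached a transaction already on the current path: a cycle.
            return path[path.index(txn):]
        visiting.add(txn)
        path.append(txn)
        for blocker in wait_for.get(txn, ()):
            cycle = visit(blocker, path)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(txn)
        done.add(txn)
        return None

    for txn in wait_for:
        cycle = visit(txn, [])
        if cycle:
            return cycle
    return None


# Each node alone sees no cycle; together they form one.
node_a = {"t1": {"t2"}}
node_b = {"t2": {"t1"}}
cycle = find_cycle(merge_graphs(node_a, node_b))
```

Neither node can spot this deadlock from its local view, which is why the edges have to be merged before searching. Once a cycle is found, aborting any one transaction in it breaks the stall.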
