Why is clickhouse/clickhouse a recommended Data Warehousing repository?

Enables storage and analysis of large-scale datasets with high-performance query execution and optimized infrastructure costs.

Why is apache/doris a recommended Data Warehousing repository?

Handles thousands of simultaneous analytical queries per second for enterprise-scale workloads.

Why is databendlabs/databend a recommended Data Warehousing repository?

Implements a serverless data warehouse architecture that scales compute automatically and separates it from storage.

Why is redpanda-data/connect a recommended Data Warehousing repository?

Syncs streaming data to large-scale analytics warehouses and table catalogs for high-performance analytical queries.

Why is apache/pinot a recommended Data Warehousing repository?

Unifies real-time streaming and historical batch datasets into a single queryable interface for consistent business intelligence.

Why is apache/hive a recommended Data Warehousing repository?

Provides a SQL-on-Hadoop data warehouse for querying petabytes of distributed data.

Why is janusgraph/janusgraph a recommended Data Warehousing repository?

Runs full-graph processing jobs as MapReduce or Spark tasks on a Hadoop cluster for offline computation.

Why is apache/hbase a recommended Data Warehousing repository?

Implements a distributed NoSQL wide-column store built on top of the Hadoop ecosystem for sparse datasets.

Why is yalantis/side-menu.android a recommended Data Warehousing repository?

Provides a reusable sliding navigation drawer UI component for Android apps; it is not a data warehousing tool.

13 repositories


Platforms designed for large-scale data storage and high-performance analytical query execution.


Explore 13 GitHub repositories in the Data & Databases · Data Warehousing category.


ClickHouse/ClickHouse
48,229 · View on GitHub
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
Enables storage and analysis of large-scale datasets with high-performance query execution and optimized infrastructure costs.
C++ · ai, analytics, big-data
Vonng/ddia
22,648 · View on GitHub
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Serves as a reference on the design of large-scale data storage and analytical query systems.
Python · book, database, ddia
apache/doris
15,526 · View on GitHub
Doris is a distributed SQL data warehouse designed for high-performance analytical workloads and real-time data processing. It functions as a unified platform that integrates traditional relational warehousing with lakehouse query capabilities, allowing users to execute analytical operations directly against external data lakes without requiring data migration. The system distinguishes itself through a shared-nothing, massively parallel processing architecture that utilizes vectorized query execution and columnar storage to maintain sub-second latency. It supports dynamic schema evolution, en
Handles thousands of simultaneous analytical queries per second for enterprise-scale workloads.
Java · agent, ai, bigquery
databendlabs/databend
9,351 · View on GitHub
Databend is a cloud-native data warehouse and OLAP database designed for large-scale analytics. It functions as a SQL-compliant engine and serverless analytics platform that separates compute from storage to allow for independent scaling. The system integrates vector database capabilities, indexing high-dimensional embeddings to enable semantic, hybrid, and full-text searches across massive datasets. It further distinguishes itself through serverless compute management that automatically scales resources based on demand and shuts them down during idle periods. The platform covers a broad set
Implements a serverless data warehouse architecture that scales compute automatically and separates it from storage.
Rust · ai, bigdata, cloud-native
redpanda-data/connect
8,681 · View on GitHub
Connect is a Kafka data integration platform and stream processing engine used to build declarative pipelines that move and transform messages between Kafka topics and external sources. It functions as a Kafka Connect framework and a change data capture tool, streaming real-time database modifications to synchronize data across distributed environments. The project differentiates itself through a dedicated mapping language for mutating and reshaping message payloads and the ability to execute custom processing logic within a sandboxed WebAssembly runtime. It also provides an observability pip
Syncs streaming data to large-scale analytics warehouses and table catalogs for high-performance analytical queries.
Go · amqp, cqrs, data-engineering
apache/pinot
6,098 · View on GitHub
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Unifies real-time streaming and historical batch datasets into a single queryable interface for consistent business intelligence.
Java
apache/hive
6,012 · View on GitHub
Apache Hive is a SQL-on-Hadoop data warehouse that enables querying and managing petabytes of data stored in distributed storage such as HDFS and cloud storage services. It provides a familiar SQL interface for batch analytics and reporting, supported by a core set of components including the HiveServer2 Thrift service for remote query execution, the Hive Metastore Service for central metadata management, the Hive ACID Transaction Engine for concurrent read-write operations, and the Hive LLAP Interactive Engine for low-latency analytical processing. The WebHCat REST API offers an HTTP interfac
Provides a SQL-on-Hadoop data warehouse for querying petabytes of distributed data.
Java · apache, big-data, database
JanusGraph/janusgraph
5,799 · View on GitHub
JanusGraph is a distributed, elastically scalable graph database designed to store and query highly connected data across a cluster of machines. It supports the property graph data model with ACID consistency and integrates multi-model search capabilities including geo, numeric range, and full-text queries. The database also includes a Graph OLAP engine for running batch analytics and global graph computations on large datasets using the Hadoop framework. The project distinguishes itself through a masterless cluster architecture that eliminates single points of failure, allowing every node to
Runs full-graph processing jobs as MapReduce or Spark tasks on a Hadoop cluster for offline computation.
Java · bigtable, cassandra, elasticsearch
apache/hbase
5,540 · View on GitHub
HBase is a distributed NoSQL wide-column store and big data storage engine designed for sparse datasets. It functions as a scalable columnar database built on top of the Hadoop Distributed File System, providing real-time read and write access to very large volumes of structured and unstructured data. The system acts as a cross-language database gateway, offering connectivity through native remote procedure calls, REST, and Thrift interfaces. It features a master-worker coordination model that enables horizontal scaling and fault tolerance across the cluster. The project covers a broad set of capabilities, including fine-grained access control through cell-level visibility labels, pluggable data compression, and server-side data aggregation. It also supports big data analytics workflows through map-reduce integration and allows execution of custom server-side logic. Operational monitoring is provided through system metrics tracking and plugin-based metrics export.
Implements a distributed NoSQL wide-column store built on top of the Hadoop ecosystem for sparse datasets.
Java
Yalantis/Side-Menu.Android
5,212 · View on GitHub
Side-Menu.Android is a reusable UI component for Android applications that provides a sliding navigation drawer. It is designed to help developers organize application sections and user options in a structured, hidden panel that keeps the primary content area uncluttered. The component is distinguished by its visual presentation, which follows Material Design guidelines to ensure a consistent and intuitive user experience. It features a data-driven menu hierarchy that allows logical grouping of navigation items, and it incorporates smooth circular animations to provide polished visual transitions when the menu opens or closes. By encapsulating complex layout and interaction logic in a single modular class, the library simplifies implementing navigation across multiple screens. It supports event-driven transitions, allowing developers to decouple menu interactions from content updates and keep the application architecture clean and responsive.
Provides a reusable sliding navigation drawer UI component for Android apps; it is not a data warehousing tool.
Java · android, animation, drawer-layout
moabukar/tech-vault
3,351 · View on GitHub
tech-vault is a command-line technical interview bank and knowledge base designed for practicing engineering questions across various technical domains. It functions as a terminal-based application that stores structured study materials and interview questions as markdown files, which are then rendered directly within the system console. The project distinguishes itself through a delivery model that uses command-line argument parsing to filter content by topic or difficulty. It also includes a random selection algorithm to pick individual questions from the collection for spontaneous study se
Offers practice materials covering data modeling, schema design, and data warehousing concepts.
HCL
openaddresses/openaddresses
3,113 · View on GitHub
OpenAddresses is an open-source geospatial data aggregator and directory that collects public domain and open-license address, parcel, and building datasets from governments and organizations worldwide. It functions as a global index and data warehouse for locating and distributing free geospatial records. The project operates a normalization pipeline that cleans and standardizes diverse source formats into a consistent global coordinate and attribute schema. This process includes a crowdsourced curation pipeline and programmatic quality validation to verify the spatial accuracy and formattin
Utilizes large-scale data storage to handle the global distribution of massive geospatial records.
JavaScript · addresses, geocoding, hacktoberfest
cve-search/cve-search
2,593 · View on GitHub
cve-search is a vulnerability search engine and database manager designed to index, synchronize, and query CVE and CPE security vulnerability data. It functions as a security data warehouse that imports vulnerability feeds into a local database to enable fast, keyword-based discovery of security flaws. The project provides a web-based vulnerability browser and a programmatic JSON API for retrieving records and risk scores. It utilizes full-text indexing for vulnerability descriptions and implements an identity-verified security portal using the OpenID Connect standard for user authentication.
Functions as a security data warehouse by importing and indexing large sets of vulnerability information.
Python · common-vulnerabilities, cpe, cve

