HBase

HBase - store massive structured data | Awesome Repos

Features

Columnar Databases - Implements a distributed NoSQL wide-column store built on top of the Hadoop ecosystem for sparse datasets.
Big Data Storage - Functions as a distributed engine for storing and querying massive volumes of structured and unstructured data.
Column Family Management - Organizes sparse data into grouped column families for efficient distributed storage and retrieval.
Hadoop - Integrates with the Hadoop Distributed File System to provide a wide-column store for large-scale data analysis.
Distributed File System Backends - Relies on the Hadoop Distributed File System for durable, replicated persistent storage of data files.
Sparse Dataset Management - Provides scalable storage and versioning for massive, sparse, column-oriented datasets across a cluster.
LSM-Tree Storage Engines - Buffers writes in memory and flushes them to disk as sorted files, giving high write throughput.
Region-Based Partitioning - Splits the sorted keyspace into contiguous ranges (regions) that are spread across servers for horizontal scaling.
Wide-Column Stores - Organizes data into column families to provide real-time read and write access to high-scale datasets.
Column-Oriented Disk Storage - Stores sparse datasets on disk grouped by column family for scalable, versioned data management.
Distributed Data Stores - Provides a cluster-based storage system with horizontal scaling and fault tolerance for scalable data retrieval.
Cell-Level Controls - Enforces fine-grained access control using visibility labels at the individual data cell level.
Master-Worker Coordination - Uses a master node to manage cluster metadata and assign regions to worker region servers.
Distributed Storage Clusters - Implements a scalable architecture that aggregates multiple nodes into a unified storage system for massive datasets.
Distributed File Systems - Relies on a distributed file system like HDFS for durable and replicated storage of underlying data files.
Big Data Processing - Supports big data processing workflows using map-reduce patterns for large-scale data transformation.
Cross-Language Data Interfaces - Provides consistent data interaction interfaces via native RPC, REST, and Thrift APIs for clients in multiple programming languages.
MapReduce Processing Engines - Integrates with MapReduce processing engines to transform and migrate large volumes of data between tables.
Server-Side Aggregations - Calculates summaries and statistics directly on the server to minimize data transfer to the client.
Application REST API Gateways - Exposes database operations and cluster status through a standardized REST API gateway.
Thrift RPC Servers - Ships a dedicated Thrift server to enable cross-language connectivity for database operations.
Cross-Language Service Gateways - Acts as an entry point that translates REST, Thrift, and RPC requests into internal database protocols.
Storage Block Compression - Applies pluggable block compression to reduce the physical storage footprint of datasets on disk.
Multi-Protocol Communication Bridges - Provides a multi-protocol gateway allowing clients to connect via RPC, HTTP, and Thrift.
Remote Procedure Calls - Uses remote procedure calls for low-latency communication between clients, master nodes, and region servers.
Remote Procedure Call Protocols - Implements structured messaging protocols for standardized communication between cluster nodes and clients.
Database Systems - Distributed big data store modeled after Bigtable.
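
The data model behind the column-family, sparse-dataset, and versioning features above can be sketched in a few lines. This is a toy in-memory model, not HBase's client API: the class name ToyTable, its methods, and the max_versions parameter are all hypothetical names chosen for illustration. It only shows that a row stores just the cells it has, that cells are addressed by (column family, qualifier), and that each cell keeps several timestamped versions.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a sparse, versioned wide-column table (hypothetical, not HBase's API)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)      # column families are fixed up front
        self.max_versions = max_versions   # how many versions each cell retains
        # row key -> (family, qualifier) -> [(timestamp, value), ...], newest first
        self.rows = defaultdict(dict)

    def put(self, row, family, qualifier, value, ts):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows[row].setdefault((family, qualifier), [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]   # drop the oldest versions beyond the limit

    def get(self, row, family, qualifier, ts=None):
        """Return the newest value at or before ts, or the newest overall if ts is None."""
        cells = self.rows.get(row, {})
        for cell_ts, value in cells.get((family, qualifier), []):
            if ts is None or cell_ts <= ts:
                return value
        return None                        # absent cells cost no storage


table = ToyTable(["info", "metrics"], max_versions=2)
table.put("user#1", "info", "name", "Ada", ts=1)
table.put("user#1", "info", "name", "Ada L.", ts=2)
table.put("user#2", "metrics", "logins", 7, ts=1)
```

Here table.get("user#1", "info", "name") returns the newest version, while passing ts=1 reads the older one. "user#2" has no "info:name" cell at all, which is what "sparse" means in this context.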
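
The LSM-tree entry in the feature list (in-memory buffering plus sorted flushes) can be illustrated with a minimal sketch. The names ToyLSM and flush_threshold are assumptions made for this example, and the model leaves out write-ahead logging, compaction, and on-disk files; it only shows the general write and read path of an LSM-style store, not HBase's actual implementation.

```python
import bisect


class ToyLSM:
    """Toy LSM-style store: writes go to a memory buffer, which is flushed as sorted runs."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.memstore = {}     # mutable in-memory write buffer
        self.segments = []     # immutable sorted runs of (key, value), oldest first

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffer out as one sorted, immutable run and start a fresh buffer."""
        if self.memstore:
            self.segments.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        # The newest data wins: check the buffer first, then runs from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for segment in reversed(self.segments):
            i = bisect.bisect_left(segment, (key,))   # binary search within a sorted run
            if i < len(segment) and segment[i][0] == key:
                return segment[i][1]
        return None


store = ToyLSM(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)   # reaches the threshold, so the buffer is flushed as a sorted run
store.put("a", 3)   # a newer value for "a" now lives in the buffer
```

Writes never modify an existing run, which is why this design favors write throughput; reads pay for it by possibly consulting several runs.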
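
Region-based partitioning, as described in the feature list, amounts to a lookup over sorted split points. The sketch below assumes a hypothetical ToyRegionMap class, made-up server names, and made-up split keys; it shows only how a row key maps to one contiguous range, with each range including its start key and excluding its end key.

```python
import bisect


class ToyRegionMap:
    """Toy lookup from row key to the server holding its key range (hypothetical names)."""

    def __init__(self, split_keys, servers):
        # N split keys cut the sorted keyspace into N + 1 contiguous ranges.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per range")
        self.split_keys = sorted(split_keys)
        self.servers = list(servers)

    def locate(self, row_key):
        # bisect_right places a key equal to a split point in the range that starts there.
        return self.servers[bisect.bisect_right(self.split_keys, row_key)]


regions = ToyRegionMap(split_keys=["g", "p"], servers=["rs1", "rs2", "rs3"])
first = regions.locate("apple")    # falls before "g"
middle = regions.locate("g")       # a split key starts the next range
last = regions.locate("zebra")     # falls at or after "p"
```

Because ranges are contiguous and sorted, scaling out is a matter of adding a split key and assigning the new range to another server.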

Open-source alternatives to HBase

Similar open-source projects, ranked by how many features they share with HBase.

apache/hadoop
15,567 stars on GitHub
Hadoop is a big data infrastructure suite and distributed data processing framework designed to store and process massive datasets across clusters of computers. It consists of a distributed storage system for managing large files across multiple nodes and a parallel computing engine for processing data across a distributed cluster. The framework implements a distributed file system to ensure fault tolerance and high throughput, paired with a programming model that processes large datasets in parallel. It manages the underlying hardware and software environment required for distributed big data...
Java
apache/hive
6,012 stars on GitHub
Apache Hive is a SQL-on-Hadoop data warehouse that enables querying and managing petabytes of data stored in distributed storage such as HDFS and cloud storage services. It provides a familiar SQL interface for batch analytics and reporting, supported by a core set of components including the HiveServer2 Thrift service for remote query execution, the Hive Metastore Service for central metadata management, the Hive ACID Transaction Engine for concurrent read-write operations, and the Hive LLAP Interactive Engine for low-latency analytical processing. The WebHCat REST API offers an HTTP interface...
Java
apache, big-data, database
deepseek-ai/3FS
9,970 stars on GitHub
3FS is a distributed file system and RDMA storage cluster designed for high-performance AI training and inference workloads. It functions as a strongly consistent storage layer that utilizes a disaggregated architecture to pool SSDs and memory resources across multiple nodes. The system provides specialized storage implementations including an AI training checkpoint store for parallel state preservation and a distributed key-value cache store for decoder layer vectors to optimize inference processing. It ensures data integrity through chain replication and apportioned query distribution.
C++
gluster/glusterfs
5,191 stars on GitHub
GlusterFS is a software-defined distributed file system and scale-out storage cluster that aggregates disk resources from multiple servers into a single global namespace. It functions as a unified storage platform, allowing the same underlying data to be exposed through file, block, and object storage interfaces. The system distinguishes itself through a decentralized architecture that uses consistent hashing to distribute files across network nodes without a central metadata server. It ensures data integrity and availability using self-healing replication, quorum-based consistency to prevent...
C

See all 30 alternatives to HBase
