1 repository
Performs partitioning, sorting, and resizing on raw input files using distributed jobs prior to segment creation.
Distinct from Data Ingestion Optimization: focuses on distributed, job-based preprocessing rather than block-level ingestion tuning.
Explore 1 awesome GitHub repository matching data & databases · Distributed Preprocessing.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Performs partitioning, sorting, and resizing on raw input files using distributed jobs to optimize data layout prior to segment creation.
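The three preprocessing steps named above (partitioning, sorting, resizing) can be illustrated with a minimal single-process sketch. This is not Pinot's implementation or API: the function name `preprocess`, its parameters, the sample columns `user` and `ts`, and the use of CRC32 as the partition function are all assumptions made for illustration. A real deployment would run the same logic as a distributed job over files rather than over an in-memory list.

```python
import zlib
from collections import defaultdict


def preprocess(records, partition_col, sort_col, num_partitions, max_rows_per_file):
    """Hypothetical sketch: partition rows, sort each partition, split into files.

    Returns a dict mapping partition id -> list of row chunks, where each
    chunk stands in for one resized output file.
    """
    # Step 1: partition. CRC32 is a deterministic stand-in for whatever
    # partition function the real job would be configured with.
    partitions = defaultdict(list)
    for row in records:
        key_bytes = str(row[partition_col]).encode("utf-8")
        partition_id = zlib.crc32(key_bytes) % num_partitions
        partitions[partition_id].append(row)

    layout = {}
    for partition_id, rows in partitions.items():
        # Step 2: sort rows inside each partition on the chosen column.
        rows.sort(key=lambda r: r[sort_col])
        # Step 3: resize. Cut the sorted rows into chunks of bounded size.
        layout[partition_id] = [
            rows[i:i + max_rows_per_file]
            for i in range(0, len(rows), max_rows_per_file)
        ]
    return layout


# Example input: raw rows before segment creation (made-up data).
rows = [
    {"user": "a", "ts": 5},
    {"user": "b", "ts": 3},
    {"user": "a", "ts": 1},
    {"user": "c", "ts": 4},
    {"user": "b", "ts": 2},
    {"user": "a", "ts": 9},
]
layout = preprocess(rows, "user", "ts", num_partitions=2, max_rows_per_file=2)
```

Each chunk in `layout` holds rows from a single partition, in sorted order, and never exceeds `max_rows_per_file` rows, which is the kind of data layout a downstream segment builder can exploit.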