3 repositories
Integration layers that leverage high-performance engines for efficient data manipulation and sorting.
Distinguishing note: Focuses on the integration of external high-performance engines for specific data operations rather than general database management.
Explore 3 awesome GitHub repositories matching data & databases · Data Processing Engines.
ClickHouse is a high-performance, columnar analytical database designed for real-time query execution and large-scale data aggregation. It functions as a distributed data warehouse capable of processing petabytes of information, while also providing an embedded engine that integrates directly into applications for native query capabilities without external dependencies. The system is built to handle high-throughput ingestion and complex analytical workloads, delivering millisecond-level latency for interactive dashboards and operational monitoring. The platform distinguishes itself through ad
Processes diverse data formats including Parquet, CSV, JSON, and Arrow to ensure broad interoperability across external sources.
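As a rough illustration of what multi-format interoperability means in practice, the sketch below parses CSV and JSON inputs into one common row shape and aggregates across both. It is not ClickHouse code: `read_rows`, the sample data, and the field names are hypothetical, and it uses only the Python standard library. Parquet and Arrow are left out because they need third-party libraries.

```python
import csv
import io
import json


def read_rows(text, fmt):
    """Parse text in the given format into a list of dict rows (hypothetical helper)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "json":
        return json.loads(text)
    raise ValueError(f"unsupported format: {fmt}")


# Two sources describing the same kind of record in different formats.
csv_text = "region,amount\neu,10\nus,5\n"
json_text = '[{"region": "eu", "amount": 7}]'

rows = read_rows(csv_text, "csv") + read_rows(json_text, "json")

# CSV yields strings and JSON yields numbers, so normalize before aggregating.
totals = {}
for row in rows:
    totals[row["region"]] = totals.get(row["region"], 0) + int(row["amount"])

print(totals)
```

Once every source is reduced to the same row shape, the aggregation no longer needs to know which format a row came from.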
Ray is a distributed computing framework designed to scale Python and Java applications across clusters by abstracting task scheduling and resource management. It functions as a resource-aware execution engine that manages task dependencies, placement, and fault tolerance across networked compute nodes. At its core, the system provides a stateful actor model, allowing developers to define classes that run in dedicated processes to maintain and mutate internal state across remote method calls. The framework distinguishes itself through a robust cross-language interoperability layer, enabling f
Utilizes high-performance engines for internal sorting operations to improve performance on large tabular datasets.
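The stateful actor model described above can be sketched with the standard library alone. This is not Ray's API or implementation: the `Actor` and `Counter` classes are hypothetical, and the sketch uses one worker thread where Ray uses a dedicated process. It only shows the core idea, that one worker owns the state and callers change it by sending method calls.

```python
import queue
import threading
from concurrent.futures import Future


class Actor:
    """Owns one object and runs every method call on a single worker thread."""

    def __init__(self, target):
        self._target = target
        self._mailbox = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        # Messages are handled one at a time, so the state needs no locks.
        while True:
            message = self._mailbox.get()
            if message is None:  # shutdown sentinel
                break
            method, args, future = message
            try:
                future.set_result(getattr(self._target, method)(*args))
            except Exception as exc:
                future.set_exception(exc)

    def call(self, method, *args):
        """Enqueue a method call and return a Future for its result."""
        future = Future()
        self._mailbox.put((method, args, future))
        return future

    def stop(self):
        self._mailbox.put(None)
        self._worker.join()


class Counter:
    """Plain class whose state lives inside the actor."""

    def __init__(self):
        self.value = 0

    def increment(self, amount=1):
        self.value += amount
        return self.value


counter = Actor(Counter())
futures = [counter.call("increment", 2) for _ in range(3)]
results = [f.result() for f in futures]
counter.stop()
print(results)  # [2, 4, 6]
```

Each call returns immediately with a future, and the state persists between calls because the same `Counter` instance handles all of them in order.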
Enso is a visual dataflow programming environment and multi-language data processing engine that compiles Enso, Python, Java, and JavaScript into a unified representation with a shared memory model for zero-overhead inter-language calls. It functions as a self-service data preparation and analysis platform where users can build data pipelines by connecting nodes in a graph, switching between a no-code visual interface and a code view while keeping all changes reviewable. The platform also serves as a cloud data workflow scheduler and API exposer, allowing workflows to run on a timetable or be
Compiles Enso, Python, Java, and JavaScript into a unified representation for zero-overhead inter-language calls.