30 open-source projects similar to apache/calcite, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Calcite alternative.
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
AlaSQL is a JavaScript SQL database engine that allows for the filtering, grouping, and joining of in-memory object arrays and JSON data. It functions as an in-memory SQL database and client-side data processor, enabling the execution of SQL statements against JavaScript arrays and external data sources in both browser and server environments. The project serves as a universal data query tool capable of performing relational joins across diverse sources, such as merging Google Spreadsheets, SQLite files, and remote APIs into a single result set. It also acts as an IndexedDB SQL wrapper, allow
Apache DataFusion is an extensible, columnar SQL query engine that runs embedded within a host application without requiring a separate server process. It processes data in columnar batches using Apache Arrow for memory-efficient analytics, and can scale analytic workloads across multiple nodes for parallel execution. The engine supports both SQL and DataFrame queries through a modular, streaming architecture that allows custom operators, data sources, functions, and optimizer rules. The engine distinguishes itself through its modular extension framework, which enables building custom query e
Apache Hive is a SQL-on-Hadoop data warehouse that enables querying and managing petabytes of data stored in distributed storage such as HDFS and cloud storage services. It provides a familiar SQL interface for batch analytics and reporting, supported by a core set of components including the HiveServer2 Thrift service for remote query execution, the Hive Metastore Service for central metadata management, the Hive ACID Transaction Engine for concurrent read-write operations, and the Hive LLAP Interactive Engine for low-latency analytical processing. The WebHCat REST API offers an HTTP interfac
Trino is a distributed SQL query engine designed for large-scale data analytics. It functions as a data federation platform, providing a unified interface that allows users to execute complex analytical queries across multiple heterogeneous data sources simultaneously without requiring data movement or transformation. The engine utilizes a massively parallel processing architecture to scale compute resources across clusters for high-speed data retrieval. It distinguishes itself through a cost-based query optimizer that analyzes metadata to determine efficient execution plans, alongside dynami
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
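As a rough illustration of the dependency ordering described in the dbt-core entry, the sketch below orders a handful of models so that each runs after the models it depends on. It is not dbt's implementation: the model names and the `deps` mapping are hypothetical, and Python's standard `graphlib` module stands in for whatever scheduler dbt-core actually uses.

```python
from graphlib import TopologicalSorter

# Hypothetical models: each key depends on the models in its set.
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "revenue_report": {"orders_enriched"},
}


def execution_order(graph):
    """Return one valid run order in which every model follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(deps)
```

A cyclic dependency makes `static_order()` raise `graphlib.CycleError`, which is why the graph has to be acyclic for an order to exist.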
Gravitino is a federated metadata lake and unified data catalog designed to manage tables, files, and AI models across diverse data sources and cloud storage. It serves as a centralized interface for governing schemas, access controls, and tagging across relational databases, messaging queues, and object stores. The project distinguishes itself by unifying the management of AI assets, such as machine learning models and their version lineages, alongside traditional tabular data. It also implements the Iceberg REST specification to provide a standardized metadata server and proxy for lakehouse
StarRocks is a distributed SQL OLAP database engine designed for real-time analytics and high-performance multi-dimensional analysis. It functions as a data lakehouse query engine that enables SQL execution across large datasets and external open table formats without requiring local data imports. The system employs a shared-nothing distributed architecture and utilizes the MySQL protocol to integrate with business intelligence tools. It maintains real-time data consistency through a primary key upsert model and accelerates query response times using vectorized execution and cost-based optimi
This project is an educational resource and technical manual for Apache Spark, focused on the architecture and practical application of large-scale data processing. It serves as a guide for big data engineering and distributed computing, covering the principles of parallel processing and fault-tolerant data distribution. The material provides instructional content on designing distributed ETL pipelines and implementing data analysis workflows. It includes tutorials for polyglot data processing, offering patterns and examples for using Python, Scala, and Java within a unified environment. The
Soar is a suite of specialized tools designed for analyzing MySQL performance, advising on indexing, and optimizing SQL syntax. It functions as a performance analyzer, index advisor, and query optimizer to identify bottlenecks and suggest structural improvements for faster execution. The project distinguishes itself through a system for rewriting SQL statements into optimized equivalent versions using custom heuristic rules and patterns. It also features a dedicated index advisor that evaluates query patterns and database metadata to recommend the creation of new indexes. Its broader capabil
Drift is a type-safe SQL persistence library and relational mapper that provides a structured way to map database tables to classes and execute SQL queries with build-time validation. It functions as a type-safe query builder and a wrapper for SQLite and PostgreSQL, eliminating manual result set parsing by binding query outputs to native objects. The project distinguishes itself through a build-time code generation system that produces type-safe APIs and validates raw SQL statements against database versions before execution. It features reactive query streaming, which transforms SQL queries
Hazelcast is a distributed data platform that combines an in-memory data grid with a stream processing engine to support real-time analytics and event-driven applications. It functions as a partitioned, distributed key-value store that replicates data across cluster nodes to provide low-latency access and high availability. The platform also serves as a distributed SQL query engine, allowing users to execute standard SQL statements against both in-memory datasets and external data sources. What distinguishes Hazelcast is its use of a distributed consensus subsystem to maintain strongly consis
SQLDelight is a Kotlin database library that validates SQL schema, statements, and migrations at compile time, generating type-safe Kotlin query functions from labeled SQL files. It treats SQL as the source of truth for database definitions, catching schema errors during the build process before they reach production. The library supports multiple database dialects including SQLite, MySQL, PostgreSQL, HSQL, and H2, and generates platform-specific code for Android, iOS, JVM, and JavaScript targets. It provides a platform-specific driver abstraction that handles database connectivity difference
sqlc is a SQL compiler and code generator that creates type-safe database client code from raw SQL queries. It transforms SQL statements into typed definitions and functions, eliminating the need for manual row mapping between database results and application structures. The tool ensures compile-time safety by validating SQL queries against the database schema before the application is run. This workflow integrates the database schema directly into the application code, deriving types from the underlying SQL definitions to prevent runtime errors. The system utilizes AST-based query analysis
Apache Spark is a unified distributed data processing engine designed for large-scale data analysis and computation graphs. It functions as a distributed machine learning framework, a graph processing system, a real-time stream processor, and a SQL analytics engine. The system runs distributed SQL queries, large-scale graph analysis, and real-time stream analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and developing predictive models on massive datasets. The engine incorporates relational query e
Records is a SQL database client designed for executing raw queries and managing result sets through a simplified interface. It provides a parameterized SQL executor to bind values to placeholders, ensuring safe data handling and preventing injection attacks, alongside a database transaction manager for grouping operations into atomic units. The project includes a dedicated command-line interface for running database statements and exporting query results directly to local files. This tooling allows for the conversion of SQL result sets into multiple serialization formats, including CSV, JSON
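The ideas in the Records entry (placeholder binding, atomic transactions, and exporting a result set) can be sketched with the Python standard library alone. This does not use the Records API; the `users` table and the `query_to_csv` helper are hypothetical, and `sqlite3` with an in-memory database is assumed purely for illustration.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Group the inserts into one atomic unit: the connection context manager
# commits on success and rolls back if an exception escapes the block.
with conn:
    conn.executemany(
        "INSERT INTO users (name) VALUES (:name)",
        [{"name": "ada"}, {"name": "o'brien; DROP TABLE users"}],
    )


def query_to_csv(connection, sql, params):
    """Run a parameterized query and serialize the rows as CSV text."""
    cursor = connection.execute(sql, params)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # header row
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


csv_text = query_to_csv(conn, "SELECT id, name FROM users WHERE id = :id", {"id": 1})
```

Because the second name is bound as a value and never spliced into the SQL text, it is stored as an ordinary string and the `users` table is left intact.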
JimuReport is an open-source reporting and dashboard engine designed to be embedded directly into Spring Boot applications. Its core identity centers on generating data reports and full-screen dashboards from natural language descriptions, eliminating the need for manual design. The platform also provides a conversational query interface that translates plain-language questions into database queries, returning results as tables and charts without requiring SQL knowledge. What distinguishes JimuReport is its integration of AI skills that can be installed with a single command, enabling report
node-sqlite3 is a relational database client and a set of native bindings that allow Node.js applications to interact with SQLite databases. It functions as a C++ native addon, linking JavaScript to the SQLite C library to manage data stored in local files or in-memory stores. The project includes optional support for SQLCipher, enabling page-level encryption to secure local database files. The driver covers a wide range of database management capabilities, including executing SQL queries with parameter binding, managing connections to database files, and preparing statements for repeated ex
Chartbrew is a self-hosted business intelligence platform and data visualization engine designed to transform raw data from SQL databases and external API endpoints into interactive charts and dashboards. It serves as a tool for building analytics dashboards that monitor business metrics and KPIs through a privately hosted environment. The platform distinguishes itself with an embedded analytics workflow, allowing users to generate secure, time-limited shared links and iframes to display private charts on external websites. It also provides programmatic chart generation via API and integrates
Sqlectron-gui is a cross-platform database manager and SQL client. It provides a graphical interface for organizing server connections and executing SQL commands across various operating systems. The tool functions as a multi-database query environment, allowing users to connect to and interact with diverse relational database systems from a single interface. It also supports database server administration by saving and organizing connection details for multiple environments.
sqlglot is a SQL parser and transpiler that represents queries as abstract syntax trees to enable structural analysis, modification, and semantic transformation. It functions as a dialect translator and query optimizer, converting SQL code between different database engines and simplifying syntax trees through rule-based normalization. The project provides a framework for defining custom SQL dialects by overriding tokenizers, parsers, and generators. It includes a lineage analyzer to track data flow from source tables through complex queries to identify the origin of specific columns. Additi
Octosql is a federated SQL query engine, data transformer, and streaming SQL processor. It allows users to execute single SQL statements across multiple disparate data sources, including different database types and file formats, to merge and transform results into a unified set. The system distinguishes itself by treating CSV, JSONLines, and Parquet files as virtual tables and utilizing a plugin-based architecture to extend connectivity to external storage engines. It functions as a streaming processor for infinite data streams, using watermarks, retractions, and tumbling windows to maintain
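The tumbling windows mentioned in the Octosql entry are fixed-size, non-overlapping time buckets. A minimal sketch of the bucketing step follows. It assumes events arrive as hypothetical `(timestamp_seconds, payload)` tuples, and it leaves out watermarks and retractions entirely, so it shows only the windowing arithmetic, not Octosql's streaming model.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window, keyed by window start time."""
    counts = Counter()
    for timestamp, _payload in events:
        # Integer division snaps each timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


events = [(0, "a"), (4, "b"), (5, "c"), (9, "d"), (12, "e")]
result = tumbling_window_counts(events, 5)
# result == {0: 2, 5: 2, 10: 1}
```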
Doris is a distributed SQL data warehouse designed for high-performance analytical workloads and real-time data processing. It functions as a unified platform that integrates traditional relational warehousing with lakehouse query capabilities, allowing users to execute analytical operations directly against external data lakes without requiring data migration. The system distinguishes itself through a shared-nothing, massively parallel processing architecture that utilizes vectorized query execution and columnar storage to maintain sub-second latency. It supports dynamic schema evolution, en
Franchise is a database query tool and notebook SQL client that allows users to run queries and analyze datasets. It functions as a local data processor with a browser-based engine for executing SQL commands against CSV, JSON, and XLSX files without uploading data to a remote server. The project uses a cell-based interface to organize queries and results in an interactive, document-like layout. It supports a workflow where users can fork queries into side-by-side layouts to compare different SQL variations and their results without overwriting existing code. The system provides a unified int
koa2-note is a project focused on Koa2 web server development and Node.js asynchronous programming. It provides a framework for building web servers and APIs using an asynchronous middleware pipeline to handle request and response cycles. The project emphasizes a layered backend architecture that decouples routing, business services, and data models. It distinguishes itself through the integration of relational databases for persisting user sessions and application data, alongside a build process that includes JSX-to-JavaScript compilation for frontend assets. The capability surface covers b
CloudBeaver is a web-based database manager and cloud database IDE that provides a centralized browser interface for administering SQL and NoSQL databases. It functions as a multi-database administration tool and an RBAC database access gateway, allowing users to manage diverse relational and document-based database engines through a single server-based installation. The platform distinguishes itself by integrating an artificial intelligence assistant for natural language SQL generation and optimization. It further supports collaborative data engineering by coordinating database operations ac
Briefer is an interactive data notebook platform and business intelligence dashboard tool used for collaborative data analysis and reporting. It provides a containerized environment for building reports that combine SQL, Python, and Markdown with native visualizations. The platform features an integrated code assistant that uses large language models to generate SQL and Python snippets from natural language prompts. It is designed as a Kubernetes data application, deploying via Helm charts to manage isolated compute environments and ensure separate resources per page through pod-based isolati
Fava is a web-based dashboard and query tool for visualizing and analyzing financial records stored in Beancount plain-text ledger files. It serves as a double-entry bookkeeping viewer and plain-text accounting dashboard that renders ledger files as interactive reports, searchable financial tables, and visual tools for exploring balance sheets and income statements. The project distinguishes itself through a specialized BQL query interface that executes SQL-like queries against postings to extract specific financial data and trends. It includes a financial data visualization system for genera
LazySQL is a terminal user interface database manager and SQL client. It functions as a query runner and connection manager for interacting with SQL databases from the command line. The project features a read-only connection mode that blocks mutation commands to prevent accidental data loss. It supports automated pre-connection tasks, including the execution of shell commands and the establishment of SSH tunnels, and allows for both global and project-specific configuration. The interface provides a tree-based schema browser for navigating tables, a dedicated SQL query editor with tabular r
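One simple way to approximate the read-only mode in the LazySQL entry is to inspect a statement's leading keyword before running it. The sketch below is an assumption about how such a guard could look, not LazySQL's actual logic; the keyword set and both function names are hypothetical.

```python
# Hypothetical set of leading keywords treated as mutations.
MUTATING_KEYWORDS = {
    "INSERT", "UPDATE", "DELETE", "DROP",
    "ALTER", "CREATE", "TRUNCATE", "REPLACE",
}


def is_mutation(sql: str) -> bool:
    """Return True when the statement's first keyword is a mutating one."""
    tokens = sql.strip().split(None, 1)
    return bool(tokens) and tokens[0].upper() in MUTATING_KEYWORDS


def run_read_only(sql: str, execute):
    """Pass the statement to `execute` unless it would modify data."""
    if is_mutation(sql):
        raise PermissionError("read-only connection: statement blocked")
    return execute(sql)
```

A first-keyword check misses statements that begin with a comment or a `WITH` clause ahead of a write, so a production guard would need real SQL parsing or a database-level read-only connection.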