19 repositories
Functions for calculating scalar values like sums, counts, and averages over datasets.
Explore 19 GitHub repositories matching Data & Databases · SQL Aggregate Functions.
SQLite.swift is a type-safe Swift wrapper and object-relational mapping layer that provides a bridge for interacting with SQLite databases. It functions as a database driver that allows for embedded database management and local data persistence within Swift applications. The project distinguishes itself through a type-safe expression builder that verifies SQL statement syntax and intent at compile time. It includes specialized support for high-performance text matching via full-text search integration and provides mechanisms for securing sensitive data through database encryption. The libra
Calculates scalar values such as counts, sums, and averages across filtered datasets.
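To make "counts, sums, and averages across filtered datasets" concrete, here is a minimal sketch using Python's standard sqlite3 module. It does not use SQLite.swift's API; the `orders` table, its columns, and the `summarize_region` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def summarize_region(conn, region):
    """Return (count, total, average) of order amounts for one region.

    The table and column names are assumptions made for this sketch.
    """
    return conn.execute(
        "SELECT COUNT(*), SUM(amount), AVG(amount) "
        "FROM orders WHERE region = ?",
        (region,),
    ).fetchone()


# Build a small in-memory dataset to aggregate over.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 10.0), ("EU", 30.0), ("US", 5.0)],
)

eu_summary = summarize_region(conn, "EU")
empty_summary = summarize_region(conn, "APAC")
```

When the filter matches no rows, COUNT(*) yields 0 while SUM and AVG yield NULL, which Python surfaces as None.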
TextQL is a command line SQL query engine designed to execute relational queries directly against structured text files, such as CSV and TSV, without requiring a database import. It functions as a relational text file analyzer and a CSV processor that treats plain text files as virtual tables for filtering, joining, and aggregating data. The tool is built as a pipe-compatible data transformation utility, allowing it to process data from standard input and output formatted datasets. It enables relational joins across multiple files or directories within a single query to analyze relationships
Provides the ability to extend the query language with custom mathematical, string, and aggregate operations via shared libraries.
This project is a Go language driver for the SQLite database. It provides a relational database interface and a Cgo wrapper that connects Go applications to SQLite for persistent local data storage and query execution. The implementation serves as a provider for JSON document storage and local full-text search. It enables the creation, querying, and modification of JSON data and the implementation of searchable indexes for large text datasets directly within the database. The driver supports standard SQL query execution for both file-based and in-memory storage. It includes capabilities for
Allows the registration of Go functions as custom SQL scalar or aggregate functions via C-to-Go callbacks.
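The general pattern of registering host-language code as an SQL aggregate can be sketched with Python's standard sqlite3 module, which exposes SQLite's aggregate hook through `Connection.create_aggregate`. This is an analogy only and is not the Go driver's API; the `value_range` function, the `ValueRange` class, and the `readings` table are hypothetical.

```python
import sqlite3


class ValueRange:
    """Custom aggregate: max minus min of the non-NULL values seen."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def step(self, value):
        # Called once per row in the group; skip NULLs like built-in aggregates.
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        # Called once per group to produce the aggregate's result.
        if self.lo is None:
            return None
        return self.hi - self.lo


conn = sqlite3.connect(":memory:")
# Register the class under a hypothetical SQL name taking one argument.
conn.create_aggregate("value_range", 1, ValueRange)

conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.5), ("a", 4.0), ("a", None), ("b", 7.0)],
)

ranges = dict(
    conn.execute(
        "SELECT sensor, value_range(value) FROM readings GROUP BY sensor"
    )
)
```

SQLite creates one `ValueRange` instance per group, so state kept on `self` never leaks between groups.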
GRDB.swift is a comprehensive SQLite toolkit and object-relational mapper for Swift. It provides a database wrapper that handles local data persistence, connection management, and encrypted file storage for Apple platforms. The library features a dedicated observation framework that tracks database changes to automatically synchronize the application state and user interface in real time. It distinguishes itself with a type-safe query builder and a protocol-based mapping system that converts database rows into structured Swift objects. The toolkit covers a broad range of administrative and o
Supports registering custom Swift logic as SQL functions to extend the database's query capabilities.
AlaSQL is a JavaScript SQL database engine that allows for the filtering, grouping, and joining of in-memory object arrays and JSON data. It functions as an in-memory SQL database and client-side data processor, enabling the execution of SQL statements against JavaScript arrays and external data sources in both browser and server environments. The project serves as a universal data query tool capable of performing relational joins across diverse sources, such as merging Google Spreadsheets, SQLite files, and remote APIs into a single result set. It also acts as an IndexedDB SQL wrapper, allow
Enables the definition of custom scalar functions via JavaScript to perform specialized calculations in queries.
ToyDB is a distributed SQL database that provides a system for storing and querying data across multiple nodes. It focuses on maintaining strong consistency and fault tolerance through the implementation of a distributed consensus algorithm. The project distinguishes itself by supporting historical data versioning, enabling time-travel queries to retrieve the state of the database from a specific point in the past. It utilizes multi-version concurrency control to manage ACID transactions and ensure data integrity during concurrent operations. The system covers relational data modeling with t
Provides standard SQL aggregate functions for calculating sums, counts, and averages over datasets.
SQLiteStudio is an open-source graphical tool for browsing, editing, and managing SQLite database files. It combines a full-featured SQL editor with syntax highlighting, a visual database schema designer for creating entity-relationship diagrams, and a plugin-based extensibility platform that allows adding custom functionality through C/C++, JavaScript, Tcl, or Python. The application distinguishes itself through its multi-language scripting engine, which embeds JavaScript, Tcl, and Python interpreters to enable user-defined functions and scripts within SQL queries. It supports encrypted data
Adds user-defined functions written in C/C++ that can be called from SQL queries.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Allows configuration of aggregation functions to compute results directly within the index.
Apache Hive is a SQL-on-Hadoop data warehouse that enables querying and managing petabytes of data stored in distributed storage such as HDFS and cloud storage services. It provides a familiar SQL interface for batch analytics and reporting, supported by a core set of components including the HiveServer2 Thrift service for remote query execution, the Hive Metastore Service for central metadata management, the Hive ACID Transaction Engine for concurrent read-write operations, and the Hive LLAP Interactive Engine for low-latency analytical processing. The WebHCat REST API offers an HTTP interfac
Adds user-defined functions, aggregates, and table functions to SQL for custom data processing.
GreptimeDB is a distributed, open-source time-series database built for unified observability. It stores and queries metrics, logs, and traces together in a single columnar engine, supporting both SQL and PromQL for analysis. The database is designed as a Kubernetes-native operator with a decoupled compute and storage architecture, enabling horizontal scaling and multi-region deployment. What distinguishes GreptimeDB is its role as a multi-protocol ingestion gateway, accepting data through OpenTelemetry, Prometheus Remote Write, InfluxDB, Loki, Elasticsearch, Kafka, and MQTT protocols without
Processes streaming data with continuous aggregation flows to produce downsampled results in real time.
Perfetto is a platform for system-level performance tracing and analysis on Linux and Android. It combines a high-throughput trace recorder, a SQL-based query engine, and a browser-based visualizer into a single toolchain. The platform covers CPU scheduling and call-stack profiling, native and Java heap memory allocation tracking, GPU and graphics events, and system-wide counters such as CPU frequency and power consumption. The architecture decouples trace recording from offline analysis, using a compact protobuf format for event encoding and columnar storage for efficient SQL queries. The we
Creates scalar or table-valued functions using a SQL SELECT statement for custom analysis logic.
This project is a relational database cheat sheet and SQL reference guide. It provides a collection of syntax examples and query documentation for managing relational databases using Structured Query Language (SQL). The tool is implemented as a static site with client-side searchable documentation, allowing immediate filtering of technical content through a browser-based index. The reference covers relational database management, including data retrieval, database schema management, and record maintenance. It also includes guidance on manipulating relational data through table joins and generating aggregate reports.
Documents the use of SQL aggregate functions to generate data summaries and reports.
Readyset is a transparent caching proxy for PostgreSQL and MySQL that sits between an application and its database, intercepting SQL queries and serving cached results from memory. It automatically caches query results on first execution and keeps those caches consistent by consuming the database’s replication stream in real time, enabling faster repeated reads without application code changes. The proxy also supports caching advanced SQL functions such as window functions, bucket functions, and locale-aware collation sorting, and exposes an interface that allows AI agents to inspect proxied q
Supports caching window functions, bucket functions, and locale-aware collation sorting for complex analytical queries.
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Defines custom SQL functions in Rust or Python for use in streaming data pipelines.
H2 is a JDBC-compliant relational database management system written in Java. It functions as an embeddable SQL database that can run directly within an application process to remove network latency, or as an in-memory database for high-performance volatile storage. It also includes a web-based console for executing SQL commands and administering schemas. The system is characterized by its flexible deployment modes, including a standalone server mode for remote TCP/IP access and a mixed mode for simultaneous local and remote connectivity. It features a dialect emulation layer and compatibilit
Supports the creation of user-defined aggregate functions (UDAFs) by mapping them to source code.
sqlean is a collection of SQLite extension libraries implemented as C-based shared libraries. It provides a suite of additional scalar and table-valued functions that extend the native capabilities of the SQLite database engine. The project provides specialized toolsets for cryptography, advanced mathematics, networking, and file system access. These include hashing and binary encoding, statistical analysis, IP address validation, and the ability to map CSV files or file system paths as virtual tables. The library also includes comprehensive text processing tools such as regular expressions, fuzzy matching, and Unicode-aware string manipulation. Additional capabilities cover high-precision date and time handling and unique identifier generation.
Enables the creation of custom scalar user-defined functions to encapsulate reusable single-value logic.
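A scalar user-defined function returns one value per input row and can then be combined with built-in aggregates. The sketch below shows that idea with Python's standard sqlite3 module and `Connection.create_function`; it does not use sqlean itself, and the `normalize_email` function and `users` table are hypothetical.

```python
import sqlite3


def normalize_email(value):
    """Trim whitespace and lowercase an address; pass NULL through."""
    if value is None:
        return None
    return value.strip().lower()


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under a hypothetical name, one argument.
conn.create_function("normalize_email", 1, normalize_email)

conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [(" Ada@Example.com ",), ("ada@example.com",), (None,)],
)

# The scalar function runs per row; COUNT(DISTINCT ...) then aggregates
# its results and ignores NULLs.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT normalize_email(email)) FROM users"
).fetchone()[0]
```

Keeping the normalization in a single named function means every query that counts or groups by email applies the same rule.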
Velox is a high-performance C++ query execution engine and columnar data processing library. It serves as a composable framework for implementing analytical query engines, providing a vectorized expression evaluator and a toolkit for data management systems. The project distinguishes itself through its use of vectorized columnar execution and arena-based memory allocation to process large-scale datasets. It offers specialized optimizations such as broadcast join table caching, dynamic filter push-down, and dictionary encoding to reduce memory overhead and speed up analytical reads. The engine covers a wide range of analytical capabilities, including hash, merge, and semi joins, as well as multi-stage parallel aggregation and window function computation. It provides primitives for in-memory columnar storage, Parquet data decoding, and integration with cloud storage. Extensibility is provided by a function registration system for custom scalar and aggregate functions, with high-level bindings available to connect C++ logic to Python.
Allows definition of new aggregation logic using vector interfaces and registration with specific type signatures.
Rusqlite is an embedded database interface and relational database driver that provides a client library for interacting with SQLite. It functions as an SQL query wrapper, enabling the management of local file-based or in-memory databases through a safe interface. The library allows for the extension of native database capabilities by implementing custom scalar functions, collations, and virtual tables. It also supports the embedding of the database engine directly into the application binary to remove external library dependencies. The project covers a broad range of capabilities including
Extends SQLite functionality by implementing custom scalar functions, collations, and virtual tables using Rust logic.
Drift is a type-safe SQL persistence library and relational mapper that provides a structured way to map database tables to classes and execute SQL queries with build-time validation. It functions as a type-safe query builder and a wrapper for SQLite and PostgreSQL, eliminating manual result set parsing by binding query outputs to native objects. The project distinguishes itself through a build-time code generation system that produces type-safe APIs and validates raw SQL statements against database versions before execution. It features reactive query streaming, which transforms SQL queries
Computes summary values like sums and counts using SQL grouping and window functions.
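The difference between grouping and window functions can be sketched in plain SQL, run here through Python's standard sqlite3 module rather than Drift's generated API. The `sales` table and its columns are hypothetical, and the window query assumes an SQLite build of version 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, category TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "books", 10), (2, "books", 15), (1, "games", 7), (3, "books", 5)],
)

# GROUP BY collapses each category into a single summary row.
totals = conn.execute(
    "SELECT category, COUNT(*), SUM(amount) "
    "FROM sales GROUP BY category ORDER BY category"
).fetchall()

# A window function keeps every row and adds a per-category running total.
running = conn.execute(
    "SELECT day, category, "
    "SUM(amount) OVER (PARTITION BY category ORDER BY day) "
    "FROM sales ORDER BY category, day"
).fetchall()
```

Here `totals` has one row per category, while `running` has one row per sale with the cumulative sum for that sale's category up to that day.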