8 repositories
Techniques for extracting, filtering, and aggregating data from relational tables using SQL.
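The three steps named above are extracting columns, filtering rows, and aggregating groups. The minimal sketch below shows all three using Python's built-in `sqlite3` module against an in-memory database. The `orders` table, its columns, and the sample values are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical example table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 5.0)],
)

# Extract: pick the columns to return.
# Filter: WHERE with a bound parameter (the ? placeholder).
# Aggregate: GROUP BY with COUNT/SUM, then sort the resulting groups.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount >= ?
    GROUP BY customer
    ORDER BY total DESC
    """,
    (10.0,),
).fetchall()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)
```

The threshold is bound with `?` rather than formatted into the SQL string. This keeps the filter value out of the query text, which is the usual defense against SQL injection.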
Distinguishing note: None of the candidates focus on the general educational practice of writing retrieval queries; they focus on loaders or distributed engines.
Explore 8 awesome GitHub repositories matching data & databases · SQL Data Retrieval.
DataX is a distributed data integration framework and plugin-based ETL tool designed for synchronizing large datasets between heterogeneous sources and destinations. It functions as a JDBC data migration engine and offline synchronization tool, enabling the movement of data between relational databases, NoSQL stores, and object storage. The system utilizes a plugin-based connector architecture that decouples reader and writer logic, allowing it to map and transform data types across different storage engines using a standardized internal representation. This design supports heterogeneous data
Implements techniques for filtering and extracting specific data from relational tables using SQL WHERE clauses.
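The decoupling of reader and writer logic described for DataX can be illustrated with a small sketch. This is not DataX's API. The `SqliteReader`, `SqliteWriter`, and `sync` names are hypothetical, and the "standardized internal representation" is assumed here to be a plain Python tuple per record. Two in-memory SQLite databases stand in for a heterogeneous source and destination.

```python
import sqlite3
from typing import Iterable, Iterator, Protocol, Sequence, Tuple

# Assumed internal representation for this sketch: one tuple of values per record.
Record = Tuple[object, ...]


class Reader(Protocol):
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class SqliteReader:
    """Hypothetical reader plugin: extracts columns of rows matching a WHERE filter."""

    def __init__(self, conn: sqlite3.Connection, table: str, columns: Sequence[str],
                 where: str = "1=1", params: tuple = ()):
        self.conn, self.table, self.columns = conn, table, columns
        self.where, self.params = where, params

    def read(self) -> Iterator[Record]:
        # Table/column names are interpolated and assumed trusted; filter values are bound.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table} WHERE {self.where}"
        for row in self.conn.execute(sql, self.params):
            yield tuple(row)


class SqliteWriter:
    """Hypothetical writer plugin: inserts records into a destination table."""

    def __init__(self, conn: sqlite3.Connection, table: str, columns: Sequence[str]):
        self.conn, self.table, self.columns = conn, table, columns

    def write(self, records: Iterable[Record]) -> int:
        placeholders = ", ".join("?" for _ in self.columns)
        sql = f"INSERT INTO {self.table} ({', '.join(self.columns)}) VALUES ({placeholders})"
        count = 0
        for record in records:
            self.conn.execute(sql, record)
            count += 1
        self.conn.commit()
        return count


def sync(reader: Reader, writer: Writer) -> int:
    # The job only wires a reader to a writer; neither knows the other's storage engine.
    return writer.write(reader.read())


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE users (id INTEGER, name TEXT, active INTEGER)")
src.executemany("INSERT INTO users VALUES (?, ?, ?)",
                [(1, "ana", 1), (2, "ben", 0), (3, "cy", 1)])

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE active_users (id INTEGER, name TEXT)")

copied = sync(
    SqliteReader(src, "users", ["id", "name"], where="active = ?", params=(1,)),
    SqliteWriter(dst, "active_users", ["id", "name"]),
)
```

Swapping either side for a different store only requires another class with the same `read` or `write` method. That is the point of the plugin split.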
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Queries real-time data directly using a built-in serving layer and standard SQL.
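Streaming SQL engines such as RisingWave keep materialized views current as new events arrive. The sketch below shows the general idea of incremental view maintenance for a grouped count and sum. It is a conceptual illustration under simplifying assumptions (in-memory state, a single aggregate query, hypothetical class and method names), not RisingWave's implementation.

```python
from collections import defaultdict


class GroupedCountSumView:
    """Conceptual stand-in for a view defined as
    SELECT key, COUNT(*), SUM(value) FROM events GROUP BY key.
    Each change updates only its own group instead of rescanning all events."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def on_insert(self, key, value):
        self.count[key] += 1
        self.total[key] += value

    def on_delete(self, key, value):
        # Retractions: an upstream update can be applied as on_delete(old) then on_insert(new).
        self.count[key] -= 1
        self.total[key] -= value
        if self.count[key] == 0:
            del self.count[key]
            del self.total[key]

    def snapshot(self):
        # What a query against the view would return at this moment.
        return {key: (self.count[key], self.total[key]) for key in sorted(self.count)}


view = GroupedCountSumView()
view.on_insert("eu", 3.0)
view.on_insert("us", 2.0)
view.on_insert("eu", 1.0)
view.on_delete("us", 2.0)
print(view.snapshot())  # {'eu': (2, 4.0)}
```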
hello-sql is a collection of educational resources and practical guides designed for mastering relational database design, SQL query writing, and schema mapping. It provides a set of lessons and exercises for practicing the creation and manipulation of data within relational databases. The project includes a database schema workbook for designing tables and mapping relationships, alongside a dedicated SQL query guide for writing selection, filtering, and aggregation statements. These resources are delivered through a relational database tutorial and a broader SQL learning resource. The mater
Provides guides on writing queries to extract and aggregate specific information from relational tables.
Connect is a Kafka data integration platform and stream processing engine used to build declarative pipelines that move and transform messages between Kafka topics and external sources. It functions as a Kafka Connect framework and a change data capture tool, streaming real-time database modifications to synchronize data across distributed environments. The project differentiates itself through a dedicated mapping language for mutating and reshaping message payloads and the ability to execute custom processing logic within a sandboxed WebAssembly runtime. It also provides an observability pip
Extracts, filters, and aggregates data from relational tables using standard SQL query language.
This project provides a SQL interface for Elasticsearch, serving as a translator and database layer that allows users to retrieve, filter, and manipulate indices using structured query language. It functions by converting standard SQL statements into the native JSON query language used by the search engine. The system includes a geospatial SQL engine for executing location-based searches and distance calculations. It also features a query debugger used to visualize the translation process from SQL to search engine request bodies to verify the logic and accuracy of data retrieval. The capabil
Provides the ability to retrieve, filter, sort, and group data from indices using standard SQL syntax.
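To make the SQL-to-JSON translation concrete, here is a minimal sketch. It converts an already-parsed `WHERE` clause (a list of `(field, operator, value)` conditions joined by `AND`) into an Elasticsearch query body using the `bool`, `filter`, `term`, and `range` clauses of the query DSL. It does not parse SQL text, and `where_to_dsl` is a hypothetical helper, not this project's translator. It also assumes equality comparisons target keyword-typed fields, since `term` matches exact values.

```python
import json

# Map SQL comparison operators to Elasticsearch "range" keywords.
RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}


def where_to_dsl(conditions):
    """Translate (field, op, value) conditions joined by AND into a bool/filter query."""
    filters = []
    for field, op, value in conditions:
        if op == "=":
            # Exact match; assumes a keyword (not analyzed text) field.
            filters.append({"term": {field: value}})
        elif op in RANGE_OPS:
            filters.append({"range": {field: {RANGE_OPS[op]: value}}})
        else:
            raise ValueError(f"unsupported operator: {op}")
    return {"query": {"bool": {"filter": filters}}}


# Roughly: SELECT * FROM logs WHERE status = 'error' AND bytes >= 1024
body = where_to_dsl([("status", "=", "error"), ("bytes", ">=", 1024)])
print(json.dumps(body, indent=2))
```

Conditions go in `filter` rather than `must` because a SQL `WHERE` only includes or excludes rows. Filter context skips relevance scoring, which matches that behavior.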
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Exposes a tabular data model for retrieving and analyzing information using standard SQL syntax.
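Segment-based analytical stores commonly answer an aggregate query by computing partial results per segment and then merging them. The sketch below illustrates that pattern for `SELECT country, AVG(latency_ms) ... GROUP BY country` over hypothetical column-oriented segments. The data, names, and single-process merge are assumptions, and the code does not reflect Pinot's actual query engine. The partial state is a `(sum, count)` pair rather than an average, because averages from segments of different sizes cannot be combined correctly.

```python
from collections import defaultdict

# Hypothetical segments: horizontal slices of one table, stored column-wise
# (one list per column).
segments = [
    {"country": ["us", "de", "us"], "latency_ms": [120, 80, 100]},
    {"country": ["de", "fr"], "latency_ms": [60, 90]},
]


def partial_aggregate(segment):
    """Per-segment step: GROUP BY country -> [sum, count]."""
    partial = defaultdict(lambda: [0, 0])
    for country, latency in zip(segment["country"], segment["latency_ms"]):
        partial[country][0] += latency
        partial[country][1] += 1
    return partial


def merge(partials):
    """Merge step: add sums and counts across segments, then compute AVG once."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for country, (s, c) in partial.items():
            merged[country][0] += s
            merged[country][1] += c
    return {country: s / c for country, (s, c) in sorted(merged.items())}


result = merge(partial_aggregate(seg) for seg in segments)
print(result)
```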
Trailbase is a backend-as-a-service platform delivered as a single executable that integrates a real-time database engine, an identity and access manager, and a type-safe API generator. It provides a complete backend environment that includes a SQLite-backed storage engine and a WebAssembly runtime server for executing custom logic. The platform is distinguished by automatically transforming database schemas into JSON APIs with client bindings for multiple languages, and by running portable components for server-side rendering and custom HTTP routes. It also incorporates vector database capabilities to support storing embeddings and similarity-based vector search. The system covers a wide range of operational capabilities, including user authentication with social login support, access control lists for data visibility, and pub-sub synchronization for live data updates. It also provides tools for managing database schemas through SQL migrations and for handling geospatial data.
Allows direct execution of SQL queries for complex data modeling and retrieval.
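Trailbase's description mentions turning SQLite-backed schemas into JSON APIs. The snippet below sketches only the underlying idea of serializing query rows as JSON objects keyed by column name, using Python's `sqlite3` and `json` modules. The `todos` table and the `list_records` helper are hypothetical and are not part of Trailbase's API.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become mapping-like, keyed by column name
conn.execute(
    "CREATE TABLE todos (id INTEGER PRIMARY KEY, title TEXT NOT NULL, done INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO todos (title, done) VALUES (?, ?)",
                 [("write docs", 0), ("ship", 1)])


def list_records(conn, table, columns):
    """Hypothetical helper: run a SELECT and return a JSON array of objects."""
    # Table/column names are interpolated and assumed trusted (e.g. taken from the schema).
    rows = conn.execute(f"SELECT {', '.join(columns)} FROM {table} ORDER BY id")
    return json.dumps([dict(row) for row in rows])


payload = list_records(conn, "todos", ["id", "title", "done"])
print(payload)
```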
Biopython is a bioinformatics library for Python that provides tools for analyzing, manipulating, and studying biological sequences, molecular structures, and phylogenetic trees. It serves as a biological sequence parser for genomic and proteomic data in multiple industry-standard file formats, and acts as an interface for querying biological data and citations from the NCBI Entrez repositories. The project is distinguished by specialized toolkits for protein structure analysis and phylogenetic tree construction. It includes a protein structure parser that processes PDB and mmCIF files to calculate molecular geometry, as well as a phylogenetic tree toolkit for analyzing evolutionary relationships between species. The library covers a wide range of bioinformatics capabilities, including genomic sequence analysis for transcription and translation, sequence alignment management, and population genetics calculations. It also provides structural analysis tools for manipulating 3D atomic coordinates, as well as utilities for visualizing genomic features and modeling biogeographic data. The system integrates with external bioinformatics binaries through tool wrappers and supports persistent storage of biological records through SQL-backed sequence storage.
Extracts biological records from relational databases on demand as sequence record objects.