8 repositories
Techniques for extracting, filtering, and aggregating data from relational tables using SQL.
Distinguishing note: None of the candidates focus on the general educational practice of writing retrieval queries; they focus on loaders or distributed engines.
Explore 8 GitHub repositories matching data & databases · SQL Data Retrieval.
DataX is a distributed data integration framework and plugin-based ETL tool designed for synchronizing large datasets between heterogeneous sources and destinations. It functions as a JDBC data migration engine and offline synchronization tool, enabling the movement of data between relational databases, NoSQL stores, and object storage. The system utilizes a plugin-based connector architecture that decouples reader and writer logic, allowing it to map and transform data types across different storage engines using a standardized internal representation. This design supports heterogeneous data
Implements techniques for filtering and extracting specific data from relational tables using SQL WHERE clauses.
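The decoupled reader/writer design described above can be illustrated with a minimal sketch. This is not DataX's actual API (DataX itself is a Java project); `Record`, `Reader`, `Writer`, `ListReader`, `CsvLineWriter`, and `sync` are all hypothetical names chosen for illustration. The point is only that a reader and a writer can be paired freely when both agree on a shared internal record type.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, List, Protocol


@dataclass
class Record:
    """Hypothetical standardized internal row: column name -> value."""
    columns: Dict[str, Any]


class Reader(Protocol):
    def read(self) -> Iterator[Record]: ...


class Writer(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...


class ListReader:
    """Hypothetical source plugin: reads rows from an in-memory list of dicts."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        for row in self.rows:
            yield Record(dict(row))


class CsvLineWriter:
    """Hypothetical destination plugin: renders each record as a CSV-style line."""

    def __init__(self, fields: List[str]) -> None:
        self.fields = fields
        self.lines: List[str] = []

    def write(self, records: Iterable[Record]) -> int:
        count = 0
        for rec in records:
            # Convert every value to text; a real writer would map types per target.
            self.lines.append(",".join(str(rec.columns[f]) for f in self.fields))
            count += 1
        return count


def sync(reader: Reader, writer: Writer) -> int:
    """Move all records from reader to writer; neither knows about the other."""
    return writer.write(reader.read())


writer = CsvLineWriter(["id", "name"])
moved = sync(ListReader([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]), writer)
print(moved, writer.lines)
```

Because `sync` depends only on the two protocols, swapping in a different source or destination would not require changes to the other side.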
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Queries real-time data directly using a built-in serving layer and standard SQL.
hello-sql is a collection of educational resources and practical guides designed for mastering relational database design, SQL query writing, and schema mapping. It provides a set of lessons and exercises for practicing the creation and manipulation of data within relational databases. The project includes a database schema workbook for designing tables and mapping relationships, alongside a dedicated SQL query guide for writing selection, filtering, and aggregation statements. These resources are delivered through a relational database tutorial and a broader SQL learning resource. The mater
Provides guides on writing queries to extract and aggregate specific information from relational tables.
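As a rough illustration of the selection, filtering, and aggregation statements this kind of guide covers, here is a minimal sketch using Python's standard `sqlite3` module. The `orders` table, its columns, and the sample rows are assumptions made up for the example and are not taken from hello-sql.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 15.0), ("alice", 20.0), ("carol", 5.0)],
)

# Select columns, filter rows with WHERE, then aggregate per customer.
rows = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount >= ?
    GROUP BY customer
    ORDER BY total DESC
    """,
    (10.0,),
).fetchall()

print(rows)  # [('alice', 2, 50.0), ('bob', 1, 15.0)]
conn.close()
```

The threshold is passed as a bound parameter (`?`) rather than formatted into the SQL string, which avoids quoting mistakes and SQL injection.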
Connect is a Kafka data integration platform and stream processing engine used to build declarative pipelines that move and transform messages between Kafka topics and external sources. It functions as a Kafka Connect framework and a change data capture tool, streaming real-time database modifications to synchronize data across distributed environments. The project differentiates itself through a dedicated mapping language for mutating and reshaping message payloads and the ability to execute custom processing logic within a sandboxed WebAssembly runtime. It also provides an observability pip
Extracts, filters, and aggregates data from relational tables using standard SQL query language.
This project provides a SQL interface for Elasticsearch, serving as a translator and database layer that allows users to retrieve, filter, and manipulate indices using structured query language. It functions by converting standard SQL statements into the native JSON query language used by the search engine. The system includes a geospatial SQL engine for executing location-based searches and distance calculations. It also features a query debugger used to visualize the translation process from SQL to search engine request bodies to verify the logic and accuracy of data retrieval. The capabil
Provides the ability to retrieve, filter, sort, and group data from indices using standard SQL syntax.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Exposes a tabular data model for retrieving and analyzing information using standard SQL syntax.
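Trailbase is a backend-as-a-service (BaaS) platform delivered as a single executable that integrates a real-time database engine, an identity and access manager, and a type-safe API generator. It provides a comprehensive backend environment, including a SQLite-based storage engine and a WebAssembly runtime server for executing custom logic. The platform stands out by automatically converting database schemas into JSON APIs with cross-language client bindings, and by allowing the execution of portable components for server-side rendering and custom HTTP routes. It also integrates vector database functionality to support the storage of embeddings and similarity-based vector search. The system covers a broad range of operational features, including user authentication with social login support, access control lists for data visibility, and publish-subscribe (pub-sub) synchronization for real-time data updates. It also provides tools for managing database schemas through SQL migrations and for handling geospatial data.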
Allows direct execution of SQL queries for complex data modeling and retrieval.
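Biopython is a Python bioinformatics library that provides tools for parsing, manipulating, and analyzing biological sequences, molecular structures, and phylogenetic trees. It serves as a biological sequence parser for genomic and proteomic data, supports many industry-standard file formats, and acts as an interface for querying biological data and citations from the NCBI Entrez repositories. The project is known for its specialized toolkits for protein structure analysis and phylogenetic tree construction. It includes a protein structure analyzer that processes PDB and mmCIF files to compute molecular geometry, and a phylogenetic tree toolkit for analyzing evolutionary relationships between species. The library covers a broad range of bioinformatics capabilities, including genomic sequence analysis for transcription and translation, sequence alignment management, and population genetics calculations. It also provides structural analysis tools for manipulating 3D atomic coordinates, and utilities for genomic feature visualization and biogeographic data modeling. The system integrates with external bioinformatics binaries through tool wrappers and supports persistent storage of biological records via a SQL backend.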
Extracts biological records from relational databases on demand as sequence record objects.