6 Repos
Operations for reshaping tabular data using relational algebra and SQL logic.
Distinct from Tabular Data Frameworks: this category focuses on SQL-based transformation and joining of tables, whereas Tabular Data Frameworks covers the general environment.
Apache Spark is a unified engine for distributed data processing, designed for large-scale data analysis and graph computation. It serves as a distributed machine learning framework, a graph processing system, a real-time stream processor, and a SQL analytics engine, running distributed SQL queries, large-scale graph analysis, and real-time stream analytics across clusters of machines. It also provides a scalable environment for implementing machine learning algorithms and developing predictive models on massive datasets. The engine incorporates relational query e…
Performs distributed relational transformations on structured data using SQL and programmatic interfaces.
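A relational transformation such as a grouped average is typically executed across partitions in two phases: each partition pre-aggregates locally, partial results are exchanged (shuffled) by key, and a final step merges them. The plain-Python sketch below illustrates that partial/final aggregation pattern under the assumption of in-memory partitions; it is not Spark's API or its execution code, and every function name is hypothetical.

```python
from collections import defaultdict

def partial_aggregate(partition):
    """Local (map-side) step: reduce one partition to key -> (sum, count)."""
    acc = defaultdict(lambda: [0, 0])
    for key, value in partition:
        acc[key][0] += value
        acc[key][1] += 1
    return {key: (s, c) for key, (s, c) in acc.items()}

def shuffle(partials, num_reducers):
    """Exchange step: route each key's partial result to a reducer chosen by key hash."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for partial in partials:
        for key, pair in partial.items():
            buckets[hash(key) % num_reducers][key].append(pair)
    return buckets

def final_aggregate(bucket):
    """Final (reduce-side) step: merge partial (sum, count) pairs into averages."""
    result = {}
    for key, pairs in bucket.items():
        total = sum(s for s, _ in pairs)
        count = sum(c for _, c in pairs)
        result[key] = total / count
    return result

def grouped_average(partitions, num_reducers=2):
    """Rough equivalent of SELECT key, AVG(value) ... GROUP BY key over partitioned rows."""
    partials = [partial_aggregate(p) for p in partitions]
    averages = {}
    for bucket in shuffle(partials, num_reducers):
        averages.update(final_aggregate(bucket))
    return averages

# Hypothetical input: (key, value) rows already split across two partitions.
partitions = [
    [("a", 1), ("b", 4)],
    [("a", 3), ("b", 6), ("c", 5)],
]
print(grouped_average(partitions))
```

Partitions ship (sum, count) pairs rather than local averages because averaging per-partition averages gives the wrong answer when partitions hold different numbers of rows for a key.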
FerretDB is an open-source database emulator and protocol translator that mimics a MongoDB environment to support existing drivers and client tools on a relational backend. It functions as a stateless database proxy that converts binary wire protocol messages into SQL statements, allowing a relational engine to handle document-oriented requests. The project serves as a migration tool for moving applications from MongoDB to PostgreSQL without rewriting queries or changing client drivers. It achieves this by using PostgreSQL as a document store, storing and querying BSON documents through a tra
Implements the mapping of BSON documents to SQL tables to maintain compatibility between NoSQL and SQL models.
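To illustrate what translating a document query into SQL can look like, the sketch below turns a flat MongoDB-style equality filter into a parameterized PostgreSQL query over a `jsonb` column. It is a simplified assumption, not FerretDB's actual storage layout or generated SQL: the `doc` column and the table name are hypothetical, and operators such as `$gt`, dotted paths, array-element matching, and BSON-specific types are deliberately out of scope.

```python
import json

def filter_to_sql(table, mongo_filter):
    """Translate a flat equality filter into (sql, params) for PostgreSQL.

    Hypothetical schema: each document is stored in a jsonb column named `doc`.
    The table name is interpolated, so it must come from trusted input.
    """
    clauses = []
    params = []
    for key, value in mongo_filter.items():
        if key.startswith("$"):
            raise ValueError(f"unsupported operator: {key}")
        # `->` extracts the field as jsonb; comparing it with a jsonb literal
        # preserves the JSON type, so 36 (number) does not match "36" (string).
        clauses.append("doc -> %s::text = %s::jsonb")
        params.extend([key, json.dumps(value)])
    sql = f'SELECT doc FROM "{table}"'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

sql, params = filter_to_sql("users", {"name": "Ada", "age": 36})
print(sql)
print(params)
```

The resulting pair could be handed to a PostgreSQL driver's `cursor.execute(sql, params)`, which keeps the user-supplied keys and values out of the SQL text itself.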
q is a command-line utility for the processing, filtering, and aggregation of tabular text and database files using standard SQL syntax. It functions as a query engine that treats CSV and TSV files, as well as standard input, as relational database tables. The tool distinguishes itself by providing a persistent cache layer that stores processed tabular data in a binary format to accelerate repeated queries on large datasets. It also maps individual filenames or stream identifiers to relational table names, enabling SQL joins across disparate text files. The project covers a broad range of da
Provides the ability to join and reshape delimited text files using standard SQL logic for reports and processing.
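The core idea of treating delimited text as tables and joining them with SQL can be approximated with Python's standard `csv` and `sqlite3` modules. This is an illustrative sketch only, not q's implementation; the CSV contents and the `load_csv` helper are hypothetical.

```python
import csv
import io
import sqlite3

def load_csv(conn, table, text):
    """Create table `table` from CSV text whose first row is a header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    cols = ", ".join(f'"{name}"' for name in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)

users_csv = "id,name\n1,Ada\n2,Grace\n"
orders_csv = "user_id,amount\n1,10\n1,20\n2,5\n"

conn = sqlite3.connect(":memory:")
load_csv(conn, "users", users_csv)
load_csv(conn, "orders", orders_csv)

# Columns have no declared type, so values are stored as text; CAST before summing.
rows = conn.execute(
    """
    SELECT u.name, SUM(CAST(o.amount AS INTEGER)) AS total
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.name
    ORDER BY u.name
    """
).fetchall()
print(rows)  # [('Ada', 30), ('Grace', 5)]
```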
This project is a comprehensive geographic location dataset and reference library providing standardized data for countries, states, and cities. It serves as a source of truth for regional hierarchies, ISO codes, coordinates, and timezone information, available as both a relational SQL database and a document-based JSON library. The project includes a custom dataset export tool that functions as a filtering engine. This allows for the generation of tailored geographic files in JSON, CSV, and GeoJSON formats by selecting only the specific regions or fields required. The dataset covers global
Converts normalized SQL database tables into JSON, CSV, and GeoJSON formats for diverse application use.
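A filtered export from table rows to GeoJSON might look roughly like the sketch below, which keeps only the requested countries and fields and emits a `FeatureCollection`. The row layout, field names, and sample coordinates are hypothetical illustrations, not the project's schema or data; the one fixed detail is that GeoJSON (RFC 7946) orders point coordinates as longitude, then latitude.

```python
import json

# Hypothetical rows shaped like a `cities` table joined to country codes.
CITIES = [
    {"name": "Lyon", "country_code": "FR", "latitude": 45.76, "longitude": 4.84},
    {"name": "Osaka", "country_code": "JP", "latitude": 34.69, "longitude": 135.50},
    {"name": "Nice", "country_code": "FR", "latitude": 43.70, "longitude": 7.27},
]

def to_geojson(rows, country_codes, fields=("name",)):
    """Filter rows by country and emit a GeoJSON FeatureCollection of points."""
    features = []
    for row in rows:
        if row["country_code"] not in country_codes:
            continue
        features.append({
            "type": "Feature",
            # GeoJSON puts longitude first: [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [row["longitude"], row["latitude"]],
            },
            # Only the selected fields are copied into the feature properties.
            "properties": {field: row[field] for field in fields},
        })
    return {"type": "FeatureCollection", "features": features}

collection = to_geojson(CITIES, {"FR"})
print(json.dumps(collection, indent=2))
```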
Featuretools is an automated feature engineering library and data transformation framework written in Python. It automatically generates machine learning feature vectors from multi-table datasets by applying synthesis patterns to relational and timestamped data. The system functions as a distributed feature synthesis engine, allowing the process of creating feature vectors to scale across multiple cores or clusters to handle large-scale datasets. The library supports the synthesis of multi-table datasets, time series feature generation, and the creation of custom machine learning primitives
Applies relational transformations and aggregation patterns across multiple related tables to synthesize new features.
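The basic step behind this kind of feature synthesis, rolling a child table up to its parent along a one-to-many relationship with aggregation primitives, can be sketched in plain Python. This is not the Featuretools API: the tables, columns, and the `aggregate_features` helper are hypothetical, and the feature labels only loosely imitate a `PRIMITIVE(child.column)` naming style. The library itself also stacks primitives across several relationships and accounts for cutoff times, which this sketch omits.

```python
from collections import defaultdict
from statistics import mean

customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 2, "amount": 15.0},
]

# Aggregation primitives; MEAN returns None for parents with no child rows.
PRIMITIVES = {
    "COUNT": len,
    "SUM": sum,
    "MEAN": lambda values: mean(values) if values else None,
}

def aggregate_features(parents, children, key, column, child_name):
    """Apply each primitive to the child rows that belong to each parent row."""
    groups = defaultdict(list)
    for row in children:
        groups[row[key]].append(row[column])
    features = []
    for parent in parents:
        values = groups.get(parent[key], [])
        row = {key: parent[key]}
        for name, fn in PRIMITIVES.items():
            row[f"{name}({child_name}.{column})"] = fn(values)
        features.append(row)
    return features

feats = aggregate_features(customers, transactions, "customer_id", "amount", "transactions")
for f in feats:
    print(f)
```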
This project is a change data capture (CDC) system and synchronization layer that moves data from MySQL databases into Elasticsearch indices. It acts as a relational-to-document mapper, converting database tables into searchable documents to enable real-time data integration and full-text search. The synchronizer stands out for its support for denormalizing relational data, turning one-to-many database joins into parent-child document structures. It can also aggregate partitioned tables using regular expressions, grouping multiple database tables into a single search index. The system provides comprehensive data mapping and transformation features, including field type conversion, schema mapping, and filtering of which fields are synchronized. It uses a pipeline-based processing model for decoding and merging fields, and combines snapshot-based initial loading for baselines with binary log (binlog) streaming for real-time updates.
Implements relational data denormalization by transforming database joins into parent-child document structures.
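The two relational-to-document ideas described for this synchronizer, nesting one-to-many child rows under their parent and routing partitioned tables to one index by regular expression, can be sketched over in-memory rows. This is a hypothetical illustration, not the project's code or configuration format: the real system reads rows from a snapshot and the binlog, and the table names, fields, and rules here are made up.

```python
import re
from collections import defaultdict

def denormalize(parents, children, parent_key, foreign_key, child_field):
    """Nest child rows under their parent, turning a one-to-many join into documents."""
    by_parent = defaultdict(list)
    for child in children:
        # Drop the foreign key from the nested copy; it is implied by the parent.
        nested = {k: v for k, v in child.items() if k != foreign_key}
        by_parent[child[foreign_key]].append(nested)
    docs = []
    for parent in parents:
        doc = dict(parent)
        doc[child_field] = by_parent.get(parent[parent_key], [])
        docs.append(doc)
    return docs

def index_for_table(table, rules):
    """Return the index of the first rule whose regex fully matches the table name."""
    for pattern, index in rules:
        if re.fullmatch(pattern, table):
            return index
    return None

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25},
    {"order_id": 11, "customer_id": 1, "total": 40},
]
docs = denormalize(customers, orders, "id", "customer_id", "orders")

# Hypothetical rules: partitions such as orders_0001 and orders_0002 share one index.
RULES = [(r"orders_\d+", "orders"), (r"customers", "customers")]
print(docs)
print(index_for_table("orders_0002", RULES))
```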