16 repositories
Support for non-scalar data structures like maps and unions.
Distinguishing note: Focuses on schema flexibility rather than general data ingestion.
Explore 16 awesome GitHub repositories matching data & databases · Complex Data Types.
DuckDB is an in-process analytical database engine designed to run directly within an application process. As a zero-dependency, embedded system, it provides enterprise-grade SQL data processing capabilities without the overhead of managing a dedicated database server. It is built to handle complex analytical and aggregation tasks by storing and retrieving information in columns, allowing for high-performance relational data manipulation. The engine distinguishes itself through a columnar vectorized execution model that maximizes CPU cache efficiency during query operations. It employs adapti
Supports intricate data structures using specialized types for nested or heterogeneous information.
This project is a cross-platform development framework and managed runtime environment designed for building high-performance applications. It provides a comprehensive toolkit for constructing web services, cloud-native microservices, and desktop applications, utilizing a unified runtime that handles memory management and execution across diverse operating systems. The framework distinguishes itself through a native ahead-of-time compilation toolchain that transforms source code into optimized, self-contained machine code binaries. This capability enables fast startup times and reduced memory
Supports complex data structures like union types and collection expressions to simplify data modeling.
TOML is a configuration file format designed for human readability and unambiguous mapping to hash tables. It serves as a standardized language for structured data, enabling consistent parsing and data exchange across diverse programming environments. The format distinguishes itself through a strict type-system specification that ensures data is interpreted identically regardless of the implementation. It utilizes a line-oriented lexical structure that supports both hierarchical organization through bracketed sections and compact inline embedding for nested objects. This approach allows for t
Encodes diverse data types including multi-line strings, scientific numbers, and temporal values.
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Organizes information into arrays, maps, and nested structures to support complex data models within SQL queries.
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Supports a wide range of standard SQL types, including arbitrary precision decimals and large integers.
RedisInsight is a graphical user interface and management tool for browsing, analyzing, and administering Redis databases. It provides a visual environment for exploring key-value data structures, managing database instances, and performing data analysis across different operating systems and deployments. The tool distinguishes itself by providing dedicated visual managers for complex operations, including a vector database manager for configuring embeddings and similarity searches, a query workbench for executing raw commands and Lua scripts, and a performance monitoring dashboard for tracki
Manages diverse and complex data formats including JSON documents, time series, and probabilistic types.
asyncpg is an asynchronous database driver and binary protocol client for PostgreSQL. It provides a non-blocking interface for executing SQL statements, streaming result sets, and managing data transfer between an application and a PostgreSQL database. The driver implements the PostgreSQL binary protocol directly to facilitate efficient data transfer and type conversion. It includes a connection pool to maintain and reuse open database connections, reducing the latency associated with repeated handshakes. The project covers a broad range of database integration capabilities, including atomic
Encodes and decodes composite types, arrays, and custom formats between the database and application.
MessagePack is a binary object serialization library and a cross-platform data exchange format. It serves as a binary alternative to JSON, converting structured data into a space-efficient binary representation for network transmission and storage. The system provides a standardized format for swapping complex data types across different programming languages and architectures. It allows for the definition of custom data type encoding by pairing application-specific information with specialized serialization markers. The library handles the encoding and decoding of diverse data types, includ
Defines specialized binary formats for application-specific data structures using extendable serialization markers.
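To make the idea of an extension marker concrete, here is a minimal hand-rolled sketch of the "ext 8" framing from the MessagePack specification: a marker byte (`0xc7`), a one-byte payload length, a one-byte type code, then the payload. It deliberately does not use any MessagePack library. The `POINT_TYPE` code and the three-float point layout are assumptions made up for this example, not part of the format.

```python
import struct

EXT8 = 0xC7  # "ext 8" marker byte from the MessagePack specification

def pack_ext8(type_code: int, payload: bytes) -> bytes:
    """Frame payload as ext 8: marker, 1-byte length, 1-byte type, data."""
    if not 0 <= type_code <= 127:
        raise ValueError("application-defined type codes range from 0 to 127")
    if len(payload) > 255:
        raise ValueError("ext 8 carries at most 255 payload bytes")
    return struct.pack(">BBb", EXT8, len(payload), type_code) + payload

def unpack_ext8(frame: bytes) -> tuple[int, bytes]:
    """Return (type_code, payload) from a single ext 8 frame."""
    marker, length, type_code = struct.unpack_from(">BBb", frame)
    if marker != EXT8:
        raise ValueError("not an ext 8 frame")
    payload = frame[3:3 + length]
    if len(payload) != length:
        raise ValueError("truncated ext 8 frame")
    return type_code, payload

POINT_TYPE = 1  # hypothetical application-specific type code

def encode_point(x: float, y: float, z: float) -> bytes:
    # Hypothetical payload layout: three big-endian 32-bit floats.
    return pack_ext8(POINT_TYPE, struct.pack(">fff", x, y, z))

def decode_point(frame: bytes) -> tuple[float, float, float]:
    type_code, payload = unpack_ext8(frame)
    if type_code != POINT_TYPE:
        raise ValueError("unexpected extension type")
    return struct.unpack(">fff", payload)

frame = encode_point(1.0, 2.0, 0.5)
print(frame.hex())
print(decode_point(frame))
```

In practice a MessagePack library would pick the framing (fixext, ext 8, ext 16, or ext 32) from the payload size; the sketch fixes on ext 8 to keep the byte layout easy to follow.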
jOOQ is a type-safe SQL query builder for Java that generates code from live database schemas, enabling compile-time validation of SQL syntax and data types. Its core identity is built around a fluent DSL that mirrors SQL structure, a code generator that maps tables, views, and routines to Java objects, and a multi-dialect engine that translates the same DSL into vendor-specific SQL for over 30 databases. The project also includes a SQL parser and transformer for refactoring or dialect conversion, reactive stream integration for non-blocking query execution, and a JDBC proxy diagnostics tool f
Wraps multiple database columns into a single client-side value object for type-safe composite data handling.
Pinot is a distributed, columnar analytical database designed for high-concurrency, low-latency query processing. It functions as a real-time OLAP datastore, enabling interactive, user-facing analytics by ingesting and querying massive datasets from both streaming and batch sources. The system architecture relies on a centralized controller for cluster coordination and a distributed segment-based storage model to ensure horizontal scalability. The platform distinguishes itself through a hybrid ingestion pipeline that unifies real-time event streams and historical batch data into a single quer
Processes and flattens nested JSON or stream document fields to make complex data structures queryable.
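The general idea of flattening a nested document into queryable columns can be sketched in a few lines of standard-library Python. This is not Pinot's implementation or configuration; the `flatten` helper and the dotted / bracketed column-naming convention are assumptions chosen for illustration.

```python
import json

def flatten(value, prefix=""):
    """Turn nested dicts and lists into a flat {column_path: scalar} mapping."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            # Join object keys with dots: {"user": {"id": 7}} -> "user.id"
            flat.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Address array elements by position: "tags[0]", "tags[1]", ...
            flat.update(flatten(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat

# Hypothetical event record, as it might arrive from a stream.
record = json.loads('{"user": {"id": 7, "tags": ["a", "b"]}, "ok": true}')
row = flatten(record)
print(row)
```

For the sample record this yields the columns `user.id`, `user.tags[0]`, `user.tags[1]`, and `ok`. Empty objects and arrays produce no columns in this sketch; a real system would need an explicit policy for them.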
Octosql is a federated SQL query engine, data transformer, and streaming SQL processor. It lets users run a single SQL statement across multiple disparate data sources, including different database types and file formats, to merge and transform the results into a unified set. The system distinguishes itself by treating CSV, JSONLines, and Parquet files as virtual tables and by using a plugin-based architecture to extend connectivity to external storage engines. It operates as a stream processor for unbounded data streams, using watermarks, retractions, and sliding windows to maintain consistency for out-of-order events. It also serves as a SQL data generator capable of producing synthetic datasets and record streams through table functions. The engine includes cross-source join and multi-source analysis capabilities, optimized by source-side predicate push-down to reduce data transfer. It handles complex data through a static type system with union types and offers observability through visualization of query execution plans.
Utilizes a static type system to manage complex data structures like union types within columns.
This project is a comprehensive pandas data analysis tutorial and teaching guide designed for learning data manipulation and analysis. It serves as a tabular data processing guide and a time series analysis handbook, providing a structured approach to cleaning, merging, and transforming datasets. The repository works as a feature engineering course, with tutorials on building and selecting dataset features to improve the performance of machine learning models. It also includes a guide to vectorized data operations for element-wise mathematical calculations and matrix manipulations. The material covers a wide range of capabilities, including data cleaning workflows, data integration tasks, and tabular data analysis. It provides guidance on processing text data, handling categorical data, and optimizing execution speed for large datasets. The project is delivered as a series of Jupyter Notebooks containing hands-on exercises and targeted practice problems.
Provides specialized techniques for managing timestamps, date offsets, and categorical variables.
This project is a comprehensive guide and educational resource for the TypeScript language. It covers the language fundamentals, including its structural type system, static type analysis, and the process of transpiling typed source files to JavaScript. The material details how to model complex data and reusable type logic using generics, conditional types, and mapped types. It also explains the use of declaration files to provide type safety for external JavaScript libraries and the integration of type checking into existing JavaScript projects through JSDoc annotations. The content extends to object-oriented programming patterns, DOM manipulation, and configuring compiler behavior. It includes guidance on managing module interoperability, setting up build pipelines, and using editor intelligence for better developer productivity.
Provides techniques for creating reusable structures and shorthand aliases to model complex data shapes.
H2 is a JDBC-compliant relational database management system written in Java. It functions as an embeddable SQL database that can run directly within an application process to remove network latency, or as an in-memory database for high-performance volatile storage. It also includes a web-based console for executing SQL commands and administering schemas. The system is characterized by its flexible deployment modes, including a standalone server mode for remote TCP/IP access and a mixed mode for simultaneous local and remote connectivity. It features a dialect emulation layer and compatibilit
Supports non-scalar data structures including JSON, UUIDs, and enumerated types.
Hive is a lightweight NoSQL key-value database written in pure Dart for local data persistence. It functions as a type-safe document store that allows for the saving and retrieval of complex data structures and custom objects. The system distinguishes itself through the use of custom adapters for object serialization and symmetric-key encryption to secure data at rest. For web environments, it provides a persistence layer that wraps IndexedDB and utilizes web workers. The project covers broad capability areas including container management, atomic transactional writes, and indexed data retri
Supports storing non-scalar data structures such as lists and maps while maintaining data integrity.
TypeGPU is a tool for type-safe WebGPU development that enables writing shaders in TypeScript. It translates high-level TypeScript function definitions and structures into WebGPU Shading Language source code to automate shader generation and validate logic using a type system. The project provides a mechanism for cross-library GPU interoperability by sharing typed buffers without copying data to system memory. It also integrates the Model Context Protocol to allow AI agents to inspect generated shader code and diagnose runtime errors. The system manages WebGPU resource mapping through typed
Translates complex data structures into typed binary formats to ensure correct memory alignment during CPU-to-GPU transfer.
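Why alignment matters in a CPU-to-GPU transfer can be shown with a small standard-library sketch. It does not use TypeGPU or its API; the `Particle` struct is hypothetical, and the layout assumes the WGSL rule that `vec3<f32>` has a size of 12 bytes but an alignment of 16, which forces padding that a naive byte-for-byte copy would miss.

```python
import struct

# Hypothetical WGSL struct used only for illustration:
#   struct Particle { mass: f32, velocity: vec3<f32> }
# Assumed layout under WGSL alignment rules:
#   mass      f32        offset 0,  size 4
#   (padding)            offset 4,  size 12  (vec3<f32> must start on a 16-byte boundary)
#   velocity  vec3<f32>  offset 16, size 12
#   (padding)            offset 28, size 4   (struct size rounds up to a multiple of 16)
PARTICLE_LAYOUT = struct.Struct("<f12x3f4x")

def pack_particle(mass, velocity):
    """Serialize one particle into the padded byte layout described above."""
    return PARTICLE_LAYOUT.pack(mass, *velocity)

data = pack_particle(1.5, (1.0, 2.0, 3.0))
print(PARTICLE_LAYOUT.size)  # total bytes per particle, padding included
print(data.hex())
```

Packing the same four floats back to back would take 16 bytes, so a shader reading `velocity` at offset 16 would see data belonging to the next element. Deriving the layout from a typed schema is one way to avoid that class of bug.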