15 repositories
Tools for converting datasets between different framework formats.
Distinguishing note: Focuses on ecosystem-wide data format conversion.
15 GitHub repositories matching Data & Databases · Data Format Interoperability.
Ray is a distributed computing framework designed to scale Python and Java applications across clusters by abstracting task scheduling and resource management. It functions as a resource-aware execution engine that manages task dependencies, placement, and fault tolerance across networked compute nodes. At its core, the system provides a stateful actor model, allowing developers to define classes that run in dedicated processes to maintain and mutate internal state across remote method calls. The framework distinguishes itself through a robust cross-language interoperability layer, enabling f
Transforms distributed datasets into other framework formats to enable interoperability with different data processing ecosystems.
This project is a machine learning array framework and tensor computation library designed for high-performance numerical computing. It provides a comprehensive suite of tools for constructing and training neural networks, featuring an automatic differentiation engine that facilitates gradient-based optimization and complex mathematical modeling. The library distinguishes itself through a unified memory architecture that allows data to be shared across CPU and GPU devices without explicit copies, significantly reducing data movement overhead. Its execution model relies on a lazy evaluation en
Enables data exchange between frameworks using buffer protocols and standard serialization formats.
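As a rough illustration of what exchanging data through a buffer protocol means, the sketch below uses only the Python standard library (`array` and `memoryview`), not this project's own API. The variable names are illustrative assumptions. It shows a consumer reading and writing a producer's memory without copying it.

```python
import array

# A producer that owns a block of float64 values.
source = array.array("d", [1.0, 2.0, 3.0, 4.0])

# memoryview requests the producer's buffer; no bytes are copied.
view = memoryview(source)

# A consumer can reinterpret the same memory, here as raw unsigned bytes.
raw = view.cast("B")

# A write through the view is visible to the original owner,
# which demonstrates that both names refer to the same memory.
view[0] = 10.0

shared = source[0] == 10.0
nbytes = raw.nbytes
```

Array libraries that implement the same protocol can hand buffers to each other in a similar way, which is what avoids a serialization round trip.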
Toon is a data serialization library and toolkit designed to convert complex objects into compact, human-readable formats optimized for large language models. By focusing on token efficiency, the library minimizes the context window footprint of structured data through techniques like key folding and tabular layout optimization. It provides a streaming-capable processor that handles the encoding and decoding of hierarchical data while maintaining structural integrity. The project distinguishes itself through its path-aware transformation pipeline and configurable serialization logic, which al
Transforms data between standard formats and compact representations to facilitate efficient data processing pipelines.
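The idea behind a tabular layout can be sketched as follows. This is a simplified illustration and not the library's actual syntax or API: the `to_table` helper is hypothetical, and it assumes uniform records whose values contain no commas or newlines. Field names are written once instead of being repeated for every record.

```python
def to_table(rows: list[dict]) -> str:
    """Emit the field names once, then one comma-separated line per record."""
    fields = list(rows[0])
    # The layout only works when every record has the same fields in the same order.
    if any(list(row) != fields for row in rows):
        raise ValueError("rows must share the same fields in the same order")
    lines = [",".join(fields)]
    lines += [",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join(lines)


rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]
table = to_table(rows)
```

Compared with a JSON array of objects, the keys `id` and `name` appear once rather than once per record, which is where the size reduction comes from.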
Fx is a command-line processing suite designed for the transformation, conversion, exploration, and visualization of structured data. It functions as a terminal-based utility that handles both automated shell pipelines and interactive navigation of complex, nested data hierarchies. The tool distinguishes itself by integrating a JavaScript-based engine that executes user-provided logic to filter, map, or modify data fields within a sandboxed runtime. It maintains a responsive interface by decoupling data processing from the display loop, allowing users to explore large datasets through an inte
Translates non-JSON formats like YAML or TOML into structured data for consistent processing.
Arrow is a cross-language development platform for in-memory data. It provides a standardized, language-independent columnar memory format designed to accelerate analytical operations and improve memory efficiency on modern computing hardware. By utilizing a schema-driven approach, the framework enables the efficient organization of both flat and nested data structures. The project functions as an analytical data processing engine that facilitates high-performance computation directly on memory-resident datasets. It distinguishes itself through a zero-copy architecture, which allows multiple
Provides a standardized interface for reading and writing data across diverse file formats like Parquet, ORC, and CSV.
This tool is a command-line processor designed for querying, updating, and transforming structured data files. It functions as a versatile engine for manipulating YAML, JSON, TOML, and XML documents, allowing users to perform complex operations directly from the terminal. By utilizing a path-based expression language, it enables precise navigation and modification of data structures within configuration files and infrastructure-as-code workflows. What distinguishes this tool is its ability to perform in-place document mutations while preserving original formatting, comments, and metadata. It
Facilitates seamless data translation between YAML, JSON, TOML, and XML formats.
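A path-based expression can be pictured as a sequence of keys and indexes walked through a nested document. The sketch below is a minimal stand-in in plain Python, not the tool's expression language: `get_path` and `set_path` are hypothetical helpers, the dotted-path syntax is an assumption, and nothing here preserves comments or formatting.

```python
from typing import Any


def get_path(doc: Any, path: str) -> Any:
    """Follow a dotted path such as 'services.web.ports.0' through dicts and lists."""
    node = doc
    for part in path.split("."):
        # List elements are addressed by integer position, mappings by key.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def set_path(doc: Any, path: str, value: Any) -> None:
    """Assign value at a dotted path, mutating doc in place."""
    *parents, last = path.split(".")
    node = get_path(doc, ".".join(parents)) if parents else doc
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value


config = {"services": {"web": {"image": "nginx:1.25", "ports": [80, 443]}}}
set_path(config, "services.web.ports.0", 8080)
```

After the call, `get_path(config, "services.web.ports.0")` returns the updated value while the rest of the document is untouched.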
Miller is a command-line data processor used for filtering, transforming, and aggregating name-indexed tabular data. It functions as a tool for querying and reshaping records across multiple file formats, serving as a converter between CSV, JSON, and YAML. The tool distinguishes itself by using a name-indexed data model, allowing users to manipulate fields by name rather than numeric position. It utilizes single-pass streaming algorithms to compute statistics and summaries on large datasets that exceed available system memory. Its capabilities cover data transformation and analysis, includin
Transforms data between formats such as CSV, JSON, and YAML to move information between systems.
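The name-indexed, single-pass model described above can be approximated with the Python standard library. This is a sketch of the technique, not Miller itself: the sample data and the `csv_to_json_lines` and `mean_by` helpers are illustrative assumptions. Fields are addressed by column name, and the aggregation keeps only a running count and sum per group, so memory use does not grow with the number of rows.

```python
import csv
import io
import json

CSV_TEXT = """host,status,latency_ms
a,200,12
b,500,40
a,200,18
"""


def csv_to_json_lines(stream):
    """Yield one JSON object per CSV row, keyed by column name."""
    for record in csv.DictReader(stream):
        yield json.dumps(record)


def mean_by(stream, key, field):
    """Single pass over the rows, keeping a running (count, sum) per group."""
    totals = {}
    for record in csv.DictReader(stream):
        count, total = totals.get(record[key], (0, 0.0))
        totals[record[key]] = (count + 1, total + float(record[field]))
    return {group: total / count for group, (count, total) in totals.items()}


json_lines = list(csv_to_json_lines(io.StringIO(CSV_TEXT)))
means = mean_by(io.StringIO(CSV_TEXT), "host", "latency_ms")
```

Because rows are consumed one at a time, the same functions would work on a file handle for input larger than available memory.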
A polyglot web converter.
Converts structured data between JSON, YAML, TOML, XML, and Markdown formats.
Darts is a Python time series library designed for forecasting, anomaly detection, and the preprocessing of univariate and multivariate temporal data. It serves as a comprehensive framework for training and evaluating a wide range of statistical, machine learning, and deep learning models to predict future numerical values. The toolkit is distinguished by its support for global time series modeling, allowing a single model to be trained across multiple different series to leverage shared patterns. It also features a hierarchical time series manager to ensure consistency between aggregate and
Implements a conversion layer to move time series objects between pandas, polars, numpy, pyarrow, and xarray formats.
Delta is a lakehouse table format that brings ACID transactions and data warehouse consistency to large scale data lakes on cloud object storage. It serves as an ACID transaction manager, coordinating atomic commits and serializable isolation for concurrent reads and writes across distributed compute engines. The project provides a multi-engine interoperability layer that uses format translation to allow diverse SQL engines and processing frameworks to read and write the same tables. It functions as a data versioning system, utilizing a transaction log to enable time travel, historical snapsh
Translates data between different lakehouse formats to ensure compatibility across ecosystem standards.
Solid is a protocol and ecosystem for decentralized web applications that separates application logic from data storage. It enables users to store and control their personal information in personal online data stores, known as Pods, ensuring that individuals own their data rather than the applications they use. The project provides a framework for decentralized identity and authentication using WebID and OpenID Connect, decoupling identity from central providers. It implements a resource-level permission system via Web Access Control, allowing users to grant or deny read, write, and append ac
Enables moving information between different personal data stores using standard linked data formats.
Romm is a self-hosted game library manager and ROM management web interface. It serves as a central server for storing and categorizing game files and emulator firmware, providing a web-based browser to organize collections through automated library scanning and metadata retrieval. The project distinguishes itself by integrating a web-based emulator frontend that uses WebAssembly to play games directly in the browser. It further provides a game save synchronization server that uses SSH-based synchronization to transfer save states and progress between the server and registered handheld device
Handles game information using standardized formats like gamelist.xml to ensure compatibility with other frontend managers.
OpenUSD is a framework for authoring and exchanging scalable, time-sampled 3D scene data across digital content creation tools. It serves as a 3D scene interchange standard and interoperable data format to ensure consistency and compatibility when transferring environments between diverse graphics applications. The project functions as an incremental 3D data streamer, transmitting scene descriptions in small increments to allow the loading and display of visual content without interrupting the user experience. It provides capabilities for 3D scene authoring, standardized data exchange, and t
Provides a standardized data format that ensures consistency and compatibility when transferring 3D scenes.
MessagePack-CSharp is a high-performance binary serializer for .NET that converts C# objects to and from the compact MessagePack format. It uses compile-time source generation to produce AOT-safe formatters and resolvers, eliminating runtime reflection and enabling ahead-of-time compilation scenarios. The serializer encodes object fields as integer indices instead of string keys, producing compact binary output with deterministic field ordering, and provides stack-allocated reader and writer structs for direct encoding and decoding of MessagePack primitives without heap allocations. The libra
Converts between MessagePack binary and JSON for debugging and interoperability.
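The effect of encoding fields by integer index rather than by string key can be shown with a small Python sketch. It uses JSON text as a stand-in for the binary format and is not the library's API: `FIELDS`, `to_indexed`, and `from_indexed` are hypothetical names, and the fixed field order is an assumption both sides must share.

```python
import json

# Fixed field order agreed on by writer and reader (an assumption of this sketch).
FIELDS = ("id", "name", "active")


def to_indexed(obj: dict) -> list:
    """Keyed form to positional form: field i is stored at index i."""
    return [obj[name] for name in FIELDS]


def from_indexed(values: list) -> dict:
    """Positional form back to keyed form, using the shared field order."""
    return dict(zip(FIELDS, values))


user = {"id": 7, "name": "Ada", "active": True}
keyed = json.dumps(user, separators=(",", ":"))
indexed = json.dumps(to_indexed(user), separators=(",", ":"))
```

The positional form omits every key string, so it is shorter, and its field order is deterministic. The trade-off is that the payload is no longer self-describing, which is one reason a conversion to JSON is useful for debugging.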
MakerSkillTree is an educational roadmap designer and interactive skill map visualizer. It provides a system for creating, exporting, and navigating structured learning paths through an SVG skill tree generator and a corresponding YAML learning path schema. The project features a drag-and-drop interface for designing custom skill trees and a bidirectional conversion system that translates visual layouts between SVG and YAML formats. This allows for data-driven version tracking and the generation of changelogs between different iterations of a skill tree. The system supports the visualization
Facilitates data interoperability through bidirectional transformation between SVG and YAML formats.