13 repositories
Formats and methods for encoding and decoding data for storage or transmission.
Distinguishing note: Focuses on JSON processing.
Explore 13 GitHub repositories matching Data & Databases · Data Serialization.
This project is a structured educational resource designed to guide developers through the mastery of the JavaScript programming language. It utilizes a progressive curriculum that organizes technical concepts into a daily learning path, allowing students to build foundational knowledge before advancing to complex application development. The resource distinguishes itself through a hands-on training model that combines detailed explanations with practical code challenges. By focusing on an interactive learning experience, it reinforces core language principles—such as data types, functional p
Covers JSON data processing for web applications.
This project is a comprehensive platform for quantitative investment research, machine learning, and algorithmic trading. It provides an end-to-end environment for developing, testing, and executing financial strategies, supporting the entire lifecycle from data ingestion and feature engineering to model training and backtesting. The system is distinguished by its configuration-driven workflow orchestration, which allows researchers to automate complex pipelines and manage experiments through declarative files. It features a high-performance data infrastructure that utilizes custom binary for
Provides mechanisms to store and reload complex datasets and models to disk for persistent research workflows.
Avalonia is a cross-platform desktop framework that enables the creation of native-feeling applications for Windows, macOS, and Linux from a single codebase. It functions as a declarative UI toolkit, allowing developers to define complex visual hierarchies and interface structures using a markup-based syntax that maps directly to underlying object properties. By utilizing the Model-View-ViewModel architectural pattern, the framework facilitates a clean separation between application logic and user interface layout, which simplifies unit testing and component maintenance. The framework disting
Serializes and deserializes clipboard data using custom mechanisms to handle object data.
This project is a generative speech synthesis engine that converts text into high-fidelity human speech. It utilizes a two-stage autoregressive transformer architecture that separates semantic token prediction from acoustic detail reconstruction to balance linguistic accuracy with audio quality. The system is designed to support multilingual output and conversational AI development, enabling the generation of context-aware speech that maintains flow across multiple dialogue turns. The platform distinguishes itself through a production-ready inference server that employs continuous batching to
Provides utilities to pack audio and text data into structured formats for training.
Sanic is an asynchronous Python web framework designed for building high-performance APIs and services. It operates as a production-ready ASGI web server, utilizing a non-blocking event loop to handle concurrent requests and maximize throughput. The framework is built to support scalable architectures, offering built-in worker process management to distribute traffic across available CPU cores. What distinguishes Sanic is its focus on modularity and developer-centric tooling. It features a blueprint-based system for organizing complex applications into pluggable components, alongside a robust
Defines custom functions for serializing and deserializing data formats like JSON to meet specific requirements.
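As a rough illustration of what custom JSON serialization functions can look like, the sketch below uses only Python's standard `json` module. The names `encode_extra`, `custom_dumps`, and `custom_loads` are hypothetical and are not part of Sanic's API; how such functions are registered with the framework is not shown here.

```python
import json
from datetime import datetime, timezone
from decimal import Decimal


def encode_extra(obj):
    """Fallback for types the stdlib encoder rejects (hypothetical helper)."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Decimal):
        return str(obj)
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def custom_dumps(payload):
    # Compact separators keep response bodies small.
    return json.dumps(payload, default=encode_extra, separators=(",", ":"))


def custom_loads(text):
    # Parse floats as Decimal to avoid binary floating-point rounding.
    return json.loads(text, parse_float=Decimal)


body = custom_dumps({
    "at": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "price": Decimal("9.99"),
})
```

The `default` hook is called only for objects the encoder cannot handle natively, so ordinary dicts, lists, strings, and numbers pass through unchanged.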
fq is a command-line binary data processor used for decoding, transforming, and analyzing raw byte streams and bit-level data into structured formats. It functions as a functional binary query engine that allows for filtering and mapping binary structures, as well as a converter that translates complex binary blobs and proprietary file formats into standard JSON, YAML, or XML. The tool distinguishes itself as a low-level bit manipulator capable of performing bit-level slicing, bitwise operations, and cryptographic hashing on raw files. It also serves as a network protocol analyzer with the ab
Decodes Avro Object Container Format files using compression codecs to inspect stored data.
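For orientation, an Avro Object Container File starts with the four bytes `Obj\x01`, followed by a metadata map (typically holding `avro.schema` and `avro.codec`) and a 16-byte sync marker. The sketch below reads just that header with the standard library. It is not fq's implementation, and the function names are hypothetical.

```python
import io

OCF_MAGIC = b"Obj\x01"


def read_long(buf):
    """Decode one Avro long: base-128 varint, then zigzag."""
    shift = 0
    result = 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("truncated varint")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_bytes(buf):
    """Avro bytes/string: a long length followed by that many bytes."""
    return buf.read(read_long(buf))


def read_ocf_header(data):
    """Return (metadata dict, sync marker) from the start of an OCF file."""
    buf = io.BytesIO(data)
    if buf.read(4) != OCF_MAGIC:
        raise ValueError("not an Avro Object Container File")
    meta = {}
    while True:
        count = read_long(buf)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    return meta, buf.read(16)


# Hand-built header: one metadata entry, then an all-zero sync marker.
sample = OCF_MAGIC + b"\x02" + b"\x14avro.codec" + b"\x08null" + b"\x00" + bytes(16)
meta, sync = read_ocf_header(sample)
```

The `avro.codec` value tells a reader which compression codec to apply to the data blocks that follow the header.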
RisingWave is a cloud-native streaming database and real-time analytics engine that uses standard SQL to process continuous data streams. It functions as a streaming data lakehouse, combining the capabilities of a streaming SQL database with a platform that integrates streaming ingestion with open table formats. The system is distinguished by its use of the PostgreSQL wire protocol, allowing it to integrate with existing SQL tools and drivers. It employs a decoupled compute and storage architecture, persisting streaming state and materialized views in cloud object storage to enable independen
Parses Avro-serialized data using a schema registry to enable seamless data exchange between different languages.
Materialize is a streaming SQL database that continuously ingests live data from sources such as Kafka, Redpanda, PostgreSQL, and MySQL, and incrementally maintains materialized views. It provides a PostgreSQL-compatible query engine that accepts standard SQL over the PostgreSQL wire protocol, enabling any existing SQL client or BI tool to query real-time data. The system also includes a Model Context Protocol (MCP) server that exposes live materialized view data to AI agents, providing fresh context without polling. Materialize distinguishes itself through its ability to offer configurable c
Decodes Avro messages from Kafka topics using Confluent Schema Registry schemas for typed SQL columns.
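Messages written with Confluent Schema Registry serializers carry a small frame in front of the Avro payload: one magic byte (0) and a 4-byte big-endian schema ID. The sketch below splits that frame with the standard library. The function name is hypothetical, and fetching the schema from the registry is left out.

```python
import struct

MAGIC_BYTE = 0


def split_confluent_frame(message: bytes):
    """Return (schema_id, avro_payload) from a Confluent-framed message."""
    if len(message) < 5:
        raise ValueError("message too short for Confluent framing")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


schema_id, payload = split_confluent_frame(b"\x00\x00\x00\x00\x2a\x06")
```

A consumer would then look up `schema_id` in the registry and decode `payload` against the returned writer schema.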
Apache Hive is a SQL-on-Hadoop data warehouse that enables querying and managing petabytes of data stored in distributed storage such as HDFS and cloud storage services. It provides a familiar SQL interface for batch analytics and reporting, supported by a core set of components including the HiveServer2 Thrift service for remote query execution, the Hive Metastore Service for central metadata management, the Hive ACID Transaction Engine for concurrent read-write operations, and the Hive LLAP Interactive Engine for low-latency analytical processing. The WebHCat REST API offers an HTTP interfac
Reads and writes Avro-encoded data as Hive tables, inferring the table schema from the Avro schema and supporting nested structures.
CloudEvents is an open specification for describing event data in a common format across cloud platforms and services. It defines a standard structure and set of metadata attributes for events, enabling interoperability across different systems so producers and consumers can exchange events without custom translation. The specification provides a protocol-agnostic serialization framework that maps CloudEvents attributes and payloads to multiple serialization formats including JSON, Avro, and Protobuf, and defines transport bindings for mapping events onto protocols like HTTP, AMQP, Kafka, MQTT
Defines the type mapping table for serializing CloudEvents attributes into Avro primitives.
kcat is a command-line client for Apache Kafka used to produce, consume, and debug messages over the native wire protocol. It provides a suite of tools for interacting with Kafka clusters, including a protocol debugger for inspecting cluster metadata and a transaction manager for handling atomic message batches. The project features a specialized Avro schema decoder that converts binary-encoded messages into human-readable JSON by integrating with remote schema registries or local files. In addition, it includes an in-memory simulator for testing producer and consumer logic by simulating ephemeral broker behavior without requiring external infrastructure. The toolset covers a wide range of messaging operations, including support for balanced consumer groups, timestamp-based offset lookup, and streaming transactional data from standard input. It also provides utilities for configuring connection security and inspecting cluster metadata.
Transforms binary Avro message keys and values into human-readable JSON text.
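To make the binary-to-JSON step concrete, the sketch below decodes a flat Avro record into JSON text using only the standard library. It handles just `int`, `long`, and `string` fields, the schema and function names are hypothetical, and it is not how kcat itself is implemented.

```python
import io
import json


def read_long(buf):
    """Decode one Avro int/long: base-128 varint, then zigzag."""
    shift = 0
    result = 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("truncated varint")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)


def read_string(buf):
    """Avro string: a long byte length followed by UTF-8 data."""
    return buf.read(read_long(buf)).decode("utf-8")


READERS = {"int": read_long, "long": read_long, "string": read_string}


def avro_record_to_json(schema, payload):
    """Decode record fields in schema order and render them as JSON."""
    buf = io.BytesIO(payload)
    record = {f["name"]: READERS[f["type"]](buf) for f in schema["fields"]}
    return json.dumps(record)


# Hypothetical schema for illustration only.
user_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
    ],
}

text = avro_record_to_json(user_schema, b"\x06\x04hi")
```

Avro records carry no field names or type tags on the wire, which is why the decoder needs the schema to know how many bytes each field occupies.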
Racket is a general-purpose, multi-paradigm programming language in the Lisp family, designed for creating languages. It functions as a language workbench, providing a platform for designing and implementing custom programming languages through a flexible system of macros and modules. The system distinguishes itself by offering a complete suite for semantic engineering, enabling the construction of specialized language subsets and teaching layers. It includes tools for custom language design, such as lexer and parser generation, as well as the ability to define module expansion rules and dynamic language selection at read time. The project provides an integrated development environment with a built-in editor, a visual debugger, and a software package manager. Its capabilities extend to a general-purpose standard library covering 2D graphics rendering, binary data processing, SQL and deductive database integration, and graphical user interface construction. The environment supports compiling source code into standalone executables for distribution.
Provides serialization and deserialization of data using the Apache Avro protocol based on JSON schemas.
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Arroyo reads and writes Avro binary data, supporting Confluent Schema Registry and flexible serialization modes for schema distribution.