18 repositories
Tools for converting between raw formats, binary representations, and structured objects for transmission or storage.
Explore 18 GitHub repositories matching Data & Databases · Data Serialization and Parsing.
nanoGPT is a lightweight engine for training and fine-tuning transformer-based language models from scratch. It provides a minimalist codebase designed for educational exploration and rapid experimentation with neural network architectures, utilizing self-attention and feed-forward layers to process sequences and predict subsequent elements. The project distinguishes itself through a focus on high-speed data ingestion and hardware-accelerated performance. It includes a dedicated pipeline for transforming raw text into memory-mapped binary files, which enables efficient streaming during traini
Stores data in memory-mapped binary structures to facilitate rapid sequential access during training.
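As a rough illustration of the memory-mapped approach described above, the following Python sketch writes token IDs to a flat binary file and reads a slice back through `mmap`. The file name, the 16-bit token width, and the helper names are assumptions made for this example, not nanoGPT's actual code.

```python
import mmap
import os
import struct
import tempfile

TOKEN_FMT = "<H"  # assumption: each token ID is a little-endian uint16
TOKEN_SIZE = struct.calcsize(TOKEN_FMT)


def write_tokens(path, tokens):
    """Write token IDs back to back as fixed-width binary values."""
    with open(path, "wb") as f:
        for tok in tokens:
            f.write(struct.pack(TOKEN_FMT, tok))


def read_block(path, start, length):
    """Read `length` tokens from index `start` without loading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            lo = start * TOKEN_SIZE
            hi = lo + length * TOKEN_SIZE
            chunk = mm[lo:hi]  # only this byte range is copied out
    return [t[0] for t in struct.iter_unpack(TOKEN_FMT, chunk)]


path = os.path.join(tempfile.mkdtemp(), "train.bin")  # hypothetical file name
write_tokens(path, range(1000))
block = read_block(path, start=10, length=4)
```

Because every token has the same width, the byte offset of any training window is a simple multiplication, which is what makes this layout cheap to stream from.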
Requests is a Python HTTP client library used for sending HTTP requests and handling responses. It serves as a network client providing fundamental components for session management, proxy routing, multi-part uploading, and SSL/TLS certificate verification. The project distinguishes itself through a session manager that maintains cookies and reuses TCP connections to improve network performance. It also includes a dedicated multi-part form uploader for transmitting binary data and an integrated SSL/TLS certificate verifier to ensure encrypted and trusted communication. The library covers a b
Automatically detects character sets and applies decoding logic to convert incoming network data into readable text.
Requests is a high-level HTTP client library designed to simplify web communication and API integration. It provides an intuitive, human-readable interface for performing standard network operations, including request execution, connection pooling, and stateful session management. By encapsulating raw network data into structured objects, the library automates the complexities of headers, cookies, and payload transmission. The library distinguishes itself through a modular transport adapter layer that allows for custom protocol handling and extensible authentication hooks. It supports a wide
Detects character sets automatically or applies manual decoding logic to incoming response data.
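The header-driven half of this behavior can be sketched with the Python standard library alone. This is a minimal illustration, not Requests' implementation: the function names are hypothetical, and statistical charset detection (which the library delegates to a separate detector) is replaced here by a plain UTF-8 fallback.

```python
from email.message import Message


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_body(body, content_type, fallback="utf-8"):
    """Decode bytes using the declared charset, else the fallback."""
    charset = charset_from_header(content_type) or fallback
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: decode leniently instead of failing.
        return body.decode(fallback, errors="replace")


latin = decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1")
plain = decode_body("café".encode("utf-8"), "application/json")
```

A caller who knows better than the server can bypass the header entirely by passing the bytes to `bytes.decode` with an explicit encoding, which corresponds to the manual path mentioned above.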
Guzzle is a PHP HTTP client used for sending synchronous and asynchronous requests to web services. It serves as a concurrent HTTP request manager, an HTTP stream handler, and a middleware-based HTTP pipeline. The project is a PSR-7 compliant client, utilizing standardized PHP interfaces for requests, responses, and streams. The library differentiates itself through a customizable functional handler stack that allows for the interception and modification of the request and response lifecycle. It features an adapter-based transport system that enables swapping between network implementations,
Automatically decodes response bodies based on encoding headers to ensure readable data.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Parses raw byte streams into structured events using configurable framing and decoding logic.
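The split between framing (finding message boundaries) and decoding (turning one message into an event) can be sketched as follows. This is an illustrative Python example, assuming newline-delimited framing and JSON decoding; the class and function names are hypothetical and do not reflect Vector's configuration or internals.

```python
import json


class NewlineFramer:
    """Splits an incoming byte stream into newline-delimited frames."""

    def __init__(self):
        self._buffer = b""

    def feed(self, chunk):
        """Add bytes and return every complete frame received so far."""
        self._buffer += chunk
        # Everything after the last newline is an incomplete frame; keep it.
        *frames, self._buffer = self._buffer.split(b"\n")
        return [f for f in frames if f]


def decode_json(frame):
    """Decode one frame into a structured event (a dict)."""
    return json.loads(frame.decode("utf-8"))


framer = NewlineFramer()
events = []
# Chunks may split a frame anywhere, as they would on a real socket.
chunks = [b'{"level":"info","ms', b'g":"up"}\n{"level":"warn"', b',"msg":"slow"}\n']
for chunk in chunks:
    for frame in framer.feed(chunk):
        events.append(decode_json(frame))
```

Keeping the two stages separate means the framer can be swapped (for example, for length-prefixed frames) without touching the decoder, and vice versa.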
Ciphey is an automated decryption tool and cryptographic analysis framework designed to identify and reverse encryptions, encodings, and hashes without requiring a known key or cipher. It functions as a hash cracking engine and a heuristic cipher identifier to recover original plaintext from unknown data patterns. The project features a nested encoding resolver that iteratively unwraps multiple layers of encryption and encoding until readable text is reached. It employs a heuristic cryptanalysis workflow to analyze data characteristics and guess the likely encoding scheme or encryption method
Iteratively applies decryption modules to nested data layers until readable plaintext is reached.
Ciphey is an automated decryption and data obfuscation tool designed to identify and reverse complex, multi-layered encoding schemes. By utilizing statistical analysis and probability scoring, the system automatically detects unknown data formats and recovers human-readable plaintext from obfuscated input strings without requiring manual algorithm specification. The tool distinguishes itself through a recursive pipeline that processes nested data structures and strips formatting anomalies or invisible characters to ensure consistent input. It employs a heuristic search and multithreaded execu
Iteratively applies decryption modules to nested data structures until human-readable plaintext is recovered.
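The layer-by-layer unwrapping idea can be illustrated with a much simpler loop than the heuristic search the descriptions above mention. The Python sketch below is an assumption-heavy toy: it knows only two encodings (hex and Base64), tries them greedily in a fixed order, and uses a crude "printable text containing a space" test in place of real plaintext detection.

```python
import base64
import string


def from_hex(text):
    return bytes.fromhex(text).decode("utf-8")


def from_base64(text):
    return base64.b64decode(text, validate=True).decode("utf-8")


DECODERS = [from_hex, from_base64]  # tried in this order at every layer


def looks_like_plaintext(text):
    """Crude stand-in for a language check: printable and contains a space."""
    return " " in text and all(c in string.printable for c in text)


def unwrap(text, max_depth=10):
    """Peel encoding layers until the text looks readable, or give up."""
    for _ in range(max_depth):
        if looks_like_plaintext(text):
            return text
        for decoder in DECODERS:
            try:
                text = decoder(text)
                break  # one layer removed; re-check readability
            except ValueError:  # covers bad hex, bad Base64, bad UTF-8
                continue
        else:
            return None  # no decoder applied to this layer
    return text if looks_like_plaintext(text) else None


layered = base64.b64encode(b"attack at dawn").hex()  # Base64, then hex
recovered = unwrap(layered)
```

A greedy loop like this can take a wrong turn when an intermediate string is valid under several encodings, which is one reason a real tool would score candidates and search rather than commit to the first decoder that succeeds.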
ip2region is an offline IP geolocation library and framework designed to resolve IPv4 and IPv6 addresses to city-level regional information using local binary data files. It functions as a binary IP database compiler and a cross-language search client, allowing for regional lookups without relying on external APIs. The project distinguishes itself through a specialized binary format that supports high-performance query optimization. It employs adjacent-segment IP merging and deduplicated region storage to minimize the database footprint, while utilizing memory-mapped file caching and vector-i
Converts raw text IP mappings into a specialized binary format designed for high-speed sequential access and offline lookups.
Moya is a network abstraction layer for Swift that provides a structured framework for defining and executing REST API requests. It functions as a type-safe API client that decouples network endpoint definitions from the underlying implementation details to prevent configuration errors and URL typos. The project distinguishes itself by using protocol-based endpoint definitions and a provider-coordinated execution model. It includes a system for mapping network responses to strongly typed objects and features a dedicated tool for generating type-safe network interface files from external REST
Converts raw network response data into usable formats using custom decoding logic for specific data types.
Pwntools is a Python-based framework designed for rapid prototyping and automation in binary exploitation, reverse engineering, and security research. It serves as a comprehensive toolkit for interacting with local and remote processes, providing the primitives necessary to manage complex exploit workflows and streamline security analysis tasks. The framework distinguishes itself through its specialized capabilities for binary manipulation and automated exploit construction. It includes dedicated utilities for parsing executable file formats, assembling and disassembling machine code, and gen
Packs binary data and generates cyclic patterns to assist in the analysis of buffer overflows.
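Both ideas can be shown with the Python standard library. The sketch below does not use Pwntools itself: `pack_u32`, `unpack_u32`, and `de_bruijn` are local, hypothetical helpers. The pattern generator is the textbook de Bruijn sequence construction, chosen here because every window of `n` characters in its output is unique, so a value recovered from a crash maps back to exactly one offset.

```python
import struct


def pack_u32(value):
    """Pack an integer as a 4-byte little-endian unsigned value."""
    return struct.pack("<I", value)


def unpack_u32(data):
    """Inverse of pack_u32."""
    return struct.unpack("<I", data)[0]


def de_bruijn(alphabet, n):
    """Return a de Bruijn sequence: each length-n window occurs once."""
    k = len(alphabet)
    a = [0] * k * n
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


packed = pack_u32(0xDEADBEEF)
pattern = de_bruijn("abcd", 3)
# Pretend these 3 characters were observed in a register after a crash.
observed = pattern[20:23]
offset = pattern.index(observed)
```

In practice the pattern is sent as input, the overwritten value is read from the crashed process, and the lookup in the last line gives the distance from the start of the buffer to the overwritten location.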
This project is a computer vision benchmark and image classification dataset used to measure and compare the accuracy of machine learning models. It provides a standardized collection of labeled fashion product images and training data formatted to be compatible with the MNIST dataset structure. The dataset consists of fixed-dimension grayscale images and label-based category mappings, stored in a binary format. It includes pre-split training and testing sets and a static distribution to ensure consistent cross-model benchmarking. The repository supports image classification benchmarking and
Stores image pixels and category labels in a binary format compatible with the MNIST structure.
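The MNIST-style (IDX) image layout is simple enough to sketch directly: a big-endian header of four 32-bit integers (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. The Python example below builds and parses a tiny in-memory file under that layout; the function names and the 2×2 sample images are made up for illustration, and real dataset files are typically gzip-compressed and would need decompressing first.

```python
import struct

IMAGE_MAGIC = 2051  # 0x00000803: unsigned-byte data, 3 dimensions
HEADER_FMT = ">IIII"  # magic, count, rows, cols (big-endian)
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def build_idx_images(images, rows, cols):
    """Serialize a list of flat pixel lists into IDX image bytes."""
    header = struct.pack(HEADER_FMT, IMAGE_MAGIC, len(images), rows, cols)
    return header + b"".join(bytes(img) for img in images)


def parse_idx_images(blob):
    """Parse IDX image bytes into (list of flat pixel lists, rows, cols)."""
    magic, count, rows, cols = struct.unpack_from(HEADER_FMT, blob, 0)
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number: {magic}")
    size = rows * cols
    pixels = blob[HEADER_SIZE:]
    if len(pixels) != count * size:
        raise ValueError("pixel data does not match header dimensions")
    images = [list(pixels[i * size:(i + 1) * size]) for i in range(count)]
    return images, rows, cols


sample = [[0, 255, 128, 64], [1, 2, 3, 4]]  # two hypothetical 2x2 images
blob = build_idx_images(sample, rows=2, cols=2)
images, rows, cols = parse_idx_images(blob)
```

Label files follow the same pattern with a shorter header (magic number 2049 and a count), followed by one byte per label.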
Resty is a high-level HTTP client library for Go designed for consuming REST services. It provides a streamlined interface for executing network requests, managing server-sent event streams, and automatically mapping JSON and XML responses into data structures. The library includes built-in mechanisms for service resilience and traffic management, such as circuit breakers to prevent cascading failures, token-bucket rate limiting, and automated request retries with exponential backoff. It also features client-side load balancing to distribute outgoing traffic across multiple base URLs and requ
Parses incoming response streams into data structures using content-type headers for various object formats.
Potree is a web-based point cloud rendering engine and viewer designed for visualizing and analyzing massive 3D spatial datasets and LIDAR scans. It functions as a geospatial analysis tool that enables interactive exploration of high-density point clouds directly in a web browser via WebGL. The system uses eye-dome lighting to improve depth perception of 3D structures and supports virtual reality for immersive spatial exploration. It provides specialized capabilities for documenting 3D scenes through hierarchical annotations and the creation of animated tours with moving cameras. The platform includes tools for geospatial data analysis, such as distance and area measurements, elevation profiles, and the overlay of external shapefiles and geopackages. Users can isolate specific features through attribute-based filtering and clip-volume isolation, while external images can be aligned and synchronized with the point cloud perspective. Potree employs a preprocessed binary format and octree-based spatial indexing to facilitate asynchronous data streaming and level-of-detail rendering for large-scale datasets.
Converts raw spatial data into an optimized binary format to reduce parsing overhead and accelerate network transfers.
Racket is a general-purpose, multi-paradigm programming language in the Lisp family, designed for language creation. It functions as a language workbench, providing a platform for designing and implementing custom programming languages through a flexible system of macros and modules. The system distinguishes itself by offering a comprehensive suite for semantics engineering, enabling the construction of specialized language subsets and educational layers. It includes tools for custom language design, such as lexer and parser generation, as well as the ability to define module expansion rules and dynamic language selection at read time. The project provides an integrated development environment with a built-in editor, visual debugger, and software package manager. Its capability surface extends to a general-purpose standard library covering 2D graphics rendering, binary data processing, integration with SQL and deductive databases, and the construction of graphical user interfaces. The environment supports compiling source code into standalone executables for distribution.
Provides capabilities to parse Resource Interchange File Format data and write objects to output ports.
This is a TOML parser and serializer for the Go language. It serves as a data serialization library and configuration file mapper that encodes and decodes data between Go structures and the TOML configuration format. The library provides interfaces for custom type marshaling, allowing for specialized logic when parsing or serializing specific data types. It transforms structured objects into deterministic TOML documents for storage or transmission. The project covers a broad range of data processing capabilities, including structured value encoding, TOML data generation, and metadata inspect
Provides full-cycle conversion between complex data objects and TOML formatted text.
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Arroyo reads and writes arbitrary binary data as a bytea column for custom processing with UDFs.
This project is a comprehensive educational resource and tutorial handbook for building, training, and deploying machine learning models using TensorFlow 2. It serves as a structured learning guide covering fundamental deep learning concepts, including neural network architectures, automatic differentiation, and tensor operations. The handbook provides technical guidance on optimizing execution efficiency through GPU memory management, distributed training, and model quantization. It also includes detailed guides for building high-performance data pipelines and exporting models for production servers, mobile devices, and web browsers. The material spans a wide range of capabilities, including model development with convolutional and recurrent networks, the implementation of custom loss functions and layers, and the use of pretrained models for transfer learning. It also addresses deployment strategies for edge devices and the use of cloud runtimes for hardware acceleration. The resource is implemented as a collection of Jupyter Notebooks.
Covers the use of binary data formats to enable rapid sequential access and processing of large-scale datasets.
Tippecanoe is a command-line tool used to generate optimized vector tiles for web maps. It converts large-scale geospatial datasets, including GeoJSON, CSV, and Geobuf files, into binary vector tiles or MBTiles SQLite databases. The project is designed to maintain map performance and visual quality across different zoom levels. It achieves this through geospatial data downsampling, which includes simplifying geometries and thinning point density to prevent tile overcrowding and keep tile sizes within specific limits. The tool provides extensive data transformation capabilities, such as attri
Converts Geobuf-encoded geospatial data into a format suitable for vector tile generation.