18 repositories
Tools for converting between raw formats, binary representations, and structured objects for transmission or storage.
Explore 18 GitHub repositories matching data & databases · Data Serialization and Parsing.
nanoGPT is a lightweight engine for training and fine-tuning transformer-based language models from scratch. It provides a minimalist codebase designed for educational exploration and rapid experimentation with neural network architectures, utilizing self-attention and feed-forward layers to process sequences and predict subsequent elements. The project distinguishes itself through a focus on high-speed data ingestion and hardware-accelerated performance. It includes a dedicated pipeline for transforming raw text into memory-mapped binary files, which enables efficient streaming during training…
Stores data in memory-mapped binary structures to facilitate rapid sequential access during training.
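A minimal sketch of the memory-mapped token file idea, using only the Python standard library. As far as we know, nanoGPT's own scripts write token IDs as unsigned 16-bit integers and read them back with NumPy's memmap. The helper names `write_tokens` and `read_window` and the sample token IDs below are hypothetical.

```python
import array
import mmap
import os
import tempfile
from typing import List

# Hypothetical layout: token IDs stored as unsigned 16-bit integers ("H")
# in the machine's native byte order, one after another.
ITEM_SIZE = array.array("H").itemsize  # 2 bytes per token


def write_tokens(path: str, tokens: List[int]) -> None:
    with open(path, "wb") as f:
        array.array("H", tokens).tofile(f)


def read_window(path: str, start: int, length: int) -> List[int]:
    # mmap lets the OS page in only the bytes we slice, so a very large
    # token file never has to be loaded into RAM as a whole.
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        raw = mm[start * ITEM_SIZE:(start + length) * ITEM_SIZE]
    window = array.array("H")
    window.frombytes(raw)
    return window.tolist()


path = os.path.join(tempfile.mkdtemp(), "train.bin")
write_tokens(path, [50256, 15496, 995, 0, 42])
print(read_window(path, 1, 3))  # [15496, 995, 0]
```

A training loop would call something like `read_window` at random offsets to draw contiguous training sequences. This is cheap because each call touches only a few pages of the file.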
Requests is a Python HTTP client library used for sending HTTP requests and handling responses. It serves as a network client providing fundamental components for session management, proxy routing, multi-part uploading, and SSL/TLS certificate verification. The project distinguishes itself through a session manager that maintains cookies and reuses TCP connections to improve network performance. It also includes a dedicated multi-part form uploader for transmitting binary data and an integrated SSL/TLS certificate verifier to ensure encrypted and trusted communication. The library covers a b
Automatically detects character sets and applies decoding logic to convert incoming network data into readable text.
Requests is a high-level HTTP client library designed to simplify web communication and API integration. It provides an intuitive, human-readable interface for performing standard network operations, including request execution, connection pooling, and stateful session management. By encapsulating raw network data into structured objects, the library automates the complexities of headers, cookies, and payload transmission. The library distinguishes itself through a modular transport adapter layer that allows for custom protocol handling and extensible authentication hooks. It supports a wide
Detects character sets automatically or applies manual decoding logic to incoming response data.
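The two Requests entries above both describe turning response bytes into text. The standard-library sketch below shows the general idea. It is not Requests' implementation: Requests exposes this through `Response.encoding` and `Response.text`, and it can guess a charset via `Response.apparent_encoding`. The `decode_body` helper and its UTF-8-then-Latin-1 fallback are assumptions.

```python
from email.message import Message
from typing import Optional


def decode_body(body: bytes, content_type: str, override: Optional[str] = None) -> str:
    """Hypothetical helper: turn raw response bytes into text."""
    if override:
        # Manual decoding: a caller-chosen codec always wins.
        return body.decode(override)
    # Reuse the stdlib MIME header parser to extract the charset parameter.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lowercased, or None if absent
    if charset:
        return body.decode(charset, errors="replace")
    # No declared charset: try UTF-8 first, then Latin-1, which accepts any byte.
    try:
        return body.decode("utf-8")
    except UnicodeDecodeError:
        return body.decode("latin-1")


print(decode_body("café".encode("utf-8"), "text/html; charset=UTF-8"))  # café
print(decode_body("café".encode("latin-1"), "text/plain"))              # café
```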
Guzzle is a PHP HTTP client used for sending synchronous and asynchronous requests to web services. It serves as a concurrent HTTP request manager, an HTTP stream handler, and a middleware-based HTTP pipeline. The project is a PSR-7 compliant client, utilizing standardized PHP interfaces for requests, responses, and streams. The library differentiates itself through a customizable functional handler stack that allows for the interception and modification of the request and response lifecycle. It features an adapter-based transport system that enables swapping between network implementations,
Automatically decodes response bodies based on encoding headers to ensure readable data.
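Guzzle itself is PHP. The Python sketch below only illustrates the technique: undo, in reverse order, each coding listed in a `Content-Encoding` header. The `decode_content` function is hypothetical and handles only `gzip`, `deflate`, and `identity`.

```python
import gzip
import zlib


def decode_content(body: bytes, content_encoding: str) -> bytes:
    """Hypothetical helper: reverse the codings named in a Content-Encoding header."""
    # Codings are listed in the order they were applied, so undo them last-first.
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            body = gzip.decompress(body)
        elif coding == "deflate":
            # HTTP "deflate" means zlib-wrapped data; servers that send raw
            # deflate instead would need zlib.decompress(body, -15).
            body = zlib.decompress(body)
        elif coding != "identity":
            raise ValueError(f"unsupported content-coding: {coding}")
    return body


payload = b'{"status": "ok"}'
wire = gzip.compress(zlib.compress(payload))  # deflate applied first, then gzip
print(decode_content(wire, "deflate, gzip"))  # b'{"status": "ok"}'
```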
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Parses raw byte streams into structured events using configurable framing and decoding logic.
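A minimal sketch of framing followed by decoding, assuming newline-delimited framing and a JSON codec. The `NewlineJsonDecoder` class is hypothetical Python, not Vector's own code. The point it shows is that a frame split across two network reads stays buffered until its delimiter arrives.

```python
import json


class NewlineJsonDecoder:
    """Hypothetical decoder: newline framing, then JSON decoding per frame."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, chunk: bytes) -> list:
        self._buffer += chunk
        # Every element except the last is a complete frame; the last one is
        # either empty or a partial frame that must wait for more bytes.
        *frames, self._buffer = self._buffer.split(b"\n")
        return [json.loads(frame) for frame in frames if frame.strip()]


decoder = NewlineJsonDecoder()
print(decoder.feed(b'{"level": "info"}\n{"level": "er'))  # [{'level': 'info'}]
print(decoder.feed(b'ror"}\n'))                            # [{'level': 'error'}]
```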
Ciphey is an automated decryption tool and cryptographic analysis framework designed to identify and reverse encryptions, encodings, and hashes without requiring a known key or cipher. It functions as a hash cracking engine and a heuristic cipher identifier to recover original plaintext from unknown data patterns. The project features a nested encoding resolver that iteratively unwraps multiple layers of encryption and encoding until readable text is reached. It employs a heuristic cryptanalysis workflow to analyze data characteristics and guess the likely encoding scheme or encryption method
Iteratively applies decryption modules to nested data layers until readable plaintext is reached.
Ciphey is an automated decryption and data obfuscation tool designed to identify and reverse complex, multi-layered encoding schemes. By utilizing statistical analysis and probability scoring, the system automatically detects unknown data formats and recovers human-readable plaintext from obfuscated input strings without requiring manual algorithm specification. The tool distinguishes itself through a recursive pipeline that processes nested data structures and strips formatting anomalies or invisible characters to ensure consistent input. It employs a heuristic search and multithreaded execu
Iteratively applies decryption modules to nested data structures until human-readable plaintext is recovered.
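Both Ciphey entries above describe peeling nested layers until plaintext appears. The greedy loop below is a much simpler stand-in for that search. It only knows hex and Base64, and the `looks_like_plaintext` check with its tiny `COMMON_WORDS` set is a hypothetical placeholder for Ciphey's statistical checkers.

```python
import base64
import binascii
import string
from typing import Callable, List, Optional, Tuple

COMMON_WORDS = {"the", "and", "flag", "hello", "world"}  # hypothetical word list


def try_hex(data: str) -> Optional[str]:
    try:
        return bytes.fromhex(data).decode("ascii")
    except (ValueError, UnicodeDecodeError):
        return None


def try_base64(data: str) -> Optional[str]:
    try:
        return base64.b64decode(data, validate=True).decode("ascii")
    except (binascii.Error, ValueError, UnicodeDecodeError):
        return None


DECODERS: List[Tuple[str, Callable[[str], Optional[str]]]] = [
    ("hex", try_hex),  # tried first: hex digits are also valid Base64 characters
    ("base64", try_base64),
]


def looks_like_plaintext(text: str) -> bool:
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return any(w in COMMON_WORDS for w in words)


def unwrap(data: str, max_depth: int = 10) -> Tuple[Optional[str], List[str]]:
    layers: List[str] = []
    for _ in range(max_depth):
        if looks_like_plaintext(data):
            return data, layers
        for name, decoder in DECODERS:
            decoded = decoder(data)
            if decoded is not None and decoded != data:
                data = decoded
                layers.append(name)
                break
        else:
            return None, layers  # no decoder made progress
    return None, layers


secret = base64.b64encode("hello world".encode().hex().encode()).decode()
print(unwrap(secret))  # ('hello world', ['base64', 'hex'])
```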
ip2region is an offline IP geolocation library and framework designed to resolve IPv4 and IPv6 addresses to city-level regional information using local binary data files. It functions as a binary IP database compiler and a cross-language search client, allowing for regional lookups without relying on external APIs. The project distinguishes itself through a specialized binary format that supports high-performance query optimization. It employs adjacent-segment IP merging and deduplicated region storage to minimize the database footprint, while utilizing memory-mapped file caching and vector-i
Converts raw text IP mappings into a specialized binary format designed for high-speed sequential access and offline lookups.
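A toy version of the compile-then-search idea. It assumes IPv4 only, non-overlapping input ranges, and a hypothetical fixed 10-byte record (start IP, end IP, region index) rather than ip2region's real xdb layout. It merges adjacent segments that share a region, stores each region string once, and answers lookups with a binary search over the packed records. The addresses come from the RFC 5737 documentation ranges, and the region names are placeholders.

```python
import ipaddress
import struct
from typing import List, Optional, Tuple

RECORD = struct.Struct("<IIH")  # hypothetical layout: start IP, end IP, region index


def build(ranges: List[Tuple[str, str, str]]) -> Tuple[bytes, List[str]]:
    rows = sorted(
        (int(ipaddress.IPv4Address(s)), int(ipaddress.IPv4Address(e)), region)
        for s, e, region in ranges
    )
    merged: List[list] = []
    for start, end, region in rows:
        # Adjacent-segment merging: extend the previous record when it ends
        # exactly one address before this one and maps to the same region.
        if merged and merged[-1][2] == region and merged[-1][1] + 1 == start:
            merged[-1][1] = end
        else:
            merged.append([start, end, region])
    regions: List[str] = []
    index = {}
    blob = bytearray()
    for start, end, region in merged:
        if region not in index:  # deduplicated region storage
            index[region] = len(regions)
            regions.append(region)
        blob += RECORD.pack(start, end, index[region])
    return bytes(blob), regions


def lookup(blob: bytes, regions: List[str], ip: str) -> Optional[str]:
    target = int(ipaddress.IPv4Address(ip))
    lo, hi = 0, len(blob) // RECORD.size
    while lo < hi:  # binary search over fixed-size, sorted records
        mid = (lo + hi) // 2
        start, end, region_idx = RECORD.unpack_from(blob, mid * RECORD.size)
        if target < start:
            hi = mid
        elif target > end:
            lo = mid + 1
        else:
            return regions[region_idx]
    return None


blob, regions = build([
    ("192.0.2.0", "192.0.2.127", "Region A"),
    ("192.0.2.128", "192.0.2.255", "Region A"),
    ("198.51.100.0", "198.51.100.255", "Region B"),
])
print(len(blob) // RECORD.size, lookup(blob, regions, "192.0.2.200"))  # 2 Region A
```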
Moya is a network abstraction layer for Swift that provides a structured framework for defining and executing REST API requests. It functions as a type-safe API client that decouples network endpoint definitions from the underlying implementation details to prevent configuration errors and URL typos. The project distinguishes itself by using protocol-based endpoint definitions and a provider-coordinated execution model. It includes a system for mapping network responses to strongly typed objects and features a dedicated tool for generating type-safe network interface files from external REST
Converts raw network response data into usable formats using custom decoding logic for specific data types.
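Moya does this in Swift with `Decodable` types. As a language-neutral illustration, the Python sketch below maps a JSON response body onto a dataclass and rejects fields whose JSON type does not match the annotation. The `User` model and the `map_response` helper are hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Type, TypeVar

T = TypeVar("T")


@dataclass
class User:  # hypothetical response model
    id: int
    login: str


def map_response(body: bytes, model: Type[T]) -> T:
    payload = json.loads(body)
    values = {}
    for field in fields(model):
        value = payload[field.name]  # KeyError if the server omitted a field
        # Simple type check; works because these annotations are plain classes.
        if not isinstance(value, field.type):
            raise TypeError(
                f"{field.name}: expected {field.type.__name__}, got {type(value).__name__}"
            )
        values[field.name] = value
    return model(**values)  # unknown JSON keys are ignored


print(map_response(b'{"id": 1, "login": "octocat", "site_admin": false}', User))
# User(id=1, login='octocat')
```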
Pwntools is a Python-based framework designed for rapid prototyping and automation in binary exploitation, reverse engineering, and security research. It serves as a comprehensive toolkit for interacting with local and remote processes, providing the primitives necessary to manage complex exploit workflows and streamline security analysis tasks. The framework distinguishes itself through its specialized capabilities for binary manipulation and automated exploit construction. It includes dedicated utilities for parsing executable file formats, assembling and disassembling machine code, and gen
Packs binary data and generates cyclic patterns to assist in the analysis of buffer overflows.
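pwntools provides this as `cyclic`, `cyclic_find`, and packing helpers such as `p32`. The standard-library sketch below reimplements the underlying idea so it runs without pwntools. It builds a de Bruijn sequence over the lowercase alphabet with windows of length 4, so every 4-byte window is unique. That uniqueness means the bytes found in a crashed register reveal the overflow offset. `make_pattern` and `find_offset` are hypothetical names, not pwntools' code.

```python
import string
import struct
from functools import lru_cache


@lru_cache(maxsize=None)
def de_bruijn(alphabet: str, n: int) -> str:
    """Standard recursive construction: every length-n string over
    `alphabet` appears exactly once (cyclically)."""
    k = len(alphabet)
    a = [0] * (k * n)
    sequence = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in sequence)


def make_pattern(length: int, n: int = 4) -> bytes:
    return de_bruijn(string.ascii_lowercase, n)[:length].encode()


def find_offset(value, n: int = 4) -> int:
    if isinstance(value, int):
        # Pack a 32-bit register value the way a little-endian CPU stores it.
        value = struct.pack("<I", value)
    return make_pattern(len(string.ascii_lowercase) ** n, n).find(value)


pattern = make_pattern(64)
print(pattern[:16])  # b'aaaabaaacaaadaaa'
crashed_eip = struct.unpack("<I", pattern[44:48])[0]  # as if the overflow hit EIP
print(find_offset(crashed_eip))  # 44
```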
This project is a computer vision benchmark and image classification dataset used to measure and compare the accuracy of machine learning models. It provides a standardized collection of labeled fashion product images and training data formatted to be compatible with the MNIST dataset structure. The dataset consists of fixed-dimension grayscale images and label-based category mappings, stored in a binary format. It includes pre-split training and testing sets and a static distribution to ensure consistent cross-model benchmarking. The repository supports image classification benchmarking and
Stores image pixels and category labels in a binary format compatible with the MNIST structure.
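The MNIST-compatible files use the IDX layout. It starts with a 4-byte magic number: two zero bytes, a type code (0x08 means unsigned byte), and the number of dimensions. One big-endian 32-bit size per dimension follows, then the raw data. The real dataset files are additionally gzip-compressed. The minimal parser below is demonstrated on tiny hand-built buffers rather than those files, and `parse_idx` is a hypothetical name.

```python
import struct
from typing import Tuple


def parse_idx(buf: bytes) -> Tuple[Tuple[int, ...], bytes]:
    zero, dtype, ndim = struct.unpack_from(">HBB", buf, 0)
    if zero != 0 or dtype != 0x08:
        raise ValueError("expected IDX data of unsigned bytes")
    dims = struct.unpack_from(f">{ndim}I", buf, 4)  # one big-endian uint32 per dimension
    count = 1
    for d in dims:
        count *= d
    offset = 4 + 4 * ndim
    data = buf[offset:offset + count]
    if len(data) != count:
        raise ValueError("truncated IDX payload")
    return dims, data


# Hand-built examples: 3 labels, and 2 images of 2x2 pixels.
labels = struct.pack(">HBBI", 0, 0x08, 1, 3) + bytes([9, 0, 3])
images = struct.pack(">HBBIII", 0, 0x08, 3, 2, 2, 2) + bytes(range(8))

dims, pixels = parse_idx(images)
rows, cols = dims[1], dims[2]
# Pixel (row 0, col 1) of image 1 sits at 1*rows*cols + 0*cols + 1.
print(dims, pixels[1 * rows * cols + 0 * cols + 1])  # (2, 2, 2) 5
```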
Resty is a high-level HTTP client library for Go designed for consuming REST services. It provides a streamlined interface for executing network requests, managing server-sent event streams, and automatically mapping JSON and XML responses into data structures. The library includes built-in mechanisms for service resilience and traffic management, such as circuit breakers to prevent cascading failures, token-bucket rate limiting, and automated request retries with exponential backoff. It also features client-side load balancing to distribute outgoing traffic across multiple base URLs and requ
Parses incoming response streams into data structures using content-type headers for various object formats.
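Resty is Go. The Python sketch below only illustrates choosing a decoder from the `Content-Type` header: JSON goes to `json.loads` and XML to `xml.etree.ElementTree`. The `parse_body` helper is hypothetical. Treating the `+json` and `+xml` structured-syntax suffixes as JSON and XML is an assumed convention.

```python
import json
import xml.etree.ElementTree as ET
from email.message import Message


def parse_body(body: bytes, content_type: str):
    """Hypothetical helper: pick a decoder based on the Content-Type media type."""
    msg = Message()
    msg["Content-Type"] = content_type
    mime = msg.get_content_type()  # lowercased "type/subtype", parameters removed
    if mime == "application/json" or mime.endswith("+json"):
        return json.loads(body)
    if mime in ("application/xml", "text/xml") or mime.endswith("+xml"):
        return ET.fromstring(body)
    return body  # unknown type: hand back the raw bytes


print(parse_body(b'{"id": 7}', "application/json; charset=utf-8"))  # {'id': 7}
print(parse_body(b"<user><id>7</id></user>", "application/xml").find("id").text)  # 7
```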
Potree is a web-based point cloud rendering engine and viewer designed for visualizing and analyzing massive 3D spatial datasets and LIDAR scans. It functions as a geospatial analysis tool that enables interactive exploration of high-density point clouds directly in a web browser using WebGL. The system uses eye-dome lighting to improve the depth perception of 3D structures and supports virtual reality for immersive spatial exploration. It offers specialized capabilities for documenting 3D scenes through hierarchical annotations and for creating animated virtual camera tours. The platform includes tools for geospatial data analysis, such as distance and area measurements, elevation profiling, and overlaying external shapefiles and GeoPackages. Users can isolate specific features using attribute-based filtering and clipping volumes, while external images can be aligned and synchronized with the point cloud perspective. Potree uses a pre-processed binary format and octree-based spatial indexing to enable asynchronous data streaming and level-of-detail rendering for large-scale datasets.
Converts raw spatial data into an optimized binary format to reduce parsing overhead and accelerate network transfers.
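To see why a pre-processed binary format cuts parsing cost, the sketch below quantizes XYZ points to 32-bit integers relative to an offset and a scale. It then packs them as fixed-size little-endian records that can be read back without any text parsing. The 1 mm scale, the record layout, and the `encode_points`/`decode_points` helpers are assumptions for illustration, not Potree's actual file format.

```python
import struct
from typing import List, Tuple

Point = Tuple[float, float, float]
RECORD = struct.Struct("<iii")  # assumed layout: three int32 coordinates, 12 bytes per point


def encode_points(points: List[Point], scale: float = 0.001) -> Tuple[Point, bytes]:
    # Store coordinates relative to the bounding-box minimum so they fit in int32.
    offset = tuple(min(p[i] for p in points) for i in range(3))
    blob = bytearray()
    for p in points:
        blob += RECORD.pack(*(round((p[i] - offset[i]) / scale) for i in range(3)))
    return offset, bytes(blob)


def decode_points(blob: bytes, offset: Point, scale: float = 0.001) -> List[Point]:
    # Fixed-size records: no tokenizing, just unpack and rescale.
    return [
        tuple(q[i] * scale + offset[i] for i in range(3))
        for q in RECORD.iter_unpack(blob)
    ]


points = [(1000.1234, 2000.5, 3.25), (1001.0, 2000.0, 4.0)]
offset, blob = encode_points(points)
print(len(blob), decode_points(blob, offset))
```

The trade-off is precision: each coordinate is recovered to within half the scale (0.5 mm here) in exchange for compact, parse-free records.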
Racket is a general-purpose, multi-paradigm programming language in the Lisp family, designed for creating languages. It functions as a language workbench, providing a platform for designing and implementing custom programming languages through a flexible system of macros and modules. The system distinguishes itself by offering a comprehensive suite for semantic engineering, enabling the construction of specialized language subsets and educational layers. It includes tools for custom language design, such as lexer and parser generation, as well as the ability to define module expansion rules and to select the language dynamically at read time. The project provides an integrated development environment (IDE) with a built-in editor, a visual debugger, and a software package manager. Its capabilities extend to a general-purpose standard library covering 2D graphics rendering, binary data processing, SQL and deductive database integration, and GUI construction. The environment supports compiling source code into standalone executables for distribution.
Parses Resource Interchange File Format (RIFF) data and writes objects to output ports.
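RIFF files (WAV and AVI, for example) are a sequence of tagged chunks. The file opens with a `RIFF` header: a little-endian 32-bit size and a four-character form type. Each chunk that follows carries a four-character ID, a 32-bit size, the data, and a pad byte when the size is odd. The minimal chunk walker below is written in Python for consistency with the other examples here, not in Racket. `parse_riff` is hypothetical, and it does not descend into nested `LIST` chunks.

```python
import struct
from typing import List, Tuple


def parse_riff(buf: bytes) -> Tuple[str, List[Tuple[str, bytes]]]:
    riff_id, size, form = struct.unpack_from("<4sI4s", buf, 0)
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF file")
    end = min(8 + size, len(buf))  # size counts everything after the first 8 bytes
    chunks = []
    pos = 12
    while pos + 8 <= end:
        chunk_id, chunk_size = struct.unpack_from("<4sI", buf, pos)
        chunks.append((chunk_id.decode("ascii"), buf[pos + 8:pos + 8 + chunk_size]))
        pos += 8 + chunk_size + (chunk_size & 1)  # chunks are padded to even length
    return form.decode("ascii"), chunks


body = (
    b"WAVE"
    + b"fmt " + struct.pack("<I", 3) + b"abc" + b"\x00"  # odd size, so one pad byte
    + b"data" + struct.pack("<I", 2) + b"xy"
)
riff = b"RIFF" + struct.pack("<I", len(body)) + body
print(parse_riff(riff))  # ('WAVE', [('fmt ', b'abc'), ('data', b'xy')])
```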
This is a TOML parser and serializer for the Go language. It serves as a data serialization library and configuration file mapper that encodes and decodes data between Go structures and the TOML configuration format. The library provides interfaces for custom type marshaling, allowing for specialized logic when parsing or serializing specific data types. It transforms structured objects into deterministic TOML documents for storage or transmission. The project covers a broad range of data processing capabilities, including structured value encoding, TOML data generation, and metadata inspect
Provides full-cycle conversion between complex data objects and TOML formatted text.
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Arroyo reads and writes arbitrary binary data as a bytea column for custom processing with UDFs.
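Once raw bytes arrive as a `bytea` column, a UDF is where the custom decoding happens. Arroyo's UDFs are written in Rust. Purely to illustrate the decoding logic, the Python function below unpacks a hypothetical 12-byte record: a little-endian `uint32` sensor ID followed by a `float64` reading. It is then applied row by row to sample byte strings.

```python
import struct
from typing import Tuple

READING = struct.Struct("<Id")  # hypothetical payload: uint32 sensor id, float64 value


def parse_reading(payload: bytes) -> Tuple[int, float]:
    """UDF-style function: one bytea value in, one structured value out."""
    if len(payload) != READING.size:
        raise ValueError(f"expected {READING.size} bytes, got {len(payload)}")
    return READING.unpack(payload)


rows = [READING.pack(7, 21.5), READING.pack(8, -3.0)]
print([parse_reading(row) for row in rows])  # [(7, 21.5), (8, -3.0)]
```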
This project is a comprehensive educational resource and tutorial handbook for building, training, and deploying machine learning models with TensorFlow 2. It serves as a structured learning guide covering fundamental deep learning concepts, including neural network architectures, automatic differentiation, and tensor operations. The handbook provides technical guidance on optimizing execution efficiency through GPU memory management, distributed training, and model quantization. It also includes detailed guides for building high-performance data pipelines and exporting models for production servers, mobile devices, and web browsers. The material covers a wide range of capabilities, including developing models with convolutional and recurrent networks, implementing custom loss functions and layers, and using pre-trained models for transfer learning. It also addresses deployment strategies for edge devices and the use of cloud-based runtimes for hardware acceleration. The resource is implemented as a collection of Jupyter Notebooks.
Covers the use of binary data formats to enable rapid sequential access and processing of large-scale datasets.
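In TensorFlow, the usual binary container is TFRecord, which stores length-prefixed records with CRC checksums. The simplified sketch below keeps only the length prefix (an unsigned 64-bit little-endian length before each payload) and omits the checksums, so it is not TFRecord-compatible. `write_records` and `read_records` are hypothetical helpers. They show why such files can be streamed sequentially without any text parsing.

```python
import os
import struct
import tempfile
from typing import Iterable, Iterator

LENGTH = struct.Struct("<Q")  # 8-byte little-endian record length


def write_records(path: str, records: Iterable[bytes]) -> None:
    with open(path, "wb") as f:
        for record in records:
            f.write(LENGTH.pack(len(record)))
            f.write(record)


def read_records(path: str) -> Iterator[bytes]:
    # Sequential scan: read a length, then exactly that many bytes, repeat.
    with open(path, "rb") as f:
        while True:
            header = f.read(LENGTH.size)
            if not header:
                return  # clean end of file
            if len(header) < LENGTH.size:
                raise ValueError("truncated length prefix")
            (size,) = LENGTH.unpack(header)
            payload = f.read(size)
            if len(payload) < size:
                raise ValueError("truncated record")
            yield payload


path = os.path.join(tempfile.mkdtemp(), "data.records")
write_records(path, [b"first", b"", b"third"])
print(list(read_records(path)))  # [b'first', b'', b'third']
```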
Tippecanoe is a command-line tool used to generate optimized vector tiles for web maps. It converts large-scale geospatial datasets, including GeoJSON, CSV, and Geobuf files, into binary vector tiles or MBTiles SQLite databases. The project is designed to maintain map performance and visual quality across different zoom levels. It achieves this through geospatial data downsampling, which includes simplifying geometries and thinning point density to prevent tile overcrowding and keep tile sizes within specific limits. The tool provides extensive data transformation capabilities, such as attri
Converts Geobuf-encoded geospatial data into a format suitable for vector tile generation.
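Geobuf is a Protocol Buffers encoding of GeoJSON. As far as we understand its design, coordinates are stored as integers scaled by a power of ten, delta-encoded against the previous point, and written as zigzag varints. The standard-library sketch below reproduces only that coordinate trick, with an assumed precision of 6 decimal places. It is not a full Geobuf decoder, and all function names are hypothetical.

```python
from typing import List, Tuple


def zigzag(n: int) -> int:
    return 2 * n if n >= 0 else -2 * n - 1  # small magnitudes -> small unsigned ints


def unzigzag(u: int) -> int:
    return (u >> 1) ^ -(u & 1)


def write_varint(u: int, out: bytearray) -> None:
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)  # low 7 bits, continuation flag set
        u >>= 7
    out.append(u)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_line(coords: List[Tuple[float, float]], precision: int = 6) -> bytes:
    e = 10 ** precision
    out, prev = bytearray(), (0, 0)
    for lon, lat in coords:
        cur = (round(lon * e), round(lat * e))
        for c, p in zip(cur, prev):
            write_varint(zigzag(c - p), out)  # store the delta from the previous point
        prev = cur
    return bytes(out)


def decode_line(buf: bytes, precision: int = 6) -> List[Tuple[float, float]]:
    e = 10 ** precision
    coords, pos, acc = [], 0, [0, 0]
    while pos < len(buf):
        for i in range(2):
            delta, pos = read_varint(buf, pos)
            acc[i] += unzigzag(delta)
        coords.append((acc[0] / e, acc[1] / e))
    return coords


line = [(-122.419416, 37.774929), (-122.419, 37.775)]
blob = encode_line(line)
print(len(blob), decode_line(blob))
```

Because nearby points differ only slightly, most deltas fit in one or two varint bytes instead of eight bytes per raw double.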