18 repositories
Tools for converting between raw formats, binary representations, and structured objects for transmission or storage.
Explore 18 awesome GitHub repositories matching data & databases · Data Serialization and Parsing. Refine with filters or upvote what's useful.
nanoGPT is a lightweight engine for training and fine-tuning transformer-based language models from scratch. It provides a minimalist codebase designed for educational exploration and rapid experimentation with neural network architectures, utilizing self-attention and feed-forward layers to process sequences and predict subsequent elements. The project distinguishes itself through a focus on high-speed data ingestion and hardware-accelerated performance. It includes a dedicated pipeline for transforming raw text into memory-mapped binary files, which enables efficient streaming during traini
Stores data in memory-mapped binary structures to facilitate rapid sequential access during training.
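As a rough illustration of the memory-mapped approach described above, the sketch below writes token ids to a flat binary file and reads a window back through a memory map. It uses only the Python standard library. The 16-bit little-endian layout, the file name, and the helper names `write_tokens`, `read_window`, and `demo` are assumptions made for this example, not the project's actual code.

```python
import mmap
import os
import struct
import tempfile


def write_tokens(path, token_ids):
    """Write token ids as little-endian unsigned 16-bit integers (assumed layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(token_ids)}H", *token_ids))


def read_window(path, start, length):
    """Read `length` token ids starting at index `start` through a memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Each token occupies 2 bytes, so indices are scaled by 2.
            raw = mm[start * 2:(start + length) * 2]
    return list(struct.unpack(f"<{length}H", raw))


def demo():
    """Round-trip a small hypothetical token sequence through a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tokens.bin")
        write_tokens(path, [10, 20, 30, 40, 50])
        return read_window(path, 1, 3)
```

Because the file is mapped rather than read in full, only the pages that back the requested window need to be loaded, which is why this layout suits sequential streaming over large files.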
Requests is a Python HTTP client library used for sending HTTP requests and handling responses. It serves as a network client providing fundamental components for session management, proxy routing, multi-part uploading, and SSL/TLS certificate verification. The project distinguishes itself through a session manager that maintains cookies and reuses TCP connections to improve network performance. It also includes a dedicated multi-part form uploader for transmitting binary data and an integrated SSL/TLS certificate verifier to ensure encrypted and trusted communication. The library covers a b
Automatically detects character sets and applies decoding logic to convert incoming network data into readable text.
Requests is a high-level HTTP client library designed to simplify web communication and API integration. It provides an intuitive, human-readable interface for performing standard network operations, including request execution, connection pooling, and stateful session management. By encapsulating raw network data into structured objects, the library automates the complexities of headers, cookies, and payload transmission. The library distinguishes itself through a modular transport adapter layer that allows for custom protocol handling and extensible authentication hooks. It supports a wide
Detects character sets automatically or applies manual decoding logic to incoming response data.
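The manual decoding path can be sketched with the standard library alone: read the declared charset from a `Content-Type` header and decode the body with it. This covers only the header-driven case, not statistical detection from the bytes themselves, and the helper names `charset_from_content_type` and `decode_body` are hypothetical rather than part of the library.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type header, or a default."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset() or default


def decode_body(body, content_type):
    """Decode raw response bytes using the charset declared in the header."""
    encoding = charset_from_content_type(content_type)
    # Undecodable bytes are replaced rather than raising an error.
    return body.decode(encoding, errors="replace")
```

A header that names an encoding Python does not know would raise `LookupError` in this sketch; a fuller version would fall back to a default in that case.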
Guzzle is a PHP HTTP client used for sending synchronous and asynchronous requests to web services. It serves as a concurrent HTTP request manager, an HTTP stream handler, and a middleware-based HTTP pipeline. The project is a PSR-7 compliant client, utilizing standardized PHP interfaces for requests, responses, and streams. The library differentiates itself through a customizable functional handler stack that allows for the interception and modification of the request and response lifecycle. It features an adapter-based transport system that enables swapping between network implementations,
Automatically decodes response bodies based on encoding headers to ensure readable data.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Parses raw byte streams into structured events using configurable framing and decoding logic.
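The split between framing and decoding can be illustrated in general terms. The sketch below assumes newline-delimited framing and JSON decoding, two common choices picked here for illustration. It is written in Python and does not reflect the project's own configuration or API, and the function names `frame_newline` and `decode_json` are hypothetical.

```python
import json


def frame_newline(chunks):
    """Yield complete newline-delimited frames from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # A chunk may contain zero, one, or several frame boundaries.
        while b"\n" in buffer:
            frame, buffer = buffer.split(b"\n", 1)
            if frame:
                yield frame
    if buffer:
        # Emit any trailing bytes that were not newline-terminated.
        yield buffer


def decode_json(frames):
    """Decode each frame into a structured event (a dict parsed from JSON)."""
    for frame in frames:
        yield json.loads(frame.decode("utf-8"))
```

Keeping the two stages separate means the framer can reassemble events that arrive split across reads, while the decoder can be swapped for another format without touching the framing logic.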
Ciphey is an automated decryption tool and cryptographic analysis framework designed to identify and reverse encryptions, encodings, and hashes without requiring a known key or cipher. It functions as a hash cracking engine and a heuristic cipher identifier to recover original plaintext from unknown data patterns. The project features a nested encoding resolver that iteratively unwraps multiple layers of encryption and encoding until readable text is reached. It employs a heuristic cryptanalysis workflow to analyze data characteristics and guess the likely encoding scheme or encryption method
Iteratively applies decryption modules to nested data layers until readable plaintext is reached.
Ciphey is an automated decryption and data obfuscation tool designed to identify and reverse complex, multi-layered encoding schemes. By utilizing statistical analysis and probability scoring, the system automatically detects unknown data formats and recovers human-readable plaintext from obfuscated input strings without requiring manual algorithm specification. The tool distinguishes itself through a recursive pipeline that processes nested data structures and strips formatting anomalies or invisible characters to ensure consistent input. It employs a heuristic search and multithreaded execu
Iteratively applies decryption modules to nested data structures until human-readable plaintext is recovered.
ip2region is an offline IP geolocation library and framework designed to resolve IPv4 and IPv6 addresses to city-level regional information using local binary data files. It functions as a binary IP database compiler and a cross-language search client, allowing for regional lookups without relying on external APIs. The project distinguishes itself through a specialized binary format that supports high-performance query optimization. It employs adjacent-segment IP merging and deduplicated region storage to minimize the database footprint, while utilizing memory-mapped file caching and vector-i
Converts raw text IP mappings into a specialized binary format designed for high-speed sequential access and offline lookups.
Moya is a network abstraction layer for Swift that provides a structured framework for defining and executing REST API requests. It functions as a type-safe API client that decouples network endpoint definitions from the underlying implementation details to prevent configuration errors and URL typos. The project distinguishes itself by using protocol-based endpoint definitions and a provider-coordinated execution model. It includes a system for mapping network responses to strongly typed objects and features a dedicated tool for generating type-safe network interface files from external REST
Converts raw network response data into usable formats using custom decoding logic for specific data types.
Pwntools is a Python-based framework designed for rapid prototyping and automation in binary exploitation, reverse engineering, and security research. It serves as a comprehensive toolkit for interacting with local and remote processes, providing the primitives necessary to manage complex exploit workflows and streamline security analysis tasks. The framework distinguishes itself through its specialized capabilities for binary manipulation and automated exploit construction. It includes dedicated utilities for parsing executable file formats, assembling and disassembling machine code, and gen
Packs binary data and generates cyclic patterns to assist in the analysis of buffer overflows.
This project is a computer vision benchmark and image classification dataset used to measure and compare the accuracy of machine learning models. It provides a standardized collection of labeled fashion product images and training data formatted to be compatible with the MNIST dataset structure. The dataset consists of fixed-dimension grayscale images and label-based category mappings, stored in a binary format. It includes pre-split training and testing sets and a static distribution to ensure consistent cross-model benchmarking. The repository supports image classification benchmarking and
Stores image pixels and category labels in a binary format compatible with the MNIST structure.
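For reference, the MNIST IDX layout starts with a big-endian header (magic number 2051 for image files, 2049 for label files), followed by the dimension sizes and then unsigned bytes. The sketch below parses that layout from an in-memory buffer. The function names are hypothetical, and it assumes the buffer has already been decompressed if it came from a gzip-compressed file.

```python
import struct


def parse_idx_images(buf):
    """Parse an IDX image buffer into (rows, cols, list of per-image byte strings)."""
    magic, count, rows, cols = struct.unpack_from(">IIII", buf, 0)
    if magic != 2051:
        raise ValueError(f"unexpected image magic number: {magic}")
    size = rows * cols
    offset = 16  # four 32-bit header fields
    images = [buf[offset + i * size: offset + (i + 1) * size] for i in range(count)]
    return rows, cols, images


def parse_idx_labels(buf):
    """Parse an IDX label buffer into a list of integer class labels."""
    magic, count = struct.unpack_from(">II", buf, 0)
    if magic != 2049:
        raise ValueError(f"unexpected label magic number: {magic}")
    return list(buf[8:8 + count])
```

Each returned image is a flat byte string of `rows * cols` grayscale values in row-major order, so pixel `(r, c)` sits at index `r * cols + c`.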
Resty is a high-level HTTP client library for Go designed for consuming REST services. It provides a streamlined interface for executing network requests, managing server-sent event streams, and automatically mapping JSON and XML responses into data structures. The library includes built-in mechanisms for service resilience and traffic management, such as circuit breakers to prevent cascading failures, token-bucket rate limiting, and automated request retries with exponential backoff. It also features client-side load balancing to distribute outgoing traffic across multiple base URLs and requ
Parses incoming response streams into data structures using content-type headers for various object formats.
Potree is a web-based point cloud rendering engine and viewer designed for visualizing and analyzing massive 3D spatial datasets and LIDAR scans. It functions as a geospatial analysis tool that uses WebGL to enable interactive exploration of high-density point clouds directly in the web browser. The system uses eye-dome lighting to enhance depth perception of 3D structures and supports virtual reality for immersive spatial exploration. It provides specialized capabilities for 3D scene documentation through hierarchical annotations and animated camera fly-through tours. The platform includes tools for geospatial data analysis, such as spatial distance and area measurement, elevation profiling, and overlay of external shapefiles and GeoPackages. Users can isolate specific features using attribute-based filtering and clipping volume isolation, while external images can be aligned and synchronized with the point cloud perspective. Potree uses a pre-processed binary format and octree-based spatial indexing to facilitate asynchronous data streaming and level-of-detail rendering for large-scale datasets.
Converts raw spatial data into an optimized binary format to reduce parsing overhead and accelerate network transfers.
Racket is a general-purpose, multi-paradigm programming language in the Lisp family designed for language creation. It functions as a language workbench, providing a platform for designing and implementing custom programming languages through a flexible system of macros and modules. The system distinguishes itself by offering a comprehensive suite for semantics engineering, allowing the creation of specialized language subsets and educational layers. It includes tools for custom language design, such as lexer and parser generation, as well as the ability to define module expansion rules and dynamic language selection at read time. The project provides an integrated development environment with a built-in editor, visual debugger, and software package manager. Its capability surface extends to a general-purpose standard library covering 2D graphics rendering, binary data processing, SQL and deductive database integration, and the construction of graphical user interfaces. The environment supports compiling source code into standalone executables for distribution.
Provides capabilities to parse Resource Interchange File Format data and write objects to output ports.
This is a TOML parser and serializer for the Go language. It serves as a data serialization library and configuration file mapper that encodes and decodes data between Go structures and the TOML configuration format. The library provides interfaces for custom type marshaling, allowing for specialized logic when parsing or serializing specific data types. It transforms structured objects into deterministic TOML documents for storage or transmission. The project covers a broad range of data processing capabilities, including structured value encoding, TOML data generation, and metadata inspect
Provides full-cycle conversion between complex data objects and TOML formatted text.
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Arroyo reads and writes arbitrary binary data as a bytea column for custom processing with UDFs.
This project is a comprehensive educational resource and tutorial handbook for building, training, and deploying machine learning models using TensorFlow 2. It serves as a structured learning guide covering core deep learning concepts, including neural network architectures, automatic differentiation, and tensor operations. The handbook provides technical guidance on optimizing execution efficiency through GPU memory management, distributed training, and model quantization. It also includes detailed manuals for constructing high-performance data pipelines and exporting models for production s
Covers the use of binary data formats to enable rapid sequential access and processing of large-scale datasets.
Tippecanoe is a command-line tool used to generate optimized vector tiles for web maps. It converts large-scale geospatial datasets, including GeoJSON, CSV, and Geobuf files, into binary vector tiles or MBTiles SQLite databases. The project is designed to maintain map performance and visual quality across different zoom levels. It achieves this through geospatial data downsampling, which includes simplifying geometries and thinning point density to prevent tile overcrowding and keep tile sizes within specific limits. The tool provides extensive data transformation capabilities, such as attri
Converts Geobuf-encoded geospatial data into a format suitable for vector tile generation.