18 repositories
Tools for converting between raw formats, binary representations, and structured objects for transmission or storage.
Explore 18 GitHub repositories matching Data & Databases · Data Serialization and Parsing.
nanoGPT is a lightweight engine for training and fine-tuning transformer-based language models from scratch. It provides a minimalist codebase designed for educational exploration and rapid experimentation with neural network architectures, utilizing self-attention and feed-forward layers to process sequences and predict subsequent elements. The project distinguishes itself through a focus on high-speed data ingestion and hardware-accelerated performance. It includes a dedicated pipeline for transforming raw text into memory-mapped binary files, which enables efficient streaming during traini
Stores data in memory-mapped binary structures to facilitate rapid sequential access during training.
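The memory-mapped storage idea can be sketched with the Python standard library. This is a minimal illustration, not nanoGPT's own pipeline: the helper names `write_tokens` and `read_window` are hypothetical, and the 16-bit little-endian token width is an assumption made for the example.

```python
import mmap
import os
import struct
import tempfile

def write_tokens(path, token_ids):
    """Write token ids as consecutive little-endian unsigned 16-bit integers (assumed width)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(token_ids)}H", *token_ids))

def read_window(path, start, length):
    """Read `length` tokens starting at token index `start` through a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Each token occupies 2 bytes, so token offsets are scaled by 2.
            raw = mm[start * 2:(start + length) * 2]
    return list(struct.unpack(f"<{length}H", raw))

# Usage: write a small token file, then read a slice without loading the whole file.
path = os.path.join(tempfile.mkdtemp(), "train.bin")
write_tokens(path, list(range(1000)))
window = read_window(path, 10, 4)
```

Only the requested byte range is copied out of the map, which is what makes this layout convenient for streaming batches from files larger than memory.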
Requests is a Python HTTP client library used for sending HTTP requests and handling responses. It serves as a network client providing fundamental components for session management, proxy routing, multi-part uploading, and SSL/TLS certificate verification. The project distinguishes itself through a session manager that maintains cookies and reuses TCP connections to improve network performance. It also includes a dedicated multi-part form uploader for transmitting binary data and an integrated SSL/TLS certificate verifier to ensure encrypted and trusted communication. The library covers a b
Automatically detects character sets and applies decoding logic to convert incoming network data into readable text.
Requests is a high-level HTTP client library designed to simplify web communication and API integration. It provides an intuitive, human-readable interface for performing standard network operations, including request execution, connection pooling, and stateful session management. By encapsulating raw network data into structured objects, the library automates the complexities of headers, cookies, and payload transmission. The library distinguishes itself through a modular transport adapter layer that allows for custom protocol handling and extensible authentication hooks. It supports a wide
Detects character sets automatically or applies manual decoding logic to incoming response data.
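One way to picture automatic versus manual decoding is the fallback chain below. It is a sketch under stated assumptions, not the Requests implementation: `charset_from_content_type` and `decode_body` are hypothetical helpers, and the fallback order (manual override, header charset, UTF-8, Latin-1) is chosen for illustration.

```python
from email.message import Message

def charset_from_content_type(content_type):
    """Extract the charset parameter from a Content-Type header value, if present."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def decode_body(body, content_type=None, override=None):
    """Decode response bytes, trying a manual override first, then the header, then UTF-8."""
    candidates = [override, charset_from_content_type(content_type), "utf-8"]
    for encoding in candidates:
        if not encoding:
            continue
        try:
            return body.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown or mismatched charset: try the next candidate
    # Latin-1 maps every byte to a code point, so this last resort never fails.
    return body.decode("latin-1")
```

A caller that knows the server mislabels its responses would pass `override`, which mirrors the "manual decoding" half of the summary above.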
Guzzle is a PHP HTTP client used for sending synchronous and asynchronous requests to web services. It serves as a concurrent HTTP request manager, an HTTP stream handler, and a middleware-based HTTP pipeline. The project is a PSR-7 compliant client, utilizing standardized PHP interfaces for requests, responses, and streams. The library differentiates itself through a customizable functional handler stack that allows for the interception and modification of the request and response lifecycle. It features an adapter-based transport system that enables swapping between network implementations,
Automatically decodes response bodies based on encoding headers to ensure readable data.
Vector is a high-performance observability data pipeline designed to collect, transform, and route logs, metrics, and traces across distributed infrastructure. It functions as a modular engine that decouples data ingestion from processing and transmission, utilizing a component-based architecture to connect diverse sources to multiple destinations. The project distinguishes itself through a focus on reliability and flow control. It implements backpressure-aware data movement to prevent data loss during traffic spikes and utilizes disk-backed event buffering to ensure durability during network
Parses raw byte streams into structured events using configurable framing and decoding logic.
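Framing and decoding are separate steps: framing cuts a byte stream into messages, and decoding turns each message into a structured event. The sketch below assumes newline-delimited framing and JSON decoding as one possible configuration; the function names are hypothetical and this is not Vector's code.

```python
import json

def frame_newline_delimited(buffer):
    """Split buffered bytes into complete frames; return (frames, unfinished remainder)."""
    *frames, remainder = buffer.split(b"\n")
    return [frame for frame in frames if frame], remainder

def decode_json(frame):
    """Decode one frame into a structured event (a dict)."""
    return json.loads(frame.decode("utf-8"))

def parse_chunks(chunks):
    """Yield events from an iterable of byte chunks that may split frames arbitrarily."""
    buffer = b""
    for chunk in chunks:
        frames, buffer = frame_newline_delimited(buffer + chunk)
        for frame in frames:
            yield decode_json(frame)

# Usage: the second event is split across two chunks and is still parsed as a whole.
events = list(parse_chunks([b'{"level":"info"}\n{"le', b'vel":"warn"}\n']))
```

Keeping the two stages apart is what lets a pipeline swap the framing rule (for example, length-prefixed) without touching the decoder.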
Ciphey is an automated decryption tool and cryptographic analysis framework designed to identify and reverse encryptions, encodings, and hashes without requiring a known key or cipher. It functions as a hash cracking engine and a heuristic cipher identifier to recover original plaintext from unknown data patterns. The project features a nested encoding resolver that iteratively unwraps multiple layers of encryption and encoding until readable text is reached. It employs a heuristic cryptanalysis workflow to analyze data characteristics and guess the likely encoding scheme or encryption method
Iteratively applies decryption modules to nested data layers until readable plaintext is reached.
Ciphey is an automated decryption and data obfuscation tool designed to identify and reverse complex, multi-layered encoding schemes. By utilizing statistical analysis and probability scoring, the system automatically detects unknown data formats and recovers human-readable plaintext from obfuscated input strings without requiring manual algorithm specification. The tool distinguishes itself through a recursive pipeline that processes nested data structures and strips formatting anomalies or invisible characters to ensure consistent input. It employs a heuristic search and multithreaded execu
Iteratively applies decryption modules to nested data structures until human-readable plaintext is recovered.
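The iterative unwrapping loop can be sketched as follows. This is a simplified illustration, not Ciphey's implementation: it handles only two encodings (hex and Base64) rather than ciphers, and `looks_readable` is a crude, hypothetical stand-in for a real plaintext checker.

```python
import base64
import binascii
import string

def try_hex(text):
    """Return the hex-decoded text, or None if the input is not valid hex or UTF-8."""
    try:
        return bytes.fromhex(text).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None

def try_base64(text):
    """Return the Base64-decoded text, or None if the input is not valid Base64 or UTF-8."""
    try:
        return base64.b64decode(text, validate=True).decode("utf-8")
    except (binascii.Error, ValueError, UnicodeDecodeError):
        return None

def looks_readable(text):
    """Crude heuristic (assumption): printable characters and at least one space."""
    return " " in text and all(ch in string.printable for ch in text)

def unwrap(text, max_layers=10):
    """Peel encoding layers one at a time until the text looks readable."""
    for _ in range(max_layers):
        if looks_readable(text):
            return text
        for decoder in (try_hex, try_base64):
            decoded = decoder(text)
            if decoded:
                text = decoded
                break
        else:
            return None  # no decoder made progress
    return text if looks_readable(text) else None

# Usage: plaintext wrapped in Base64, then in hex.
inner = base64.b64encode(b"attack at dawn").decode("ascii")
layered = inner.encode("ascii").hex()
recovered = unwrap(layered)
```

Hex is tried before Base64 because every hex string of suitable length is also valid Base64, so the stricter decoder should get the first attempt.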
ip2region is an offline IP geolocation library and framework designed to resolve IPv4 and IPv6 addresses to city-level regional information using local binary data files. It functions as a binary IP database compiler and a cross-language search client, allowing for regional lookups without relying on external APIs. The project distinguishes itself through a specialized binary format that supports high-performance query optimization. It employs adjacent-segment IP merging and deduplicated region storage to minimize the database footprint, while utilizing memory-mapped file caching and vector-i
Converts raw text IP mappings into a specialized binary format designed for high-speed sequential access and offline lookups.
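A compile-then-search workflow of this kind can be sketched with fixed-width records and a binary search. The record layout, the function names, and the IPv4-only scope are assumptions made for the example; this is not the actual ip2region file format.

```python
import ipaddress
import struct

# Hypothetical record layout: start IP, end IP, region index (big-endian).
RECORD = struct.Struct(">IIH")

def compile_db(rows):
    """Compile (start_ip, end_ip, region) text rows into (binary blob, region table)."""
    regions, index, records = [], {}, []
    for start, end, region in rows:
        if region not in index:  # store each region string only once
            index[region] = len(regions)
            regions.append(region)
        records.append((int(ipaddress.IPv4Address(start)),
                        int(ipaddress.IPv4Address(end)),
                        index[region]))
    records.sort()
    return b"".join(RECORD.pack(*record) for record in records), regions

def lookup(blob, regions, ip):
    """Binary-search the sorted fixed-width records for the range containing `ip`."""
    target = int(ipaddress.IPv4Address(ip))
    lo, hi = 0, len(blob) // RECORD.size - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        start, end, region_id = RECORD.unpack_from(blob, mid * RECORD.size)
        if target < start:
            hi = mid - 1
        elif target > end:
            lo = mid + 1
        else:
            return regions[region_id]
    return None

# Usage with made-up sample ranges.
blob, regions = compile_db([
    ("10.0.0.0", "10.0.0.255", "Region A"),
    ("10.0.1.0", "10.0.1.255", "Region B"),
    ("10.0.2.0", "10.0.2.255", "Region A"),
])
```

Fixed-width records allow jumping straight to record `mid` by offset arithmetic, and the shared region table is a small-scale version of deduplicated region storage.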
Moya is a network abstraction layer for Swift that provides a structured framework for defining and executing REST API requests. It functions as a type-safe API client that decouples network endpoint definitions from the underlying implementation details to prevent configuration errors and URL typos. The project distinguishes itself by using protocol-based endpoint definitions and a provider-coordinated execution model. It includes a system for mapping network responses to strongly typed objects and features a dedicated tool for generating type-safe network interface files from external REST
Converts raw network response data into usable formats using custom decoding logic for specific data types.
Pwntools is a Python-based framework designed for rapid prototyping and automation in binary exploitation, reverse engineering, and security research. It serves as a comprehensive toolkit for interacting with local and remote processes, providing the primitives necessary to manage complex exploit workflows and streamline security analysis tasks. The framework distinguishes itself through its specialized capabilities for binary manipulation and automated exploit construction. It includes dedicated utilities for parsing executable file formats, assembling and disassembling machine code, and gen
Packs binary data and generates cyclic patterns to assist in the analysis of buffer overflows.
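The two ideas in that summary, integer packing and cyclic patterns, can be illustrated without the framework itself. The sketch below uses only the standard library; `pack32`, `cyclic_pattern`, and `find_offset` are hypothetical names and not the Pwntools API. The pattern is a de Bruijn sequence, in which every window of `n` characters occurs once, so a value recovered from a crash identifies its own offset in the input.

```python
import struct

def pack32(value):
    """Pack an integer as a 4-byte little-endian unsigned value."""
    return struct.pack("<I", value)

def de_bruijn(alphabet, n):
    """Return a de Bruijn sequence over `alphabet` for windows of length n."""
    k = len(alphabet)
    a = [0] * (k * n)
    out = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)

def cyclic_pattern(length, alphabet="abcd", n=4):
    """Return the first `length` bytes of the de Bruijn sequence."""
    sequence = de_bruijn(alphabet, n)
    if length > len(sequence):
        raise ValueError("requested length exceeds the sequence; use a larger alphabet")
    return sequence[:length].encode("ascii")

def find_offset(pattern, window):
    """Return the position of an n-byte window inside the pattern, or -1 if absent."""
    return pattern.find(window)

# Usage: if bytes 120..123 of the pattern end up in a register, their offset is recoverable.
pattern = cyclic_pattern(200)
offset = find_offset(pattern, pattern[120:124])
```

The small default alphabet keeps the example fast; a larger alphabet yields longer patterns at the cost of more generation time.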
This project is a computer vision benchmark and image classification dataset used to measure and compare the accuracy of machine learning models. It provides a standardized collection of labeled fashion product images and training data formatted to be compatible with the MNIST dataset structure. The dataset consists of fixed-dimension grayscale images and label-based category mappings, stored in a binary format. It includes pre-split training and testing sets and a static distribution to ensure consistent cross-model benchmarking. The repository supports image classification benchmarking and
Stores image pixels and category labels in a binary format compatible with the MNIST structure.
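The MNIST-style binary layout (the IDX format) is simple enough to read with the standard library: a big-endian header with a magic number and dimensions, followed by raw unsigned bytes. The sketch below assumes uncompressed data and uses hypothetical helper names; the magic numbers 2051 (images) and 2049 (labels) are those of the IDX files MNIST uses.

```python
import struct

IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions
LABEL_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension

def encode_idx_images(images):
    """Encode same-sized 2-D lists of 0-255 pixel values as an IDX image blob."""
    rows, cols = len(images[0]), len(images[0][0])
    header = struct.pack(">IIII", IMAGE_MAGIC, len(images), rows, cols)
    return header + bytes(p for image in images for row in image for p in row)

def decode_idx_images(blob):
    """Decode an IDX image blob into a list of 2-D pixel lists."""
    magic, count, rows, cols = struct.unpack_from(">IIII", blob, 0)
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected magic number {magic}")
    pixels, size = blob[16:], rows * cols
    return [
        [list(pixels[i * size + r * cols:i * size + (r + 1) * cols]) for r in range(rows)]
        for i in range(count)
    ]

def decode_idx_labels(blob):
    """Decode an IDX label blob into a list of integer class ids."""
    magic, count = struct.unpack_from(">II", blob, 0)
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected magic number {magic}")
    return list(blob[8:8 + count])

# Usage with two tiny 2x2 "images" and three made-up labels.
images = [[[0, 255], [128, 7]], [[1, 2], [3, 4]]]
round_tripped = decode_idx_images(encode_idx_images(images))
labels = decode_idx_labels(struct.pack(">II", LABEL_MAGIC, 3) + bytes([9, 0, 3]))
```

The distributed files are typically gzip-compressed, so a real loader would decompress them (for example with `gzip.open`) before applying these functions.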
Resty is a high-level HTTP client library for Go designed for consuming REST services. It provides a streamlined interface for executing network requests, managing server-sent event streams, and automatically mapping JSON and XML responses into data structures. The library includes built-in mechanisms for service resilience and traffic management, such as circuit breakers to prevent cascading failures, token-bucket rate limiting, and automated request retries with exponential backoff. It also features client-side load balancing to distribute outgoing traffic across multiple base URLs and requ
Parses incoming response streams into data structures using content-type headers for various object formats.
Potree is a web-based point cloud rendering engine and viewer designed for the visualization and analysis of massive 3D spatial datasets and LIDAR scans. It functions as a geospatial analysis tool that enables the interactive exploration of high-density point clouds directly within a web browser using WebGL. The system utilizes eye-dome lighting to enhance depth perception of 3D structures and supports virtual reality for immersive spatial exploration. It provides specialized capabilities for 3D scene documentation through hierarchical annotations and the creation of animated camera fly-throu
Converts raw spatial data into an optimized binary format to reduce parsing overhead and accelerate network transfers.
Racket is a general-purpose, multi-paradigm programming language in the Lisp family designed for creating languages. It serves as a language workbench, providing an environment for designing and implementing custom programming languages through a flexible system of macros and modules. The system distinguishes itself by offering a comprehensive suite for semantics engineering, allowing the construction of specialized language collections and teaching layers. It includes tools for designing custom languages, such as lexer and parser generation, as well as the ability to define module expansion rules and dynamic language selection at read time. The project provides an integrated development environment with a built-in editor, a visual debugger, and a package manager. Its capabilities extend to a general-purpose standard library covering 2D graphics rendering, binary data processing, SQL and deductive database integration, and graphical user interface construction. The environment supports compiling source code into standalone executables for distribution.
Provides capabilities to parse Resource Interchange File Format data and write objects to output ports.
This is a TOML parser and serializer for the Go language. It serves as a data serialization library and configuration file mapper that encodes and decodes data between Go structures and the TOML configuration format. The library provides interfaces for custom type marshaling, allowing for specialized logic when parsing or serializing specific data types. It transforms structured objects into deterministic TOML documents for storage or transmission. The project covers a broad range of data processing capabilities, including structured value encoding, TOML data generation, and metadata inspect
Provides full-cycle conversion between complex data objects and TOML formatted text.
Arroyo is a high-performance stream processing platform built in Rust. It executes continuous SQL queries on streaming data with event-time semantics, enabling accurate windowed aggregations, joins, and stateful computations on unbounded event streams. The platform uses native Rust execution for high throughput and low latency, with periodic checkpointing for exactly-once fault tolerance and horizontal scaling across distributed workers. The system integrates deeply with Kafka for reading and writing topics with exactly-once delivery and supports change data capture (CDC) from MySQL and Postg
Arroyo reads and writes arbitrary binary data as a bytea column for custom processing with UDFs.
This project is a comprehensive educational resource and training guide for building, training, and deploying machine learning models with TensorFlow 2. It serves as a structured tutorial covering core deep learning concepts, including neural network architectures, automatic differentiation, and tensor operations. The guide provides technical guidance on improving execution efficiency through GPU memory management, distributed training, and model quantization. It also includes detailed guides for building high-performance data pipelines and exporting models to production servers, mobile devices, and web browsers. The material covers a wide range of capabilities, including model development with convolutional (CNN) and recurrent (RNN) networks, implementing custom loss functions and layers, and using pre-trained models for transfer learning. It also addresses deployment strategies for edge devices and the use of cloud runtime environments for hardware acceleration. The material is implemented as a collection of Jupyter Notebooks.
Covers the use of binary data formats to enable rapid sequential access and processing of large-scale datasets.
Tippecanoe is a command-line tool used to generate optimized vector tiles for web maps. It converts large-scale geospatial datasets, including GeoJSON, CSV, and Geobuf files, into binary vector tiles or MBTiles SQLite databases. The project is designed to maintain map performance and visual quality across different zoom levels. It achieves this through geospatial data downsampling, which includes simplifying geometries and thinning point density to prevent tile overcrowding and keep tile sizes within specific limits. The tool provides extensive data transformation capabilities, such as attri
Converts Geobuf-encoded geospatial data into a format suitable for vector tile generation.