16 repositories
Libraries that map diverse input formats into a unified internal object model for consistent processing.
Distinguishing note: Focuses on structural normalization of heterogeneous data sources rather than statistical data cleaning.
Explore 16 awesome GitHub repositories matching data & databases · Data Normalization Utilities.
Career-ops is an AI-driven job search automation system designed to manage the entire application lifecycle, from discovery to tracking. It functions as a career copilot that utilizes autonomous agents to identify vacancies, evaluate professional fit, and generate tailored application materials. The project distinguishes itself through a multi-archetype persona management system and writing style calibration, allowing users to maintain different professional identities and a consistent voice across documents. It employs a multi-dimensional weighted scoring system to evaluate job suitability a
Converts various pay intervals into a standardized annual salary for consistent comparison.
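As a rough illustration of the pay-interval conversion described above, the following is a minimal sketch and not Career-ops's actual code. The function name, the interval labels, and the multipliers (a 40-hour week and 52 paid weeks per year) are all assumptions made for this example.

```python
# Hypothetical multipliers: assumes a 40-hour week, a 5-day week,
# and 52 paid weeks per year. Real postings may need other values.
PERIODS_PER_YEAR = {
    "hourly": 40 * 52,
    "daily": 5 * 52,
    "weekly": 52,
    "biweekly": 26,
    "monthly": 12,
    "annual": 1,
}


def to_annual_salary(amount: float, interval: str) -> float:
    """Convert a pay amount quoted per interval into an annual figure."""
    key = interval.strip().lower()
    if key not in PERIODS_PER_YEAR:
        raise ValueError(f"unknown pay interval: {interval!r}")
    return amount * PERIODS_PER_YEAR[key]
```

With these assumed multipliers, an hourly rate and a monthly rate become directly comparable once both are passed through `to_annual_salary`.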
Supervision is a computer vision toolset for normalizing model outputs, managing datasets, and visualizing annotations. It provides a framework to convert predictions from various classification and detection models into a standardized data format to ensure interoperability across different computer vision pipelines. The library features a post-processor for filtering, counting, and tracking detected objects across image frames and video streams. It includes capabilities for large image tiling to improve the detection of small objects and tools for assigning persistent identities to objects t
Transforms detection results from different computer vision models into a unified internal object model for consistent processing.
VeighNa is an event-driven, modular platform designed for the development, backtesting, and execution of automated financial trading strategies. It provides a comprehensive suite of tools that includes a centralized trading terminal for monitoring portfolios and market conditions, alongside a robust algorithmic trading engine that manages real-time data processing and order execution. The platform distinguishes itself through a highly decoupled architecture that isolates algorithmic logic from market connectivity, allowing for independent strategy development and testing. It utilizes a dynami
Maps heterogeneous market data and order streams from multiple exchange APIs into a consistent internal format for analysis.
SheetJS is a comprehensive library for parsing, manipulating, and generating complex spreadsheet file formats. It functions as a universal data processor that maps diverse binary, XML, and text-based file structures into a unified internal object model, allowing developers to create, read, and transform workbook data programmatically. The library distinguishes itself through a portable logic layer that provides a consistent execution environment across web browsers, server-side runtimes, and native desktop or mobile applications. By utilizing stream-based processing, it handles large files in
Maps diverse binary and text-based spreadsheet formats into a unified internal object model for consistent data manipulation and transformation.
Haystack is an orchestration framework designed for building complex search and generative AI pipelines. It functions as an agentic workflow engine, enabling the construction of automated sequences that allow AI agents to perform multi-step reasoning and data analysis. The framework utilizes a modular, component-based architecture that connects processing steps into directed acyclic graphs. By employing a provider-agnostic integration layer, it decouples core logic from specific external AI services and vector databases, allowing for the flexible exchange of underlying technologies. This desi
Normalizes diverse media formats into a unified internal representation for consistent processing.
GSAP is a comprehensive JavaScript animation library designed for orchestrating complex motion sequences and interactive user interfaces. It provides a robust property-interpolation engine that calculates intermediate values for CSS styles, attributes, and numeric properties, enabling smooth visual transitions across web elements. The framework is built on a core architecture that manages animation lifecycles, timeline-based sequence orchestration, and virtual property interception to ensure precise control over motion. The library distinguishes itself through a modular, plugin-based extensib
Normalizes diverse input formats like NodeLists and selector strings into flat arrays for consistent animation processing.
Fx is a command-line processing suite designed for the transformation, conversion, exploration, and visualization of structured data. It functions as a terminal-based utility that handles both automated shell pipelines and interactive navigation of complex, nested data hierarchies. The tool distinguishes itself by integrating a JavaScript-based engine that executes user-provided logic to filter, map, or modify data fields within a sandboxed runtime. It maintains a responsive interface by decoupling data processing from the display loop, allowing users to explore large datasets through an inte
Translates diverse input formats like YAML or TOML into a unified internal object model for consistent processing.
Keploy is an automated testing platform that leverages kernel-level traffic interception to generate and maintain regression test suites for microservices. By capturing live network traffic and system calls via eBPF, the platform automatically creates deterministic test cases and mocks external dependencies without requiring manual code instrumentation. This approach allows developers to validate application behavior and API contracts by replaying production-like traffic in isolated environments. The platform distinguishes itself through its use of machine learning to perform test maintenance
Uses statistical analysis to identify and mask non-deterministic fields like timestamps to prevent flaky test results.
This project is a Python library designed for the programmatic retrieval and analysis of diverse financial datasets. It functions as a comprehensive toolkit for quantitative research, providing a unified interface to fetch historical and real-time market data across asset classes including equities, futures, bonds, cryptocurrencies, and foreign exchange. By abstracting complex network requests into simple, parameter-driven functions, it enables users to integrate financial data into research workflows and automated trading systems. The library distinguishes itself through its scraper-based ag
Maps heterogeneous API responses and web tables into consistent, standardized Python data structures for analysis.
MNN is a high-performance inference engine and framework designed for on-device machine learning. It provides a comprehensive environment for executing, optimizing, and deploying neural network models directly on mobile and resource-constrained edge devices. The framework distinguishes itself through a robust model optimization toolkit that supports quantization, compression, and structural graph manipulation to minimize memory footprint and maximize execution speed. It features a modular architecture that abstracts hardware-specific backends, allowing models to run efficiently across diverse
Scales input matrix values to consistent ranges to ensure stable data distribution and improve model convergence.
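One common way to scale matrix values into a consistent range is min-max scaling. The sketch below shows that technique in plain Python; it is an assumption chosen for illustration, not a description of how MNN preprocesses inputs, and `min_max_scale` is a hypothetical helper.

```python
def min_max_scale(matrix, low=0.0, high=1.0):
    """Linearly rescale every value in a 2-D list into [low, high]."""
    flat = [value for row in matrix for value in row]
    smallest, largest = min(flat), max(flat)
    if largest == smallest:
        # A constant matrix has no spread; map everything to the lower bound.
        return [[low for _ in row] for row in matrix]
    span = largest - smallest
    return [
        [low + (value - smallest) * (high - low) / span for value in row]
        for row in matrix
    ]
```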
This project serves as a developer documentation tool and reference utility designed for identifying and copying text-based emoji shortcodes. It provides a comprehensive directory of standard emoji shortcodes organized by category, facilitating quick lookup and integration into documentation, code comments, and messaging platforms. Beyond its reference capabilities, the repository functions as a build utility for TypeScript development. It includes a pipeline that transforms modern TypeScript source code into compatible JavaScript, enforcing strict type safety and module resolution during the
Maps raw emoji metadata into a structured internal model for consistent cross-platform access.
jc is a tool that transforms plain-text results from command-line utilities, system tools, log formats, and text tables into structured JSON data. It functions as a structured data transformer capable of converting various file formats, including CSV, INI, XML, and YAML, into JSON representations for programmatic use. The project includes a collection of specific parsers for Unix commands and system tools such as df, blkid, and various package managers. It also features specialized converters for web server logs, Common Log Format, and Common Event Format strings. The tool covers broad capab
Maps diverse input formats like INI, XML, and YAML into a unified JSON object model for consistent processing.
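The general pattern of routing several input formats through per-format parsers into one JSON representation can be sketched with the Python standard library alone. This is not jc's implementation or API: the `CONVERTERS` table and the function names are hypothetical, and only INI and CSV are covered because YAML parsing is not part of the standard library.

```python
import configparser
import csv
import io
import json


def ini_to_obj(text):
    """Parse INI text into a dict of sections, each a dict of string values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def csv_to_obj(text):
    """Parse CSV text with a header row into a list of row dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical registry: one parser per supported input format.
CONVERTERS = {"ini": ini_to_obj, "csv": csv_to_obj}


def to_json(text, fmt):
    """Convert text in the named format to a JSON string."""
    if fmt not in CONVERTERS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return json.dumps(CONVERTERS[fmt](text), sort_keys=True)
```

Because every parser returns plain dicts and lists, downstream code only has to handle one shape of data regardless of the original format.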
Faraday is a vulnerability management platform and security tool aggregator designed to centralize security findings from multiple scanners into a single dashboard. It utilizes a relational security database to catalog hosts, services, and security flaws, enabling users to track remediation and analyze organizational risk. The platform distinguishes itself through a plugin-based system that normalizes diverse security tool outputs into a unified data model. It supports deep integration with a wide array of scanners and CLI tools, intercepting shell command output or parsing report files to ag
Normalizes diverse security tool outputs into a unified data model using a plugin-based system of parsers.
This project is a network security utility designed to manage and automate the deployment of IP-based blocklists. It functions by fetching external threat intelligence, normalizing the data, and injecting it directly into the Linux kernel firewall. By maintaining these high-performance network sets, the system provides automated perimeter defense against known malicious traffic sources. The tool distinguishes itself through its ability to perform atomic rule updates, which allows security policies to be refreshed without interrupting active network connections or requiring service restarts. I
Transforms diverse threat intelligence formats into a unified internal model for consistent firewall rule processing.
OpenAddresses is an open-source geospatial data aggregator and directory that collects public domain and open-license address, parcel, and building datasets from governments and organizations worldwide. It functions as a global index and data warehouse for locating and distributing free geospatial records. The project operates a normalization pipeline that cleans and standardizes diverse source formats into a consistent global coordinate and attribute schema. This process includes a crowdsourced curation pipeline and programmatic quality validation to verify the spatial accuracy and formattin
Maps diverse geospatial input formats into a unified internal object model for consistent processing.
This project is a Java library designed to validate JSON documents against defined schema specifications. It functions as a standards-compliant engine that ensures data integrity by verifying structural conformance and enforcing business rules in Java applications. The library distinguishes itself through its flexible validation strategies, allowing developers to inject custom logic and pattern-matching engines to handle specialized data formats. It supports configurable execution modes, letting users either stop validation immediately at the first error or collect all violations for comprehensive reporting. The engine also includes built-in capabilities for type coercion and default value injection, which help normalize incoming data during validation. Beyond standard structural checks, the system offers advanced features for managing complex data dependencies and security constraints. It includes a local registry for resolving schema references without network access and supports enforcing read-only or write-only access patterns. The library also provides observability tools, such as event-based hooks that allow external systems to monitor the internal validation process.
Normalizes incoming data by coercing types and injecting default values to ensure consistent internal data models.
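Type coercion plus default injection can be illustrated with a small sketch. It is written in Python for consistency with the other examples on this page and does not reflect the Java library's API; the `normalize` function and the `{"type": ..., "default": ...}` schema shape are assumptions invented for this example.

```python
def normalize(data, schema):
    """Coerce present fields to their declared type and fill in defaults.

    `schema` maps field names to a spec dict with a callable "type" and an
    optional "default" (a hypothetical format used only for this sketch).
    """
    result = {}
    for field, spec in schema.items():
        if field in data:
            # Coerce the incoming value, e.g. "8080" -> 8080.
            result[field] = spec["type"](data[field])
        elif "default" in spec:
            # Inject the declared default for a missing field.
            result[field] = spec["default"]
    return result


schema = {
    "port": {"type": int},
    "host": {"type": str, "default": "localhost"},
}
normalized = normalize({"port": "8080"}, schema)
```

Here `normalized` holds an integer port and the injected default host, so later stages see one consistent shape.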