16 repositories
Tools that standardize heterogeneous data sources into consistent structures to ensure schema uniformity.
16 GitHub repositories matching Data & Databases · Schema-Driven Data Normalizers.
OpenBB is a financial data platform and investment research terminal designed to aggregate, normalize, and distribute market data across analytical workflows. It functions as a comprehensive ecosystem that bridges disparate financial data providers with custom applications, spreadsheets, and internal modeling infrastructure. The platform distinguishes itself through a provider-based data abstraction layer that normalizes heterogeneous financial APIs into a consistent, schema-driven format. This architecture supports quantitative research automation and the construction of interactive, widget-
Enforces standardized data structures to ensure information from heterogeneous financial APIs remains consistent throughout the research pipeline.
Kotaemon is an orchestration framework designed for building modular, agentic workflows that integrate document processing, retrieval-augmented generation, and multi-step reasoning. It provides a comprehensive platform for developing document-based question answering systems, allowing users to chain language models, prompt templates, and external tools into complex, automated pipelines. The system distinguishes itself through a highly modular architecture that emphasizes component-based composition and schema-driven data exchange. It supports autonomous agents capable of decomposing complex q
Standardizes heterogeneous data sources into consistent structures to ensure schema uniformity across indexing and reasoning components.
Figma-Context-MCP is a design-to-code automation tool that functions as a server for the Model Context Protocol. It acts as a bridge between visual design platforms and development environments, enabling large language models to access design file metadata and component properties directly. The project distinguishes itself by providing a standard-compliant interface that translates design specifications into structured data. By extracting layout and styling information, it facilitates the programmatic conversion of design tokens and component requirements into actionable code structures. Thi
Normalizes raw design properties into consistent interface definitions for automated environments.
This project is a command-line forensic toolkit designed for the investigation and security auditing of mobile devices. It provides a framework for collecting system logs, application data, and forensic artifacts to identify potential security breaches, unauthorized access, or evidence of malicious activity. The utility employs a modular extraction architecture that parses diverse file formats and system logs into a standardized, normalized data structure. By utilizing this unified format, the tool performs both heuristic analysis of system metadata and pattern matching against structured thr
Normalizes raw device logs and backups into a standardized format for consistent cross-platform analysis.
Joyagent-jdgenie is an automated data orchestrator designed to centralize the retrieval and processing of information from disparate remote sources. It functions as a framework for building repeatable data pipelines that fetch, clean, and normalize raw input into consistent, structured formats. The system utilizes a schema-driven engine to apply validation rules and structural templates to incoming data, ensuring compatibility across enterprise systems. By employing configuration-based workflow definitions, it allows for the orchestration of modular tasks into automated execution flows, separ
Applies structural templates and validation rules to raw incoming information to ensure enterprise-wide consistency.
Sigma is a generic SIEM signature format and log event pattern standard used to describe malicious activity. It provides a vendor-neutral system for defining security event patterns in YAML, ensuring that detection logic remains portable across different monitoring platforms. The project maintains a curated library of peer-reviewed detection rules that identify threats and compliance violations. This standardized approach allows for the exchange of threat hunting logic and the translation of generic signatures into specific queries for various security information and event management systems
Enforces a strict data model for log event patterns to ensure consistency across shared detection rules.
OmniAuth is a Rack authentication framework that allows applications to verify user identities through third-party service providers using a single standardized interface. It functions as middleware to separate identity verification from core application logic by intercepting incoming requests. The project employs a strategy-pattern provider model to encapsulate provider-specific logic into interchangeable classes. It provides a custom authentication strategy framework and base classes for building new providers based on industry standards. The framework handles the multi-step authentication
Normalizes diverse user data from different third-party sources into a consistent hash structure.
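The strategy-pattern idea described for OmniAuth can be sketched roughly as follows. This is an illustration in Python, not OmniAuth's Ruby API: the `Strategy` base class, the `alpha` and `beta` providers, and their payload field names are all hypothetical. The output shape (`provider`, `uid`, `info`) only loosely mirrors the kind of consistent hash the summary refers to.

```python
from abc import ABC, abstractmethod


class Strategy(ABC):
    """Hypothetical base class: each provider maps its payload onto one shape."""

    name: str

    @abstractmethod
    def uid(self, raw: dict) -> str:
        """Return the provider's stable user identifier as a string."""

    @abstractmethod
    def info(self, raw: dict) -> dict:
        """Return the common profile fields (name, email)."""

    def auth_hash(self, raw: dict) -> dict:
        # The shared structure every provider ends up producing.
        return {"provider": self.name, "uid": self.uid(raw), "info": self.info(raw)}


class AlphaStrategy(Strategy):
    # Hypothetical provider whose payload uses id / display_name / mail.
    name = "alpha"

    def uid(self, raw: dict) -> str:
        return str(raw["id"])

    def info(self, raw: dict) -> dict:
        return {"name": raw["display_name"], "email": raw["mail"]}


class BetaStrategy(Strategy):
    # Hypothetical provider whose payload uses sub / given / family / emails.
    name = "beta"

    def uid(self, raw: dict) -> str:
        return raw["sub"]

    def info(self, raw: dict) -> dict:
        full_name = f'{raw["given"]} {raw["family"]}'
        return {"name": full_name, "email": raw["emails"][0]}


alpha_user = AlphaStrategy().auth_hash(
    {"id": 7, "display_name": "Ada Lovelace", "mail": "ada@example.com"}
)
beta_user = BetaStrategy().auth_hash(
    {"sub": "u-42", "given": "Alan", "family": "Turing", "emails": ["alan@example.com"]}
)
```

Application code can then read `alpha_user["info"]["email"]` and `beta_user["info"]["email"]` the same way, without knowing which provider produced the record.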
easyquotation is a Python library that provides access to Chinese stock market data, including real-time quotes, historical daily candlestick prices, exchange-traded fund details, and a stock code database sync utility. It retrieves live trading data from Chinese exchanges, A-shares, and Hong Kong listed stocks without requiring manual API key configuration, offering a unified interface to multiple public data feeds. The library combines several market data providers behind a single query interface, using asynchronous I/O to handle parallel requests and a polling engine that delivers sub-seco
Converts raw JSON responses from different sources into a consistent schema for downstream use.
This project is a manga source extension repository and content aggregator. It functions as an HTTP content scraping engine that retrieves images and metadata from external provider websites by parsing HTML and making network requests to display digital manga within a unified reader. The system utilizes a JSON extension repository to allow reader applications to discover and install third-party content providers. It employs an interface-based plugin framework that defines a common set of methods to ensure external sources remain compatible with a standardized internal format. The project cov
Transforms remote provider data into a standardized internal format to ensure consistent behavior across different sources.
JobSpy is a job board scraper and listing aggregator designed to extract employment opportunities from multiple websites and compile them into a unified dataset. It functions as a job search automation tool that programmatically collects vacancies based on keywords, locations, and specific filters. The project serves as a web scraping framework that utilizes proxy routing and user-agent rotation to bypass rate limits and avoid server-side blocking during data extraction. It includes infrastructure for concurrent request aggregation and schema-based data normalization to ensure consistent form
Standardizes heterogeneous HTML and JSON responses from multiple job boards into a consistent data schema.
This project is a bug bounty target dataset and security asset list. It serves as a structured repository of reachable network assets, domains, and applications eligible for security testing across multiple vulnerability disclosure programs. The dataset is designed to support bug bounty reconnaissance, attack surface mapping, and security target analysis. It provides organized scopes and target lists to help identify valid assets for security testing and vulnerability research workflows. The repository utilizes automated scraping pipelines and platform API integration to synchronize data. It
Standardizes inconsistent third-party data formats into a consistent internal structure for analysis.
OpenAddresses is an open-source geospatial data aggregator and directory that collects public domain and open-license address, parcel, and building datasets from governments and organizations worldwide. It functions as a global index and data warehouse for locating and distributing free geospatial records. The project operates a normalization pipeline that cleans and standardizes diverse source formats into a consistent global coordinate and attribute schema. This process includes a crowdsourced curation pipeline and programmatic quality validation to verify the spatial accuracy and formattin
Standardizes diverse source formats into a consistent global coordinate and attribute system for interoperability.
OSV is a distributed database and aggregator of open-source security advisories that uses a standardized vulnerability schema to track security flaws. It functions as a system for collecting and normalizing security data from diverse ecosystems into a single unified format, providing a web API for querying package vulnerabilities and submitting standardized records. The project distinguishes itself through a security advisory distribution service that supports bulk dataset exports via cloud storage buckets and incremental synchronization of security record updates. It also employs sandbox-bas
Converts diverse security advisories from multiple ecosystems into a single unified format using a standardized open source schema.
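As a rough sketch of what converting an advisory into a unified format might look like, the function below maps a hypothetical source record onto a dictionary shaped like an OSV entry. The output field names (`id`, `modified`, `summary`, `affected`, `package`, `ranges`, `events`) follow the published OSV schema as far as I understand it, so check the specification before relying on them. The input field names, the `to_osv_like` helper, and the sample identifier are assumptions made for illustration, and this is not how the OSV service itself ingests data.

```python
def to_osv_like(advisory: dict) -> dict:
    """Map a hypothetical source advisory onto an OSV-shaped record."""
    # A range is a list of events: where the flaw was introduced and,
    # when known, the first version that fixes it.
    events = [{"introduced": advisory.get("first_bad", "0")}]
    if advisory.get("first_good"):
        events.append({"fixed": advisory["first_good"]})

    return {
        "id": advisory["ref"],
        "modified": advisory["updated_at"],  # RFC 3339 timestamp, passed through
        "summary": advisory["title"],
        "affected": [
            {
                "package": {
                    "ecosystem": advisory["ecosystem"],
                    "name": advisory["package"],
                },
                "ranges": [{"type": "ECOSYSTEM", "events": events}],
            }
        ],
    }


# Hypothetical input; the identifier and package name are placeholders.
record = to_osv_like(
    {
        "ref": "EXAMPLE-2024-0001",
        "updated_at": "2024-01-15T00:00:00Z",
        "title": "Path traversal in archive extraction",
        "ecosystem": "PyPI",
        "package": "example-package",
        "first_good": "1.2.3",
    }
)
```

Because every source is reduced to the same event list, a consumer can answer "is version X affected?" with one code path regardless of where the advisory originated.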
The server acts as a centralized ingestion engine designed to collect, normalize, and index distributed telemetry data. It functions as a backend processor that receives performance metrics, traces, and error logs from application agents, transforming them into structured documents for storage and analysis within search and analytics platforms. The system distinguishes itself through a high-throughput ingestion pipeline that utilizes asynchronous event processing and backpressure-aware flow control to maintain stability during traffic spikes. It employs modular, plugin-based transformation st
Standardizes heterogeneous performance events into a unified document structure for consistent searchability.
This library is a data processing framework for the JVM that provides a type-safe environment for manipulating structured tabular data. It functions as a comprehensive toolset for performing complex data transformations, aggregations, and statistical analysis, while leveraging compile-time schema validation to ensure structural integrity across data pipelines. The project distinguishes itself through its deep integration with interactive notebook environments and its use of compile-time code generation. By automatically deriving and enforcing schemas from raw inputs, it generates type-safe ac
Projects untyped input data onto predefined interfaces to enforce structural consistency.
This automated security helper is a command-line utility designed to orchestrate multiple security analysis tools into a unified, configuration-driven workflow. It acts as a central engine that runs static application security testing and infrastructure scans, and aggregates diverse tool outputs into a standard, machine-readable format for continuous vulnerability detection across the development lifecycle. The tool stands out for its modular plugin architecture, which allows custom or proprietary scanners to be integrated, and for an external intelligence layer that sends findings to AI services for automated remediation analysis. By supporting standard reporting formats such as SARIF, it enables continuous review and tracking of security findings across different reporting platforms and monitoring systems. Beyond its core orchestration capabilities, the framework provides automated compliance verification and security policy enforcement. It integrates directly into local development environments and CI/CD pipelines, so teams can define specific scan parameters, severity thresholds, and file exclusions to tailor security analysis to project requirements.
Standardizes heterogeneous security tool outputs into a consistent schema for unified data processing.
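To make the idea of a shared output format concrete, here is a minimal sketch that turns already-normalized findings into a SARIF 2.1.0 log. The `to_sarif` helper, the finding keys (`rule`, `severity`, `message`, `file`, `line`), the severity mapping, and the scanner name are assumptions for illustration; this is not the tool's actual code, and a complete SARIF log carries more metadata than shown here.

```python
import json

# Assumed mapping from a scanner-style severity to SARIF result levels.
SEVERITY_TO_LEVEL = {"high": "error", "medium": "warning", "low": "note"}


def to_sarif(tool_name: str, findings: list[dict]) -> dict:
    """Build a minimal SARIF 2.1.0 log from a list of normalized findings."""
    results = []
    for finding in findings:
        results.append(
            {
                "ruleId": finding["rule"],
                "level": SEVERITY_TO_LEVEL.get(finding["severity"], "none"),
                "message": {"text": finding["message"]},
                "locations": [
                    {
                        "physicalLocation": {
                            "artifactLocation": {"uri": finding["file"]},
                            "region": {"startLine": finding["line"]},
                        }
                    }
                ],
            }
        )
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": tool_name}}, "results": results}],
    }


# Hypothetical finding from a hypothetical scanner.
sarif_log = to_sarif(
    "example-scanner",
    [
        {
            "rule": "hardcoded-secret",
            "severity": "high",
            "message": "Possible credential committed to source.",
            "file": "src/config.py",
            "line": 12,
        }
    ],
)
sarif_text = json.dumps(sarif_log, indent=2)
```

Each scanner plugin then only needs to produce the small finding dictionary; the conversion to the reporting format lives in one place.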