8 repositories
Internal data models that normalize diverse input formats into a consistent structure for uniform processing.
Explore 8 GitHub repositories matching Data & Databases · Intermediate Representations.
Docling is a modular framework designed for document parsing, layout analysis, and structured data extraction. It transforms unstructured files and web content into a unified, hierarchical data model that preserves the spatial and semantic relationships between text, tables, images, and layout elements. By normalizing diverse input formats into a consistent internal representation, the library enables uniform processing across various document types. The project distinguishes itself through a schema-driven approach that maps document regions to strongly-typed objects, ensuring data accuracy t
Normalizes diverse input formats into a consistent internal data model to enable uniform processing across different sources.
This project is a diagram-as-code tool that transforms declarative text scripts into professional visual representations. It functions as a technical documentation generator, allowing users to define nodes, connections, and hierarchical relationships through a domain-specific modeling language that integrates directly into version-controlled developer workflows. The tool distinguishes itself through a highly modular architecture that decouples diagram definitions from spatial positioning. It features a pluggable layout engine that supports multiple arrangement algorithms, alongside a styling
Normalizes input scripts into a unified intermediate graph representation to facilitate consistent cross-format rendering.
DataX is a distributed data integration framework and plugin-based ETL tool designed for synchronizing large datasets between heterogeneous sources and destinations. It functions as a JDBC data migration engine and offline synchronization tool, enabling the movement of data between relational databases, NoSQL stores, and object storage. The system utilizes a plugin-based connector architecture that decouples reader and writer logic, allowing it to map and transform data types across different storage engines using a standardized internal representation. This design supports heterogeneous data
Employs internal data models that normalize diverse input formats into a consistent structure for uniform processing across different storage engines.
dbt-core is a command-line framework for transforming data within a warehouse using modular SQL and version control. It functions as a data transformation engine that enables users to define data structures and business logic through declarative configuration files, which the system then compiles into executable code. By managing complex data dependencies through a directed acyclic graph, it ensures that transformation tasks execute in the correct order while maintaining a manifest-driven state to track lineage and execution history. The project distinguishes itself through an adapter-based d
Separates complex, multi-step data processing into dedicated models to simplify final reporting tables.
Clair is a container image vulnerability scanner and security analyzer. It performs static analysis of container images by matching package contents against vulnerability databases to identify security risks across different package formats and architectures. The project functions as both an image indexer and a vulnerability database manager. It processes container layers into intermediate representations to enable fast security lookups and synchronizes security metadata from multiple external sources to maintain a local registry. Capability areas include continuous security monitoring, whic
Transforms raw package data into a standardized intermediate representation to correlate source-level packages with binary versions.
This project is a profiling utility written in Rust that captures, transforms, and visualizes function call stacks to identify performance bottlenecks. It works as a sampling-profiler wrapper that converts raw profiling data into interactive flamegraphs, which are hierarchical maps of resource consumption. The tool integrates with the Rust build system to profile binaries and benchmarks. It also supports custom profiling configurations, letting users override the default system profiling tools or recording flags to control how data is collected. The utility supports application performance monitoring and binary execution analysis. It can capture performance data by attaching to a running process ID, so a live application can be analyzed without a restart.
Normalizes raw text output from various profiling tools into a consistent internal call stack representation.
Poml is a prompt management framework and templating engine designed for authoring, versioning, and rendering structured prompts for large language models. It uses a semantic markup language to organize prompts into reusable templates, combining them with dynamic context and data to generate formatted inputs. The system distinguishes itself by decoupling core prompt logic from final presentation through a stylesheet-based approach. It provides a dedicated JSON schema output generator to enforce strict, machine-parsable model responses and a configuration interface for managing function tool s
Transforms semantic XML-like syntax into a structured internal tree for consistent processing across different models.
Dokka is an extensible documentation engine designed to generate structured API reference materials for Kotlin projects. By parsing source code and comments, it functions as a static site generator that transforms codebases into readable documentation. It integrates directly into development workflows as a build system plugin, allowing for the automated creation of reference materials during the standard compilation process. The project distinguishes itself through a modular, plugin-driven processing pipeline that allows developers to modify the generation workflow, customize output formats,
Normalizes diverse source code structures into a unified model to facilitate consistent documentation output.
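The pattern shared by the projects above, parsing several input formats into one internal model and then processing only that model, can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen for illustration: the `Record` type, the `from_csv` and `from_json` parsers, and the `normalize` dispatcher are assumptions, not the API of any listed repository.

```python
import csv
import io
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical intermediate representation shared by every input format."""
    name: str
    value: float


def from_csv(text: str) -> list[Record]:
    # Expects a header row with "name" and "value" columns (an assumption).
    reader = csv.DictReader(io.StringIO(text))
    return [Record(name=row["name"], value=float(row["value"])) for row in reader]


def from_json(text: str) -> list[Record]:
    # Expects a JSON array of objects with "name" and "value" keys (an assumption).
    return [Record(name=item["name"], value=float(item["value"]))
            for item in json.loads(text)]


# One parser per input format; each returns the same Record-based model.
PARSERS = {"csv": from_csv, "json": from_json}


def normalize(fmt: str, text: str) -> list[Record]:
    """Dispatch to the parser for `fmt` and return the shared representation."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return parser(text)


def total(records: list[Record]) -> float:
    """Downstream logic depends only on Record, never on the source format."""
    return sum(r.value for r in records)
```

Supporting a new format in this sketch means registering one more parser in `PARSERS`; `total` and any other downstream code stay unchanged, which is the main appeal of an intermediate representation.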