5 repositories
Capabilities for performing bulk operations on database records efficiently.
Distinguishing note: Specifically targets high-performance bulk insertion and modification of records, distinct from single-row CRUD.
Explore 5 awesome GitHub repositories matching data & databases · Batch Data Processing.
Twenty is a headless customer relationship management framework that enables developers to build, version, and deploy custom business applications using code. By utilizing a declarative approach to data modeling, the platform allows for the definition of custom objects, fields, and complex relationships directly within the source code. This schema-driven architecture automatically generates corresponding REST and GraphQL APIs, ensuring that data structures and interface components remain synchronized across development and production environments. The platform distinguishes itself through a m
Enables efficient bulk creation, updates, and deletions of records through optimized API requests.
GORM is a developer-focused object-relational mapping library for Go that provides a comprehensive data persistence framework. It serves as a database access layer, allowing developers to map application structures to database tables and perform CRUD operations using a fluent, type-safe query builder instead of writing raw SQL. The library distinguishes itself through its association-aware persistence, which automatically tracks and synchronizes complex entity relationships during database operations. It utilizes a driver-agnostic interface to maintain consistent behavior across various stora
To efficiently insert a large number of records, pass a slice to the Create method. GORM will generate a single SQL statement to insert all the data and backfill primary key values; hook methods will be invoked too. It w
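The idea of inserting many records with a single SQL statement can be sketched with the Go standard library alone. This is an illustration of the general multi-row INSERT technique, not GORM's actual implementation: the `User` type, the `users` table, and the `buildBulkInsert` helper are all hypothetical names chosen for this example, and the `?` placeholder style is an assumption that varies by database driver.

```go
package main

import (
	"fmt"
	"strings"
)

// User is a hypothetical record type used only for this illustration.
type User struct {
	Name string
	Age  int
}

// buildBulkInsert returns one multi-row INSERT statement and its flattened
// argument list, so that all rows can be sent in a single statement.
// It returns an empty statement when there is nothing to insert.
func buildBulkInsert(users []User) (string, []any) {
	if len(users) == 0 {
		return "", nil
	}
	placeholders := make([]string, 0, len(users))
	args := make([]any, 0, len(users)*2)
	for _, u := range users {
		// One "(?, ?)" group per row; values are passed separately as arguments.
		placeholders = append(placeholders, "(?, ?)")
		args = append(args, u.Name, u.Age)
	}
	query := "INSERT INTO users (name, age) VALUES " + strings.Join(placeholders, ", ")
	return query, args
}

func main() {
	query, args := buildBulkInsert([]User{{"Ada", 36}, {"Linus", 28}})
	fmt.Println(query)
	fmt.Println(len(args), "arguments")
}
```

The resulting statement and arguments could then be handed to any `database/sql` connection. An ORM would typically also handle details this sketch omits, such as backfilling generated primary keys and invoking hooks.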
This project serves as a comprehensive technical reference for the architecture and design of data-intensive applications. It provides a structured analysis of the fundamental principles required to build reliable, scalable, and maintainable software systems, covering the core trade-offs inherent in modern data infrastructure. The repository explores the mechanics of distributed data management, including strategies for replication, partitioning, and achieving consensus across multiple nodes. It details the design of storage engines, indexing techniques, and transaction management models, whi
Processes large, fixed datasets as single units without modifying original input data.
Unstructured is an enterprise-grade data orchestration engine designed to transform raw, unstructured files into structured, machine-readable formats. It functions as a comprehensive platform for document ingestion, partitioning, and enrichment, specifically engineered to prepare complex data for retrieval-augmented generation and agentic AI workflows. The platform distinguishes itself through its sophisticated document processing strategies, which combine rule-based extraction with vision-language models to handle diverse file layouts, tables, and images. It provides a modular architecture t
Automates the ingestion and transformation of large volumes of files from diverse storage sources.
RoseDB is a persistent key-value database and log-structured storage engine. It functions as a lightweight storage system that uses a log-structured hash table and an implementation of the Bitcask engine to provide fast data retrieval and on-disk persistence. The system operates as an atomic transaction engine, grouping multiple read and write operations into single units to maintain data consistency. It manages data through a key-value model that supports individual insertions, lookups, and deletions. The database offers batch data processing and atomic update capabilities. In addition, the project includes features for versioned content management and multilingual content translation.
Optimizes performance by combining large volumes of read and write actions into single units.
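Grouping several reads and writes into one atomic unit can be sketched as a buffered batch over an in-memory map. This is a minimal illustration of the technique only: it does not use RoseDB's API, it has no on-disk log, and `Store`, `Batch`, and their methods are hypothetical names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical in-memory key-value store used only for illustration.
type Store struct {
	mu   sync.RWMutex
	data map[string]string
}

func NewStore() *Store { return &Store{data: make(map[string]string)} }

// Get reads a committed value from the store.
func (s *Store) Get(key string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

// Batch buffers writes and deletes until Commit is called.
type Batch struct {
	store   *Store
	pending map[string]*string // a nil value marks a deletion
}

func (s *Store) NewBatch() *Batch {
	return &Batch{store: s, pending: make(map[string]*string)}
}

func (b *Batch) Put(key, value string) { b.pending[key] = &value }
func (b *Batch) Delete(key string)     { b.pending[key] = nil }

// Get reads through the batch: pending changes first, then committed data.
func (b *Batch) Get(key string) (string, bool) {
	if v, ok := b.pending[key]; ok {
		if v == nil {
			return "", false
		}
		return *v, true
	}
	return b.store.Get(key)
}

// Commit applies every buffered change while holding one write lock,
// so readers of the store observe either all of the batch or none of it.
func (b *Batch) Commit() {
	b.store.mu.Lock()
	defer b.store.mu.Unlock()
	for k, v := range b.pending {
		if v == nil {
			delete(b.store.data, k)
		} else {
			b.store.data[k] = *v
		}
	}
	b.pending = make(map[string]*string)
}

func main() {
	s := NewStore()
	b := s.NewBatch()
	b.Put("a", "1")
	b.Put("b", "2")
	_, visible := s.Get("a")
	fmt.Println("visible before commit:", visible)
	b.Commit()
	v, _ := s.Get("a")
	fmt.Println("after commit:", v)
}
```

A persistent engine would additionally append the batch to its log before acknowledging the commit. The sketch only shows the buffering and all-at-once visibility.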