5 Repos
Tools for executing automated labeling or organization tasks across large datasets.
Distinct from Data Processing Tasks: focuses on bulk annotation and labeling operations rather than general data manipulation.
5 GitHub repositories in Data & Databases · Bulk Data Processing.
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
Enables bulk labeling and data organization across multiple files or frames using automated scripts.
Records is a SQL database client designed for executing raw queries and managing result sets through a simplified interface. It provides a parameterized SQL executor to bind values to placeholders, ensuring safe data handling and preventing injection attacks, alongside a database transaction manager for grouping operations into atomic units. The project includes a dedicated command-line interface for running database statements and exporting query results directly to local files. This tooling allows for the conversion of SQL result sets into multiple serialization formats, including CSV, JSON
Executes the same SQL query multiple times with different parameters to handle large datasets efficiently.
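As a rough illustration of the pattern described above (one parameterized statement executed once per parameter set, inside a single transaction), here is a minimal sketch. It uses Python's standard-library `sqlite3` module rather than Records itself, so it does not show Records' own API; the `items` table and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows, used only for illustration. The third name contains
# SQL metacharacters to show that bound values are treated as data.
rows = [
    {"name": "widget", "qty": 3},
    {"name": "gadget", "qty": 7},
    {"name": "gizmo'; DROP TABLE items; --", "qty": 1},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT, qty INTEGER)")

# Using the connection as a context manager makes the batch atomic:
# it commits if the block succeeds and rolls back if it raises.
with conn:
    # The same statement runs once per dict; :name and :qty are bound
    # as parameters, never interpolated into the SQL string.
    conn.executemany(
        "INSERT INTO items (name, qty) VALUES (:name, :qty)", rows
    )

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
total = conn.execute("SELECT SUM(qty) FROM items").fetchone()[0]
```

Because the values are bound rather than concatenated, the third row is stored as an ordinary string and the table survives the insert.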
HomeBox is a self-hosted home inventory manager designed for tracking physical belongings and household assets. It functions as a digital catalog for creating structured databases of objects, including records of locations, categories, and purchase history. The system distinguishes itself through the use of QR code generation to link physical objects to digital records and the support of hierarchical location mapping to track assets across nested environments. It further enables automation via a REST API and centralizes access management through OpenID Connect integration for user authenticat
Supports large-scale imports and exports of inventory records using comma-separated values for efficient batch updates.
GAM is a command-line tool for administering Google Workspace and Cloud Identity. It translates command-line arguments into structured API calls, enabling administrators to manage users, groups, organizational units, and domain settings across a Google Workspace environment. The tool handles authentication through OAuth2 flows, service accounts, and workload identity federation, and supports multi-tenant configurations for managing multiple domains or cloud projects from a single installation. GAM distinguishes itself through its batch processing and automation capabilities. It can process la
Performs bulk updates and exports across Google Workspace services using data sourced from CSV files.
imapsync is an IMAP mailbox synchronization tool and data migration utility designed to copy and synchronize email messages and folder structures between two IMAP servers. It functions as a migration manager for transferring bulk email accounts between different hosting providers, preserving folder hierarchies and message metadata. The tool is distinguished by its ability to automate the transfer of multiple mailboxes sequentially from delimited lists using administrative credentials or user-specific authentication. It supports advanced authentication methods including OAuth2 and XOAUTH2, and
Transfers bulk email accounts between different hosting providers using administrative or user credentials.
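The sequential, list-driven migration described above could be driven by a small wrapper along these lines. This is a sketch under assumptions: the semicolon-delimited layout, the host names, and the `build_commands` helper are hypothetical, and only the basic `--host1/--user1/--password1` and `--host2/--user2/--password2` options of imapsync are used. The code only builds the argument lists and does not run anything.

```python
import csv
import io

# Hypothetical mailbox list: source user, source password,
# destination user, destination password, one pair per line.
MAILBOXES = """\
alice@old.example;secret1;alice@new.example;secret2
bob@old.example;secret3;bob@new.example;secret4
"""


def build_commands(listing, host1, host2):
    """Return one imapsync argument list per mailbox pair in the listing."""
    commands = []
    for row in csv.reader(io.StringIO(listing), delimiter=";"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        user1, password1, user2, password2 = row
        commands.append([
            "imapsync",
            "--host1", host1, "--user1", user1, "--password1", password1,
            "--host2", host2, "--user2", user2, "--password2", password2,
        ])
    return commands


commands = build_commands(MAILBOXES, "imap.old.example", "imap.new.example")
```

Each list can then be passed to `subprocess.run(cmd, check=True)` in a loop so that mailboxes are transferred one after another and a failed transfer stops the batch.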