15 Repos
Standardized structures and schemas for organizing training data used in model development.
15 GitHub repositories matching Data & Databases · Dataset Formats.
Keras is a high-level deep learning API used to design, build, and train neural networks for tasks such as computer vision, natural language processing, and time series forecasting. It provides a framework for defining model architectures and optimizing weights through a structured interface. The project is defined by a backend-agnostic design that allows the same model code to run across different compute engines. This multi-backend execution enables users to swap underlying engines to optimize for specific hardware or performance requirements. The system supports distributed model training
Supports various standardized dataset formats for organizing training data used in model development.
GPT-SoVITS is a text-to-speech synthesis engine and voice cloning toolkit designed for generating natural-sounding human speech. It functions as a neural audio processing pipeline that maps input text to high-fidelity audio waveforms, utilizing conditional variational autoencoders and flow-based decoders to ensure expressive output. The platform distinguishes itself through its ability to perform few-shot voice cloning and cross-lingual speech generation, allowing users to maintain a specific speaker's vocal identity and emotional delivery across multiple languages. By employing cross-modal l
Defines standardized data structures for organizing and preparing audio training sets.
Supervision is a computer vision toolset for normalizing model outputs, managing datasets, and visualizing annotations. It provides a framework to convert predictions from various classification and detection models into a standardized data format to ensure interoperability across different computer vision pipelines. The library features a post-processor for filtering, counting, and tracking detected objects across image frames and video streams. It includes capabilities for large image tiling to improve the detection of small objects and tools for assigning persistent identities to objects t
Transforms computer vision datasets between different common formats to ensure compatibility between training and evaluation frameworks.
Detectron2 is a PyTorch computer vision framework and visual recognition platform designed for training and deploying models for object detection, image segmentation, and visual recognition. It provides a research-oriented environment for training complex vision models with multi-GPU acceleration. The project includes a specialized object detection library for identifying and locating multiple objects via bounding boxes, as well as an image segmentation toolkit for creating pixel-level masks through instance, semantic, and panoptic segmentation. Additionally, it features a human pose estimati
Provides tools to convert raw dataset annotations into formats required for instance, panoptic, or semantic segmentation.
Fairseq is a PyTorch toolkit for sequence-to-sequence modeling, specializing in neural machine translation, automatic speech recognition, and large-scale language model training. It provides a framework for processing and aligning diverse data sources, including text, audio, and video, to support tasks such as speech-to-text conversion and multimodal sequence learning. The project is distinguished by its distributed training capabilities, which utilize parameter sharding, mixed-precision training, and CPU offloading to handle models that exceed single-device memory. It also includes specializ
Processes raw text and alignment files into a binary format for efficient loading during training.
WeClone is an end-to-end framework designed for the creation, training, and deployment of personalized conversational AI digital twins. By fine-tuning large language models on individual chat history, the platform enables the replication of unique communication styles, speech patterns, and conversational habits. The system manages the entire lifecycle of these digital avatars, from initial data preparation to final integration into messaging platforms for real-time interaction. The platform distinguishes itself through a comprehensive suite of data processing utilities that prepare raw messag
Structures raw chat logs into coherent training sequences by grouping consecutive exchanges based on temporal proximity.
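Grouping by temporal proximity can be sketched as follows. This is a minimal illustration, not WeClone's actual code: the function name `group_by_time_gap`, the `(timestamp, text)` message shape, and the five-minute threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def group_by_time_gap(messages, max_gap=timedelta(minutes=5)):
    """Split (timestamp, text) messages into sessions.

    Hypothetical helper: a new session starts whenever the gap to the
    previous message exceeds max_gap (an assumed threshold).
    """
    sessions = []
    for ts, text in sorted(messages, key=lambda m: m[0]):
        # Compare against the timestamp of the last message in the open session.
        if sessions and ts - sessions[-1][-1][0] <= max_gap:
            sessions[-1].append((ts, text))
        else:
            sessions.append([(ts, text)])
    return sessions


messages = [
    (datetime(2024, 1, 1, 9, 0), "hi"),
    (datetime(2024, 1, 1, 9, 2), "are you free later?"),
    (datetime(2024, 1, 1, 13, 30), "lunch was great"),
]
sessions = group_by_time_gap(messages)
print([len(s) for s in sessions])  # [2, 1]
```

The first two messages fall within the threshold and form one training sequence; the third arrives hours later and starts a new one.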
Presto is a distributed SQL query engine designed for high-performance analytical processing across heterogeneous data sources. It functions as a data federation platform and massively parallel processing engine, allowing users to execute interactive queries against diverse storage systems without requiring data migration. By mapping remote metadata and structures to a unified relational namespace, it enables seamless cross-platform analysis through a standard SQL interface. The engine distinguishes itself through a pluggable connector architecture and a shared-nothing distributed processing
Reads and writes data stored in columnar formats by mapping dataset fragments to parallel processing splits.
Labelme is a Python-based image annotation tool used to create datasets for computer vision. It serves as a visual editor for semantic segmentation and lets users define object boundaries using polygons, rectangles, points, and circles. The application also functions as an annotator for multispectral images and supports high-bit-depth TIFF files used in satellite and scientific imaging. The tool integrates AI-assisted labeling features to automate the creation of masks and polygons. These features enable shape generation from text prompts or interactive point selections, which propose boundaries based on positive and negative points placed by the user. The software covers a broad range of data management and annotation tasks, including the creation of dense pixel masks, rotated bounding boxes, and video frame sequencing. It includes a pipeline for translating its internal JSON state persistence into standard dataset formats such as COCO and Pascal VOC. Additional features include image-level classification flags, geometry refinement tools, and batch import of images.
Provides a pipeline for translating internal JSON annotation data into standard COCO and Pascal VOC formats.
PaddleDetection is an object detection framework designed for the end-to-end development, training, and deployment of computer vision models. It provides a comprehensive library of modular neural network architectures and pipelines that support object detection, instance segmentation, and multi-object tracking tasks. The project distinguishes itself through a configuration-driven approach that decouples model components like backbones and heads, allowing for the flexible assembly of custom vision workflows. It incorporates advanced techniques such as anchor-free detection logic, joint detecti
Implements parsing logic to load and register proprietary data formats for training.
MMSegmentation is an open-source semantic segmentation toolbox built on PyTorch that provides a modular, configurable framework for building, training, evaluating, and deploying segmentation models. At its core, it offers a config-driven pipeline that assembles training, evaluation, and inference workflows by parsing hierarchical configuration files, with a modular component registry that enables plug-and-play composition of neural network modules, optimizers, datasets, and metrics. The framework supports the full model lifecycle through a unified runner interface that controls training, testi
Transforms raw dataset annotations into the expected label format for training and evaluation.
X-AnyLabeling is an AI-assisted annotation platform and computer vision labeling tool. It provides an interface for annotating images and videos using polygons and rectangles to create training sets for machine learning models. The project distinguishes itself through the integration of external AI models via a plugin-based inference backend, allowing for automated generation of candidate labels and the execution of specialized tasks like pose estimation and object detection. It also functions as an optical character recognition tool for extracting text and layout information from document im
Provides utilities for translating computer vision annotations between various industry-standard formats to ensure cross-platform compatibility.
RF-DETR is a Python library for training and deploying object detection, instance segmentation, and keypoint detection models built on a vision transformer architecture. It provides a unified command-line interface and Python API for the full workflow, from fine-tuning pretrained checkpoints on custom datasets to running inference on images, video files, and live camera streams. The project supports training on datasets in COCO or YOLO format, with automatic format detection and configurable augmentation pipelines. Models can be exported to ONNX, TFLite, or TensorRT for deployment across edge
Transforms datasets between COCO and YOLO formats using the supervision library for interoperability.
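The core arithmetic behind a COCO/YOLO box conversion can be sketched as below. COCO stores boxes as absolute `[x_min, y_min, width, height]` in pixels, while YOLO labels use `[x_center, y_center, width, height]` normalized by the image size. The function names are hypothetical and this is not the code path RF-DETR or supervision uses; it only shows the coordinate math.

```python
def coco_to_yolo_bbox(bbox, image_width, image_height):
    """Absolute [x_min, y_min, w, h] -> normalized (x_center, y_center, w, h)."""
    x_min, y_min, width, height = bbox
    return (
        (x_min + width / 2) / image_width,
        (y_min + height / 2) / image_height,
        width / image_width,
        height / image_height,
    )


def yolo_to_coco_bbox(bbox, image_width, image_height):
    """Normalized (x_center, y_center, w, h) -> absolute [x_min, y_min, w, h]."""
    x_center, y_center, width, height = bbox
    return [
        (x_center - width / 2) * image_width,
        (y_center - height / 2) * image_height,
        width * image_width,
        height * image_height,
    ]


yolo_box = coco_to_yolo_bbox([100, 50, 200, 100], image_width=400, image_height=200)
print(yolo_box)  # (0.5, 0.5, 0.5, 0.5)
print(yolo_to_coco_bbox(yolo_box, 400, 200))  # [100.0, 50.0, 200.0, 100.0]
```

A full dataset conversion also has to remap category IDs and move from one JSON file to per-image text files, which this sketch leaves out.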
Muzic is a deep learning platform and framework for AI-powered music analysis, composition, and synthesis. It functions as a music generation framework and analysis tool that uses large language models and autonomous agents to orchestrate the creation and interpretation of symbolic and audio music. The project is distinguished by its cross-modal capabilities, in which natural language and symbolic music are mapped into a shared embedding space for zero-shot classification and information retrieval. It uses a variety of specialized architectures, including diffusion frameworks for audio synthesis, dual-grain attention mechanisms for structural consistency over long sequences, and a hybrid system that combines music theory rules with neural networks. The platform covers a broad range of functions, including generating MIDI sequences from text and lyrics, neural singing voice synthesis, and automated lyrics transcription. It also offers tools for music structure modeling, attribute-based symbolic generation, and orchestration of external music tools via autonomous agents. Supporting utilities include data engineering pipelines for large-scale MIDI binarization, dataset encoding, and audio signal processing for melody note extraction and speech-to-phoneme alignment.
Transforms raw MIDI data into specialized binarized formats to optimize large-scale model training and inference.
mmocr is a PyTorch-based framework for optical character recognition (OCR), built for training and deploying models for text detection, text recognition, and key information extraction. It serves as a comprehensive toolbox for detecting and recognizing text in scenes and offers specialized libraries for localizing text regions and converting visual text into machine-encoded strings. The project is distinguished by a research framework for key information extraction and advanced text spotting features. These include point-based spotting with transformers and the use of parameterized Bezier curves to identify and transcribe arbitrarily shaped text. The framework covers a broad range of computer vision functions, including data pipeline management for augmenting and standardizing diverse OCR datasets, model training with distributed scaling, and performance evaluation using standard OCR metrics. It also provides utilities for geometric polygon manipulation and result visualization for checking predictions against ground-truth annotations. The system is implemented in Python and supports installation via Docker environment packaging.
Translates diverse dataset formats into a standardized internal representation for training and evaluation compatibility.
This project is a deep learning implementation of the RetinaNet architecture for detecting and classifying objects within images. Built as a Keras object detection framework and a TensorFlow computer vision tool, it provides a complete neural network implementation based on the RetinaNet paper. The framework includes specialized components such as a Feature Pyramid Network and a focal loss function to handle object detection. It features a configurable backbone architecture and anchor-based bounding boxes to predict object locations across varying scales and aspect ratios. The toolset covers
Transforms raw XML and CSV dataset annotations into standardized label formats required for training.
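As a rough illustration of turning XML annotations into flat training labels, the sketch below reads a Pascal VOC style XML snippet and emits one CSV row per bounding box. The sample XML, the helper name `voc_to_rows`, and the column order (image path, x1, y1, x2, y2, class name) are assumptions for the example; the project's own parsers and label layout may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical Pascal VOC style annotation, inlined so the example is self-contained.
VOC_XML = """<annotation>
  <filename>img_001.jpg</filename>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>8</xmin><ymin>12</ymin><xmax>352</xmax><ymax>498</ymax></bndbox>
  </object>
</annotation>"""


def voc_to_rows(xml_text):
    """Return one [path, x1, y1, x2, y2, class_name] row per <object>."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        # VOC coordinates are sometimes written as floats, so parse defensively.
        coords = [int(float(box.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        rows.append([filename, *coords, obj.findtext("name")])
    return rows


rows = voc_to_rows(VOC_XML)

# Serialize the rows as CSV text (assumed layout, no header line).
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```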