38 repositories
Accessing data using explicit index labels.
Distinguishing note: Focuses on label-based access patterns.
Explore 38 awesome GitHub repositories matching data & databases · Label-Based Data Selection.
Developer Roadmap is a community-driven platform that provides structured, graph-based learning paths for software engineering. It serves as a comprehensive knowledge repository where technical domains are organized into visual sequences to guide professional skill acquisition and career growth. The project is distinguished by a collaborative ecosystem that lets users contribute roadmaps, curate industry best practices, and maintain professional profiles. It integrates diagnostic assessment frameworks to evaluate technical competence, helping developers identify knowledge gaps and prepare for job interviews through targeted learning sequences. Beyond its core mapping capabilities, the platform offers practical project ideas and interactive tutoring to reinforce engineering concepts. It provides a centralized space for the community to share resources, track progressive skill development, and navigate complex technical landscapes.
Returns named data structures for improved code readability.
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized
Provides intuitive access to data rows and columns via index labels.
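As a rough illustration of what label-based row and column access looks like in pandas, the sketch below uses `DataFrame.loc`. The table contents and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical inventory table indexed by product name (illustrative data only).
df = pd.DataFrame(
    {"price": [1.5, 2.0, 0.75], "stock": [10, 0, 42]},
    index=["apple", "banana", "cherry"],
)

# Single cell: row label first, then column label.
banana_price = df.loc["banana", "price"]

# Several rows and one column by label; rows come back in the order requested.
subset = df.loc[["cherry", "apple"], ["stock"]]

# Label slices include BOTH endpoints, unlike positional slices.
first_two = df.loc["apple":"banana"]
```

Note that the label slice returns two rows, because `.loc` treats the end label as inclusive.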
This project is a comprehensive Chinese translation of a technical deep learning textbook, providing an educational resource on the theory and implementation of neural networks. It functions as a collaborative technical translation project designed to make complex academic AI literature accessible to non-English speakers. The project utilizes a community-driven translation model that integrates external suggestions and pull requests to refine linguistic accuracy and reduce bias. It employs standardized terminology mapping to ensure a uniform vocabulary throughout the translated content. To i
Provides guidance on using label smoothing to prevent neural networks from becoming overconfident in their predictions.
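For readers unfamiliar with label smoothing, here is a minimal pure-Python sketch of one common formulation: the one-hot target is mixed with a uniform distribution over all classes. The function name `smooth_labels` and the default `epsilon` are assumptions made for this example, not taken from the listed project.

```python
def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for a single example.

    Formulation assumed here: (1 - epsilon) * one_hot + epsilon / num_classes.
    """
    if not 0 <= target_index < num_classes:
        raise ValueError("target_index must be in [0, num_classes)")
    off_value = epsilon / num_classes       # mass given to every class
    on_value = 1.0 - epsilon + off_value    # true class keeps most of the mass
    return [on_value if i == target_index else off_value
            for i in range(num_classes)]


smoothed = smooth_labels(target_index=2, num_classes=4, epsilon=0.1)
```

The smoothed vector still sums to 1, but the true class no longer has probability exactly 1, which is what discourages overconfident predictions.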
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Implements label shift correction to adjust training data weighting when label distributions change.
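The idea behind label shift correction can be sketched as per-class importance weights: each training example is weighted by the ratio of its label's target frequency to its training frequency. The sketch below assumes the target label distribution is already known or has been estimated elsewhere. How the listed project estimates it is not shown here, and `label_shift_weights` is a hypothetical helper.

```python
from collections import Counter


def label_shift_weights(train_labels, target_dist):
    """Per-class weight = target frequency / training frequency."""
    counts = Counter(train_labels)
    total = len(train_labels)
    weights = {}
    for label, count in counts.items():
        source_p = count / total
        # Labels missing from target_dist are assumed to have zero target mass.
        weights[label] = target_dist.get(label, 0.0) / source_p
    return weights


# Illustrative data: training set is 80% "cat", target population is balanced.
train_labels = ["cat"] * 8 + ["dog"] * 2
weights = label_shift_weights(train_labels, {"cat": 0.5, "dog": 0.5})
```

Under these assumptions, the under-represented class receives a weight above 1 and the over-represented class a weight below 1.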
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
Adjusts target labels during training to prevent model overconfidence and improve generalization.
Label Studio is a multi-type data labeling tool and data annotation workspace designed to prepare datasets for machine learning training. It functions as a cloud-integrated data pipeline that imports raw data from storage, manages the annotation process, and exports labels to standardized formats. The platform features a machine learning model integration framework that connects to external model servers. This enables model-assisted annotation and active learning, allowing the system to pre-label data and refine predictions based on human feedback. The software provides project management tools for organizing datasets and assigning tasks to users through role-based access. It supports multiple data types and uses backend-agnostic storage adapters to connect to local file systems or cloud storage providers. The application can be installed through manual setup or one-click deployments to cloud infrastructure.
Integrates machine learning models to automatically generate initial annotations and refine training data.
Label Studio is a multi-modal data annotation platform designed to create and manage high-quality training datasets for machine learning. It functions as a self-hosted, containerized environment that supports secure, private deployments, including air-gapped configurations. The platform provides a centralized workspace for labeling diverse media types, such as images, text, audio, and time-series data, to support supervised and reinforcement learning workflows. The platform distinguishes itself through deep integration with machine learning backends, enabling active learning loops, automated
Integrates machine learning models to provide automated predictions and active learning loops that accelerate manual data annotation.
labelImg is a computer vision labeling tool and image bounding box annotator used to create training datasets for machine learning models. It functions as a desktop utility for drawing rectangular labels on images and saving object coordinates and class names in common machine learning formats. The tool is specifically designed to generate and edit PascalVOC formatted XML files and create image labels in the text-based format required by YOLO object detection pipelines. The software covers object detection annotation and training data preparation, including the ability to manage label catego
Transforms image labels between XML, text, and CSV formats for use in cloud training platforms.
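To make the difference between the two box formats concrete, the sketch below converts one Pascal VOC style box (absolute pixel corners) into the YOLO text convention (center and size, normalized by image dimensions). It is not labelImg's own code: the function name `voc_to_yolo` is hypothetical, and XML parsing and class-index lookup are left out.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert absolute corner coordinates to normalized (cx, cy, w, h)."""
    cx = (xmin + xmax) / 2 / img_w   # box center x, as a fraction of width
    cy = (ymin + ymax) / 2 / img_h   # box center y, as a fraction of height
    w = (xmax - xmin) / img_w        # box width, as a fraction of width
    h = (ymax - ymin) / img_h        # box height, as a fraction of height
    return cx, cy, w, h


# Illustrative box in a 400x500 (width x height) image.
cx, cy, w, h = voc_to_yolo(100, 50, 300, 250, img_w=400, img_h=500)
yolo_line = f"0 {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"  # class index 0 is assumed
```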
Grounded-Segment-Anything is a suite of specialized tools for multimodal visual analysis, text-based segmentation, and generative image editing. It integrates text-to-bounding-box detection and high-precision image segmentation masks to function as a text-based image segmenter and an automated visual labeling tool. The project enables text-driven image editing by identifying objects through natural language to perform inpainting and element replacement. It further extends visual analysis into three dimensions, allowing for 3D human reconstruction and the generation of 3D bounding boxes from t
Automatically creates image pseudo-labels, bounding boxes, and masks using recognition and captioning models.
CVAT is an open-source computer vision annotation tool and visual dataset management platform. It provides a self-hosted interface for labeling images, videos, and 3D data to create datasets for vision AI models. The platform features AI-assisted data labeling to automate the creation of masks and bounding boxes, using a plugin system to connect external machine learning models. It includes a consensus-based quality assurance system that verifies label accuracy by comparing independent annotations. The system covers collaborative team management, project organization through task decomposition, and remote cloud storage integration. It also provides a REST API for programmatic workflow control and for importing and exporting data in industry-standard formats.
Utilizes machine learning models to automatically generate initial bounding boxes and masks for visual data.
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
Applies pre-trained machine learning models to generate initial annotations or suggest labels, reducing manual effort.
Dask is a parallel computing framework and distributed task scheduler designed to scale Python data science workflows from single machines to large clusters. It functions as a cluster resource manager that orchestrates computational logic by representing tasks and their dependencies as directed acyclic graphs. This architecture lets the system automate the distribution of workloads across available hardware while managing complex execution requirements. The project is distinguished by a lazy evaluation engine that defers data operations until they are explicitly requested, enabling global graph optimization and efficient resource allocation. It incorporates memory-aware data spilling to avoid system failures when processing datasets that exceed available memory, and uses task graph fusion to combine sequences of operations into single execution steps, minimizing scheduling overhead and inter-node communication. The platform provides a comprehensive capability surface for large-scale data analysis, including support for distributed machine learning, high-performance computing integration, and parallel data processing. It offers extensive tools for cluster lifecycle management, performance profiling, and real-time monitoring of task execution. Users can deploy these environments on diverse infrastructure, including local hardware, cloud providers, containerized systems, and high-performance computing clusters.
Retrieves specific rows or columns using index labels, boolean masks, or partial-string matching to filter large datasets.
h2oGPT is a self-hosted platform designed for running large language models and executing retrieval-augmented generation workflows locally. It provides a comprehensive web interface that allows users to index private document collections into searchable databases, enabling context-aware question answering and summarization without exposing sensitive data to external services. The platform distinguishes itself by offering a modular architecture that supports both local model execution and connections to external inference servers. It facilitates the development of autonomous agents capable of
Generates labels for documents and provides tools to validate, correct, and manage annotation workflows for training machine learning models.
This project is a PyTorch-based generative framework and implementation template for building Generative Adversarial Networks. It provides a collection of foundational toolkits and architectural patterns designed to synthesize high-quality artificial data while focusing on the stability of adversarial neural networks. The framework distinguishes itself through a specialized toolkit for conditional image generation, which integrates discrete labels and auxiliary classification into the training process. It utilizes specific mechanisms to guide the generative process toward target classes by co
Provides utilities to adjust target labels with random noise to prevent discriminator overconfidence.
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
Increases model accuracy by iteratively predicting and filtering confident samples from unlabeled data to expand the training set.
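The confidence-filtering step of pseudo-labeling can be sketched in a few lines. This is an illustration of the general technique only, not AutoGluon's actual procedure: the `select_confident` helper, the 0.9 threshold, and the prediction format are all assumptions.

```python
def select_confident(predictions, threshold=0.9):
    """Keep unlabeled samples whose top predicted class meets the threshold.

    predictions: list of (sample_id, {label: probability}) pairs.
    Returns a list of (sample_id, pseudo_label) pairs.
    """
    selected = []
    for sample_id, probs in predictions:
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            selected.append((sample_id, label))
    return selected


# Illustrative model outputs on three unlabeled samples.
predictions = [
    ("a", {"pos": 0.95, "neg": 0.05}),
    ("b", {"pos": 0.60, "neg": 0.40}),  # too uncertain, filtered out
    ("c", {"pos": 0.02, "neg": 0.98}),
]
pseudo_labeled = select_confident(predictions, threshold=0.9)
```

In an iterative setup, the selected pairs would be added to the training set, the model retrained, and the filter applied again to the remaining unlabeled data.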
This is a large-scale collection of curated Chinese text corpora designed for training natural language processing models. The project provides a variety of datasets, including a deduplicated archive of millions of news articles with titles and keywords, high-quality categorized question-and-answer pairs, and parallel translation corpora. The collection includes millions of aligned Chinese and English sentence pairs used for cross-lingual model training and machine translation development. It also contains filtered question-and-answer data organized by label for the construction of knowledge-
Links specific questions to corresponding answers using category labels for building knowledge-based systems.
This project is a Transformer machine translation model and attention-based neural network implemented using the PyTorch deep learning framework. It functions as a text-to-text translation tool designed to convert source sequences into target language text. The implementation focuses on neural machine translation, covering the development of sequence-to-sequence architectures. It includes the full pipeline for translation, from text sequence preprocessing and vocabulary creation to model training and text generation inference. The system incorporates standard transformer components such as a
Includes utilities for label smoothing to distribute probability mass and prevent overconfidence.
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
Explains how to use explicit axis labels to match and align data points across different tabular objects.
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Runs deep learning models to automatically label datasets with GPU-accelerated pre- and post-processing.
X-AnyLabeling is an AI-assisted annotation platform and computer vision labeling tool. It provides an interface for annotating images and videos using polygons and rectangles to create training sets for machine learning models. The project distinguishes itself through the integration of external AI models via a plugin-based inference backend, allowing for automated generation of candidate labels and the execution of specialized tasks like pose estimation and object detection. It also functions as an optical character recognition tool for extracting text and layout information from document im
Translates annotations between different industry-standard data formats to ensure cross-tool compatibility.