38 repository-uri
Accessing data using explicit index labels.
Distinguishing note: Focuses on label-based access patterns.
Explore 38 awesome GitHub repositories matching data & databases · Label-Based Data Selection. Refine with filters or upvote what's useful.
Developer Roadmap este o platformă condusă de comunitate care oferă căi de învățare structurate, bazate pe grafuri, pentru ingineria software. Servește drept repository cuprinzător de cunoștințe unde domeniile tehnice sunt organizate în secvențe vizuale pentru a ghida dobândirea competențelor profesionale și creșterea în carieră. Proiectul se distinge printr-un ecosistem colaborativ care permite utilizatorilor să contribuie cu roadmap-uri, să cureție cele mai bune practici din industrie și să mențină profiluri profesionale. Acesta integrează framework-uri de evaluare diagnostică pentru a evalua competența tehnică, ajutând dezvoltatorii să identifice lacunele de cunoștințe și să se pregătească pentru interviurile profesionale prin secvențe de învățare țintite. Dincolo de capabilitățile sale de bază de mapare, platforma oferă idei practice de proiecte și tutorat interactiv pentru a consolida conceptele de inginerie. Oferă un spațiu centralizat pentru ca comunitatea să partajeze resurse, să urmărească dezvoltarea progresivă a competențelor și să navigheze prin peisaje tehnice complexe.
Returns named data structures for improved code readability.
Pandas is a high-performance data analysis library that provides a comprehensive framework for manipulating, cleaning, and transforming structured datasets. It centers on labeled one-dimensional and two-dimensional data structures, allowing users to construct, filter, and reshape tabular information while performing complex arithmetic and logical operations. The library distinguishes itself through a sophisticated indexing engine that enables automatic data alignment during calculations and relational merges. By utilizing a block-based memory layout, it optimizes cache locality for vectorized
Provides intuitive access to data rows and columns via index labels.
This project is a comprehensive Chinese translation of a technical deep learning textbook, providing an educational resource on the theory and implementation of neural networks. It functions as a collaborative technical translation project designed to make complex academic AI literature accessible to non-English speakers. The project utilizes a community-driven translation model that integrates external suggestions and pull requests to refine linguistic accuracy and reduce bias. It employs standardized terminology mapping to ensure a uniform vocabulary throughout the translated content. To i
Provides guidance on using label smoothing to prevent neural networks from becoming overconfident in their predictions.
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Implements label shift correction to adjust training data weighting when label distributions change.
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
Adjusts target labels during training to prevent model overconfidence and improve generalization.
Label Studio este un instrument de etichetare a datelor de mai multe tipuri și un spațiu de lucru pentru adnotarea datelor, conceput pentru a pregăti seturi de date pentru antrenarea învățării automate. Acesta funcționează ca un pipeline de date integrat în cloud care importă date brute din stocare, gestionează procesul de adnotare și exportă etichete în formate standardizate. Platforma dispune de un cadru de integrare a modelelor de învățare automată care se conectează la servere de modele externe. Acest lucru permite adnotarea asistată de model și învățarea activă, permițând sistemului să efectueze pre-etichetarea și să rafineze predicțiile pe baza feedback-ului uman. Software-ul oferă instrumente de gestionare a proiectelor pentru organizarea seturilor de date și atribuirea sarcinilor utilizatorilor prin acces bazat pe roluri. Suportă diverse tipuri de date și utilizează adaptoare de stocare agnostice față de backend pentru a se conecta cu sisteme de fișiere locale sau furnizori de stocare în cloud. Aplicația poate fi instalată prin configurare manuală sau implementări cu un singur clic pe infrastructura cloud.
Integrates machine learning models to automatically generate initial annotations and refine training data.
Label Studio is a multi-modal data annotation platform designed to create and manage high-quality training datasets for machine learning. It functions as a self-hosted, containerized environment that supports secure, private deployments, including air-gapped configurations. The platform provides a centralized workspace for labeling diverse media types, such as images, text, audio, and time-series data, to support supervised and reinforcement learning workflows. The platform distinguishes itself through deep integration with machine learning backends, enabling active learning loops, automated
| Integrating machine learning models to provide automated predictions and active learning loops that accelerate the manual data annotation process.
labelImg is a computer vision labeling tool and image bounding box annotator used to create training datasets for machine learning models. It functions as a desktop utility for drawing rectangular labels on images and saving object coordinates and class names in common machine learning formats. The tool is specifically designed to generate and edit PascalVOC formatted XML files and create image labels in the text-based format required by YOLO object detection pipelines. The software covers object detection annotation and training data preparation, including the ability to manage label catego
Transforms image labels between XML, text, and CSV formats for use in cloud training platforms.
Grounded-Segment-Anything is a suite of specialized tools for multimodal visual analysis, text-based segmentation, and generative image editing. It integrates text-to-bounding-box detection and high-precision image segmentation masks to function as a text-based image segmenter and an automated visual labeling tool. The project enables text-driven image editing by identifying objects through natural language to perform inpainting and element replacement. It further extends visual analysis into three dimensions, allowing for 3D human reconstruction and the generation of 3D bounding boxes from t
Automatically creates image pseudo-labels, bounding boxes, and masks using recognition and captioning models.
CVAT este un instrument open-source de adnotare a viziunii computerizate și o platformă de gestionare a seturilor de date vizuale. Acesta oferă o interfață auto-găzduită pentru etichetarea imaginilor, videoclipurilor și datelor 3D pentru a crea seturi de date pentru modelele AI de viziune. Platforma dispune de etichetarea datelor asistată de AI pentru a automatiza crearea de măști și casete de delimitare, utilizând un sistem de plug-in-uri pentru a conecta modele externe de învățare automată. Include un sistem de asigurare a calității bazat pe consens care verifică acuratețea etichetelor prin compararea adnotărilor independente. Sistemul acoperă gestionarea colaborativă a echipei, organizarea proiectelor prin descompunerea sarcinilor și integrarea stocării la distanță în cloud. De asemenea, oferă un API REST pentru controlul programatic al fluxului de lucru și importul/exportul de date în formate standard din industrie.
Utilizes machine learning models to automatically generate initial bounding boxes and masks for visual data.
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
Applies pre-trained machine learning models to generate initial annotations or suggest labels, reducing manual effort.
Dask este un framework de calcul paralel și un scheduler de sarcini distribuit conceput pentru a scala fluxurile de lucru de știința datelor în Python de la mașini individuale la clustere mari. Acesta funcționează ca un manager de resurse de cluster care orchestrează logica computațională prin reprezentarea sarcinilor și a dependențelor acestora sub formă de grafuri aciclice direcționate. Această arhitectură permite sistemului să automatizeze distribuția sarcinilor de lucru pe hardware-ul disponibil, gestionând în același timp cerințe complexe de execuție. Proiectul se distinge printr-un motor de evaluare leneșă (lazy) care amână operațiunile pe date până când sunt solicitate explicit, permițând optimizarea globală a grafului și alocarea eficientă a resurselor. Acesta încorporează „spilling” de date conștient de memorie pentru a preveni blocarea sistemului la procesarea seturilor de date care depășesc memoria disponibilă și utilizează fuziunea grafului de sarcini pentru a combina secvențe de operațiuni în pași de execuție unici, minimizând overhead-ul de programare și comunicarea între noduri. Platforma oferă o suprafață cuprinzătoare de capabilități pentru analiza datelor la scară largă, inclusiv suport pentru învățare automată distribuită, integrare cu calcul de înaltă performanță și procesare paralelă a datelor. Oferă instrumente extinse pentru gestionarea ciclului de viață al clusterului, profilarea performanței și monitorizarea în timp real a execuției sarcinilor. Utilizatorii pot implementa aceste medii pe diverse infrastructuri, inclusiv hardware local, furnizori de cloud, sisteme containerizate și clustere de calcul de înaltă performanță.
Retrieves specific rows or columns using index labels, boolean masks, or partial-string matching to filter large datasets.
h2oGPT is a self-hosted platform designed for running large language models and executing retrieval-augmented generation workflows locally. It provides a comprehensive web interface that allows users to index private document collections into searchable databases, enabling context-aware question answering and summarization without exposing sensitive data to external services. The platform distinguishes itself by offering a modular architecture that supports both local model execution and connections to external inference servers. It facilitates the development of autonomous agents capable of
Generate labels for documents and provide tools to validate, correct, and manage annotation workflows for training machine learning models.
This project is a PyTorch-based generative framework and implementation template for building Generative Adversarial Networks. It provides a collection of foundational toolkits and architectural patterns designed to synthesize high-quality artificial data while focusing on the stability of adversarial neural networks. The framework distinguishes itself through a specialized toolkit for conditional image generation, which integrates discrete labels and auxiliary classification into the training process. It utilizes specific mechanisms to guide the generative process toward target classes by co
Provides utilities to adjust target labels with random noise to prevent discriminator overconfidence.
AutoGluon is an automated machine learning framework and multimodal library designed to automate the end-to-end pipeline from data preprocessing to high-accuracy model training and validation. It functions as an automated model trainer for tabular, image, text, and time series data, as well as a tool for time series forecasting and foundation model finetuning. The project is distinguished by its ability to jointly process and fuse different data types, allowing for the construction of multimodal neural networks that integrate images, text, and structured tables. It supports zero-shot inferenc
Increases model accuracy by iteratively predicting and filtering confident samples from unlabeled data to expand the training set.
This is a large-scale collection of curated Chinese text corpora designed for training natural language processing models. The project provides a variety of datasets, including a deduplicated archive of millions of news articles with titles and keywords, high-quality categorized question-and-answer pairs, and parallel translation corpora. The collection includes millions of aligned Chinese and English sentence pairs used for cross-lingual model training and machine translation development. It also contains filtered question-and-answer data organized by label for the construction of knowledge-
Links specific questions to corresponding answers using category labels for building knowledge-based systems.
This project is a Transformer machine translation model and attention-based neural network implemented using the PyTorch deep learning framework. It functions as a text-to-text translation tool designed to convert source sequences into target language text. The implementation focuses on neural machine translation, covering the development of sequence-to-sequence architectures. It includes the full pipeline for translation, from text sequence preprocessing and vocabulary creation to model training and text generation inference. The system incorporates standard transformer components such as a
Includes utilities for label smoothing to distribute probability mass and prevent overconfidence.
This project is an educational resource and a collection of instructional materials for performing data manipulation and statistical analysis using Python. It provides a comprehensive set of guides and code examples for using the Pandas, NumPy, and Matplotlib libraries to analyze structured data. The resource includes a dedicated guide for reshaping, cleaning, and aggregating tabular data and time series via Pandas, alongside a reference for high-performance vectorized operations and linear algebra using NumPy. It also features tutorials for creating publication-quality charts, distribution p
Explains how to use explicit axis labels to match and align data points across different tabular objects.
jetson-inference is a set of libraries and tools for executing optimized deep learning models on embedded GPU hardware. Its primary purpose is to enable real-time computer vision and AI inference at the edge with low latency and high throughput. The project distinguishes itself through high-performance streaming analytics and the ability to execute concurrent AI pipelines on auto-grade silicon. It provides specialized support for multi-sensor stream processing, utilizing zero-copy data transport to load camera frames directly into GPU memory. The codebase covers a broad surface of capabiliti
Runs deep learning models to automatically label datasets with GPU-accelerated pre- and post-processing.
X-AnyLabeling is an AI-assisted annotation platform and computer vision labeling tool. It provides an interface for annotating images and videos using polygons and rectangles to create training sets for machine learning models. The project distinguishes itself through the integration of external AI models via a plugin-based inference backend, allowing for automated generation of candidate labels and the execution of specialized tasks like pose estimation and object detection. It also functions as an optical character recognition tool for extracting text and layout information from document im
Translates annotations between different industry-standard data formats to ensure cross-tool compatibility.