73 repositories
Systems for labeling and organizing data based on content or metadata.
Distinguishing note: Focuses on classifying video events for review purposes.
Explore 73 GitHub repositories matching Data & Databases · Data Categorization.
This repository is a collection of guides, notebooks, and recipes for implementing advanced prompting techniques and workflow patterns with large language models. It serves as a prompt engineering guide, an evaluation suite for scoring prompt quality, and a framework for orchestrating agents and integrating external tools. The project provides implementation patterns for building applications with Claude, specifically focusing on coordinating multiple models to split complex tasks between high-reasoning and high-efficiency agents. It includes technical demonstrations for multimodal data proce
Uses language models to assign unstructured text to predefined categories or labels for data organization.
Frigate is a self-hosted network video recorder that functions as a private, local AI-powered vision engine. It manages video streams by performing real-time object detection, tracking, and classification directly on local hardware, ensuring that security monitoring and activity recording remain independent of cloud services. The system distinguishes itself through a modular, hardware-accelerated video pipeline that offloads intensive decoding and machine learning inference to dedicated GPUs, NPUs, or specialized accelerators like Coral TPUs and Hailo modules. It utilizes state-based object t
Labels tracked objects to prioritize important video segments for review.
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Categorizes complex items by applying multiple non-exclusive tags to a single input.
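As a minimal sketch of non-exclusive tagging (not taken from the repository itself), each tag can be scored independently and kept whenever its score clears a threshold, so one input may receive several tags or none. The function name `assign_tags`, the example scores, and the 0.5 threshold are all assumptions made for illustration.

```python
def assign_tags(scores, threshold=0.5):
    """Return every tag whose score meets the threshold, sorted by name.

    `scores` maps a tag name to an independent score in [0, 1].
    Tags do not compete with each other, so any number can be returned.
    """
    return sorted(tag for tag, score in scores.items() if score >= threshold)


# Hypothetical per-tag scores for a single input.
scores = {"outdoor": 0.91, "night": 0.12, "people": 0.67}
tags = assign_tags(scores)
print(tags)  # ['outdoor', 'people']
```

Scoring tags independently is what separates this from multi-class classification, where exactly one label wins.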
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
Converts categorical integer labels into binary matrix representations for multi-class classification.
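The conversion described here is commonly called one-hot encoding. The sketch below uses plain Python rather than fastai's own API, and the helper name `one_hot` is an assumption made for illustration: each integer label becomes a row with a single 1 in the column for its class.

```python
def one_hot(labels, num_classes):
    """Return a binary matrix with one row per label and one column per class."""
    matrix = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside [0, {num_classes})")
        row = [0] * num_classes
        row[label] = 1  # mark the column that matches this label
        matrix.append(row)
    return matrix


encoded = one_hot([0, 2, 1], num_classes=3)
print(encoded)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```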
fastText is a library and framework for word embedding generation, text vectorization, and supervised text classification. It provides tools to transform raw text into fixed-length vector representations and to train models that assign category labels to sentences or documents. The system utilizes subword-based vectorization and character n-gram embeddings, allowing it to generate meaningful vectors for words that were not present during training. To manage resource usage, it includes a quantized language model implementation that employs product quantization and dimensionality reduction to d
Predicts the most likely labels or probabilities for text using a trained supervised model.
This project is a PyTorch-based computer vision library and deep learning image processing framework. It provides a collection of neural network architectures designed for visual analysis tasks, specifically focusing on image classification, object detection, and semantic segmentation. The toolset implements diverse methodologies for visual recognition, including anchor-free object detection, region proposal networks, and heatmap-based keypoint estimation. It utilizes both convolutional neural networks for spatial feature extraction and transformer-based self-attention mechanisms to compute
Provides image classifiers that categorize visual input into predefined classes using CNNs and transformers.
This project provides a collection of machine learning algorithms implemented from scratch in Python. It serves as an educational resource using interactive notebooks that combine code with mathematical explanations to demonstrate the first principles of data science. The repository includes reference implementations for neural networks, such as multilayer perceptrons with backpropagation, and supervised learning models including linear and logistic regression. It also covers unsupervised learning through k-means clustering and Gaussian anomaly detection. The codebase covers a broad range of
Implements the one-vs-all approach to extend binary logistic regression to multiple categories.
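To illustrate the one-vs-all idea, the sketch below assumes one binary logistic model per class and predicts the class whose model reports the highest probability. The weights are hand-picked for the example rather than trained, and the names `predict_one_vs_all` and `classifiers` are hypothetical, not taken from the repository.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_one_vs_all(x, classifiers):
    """Score x with each binary model and return the best class and all scores.

    `classifiers` maps a class name to the (weights, bias) of a binary
    logistic model that separates that class from all the others.
    """
    scores = {}
    for name, (weights, bias) in classifiers.items():
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        scores[name] = sigmoid(z)
    best = max(scores, key=scores.get)
    return best, scores


# Hand-picked (untrained) parameters, for illustration only.
classifiers = {
    "cat": ([2.0, -1.0], 0.0),
    "dog": ([-1.0, 2.0], 0.0),
    "bird": ([-1.0, -1.0], 0.5),
}
label, scores = predict_one_vs_all([1.0, 0.0], classifiers)
print(label)  # cat
```

Because each model is fitted separately, the per-class probabilities need not sum to 1; only their ranking is used for the prediction.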
This project is an open-source, privacy-focused web analytics platform designed for high-throughput data ingestion and multi-tenant data management. It provides a cookie-less tracking engine that captures visitor interactions using ephemeral request metadata, ensuring comprehensive traffic visibility while maintaining strict privacy standards. The architecture utilizes an event-driven ingestion pipeline and aggregated metric storage to decouple data collection from processing, enabling efficient long-term retrieval and responsive dashboard performance. What distinguishes this platform is its
Applies custom properties like region or product category to segment and filter traffic data at the network level.
This project is a deep learning framework designed for constructing, training, and deploying neural networks across diverse hardware environments. It functions as a high-performance tensor computation library that provides both imperative and symbolic programming interfaces, allowing developers to balance flexible, step-by-step model building with the efficiency of compiled computation graphs. The framework distinguishes itself through a hybrid execution engine that integrates declarative graph compilation with imperative runtime logic. It supports scalable, distributed training across multip
Supports assigning multiple categorical labels to single data items for complex classification tasks.
This project is a machine learning algorithm reference and implementation guide that provides theoretical foundations and code for supervised learning, deep learning, and natural language processing. It serves as a comprehensive toolkit for implementing predictive models and a technical reference for algorithm engineering. The project focuses on ensemble learning frameworks, including the construction of decision trees, random forests, and gradient boosting models. It also functions as a probabilistic graphical model library and an NLP algorithm reference, with specific implementations for se
Implements multi-class classification using a one-vs-rest strategy to determine the highest probability category.
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
Allows users to assign descriptive tags to entire images or video frames to classify content without spatial coordinates.
This project serves as a comprehensive educational resource and technical handbook for engineers building applications powered by large language models. It provides a structured framework for mastering the principles of artificial intelligence engineering, covering the full lifecycle of model development from initial design to production deployment. The repository distinguishes itself by offering a deep dive into the practical implementation of advanced design patterns, including retrieval-augmented generation, agentic tool orchestration, and parameter-efficient model adaptation. It emphasize
Assigns categorical tags to database columns based on schema metadata to organize and identify sensitive or functional information.
This project is an educational resource providing practical code examples and implementations of machine learning algorithms using the Python language. It serves as a guide for constructing predictive pipelines, clustering models, and dimensionality reduction within the Scikit-Learn ecosystem. The repository includes comprehensive demonstrations for supervised and unsupervised learning, as well as detailed examples for implementing neural networks and deep architectures. It also provides practical guidance on exporting model parameters to JSON and wrapping trained models in web APIs for produ
Implements multi-class classification strategies, including one-vs-all and softmax regression.
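For the softmax side of this, a minimal sketch in plain Python (not the repository's Scikit-Learn code) shows how raw class scores are turned into a probability distribution before the largest entry is taken as the prediction. The logits are made-up example values.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
predicted = probs.index(max(probs))
print(predicted)  # 0
```

Unlike separately trained binary models, softmax normalizes all classes jointly, so the outputs can be read as a single distribution over the categories.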
react-native-firebase is a modular set of libraries that integrates Firebase cloud services into cross-platform mobile applications. It serves as a native-SDK wrapper, mapping JavaScript method calls to native iOS and Android Firebase SDKs via the React Native bridge to provide a type-safe interface for mobile backend integration. The project enables connectivity to a wide array of cloud services, including user authentication and identity management, NoSQL cloud databases with real-time synchronization, and scalable cloud storage for media files. It also provides tools for sending push notif
Provides a web interface to create and deploy custom image classification models to mobile devices.
This project is a model-agnostic interpretability framework and explainability tool designed to provide local interpretable explanations for individual predictions. It functions as a local surrogate model that approximates the behavior of any machine learning classifier or regression model to identify the most influential features for a specific instance. The framework is designed to be model-agnostic, meaning it can explain predictions across tabular, text, and image data regardless of the underlying architecture. It employs local linear approximations and feature importance visualization t
Visualizes the specific segments or pixels of an image that most strongly drive classification decisions.
This project is a PyTorch-based generative framework and implementation template for building Generative Adversarial Networks. It provides a collection of foundational toolkits and architectural patterns designed to synthesize high-quality artificial data while focusing on the stability of adversarial neural networks. The framework distinguishes itself through a specialized toolkit for conditional image generation, which integrates discrete labels and auxiliary classification into the training process. It utilizes specific mechanisms to guide the generative process toward target classes by co
Allows training a discriminator to perform simultaneous classification and authenticity detection.
This project is an automated machine learning framework and toolkit designed for training and tuning custom models for classification, regression, and recommendations. It functions as a multimodal machine learning toolkit capable of processing and training models using a combination of text, image, audio, and sensor data. The framework distinguishes itself as a multimodal data processor that can handle and visualize large datasets on a single machine using column-oriented disk storage. It includes a core machine learning model generator that converts trained models into formats compatible wit
Provides automated analysis tools that categorize images into predefined labels.
This project is a static educational website and comprehensive curriculum focused on computer vision and deep learning. It serves as a public repository of instructional materials, lecture notes, and technical guides specifically detailing convolutional neural networks and visual recognition. The site is developed using static-site generation to host course documentation and student project directories. It provides structured academic resources that guide learners through image classification, generative modeling, and the implementation of various neural network architectures. The curriculum
Provides instructional materials on classifying images by analyzing pixel arrays as input data.
This project is a high-performance image transformation server and media optimization proxy designed to process, resize, and convert assets on the fly. It functions as a secure pipeline that fetches remote source files and applies transformations—such as cropping, watermarking, and visual filtering—directly through parameters defined in the request URL. The service distinguishes itself through a focus on secure, resource-aware delivery. It protects infrastructure by validating incoming requests with cryptographic signatures to prevent unauthorized access and enforces strict limits on file dim
Categorizes images using automated analysis to inform downstream processing decisions.
Crucix is an open-source intelligence system comprising an OSINT aggregator, a geospatial intelligence dashboard, and an LLM intelligence agent. It functions as a real-time signal monitor and automated alerting system designed to collect, analyze, and visualize geopolitical, economic, and satellite data from diverse open-source intelligence sources. The system utilizes large language models to synthesize intelligence feeds, generate actionable trade ideas, and classify signal priority with confidence scores. It features a geospatial visualization interface that plots intelligence events, such
Employs large language models to assign semantic labels and priority scores to raw intelligence data.