Why is anthropics/anthropic-cookbook a recommended Data Categorization repository?

Uses language models to assign unstructured text to predefined categories or labels for data organization.

Why is blakeblackshear/frigate a recommended Data Categorization repository?

Labels tracked objects to prioritize important video segments for review.

Why is d2l-ai/d2l-en a recommended Data Categorization repository?

Categorizes complex items by applying multiple non-exclusive tags to a single input.

Why is fastai/fastai a recommended Data Categorization repository?

Converts categorical integer labels into binary matrix representations for multi-class classification.

Why is facebookresearch/fastText a recommended Data Categorization repository?

Predicts the most likely labels or probabilities for text using a trained supervised model.

Why is WZMIAOMIAO/deep-learning-for-image-processing a recommended Data Categorization repository?

Provides image classifiers that categorize visual input into predefined classes using CNNs and transformers.

Why is trekhleb/homemade-machine-learning a recommended Data Categorization repository?

Implements the one-vs-all approach to extend binary logistic regression to multiple categories.

Why is plausible/analytics a recommended Data Categorization repository?

Applies custom properties like region or product category to segment and filter traffic data at the network level.

Why is apache/mxnet a recommended Data Categorization repository?

Supports assigning multiple categorical labels to single data items for complex classification tasks.

Why is NLP-LOVE/ML-NLP a recommended Data Categorization repository?

Implements multi-class classification using a one-vs-rest strategy to determine the highest probability category.

73 repositories

Awesome GitHub Repositories: Data Categorization

Systems for labeling and organizing data based on content or metadata.

Distinguishing note: Focuses on classifying video events for review purposes.

Explore 73 GitHub repositories matching Data & Databases · Data Categorization.


anthropics/anthropic-cookbook
45,984 stars
This repository is a collection of guides, notebooks, and recipes for implementing advanced prompting techniques and workflow patterns with large language models. It serves as a prompt engineering guide, an evaluation suite for scoring prompt quality, and a framework for orchestrating agents and integrating external tools. The project provides implementation patterns for building applications with Claude, specifically focusing on coordinating multiple models to split complex tasks between high-reasoning and high-efficiency agents. It includes technical demonstrations for multimodal data proce
Uses language models to assign unstructured text to predefined categories or labels for data organization.
Jupyter Notebook
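Assigning text to predefined labels with a language model usually comes down to two steps: building a constrained prompt and validating the reply. The snippet below is a minimal sketch of those two steps only, not code from the cookbook. The helper names `build_classification_prompt` and `parse_label`, the category names, and the sample reply are all hypothetical, and no model API is called.

```python
def build_classification_prompt(text: str, categories: list[str]) -> str:
    """Build a prompt that asks for exactly one of the allowed categories."""
    options = "\n".join(f"- {c}" for c in categories)
    return (
        "Classify the text into exactly one of these categories:\n"
        f"{options}\n\n"
        f"Text: {text}\n"
        "Answer with the category name only."
    )


def parse_label(response: str, categories: list[str], default: str = "unknown") -> str:
    """Map a raw model reply onto an allowed category, or fall back to a default."""
    cleaned = response.strip().lower()
    for category in categories:
        if cleaned == category.lower():
            return category
    return default


# Hypothetical categories and a stand-in reply; a real reply would come from a model call.
categories = ["billing", "shipping", "technical"]
prompt = build_classification_prompt("My invoice shows the wrong amount.", categories)
label = parse_label(" Billing\n", categories)
```

Validating the reply against the allowed list keeps an unexpected answer from silently becoming a new category.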
blakeblackshear/frigate
33,778 stars
Frigate is a self-hosted network video recorder that functions as a private, local AI-powered vision engine. It manages video streams by performing real-time object detection, tracking, and classification directly on local hardware, ensuring that security monitoring and activity recording remain independent of cloud services. The system distinguishes itself through a modular, hardware-accelerated video pipeline that offloads intensive decoding and machine learning inference to dedicated GPUs, NPUs, or specialized accelerators like Coral TPUs and Hailo modules. It utilizes state-based object t
Labels tracked objects to prioritize important video segments for review.
TypeScript · ai · camera · google-coral
d2l-ai/d2l-en
29,001 stars
This project is an educational platform and research toolkit designed to teach deep learning through a combination of mathematical theory, visual diagrams, and executable code. It provides a comprehensive environment for building, training, and evaluating neural networks, grounding complex concepts in interactive computational notebooks that allow for hands-on experimentation. The framework distinguishes itself by interleaving theoretical foundations—including linear algebra, calculus, and probability—with practical implementations across multiple industry-standard libraries. It supports flex
Categorizes complex items by applying multiple non-exclusive tags to a single input.
Python · book · computer-vision · data-science
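Applying multiple non-exclusive tags to one input is commonly done by scoring each tag independently and keeping every tag that clears a threshold. The following is a minimal sketch of that idea, not code from the book. The function `predict_tags`, the tag names, and the scores are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_tags(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every tag whose independent sigmoid probability meets the threshold."""
    return sorted(tag for tag, z in logits.items() if sigmoid(z) >= threshold)


# Hypothetical per-tag raw scores for a single input.
scores = {"beach": 2.1, "sunset": 0.4, "people": -1.3, "dog": -0.2}
tags = predict_tags(scores)
```

Because each tag is scored on its own, the selected tags are not forced to sum to one probability mass, which is what distinguishes multi-label from multi-class prediction.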
fastai/fastai
27,862 stars
Fastai is a high-level deep learning library built on PyTorch that provides a unified interface for managing the entire machine learning lifecycle. It functions as a comprehensive training toolkit, abstracting hardware management and automating complex training loops to simplify the construction and execution of neural network models. The framework is distinguished by its notebook-centric development environment and a type-dispatching data pipeline that automatically applies transformations based on input data formats. It emphasizes transfer learning through discriminative layer-wise optimiza
Converts categorical integer labels into binary matrix representations for multi-class classification.
Jupyter Notebook · colab · deep-learning · fastai
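Converting integer class labels into a binary matrix is usually called one-hot encoding. The sketch below shows the transformation in plain Python. It does not use or reproduce the fastai API, and `one_hot` is a hypothetical helper name.

```python
def one_hot(labels: list[int], num_classes: int) -> list[list[int]]:
    """Turn integer labels into rows with a single 1 at the label's index."""
    rows = []
    for label in labels:
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} is outside 0..{num_classes - 1}")
        row = [0] * num_classes
        row[label] = 1
        rows.append(row)
    return rows


encoded = one_hot([0, 2, 1], num_classes=3)
```

Each row has exactly one non-zero entry, so the matrix has one row per sample and one column per class.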
facebookresearch/fastText
26,543 stars
fastText is a library and framework for word embedding generation, text vectorization, and supervised text classification. It provides tools to transform raw text into fixed-length vector representations and to train models that assign category labels to sentences or documents. The system utilizes subword-based vectorization and character n-gram embeddings, allowing it to generate meaningful vectors for words that were not present during training. To manage resource usage, it includes a quantized language model implementation that employs product quantization and dimensionality reduction to d
Predicts the most likely labels or probabilities for text using a trained supervised model.
HTML
WZMIAOMIAO/deep-learning-for-image-processing
26,281 stars
This project is a PyTorch-based computer vision library and deep learning image processing framework. It provides a collection of neural network architectures designed for visual analysis tasks, specifically focusing on image classification, object detection, and semantic segmentation. The toolset implements diverse methodologies for visual recognition, including anchor-free object detection, regional proposal networks, and heatmap-based keypoint estimation. It utilizes both convolutional neural networks for spatial feature extraction and transformer-based self-attention mechanisms to compute
Provides image classifiers that categorize visual input into predefined classes using CNNs and transformers.
Python · bilibili · classification · deep-learning
trekhleb/homemade-machine-learning
24,608 stars
This project provides a collection of machine learning algorithms implemented from scratch in Python. It serves as an educational resource using interactive notebooks that combine code with mathematical explanations to demonstrate the first principles of data science. The repository includes reference implementations for neural networks, such as multilayer perceptrons with backpropagation, and supervised learning models including linear and logistic regression. It also covers unsupervised learning through k-means clustering and Gaussian anomaly detection. The codebase covers a broad range of
Implements the one-vs-all approach to extend binary logistic regression to multiple categories.
Jupyter Notebook
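The one-vs-all approach trains one binary logistic regression per class ("this class" versus "everything else") and predicts the class whose classifier reports the highest probability. The following is a minimal sketch under stated assumptions, not the repository's implementation: it uses plain batch gradient descent, a tiny hand-made 2D dataset, and hypothetical function names.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(w: list[float], x) -> float:
    """Linear score with w[0] as the bias term."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def train_binary(X, y, lr: float = 0.5, epochs: int = 1000) -> list[float]:
    """Fit one binary logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / m for wj, gj in zip(w, grad)]
    return w


def train_one_vs_all(X, labels, num_classes: int) -> list[list[float]]:
    """Train one 'class k versus the rest' classifier per class."""
    return [
        train_binary(X, [1.0 if label == k else 0.0 for label in labels])
        for k in range(num_classes)
    ]


def predict(models, x) -> int:
    """Pick the class whose binary classifier gives the highest probability."""
    probs = [sigmoid(score(w, x)) for w in models]
    return max(range(len(probs)), key=probs.__getitem__)


# Illustrative data: three well-separated clusters, three points each.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
     (2.0, 0.1), (2.2, 0.3), (1.9, 0.0),
     (0.1, 2.0), (0.3, 2.1), (0.0, 1.8)]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

models = train_one_vs_all(X, labels, num_classes=3)
predictions = [predict(models, p) for p in [(0.1, 0.1), (2.1, 0.2), (0.2, 1.9)]]
```

The clusters are arranged so that each class is linearly separable from the other two, which is the situation where one-vs-all with linear classifiers works well.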
plausible/analytics
24,245 stars
This project is an open-source, privacy-focused web analytics platform designed for high-throughput data ingestion and multi-tenant data management. It provides a cookie-less tracking engine that captures visitor interactions using ephemeral request metadata, ensuring comprehensive traffic visibility while maintaining strict privacy standards. The architecture utilizes an event-driven ingestion pipeline and aggregated metric storage to decouple data collection from processing, enabling efficient long-term retrieval and responsive dashboard performance. What distinguishes this platform is its
Applies custom properties like region or product category to segment and filter traffic data at the network level.
Elixir · analytics · charts · clickhouse
apache/mxnet
20,829 stars
This project is a deep learning framework designed for constructing, training, and deploying neural networks across diverse hardware environments. It functions as a high-performance tensor computation library that provides both imperative and symbolic programming interfaces, allowing developers to balance flexible, step-by-step model building with the efficiency of compiled computation graphs. The framework distinguishes itself through a hybrid execution engine that integrates declarative graph compilation with imperative runtime logic. It supports scalable, distributed training across multip
Supports assigning multiple categorical labels to single data items for complex classification tasks.
C++ · mxnet
NLP-LOVE/ML-NLP
17,725 stars
This project is a machine learning algorithm reference and implementation guide that provides theoretical foundations and code for supervised learning, deep learning, and natural language processing. It serves as a comprehensive toolkit for implementing predictive models and a technical reference for algorithm engineering. The project focuses on ensemble learning frameworks, including the construction of decision trees, random forests, and gradient boosting models. It also functions as a probabilistic graphical model library and an NLP algorithm reference, with specific implementations for se
Implements multi-class classification using a one-vs-rest strategy to determine the highest probability category.
Jupyter Notebook · deep-learning · machine-learning · nlp
cvat-ai/cvat
15,317 stars
CVAT is an open-source, web-based platform designed for annotating images, videos, and 3D point clouds to create high-quality training datasets for machine learning. It functions as a containerized server that orchestrates the entire lifecycle of computer vision data, from initial task creation and manual labeling to quality assurance and final dataset export. The platform distinguishes itself through deep integration with machine learning models, allowing users to deploy custom AI models as serverless functions for automated object detection, tracking, and skeleton annotation. It supports co
Allows users to assign descriptive tags to entire images or video frames to classify content without spatial coordinates.
Python · annotation · annotation-tool · annotations
chiphuyen/aie-book
13,779 stars
This project serves as a comprehensive educational resource and technical handbook for engineers building applications powered by large language models. It provides a structured framework for mastering the principles of artificial intelligence engineering, covering the full lifecycle of model development from initial design to production deployment. The repository distinguishes itself by offering a deep dive into the practical implementation of advanced design patterns, including retrieval-augmented generation, agentic tool orchestration, and parameter-efficient model adaptation. It emphasize
Assigns categorical tags to database columns based on schema metadata to organize and identify sensitive or functional information.
Jupyter Notebook
rasbt/python-machine-learning-book
12,614 stars
This project is an educational resource providing practical code examples and implementations of machine learning algorithms using the Python language. It serves as a guide for constructing predictive pipelines, clustering models, and dimensionality reduction within the Scikit-Learn ecosystem. The repository includes comprehensive demonstrations for supervised and unsupervised learning, as well as detailed examples for implementing neural networks and deep architectures. It also provides practical guidance on exporting model parameters to JSON and wrapping trained models in web APIs for produ
Implements multi-class classification strategies, including one-vs-all and softmax regression.
Jupyter Notebook
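Softmax regression, the alternative to one-vs-all named above, turns a vector of raw class scores into probabilities that sum to one and predicts the largest. The snippet is a minimal sketch of the softmax step only, not code from the book. The input scores are illustrative.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
predicted = max(range(len(probs)), key=probs.__getitem__)
```

Unlike one-vs-all, the class probabilities here are coupled: raising one score lowers the probability of every other class.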
invertase/react-native-firebase
12,291 stars
react-native-firebase is a modular set of libraries that integrates Firebase cloud services into cross-platform mobile applications. It serves as a native-SDK wrapper, mapping JavaScript method calls to native iOS and Android Firebase SDKs via the React Native bridge to provide a type-safe interface for mobile backend integration. The project enables connectivity to a wide array of cloud services, including user authentication and identity management, NoSQL cloud databases with real-time synchronization, and scalable cloud storage for media files. It also provides tools for sending push notif
Provides a web interface to create and deploy custom image classification models to mobile devices.
TypeScript
marcotcr/lime
12,142 stars
This project is an agnostic model interpretability framework and explainability tool designed to provide local interpretable explanations for individual predictions. It functions as a local surrogate model that approximates the behavior of any machine learning classifier or regression model to identify the most influential features for a specific instance. The framework is designed to be model-agnostic, meaning it can explain predictions across tabular, text, and image data regardless of the underlying architecture. It employs local linear approximations and feature importance visualization t
Visualizes the specific segments or pixels of an image that most strongly drive classification decisions.
JavaScript
soumith/ganhacks
11,619 stars
This project is a PyTorch-based generative framework and implementation template for building Generative Adversarial Networks. It provides a collection of foundational toolkits and architectural patterns designed to synthesize high-quality artificial data while focusing on the stability of adversarial neural networks. The framework distinguishes itself through a specialized toolkit for conditional image generation, which integrates discrete labels and auxiliary classification into the training process. It utilizes specific mechanisms to guide the generative process toward target classes by co
Allows training a discriminator to perform simultaneous classification and authenticity detection.
apple/turicreate
11,171 stars
This project is an automated machine learning framework and toolkit designed for training and tuning custom models for classification, regression, and recommendations. It functions as a multimodal machine learning toolkit capable of processing and training models using a combination of text, image, audio, and sensor data. The framework distinguishes itself as a multimodal data processor that can handle and visualize large datasets on a single machine using column-oriented disk storage. It includes a core machine learning model generator that converts trained models into formats compatible wit
Provides automated analysis tools that categorize images into predefined labels.
C++
cs231n/cs231n.github.io
10,923 stars
This project is a static educational website and comprehensive curriculum focused on computer vision and deep learning. It serves as a public repository of instructional materials, lecture notes, and technical guides specifically detailing convolutional neural networks and visual recognition. The site is developed using static-site generation to host course documentation and student project directories. It provides structured academic resources that guide learners through image classification, generative modeling, and the implementation of various neural network architectures. The curriculum
Provides instructional materials on classifying images by analyzing pixel arrays as input data.
Jupyter Notebook
imgproxy/imgproxy
10,876 stars
This project is a high-performance image transformation server and media optimization proxy designed to process, resize, and convert assets on the fly. It functions as a secure pipeline that fetches remote source files and applies transformations—such as cropping, watermarking, and visual filtering—directly through parameters defined in the request URL. The service distinguishes itself through a focus on secure, resource-aware delivery. It protects infrastructure by validating incoming requests with cryptographic signatures to prevent unauthorized access and enforces strict limits on file dim
Categorizes images using automated analysis to inform downstream processing decisions.
Go · avif · crop-image · docker
calesthio/Crucix
10,311 stars
Crucix is an open-source intelligence system comprising an OSINT aggregator, a geospatial intelligence dashboard, and an LLM intelligence agent. It functions as a real-time signal monitor and automated alerting system designed to collect, analyze, and visualize geopolitical, economic, and satellite data from diverse open-source intelligence sources. The system utilizes large language models to synthesize intelligence feeds, generate actionable trade ideas, and classify signal priority with confidence scores. It features a geospatial visualization interface that plots intelligence events, such
Employs large language models to assign semantic labels and priority scores to raw intelligence data.
JavaScript · ai · intelligence · osint
