awesome-repositories.com
© 2026 Bringes Technology SRL · VAT RO45896025 · hello@bringes.io
Machine Learning Datasets · Awesome GitHub Repositories

4 repos


Structured collections of data used for training, validating, or testing various machine learning models.

Explore 4 GitHub repositories tagged Artificial Intelligence & ML · Machine Learning Datasets.


Awesome Machine Learning Datasets GitHub Repositories

  • d2l-ai/d2l-zh

    75,708 · GitHub

    This project is an open-source, interactive educational platform designed to teach deep learning through a comprehensive, code-first curriculum. It provides a structured learning path that covers foundational mathematics, modern neural network architectures, and practical optimization techniques, enabling practitioners

    Python · book · chinese · computer-vision
  • mlabonne/llm-course

    75,340 · GitHub

    This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as we

    course · large-language-models · llm
  • tesseract-ocr/tesseract

    72,460 · GitHub

    Tesseract is a neural network-based optical character recognition engine designed to convert scanned images and digital documents into machine-readable, searchable text. It functions as both a command-line utility for automating large-scale digitization workflows and a cross-platform library that can be embedded into d

    C++ · hacktoberfest · lstm · machine-learning
  • dair-ai/Prompt-Engineering-Guide

    70,526 · GitHub

    This project is a comprehensive educational resource and knowledge base dedicated to the development and application of large language models and autonomous agentic systems. It provides a structured framework for understanding prompt engineering, context management, and the architectural patterns required to build task

    MDX · agent · agents · ai-agents

Explore sub-tags

  • Image Classification Datasets: Datasets specifically structured for training image recognition models.
  • Natural Language Processing Datasets: Datasets specifically curated for training or evaluating natural language processing models, including text corpora and annotated linguistic data.
  • OCR Training Datasets: Community-maintained datasets specifically for improving optical character recognition accuracy.
  • Object Detection Datasets: Datasets specifically annotated for identifying and localizing objects within images or video frames.
  • Post-Training Datasets: Datasets specifically formatted for supervised fine-tuning or preference alignment of language models.
  • Pre-training Corpora: Large-scale datasets used for the initial training phase of language models.
  • Regression Benchmarks: Datasets specifically curated for evaluating continuous value prediction performance.