© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Training Data Generation · Awesome GitHub Repositories

1 repo

Tools for creating and curating synthetic datasets to improve the quality and diversity of model training.

Distinguishing note: Focuses on the creation of training examples specifically for model improvement, distinct from general synthetic data generation.

Explore 1 awesome GitHub repository in Artificial Intelligence & ML · Training Data Generation.

Awesome Training Data Generation GitHub Repositories

  • tatsu-lab/stanford_alpaca

    30,266 · View on GitHub

    This project provides an end-to-end framework for adapting large language models to follow user instructions through supervised fine-tuning. It functions as a comprehensive training pipeline that enables the creation of specialized assistant models by minimizing the difference between predicted outputs and target responses within structured instruction datasets. The framework distinguishes itself by integrating synthetic data generation with memory-efficient training techniques. It utilizes powerful language models to iteratively expand small sets of human-written seeds into diverse, high-qua

    Create synthetic instruction data by prompting large language models with seed tasks to produce diverse and high-quality examples for improving model training outcomes.

    Python · deep-learning · instruction-following · language-model
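
The seed-task approach described above can be illustrated with a minimal sketch. This is not the repository's actual code: the seed tasks, the prompt wording, the `<noinput>` placeholder, and the function names `build_prompt` and `parse_completion` are all assumptions made for illustration. The sketch only builds the prompt and parses a completion; the call to a language model is left out, and the completion in the usage note is hand-written.

```python
import random
import re

# Hypothetical seed tasks. A real seed set would be human-written and larger.
SEED_TASKS = [
    {"instruction": "Translate the sentence to French.",
     "input": "Good morning.", "output": "Bonjour."},
    {"instruction": "List three prime numbers.",
     "input": "", "output": "2, 3, 5"},
    {"instruction": "Classify the sentiment of the review.",
     "input": "I loved this film.", "output": "Positive"},
]


def build_prompt(seed_tasks, num_in_context=2, rng=None):
    """Sample a few seed tasks and format them as numbered in-context examples.

    The prompt ends with an open 'N. Instruction:' slot for the model to continue.
    """
    rng = rng or random.Random(0)
    examples = rng.sample(seed_tasks, k=min(num_in_context, len(seed_tasks)))
    lines = ["Come up with diverse task instructions.", ""]
    for i, task in enumerate(examples, start=1):
        lines.append(f"{i}. Instruction: {task['instruction']}")
        # Tasks without an input are marked with a placeholder (an assumption here).
        lines.append(f"{i}. Input: {task['input'] or '<noinput>'}")
        lines.append(f"{i}. Output: {task['output']}")
    lines.append(f"{len(examples) + 1}. Instruction:")
    return "\n".join(lines)


def parse_completion(completion, next_index):
    """Turn a model completion into a list of instruction/input/output dicts.

    Chunks that do not contain all three fields are skipped.
    """
    text = f"{next_index}. Instruction:{completion}"
    chunks = re.split(r"\d+\.\s*Instruction:", text)
    pattern = re.compile(
        r"(.*?)\n\d+\.\s*Input:(.*?)\n\d+\.\s*Output:(.*)", re.DOTALL
    )
    tasks = []
    for chunk in chunks:
        match = pattern.match(chunk)
        if match is None:
            continue
        instruction, task_input, output = (g.strip() for g in match.groups())
        if task_input == "<noinput>":
            task_input = ""
        tasks.append(
            {"instruction": instruction, "input": task_input, "output": output}
        )
    return tasks
```

As a usage example, `build_prompt(SEED_TASKS)` returns a prompt ending in `3. Instruction:`, and a hand-written completion such as `" Reverse the word.\n3. Input: cat\n3. Output: tac"` passed to `parse_completion(..., 3)` yields one new task dict. A real pipeline would typically also filter near-duplicate or low-quality generations before adding them to the training set.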