© 2026 Bringes Technology SRL·VAT RO45896025·hello@bringes.io
Synthetic Data Pipelines · Awesome GitHub Repositories

Automated workflows for generating training data by applying controlled transformations and noise to clean datasets.

Distinguishing note: Focuses on the data preparation pipeline, distinct from the model architecture itself.

Two GitHub repositories are listed under Artificial Intelligence & ML · Synthetic Data Pipelines.

Awesome Synthetic Data Pipelines GitHub Repositories

  • xinntao/Real-ESRGAN

    34,385 GitHub stars

    Real-ESRGAN is a deep learning restoration pipeline designed to enhance low-resolution media and improve the visual quality of damaged photographs. It functions as a generative image upscaler that reconstructs high-resolution details from source inputs by utilizing neural networks trained to fill in missing information and remove noise. The project distinguishes itself as a blind super-resolution tool, meaning it improves image sharpness and fidelity without requiring prior knowledge of the specific degradation applied to the source. It employs high-order degradation modeling to address compl

    Simulates real-world image damage by applying blur, noise, and compression artifacts to training images.

    Python · amine · denoise · esrgan
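
The degradation idea described for Real-ESRGAN (blur, noise, and compression artifacts applied to clean training images) can be illustrated with a small, dependency-free sketch. This is not Real-ESRGAN's code: the function names, the parameter ranges, the box filter, and the use of coarse quantization as a stand-in for real compression are all assumptions made for illustration. Repeating the sequence several times is only a loose reading of "high-order" degradation.

```python
import random


def box_blur(img, radius=1):
    """Blur a 2D grayscale image (list of rows, values in [0, 1]) with a box filter."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Average over the window, clipped at the image borders.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            total = sum(img[j][i] for j in ys for i in xs)
            out[y][x] = total / (len(ys) * len(xs))
    return out


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise and clamp the result to [0, 1]."""
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in img]


def quantize(img, levels):
    """Crude stand-in for compression artifacts: snap values to `levels` steps."""
    return [[round(v * (levels - 1)) / (levels - 1) for v in row] for row in img]


def degrade(img, rng, rounds=2):
    """Apply blur -> noise -> quantization `rounds` times with random strengths."""
    out = img
    for _ in range(rounds):
        out = box_blur(out, radius=rng.choice([1, 2]))
        out = add_gaussian_noise(out, sigma=rng.uniform(0.01, 0.05), rng=rng)
        out = quantize(out, levels=rng.choice([8, 16, 32]))
    return out


def make_pair(clean, seed=0):
    """Return a (degraded input, clean target) training pair."""
    rng = random.Random(seed)
    return degrade(clean, rng), clean
```

Calling `make_pair(clean_image, seed=42)` yields one synthetic low-quality input together with its untouched target. Seeding the generator keeps a given pair reproducible, while varying the seed produces different degradations of the same clean image.
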
  • huggingface/open-r1

    25,887 GitHub stars

    Open-r1 is a framework designed for the large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides a comprehensive suite of tools for managing distributed training jobs across multi-node clusters, enabling the development of high-performance models through reinforcement learning and supervised fine-tuning. The project distinguishes itself by integrating secure, containerized code execution environments directly into the training and evaluation lifecycle. By allowing models to run and verify code snippets against test

    Provides a framework for generating, filtering, and curating high-quality training datasets through model distillation and automated data integrity verification processes.

    Python
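
The generate, filter, and curate loop described for open-r1 can be sketched in a few lines. This does not use or reproduce open-r1's API: `Sample`, `curate`, `toy_teacher`, and `toy_verify` are hypothetical names, and the arithmetic "teacher" is a toy substitute for a real model. The sketch only shows the general shape: sample several completions per prompt, drop duplicates, and keep the ones a verifier accepts.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    prompt: str
    completion: str


def curate(
    prompts: Iterable[str],
    teacher: Callable[[str, int], List[str]],
    verify: Callable[[Sample], bool],
    samples_per_prompt: int = 4,
) -> List[Sample]:
    """Collect teacher completions, deduplicate them, and keep verified ones."""
    kept: List[Sample] = []
    seen = set()
    for prompt in prompts:
        for completion in teacher(prompt, samples_per_prompt):
            sample = Sample(prompt, completion.strip())
            if sample in seen:
                continue  # exact duplicate of an earlier candidate
            seen.add(sample)
            if verify(sample):
                kept.append(sample)
    return kept


def toy_teacher(prompt: str, n: int) -> List[str]:
    """Hypothetical teacher: answers 'a+b' prompts, sometimes incorrectly."""
    a, b = map(int, prompt.split("+"))
    # Offsets cycle through -1, 0, +1, so some answers are wrong or repeated.
    return [str(a + b + (i % 3) - 1) for i in range(n)]


def toy_verify(sample: Sample) -> bool:
    """Hypothetical verifier: recompute the sum and compare."""
    a, b = map(int, sample.prompt.split("+"))
    return int(sample.completion) == a + b
```

With the toy functions, `curate(["2+3", "10+7"], toy_teacher, toy_verify)` keeps one verified sample per prompt. In a real pipeline the teacher would be a model call and the verifier might run generated code against tests, but those pieces are outside this sketch.
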