
Awesome GitHub Repositories: Synthetic Data Pipelines

Automated workflows for generating training data by applying controlled transformations and noise to clean datasets.

Distinguishing note: Covers the data preparation pipeline, not the model architecture itself.

2 repositories in Artificial Intelligence & ML · Synthetic Data Pipelines.


xinntao/Real-ESRGAN
34,385 stars
Real-ESRGAN is a deep learning restoration pipeline designed to enhance low-resolution media and improve the visual quality of damaged photographs. It functions as a generative image upscaler that reconstructs high-resolution details from source inputs by utilizing neural networks trained to fill in missing information and remove noise. The project distinguishes itself as a blind super-resolution tool, meaning it improves image sharpness and fidelity without requiring prior knowledge of the specific degradation applied to the source. It employs high-order degradation modeling to address compl
Simulates real-world image damage by applying blur, noise, and compression artifacts to training images.
Python · amine · denoise · esrgan
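The degradation idea described for Real-ESRGAN (blur, noise, and compression artifacts applied to clean training images, repeated for a high-order effect) can be illustrated with a minimal NumPy sketch. This is not Real-ESRGAN's own code: the function names (`blur`, `add_noise`, `block_compress`, `degrade`), the parameter ranges, and the `order=2` default are all assumptions chosen for illustration, and the compression step is a simplified JPEG-like stand-in (uniform quantization of 8x8 block DCT coefficients) rather than a real JPEG codec.

```python
import numpy as np


def gaussian_kernel(sigma: float, radius: int) -> np.ndarray:
    """1-D Gaussian kernel, normalized to sum to 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def blur(img: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Separable Gaussian blur with edge padding; output shape equals input shape."""
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(img, radius, mode="edge")
    # Convolve along rows, then columns; "valid" mode trims the padding again.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def add_noise(img: np.ndarray, rng: np.random.Generator, std: float = 0.05) -> np.ndarray:
    """Additive Gaussian noise."""
    return img + rng.normal(0.0, std, size=img.shape)


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)
    return m


def block_compress(img: np.ndarray, step: float = 0.1, n: int = 8) -> np.ndarray:
    """JPEG-like stand-in: quantize n x n block DCT coefficients with a uniform step."""
    h, w = img.shape
    assert h % n == 0 and w % n == 0, "image sides must be multiples of the block size"
    d = dct_matrix(n)
    # Split into (h/n, w/n) blocks of shape (n, n).
    blocks = img.reshape(h // n, n, w // n, n).transpose(0, 2, 1, 3)
    coeffs = d @ blocks @ d.T                   # forward DCT per block
    coeffs = np.round(coeffs / step) * step     # lossy quantization
    restored = d.T @ coeffs @ d                 # inverse DCT per block
    return restored.transpose(0, 2, 1, 3).reshape(h, w)


def degrade(img: np.ndarray, rng: np.random.Generator, order: int = 2) -> np.ndarray:
    """Apply blur -> noise -> compression `order` times with random strengths."""
    out = img.astype(np.float64)
    for _ in range(order):
        out = blur(out, sigma=rng.uniform(0.5, 2.0))
        out = add_noise(out, rng, std=rng.uniform(0.01, 0.05))
        out = block_compress(out, step=rng.uniform(0.05, 0.2))
    return np.clip(out, 0.0, 1.0)


# Usage: build a (low-quality, high-quality) training pair from one clean image.
rng = np.random.default_rng(0)
clean = rng.random((32, 32))        # stand-in for a grayscale image in [0, 1]
degraded = degrade(clean, rng)      # synthetic low-quality input
```

Sampling the strength of each step at random is what lets a single clean image yield many differently damaged variants, so a restoration model trained on such pairs does not need to know which degradation was applied.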
huggingface/open-r1
25,887 stars
Open-r1 is a framework designed for the large-scale training, distillation, and optimization of language models focused on complex reasoning and programming tasks. It provides a comprehensive suite of tools for managing distributed training jobs across multi-node clusters, enabling the development of high-performance models through reinforcement learning and supervised fine-tuning. The project distinguishes itself by integrating secure, containerized code execution environments directly into the training and evaluation lifecycle. By allowing models to run and verify code snippets against test
Provides a framework for generating, filtering, and curating high-quality training datasets through model distillation and automated data integrity verification processes.
Python
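The generate, filter, and curate loop described for open-r1 can be sketched roughly as follows. This is a hypothetical illustration, not open-r1's API: the `Candidate` dataclass, `passes_tests`, and `curate` are invented names, the candidates stand in for completions distilled from a teacher model, and a plain local subprocess replaces the secure, containerized execution environment the project describes. Running untrusted model output this way is unsafe outside a sandbox.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Candidate:
    prompt: str
    completion: str  # code produced by a (hypothetical) teacher model
    tests: str       # assert statements the completion must satisfy


def passes_tests(candidate: Candidate, timeout_s: float = 5.0) -> bool:
    """Run completion + tests in a fresh interpreter; exit code 0 means verified.

    Assumption: a local subprocess is used only for illustration. It is NOT
    a secure sandbox.
    """
    program = candidate.completion + "\n\n" + candidate.tests + "\n"
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def curate(candidates: list[Candidate]) -> list[dict]:
    """Drop exact duplicates, then keep only completions that pass their tests."""
    seen = set()
    kept = []
    for c in candidates:
        key = (c.prompt, c.completion.strip())
        if key in seen:
            continue
        seen.add(key)
        if passes_tests(c):
            kept.append({"prompt": c.prompt, "completion": c.completion})
    return kept


# Usage: two candidate answers for one prompt; only the correct one survives.
prompt = "Write add(a, b) that returns the sum."
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
candidates = [
    Candidate(prompt, "def add(a, b):\n    return a + b", tests),
    Candidate(prompt, "def add(a, b):\n    return a - b", tests),
]
dataset = curate(candidates)
```

Using test execution as the filter means a sample enters the dataset only when its correctness has been checked mechanically, which is one way to read the "automated data integrity verification" mentioned above.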