# jackfsuia/nanorlhf

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/jackfsuia-nanorlhf).**

80 stars · 12 forks · Python · MIT

## Links

- GitHub: https://github.com/jackfsuia/nanoRLHF
- awesome-repositories: https://awesome-repositories.com/repository/jackfsuia-nanorlhf.md

## Description

RLHF experiments that run on a single A100 40 GB GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, and ReMax, as well as reproducing DeepSeek R1-Zero.
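
Several of the listed methods differ mainly in how they turn a group of sampled rewards into per-sample advantages. The sketch below illustrates two of the commonly described variants: GRPO-style group normalization and the RLOO leave-one-out baseline. It is not taken from the nanoRLHF code; the function names `grpo_advantages` and `rloo_advantages`, the `eps` parameter, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: center by the group mean, scale by the group std.

    Assumption: population standard deviation plus a small eps for stability;
    real implementations may use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards):
    """Leave-one-out advantages: each sample's baseline is the mean of the others."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


# Hypothetical rewards for four completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
grpo = grpo_advantages(rewards)
rloo = rloo_advantages(rewards)
```

In both cases the advantages for a group sum to roughly zero, so completions are only rewarded relative to their siblings for the same prompt, and no separate value network is needed to form the baseline.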

## Tags

### Part of an Awesome List

- [Reasoning Models](https://awesome-repositories.com/f/awesome-lists/ai/reasoning-models.md) — Lightweight reinforcement learning from human feedback implementation.
