[December 5, 2025] 🔍 We propose entropy ratio clipping (ERC), which imposes a global constraint on the policy model's output distribution. Our experiments show that ERC can significantly improve the stability of off-policy training. 📄 The paper is available on arXiv.