This repo contains a reference implementation of the TDPO algorithm for training language models from preference data, as described in the paper Token-level Direct Preference Optimization (ICML 2024). Our implementation is based on DPO, and follows the same usage guidelines.