This project is an educational implementation of a small-scale generative pre-trained transformer, intended as both a reference implementation and a tutorial for building a text-generating neural network from scratch. The codebase walks step by step through tokenization, self-attention, and the assembly of a lightweight language model, illustrating how large language models are constructed. The implementation covers transformer-based archi
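A small 0.2B-parameter Chinese dialogue model (ChatLM-Chinese-0.2B). All code for the full pipeline is open source: dataset sources, data cleaning, tokenizer training, model pre-training, SFT instruction fine-tuning, and RLHF optimization. SFT fine-tuning for downstream tasks is supported, with a fine-tuning example for triple information extraction.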
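
As a rough illustration of the triple information extraction task, the sketch below turns a sentence and its (subject, predicate, object) triples into one prompt/response pair for SFT. The prompt wording, the `prompt` and `response` keys, and the `(s,p,o)` output format are all assumptions made for this example, not the project's actual data format.

```python
import json

# Hypothetical prompt template; the project's actual wording may differ.
PROMPT_TEMPLATE = "Extract all (subject, predicate, object) triples from the text: {text}"


def format_triple(triple):
    """Render one (subject, predicate, object) triple as '(s,p,o)'."""
    subject, predicate, obj = triple
    return f"({subject},{predicate},{obj})"


def build_sft_sample(text, triples):
    """Build one prompt/response pair for supervised fine-tuning.

    The 'prompt' and 'response' keys are assumed names for illustration.
    """
    return {
        "prompt": PROMPT_TEMPLATE.format(text=text),
        "response": "".join(format_triple(t) for t in triples),
    }


sample = build_sft_sample(
    "Marie Curie was born in Warsaw.",
    [("Marie Curie", "place of birth", "Warsaw")],
)

# ensure_ascii=False keeps non-ASCII (e.g. Chinese) text readable in the output.
line = json.dumps(sample, ensure_ascii=False)
print(line)
```

Writing one such JSON object per line gives a simple JSON Lines file that a fine-tuning script could read sample by sample.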