1 repo
Guides on formatting and processing raw text data for machine learning tasks.
Distinguishing note: Focuses on the data ingestion and preprocessing phase of the machine learning pipeline.
This project is an open-source educational resource providing structured, step-by-step guides for fine-tuning large language models. It focuses on adapting pre-trained transformer-based causal models to custom datasets, enabling users to transfer specific writing styles or domain knowledge into generative AI models. The repository distinguishes itself by emphasizing parameter-efficient training techniques, specifically low-rank adaptation. By providing practical implementations for updating only a small subset of model weights, it allows for the customization of massive neural networks on con
First, we need to prepare the script data for 《甄嬛传》 (Empresses in the Palace). Let's look at the format of the raw data.

```text
第2幕
(退朝,百官散去)
官员甲:咱们皇上可真是器重年将军和隆科多大人。
官员乙:隆科多大人,恭喜恭喜啊!您可是国家的大功臣啊!
官员丙:年大将军,皇上对你可是垂青有加呀!
官员丁:年大人,您可是皇上的股肱之臣哪!
苏培盛(追上年羹尧):年大将军请留步。大将军——
年羹尧:苏公公,有何指教?
苏培盛:不敢。皇上惦记