
Awesome GitHub Repositories: Dataset Preparation Tutorials

Guides on formatting and processing raw text data for machine learning tasks.

Distinguishing note: Focuses on the data ingestion and preprocessing phase of the machine learning pipeline.

Explore 1 awesome GitHub repository matching education & learning resources · Dataset Preparation Tutorials.


datawhalechina/self-llm
28,285 stars · View on GitHub
This project is an open-source educational resource providing structured, step-by-step guides for fine-tuning large language models. It focuses on adapting pre-trained transformer-based causal models to custom datasets, enabling users to transfer specific writing styles or domain knowledge into generative AI models. The repository distinguishes itself by emphasizing parameter-efficient training techniques, specifically low-rank adaptation. By providing practical implementations for updating only a small subset of model weights, it allows for the customization of massive neural networks on con
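First, we need to prepare the script data from 《甄嬛传》 (Empresses in the Palace), which is the dataset used here. The raw data has the following format:

```text
第2幕
（退朝，百官散去）
官员甲：咱们皇上可真是器重年将军和隆科多大人。
官员乙：隆科多大人，恭喜恭喜啊！您可是国家的大功臣啊！
官员丙：年大将军，皇上对你可是垂青有加呀！
官员丁：年大人，您可是皇上的股肱之臣哪！
苏培盛（追上年羹尧）：年大将军请留步。大将军——
年羹尧：苏公公，有何指教？
苏培盛：不敢。皇上惦记
```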
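Raw script text in the format shown above could be converted into training records roughly as follows. This is a minimal sketch, not the repository's own preprocessing code: the helper names `parse_script` and `build_pairs`, the choice of target role, and the `instruction`/`input`/`output` record layout are all assumptions made for illustration. It relies only on the visible convention that each dialogue line reads `speaker：utterance`, with an optional stage direction in full-width parentheses after the speaker name.

```python
import re

# "speaker：utterance" with an optional full-width parenthetical (stage
# direction) after the speaker name. Assumed from the raw excerpt only.
LINE_RE = re.compile(r"^(?P<role>[^：（）]+)(?:（[^）]*）)?：(?P<content>.+)$")


def parse_script(raw):
    """Return a list of {"role", "content"} dicts, one per dialogue line."""
    turns = []
    for line in raw.splitlines():
        match = LINE_RE.match(line.strip())
        # Scene headings ("第2幕") and bare stage directions do not match.
        if match:
            turns.append({
                "role": match["role"].strip(),
                "content": match["content"].strip(),
            })
    return turns


def build_pairs(turns, target_role):
    """Pair each line spoken by target_role with the line just before it.

    The record layout (instruction/input/output) is a hypothetical choice.
    """
    pairs = []
    for prev, cur in zip(turns, turns[1:]):
        if cur["role"] == target_role and prev["role"] != target_role:
            pairs.append({
                "instruction": prev["content"],
                "input": "",
                "output": cur["content"],
            })
    return pairs


# A few lines taken from the raw excerpt.
sample = """第2幕
（退朝，百官散去）
官员甲：咱们皇上可真是器重年将军和隆科多大人。
苏培盛（追上年羹尧）：年大将军请留步。大将军——
年羹尧：苏公公，有何指教？"""

turns = parse_script(sample)
pairs = build_pairs(turns, target_role="年羹尧")
```

Pairing a character's line with the line immediately before it is the simplest way to get prompt/response examples out of a script; a real pipeline would likely also need to handle multi-line speeches and scene boundaries.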
Jupyter Notebook · chatglm · chatglm3 · gemma-2b-it