# karpathy/build-nanogpt

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/karpathy-build-nanogpt).**

4,746 stars · 750 forks · Python

## Links

- GitHub: https://github.com/karpathy/build-nanogpt
- awesome-repositories: https://awesome-repositories.com/repository/karpathy-build-nanogpt.md

## Description

This is an educational implementation of a generative pre-trained transformer (GPT) language model, built from scratch in PyTorch. The project is structured as a step-by-step tutorial that walks through the construction of a decoder-only transformer architecture and its training loop, supported by clean git commits and an accompanying video lecture.

What sets this implementation apart is its focus on practical reproduction: it provides a workflow to train a 124-million-parameter model from scratch in about one hour on cloud GPU hardware, for under ten dollars. The tutorial covers both the architecture construction and the full training pipeline, making it suitable for those who want to understand the inner workings of a GPT-2-scale model without relying on pre-built model implementations.
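To make the 124-million-parameter figure concrete, the sketch below counts the parameters of a GPT-2-style decoder. The configuration values (vocabulary of 50,257 tokens, context length 1,024, 12 layers, width 768), the 4x MLP expansion, biases on every linear layer, and tied input/output embeddings are assumptions taken from the published GPT-2 small configuration; they are not stated in the description above, and the function name is hypothetical.

```python
def gpt2_param_count(vocab_size=50257, block_size=1024, n_layer=12, n_embd=768):
    """Count parameters of a GPT-2-style decoder.

    Assumes biases on all linear layers, a 4x MLP expansion, and an output
    head that shares its weights with the token embedding (so it adds nothing).
    """
    token_emb = vocab_size * n_embd            # token embedding table
    pos_emb = block_size * n_embd              # learned position embedding table
    layer_norm = 2 * n_embd                    # scale and shift vectors

    # Attention: fused query/key/value projection plus output projection.
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4 * n_embd, then project back to n_embd.
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)

    per_block = 2 * layer_norm + attn + mlp    # two layer norms per block
    return token_emb + pos_emb + n_layer * per_block + layer_norm  # final layer norm


print(gpt2_param_count())  # 124439808
```

Under these assumptions the total is roughly 124.4 million, of which about 85 million sit in the 12 transformer blocks and about 39 million in the embedding tables.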

The implementation covers the core components of a decoder-only transformer: causal masked self-attention, in which each token attends only to itself and the tokens before it; cross-entropy loss minimization for next-token prediction; weight-decay regularization to limit overfitting; and GPU-accelerated training through PyTorch. Although the model is small by current standards, it mirrors the architectural patterns used in larger language models.
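Two of these components, causal masking and the cross-entropy objective, can be illustrated in a few lines. The following is a minimal sketch in plain Python rather than the repository's PyTorch code: it uses a single attention head, omits the learned projections, and the function names are hypothetical. It is meant only to show the mechanics, not to reproduce the tutorial's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head attention over lists of vectors.

    Position t scores only positions 0..t, which is equivalent to masking
    future positions with -inf before the softmax.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


def cross_entropy(logits, target):
    """Negative log-probability of the target index under softmax(logits)."""
    return -math.log(softmax(logits)[target])


# Changing the last token leaves the outputs at earlier positions untouched.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_changed = x[:2] + [[5.0, -3.0]]
print(causal_attention(x, x, x)[:2] == causal_attention(x_changed, x_changed, x_changed)[:2])
```

The final check illustrates why the mask matters for training: because earlier positions never see later tokens, every position in a sequence can serve as a next-token prediction example at once.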

## Tags

### Artificial Intelligence & ML

- [Decoder Architectures](https://awesome-repositories.com/f/artificial-intelligence-ml/decoder-architectures.md) — Implements the core decoder-only transformer architecture that processes token sequences for autoregressive generation.
- [From-Scratch Decoder Implementations](https://awesome-repositories.com/f/artificial-intelligence-ml/decoder-architectures/from-scratch-decoder-implementations.md) — Focuses on implementing the decoder-only transformer from scratch with causal self-attention and weight decay.
- [GPU-Accelerated Training](https://awesome-repositories.com/f/artificial-intelligence-ml/hardware-acceleration-backends/cuda-mining-backends/gpu-accelerated-training.md) — Executes forward and backward passes on CUDA-capable GPUs to accelerate large-scale model training.
- [Language Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/model-fine-tuning-adaptation/language-model-training.md) — Provides a complete implementation for training a GPT-2-scale language model from scratch with cloud GPU optimization. ([source](https://github.com/karpathy/build-nanogpt/blob/master/README.md))
- [124M-Parameter Reproduction Guides](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/model-fine-tuning-adaptation/language-model-training/124m-parameter-reproduction-guides.md) — Provides a guide for training a 124-million-parameter language model on cloud GPUs for under ten dollars.
- [Triangular Mask Implementations](https://awesome-repositories.com/f/artificial-intelligence-ml/masked-language-modeling/causal-masking/triangular-mask-implementations.md) — Implements the triangular attention mask that enforces unidirectional token access in the self-attention mechanism.
- [Cross-Entropy Loss Functions](https://awesome-repositories.com/f/artificial-intelligence-ml/prediction-visualization/loss-function-calculators/binary-cross-entropy-calculators/cross-entropy-loss-functions.md) — Uses cross-entropy loss as the objective function for next-token prediction during language model training.
- [Cloud GPU Reproduction Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/research-papers/research-reproductions/workflow-reproducibility/cloud-gpu-reproduction-pipelines.md) — Ships a workflow that reproduces a 124M-parameter model on cloud GPU hardware in about one hour for under ten dollars.
- [Transformer Language Models](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-language-models.md) — Implements a decoder-only transformer language model that processes token sequences and predicts the next token.
- [Deep Learning Prototyping Kits](https://awesome-repositories.com/f/artificial-intelligence-ml/deep-learning-prototyping-kits.md) — Provides a hands-on environment for prototyping transformer components like causal self-attention and loss functions.
- [PyTorch Tensor Operations](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-optimization-and-inference/hardware-and-acceleration/tensor-computing-libraries/pytorch-tensor-operations.md) — Leverages PyTorch for all tensor operations and automatic differentiation throughout the model implementation.
- [Transformer Tutorials](https://awesome-repositories.com/f/artificial-intelligence-ml/pytorch-training-frameworks/transformer-tutorials.md) — Provides a step-by-step walkthrough for implementing a transformer language model using PyTorch.

### Part of an Awesome List

- [From-Scratch Training](https://awesome-repositories.com/f/awesome-lists/ai/pre-trained-models/from-scratch-training.md) — Demonstrates the complete workflow of training a generative pre-trained transformer from scratch on text data.

### Scientific & Mathematical Computing

- [From-Scratch Implementations](https://awesome-repositories.com/f/scientific-mathematical-computing/from-scratch-implementations.md) — Builds a generative pre-trained transformer entirely from scratch, covering both architecture construction and training loop.

### Education & Learning Resources

- [Transformer Architecture Walkthroughs](https://awesome-repositories.com/f/education-learning-resources/neural-network-tutorials/transformer-architecture-walkthroughs.md) — Provides a structured tutorial with clean git commits and a video lecture for building a transformer language model.