# karpathy/minGPT

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/karpathy-mingpt).**

23,639 stars · 3,115 forks · Python · MIT license

## Links

- GitHub: https://github.com/karpathy/minGPT
- awesome-repositories: https://awesome-repositories.com/repository/karpathy-mingpt.md

## Description

minGPT is a minimal PyTorch re-implementation of the GPT Transformer architecture, built for training and experimenting with language models. It covers both training and text generation, and handles data loading, backpropagation, and parameter updates for custom models.
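The three training responsibilities named above (loading data, backpropagating, updating parameters) can be illustrated with a dependency-free sketch of a mini-batch loop. This is not minGPT's code: `ToyTrainer` is a hypothetical class, a linear model with hand-derived gradients stands in for the transformer, and plain SGD stands in for the AdamW optimizer minGPT configures through PyTorch. The `on_batch_end` hook is loosely modeled on the callback event minGPT's `trainer.py` fires after each batch.

```python
import random


class ToyTrainer:
    """Hypothetical mini-batch SGD trainer for a linear model y = w*x + b.

    The linear model stands in for a transformer so the loop stays short;
    gradients are derived by hand here, whereas minGPT uses PyTorch autograd.
    """

    def __init__(self, data, batch_size=4, lr=0.05, max_iters=500, seed=0):
        self.data = data
        self.batch_size = batch_size
        self.lr = lr
        self.max_iters = max_iters
        self.rng = random.Random(seed)
        self.params = {"w": 0.0, "b": 0.0}
        self.callbacks = {}  # event name -> list of callables taking the trainer
        self.iter_num = 0
        self.loss = None

    def add_callback(self, event, fn):
        self.callbacks.setdefault(event, []).append(fn)

    def trigger_callbacks(self, event):
        for fn in self.callbacks.get(event, []):
            fn(self)

    def run(self):
        for it in range(self.max_iters):
            self.iter_num = it
            # 1. Data loading: draw a random mini-batch of (x, y) pairs.
            batch = self.rng.sample(self.data, self.batch_size)
            w, b = self.params["w"], self.params["b"]
            # 2. Forward pass: mean squared error over the batch.
            errors = [(w * x + b) - y for x, y in batch]
            self.loss = sum(e * e for e in errors) / len(batch)
            # 3. Backward pass: analytic gradients of the MSE loss.
            grad_w = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # 4. Parameter update: one plain SGD step.
            self.params["w"] -= self.lr * grad_w
            self.params["b"] -= self.lr * grad_b
            # 5. Extension point: let user callbacks inspect the trainer.
            self.trigger_callbacks("on_batch_end")


# Usage: recover w = 3, b = 1 from noiseless data on x in [-1, 1].
data = [(i / 10, 3 * (i / 10) + 1) for i in range(-10, 11)]
trainer = ToyTrainer(data)
losses = []
trainer.add_callback("on_batch_end", lambda t: losses.append(t.loss))
trainer.run()
print(trainer.params)
```

Swapping the linear model for a transformer changes only steps 2 and 3 (with autograd replacing the hand-written gradients); the batching, update, and callback structure stays the same.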

The project is an educational resource that shows how transformer architectures work by building and training models from scratch. The model is a stack of repeated blocks built around causal self-attention, and users can define their own model configurations and run the full training loop on their own datasets.
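At the core of each block is causal self-attention: every position mixes information from itself and earlier positions only. The sketch below shows that computation for a single head in plain Python, under simplifying assumptions: `causal_self_attention` is a hypothetical helper that takes precomputed query, key, and value vectors, whereas minGPT computes them with learned projections, splits them across several heads, and operates on batched PyTorch tensors.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head causal attention; q, k, v are lists of vectors, one per position."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: position t only scores keys at positions 0..t.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out


# Usage: with all-zero queries and keys, every visible position gets equal
# weight, so position t returns the mean of v[0..t].
q = k = [[0.0, 0.0]] * 3
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_self_attention(q, k, v))
```

Because later positions never enter an earlier position's scores, changing a future value vector leaves all earlier outputs unchanged, which is what lets the model be trained to predict every next token in a sequence at once.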

Beyond its core training capabilities, the library includes a byte-pair encoding (BPE) tokenizer for text processing and supports serializing model parameters. Training logic can be extended with custom callbacks, and models can be packaged for distribution, which supports both neural network prototyping and text generation inference.
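Byte-pair encoding builds a vocabulary by repeatedly merging the most frequent adjacent pair of tokens into a new token. The toy trainer below illustrates that idea on raw UTF-8 bytes; it is a sketch, not minGPT's tokenizer. minGPT's `bpe.py` is adapted from OpenAI's GPT-2 encoder and loads GPT-2's pretrained merge table instead of learning merges, and like that encoder it pre-splits text with a regular expression. The helpers `train_bpe`, `encode`, and `decode` are hypothetical names.

```python
from collections import Counter


def most_frequent_pair(ids):
    # Count adjacent pairs; ties go to the pair seen first.
    pairs = Counter(zip(ids, ids[1:]))
    return max(pairs, key=pairs.get) if pairs else None


def merge(ids, pair, new_id):
    # Replace every non-overlapping occurrence of `pair` with `new_id`.
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    ids = list(text.encode("utf-8"))  # start from raw bytes (ids 0..255)
    merges = {}  # (left, right) -> new token id, in the order learned
    for n in range(num_merges):
        pair = most_frequent_pair(ids)
        if pair is None:
            break
        new_id = 256 + n
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges, ids


def encode(text, merges):
    ids = list(text.encode("utf-8"))
    # Apply the learned merges in the order they were learned.
    for pair, new_id in merges.items():
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    # Expand each token id back to the bytes it stands for.
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


text = "aaabdaaabac"
merges, ids = train_bpe(text, num_merges=2)
print(merges, ids)
```

Decoding concatenates the byte strings each token expands to, so the round trip is lossless for any UTF-8 input, even text the merges were not learned from.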

## Tags

### Artificial Intelligence & ML

- [Attention Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/transformer/attention-mechanisms.md) — Implements transformer-based self-attention to capture long-range dependencies within input sequences.
- [Model Training Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/model-training-frameworks.md) — Provides a framework for executing the training process of language models with custom hyperparameters. ([source](https://github.com/karpathy/minGPT/blob/master/README.md))
- [Language Model Training](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/model-fine-tuning-adaptation/language-model-training.md) — Implements the full training loop for transformer models, making the effect of hyperparameter choices easy to study.
- [Language Model Builders](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-language-models/language-model-builders.md) — Serves as an educational framework for constructing and training transformer-based language models from scratch.
- [Transformer Models](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-models.md) — Allows construction and configuration of custom transformer architectures for training and inference. ([source](https://github.com/karpathy/minGPT/blob/master/README.md))
- [Transformer Training Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/transformer-training-toolkits.md) — Provides a complete toolkit for building and training transformer-based language models from scratch. ([source](https://github.com/karpathy/minGPT#readme))
- [Generative Text Inference](https://awesome-repositories.com/f/artificial-intelligence-ml/generative-ai-resources/generative-ai/generative-text-inference.md) — Performs text generation by predicting subsequent tokens with a trained transformer model. ([source](https://github.com/karpathy/minGPT/tree/master/projects))
- [Neural Network Research Tools](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/neural-network-components/neural-network-research-tools.md) — Provides a minimalist implementation of neural architectures for educational study and rapid prototyping.
- [Neural Network Trainers](https://awesome-repositories.com/f/artificial-intelligence-ml/neural-networks/neural-network-trainers.md) — Manages the full training loop, including data loading, forward passes, and backpropagation for neural networks. ([source](https://github.com/karpathy/minGPT/blob/master/mingpt/trainer.py))
- [Autoregressive Models](https://awesome-repositories.com/f/artificial-intelligence-ml/autoregressive-models.md) — Implements autoregressive generation by iteratively predicting the next token in a sequence based on previous outputs.
- [Sequence Learning Models](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-learning-models.md) — Models sequences using stacked self-attention layers to predict subsequent elements.
- [Text Generation Utilities](https://awesome-repositories.com/f/artificial-intelligence-ml/text-generation-utilities.md) — Generates coherent text sequences by predicting tokens from input prompts using trained models. ([source](https://github.com/karpathy/minGPT#readme))
- [Byte Pair Encodings](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/language-tools/tokenization-algorithms/byte-pair-encodings.md) — Uses byte pair encoding to convert raw text into integer sequences for model processing.
- [Sequence Encoders](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/speech-processing/sequence-to-sequence-tasks/sequence-encoders.md) — Encodes raw text into integer sequences using byte pair encoding for model input. ([source](https://github.com/karpathy/minGPT/blob/master/README.md))
- [Model Serialization](https://awesome-repositories.com/f/artificial-intelligence-ml/model-serialization.md) — Persists model parameters and configurations using state-dict serialization for deployment and loading.
- [Data Encoding](https://awesome-repositories.com/f/artificial-intelligence-ml/data-encoding.md) — Converts raw text data into numerical representations suitable for machine learning consumption. ([source](https://github.com/karpathy/minGPT#readme))
- [Training Callbacks](https://awesome-repositories.com/f/artificial-intelligence-ml/training-callbacks.md) — Supports custom training callbacks to inject logging or evaluation logic into the training loop. ([source](https://github.com/karpathy/minGPT/blob/master/mingpt/trainer.py))
- [Backpropagation](https://awesome-repositories.com/f/artificial-intelligence-ml/backpropagation.md) — Calculates gradients of loss functions to update model parameters during the training loop.
- [Optimization Algorithms](https://awesome-repositories.com/f/artificial-intelligence-ml/optimization-algorithms.md) — Provides gradient-based parameter update methods for training neural network models.
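Several of the tags above (generative text inference, autoregressive models, text generation utilities) describe the same loop: predict a distribution over the next token, pick one, append it, and repeat. The plain-Python sketch below assumes a hypothetical `next_logits` callable in place of a trained model; its `temperature`, `do_sample`, and `top_k` options are loosely modeled on the knobs minGPT's `generate` method exposes, but this is not that implementation.

```python
import math
import random


def generate(next_logits, idx, max_new_tokens, block_size,
             temperature=1.0, do_sample=False, top_k=None, rng=None):
    """Autoregressively extend the token list idx by max_new_tokens tokens.

    next_logits is a hypothetical stand-in for a trained model: it maps a
    context (list of token ids) to one unnormalized score per vocabulary entry.
    """
    rng = rng or random.Random(0)
    idx = list(idx)
    for _ in range(max_new_tokens):
        # Crop the context so it never exceeds the model's window.
        context = idx[-block_size:]
        logits = [score / temperature for score in next_logits(context)]
        if top_k is not None:
            # Keep only the top_k largest scores; mask the rest out.
            cutoff = sorted(logits, reverse=True)[top_k - 1]
            logits = [s if s >= cutoff else -math.inf for s in logits]
        if do_sample:
            # Unnormalized softmax weights, then draw one token.
            m = max(logits)
            weights = [math.exp(s - m) for s in logits]
            next_id = rng.choices(range(len(logits)), weights=weights)[0]
        else:
            # Greedy decoding: take the highest-scoring token.
            next_id = max(range(len(logits)), key=lambda i: logits[i])
        # Feed the prediction back in as input for the next step.
        idx.append(next_id)
    return idx


# Usage: a toy "model" over a 5-token vocabulary that strongly prefers
# the token after the last one it sees (wrapping around).
VOCAB = 5


def toy_model(context):
    return [10.0 if i == (context[-1] + 1) % VOCAB else 0.0 for i in range(VOCAB)]


print(generate(toy_model, [0], max_new_tokens=6, block_size=4))
```

Greedy decoding is deterministic; sampling with a temperature near 1 and a moderate `top_k` trades that determinism for more varied text while still excluding very unlikely tokens.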

### Education & Learning Resources

- [Educational Implementations](https://awesome-repositories.com/f/education-learning-resources/educational-resources/systems-applied-computing/machine-learning-education/llm-engineering-guides/transformer-model-tutorials/educational-implementations.md) — Offers a minimal, readable implementation of transformer architectures specifically for educational study and experimentation.

### Development Tools & Productivity

- [Modular Architecture](https://awesome-repositories.com/f/development-tools-productivity/modular-architecture.md) — Organizes neural network layers into repeatable, modular blocks for flexible model scaling.
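Because every block has the same shape, model size grows predictably with depth (`n_layer`) and width (`n_embd`). The sketch below counts parameters for a GPT-2-style layout to show that scaling. `gpt_param_count` is a hypothetical helper, and the layout it assumes (biases on every linear layer, a 4x MLP expansion, LayerNorm with weight and bias, no separate output head) is an assumption rather than a reading of minGPT's source. With GPT-2 small's settings it gives 124,439,808, the commonly reported size of that model.

```python
def gpt_param_count(n_layer, n_embd, vocab_size, block_size):
    """Parameters in a GPT-2-style transformer body (assumed layout, no LM head).

    An untied output head would add vocab_size * n_embd more.
    """
    d = n_embd
    per_block = (
        2 * d                  # ln_1 (weight + bias)
        + 3 * d * d + 3 * d    # fused query/key/value projection
        + d * d + d            # attention output projection
        + 2 * d                # ln_2
        + 4 * d * d + 4 * d    # MLP up-projection (4x expansion)
        + 4 * d * d + d        # MLP down-projection
    )
    embeddings = vocab_size * d + block_size * d  # token + position tables
    final_ln = 2 * d                              # final LayerNorm
    return embeddings + n_layer * per_block + final_ln


# GPT-2 small: 12 blocks, width 768, 50,257-token vocabulary, 1,024 context.
print(gpt_param_count(n_layer=12, n_embd=768, vocab_size=50257, block_size=1024))
# 124439808
```

Each block contributes 12·n_embd² + 13·n_embd parameters, so doubling `n_layer` roughly doubles the transformer body, while doubling `n_embd` roughly quadruples it.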
