Makemore

Features

Character-Level Models - Trains a model to predict the next character in a sequence to generate text mimicking a specific dataset.
Generative Model Sampling - Generates novel text by sampling from the learned probability distributions of a trained model.
N-Gram Co-occurrence Models - Implements n-gram modeling to calculate the likelihood of subsequent characters based on preceding sequences.
Gradient Descent Algorithms - Optimizes model weights via gradient descent and automatic differentiation to minimize cross-entropy loss.
Neural Network Implementations - Builds a character-level neural network using the PyTorch framework for sequence prediction.
Autoregressive Text Generation - Creates new character sequences by autoregressively predicting subsequent characters based on training patterns.
Text Generation Engines - Provides an engine that generates novel character strings based on learned probability distributions.
Text Model Training - Processes text corpora to build predictive models by learning character-level probability distributions.
LLM Education - Provides a step-by-step educational implementation of the architecture and training processes used in LLMs.
Bigram Frequency Tables - Implements a lookup table to track character pairs and determine the statistical probability of transitions.
Multinomial Samplers - Generates text by drawing characters from a probability distribution using multinomial sampling.
Softmax Normalization - Converts raw model outputs into normalized probability distributions using the softmax function.
Maximum Likelihood Estimators - Uses maximum likelihood estimation to calculate model parameters by optimizing based on observed training data.
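
The bigram frequency table and maximum likelihood items above can be illustrated with a small sketch. It uses only the Python standard library rather than PyTorch, and the toy word list, the "." boundary marker, and the function names are assumptions made for this example, not names taken from the project.

```python
import math
from collections import Counter

# Hypothetical toy corpus; "." marks the start and end of each word.
words = ["emma", "olivia", "ava"]

def bigram_counts(words):
    """Count how often each character pair occurs, including boundary markers."""
    counts = Counter()
    for w in words:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            counts[(a, b)] += 1
    return counts

def bigram_probs(counts):
    """Normalize each row of counts; this is the maximum likelihood estimate."""
    row_totals = Counter()
    for (a, _), n in counts.items():
        row_totals[a] += n
    return {(a, b): n / row_totals[a] for (a, b), n in counts.items()}

def avg_nll(words, probs):
    """Average negative log likelihood of the corpus under the bigram model."""
    total, n = 0.0, 0
    for w in words:
        chars = ["."] + list(w) + ["."]
        for a, b in zip(chars, chars[1:]):
            total -= math.log(probs[(a, b)])
            n += 1
    return total / n

counts = bigram_counts(words)
probs = bigram_probs(counts)
print(f"P(end | 'a') = {probs[('a', '.')]:.2f}")
print(f"average NLL  = {avg_nll(words, probs):.3f}")
```

Dividing each count by its row total gives the transition probabilities that maximize the likelihood of the training words, and the average negative log likelihood is the quantity a trained model would try to lower.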

makemore is a character-level language model and text generation engine, built as a neural network with the PyTorch framework. It serves as an educational implementation of the architecture and training processes used in large language models.

The system demonstrates sequence prediction by learning the probability distributions of characters within a dataset to generate novel text strings. It implements this through a progression of techniques, including n-gram probability modeling and the use of automatic differentiation for weight optimization.
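
A minimal sketch of the weight-optimization step, under stated assumptions: the model is a plain table of logits indexed by the current character, and the gradient of the cross-entropy loss is written out by hand (softmax output minus the one-hot target) in place of automatic differentiation, so the example runs with the standard library alone. The toy corpus, learning rate, and step count are hypothetical.

```python
import math

# Hypothetical toy corpus; "." marks word boundaries.
words = ["emma", "olivia", "ava"]
vocab = sorted(set("".join(words)) | {"."})
stoi = {ch: i for i, ch in enumerate(vocab)}

# Training pairs: (current character index, next character index).
pairs = []
for w in words:
    chars = ["."] + list(w) + ["."]
    pairs += [(stoi[a], stoi[b]) for a, b in zip(chars, chars[1:])]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(W):
    """Mean cross-entropy loss and its gradient with respect to the logit table W."""
    V = len(W)
    grad = [[0.0] * V for _ in range(V)]
    loss = 0.0
    for x, y in pairs:
        p = softmax(W[x])
        loss -= math.log(p[y])
        for j in range(V):
            # d(loss)/d(logit_j) is p_j - 1 for the target class, p_j otherwise
            grad[x][j] += (p[j] - (1.0 if j == y else 0.0)) / len(pairs)
    return loss / len(pairs), grad

V = len(vocab)
W = [[0.0] * V for _ in range(V)]  # all-zero logits give a uniform prediction
learning_rate = 1.0  # assumed value, chosen only for this toy example

losses = []
for step in range(200):
    loss, grad = loss_and_grad(W)
    losses.append(loss)
    for i in range(V):
        for j in range(V):
            W[i][j] -= learning_rate * grad[i][j]

print(f"loss before training: {losses[0]:.3f}")
print(f"loss after training:  {losses[-1]:.3f}")
```

With all logits at zero the model predicts every character with equal probability, so the starting loss equals the log of the vocabulary size. In PyTorch the hand-written gradient would typically be replaced by a call to `loss.backward()`.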

The project covers the full machine learning lifecycle for sequence prediction, from processing text corpora during model training to producing output via probabilistic sampling. This includes applying softmax normalization to turn raw model outputs into probability distributions, then using multinomial sampling to draw new character sequences from those distributions.
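
The sampling stage described above can be sketched as follows. This is an illustrative standard-library version, not the project's code: the three-symbol vocabulary, the hand-picked logit table, the "." boundary marker, and the helper names are all assumptions. `random.Random.choices` with weights stands in for a single multinomial draw.

```python
import math
import random

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_index(probs, rng):
    """Draw one index from a categorical distribution (a single multinomial draw)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate(logit_table, vocab, rng, max_len=20):
    """Autoregressively sample characters until the boundary marker "." is drawn."""
    stoi = {ch: i for i, ch in enumerate(vocab)}
    out, current = [], stoi["."]
    for _ in range(max_len):
        current = sample_index(softmax(logit_table[current]), rng)
        if vocab[current] == ".":
            break
        out.append(vocab[current])
    return "".join(out)

# Hypothetical vocabulary and hand-picked logits, for illustration only.
vocab = [".", "a", "b"]
logit_table = [
    [-5.0, 1.0, 1.0],  # after ".": ending immediately is very unlikely
    [0.5, 0.0, 1.0],   # after "a"
    [0.5, 1.0, 0.0],   # after "b"
]
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [generate(logit_table, vocab, rng) for _ in range(5)]
print(samples)
```

Each generated character becomes the context for the next draw, which is what makes the process autoregressive. The `max_len` cap is a safety limit added for this sketch so generation always terminates.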

karpathy/makemore
