This project is a manual reconstruction of the Llama 3 transformer architecture in PyTorch. It serves as a reference for the internal mathematical structure and tensor flow of a transformer-based language model designed for next-token prediction.
The implementation focuses on building the model from scratch using basic matrix operations and tensor manipulations. It demonstrates the manual construction of core components, including rotary positional embeddings, multi-head self-attention, and root mean square normalization.
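The three components named above can be sketched with plain tensor operations. This is a minimal illustration rather than the project's own code: the function names `rms_norm`, `apply_rope`, and `causal_attention` are hypothetical, the example handles a single attention head without batching, and the `eps` and `theta` defaults are placeholders that would normally be read from the model's configuration.

```python
import torch


def rms_norm(x, weight, eps=1e-5):
    # Divide each vector by its root mean square, then apply a learned gain.
    # eps is an assumed placeholder; the real value comes from the model config.
    inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * inv_rms * weight


def apply_rope(x, theta=10000.0):
    # x: (seq_len, head_dim) with an even head_dim.
    # Consecutive pairs of features are viewed as complex numbers and rotated
    # by an angle that grows with the token position.
    # theta is an assumed placeholder; the real value comes from the model config.
    seq_len, head_dim = x.shape
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), freqs)  # (seq_len, head_dim / 2)
    rotations = torch.polar(torch.ones_like(angles), angles)    # unit complex numbers
    x_complex = torch.view_as_complex(x.float().reshape(seq_len, -1, 2))
    return torch.view_as_real(x_complex * rotations).reshape(seq_len, head_dim)


def causal_attention(q, k, v):
    # Single-head scaled dot-product attention with a causal mask, so each
    # position attends only to itself and to earlier positions.
    seq_len, head_dim = q.shape
    scores = (q @ k.T) / head_dim ** 0.5
    mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
    weights = torch.softmax(scores + mask, dim=-1)
    return weights @ v
```

In a multi-head layer, the same steps would run once per head on slices of the projected queries, keys, and values, with the rotation applied to queries and keys before the dot product.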
The codebase covers the full inference pipeline, from text tokenization and token embedding generation to the use of gated linear units within a feed-forward network. It also includes the mechanisms for loading pre-trained model weights and configuration parameters to initialize the architecture.
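As a rough illustration of the gated feed-forward step, the sketch below assumes the SiLU-gated (SwiGLU) variant. The function name `swiglu_ffn` and the weight names `w_gate`, `w_up`, and `w_down` are hypothetical and are not taken from the notebooks; the weights are plain tensors stored as (output features, input features).

```python
import torch
import torch.nn.functional as F


def swiglu_ffn(x, w_gate, w_up, w_down):
    # x:      (seq_len, dim)
    # w_gate: (hidden_dim, dim)   projection passed through SiLU (the "gate")
    # w_up:   (hidden_dim, dim)   projection that the gate multiplies
    # w_down: (dim, hidden_dim)   projection back to the model dimension
    gate = F.silu(x @ w_gate.T)
    up = x @ w_up.T
    return (gate * up) @ w_down.T


# Example with random weights; the sizes here are arbitrary.
x = torch.randn(3, 16)
w_gate, w_up = torch.randn(64, 16), torch.randn(64, 16)
w_down = torch.randn(16, 64)
out = swiglu_ffn(x, w_gate, w_up, w_down)  # shape (3, 16)
```

The elementwise product lets one projection decide how much of the other passes through, which is what distinguishes a gated unit from a plain two-layer feed-forward block.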
The project is provided as a series of Jupyter notebooks.