# jzhang38/tinyllama

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/jzhang38-tinyllama).**

8,994 stars · 624 forks · Python · Apache-2.0 · archived

## Links

- GitHub: https://github.com/jzhang38/TinyLlama
- awesome-repositories: https://awesome-repositories.com/repository/jzhang38-tinyllama.md

## Description

TinyLlama is a compact 1.1B-parameter language model pretrained on 3 trillion tokens. It targets text generation on memory-constrained edge devices.

The project provides a distributed pretraining framework for training small language models across multiple GPUs and nodes. It also includes a finetuning toolkit for full-parameter weight adjustments to adapt the base model for chat and specific tasks.

The system supports both distributed large language model training and on-device text generation. Its architectural components include rotary positional embeddings, root mean square layer normalization (RMSNorm), and FlashAttention kernels.
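
To make two of the components named above concrete, here is a minimal pure-Python sketch of RMSNorm and rotary positional embeddings. It is not taken from the TinyLlama codebase: the function names `rms_norm` and `apply_rope`, the `eps` and `base` defaults, and the interleaved pairing of dimensions are assumptions chosen for illustration, and production implementations operate on batched tensors rather than Python lists.

```python
import math


def rms_norm(x, weight, eps=1e-5):
    """Divide a vector by its root mean square, then apply a per-dimension gain.

    `eps` and the function name are illustrative assumptions.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def apply_rope(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i+1]) by the angle position * base**(-i/d).

    Uses the interleaved-pair formulation; `base` is an assumed default.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out


def dot(a, b):
    """Plain dot product, used to inspect attention-style scores."""
    return sum(p * q for p, q in zip(a, b))
```

Because each pair is rotated by an angle proportional to the position, the dot product between a rotated query and a rotated key depends only on the difference of their positions, which is how the embedding encodes relative position. For example, `dot(apply_rope(q, 3), apply_rope(k, 1))` and `dot(apply_rope(q, 5), apply_rope(k, 3))` agree up to floating-point error.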

## Tags

### Artificial Intelligence & ML

- [Generative AI Models](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-deployment-and-serving/local-and-on-device-inference/edge-ai-model-deployment/generative-ai-models.md) — Provides a lightweight 1.1B parameter generative model optimized for deployment on edge hardware.
- [Data-Parallel Training](https://awesome-repositories.com/f/artificial-intelligence-ml/distributed-training-frameworks/data-parallel-training.md) — Provides a framework for distributing training workloads across multiple GPUs using data parallelism.
- [Efficient LLM Development](https://awesome-repositories.com/f/artificial-intelligence-ml/efficient-llm-development.md) — Focuses on the development and optimization of compact models to enable high performance on constrained hardware.
- [LLM Development Toolkits](https://awesome-repositories.com/f/artificial-intelligence-ml/llm-development-toolkits.md) — Includes a toolkit for full-parameter weight adjustments to adapt the model for specific tasks.
- [Large Language Model Training Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/infrastructure/model-training-and-tuning/training-frameworks/large-language-model-training-frameworks.md) — Executes distributed pretraining of transformer models across multi-GPU environments using optimized attention. ([source](https://cdn.jsdelivr.net/gh/jzhang38/tinyllama@main/README.md))
- [Model Pretraining Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/model-pretraining-frameworks.md) — Provides a distributed system for training foundational language models across multiple GPUs and nodes.
- [Finetuning Workflows](https://awesome-repositories.com/f/artificial-intelligence-ml/model-pretraining-frameworks/finetuning-workflows.md) — Provides workflows for adapting a pretrained base model to specific tasks or chat styles.
- [Language Model Pretraining](https://awesome-repositories.com/f/artificial-intelligence-ml/natural-language-processing/language-model-pretraining.md) — Trains a compact Llama model on a 3 trillion token dataset using distributed GPU clusters.
- [On-Device Text Generation Runners](https://awesome-repositories.com/f/artificial-intelligence-ml/sequence-generation/autoregressive-text-generation/on-device-text-generation-runners.md) — Enables real-time text generation and dialogue execution on memory-constrained edge hardware.
- [FlashAttention](https://awesome-repositories.com/f/artificial-intelligence-ml/attention-mechanisms/flashattention.md) — Uses FlashAttention kernels to minimize GPU memory access and increase attention computation speed.
- [Full Parameter Fine-Tuning](https://awesome-repositories.com/f/artificial-intelligence-ml/full-parameter-fine-tuning.md) — Ships scripts for updating all model parameters to adapt the general model for chat interactions. ([source](https://cdn.jsdelivr.net/gh/jzhang38/tinyllama@main/README.md))
- [Causal Masking](https://awesome-repositories.com/f/artificial-intelligence-ml/masked-language-modeling/causal-masking.md) — Implements causal masking to prevent the model from attending to future tokens during training.
- [RMSNorm Layers](https://awesome-repositories.com/f/artificial-intelligence-ml/normalization-layers/rmsnorm-layers.md) — Implements root mean square layer normalization to stabilize neural network activations during training.
- [Rotary Positional Embeddings](https://awesome-repositories.com/f/artificial-intelligence-ml/positional-embedding-techniques/rotary-positional-embeddings.md) — Utilizes rotary positional embeddings to encode relative token positions in a high-dimensional space.
- [Pretrained Weight Initializers](https://awesome-repositories.com/f/artificial-intelligence-ml/weight-initialization/pretrained-weight-initializers.md) — Employs specific weight initialization distributions to ensure stable convergence during large-scale pretraining.
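
The causal masking tag above can be illustrated with a small pure-Python sketch. It is an assumption-laden illustration rather than the repository's code: the function name `causal_attention_weights` is hypothetical, and real training code applies the mask inside fused attention kernels instead of looping over lists.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j] with future positions (j > i) masked.

    `scores` is a square list of lists of raw attention scores.
    The function name is a hypothetical one chosen for this sketch.
    """
    weights = []
    for i, row in enumerate(scores):
        # Replace scores for future tokens with -inf so they get zero weight.
        masked = [s if j <= i else float("-inf") for j, s in enumerate(row)]
        # Subtract the row maximum for numerical stability before exponentiating.
        m = max(masked)
        exps = [math.exp(s - m) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

With uniform scores, token `i` spreads its attention evenly over positions `0..i` and assigns exactly zero weight to every later position, so no information from future tokens leaks into the prediction for the current one.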

### Part of an Awesome List

- [Language Models](https://awesome-repositories.com/f/awesome-lists/ai/language-models.md) — Implements a compact 1.1B parameter language model for high-performance text generation.
- [Large Language Models](https://awesome-repositories.com/f/awesome-lists/ai/large-language-models.md) — Compact language models for efficient deployment.
- [Small Language Models](https://awesome-repositories.com/f/awesome-lists/ai/small-language-models.md) — Pre-trained small-scale LLaMA model.
- [Large Language Models (LLMs)](https://awesome-repositories.com/f/awesome-lists/more/large-language-models-llms.md) — Listed in the “Large Language Models (LLMs)” section of The Incredible PyTorch awesome list.
