30 open-source projects similar to karpathy/build-nanogpt, ranked by how many features they have in common. Compare stars, activity, and what each one does to find the best build-nanogpt alternative.
This project is a comprehensive educational curriculum and structured learning path covering the full lifecycle of large language models. It provides a guided progression through the theory, architecture, training, and deployment of these models. The curriculum includes specialized guides on transformer architecture, model training tutorials, and frameworks for designing autonomous agents. It also provides dedicated resources for studying model safety and ethics. The material covers a wide range of technical capabilities, including distributed training strategies, parameter-efficient fine-tu
GPT2-Chinese is a Chinese language model implementation based on the GPT-2 architecture. It provides a causal language model trainer and a natural language generation tool designed for training and generating human-like Chinese text sequences. The system integrates a BERT tokenizer to process Chinese corpora into manageable units for machine learning. It enables the development of predictive text models that can generate specific patterns, such as news or poetry, through prompt-based text completion. The project covers a full workflow including text tokenization, model training using a trans
This project is a transformer-based language model and autoregressive text generator designed to predict the next token in a sequence to produce human-like prose and synthetic text. It functions as a large language model that utilizes a transformer architecture to learn linguistic patterns from large datasets for unsupervised multitask learning. The repository provides a distribution of pre-trained weights, enabling natural language processing tasks without requiring additional training. This allows the model to perform zero-shot task generalization by applying learned patterns to new tasks.
Fairseq is a PyTorch toolkit for sequence-to-sequence modeling, specializing in neural machine translation, automatic speech recognition, and large-scale language model training. It provides a framework for processing and aligning diverse data sources, including text, audio, and video, to support tasks such as speech-to-text conversion and multimodal sequence learning. The project is distinguished by its distributed training capabilities, which utilize parameter sharding, mixed-precision training, and CPU offloading to handle models that exceed single-device memory. It also includes specializ
This project is a static educational website and comprehensive curriculum focused on computer vision and deep learning. It serves as a public repository of instructional materials, lecture notes, and technical guides specifically detailing convolutional neural networks and visual recognition. The site is developed using static-site generation to host course documentation and student project directories. It provides structured academic resources that guide learners through image classification, generative modeling, and the implementation of various neural network architectures. The curriculum
Tiny Universe is an educational monorepo that delivers multiple independent implementations of core AI subsystems as self-contained Jupyter notebooks. It provides from-scratch constructions of foundational architectures including a complete Transformer model built from the original paper specification, a denoising diffusion probabilistic model for image generation, and a ReAct-style autonomous agent framework that equips an LLM with tools for planning and multi-step task execution. The project distinguishes itself by covering the full lifecycle of modern AI systems through hands-on implementa
This project is a collection of educational resources and technical guides focused on the development and implementation of large language models. It provides a comprehensive curriculum covering transformer architectures, training methods, and deployment strategies. The materials provide detailed instructions for building autonomous agents using reasoning loops and tool integration, as well as guides for fine-tuning models through supervised learning and preference optimization. It also includes tutorials for constructing retrieval augmented generation pipelines and implementing transformer m
This project is a framework for fine-tuning large language models using parameter-efficient training techniques. It provides a structured pipeline for adapting pre-trained transformer models to specific tasks while minimizing the computational resources and memory required during the training process. The system distinguishes itself by utilizing low-rank adaptation, which injects trainable rank-decomposition matrices into frozen transformer layers. By updating only this small subset of injected parameters rather than the entire model, the framework reduces the overhead associated with gradien
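The low-rank adaptation idea in the entry above can be illustrated with a minimal sketch in plain Python. It is not taken from the project itself: the function names, matrix shapes, and the `alpha / rank` scaling are assumptions chosen for illustration. The frozen weight `w_frozen` is never modified, and only the two small matrices `lora_a` and `lora_b` would be trained.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_forward(x, w_frozen, lora_a, lora_b, alpha):
    """Return x @ W + (alpha / rank) * (x @ A) @ B, leaving W untouched.

    Hypothetical shapes: x is (n, d_in), W is (d_in, d_out),
    A is (d_in, rank), B is (rank, d_out).
    """
    rank = len(lora_b)
    scale = alpha / rank
    base = matmul(x, w_frozen)                  # frozen path
    update = matmul(matmul(x, lora_a), lora_b)  # low-rank trainable path
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]

random.seed(0)
d_in, d_out, rank = 8, 8, 2
w_frozen = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
lora_a = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
lora_b = [[0.0] * d_out for _ in range(rank)]   # zero init: no change at step 0
x = [[1.0] * d_in]

full_params = d_in * d_out            # parameters in the frozen matrix
lora_params = rank * (d_in + d_out)   # parameters that would be trained
print(full_params, lora_params)
```

Because `lora_b` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and the number of trainable parameters grows with `rank * (d_in + d_out)` rather than `d_in * d_out`.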
This project is a PyTorch-based Chinese text classification framework. It provides a transformer-based pipeline designed to categorize Chinese language sequences into predefined labels using deep learning models. The implementation supports both BERT and ERNIE language models for processing and tagging complex Chinese text. These models are used to perform tasks such as sentiment analysis and general text categorization. The system utilizes transformer-based text encoding and attention-weighted sequence pooling to convert raw characters into document vectors. It employs pre-trained model fin
This project is a comprehensive machine learning educational resource and tutorial series delivered as a collection of interactive Jupyter Notebooks. It provides practical Python implementations for the end-to-end machine learning lifecycle, covering supervised and unsupervised learning, deep learning, and reinforcement learning. The resource distinguishes itself by providing detailed implementation guides for complex architectures, including transformers, generative adversarial networks, and convolutional neural networks. It also features specialized courseware for developing reinforcement l
This project is a collection of educational examples and code for implementing deep learning architectures using the PyTorch framework. It serves as a tutorial and implementation guide for building various neural network architectures for machine learning tasks. The project provides practical implementations for computer vision, including image classification and neural style transfer, as well as natural language processing examples for building sequence models and language predictors. It also covers generative models using adversarial and variational networks to synthesize or transform visua
This project is a comprehensive framework for the training, fine-tuning, and deployment of large language models. It functions as a distributed deep learning platform that enables users to scale model workflows across multiple hardware nodes while providing tools for model evaluation and performance benchmarking. The platform distinguishes itself by offering specialized utilities for model compression and weight transformation, allowing users to reduce memory footprints and latency through quantization and pruning. It supports the adaptation of large models for consumer-grade hardware, facili
This project is a comprehensive, open-source educational curriculum designed to guide developers through the mastery of generative artificial intelligence. It provides a structured learning path that covers foundational concepts, prompt engineering, and the practical application of large language models. The repository serves as a central hub for skill acquisition, offering sequential modules that progress from basic model mechanics to advanced architectural patterns. The curriculum distinguishes itself by focusing on the end-to-end lifecycle of intelligent software, including the implementat
🦖 Learn about LLMs, LLMOps, and vector DBs for free by designing, training, and deploying a real-time financial advisor LLM system ~ source code, video & reading materials
This project is an educational course and set of instructional materials for building large language models from scratch using Python. It provides a step-by-step guide and practical tutorials focused on the internal mechanics of transformer architectures and pre-training workflows. The repository features a framework for implementing and comparing diverse model families, including Llama, GLM, and RWKV. It uses a configuration-driven assembly approach to analyze the structural differences and internal mechanisms of these various architectures. The codebase covers the complete development pipe
LLM-RL-Visualized is a visual reference library and collection of knowledge maps designed to explain Large Language Model and Reinforcement Learning algorithms. It provides a structured system of conceptual diagrams and taxonomies covering the intersection of language model alignment and reinforcement learning. The project distinguishes itself through detailed visual mappings of complex workflows, such as the coordination of reward models and policy optimization in reinforcement learning from human feedback. It contrasts different preference optimization architectures, such as RLHF and Direct
LLM101n is an educational machine learning curriculum and open-source resource designed to teach the fundamental principles and practical implementation of large language models. It functions as a technical manual that guides users through the end-to-end process of building and training neural network architectures from scratch using a dynamic tensor library for automatic differentiation and GPU-accelerated computation. The project distinguishes itself through interactive, notebook-based instruction that allows for real-time visualization of training processes. It supports rapid experimentati
LangGPT: Empowering everyone to become a prompt expert! 🚀 📌 Originator of the Structured Prompt 📌 Initiator of the Meta-Prompt 📌 The most popular paradigm for putting prompts into practice | Language of GPT: the pioneering framework for structured and meta-prompt design. 10,000+ ⭐ | Battle-tested by thousands of users worldwide. Created by 云中江树.
Nanochat is a lightweight execution environment designed for training and running language models on standard consumer hardware. It functions as both a neural network training framework and an inference engine, enabling users to perform backpropagation-based training and model execution directly on general-purpose processors without the need for dedicated graphics hardware. The project distinguishes itself through a suite of optimization tools that prioritize efficiency on local machines. By utilizing memory-mapped weight loading and CPU-optimized vector math, it maximizes throughput for inte
This project is a technical learning resource and developer knowledge base focused on the integration of large language models into software applications. It provides a structured collection of guides and code examples designed to teach developers how to implement intelligent features using proven patterns and best practices. The repository distinguishes itself through a library of functional demonstrations that cover complex topics such as retrieval-augmented generation, function calling, and prompt engineering workflows. These materials are organized into a modular structure, allowing for t
This project is a comprehensive educational curriculum and engineering handbook focused on the lifecycle of large language models. It serves as a structured knowledge base for machine learning practitioners, covering the fundamental mathematical and architectural principles of transformer-based sequence modeling, as well as the practical implementation of supervised instruction fine-tuning and preference-based model alignment. The repository distinguishes itself by providing a deep dive into advanced model composition and optimization techniques. It details methodologies for weight-space mode
One Small Step is an educational resource that explains core AI and large language model concepts through short, accessible articles designed to be read in under five minutes. It covers the structure and function of key LLM components like attention mechanisms and tokenization, as well as foundational machine learning mathematics such as matrix rank and overfitting. The project also serves as a guide to the GGUF file format, which packages all model parameters and metadata into a single compact binary file for cross-platform deployment without external dependencies. It explains how this forma
This is a PyTorch deep learning implementation for training transformer-based language models. It functions as a distributed GPU trainer and framework designed to optimize text prediction models for increased speed and sample efficiency. The project is distinguished by its use of the Newton-Schulz weight optimizer. This method applies an iterative process to maintain semi-orthogonal parameter updates and weight matrices, which improves sample efficiency and reduces memory overhead during the training process. The framework covers broad capabilities in distributed GPU computing, including dat
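As a rough illustration of the iterative orthogonalization mentioned in the entry above, the sketch below uses the classical cubic Newton-Schulz iteration in plain Python. This is an assumption for illustration only: the project may use different polynomial coefficients, step counts, and a GPU implementation, and the function name `newton_schulz_orthogonalize` is hypothetical.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def newton_schulz_orthogonalize(g, steps=20):
    """Push a matrix toward its nearest orthogonal factor.

    Uses the classical update X <- 1.5 * X - 0.5 * X @ X^T @ X, which
    converges when all singular values start in (0, sqrt(3)).
    """
    # Dividing by the Frobenius norm bounds every singular value by 1.
    norm = math.sqrt(sum(v * v for row in g for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(x_row, c_row)]
             for x_row, c_row in zip(x, xxtx)]
    return x

# A made-up 2x2 "gradient" standing in for a raw parameter update.
update = newton_schulz_orthogonalize([[3.0, 1.0], [0.0, 2.0]])
gram = matmul(update, transpose(update))
print(gram)  # expected to be close to the identity matrix
```

The iteration needs only matrix multiplications, which is why this family of methods suits accelerator hardware.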
DeiT is a PyTorch vision transformer framework designed for image classification. It implements a transformer-based architecture that processes images as sequences of flattened patches using self-attention layers and position-aware sequence modeling instead of convolutional filters. The project focuses on data-efficient training through a knowledge distillation framework. This system allows a student model to mimic the soft labels of a high-performance teacher model to improve accuracy and generalization, particularly when training on smaller datasets. The library covers the full development
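The soft-label distillation described in the entry above can be sketched in plain Python as a temperature-scaled KL divergence between teacher and student predictions. This is a generic illustration rather than DeiT's own code: the function names, the temperature value, and the `temperature ** 2` scaling are conventional choices assumed here.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_distillation_loss(student_logits, teacher_logits, temperature=3.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return temperature ** 2 * kl

# Made-up logits for a three-class problem.
teacher = [4.0, 1.0, 0.5]
loss_same = soft_distillation_loss(teacher, teacher)
loss_close = soft_distillation_loss([3.5, 1.2, 0.4], teacher)
loss_far = soft_distillation_loss([0.5, 1.0, 4.0], teacher)
print(loss_same, loss_close, loss_far)
```

A higher temperature flattens both distributions, so the student also learns from the relative probabilities the teacher assigns to incorrect classes.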
Qwen-7B is a pretrained causal language model designed for natural language generation, text processing, and complex reasoning tasks. It is available as an instruction-tuned model optimized for conversational interactions and a tool-use model capable of executing function calls and interacting with external APIs. The project provides a quantized version of the model to reduce GPU memory usage and supports the development of autonomous agents that can execute code and perform functions to complete complex goals. The system covers a wide range of capabilities including model fine-tuning throug
This project is a comprehensive framework for the entire lifecycle of transformer-based language models, supporting everything from foundational pretraining to specialized deployment. It provides a modular toolkit for defining neural network architectures, managing data preparation pipelines, and executing training routines across various scales. The framework is designed to handle the full model development process, including supervised fine-tuning, behavioral alignment, and the integration of agentic capabilities. What distinguishes this framework is its focus on efficient training and adva