Tiny Universe

Tiny Universe is an educational monorepo of independent, from-scratch implementations of core AI subsystems, each packaged as a self-contained Jupyter notebook. Its foundational architectures include a complete Transformer built from the original paper's specification, a denoising diffusion probabilistic model (DDPM) for image generation, and a ReAct-style agent framework that equips an LLM with tools for planning and multi-step task execution.
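
The core operation of the Transformer named above is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, as defined in the original paper. The following is a minimal pure-Python sketch of that formula, not the notebook's code. The function names and toy matrices are illustrative assumptions, and a real implementation would use batched tensors and multiple heads.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with Q, K, V given as lists of row vectors."""
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    outputs, weights = [], []
    for q in Q:
        # Similarity of this query with every key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Output row = attention-weighted sum of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V)) for c in range(len(V[0]))])
    return outputs, weights

Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out, attn = scaled_dot_product_attention(Q, K, V)
```

Each query puts the most weight on the key it matches best. The 1/sqrt(d_k) factor keeps dot products from growing with dimension and saturating the softmax.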

Beyond model architectures, the project covers the full lifecycle of modern AI systems through hands-on implementations. These include retrieval-augmented generation pipelines that combine vector databases with knowledge graphs, a GraphRAG system that builds knowledge graphs from text and generates hierarchical community summaries, and a two-stage evaluation pipeline that scores model outputs against reference answers with metrics such as F1, ROUGE, and accuracy. The repository also demonstrates reinforcement learning fine-tuning, automated document review workflows that detect deviations and generate revision suggestions, and iterative image optimization that evaluates generated images against their text prompts and refines them.
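
The evaluation metrics mentioned above can be illustrated with a short sketch of token-level F1, ROUGE-L (longest-common-subsequence F-measure), and exact-match accuracy. The tokenizer here is an assumption (lowercase plus whitespace split). The repository's own pipeline may normalize or tokenize differently, for example for non-English text, and all function names are hypothetical.

```python
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; real pipelines often also strip punctuation.
    return text.lower().split()

def f_measure(overlap, n_pred, n_ref):
    # Harmonic mean of precision (overlap / n_pred) and recall (overlap / n_ref).
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)

def token_f1(prediction, reference):
    pred, ref = tokenize(prediction), tokenize(reference)
    # Bag-of-tokens overlap: each token counts at most min(count_pred, count_ref) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return f_measure(overlap, len(pred), len(ref))

def lcs_length(a, b):
    # Dynamic-programming longest common subsequence over token lists.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(prediction, reference):
    pred, ref = tokenize(prediction), tokenize(reference)
    return f_measure(lcs_length(pred, ref), len(pred), len(ref))

def exact_match(prediction, reference):
    # Accuracy-style metric: 1.0 only when the normalized token sequences are identical.
    return float(tokenize(prediction) == tokenize(reference))
```

For example, `token_f1("the cat sat", "the cat sat down")` has precision 1 and recall 0.75, giving 6/7. ROUGE-L differs from token F1 in that it rewards tokens appearing in the same order.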

Tiny Universe also explores the internal mechanisms of large language models, with walkthroughs of grouped query attention, rotary position embeddings, and causal masking. It covers data processing techniques such as semantic chunking, which splits text where the meaning shifts between sentences, vector embedding pipelines for similarity-based retrieval, and hybrid search strategies that fuse sentence-level similarity with domain-specific term importance. The project also includes image quality evaluation with Inception Score and Fréchet Inception Distance, as well as image-text consistency checking with vision-language models.
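
Rotary position embeddings are easiest to understand through the property they guarantee: after rotation, the dot product between a query and a key depends only on their relative offset, not their absolute positions. The sketch below assumes one common formulation, rotating consecutive dimension pairs by angle position * base^(-2k/d) with base = 10000. Some implementations rotate the two halves of the vector instead. It uses unbatched pure-Python vectors, and the names are illustrative rather than taken from the notebooks.

```python
import math

def apply_rope(x, position, base=10000.0):
    """Rotate consecutive dimension pairs (x[2k], x[2k+1]) by position * base**(-2k/d)."""
    d = len(x)
    assert d % 2 == 0, "RoPE needs an even dimension"
    out = []
    for i in range(0, d, 2):
        theta = base ** (-i / d)  # i = 2k, so this is base^(-2k/d)
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[i], x[i + 1]
        # Standard 2-D rotation of the pair.
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])
    return out

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
# Same offset (3) at different absolute positions should give the same score.
score_a = dot(apply_rope(q, 5), apply_rope(k, 2))
score_b = dot(apply_rope(q, 105), apply_rope(k, 102))
```

Because each pair is rotated rather than shifted, vector norms are preserved and position enters attention scores only through the difference between query and key positions.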

Because every implementation is a self-contained notebook in a single repository, the code can be run and inspected directly, which suits the project's educational purpose.

datawhalechina/tiny-universe

Features