# datawhalechina/llm-universe

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/datawhalechina-llm-universe).**

13,269 stars · 1,356 forks · Jupyter Notebook

## Links

- GitHub: https://github.com/datawhalechina/llm-universe
- Homepage: https://datawhalechina.github.io/llm-universe/
- awesome-repositories: https://awesome-repositories.com/repository/datawhalechina-llm-universe.md

## Topics

`langchain` `rag`

## Description

llm-universe is a structured course and technical guide to developing large language model (LLM) applications. It works as a curriculum covering model orchestration, autonomous conversational agents, and retrieval-augmented generation (RAG) systems.

The project provides detailed instructions on connecting model APIs with memory and tools to create execution chains. It specifically covers the construction of retrieval pipelines, including the process of cleaning raw documents, generating embeddings, and integrating vector databases to ground model responses in external data.
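The retrieval pipeline described above (clean, split, embed, store, retrieve) can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not code from the course: the helper names (`clean`, `chunk`, `embed`, `InMemoryVectorStore`, `build_prompt`) are hypothetical, and the bag-of-words `embed` function is a toy stand-in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def clean(text: str) -> str:
    """Collapse whitespace runs into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    words = text.split()
    if not words:
        return []
    step = size - overlap  # assumes size > overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: brute-force similarity search."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        for c in chunks:
            self.items.append((c, embed(c)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Ground the model by placing the retrieved chunks ahead of the question."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


raw = (
    "Vector   databases store\n\nembeddings for similarity search.   "
    "Prompt templates format the question and the retrieved context for the model."
)
store = InMemoryVectorStore()
store.add(chunk(clean(raw)))

question = "where are embeddings stored for similarity search"
top = store.search(question, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real application the `embed` function would call an embedding model and the store would be an actual vector database; only the shape of the pipeline is meant to carry over.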

The resource also covers prompt engineering workflows, semantic search optimization through hybrid retrieval and re-ranking, and the deployment of AI chatbots with persistent conversation state. It includes methods for evaluating both the retrieval and the generation components.
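One common way to implement hybrid retrieval is to run a keyword search and a vector search separately, then merge the two ranked lists. The sketch below uses reciprocal rank fusion (RRF) for the merge. The source does not say which fusion method the course uses, so RRF, the function name, the document IDs, and the constant `k = 60` are all assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one.

    Each document scores 1 / (k + rank) for every list it appears in,
    so documents ranked highly by more than one retriever rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from two retrievers for the same query.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

A separate re-ranking step, for example a model that scores each query and document pair, would typically run on the top few fused results; it is omitted here to keep the sketch self-contained.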

The material is delivered as a structured collection of notebooks and documentation.

## Tags

### Artificial Intelligence & ML

- [Retrieval-Augmented Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/retrieval-augmented-generation.md) — Serves as a comprehensive guide for connecting generative models to external vector databases for grounded responses.
- [Advanced Retrieval Techniques](https://awesome-repositories.com/f/artificial-intelligence-ml/advanced-retrieval-techniques.md) — Teaches sophisticated strategies to improve the precision and recall of retrieval systems. ([source](https://github.com/datawhalechina/llm-universe/tree/main/notebook))
- [LLM Application Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/artificial-intelligence-tooling/language-model-integrations/llm-application-orchestration.md) — Provides a curriculum for orchestrating model calls, managing memory, and coordinating agentic workflows into functional applications. ([source](https://github.com/datawhalechina/llm-universe/blob/main/notebook/C1%20%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%20LLM%20%E4%BB%8B%E7%BB%8D/C1.md))
- [Autonomous Agent Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/autonomous-agent-orchestration.md) — Teaches the use of persistent memory and tool integration to orchestrate autonomous agents.
- [Knowledge Base Management](https://awesome-repositories.com/f/artificial-intelligence-ml/knowledge-base-management.md) — Explains how to organize and maintain structured knowledge bases to support automated retrieval for AI context. ([source](https://github.com/datawhalechina/llm-universe/tree/main/notebook))
- [Retrieval Augmented Generation](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-orchestration/retrieval-augmented-generation.md) — Implements systems that ground language model responses in external data sources. ([source](https://github.com/datawhalechina/llm-universe/blob/main/notebook/C1%20%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%20LLM%20%E4%BB%8B%E7%BB%8D))
- [RAG Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/language-model-orchestration/retrieval-augmented-generation/rag-pipelines.md) — Guides the construction of workflows that integrate external document data into model outputs. ([source](https://github.com/datawhalechina/llm-universe/blob/main/README.md))
- [Prompt Engineering Workflows](https://awesome-repositories.com/f/artificial-intelligence-ml/model-behavioral-analysis/prompt-engineering-workflows.md) — Teaches structured prompt engineering workflows and iterative refinement to guide model output quality and behavior.
- [Prompt Engineering Workflows](https://awesome-repositories.com/f/artificial-intelligence-ml/prompt-engineering-workflows.md) — Provides a comprehensive methodology for developing and managing prompt-based instructions.
- [RAG Frameworks](https://awesome-repositories.com/f/artificial-intelligence-ml/rag-frameworks.md) — Guides the development of applications that link language models with retrieval chains for accurate, data-grounded responses. ([source](https://github.com/datawhalechina/llm-universe/tree/main/docs))
- [Retrieval-Generation Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/retrieval-generation-integrations.md) — Implements the mechanism to fetch relevant data from a knowledge base and provide it as context to a language model. ([source](https://github.com/datawhalechina/llm-universe#readme))
- [Retrieval Optimization](https://awesome-repositories.com/f/artificial-intelligence-ml/retrieval-optimization.md) — Implements strategies like hybrid search and re-ranking to enhance the quality of retrieved data. ([source](https://github.com/datawhalechina/llm-universe/tree/main/docs))
- [Sequential Orchestration](https://awesome-repositories.com/f/artificial-intelligence-ml/sequential-orchestration.md) — Provides instructions for building complex application logic by linking model inputs, memory, and agents into sequential execution flows.
- [Vector Databases](https://awesome-repositories.com/f/artificial-intelligence-ml/vector-databases.md) — Provides a technical guide for implementing vector databases to store and query high-dimensional embeddings. ([source](https://github.com/datawhalechina/llm-universe/blob/main/notebook/C1%20%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%20LLM%20%E4%BB%8B%E7%BB%8D/C1.md))
- [API Integrations](https://awesome-repositories.com/f/artificial-intelligence-ml/api-integrations.md) — Instructs on connecting external model APIs using native calls or wrappers to power application logic. ([source](https://github.com/datawhalechina/llm-universe#readme))
- [Chat Interfaces](https://awesome-repositories.com/f/artificial-intelligence-ml/chat-interfaces.md) — Guides the design of web-based chat interfaces for uploading documents and interacting with AI models. ([source](https://github.com/datawhalechina/llm-universe/blob/main/notebook/C1%20%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%20LLM%20%E4%BB%8B%E7%BB%8D/C1.md))
- [Conversation State Management](https://awesome-repositories.com/f/artificial-intelligence-ml/conversation-state-management.md) — Provides methods for persisting dialogue history and application state across multiple turns to maintain conversational context.
- [Retrieval Pipelines](https://awesome-repositories.com/f/artificial-intelligence-ml/retrieval-pipelines.md) — Guides the construction of retrieval workflows that combine multiple search strategies, query filtering, and re-ranking.

### Education & Learning Resources

- [LLM Application Development Curricula](https://awesome-repositories.com/f/education-learning-resources/llm-application-development-curricula.md) — Serves as a comprehensive structured learning resource for mastering LLM application development. ([source](https://github.com/datawhalechina/llm-universe/blob/main/notebook/C1%20%E5%A4%A7%E5%9E%8B%E8%AF%AD%E8%A8%80%E6%A8%A1%E5%9E%8B%20LLM%20%E4%BB%8B%E7%BB%8D))
- [LLM Tutorials](https://awesome-repositories.com/f/education-learning-resources/llm-tutorials.md) — Provides a comprehensive structured learning resource and set of tutorials for building applications with large language models.
- [Application Development Guides](https://awesome-repositories.com/f/education-learning-resources/application-development-guides.md) — Provides a structured learning resource and curriculum for building end-to-end AI applications from prompt engineering to deployment. ([source](https://github.com/datawhalechina/llm-universe#readme))
- [LLM Orchestration Courses](https://awesome-repositories.com/f/education-learning-resources/educational-resources/systems-applied-computing/machine-learning-education/llm-engineering-guides/llm-orchestration-courses.md) — Offers a detailed curriculum on linking model APIs, memory, and agents into sequential execution flows.
- [Retrieval Augmented Generation Guides](https://awesome-repositories.com/f/education-learning-resources/retrieval-augmented-generation-guides.md) — Provides technical guides for constructing retrieval-augmented generation pipelines using vector databases and embedding models.

### Part of an Awesome List

- [Prompt Iteration](https://awesome-repositories.com/f/awesome-lists/devtools/code-refinement/prompt-iteration.md) — Teaches the iterative process of refining natural language instructions to improve the quality of model-generated outputs. ([source](https://github.com/datawhalechina/llm-universe/tree/main/docs))

### Content Management & Publishing

- [Vector Indexing Pipelines](https://awesome-repositories.com/f/content-management-publishing/content-processing-transformation/document-processing-conversion/document-processing-tools/pdf-processing-engines/pdf-processing/vector-indexing-pipelines.md) — Implements a technical pipeline for cleaning, splitting, and embedding raw documents for vector store ingestion.

### Data & Databases

- [Vector Database Integrations](https://awesome-repositories.com/f/data-databases/database-management-systems/database-engines/vector-databases/vector-database-integrations.md) — Details the configuration of vector stores to enable semantic similarity search and data retrieval.
- [Document Preprocessing Pipelines](https://awesome-repositories.com/f/data-databases/document-stores/document-preprocessing-pipelines.md) — Provides detailed instructions on cleaning and slicing diverse document types before storing them in vector databases. ([source](https://github.com/datawhalechina/llm-universe/tree/main/docs))
- [Vector Search](https://awesome-repositories.com/f/data-databases/vector-search.md) — Covers the process of converting document chunks into embeddings for high-dimensional similarity search.
- [Integration Tutorials](https://awesome-repositories.com/f/data-databases/vector-databases/integration-tutorials.md) — Includes instructions for processing raw documents into embeddings and implementing semantic search for language models.

### Development Tools & Productivity

- [Output Accuracy Verifiers](https://awesome-repositories.com/f/development-tools-productivity/terminal-output-monitors/output-validation/output-accuracy-verifiers.md) — Ships methods for checking generated responses against reference text to verify accuracy. ([source](https://github.com/datawhalechina/llm-universe/tree/main/notebook))

### Software Engineering & Architecture

- [LLM Performance Analyzers](https://awesome-repositories.com/f/software-engineering-architecture/performance-reliability/performance-optimization/application-performance-tuning/application-performance-optimization/llm-performance-analyzers.md) — Provides methods to identify and optimize bottlenecks within language model application logic. ([source](https://github.com/datawhalechina/llm-universe#readme))

### System Administration & Monitoring

- [System Quality Evaluators](https://awesome-repositories.com/f/system-administration-monitoring/application-quality-monitoring/system-quality-evaluators.md) — Implements frameworks for applying custom metrics to quantify the performance of RAG and agentic workflows. ([source](https://github.com/datawhalechina/llm-universe/tree/main/docs))

### Testing & Quality Assurance

- [LLM Evaluation](https://awesome-repositories.com/f/testing-quality-assurance/model-testing/llm-evaluation.md) — Provides tools and methods for measuring the quality of model outputs using custom metrics.
