# ShangtongZhang/reinforcement-learning-an-introduction

**Attribution required: if you use, quote, or summarise this content, you must credit and link back to [awesome-repositories.com](https://awesome-repositories.com/repository/shangtongzhang-reinforcement-learning-an-introduction).**

14,569 stars · 4,967 forks · Python · MIT

## Links

- GitHub: https://github.com/ShangtongZhang/reinforcement-learning-an-introduction
- awesome-repositories: https://awesome-repositories.com/repository/shangtongzhang-reinforcement-learning-an-introduction.md

## Topics

`artificial-intelligence` `reinforcement-learning`

## Description

This project is a Python replication of the examples and figures in Sutton and Barto's *Reinforcement Learning: An Introduction* (2nd edition). Organized by chapter, it simulates reinforcement learning algorithms and environments so that users can study agent behavior, policy improvement, and the basic mechanics of sequential decision-making in controlled settings.

The library implements core reinforcement learning methods, including temporal-difference learning, Monte Carlo estimation from sampled episodes, and tabular value functions. It also reproduces specific algorithmic behaviors, such as maximization bias in Q-learning and its mitigation with Double Q-learning, and covers discrete state-space models and stochastic (for example ε-greedy) action-selection strategies.
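To make the contrast between temporal-difference and Monte Carlo updates concrete, here is a minimal, self-contained sketch on the five-state random walk from chapter 6 of the book. It assumes states A to E, a start in the middle state, a reward of +1 only when exiting on the right, and no discounting; the names `td0`, `first_visit_mc` and `rms_error` are hypothetical and not taken from the repository. TD(0) bootstraps from the current estimate of the next state after every step, while first-visit Monte Carlo waits for the episode to end and averages the observed returns.

```python
import random

# Assumed layout: index 0 = left terminal, 1..5 = states A..E, 6 = right terminal.
N_STATES = 5
START = 3  # state C
RIGHT_TERMINAL = N_STATES + 1
TRUE_VALUES = [i / 6 for i in range(1, N_STATES + 1)]  # 1/6 ... 5/6


def td0(episodes, alpha=0.05, seed=0):
    """Tabular TD(0) prediction for the equiprobable random walk (gamma = 1)."""
    rng = random.Random(seed)
    v = [0.0] + [0.5] * N_STATES + [0.0]  # terminal values stay at 0
    for _ in range(episodes):
        state = START
        while 0 < state < RIGHT_TERMINAL:
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == RIGHT_TERMINAL else 0.0
            # Move V(s) toward the bootstrapped one-step target r + V(s').
            v[state] += alpha * (reward + v[next_state] - v[state])
            state = next_state
    return v[1:RIGHT_TERMINAL]


def first_visit_mc(episodes, seed=0):
    """First-visit Monte Carlo prediction using sample averages of returns."""
    rng = random.Random(seed)
    totals = [0.0] * (N_STATES + 2)
    counts = [0] * (N_STATES + 2)
    for _ in range(episodes):
        state, visited = START, set()
        while 0 < state < RIGHT_TERMINAL:
            visited.add(state)
            state += rng.choice((-1, 1))
        # With gamma = 1 and zero intermediate rewards, every visited state's
        # return equals the terminal reward, so each is credited once per episode.
        ret = 1.0 if state == RIGHT_TERMINAL else 0.0
        for s in visited:
            totals[s] += ret
            counts[s] += 1
    return [totals[s] / counts[s] if counts[s] else 0.5 for s in range(1, RIGHT_TERMINAL)]


def rms_error(estimates):
    """Root-mean-square error against the true state values."""
    return (sum((e - t) ** 2 for e, t in zip(estimates, TRUE_VALUES)) / N_STATES) ** 0.5


print(rms_error(td0(2000)), rms_error(first_visit_mc(2000)))
```

Both estimators are compared against the known true values of 1/6 to 5/6. Lowering `alpha` in `td0` trades slower learning for less noise in the final estimates.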

Users can engage with various simulation scenarios, ranging from multi-armed bandit modeling to grid world navigation and game-based tasks like tic-tac-toe. These tools facilitate the study of how agents balance exploration and exploitation to maximize cumulative rewards within structured, discrete environments.
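The exploration-exploitation balance is easiest to see in a k-armed bandit. The sketch below is a minimal ε-greedy agent with incremental sample-average estimates on a randomly generated testbed. The testbed shape (true action values drawn from N(0, 1), rewards drawn from N(q*(a), 1)) follows the usual chapter 2 setup, and `run_bandit` and its parameters are hypothetical names, not the repository's API.

```python
import random


def run_bandit(steps=1000, k=10, epsilon=0.1, seed=0):
    """Run one epsilon-greedy agent; return (average reward, fraction of optimal actions)."""
    rng = random.Random(seed)
    q_true = [rng.gauss(0.0, 1.0) for _ in range(k)]  # hidden action values q*(a)
    best_action = max(range(k), key=lambda a: q_true[a])
    q_est = [0.0] * k  # sample-average estimates Q(a)
    counts = [0] * k  # N(a): number of times each arm was pulled
    total_reward, optimal_picks = 0.0, 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: uniformly random arm
        else:
            top = max(q_est)  # exploit: greedy arm, ties broken at random
            action = rng.choice([a for a in range(k) if q_est[a] == top])
        reward = rng.gauss(q_true[action], 1.0)
        counts[action] += 1
        # Incremental sample average: Q(a) <- Q(a) + (R - Q(a)) / N(a)
        q_est[action] += (reward - q_est[action]) / counts[action]
        total_reward += reward
        optimal_picks += int(action == best_action)
    return total_reward / steps, optimal_picks / steps


for eps in (0.0, 0.01, 0.1):
    results = [run_bandit(epsilon=eps, seed=s) for s in range(100)]
    print(eps, sum(r[1] for r in results) / len(results))
```

Averaging the second return value over many seeds typically shows the purely greedy agent (`epsilon=0.0`) locking onto a suboptimal arm far more often than the ε-greedy agents.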

## Tags

### Artificial Intelligence & ML

- [Reinforcement Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning.md) — Provides a comprehensive educational framework for implementing and studying core reinforcement learning algorithms and textbook examples.
- [Reinforcement Learning Environments](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/reinforcement-learning-environments.md) — Offers a suite of simulation environments for testing and analyzing agent behavior in discrete state spaces and classic game scenarios.
- [Python Machine Learning Libraries](https://awesome-repositories.com/f/artificial-intelligence-ml/python-machine-learning-libraries.md) — Provides a Python-based library for simulating reinforcement learning environments and agent-based decision-making.
- [Reinforcement Learning Simulators](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-algorithms/reinforcement-learning-simulators.md) — Simulates standard reinforcement learning algorithms to reproduce textbook examples and study agent behavior. ([source](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction#readme))
- [Grid World Simulation Environments](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/reinforcement-learning-environments/grid-world-simulation-environments.md) — Provides grid-based simulation environments for testing agent navigation and goal-oriented behavior. ([source](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction/tree/master/chapter03))
- [Multi-Armed Bandit Modeling](https://awesome-repositories.com/f/artificial-intelligence-ml/custom-model-training/reward-modeling/multi-armed-bandit-modeling.md) — Simulates decision-making scenarios where agents balance exploration and exploitation over independent choices.
- [Exploration Strategies](https://awesome-repositories.com/f/artificial-intelligence-ml/exploration-strategies.md) — Implements probabilistic decision mechanisms to manage the exploration-exploitation trade-off in reinforcement learning.
- [Monte Carlo Trajectory Estimators](https://awesome-repositories.com/f/artificial-intelligence-ml/monte-carlo-sampling-methods/monte-carlo-trajectory-estimators.md) — Estimates value functions by averaging total returns observed across complete interaction episodes.
- [Multi-Armed Bandit Simulators](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-algorithms/multi-armed-bandit-simulators.md) — Models decision-making scenarios where agents balance exploration and exploitation to maximize total rewards. ([source](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction/tree/master/chapter02))
- [Temporal Difference Learning](https://awesome-repositories.com/f/artificial-intelligence-ml/temporal-difference-learning.md) — Solves reinforcement learning tasks by updating value estimates based on subsequent state evaluations. ([source](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction/tree/master/chapter06))
- [Temporal Difference Bootstrapping Methods](https://awesome-repositories.com/f/artificial-intelligence-ml/temporal-difference-learning/temporal-difference-bootstrapping-methods.md) — Updates value estimates using subsequent state evaluations without waiting for episode completion.
- [Reinforcement Learning Policy Improvement](https://awesome-repositories.com/f/artificial-intelligence-ml/agentic-systems-frameworks/agentic-workflows/iterative-refinement-workflows/reinforcement-learning-policy-improvement.md) — Refines agent behavior through iterative policy evaluation and improvement cycles.
- [Reinforcement Learning Algorithm Analyzers](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-algorithms/reinforcement-learning-algorithm-analyzers.md) — Evaluates the performance and convergence of learning agents to optimize policy improvement strategies.
- [Stochastic Exploration Mechanisms](https://awesome-repositories.com/f/artificial-intelligence-ml/exploration-strategies/stochastic-exploration-mechanisms.md) — Manages the trade-off between testing unknown actions and selecting known high-reward options using stochastic strategies.
- [Game-Based](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/frameworks/reinforcement-learning-environments/game-based.md) — Executes reinforcement learning agents against human players or other agents in grid-based games. ([source](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction/tree/master/chapter01))
- [Tabular Value Function Approximators](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/machine-learning-concepts/training-and-optimization/approximate-training-methods/quadratic-function-approximations/linear-function-approximators/tabular-value-function-approximators.md) — Tracks and updates expected returns for every discrete environment configuration using structured lookup tables.
- [Maximization Bias Analyzers](https://awesome-repositories.com/f/artificial-intelligence-ml/reinforcement-learning-algorithms/maximization-bias-analyzers.md) — Identifies and mitigates maximization bias during the process of policy improvement. ([source](https://github.com/ShangtongZhang/reinforcement-learning-an-introduction/tree/master/chapter06))
- [Discrete State-Space Models](https://awesome-repositories.com/f/artificial-intelligence-ml/machine-learning/architectures/sequence-models/selective-state-space-models/discrete-state-space-models.md) — Provides discrete state-space modeling for exact value function calculation in reinforcement learning environments.
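Several of the tags above (grid worlds, policy improvement, exact value calculation over discrete states) rest on iterative policy evaluation, the first half of policy iteration. As a hedged sketch, the code below evaluates the equiprobable random policy on the 4×4 gridworld from chapter 4 of the book: terminal corners at the top-left and bottom-right, a reward of -1 per move, moves off the grid leave the agent in place, and γ = 1. The grid layout and the names `step` and `evaluate_random_policy` are assumptions for illustration.

```python
SIZE = 4
TERMINALS = {(0, 0), (SIZE - 1, SIZE - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move with reward -1; bumping into an edge leaves the state unchanged."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < SIZE and 0 <= col < SIZE):
        row, col = state
    return (row, col), -1.0


def evaluate_random_policy(theta=1e-6, gamma=1.0):
    """In-place iterative policy evaluation for pi(a|s) = 1/4 in every state."""
    values = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}
    while True:
        delta = 0.0
        for state in values:
            if state in TERMINALS:
                continue  # terminal states keep a value of 0
            # Bellman expectation backup: average over the four equiprobable actions.
            new_value = 0.0
            for action in ACTIONS:
                next_state, reward = step(state, action)
                new_value += 0.25 * (reward + gamma * values[next_state])
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < theta:  # stop once a full sweep barely changes any value
            return values


values = evaluate_random_policy()
for r in range(SIZE):
    print([round(values[(r, c)], 1) for c in range(SIZE)])
# [0.0, -14.0, -20.0, -22.0]
# [-14.0, -18.0, -20.0, -20.0]
# [-20.0, -20.0, -18.0, -14.0]
# [-22.0, -20.0, -14.0, 0.0]
```

Each converged value is the negative expected number of steps to reach a terminal corner under the random policy; acting greedily with respect to these values is the policy-improvement step.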

### User Interface & Experience

- [Tabular State-Action Mappings](https://awesome-repositories.com/f/user-interface-experience/state-update-logic/state-action-value-updates/tabular-state-action-mappings.md) — Stores expected returns for discrete state-action pairs in structured lookup tables.
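A tabular state-action mapping can be as plain as a dictionary keyed by `(state, action)`. The sketch below uses two such tables to contrast Q-learning with Double Q-learning on the maximization-bias example from chapter 6 of the book: from state A, `RIGHT` ends the episode with reward 0, while `LEFT` leads to state B, where every action ends the episode with a reward drawn from N(-0.1, 1). The number of actions in B, the parameter values (α = 0.1, ε = 0.1, γ = 1) and all function names are assumptions for illustration, not the repository's code.

```python
import random
from collections import defaultdict

A, B, TERMINAL = "A", "B", "terminal"
LEFT, RIGHT = 0, 1
N_B_ACTIONS = 10  # assumption: number of (equally bad) actions available in B


def actions(state):
    return [LEFT, RIGHT] if state == A else list(range(N_B_ACTIONS))


def env_step(state, action, rng):
    """Return (next_state, reward) for the maximization-bias MDP."""
    if state == A:
        return (B, 0.0) if action == LEFT else (TERMINAL, 0.0)
    return TERMINAL, rng.gauss(-0.1, 1.0)  # every B action has mean reward -0.1


def epsilon_greedy(value, state, epsilon, rng):
    acts = actions(state)
    if rng.random() < epsilon:
        return rng.choice(acts)
    best = max(value(state, a) for a in acts)
    return rng.choice([a for a in acts if value(state, a) == best])


def left_fraction(double, episodes=300, alpha=0.1, epsilon=0.1, seed=0):
    """Run one agent and return the fraction of episodes that chose LEFT in A."""
    rng = random.Random(seed)
    q1, q2 = defaultdict(float), defaultdict(float)  # (state, action) -> value

    def behaviour_value(s, a):
        # Double Q-learning acts on the sum of both tables; Q-learning uses q1 only.
        return q1[(s, a)] + q2[(s, a)] if double else q1[(s, a)]

    lefts = 0
    for _ in range(episodes):
        state = A
        while state != TERMINAL:
            action = epsilon_greedy(behaviour_value, state, epsilon, rng)
            if state == A and action == LEFT:
                lefts += 1
            next_state, reward = env_step(state, action, rng)
            if double:
                # Pick one table to update at random; the other evaluates its greedy action.
                update, evaluate = (q1, q2) if rng.random() < 0.5 else (q2, q1)
            else:
                update, evaluate = q1, q1  # plain Q-learning: one table selects and evaluates
            target = reward
            if next_state != TERMINAL:
                greedy = max(actions(next_state), key=lambda a: update[(next_state, a)])
                target += evaluate[(next_state, greedy)]  # gamma = 1
            update[(state, action)] += alpha * (target - update[(state, action)])
            state = next_state
    return lefts / episodes


runs = 100
print(sum(left_fraction(False, seed=s) for s in range(runs)) / runs,
      sum(left_fraction(True, seed=s) for s in range(runs)) / runs)
```

Averaged over many seeds, plain Q-learning typically chooses `LEFT` from A more often than Double Q-learning, because taking the max over noisy estimates in B biases its target upward. Double Q-learning reduces that bias by selecting the action with one table and evaluating it with the other.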