# Reinforcement Learning
![[RL-taxonomy.svg]]
[Image Source](https://spinningup.openai.com/)
## Fundamentals
Focus on understanding the methods and the relationships between them rather than on memorizing details such as update equations.
Especially important: know the advantages, disadvantages, and limitations of each method, and the situations in which a given method should be preferred.
### Introduction, MDP and Bandits
- [[Reinforcement Learning Problem Setup]]
- [[Multi-Armed Bandits]]
- [[Markov Decision Processes]]
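For quick reference, an MDP is commonly specified as the tuple
$$\mathcal{M} = (\mathcal{S}, \mathcal{A}, p, r, \gamma),$$
where $\mathcal{S}$ is the state space, $\mathcal{A}$ the action space, $p(s' \mid s, a)$ the transition dynamics, $r(s, a)$ the reward function, and $\gamma \in [0, 1)$ the discount factor.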
### Sequential Decision Making
- [[Bellman Equation and Value Functions]]
- [[Dynamic Programming (RL)]]
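As a reminder, the Bellman expectation equation relates the value of a state under policy $\pi$ to the values of its successor states:
$$v_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma v_\pi(s') \right]$$
Dynamic programming methods (policy evaluation, value iteration) solve this system iteratively when the model $p$ is known.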
### MC Methods and TD(0)
- [[Monte-Carlo RL Methods]]
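The two incremental prediction updates side by side: Monte-Carlo waits for the full return $G_t$, while TD(0) bootstraps from the current value estimate:
$$V(S_t) \leftarrow V(S_t) + \alpha \left[ G_t - V(S_t) \right] \qquad \text{(MC)}$$
$$V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right] \qquad \text{(TD(0))}$$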
### Advanced TD Methods
- [[Temporal Difference Learning]]
- [[Deep-Q-Network]]
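To make off-policy TD control concrete, a minimal tabular Q-learning sketch (assumes a Gymnasium-style discrete environment; `n_episodes`, `alpha`, and `epsilon` are illustrative defaults, not tuned values):

```python
import numpy as np

def q_learning(env, n_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Tabular action-value estimates, one row per state.
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(n_episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy behavior policy.
            if np.random.rand() < epsilon:
                a = env.action_space.sample()
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # Off-policy target: greedy (max) over next actions,
            # zero bootstrap at terminal states.
            target = r + gamma * (0.0 if terminated else np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

DQN applies the same update but replaces the table with a neural network and stabilizes training with a replay buffer and a target network.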
### Prediction with Approximation
- [[On-policy learning with approximation]]
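The central update here is semi-gradient TD(0) for a parameterized value estimate $\hat{v}(s, \mathbf{w})$:
$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \left[ R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w}) - \hat{v}(S_t, \mathbf{w}) \right] \nabla_{\mathbf{w}} \hat{v}(S_t, \mathbf{w})$$
It is called *semi*-gradient because the target also depends on $\mathbf{w}$ but is treated as a constant.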
### Control with Approximation
- [[Off-policy learning with approximation]]
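The analogous control update (episodic semi-gradient Sarsa) replaces $\hat{v}$ with an action-value estimate $\hat{q}$:
$$\mathbf{w} \leftarrow \mathbf{w} + \alpha \left[ R_{t+1} + \gamma \hat{q}(S_{t+1}, A_{t+1}, \mathbf{w}) - \hat{q}(S_t, A_t, \mathbf{w}) \right] \nabla_{\mathbf{w}} \hat{q}(S_t, A_t, \mathbf{w})$$
Combining function approximation, bootstrapping, and off-policy training (the "deadly triad") can diverge, which is why the off-policy case gets its own treatment.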
![[model-free-rl.jpg]]
[[Model Free Reinforcement Learning]]
### Policy Gradient - REINFORCE and Approximations
- [[Policy Gradient]]
- [[REINFORCE - Monte Carlo Policy Gradient]]
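The core result is the score-function form of the policy gradient; REINFORCE estimates it from sampled returns:
$$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta} \left[ \sum_t G_t \, \nabla_\theta \log \pi_\theta(A_t \mid S_t) \right]$$
Subtracting a baseline $b(S_t)$ from $G_t$ leaves the gradient unbiased while reducing variance.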
### Policy Gradient - PGT and GAE
- [[PGT Actor-Critic]]
- [[Generalized Advantage Estimate]]
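GAE interpolates between the low-variance one-step and low-bias Monte-Carlo advantage estimates via $\lambda$:
$$\hat{A}_t^{\mathrm{GAE}(\gamma, \lambda)} = \sum_{l=0}^{\infty} (\gamma \lambda)^l \, \delta_{t+l}, \qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$$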
### Advanced Policy Search
- [[Natural Policy Gradient]]
- [[TRPO - Trust-Region Policy Optimization]]
- [[PPO - Proximal Policy Optimization]]
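PPO's clipped surrogate objective approximates TRPO's trust region with a simple first-order constraint on the probability ratio:
$$L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t \left[ \min \left( r_t(\theta) \hat{A}_t, \; \operatorname{clip}(r_t(\theta), 1 - \epsilon, 1 + \epsilon) \, \hat{A}_t \right) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$$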
### Deterministic PG and Evaluation
- [[Deterministic Policy Gradient]]
- [[Deep Deterministic Policy Gradient]]
- [[Evaluating RL algorithms]]
- [[Soft Actor-Critic]]
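The deterministic policy gradient backs the critic's action-gradient through the actor, and DDPG stabilizes it with Polyak-averaged target networks:
$$\nabla_\theta J(\theta) = \mathbb{E}_s \left[ \nabla_a Q(s, a) \big|_{a = \mu_\theta(s)} \, \nabla_\theta \mu_\theta(s) \right], \qquad \theta' \leftarrow \tau \theta + (1 - \tau) \theta'$$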
### Planning and Learning
- [[Prioritized Sweeping]]
- [[Trajectory Sampling]]
[[Model Based Reinforcement Learning]] methods can be categorized by what their model predicts:
Observation-predicting:
- [[World Models]]
- [[Dyna-Q - Planning and Learning]] (see the planning-loop sketch at the end of this section)
Value-predicting:
- [[MuZero]]
- [[The Predictron]]
- [[TreeQN]]
- [[Value Prediction Network]]
- [[AlphaZero]]
- [[ReBeL]]
Recent works:
- [[When to Trust Your Model - Model-Based Policy Optimization]] (NeurIPS 2019)
- MOPO: Model-based Offline Policy Optimization (NeurIPS 2020, https://arxiv.org/abs/2005.13239)
- Can be combined with structured priors as well:
    - Deep Reinforcement Learning with Relational Inductive Biases (ICLR 2019, https://openreview.net/pdf?id=HkxaFoC9KQ)
    - Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and Their Interactions (ICLR 2018, https://arxiv.org/pdf/1802.10353.pdf)
- Other works:
    - The Bottleneck Simulator: A Model-Based Deep Reinforcement Learning Approach (https://www.researchgate.net/publication/346469184_The_Bottleneck_Simulator_A_Model-Based_Deep_Reinforcement_Learning_Approach)
    - [[Decision Transformer]]
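As a concrete instance of combining planning and learning, a minimal tabular Dyna-Q planning loop (a sketch only; `model` is a hypothetical dict caching the last observed `(reward, next_state)` for each visited `(state, action)` pair):

```python
import random
import numpy as np

def dyna_q_planning(Q, model, alpha=0.1, gamma=0.99, n_planning=10):
    # Replay simulated transitions from the learned model between
    # real environment steps (the "planning" half of Dyna-Q).
    for _ in range(n_planning):
        s, a = random.choice(list(model.keys()))
        r, s_next = model[(s, a)]
        # Same Q-learning update as on real experience.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
```

During acting, each real transition both updates `Q` directly and writes `model[(s, a)] = (r, s_next)`, so planning stays consistent with experience.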
### Partial Observability
- [[State Update Functions in Partially Observable MDP]]
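In a POMDP the agent maintains a belief $b$ over states; after taking action $a$ and observing $o$, the exact Bayesian update is
$$b'(s') \propto O(o \mid s', a) \sum_{s} P(s' \mid s, a) \, b(s)$$
Learned state-update functions (e.g. recurrent agent states) play the role of this belief update when the model is unknown.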
### Pure Exploration
- [[Best Arm Identification]]
[[Benefits and challenges of different RL methods]]
[[Inverse Reinforcement Learning]]
---
## Resources
1. Solutions to the exercises in *Reinforcement Learning: An Introduction* (2nd edition): https://github.com/LyWangPX/Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions/