# Reinforcement Learning

![[RL-taxonomy.svg]]

[Image Source](https://spinningup.openai.com/)

## Fundamentals

Focus on understanding the methods and the relationships between them rather than on memorizing, e.g., update equations. Especially important: know the advantages, disadvantages, and limitations of each method, and the situations in which a given method should be preferred.

### Introduction, MDPs and Bandits

- [[Reinforcement Learning Problem Setup]]
- [[Multi-Armed Bandits]]
- [[Markov Decision Processes]]

### Sequential Decision Making

- [[Bellman Equation and Value Functions]]
- [[Dynamic Programming (RL)]]

### MC Methods and TD(0)

- [[Monte-Carlo RL Methods]]

### Advanced TD Methods

- [[Temporal Difference Learning]]
- [[Deep-Q-Network]]

### Prediction with Approximation

- [[On-policy learning with approximation]]

### Control with Approximation

- [[Off-policy learning with approximation]]

![[model-free-rl.jpg]]

[[Model Free Reinforcement Learning]]

### Policy Gradient - REINFORCE and Approximations

- [[Policy Gradient]]
- [[REINFORCE - Monte Carlo Policy Gradient]]

### Policy Gradient - PGT and GAE

- [[PGT Actor-Critic]]
- [[Generalized Advantage Estimate]]

### Advanced Policy Search

- [[Natural Policy Gradient]]
- [[TRPO - Trust-Region Policy Optimization]]
- [[PPO - Proximal Policy Optimization]]

### Deterministic PG and Evaluation

- [[Deterministic Policy Gradient]]
- [[Deep Deterministic Policy Gradient]]
- [[Evaluating RL algorithms]]
- [[Soft Actor-Critic]]

### Planning and Learning

- [[Prioritized Sweeping]]
- [[Trajectory Sampling]]

[[Model Based Reinforcement Learning]] can be categorized as:

Observation-predicting:

- [[World Models]]
- [[Dyna-Q - Planning and Learning]]

Value-predicting:

- [[MuZero]]
- [[The Predictron]]
- [[TreeQN]]
- [[Value Prediction Network]]
- [[AlphaZero]]
- [[ReBeL]]

Recent works:

- [[When to Trust Your Model - Model-Based Policy Optimization]] (NeurIPS 2019)
- MOPO (NeurIPS 2020, https://arxiv.org/abs/2005.13239)
- Can be combined with structural priors as well:
    - Deep Reinforcement Learning with Relational Inductive Biases (ICLR 2019, https://openreview.net/pdf?id=HkxaFoC9KQ)
    - Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions (ICLR 2018, https://arxiv.org/pdf/1802.10353.pdf)
- Other works:
    - The Bottleneck Simulator: A Model-Based Deep Reinforcement Learning Approach (https://www.researchgate.net/publication/346469184_The_Bottleneck_Simulator_A_Model-Based_Deep_Reinforcement_Learning_Approach)
- [[Decision Transformer]]

### Partial Observability

- [[State Update Functions in Partially Observable MDP]]

### Pure Exploration

- [[Best Arm Identification]]

[[Benefits and challenges of different RL methods]]

[[Inverse Reinforcement Learning]]

---

## Resources

1. Solutions to Sutton & Barto, *Reinforcement Learning: An Introduction* (2nd edition): https://github.com/LyWangPX/Reinforcement-Learning-2nd-Edition-by-Sutton-Exercise-Solutions/