# Model-Based Reinforcement Learning
Model-based reinforcement learning (MBRL) is widely seen as having the potential to be significantly more sample-efficient than model-free RL (MFRL).
The high sample complexity of MFRL largely limits its application to simulated domains.
Advantages:
- Can learn the model efficiently with supervised learning methods (see the sketch after this list)
- Can reason about model uncertainty (e.g., upper-confidence-bound methods for the exploration/exploitation trade-off)
- Generalization: if the dynamics (or reward) of the environment change, the agent can reuse the learned model and replan
- Incorporating uncertainty helps with the exploration/exploitation trade-off
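As a rough illustration of the supervised-learning advantage, here is a minimal sketch (synthetic placeholder data and a simple linear model, not taken from any particular library) of fitting a one-step dynamics model by regressing next states on (state, action) pairs:

```python
import numpy as np

# Minimal sketch: fit a linear one-step dynamics model s' ~ [s, a, 1] @ W by least squares.
# The data here is synthetic and purely illustrative; in practice the (s, a, s') transitions
# would come from interaction with the environment.
rng = np.random.default_rng(0)
state_dim, action_dim, n_transitions = 4, 2, 1000

states = rng.normal(size=(n_transitions, state_dim))
actions = rng.normal(size=(n_transitions, action_dim))
next_states = states + 0.1 * actions @ rng.normal(size=(action_dim, state_dim))

# Supervised regression: inputs are (state, action), targets are next states.
inputs = np.hstack([states, actions, np.ones((n_transitions, 1))])
W, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)

def predict_next_state(s, a):
    """One-step prediction with the learned model."""
    return np.concatenate([s, a, [1.0]]) @ W
```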
Disadvantages:
- First a model is learned, then a value function is constructed from it -> two sources of approximation error
However, there are significant challenges.
## Challenges of MBRL
These are the challenges identified by [[Benchmarking Model-Based Reinforcement Learning]]:
### Dynamics bottleneck
- Performance does not keep improving as more data is collected
- Models with learned dynamics get stuck at performance local minima that are significantly worse than those reached with ground-truth dynamics
- Prediction error accumulates over time, and MBRL inevitably involves predictions on unseen states (see the sketch after this list)
- Policy learning and dynamics learning are coupled, which makes agents more prone to performance local minima
- Exploration and off-policy learning are barely addressed in current model-based approaches
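A toy illustration of the error-accumulation point (an arbitrary linear system with a small per-step model error, not taken from the benchmark):

```python
import numpy as np

# Minimal sketch: a learned model with a small one-step error drifts further from
# the true trajectory as the rollout gets longer (compounding prediction error).
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy ground-truth dynamics
A_model = A_true + 1e-2                        # learned model with a small bias

for horizon in [1, 5, 20, 50]:
    x_true = x_model = np.array([1.0, 0.0])
    for _ in range(horizon):
        x_true = A_true @ x_true
        x_model = A_model @ x_model
    print(f"horizon {horizon:3d}: open-loop prediction error "
          f"{np.linalg.norm(x_model - x_true):.4f}")
```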
### Planning horizon dilemma
- While increasing the planning horizon provides more accurate reward estimation, it can also cause performance to drop
- A planning horizon between 20 and 40 works best, both for models using ground-truth dynamics and for those using learned dynamics
- This can be attributed to insufficient planning in a search space that grows exponentially with planning depth, i.e., the curse of dimensionality (see the sketch after this list)
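A back-of-the-envelope illustration of why a fixed sampling budget struggles as the horizon grows (the numbers are arbitrary):

```python
# With k candidate actions per step, the number of distinct action sequences is k**H,
# so a fixed planning budget covers an exponentially shrinking fraction of the space.
budget = 10_000   # candidate sequences the planner can afford to evaluate
k = 4             # discretized actions per step (illustrative)
for horizon in [5, 10, 20, 40]:
    space = k ** horizon
    print(f"H={horizon:2d}: {space:.1e} sequences, budget covers {budget / space:.1e} of them")
```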
### Early termination dilemma
- Early termination, i.e., ending the episode before the horizon is reached, is a standard technique in MFRL algorithms to prevent the agent from visiting unpromising states, or states that would damage real robots
- MBRL can correspondingly apply early termination in planned trajectories, or generate early-terminated imagined data, but this is hard to integrate into existing model-based algorithms (see the sketch after this list)
- In practice, early termination decreases performance for MBRL algorithms of different types
- Efficient learning in complex environments such as Humanoid all but requires early termination, so this is an important area for research
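A minimal sketch of what applying early termination inside imagined rollouts could look like; `model.predict`, `is_terminal`, and `buffer` are assumed placeholders, not from any specific codebase:

```python
def imagined_rollout(model, policy, start_state, is_terminal, horizon, buffer):
    """Roll the policy out in the learned model, cutting off at predicted terminations."""
    state = start_state
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = model.predict(state, action)   # learned dynamics + reward
        done = is_terminal(next_state)                       # early-termination check
        buffer.add(state, action, reward, next_state, done)
        if done:                                             # stop the imagined trajectory early
            break
        state = next_state
```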
## Types of model-based techniques
Following the taxonomy in [1]:
### Analytic gradient based
- Differentiate the expected return through a learned, differentiable dynamics model to obtain policy gradients directly (e.g., PILCO)
### Sampling-based planning
- Plan at decision time by sampling candidate action sequences in the learned model and executing the best one (e.g., random shooting, cross-entropy method, MCTS); a minimal sketch follows
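A minimal random-shooting planner as one instance of sampling-based planning; `predict(state, action) -> (next_state, reward)` is an assumed learned one-step model, and actions are assumed to lie in [-1, 1]:

```python
import numpy as np

def plan(predict, state, action_dim, horizon=20, n_candidates=1000, rng=None):
    """Pick the first action of the best randomly sampled action sequence (MPC-style)."""
    rng = rng or np.random.default_rng()
    candidates = rng.uniform(-1, 1, size=(n_candidates, horizon, action_dim))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:                 # roll the candidate sequence out in the learned model
            s, r = predict(s, a)
            returns[i] += r
    best = candidates[np.argmax(returns)]
    return best[0]                    # execute only the first action, then replan
```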
### Model-based data generation
- Use the learned model to generate imagined transitions that augment the real data used by a model-free learner (e.g., Dyna, MBPO); a minimal sketch follows
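A Dyna/MBPO-flavoured sketch of model-based data generation; `model`, `policy`, `real_states`, and `model_buffer` are placeholders, not a specific implementation:

```python
def generate_model_data(model, policy, real_states, model_buffer, rollout_length=5):
    """Branch short imagined rollouts from real states to augment the model-free learner's data."""
    for state in real_states:
        for _ in range(rollout_length):   # short rollouts limit compounding model error
            action = policy(state)
            next_state, reward = model.predict(state, action)
            model_buffer.add(state, action, reward, next_state)
            state = next_state
```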
### Value-equivalence prediction
- Train the model to predict only value-relevant quantities (rewards, values, policies) rather than full observations (e.g., MuZero)
---
## References
1. https://bair.berkeley.edu/blog/2019/12/12/mbpo/