# Model-Free RL
> "Never solve a more general problem as an intermediate step." ~ Vladimir Vapnik, 1998
If we care about optimal behaviour, why not learn a policy directly?
Model-based RL:
- 'Easy' to learn a model (supervised learning)
- Learns 'all there is to know' from the data
- But the objective captures irrelevant information: the model must predict everything, not just what matters for acting
- May focus compute/capacity on irrelevant details
- Computing a policy from the model (planning) is non-trivial and can be computationally expensive
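The two-step nature of model-based RL can be sketched concretely. Below is a minimal, purely illustrative example (the tiny two-state MDP, its dynamics, and all variable names are assumptions, not from the source): first the model is fit from transitions by counting, a supervised/maximum-likelihood step, and then a policy is computed from the learned model by value iteration, the separate planning step the bullet above calls expensive.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9

# Hidden "true" dynamics, used only to generate data.
true_next = np.array([[0, 1], [1, 0]])          # s' = true_next[s, a]
true_reward = np.array([[0.0, 1.0], [0.0, 0.0]])

# Step 1: learn the model from sampled transitions by counting
# (the 'easy', supervised part).
counts = np.zeros((n_states, n_actions, n_states))
reward_sum = np.zeros((n_states, n_actions))
for _ in range(1000):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)
    s2, r = true_next[s, a], true_reward[s, a]
    counts[s, a, s2] += 1
    reward_sum[s, a] += r
P = counts / counts.sum(axis=2, keepdims=True)  # estimated P(s' | s, a)
R = reward_sum / counts.sum(axis=2)             # estimated r(s, a)

# Step 2: plan on the learned model with value iteration
# (the non-trivial part that model-based RL still has to do).
V = np.zeros(n_states)
for _ in range(200):
    V = np.max(R + gamma * P @ V, axis=1)
policy = np.argmax(R + gamma * P @ V, axis=1)
```

Note that even in this toy case the model (`P`, `R`) stores more than the final policy needs; the planning loop is what turns that knowledge into behaviour.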
Value-based RL:
- Closer to the true objective: learn the value of each state, and the greedy policy follows from the values
- Fairly well understood; somewhat similar to regression
- Still not the true objective, so it may still focus capacity on less-important details
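The value-based approach can be illustrated with tabular Q-learning, where action values are learned directly from transitions and the policy is read off greedily at the end. The three-state chain environment and all names below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2
gamma, alpha = 0.9, 0.5

def step(s, a):
    """Illustrative chain: action 1 moves right, action 0 stays;
    entering the last state pays reward 1."""
    s2 = min(s + 1, n_states - 1) if a == 1 else s
    r = 1.0 if s2 == n_states - 1 and s != n_states - 1 else 0.0
    return s2, r

Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(5000):
    a = rng.integers(n_actions)              # uniform exploration
    s2, r = step(s, a)
    # Temporal-difference update toward the one-step bootstrap target.
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
    s = 0 if s2 == n_states - 1 else s2      # reset at the terminal state

greedy = Q.argmax(axis=1)                    # policy derived from the values
```

This is the sense in which values are "closer" to the objective than a model: no dynamics are stored, yet the update still learns values for every state, whether or not they matter much for acting well.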
Policy-based RL:
- Right objective!
- Not the most efficient use of data
- Ignores other learnable knowledge, which can make it hard to get off the ground initially