# Model-Free RL

> "Never solve a more general problem as an intermediate step." ~ Vladimir Vapnik, 1998

If we care about optimal behaviour, why not learn a policy directly?

Model-based RL:

- 'Easy' to learn a model (supervised learning)
- Learns 'all there is to know' from the data
- But the objective also captures irrelevant information
- May focus compute/capacity on irrelevant details
- Computing a policy from the model (planning) is non-trivial and can be computationally expensive

Value-based RL:

- Closer to the true objective (learn the value of each state; acting greedily on the values gives a policy)
- Fairly well understood, somewhat similar to regression
- Still not the true objective, so it may still focus capacity on less-important details

Policy-based RL:

- The right objective!
- Not the most efficient use of data
- Ignores other learnable knowledge, so it can be difficult to get off the ground at first
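To make the value-based vs. policy-based contrast concrete, here is a minimal sketch on a hypothetical two-armed bandit (the arm payoffs `TRUE_MEANS` and all step sizes are illustrative assumptions, not from the notes). The value-based learner estimates action values and derives its policy by acting greedily on them; the policy-based learner parameterises a softmax policy directly and adjusts it with a REINFORCE-style gradient update.

```python
import math
import random

random.seed(0)

# Hypothetical 2-armed bandit: arm 1 pays off more often on average.
TRUE_MEANS = [0.2, 0.8]

def pull(arm):
    """Bernoulli reward with the arm's true success probability."""
    return 1.0 if random.random() < TRUE_MEANS[arm] else 0.0

# --- Value-based: learn Q-values, get the policy by acting greedily ---
Q = [0.0, 0.0]
for t in range(2000):
    # epsilon-greedy exploration around the current value estimates
    if random.random() < 0.1:
        arm = random.randrange(2)
    else:
        arm = max(range(2), key=lambda a: Q[a])
    r = pull(arm)
    Q[arm] += 0.1 * (r - Q[arm])  # incremental value update

# --- Policy-based: parameterise the policy directly (REINFORCE) ---
theta = [0.0, 0.0]  # softmax action preferences

def softmax(prefs):
    z = [math.exp(p - max(prefs)) for p in prefs]
    s = sum(z)
    return [p / s for p in z]

baseline = 0.0  # running-average reward baseline to reduce variance
for t in range(2000):
    probs = softmax(theta)
    arm = 0 if random.random() < probs[0] else 1
    r = pull(arm)
    baseline += 0.01 * (r - baseline)
    for a in range(2):
        # gradient of log pi(arm) w.r.t. preference theta[a]
        grad = (1.0 if a == arm else 0.0) - probs[a]
        theta[a] += 0.1 * (r - baseline) * grad

print(Q)               # value estimates, close to the true arm means
print(softmax(theta))  # policy probabilities, concentrated on arm 1
```

Both learners end up preferring the better arm, but note the difference in what is learned: the value-based learner's `Q` also estimates *how good* each arm is (knowledge beyond what is needed to act), while the policy learner only encodes *which* arm to pick, which is exactly the "right objective, but ignores other learnable knowledge" trade-off described above.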