# Markov Reward Processes

A Markov reward process (MRP) is a Markov chain with an associated reward function. In reinforcement learning, an MRP arises when you fix a policy $\pi$ for a Markov decision process (MDP). Once the policy is fixed, all decision making is accounted for, and what remains is an MRP with the induced transition kernel

$$
p\left(s^{\prime} \mid s\right)=\int p\left(s^{\prime} \mid s, a\right) \pi(a \mid s) \, da
$$

This MRP models the reward accrued by a given decision-making strategy ($\pi$) in the MDP.
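
For a finite MDP, the integral over actions in the induced transition kernel becomes a sum, $p(s' \mid s) = \sum_a p(s' \mid s, a)\, \pi(a \mid s)$. The sketch below illustrates this on a hypothetical two-state, two-action MDP. The arrays `P`, `pi`, `R` and the helper `induced_mrp` are made up for illustration. It also assumes rewards are given as expected values per state-action pair, so the induced reward can be averaged over the policy in the same way.

```python
# Hypothetical two-state, two-action MDP; all numbers are illustrative.
# P[s][a][s2] = p(s2 | s, a)   transition kernel of the MDP
# pi[s][a]    = pi(a | s)      fixed stochastic policy
# R[s][a]     = expected reward for taking action a in state s (assumed form)
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
]
pi = [
    [0.5, 0.5],
    [1.0, 0.0],
]
R = [
    [1.0, 0.0],
    [2.0, 4.0],
]


def induced_mrp(P, pi, R):
    """Marginalize the actions out of a finite MDP under a fixed policy."""
    n_states = len(P)
    n_actions = len(pi[0])
    # p(s2 | s) = sum_a p(s2 | s, a) * pi(a | s)
    P_pi = [
        [sum(pi[s][a] * P[s][a][s2] for a in range(n_actions))
         for s2 in range(n_states)]
        for s in range(n_states)
    ]
    # r(s) = sum_a r(s, a) * pi(a | s)
    r_pi = [
        sum(pi[s][a] * R[s][a] for a in range(n_actions))
        for s in range(n_states)
    ]
    return P_pi, r_pi


P_pi, r_pi = induced_mrp(P, pi, R)
print(P_pi)
print(r_pi)
```

Each row of `P_pi` is a convex combination of the per-action transition rows, so it still sums to one. The result is an ordinary Markov chain over states, paired with a per-state reward.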