Markov Reward Processes

# Markov Reward Processes

An MRP is a [[Markov Chain]] with an associated reward function. In [[Reinforcement Learning]], an MRP arises when you fix a policy $\pi$ for your MDP. Once the policy is fixed, all the decision making is accounted for, and what remains is an MRP with the induced transition kernel

$$p\left(s^{\prime} \mid s\right)=\int p\left(s^{\prime} \mid s, a\right) \pi(a \mid s) \, d a.$$

This MRP models the reward accrued by a given decision-making strategy $\pi$ in the MDP.
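
As a minimal sketch of how a fixed policy collapses an MDP into an MRP, the snippet below works through the finite case, where the integral over actions becomes a sum. Everything in it is an assumption made for illustration: the function name `induce_mrp`, the nested-list layout of `P`, `R`, and `pi`, the reward form $r(s, a)$ (the note does not specify one), and the toy numbers.

```python
def induce_mrp(P, R, pi):
    """Collapse a finite MDP and a fixed policy into an MRP.

    P[s][a][s2] : transition probability p(s2 | s, a)
    R[s][a]     : expected immediate reward, assumed to have the form r(s, a)
    pi[s][a]    : policy probability pi(a | s)
    """
    n_states = len(P)
    n_actions = len(P[0])

    # Induced kernel: p(s2 | s) = sum_a p(s2 | s, a) * pi(a | s)
    P_pi = [
        [sum(pi[s][a] * P[s][a][s2] for a in range(n_actions))
         for s2 in range(n_states)]
        for s in range(n_states)
    ]

    # Induced reward: r(s) = sum_a r(s, a) * pi(a | s)
    r_pi = [
        sum(pi[s][a] * R[s][a] for a in range(n_actions))
        for s in range(n_states)
    ]
    return P_pi, r_pi


# Hypothetical 2-state, 2-action MDP, made up for this example.
P = [
    [[0.9, 0.1], [0.2, 0.8]],  # from state 0: action 0, action 1
    [[0.5, 0.5], [0.0, 1.0]],  # from state 1: action 0, action 1
]
R = [[1.0, 0.0], [0.0, 2.0]]
pi = [[0.5, 0.5], [0.25, 0.75]]

P_pi, r_pi = induce_mrp(P, R, pi)
```

Each row of `P_pi` is a convex combination of the per-action rows of `P`, so it is again a probability distribution over next states. The pair `(P_pi, r_pi)` is the MRP induced by `pi`.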