Semi-Markov Decision Processes

# Semi-Markov Decision Processes

A process is called a semi-MDP when it is Markovian at the level of decision points (epochs), that is, at the level of decisions over options, but not at the "flat" level of primitive actions. If you see only the state-action pairs along a trajectory, without the option currently being executed, the resulting process is not Markovian. Semi-MDPs are therefore used for problems that involve actions at different levels of abstraction.

**Hierarchical reinforcement learning** (HRL) is a generalization (or extension) of reinforcement learning in which the environment is modeled as a semi-MDP.

## Option

An option is a generalization of the concept of an action. It captures the idea that certain actions are composed of other sub-actions. An example:

> Examples of options include _picking up an object, going to lunch, and traveling to a distant city_, as well as primitive actions such as _muscle twitches_ and _joint torques_.

![[smdp-options.png]]

---

## References

1. What is semi-MDP? https://stats.stackexchange.com/questions/219796/from-markov-decision-process-mdp-to-semi-mdp-what-is-it-in-a-nutshell
2. What are options in RL? https://ai.stackexchange.com/a/13255
3. H. C. Tijms, A First Course in Stochastic Models, Chapter 7: Semi-Markov Decision Processes. http://read.pudn.com/downloads74/ebook/272070/A%20First%20Course%20in%20Stochastic%20Models/7%20Semi-Markov%20Decision%20Processes.pdf
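
## Sketch: an option in code

To make the idea of an option concrete, here is a minimal sketch. It assumes the common formulation of an option as a triple (initiation set, intra-option policy, termination condition), which this note does not spell out. The corridor environment and the names `Option`, `step`, and `run_option` are hypothetical and exist only for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple

State = int
Action = int


@dataclass
class Option:
    """Assumed formulation: (initiation set, policy, termination)."""
    initiation_set: Set[State]             # states where the option may start
    policy: Callable[[State], Action]      # primitive action chosen in each state
    termination: Callable[[State], float]  # probability of stopping in a state


def step(state: State, action: Action, n_states: int = 6) -> Tuple[State, float]:
    """Hypothetical corridor: move left (-1) or right (+1), reward -1 per step."""
    next_state = min(max(state + action, 0), n_states - 1)
    return next_state, -1.0


def run_option(
    option: Option,
    state: State,
    gamma: float = 0.9,
    rng: Optional[random.Random] = None,
) -> Tuple[State, float, int]:
    """Run the option until it terminates.

    Returns what the decision level observes: the final state, the
    discounted reward accumulated along the way, and the duration in
    primitive steps.
    """
    if state not in option.initiation_set:
        raise ValueError("option cannot be started in this state")
    rng = rng or random.Random(0)
    total, discount, duration = 0.0, 1.0, 0
    while True:
        action = option.policy(state)
        state, reward = step(state, action)
        total += discount * reward
        discount *= gamma
        duration += 1
        # Stop with probability termination(state).
        if rng.random() < option.termination(state):
            return state, total, duration


# "Walk to the right end of the corridor": deterministic termination at state 5.
go_right = Option(
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: +1,
    termination=lambda s: 1.0 if s == 5 else 0.0,
)

final_state, discounted_reward, duration = run_option(go_right, state=2)
print(final_state, round(discounted_reward, 2), duration)
```

Starting from state 2, the decision level sees a single transition to state 5 that took 3 primitive steps. The intermediate state-action pairs are hidden inside the option, and the variable duration between decision points is what makes the process semi-Markov rather than Markov.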