# Partial Observability
Partial observability occurs when an agent cannot directly observe the full state of the environment and instead receives only incomplete observations. This breaks the Markov property with respect to observations: the current observation may not contain all the information needed for optimal decision-making.
These problems are modeled as Partially Observable Markov Decision Processes (POMDPs).
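Formally (standard POMDP notation), the MDP tuple is augmented with an observation space and an observation model:

$$\langle \mathcal{S}, \mathcal{A}, T, R, \Omega, O, \gamma \rangle, \qquad T(s' \mid s, a), \qquad O(o \mid s', a)$$

where the agent receives an observation $o \in \Omega$ drawn from $O$ rather than the true state $s$ itself.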
Agents handle this through two main approaches:
**Beliefs**: Maintain a probability distribution over possible true states given the action-observation history (formally optimal but computationally expensive). Used in methods like particle filters or belief-state planning; see the update sketch below.
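The exact belief update is a Bayes filter: $b'(s') \propto O(o \mid s', a) \sum_s T(s' \mid s, a)\, b(s)$. Below is a minimal NumPy sketch of this update for a small discrete POMDP; the tensor layouts for `T` and `O` are assumptions noted in the docstring, not from the original note.

```python
import numpy as np

def update_belief(belief, action, observation, T, O):
    """One step of the exact Bayes filter for a discrete POMDP.

    belief: shape (S,), current probability distribution over states
    T:      shape (A, S, S), T[a, s, s'] = P(s' | s, a)   (assumed layout)
    O:      shape (A, S, num_obs), O[a, s', o] = P(o | s', a)  (assumed layout)
    """
    predicted = belief @ T[action]                    # prediction: sum_s T(s'|s,a) b(s)
    updated = predicted * O[action, :, observation]   # correction: weight by observation likelihood
    return updated / updated.sum()                    # normalize (assumes the observation has nonzero probability)
```

The cost of maintaining `belief` scales with the size of the state space, which is why exact belief tracking is usually replaced by particle filters or learned approximations in larger problems.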
**Memory**: Use a [[Recurrent Neural Networks (RNN)]] hidden state instead of the raw observation; this makes the decision process Markov again, but with respect to the internal state rather than the environment state. Memory mechanisms like [[Eligibility Trace]] can also carry relevant past observations/actions forward without explicitly modeling a state distribution. This approach is more practical and is the one most commonly used in deep RL; a sketch of a recurrent policy follows.
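A minimal PyTorch sketch of the memory approach, assuming a vector observation and discrete actions (class name, layer sizes, and the GRU choice are illustrative, not prescribed by the note): the policy acts on the GRU hidden state, which summarizes the history, rather than on the current observation alone.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Policy that conditions on an RNN hidden state instead of the raw observation."""

    def __init__(self, obs_dim, action_dim, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)  # hidden state summarizes the history
        self.policy_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, obs, hidden):
        x = torch.relu(self.encoder(obs))
        hidden = self.rnn(x, hidden)          # fold the new observation into memory
        logits = self.policy_head(hidden)     # act on the internal (Markov) state
        return logits, hidden
```

At the start of each episode the hidden state is reset (e.g. `hidden = torch.zeros(batch_size, 128)`) and then carried forward across timesteps, so the policy's effective input is the whole observation history.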
Examples include poker (hidden opponent cards), robots with limited sensors, and trading with incomplete market information.