# Deep Q-Learning

Deep Q-Learning extends Q-learning to high-dimensional state spaces by using neural networks to approximate the Q-function Q(s,a). This enables RL in complex environments such as video games and robotics, where tabular methods are infeasible.

Q-learning algorithms for function approximators, such as [[Deep-Q-Network (DQN)]] (and all its variants) and [[Deep Deterministic Policy Gradient (DDPG)]], are largely based on minimizing the mean squared Bellman error (MSBE) loss:

$$L = \Big( Q(s,a) - \big[\, r + \gamma \max_{a'} Q(s',a') \,\big] \Big)^2$$

This minimizes the squared difference between the current Q-estimate and the Bellman target (a minimal sketch of this update appears below the references).

## Essential Techniques

**Experience Replay**: Store past experiences in a buffer and sample them uniformly at random, breaking the temporal correlation between consecutive transitions and improving stability (see the sketch after the references).

**Target Networks**: Use a separate, slowly-updated network to compute the Bellman targets, preventing instability during training ([[Multi-Network Training with Moving Average Target]]).

## Key Algorithms

- [[Deep-Q-Network (DQN)]]: Discrete action spaces, ε-greedy exploration
- Double DQN: Addresses overestimation bias
- [[Deep Deterministic Policy Gradient (DDPG)]]: Continuous action spaces, actor-critic architecture
- Rainbow DQN: Combines multiple improvements (prioritized replay, dueling networks, etc.)

## References

1. https://spinningup.openai.com/en/latest/algorithms/ddpg.html
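
## Sketch: MSBE Update with a Target Network

A minimal PyTorch sketch of the MSBE update described above, using a target network to compute the Bellman targets. The `QNetwork` class, layer sizes, and hyperparameters are illustrative assumptions, not taken from the reference.

```python
# Minimal sketch of a DQN-style MSBE update with a target network (PyTorch).
# Network architecture, hyperparameters, and names are illustrative assumptions.
import copy
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP mapping a state vector to one Q-value per discrete action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

state_dim, n_actions, gamma = 4, 2, 0.99
q_net = QNetwork(state_dim, n_actions)
target_net = copy.deepcopy(q_net)          # slowly-updated copy used for targets
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def msbe_update(batch):
    """One gradient step on L = (Q(s,a) - [r + γ max_a' Q_target(s',a')])²."""
    s, a, r, s_next, done = batch          # tensors sampled from a replay buffer
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s,a) for taken actions
    with torch.no_grad():                  # Bellman targets are treated as constants
        max_q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * max_q_next
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```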
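
A matching sketch of the two essential techniques: a replay buffer and a target-network sync, reusing `q_net`/`target_net` from the block above. `ReplayBuffer` and `sync_target` are hypothetical helper names; with `tau=1.0` the sync is the periodic hard copy used in DQN, while `tau < 1` gives the Polyak moving-average update used by DDPG.

```python
# Illustrative experience replay buffer and target-network synchronization.
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO buffer; uniform random sampling breaks temporal correlation."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)

def sync_target(q_net, target_net, tau: float = 1.0):
    """tau=1.0: hard copy of weights (DQN-style periodic update).
    tau<1.0: Polyak / moving-average update, θ_target ← τ·θ + (1-τ)·θ_target."""
    for p, p_target in zip(q_net.parameters(), target_net.parameters()):
        p_target.data.copy_(tau * p.data + (1.0 - tau) * p_target.data)
```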