# Deep Q-Learning
Deep Q-Learning extends Q-learning to high-dimensional state spaces by using neural networks to approximate the Q-function Q(s,a). This enables RL in complex environments like video games and robotics where tabular methods are infeasible.
Q-learning algorithms for function approximators, such as [[Deep-Q-Network (DQN)]] (and all its variants) and [[Deep Deterministic Policy Gradient (DDPG)]], are largely based on minimizing the mean squared Bellman error (MSBE) loss function:
L = (Q(s,a) - [r + γ max_{a'} Q(s',a')])²
This loss minimizes the squared difference between the current Q-estimate Q(s,a) and the Bellman target r + γ max_{a'} Q(s',a'), where s' is the next state.
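A minimal PyTorch sketch of this loss for a discrete action space, assuming hypothetical `q_net` and `target_net` modules that map a batch of states to per-action Q-values (names and batch layout are illustrative, not from any specific library):

```python
import torch
import torch.nn as nn

def msbe_loss(q_net, target_net, batch, gamma=0.99):
    """Mean squared Bellman error over a batch of transitions.

    `batch` is assumed to hold tensors: states (B, obs_dim), actions (B,),
    rewards (B,), next_states (B, obs_dim), dones (B,).
    """
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions actually taken
    q_values = q_net(states).gather(1, actions.long().unsqueeze(1)).squeeze(1)

    # Bellman target: r + γ max_{a'} Q_target(s', a'), cut off at terminal states
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    return nn.functional.mse_loss(q_values, targets)
```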
## Essential Techniques
**Experience Replay**: Store past transitions and sample them uniformly at random, which breaks the temporal correlation between consecutive samples and improves training stability.
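A minimal uniform-sampling buffer sketch (plain Python; the transition layout is an assumption):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer that stores transitions and samples them uniformly."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation of consecutive steps
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```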
**Target Networks**: Use a separate, slowly updated copy of the Q-network to compute Bellman targets, so the targets do not chase the online network's own updates ([[Multi-Network Training with Moving Average Target]]).
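Both common update schemes, sketched with the same hypothetical PyTorch modules `q_net` (online) and `target_net`:

```python
import torch

def hard_update(target_net, q_net):
    """DQN-style: copy the online network into the target network every N steps."""
    target_net.load_state_dict(q_net.state_dict())

def soft_update(target_net, q_net, tau=0.005):
    """DDPG-style: move the target a small step toward the online network each update."""
    with torch.no_grad():
        for target_param, param in zip(target_net.parameters(), q_net.parameters()):
            target_param.mul_(1.0 - tau).add_(tau * param)
```

The hard copy every N gradient steps is what the original DQN uses; the Polyak (moving-average) update with a small τ is the variant used by DDPG.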
## Key Algorithms
- [[Deep-Q-Network (DQN)]]: Discrete action spaces, ε-greedy exploration
- Double DQN: Reduces overestimation bias by decoupling action selection from action evaluation (see the target sketch after this list)
- [[Deep Deterministic Policy Gradient (DDPG)]]: Continuous action spaces, actor-critic architecture
- Rainbow DQN: Combines multiple improvements (prioritized replay, dueling networks, etc.)
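As a concrete example from the list above, Double DQN changes only how the Bellman target is formed: the online network selects the next action and the target network evaluates it. A sketch with the same hypothetical `q_net`/`target_net` modules as above:

```python
import torch

def double_dqn_target(q_net, target_net, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: online net picks argmax_{a'} Q(s', a'), target net evaluates it."""
    with torch.no_grad():
        best_actions = q_net(next_states).argmax(dim=1, keepdim=True)        # selection
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)  # evaluation
        return rewards + gamma * (1.0 - dones) * next_q
```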
## References
1. https://spinningup.openai.com/en/latest/algorithms/ddpg.html