# Deep-Q-Network (DQN)
Approximates the Q function with a deep neural network. This introduces instability into training (consecutive samples are highly correlated and the learning target keeps moving), which DQN addresses with the following additions:
## Experience Replay Buffer
DQN stores its experiences in an *Experience Replay Buffer* and learns on randomly sampled batches from this buffer. The buffer stores `(state, action, reward, done, next_state)` tuples and supports sampling a batch of a given `batch_size`, which breaks the correlation between consecutive transitions.
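A minimal sketch of such a buffer in Python (the class name `ReplayBuffer` and the default `capacity` are illustrative, not from the original):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer that stores transitions and returns uniform random batches."""

    def __init__(self, capacity=100_000):
        # Old transitions are evicted automatically once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, done, next_state):
        self.buffer.append((state, action, reward, done, next_state))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        # Transpose the list of tuples into tuples of states, actions, ...
        states, actions, rewards, dones, next_states = zip(*batch)
        return states, actions, rewards, dones, next_states

    def __len__(self):
        return len(self.buffer)
```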
## DQN Loss function
The "target" is calculated using the Bellman equation:
$
Q(s, a) \leftarrow r+\gamma \max _{a^{\prime} \in A} Q\left(s^{\prime}, a^{\prime}\right)
$
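For example, with illustrative values $r = 1$, $\gamma = 0.99$, and $\max_{a^{\prime}} Q(s^{\prime}, a^{\prime}) = 2$, the target is $1 + 0.99 \cdot 2 = 2.98$.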
Optimization is then done with [[Stochastic Gradient Descent]] in the familiar supervised-learning fashion, using [[Loss Functions#Mean-Squared-Error Loss]]:
$
L=\left(Q(s, a)-\left(r+\gamma \max _{a^{\prime} \in A} Q\left(s^{\prime}, a^{\prime}\right)\right)\right)^{2}
$
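A sketch of this loss in code, assuming PyTorch and that the sampled batch has already been converted to tensors; the function name `dqn_loss` and the `target_net` argument (explained in the next section) are illustrative:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """MSE between Q(s, a) and the Bellman target r + gamma * max_a' Q(s', a')."""
    states, actions, rewards, dones, next_states = batch  # actions: int64, rewards/dones: float

    # Q(s, a) for the actions that were actually taken
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bellman target; no gradient flows through the target network
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * next_q * (1.0 - dones)  # terminal states get no bootstrap

    return F.mse_loss(q_values, target)
```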
## Target Network
To avoid convergence problems caused by chasing a moving target, a separate network with the same structure as the Q-network is created. The target network's weights are held fixed during learning and periodically reset to the current Q-network weights.
![[target network q.png]]
[Image Credit](https://greentec.github.io/reinforcement-learning-third-en/)
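A minimal sketch of this synchronization, again assuming PyTorch; the network architecture, the `SYNC_EVERY` interval, and the `maybe_sync` helper are illustrative:

```python
import copy
import torch.nn as nn

# Illustrative Q-network; any nn.Module with the same structure works.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Target network: a copy with identical structure, held fixed between syncs.
target_net = copy.deepcopy(q_net)
for p in target_net.parameters():
    p.requires_grad_(False)  # never updated by the optimizer

SYNC_EVERY = 1_000  # illustrative interval, in training steps

def maybe_sync(step):
    # Periodically overwrite the target network with the current Q-network weights
    if step % SYNC_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())
```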