# Deep-Q-Network (DQN)

Approximates the Q-function with a deep neural network. Done naively, this makes training unstable: consecutive samples are strongly correlated, and the regression target moves with every update of the network. DQN addresses this with the following additions (minimal code sketches for each appear at the end of this note):

## Experience Replay Buffer

DQN stores its experiences in an *Experience Replay Buffer* and learns on randomly sampled batches from this buffer; sampling at random breaks the correlation between consecutive transitions. Each entry is a tuple `(state, action, reward, done, next_state)`, and the buffer supports drawing a random batch of a given `batch_size`.

## DQN Loss function

The "target" is calculated using the Bellman equation:

$$
Q(s, a) \leftarrow r + \gamma \max_{a^{\prime} \in A} Q\left(s^{\prime}, a^{\prime}\right)
$$

Optimization is then done using [[Stochastic Gradient Descent]] in a familiar supervised-learning fashion with the [[Loss Functions#Mean-Squared-Error Loss]]:

$$
L = \left(Q(s, a) - \left(r + \gamma \max_{a^{\prime} \in A} Q\left(s^{\prime}, a^{\prime}\right)\right)\right)^{2}
$$

## Target Network

To avoid convergence problems caused by the target changing with every update, a separate network with the same architecture is created. The target network's weights are held fixed during learning and are periodically reset to the current weights of the main Q-network; the Bellman target above is computed with this frozen copy.

![[target network q.png]]

[Image Credit](https://greentec.github.io/reinforcement-learning-third-en/)
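## Code Sketches

For the replay buffer described above, a minimal plain-Python sketch; the `ReplayBuffer` class and its method names are illustrative choices, not taken from any particular library:

```python
import random
from collections import deque, namedtuple

# One stored experience: (state, action, reward, done, next_state)
Transition = namedtuple("Transition", ["state", "action", "reward", "done", "next_state"])


class ReplayBuffer:
    def __init__(self, capacity: int):
        # Bounded deque: the oldest experiences are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, done, next_state):
        self.buffer.append(Transition(state, action, reward, done, next_state))

    def sample(self, batch_size: int):
        # Uniform random sampling breaks the correlation between consecutive transitions.
        batch = random.sample(self.buffer, batch_size)
        return Transition(*zip(*batch))  # namedtuple of batched fields

    def __len__(self):
        return len(self.buffer)
```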
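For the loss, a sketch assuming PyTorch, a batch as returned by `ReplayBuffer.sample`, and a pair of networks `q_net` / `target_net` (hypothetical names; `target_net` is the target network from the section above). The `(1 - done)` factor drops the bootstrap term on terminal transitions, which is why `done` is stored in the buffer:

```python
import torch
import torch.nn.functional as F


def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """MSE between Q(s, a) and the Bellman target r + gamma * max_a' Q(s', a')."""
    states = torch.as_tensor(batch.state, dtype=torch.float32)
    actions = torch.as_tensor(batch.action, dtype=torch.int64)
    rewards = torch.as_tensor(batch.reward, dtype=torch.float32)
    dones = torch.as_tensor(batch.done, dtype=torch.float32)
    next_states = torch.as_tensor(batch.next_state, dtype=torch.float32)

    # Q(s, a) for the actions that were actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bellman target, computed with the frozen target network and without gradients.
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * max_next_q

    return F.mse_loss(q_sa, target)
```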
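Finally, a sketch of the periodic target-network synchronization, reusing `dqn_loss` from the sketch above; the `sync_every` parameter and the bare-bones training loop (environment interaction omitted) are illustrative assumptions:

```python
import copy

import torch


def train(q_net, buffer, num_steps, batch_size=32, sync_every=1000, lr=1e-3):
    # Target network: same architecture, weights frozen between synchronizations.
    target_net = copy.deepcopy(q_net)
    target_net.eval()
    optimizer = torch.optim.SGD(q_net.parameters(), lr=lr)

    for step in range(num_steps):
        # Acting in the environment and pushing transitions into `buffer` is omitted here.
        if len(buffer) < batch_size:
            continue

        loss = dqn_loss(q_net, target_net, buffer.sample(batch_size))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Periodically copy the online network's weights into the target network.
        if step % sync_every == 0:
            target_net.load_state_dict(q_net.state_dict())
```

Keeping the target network fixed for `sync_every` steps gives each regression problem a stationary target, which is the stabilization described in the Target Network section.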