# GRU
The GRU unit controls the flow of information like the LSTM unit, but without a separate **_memory cell, so there is no cell state_**. It exposes the full hidden state at each time step, with no output gate controlling it.
The GRU has an update gate and a reset gate instead of the LSTM's input, forget, and output gates. The reset gate lets the unit throw away past information when it is irrelevant for the future, and the update gate controls how much of the past state should matter now.
A standard RNN computes the hidden state at the next time step directly:
$
h_t = \sigma \left( W^{(hh)}h_{t-1} + W^{(hx)}x_t \right)
$
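For contrast, here is a minimal NumPy sketch of this vanilla RNN step; the function and weight names (`rnn_step`, `W_hh`, `W_hx`) are assumptions chosen to mirror the symbols above, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    # Element-wise logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-x))

def rnn_step(x_t, h_prev, W_hh, W_hx):
    # h_t = sigma(W_hh h_{t-1} + W_hx x_t): the entire hidden state is
    # rewritten at every time step, with no gating at all.
    return sigmoid(W_hh @ h_prev + W_hx @ x_t)
```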
The GRU first computes an **update gate** from the current input word vector and the previous hidden state:
$
z_t = \sigma \left( W^{(z)}x_t + U^{(z)}h_{t-1} \right)
$
The **reset gate** is computed similarly but with different weights:
$
r_t = \sigma \left( W^{(r)}x_t + U^{(r)}h_{t-1} \right)
$
New **memory content**:
$
\hat{h}_t = \tanh \left( Wx_t + r_t \odot Uh_{t-1} \right)
$
If a reset gate unit is 0, the previous memory is ignored for that unit and only the new input information is stored.
The final memory at time step t combines the previous hidden state and the new memory content:
$
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \hat{h}_t
$
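Putting the four equations together, here is a minimal NumPy sketch of a single GRU step. The weight names mirror the symbols above; the shapes and the random initialisation in the usage example are assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    # Element-wise logistic sigmoid.
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
    """One GRU time step following the equations above.

    x_t    : input vector, shape (d_x,)
    h_prev : previous hidden state h_{t-1}, shape (d_h,)
    W*     : input-to-hidden weights, shape (d_h, d_x)
    U*     : hidden-to-hidden weights, shape (d_h, d_h)
    """
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev)             # update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)             # reset gate
    h_tilde = np.tanh(W @ x_t + r_t * (U @ h_prev))   # new memory content
    h_t = z_t * h_prev + (1.0 - z_t) * h_tilde        # final memory
    return h_t

# Example usage with assumed dimensions: 3-dim input, 4-dim hidden state.
rng = np.random.default_rng(0)
d_x, d_h = 3, 4
x_t = rng.standard_normal(d_x)
h_prev = np.zeros(d_h)
Wz, Wr, W = (rng.standard_normal((d_h, d_x)) for _ in range(3))
Uz, Ur, U = (rng.standard_normal((d_h, d_h)) for _ in range(3))
print(gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U))
```

Note that the element-wise product ⊙ is plain `*` on NumPy vectors, so every hidden unit gets its own gate value.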
![[GRU.png]]
The update gate controls how much of the past state should matter now. If z is close to 1 for a unit, the information in that unit is copied through many time steps, which mitigates the vanishing gradient problem.
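A small numeric sketch of that effect (the gate value 0.99, the 0.5 comparison, and the 100-step horizon are arbitrary choices for illustration): along the `z * h_{t-1}` path the old state shrinks only by a factor z per step, so a gate near 1 keeps both the state and its gradient alive.

```python
# h_t = z * h_{t-1} + (1 - z) * h_tilde: with z close to 1, most of the old
# state is carried forward, so the initial state's contribution decays slowly.
z = 0.99                 # update gate value, assumed constant for illustration
steps = 100
print(z ** steps)        # ~0.366: the initial state is still clearly present

# Compare with a carry factor of 0.5 per step, as in an ungated recurrence
# whose effective weight is well below 1:
print(0.5 ** steps)      # ~7.9e-31: effectively vanished
```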