# Document Representations

Document classification tasks:

- text categorization (e.g. by topic)
- sentiment analysis
- authorship attribution
- spam and phishing email filtering
- misinformation detection
- and many more

Idea: use sentence representations, built with a bidirectional LSTM, to construct a document representation.

How to get the sentence representation?

1. Use $h_{L}$, the final hidden state of the LSTM; for longer sentences, forgetting may be a problem.
2. Use the average of the LSTM hidden states at all time steps (mean-pooling).
3. Use max-pooling: take the maximum value in each vector component across all hidden states. This gives good results in some tasks, though it is not well understood why.
4. Use an attention mechanism, i.e. a weighted sum of the hidden states at all time steps (a code sketch of these pooling strategies follows at the end of the section):

$$\alpha_{t} = w_{\alpha} \cdot \mathrm{FFNN}_{\alpha}\left(h_{t}\right)$$

$$a_{t} = \frac{e^{\alpha_{t}}}{\sum_{k=1}^{L} e^{\alpha_{k}}}$$

$$h_{AT} = \sum_{t=1}^{L} a_{t} \cdot h_{t}$$

How to get the document representation?

1. Feed the whole document to an LSTM word by word, possibly with word-level attention to learn which words are useful. The problem, again, is the forgetting of RNNs over long sequences.
2. Build a hierarchical model:
	- First compute sentence representations, then combine the sentence representations into a document representation,
	- using another LSTM and/or attention over sentences,
	- trained with a document-level objective.

## Hierarchical attention networks

Yang et al. 2016. Hierarchical Attention Networks for Document Classification. NAACL.

1. Take pretrained word embeddings as input.
2. LSTM sentence encoder with word-level attention (to construct sentence representations).
3. LSTM document encoder with sentence-level attention (to construct document representations).
4. Trained with a document-level objective (sentiment analysis, text categorization).

![[hierarchical-attention-network.jpg]]
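To make the four pooling options concrete, here is a minimal PyTorch sketch (the choice of PyTorch, the hidden sizes, and the one-tanh-layer $\mathrm{FFNN}_{\alpha}$ are illustrative assumptions, not from the notes). The encoder runs a BiLSTM over already-looked-up word embeddings and pools its hidden states with the selected strategy; the `attention` branch implements the $\alpha_{t}$, $a_{t}$, $h_{AT}$ equations above.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """BiLSTM sentence encoder with a selectable pooling strategy."""

    def __init__(self, emb_dim: int, hidden_dim: int, pooling: str = "attention"):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.pooling = pooling
        # FFNN_alpha (assumed here to be a single tanh layer) and the scoring vector w_alpha
        self.ffnn = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim), nn.Tanh())
        self.w_alpha = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, L, emb_dim) -> h: (batch, L, 2 * hidden_dim)
        h, (h_n, _) = self.lstm(embeddings)
        if self.pooling == "final":
            # h_L: concatenate the final forward and backward hidden states
            return torch.cat([h_n[0], h_n[1]], dim=-1)
        if self.pooling == "mean":
            return h.mean(dim=1)           # mean-pooling over time steps
        if self.pooling == "max":
            return h.max(dim=1).values     # maximum per vector component
        alpha = self.w_alpha(self.ffnn(h))  # alpha_t = w_alpha . FFNN_alpha(h_t)
        a = torch.softmax(alpha, dim=1)     # a_t: softmax over the L time steps
        return (a * h).sum(dim=1)           # h_AT = sum_t a_t * h_t

# e.g. SentenceEncoder(100, 64)(torch.randn(8, 12, 100)) has shape (8, 128)
```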
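Continuing the sketch above, a hierarchical document classifier in the spirit of Yang et al. (2016) can be built on top of this encoder. The sketch assumes fixed-size, pre-embedded and pre-padded batches and omits masking for brevity, so it illustrates the architecture rather than reimplementing the paper.

```python
class HierarchicalClassifier(nn.Module):
    """Word-level attention per sentence, then a BiLSTM + attention over sentences."""

    def __init__(self, emb_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.sent_encoder = SentenceEncoder(emb_dim, hidden_dim, pooling="attention")
        self.doc_lstm = nn.LSTM(2 * hidden_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
        self.ffnn = nn.Sequential(nn.Linear(2 * hidden_dim, hidden_dim), nn.Tanh())
        self.w_alpha = nn.Linear(hidden_dim, 1, bias=False)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, documents: torch.Tensor) -> torch.Tensor:
        # documents: (batch, n_sents, n_words, emb_dim), already embedded and padded
        b, n_sents, n_words, emb_dim = documents.shape
        # encode every sentence independently with word-level attention
        sents = self.sent_encoder(documents.view(b * n_sents, n_words, emb_dim))
        h, _ = self.doc_lstm(sents.view(b, n_sents, -1))    # sentence-level BiLSTM
        a = torch.softmax(self.w_alpha(self.ffnn(h)), dim=1)
        doc = (a * h).sum(dim=1)       # sentence-level attention -> document vector
        return self.classifier(doc)    # logits for the document-level objective
```

Training minimizes a standard cross-entropy loss on these logits, i.e. the document-level objective (sentiment label or topic category) described above.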