# Document Representations
Document classification tasks:
- text categorization (e.g. by topic)
- sentiment analysis
- authorship attribution
- spam and phishing email filtering
- misinformation detection
- and many more
Idea: build sentence representations with a bidirectional LSTM, then combine them into a document representation.
How to get the sentence representation?
1. Use $h_{L}$, the final hidden state of the LSTM; for longer sentences, however, forgetting can be a problem.
2. Use the average of the LSTM hidden states at all time steps (mean-pooling).
3. Use max-pooling: take the maximum value in each vector component across all hidden states. This gives good results in some tasks, though it is not well understood why.
4. Use an attention mechanism, i.e. a weighted sum of the hidden states at all time steps (all four options are sketched in code below):
$\alpha_{t}=w_{\alpha} \cdot \mathrm{FFNN}_{\alpha}\left(h_{t}\right)$
$a_{t}=\dfrac{e^{\alpha_{t}}}{\sum_{k=1}^{L} e^{\alpha_{k}}}$
$h_{AT}=\sum_{t=1}^{L} a_{t} \cdot h_{t}$
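A minimal PyTorch sketch of the four pooling options above, assuming a bidirectional LSTM encoder. The class name `SentenceEncoder` and the shape of `FFNN_alpha` (one tanh layer) are illustrative assumptions, not taken from the source:
```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Encodes a sequence of word embeddings into one vector."""

    def __init__(self, emb_dim: int, hidden_dim: int, pooling: str = "attention"):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.pooling = pooling
        # FFNN_alpha followed by the dot product with w_alpha
        # (realized here as a bias-free linear layer to a scalar)
        self.ffnn_alpha = nn.Sequential(
            nn.Linear(2 * hidden_dim, 2 * hidden_dim), nn.Tanh())
        self.w_alpha = nn.Linear(2 * hidden_dim, 1, bias=False)

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, L, emb_dim); h: (batch, L, 2*hidden_dim)
        h, _ = self.lstm(embeddings)
        if self.pooling == "final":   # 1. final hidden state h_L
            return h[:, -1, :]
        if self.pooling == "mean":    # 2. mean-pooling over time steps
            return h.mean(dim=1)
        if self.pooling == "max":     # 3. max-pooling per vector component
            return h.max(dim=1).values
        # 4. attention: alpha_t = w_alpha . FFNN_alpha(h_t)
        alpha = self.w_alpha(self.ffnn_alpha(h))   # (batch, L, 1)
        a = torch.softmax(alpha, dim=1)            # weights a_t, sum to 1 over t
        return (a * h).sum(dim=1)                  # h_AT = sum_t a_t * h_t
```
Note that the same module works for option 1 under "document representation" below: feeding the whole document's word embeddings through it is exactly the single-LSTM-with-word-attention approach.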
How to get the document representation?
1. Feed the whole document to an LSTM word by word, possibly with word-level attention to learn which words are useful. The problem, again, is the forgetting of RNNs over long sequences.
2. Build a hierarchical model (see the sketch in the next section):
- First compute sentence representations, then combine them into a document representation.
- Use another LSTM and/or attention over the sentence representations.
- Train with a document-level objective.
## Hierarchical attention networks
Yang et al. 2016. Hierarchical Attention Networks for Document Classification. NAACL.
1. Take pretrained word embeddings as input
2. LSTM sentence encoder with word-level attention (to construct sentence representations)
3. LSTM document encoder with sentence-level attention (to construct document representations)
4. Trained with a document-level objective (e.g. sentiment analysis, text categorization).
![[hierarchical-attention-network.jpg]]
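A minimal sketch of how the two levels compose, reusing the `SentenceEncoder` from above: a word-level encoder with word attention produces sentence vectors, and a sentence-level encoder with sentence attention produces the document vector. Yang et al. actually use bidirectional GRU encoders; the LSTMs here follow these notes, and the classifier head and tensor shapes are illustrative assumptions:
```python
class HierarchicalAttentionNetwork(nn.Module):
    """Hierarchical sketch: word-level attention, then sentence-level attention."""

    def __init__(self, emb_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # LSTM sentence encoder with word-level attention
        self.sent_encoder = SentenceEncoder(emb_dim, hidden_dim,
                                            pooling="attention")
        # LSTM document encoder with sentence-level attention;
        # its input size is the sentence representation size (2*hidden_dim)
        self.doc_encoder = SentenceEncoder(2 * hidden_dim, hidden_dim,
                                           pooling="attention")
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, doc: torch.Tensor) -> torch.Tensor:
        # doc: (batch, n_sents, n_words, emb_dim) of pretrained word embeddings
        b, s, w, d = doc.shape
        sent_reps = self.sent_encoder(doc.reshape(b * s, w, d))  # (b*s, 2*hidden)
        sent_reps = sent_reps.reshape(b, s, -1)                  # (b, s, 2*hidden)
        doc_rep = self.doc_encoder(sent_reps)                    # (b, 2*hidden)
        # logits for the document-level objective (e.g. cross-entropy loss)
        return self.classifier(doc_rep)
```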