# Text Summarization
Task: generate a short version of a text that contains the most important information.
Single-document summarisation - given a single document, produce a short summary of it.
Multi-document summarisation (very difficult problem) - given a set of documents, produce a brief summary of their content. Information ordering and redundancy make it more challenging.
## Generic vs. Query-focused summarization
Generic summarisation:
- identifying important information in the document(s) and presenting it in a short summary
Query-focused summarisation:
- summarising the document in order to answer a specific query from a user
- sits at the boundary of [[Natural Language Processing]] and [[Information Retrieval]]
- Example: information/knowledge cards on search results
## Extractive Summarization
Extract important / relevant sentences from the document(s) and combine them into a summary.
Three main components:
- Content selection: identify important sentences to extract from the document
- Information ordering: order the sentences within the summary
- Sentence simplification: simplify the extracted sentences, e.g. by removing non-essential phrases
### Content Selection
#### Unsupervised approach
- Choose sentences that contain informative words
- Informativeness measured by:
- tf-idf: assign a weight to each word $w_i$ in document $j$ as
$$
\text{weight}(w_i) = tf_{ij} \cdot idf_i
$$
where $tf_{ij}$ is the frequency of word $w_i$ in document $j$ and $idf_i$ is the inverse document frequency,
$$
idf_i = \log \frac{N}{n_i}
$$
with $N$ the total number of documents and $n_i$ the number of documents containing $w_i$
- [[Mutual Information]]
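A minimal sketch of tf-idf-based content selection, assuming simple regex tokenisation; the function names and the average-weight sentence score are illustrative choices, not a standard API:

```python
import math
import re
from collections import Counter

def tfidf_weights(docs):
    """weight(w_i) = tf_ij * idf_i for every word in every document."""
    tokenized = [re.findall(r"\w+", d.lower()) for d in docs]
    n = len(docs)
    # n_i: number of documents containing word w_i
    df = Counter(w for toks in tokenized for w in set(toks))
    return [
        {w: tf * math.log(n / df[w]) for w, tf in Counter(toks).items()}
        for toks in tokenized
    ]

def select_sentences(doc_index, docs, k=2):
    """Pick the k sentences with the highest average word weight."""
    weights = tfidf_weights(docs)[doc_index]
    sentences = re.split(r"(?<=[.!?])\s+", docs[doc_index])
    def score(sent):
        toks = re.findall(r"\w+", sent.lower())
        return sum(weights.get(w, 0.0) for w in toks) / max(len(toks), 1)
    top = set(sorted(sentences, key=score, reverse=True)[:k])
    # keep the original document order (see Ordering below)
    return [s for s in sentences if s in top]
```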
#### Supervised approach
- start with a training set of documents and their summaries
- align sentences in summaries and documents
- extract features:
- position of the sentence (e.g. first sentence)
- sentence length
- informative words
- cue phrases etc.
- train a binary classifier: should the sentence be included in the summary?
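As a sketch, the binary classifier could be a logistic regression over such features; the toy training data, the feature set, and the cue phrase below are invented purely for illustration:

```python
from sklearn.linear_model import LogisticRegression

def sentence_features(sentence, position, informative_words):
    """Feature vector: position, length, informative-word count, cue phrase."""
    toks = sentence.lower().split()
    return [
        1.0 if position == 0 else 0.0,                      # first sentence?
        float(len(toks)),                                   # sentence length
        float(sum(t in informative_words for t in toks)),   # informative words
        1.0 if "in conclusion" in sentence.lower() else 0.0,  # cue phrase
    ]

# toy training data: (sentence, position in doc, included-in-summary label),
# where labels come from aligning sentences with the human summaries
train = [
    ("The study finds X.", 0, 1),
    ("Details follow below.", 1, 0),
    ("In conclusion, X holds.", 5, 1),
    ("See the appendix.", 6, 0),
]
vocab = {"study", "finds", "conclusion"}
X = [sentence_features(s, i, vocab) for s, i, _ in train]
y = [label for _, _, label in train]

clf = LogisticRegression().fit(X, y)
print(clf.predict([sentence_features("The study finds Y.", 0, vocab)]))
```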
Problems with the supervised approach:
- difficult to obtain data
- difficult to align human-produced summaries with sentences in the doc
- doesn't perform better than unsupervised in practice
### Ordering
For single-document summarisation, ordering is straightforward: simply follow the order of the sentences in the original document.
An example summary:
![[example-summary-nlp.jpg]]
Problems:
- the person's name is repeated in full in each sentence, making the sentences longer than necessary; pronouns would read more naturally
Solutions: topical ordering, coherence-based ordering, or ordering based on semantic similarity.
## Abstractive Summarization
- interpret the content of the document (semantics, discourse etc.) and generate the summary
- formulate the summary using other words than in the document
- very hard to do!
Task: given a short article, generate a headline
Training data: e.g. Gigaword (10m articles), CNN dataset
![[abstractive-summarization-exs.jpg]]
Use [[Seq2Seq]] models:
- Encoder RNN: produces a fixed-size vector representation of the input document
- Decoder RNN: generates the output summary word-by-word based on the input representation
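A minimal PyTorch sketch of such an encoder-decoder (plain GRUs rather than the attentive model of the paper below; all names and sizes are illustrative):

```python
import torch
import torch.nn as nn

class Seq2SeqSummarizer(nn.Module):
    """Minimal encoder-decoder sketch (no attention)."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src, tgt):
        # encoder: compress the input document into a fixed-size state
        _, state = self.encoder(self.embed(src))
        # decoder: generate the summary word-by-word from that state
        dec_out, _ = self.decoder(self.embed(tgt), state)
        return self.out(dec_out)  # per-step logits over the vocabulary

model = Seq2SeqSummarizer(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (1, 40)),  # document tokens
               torch.randint(0, 10_000, (1, 8)))   # summary tokens (teacher forcing)
print(logits.shape)  # torch.Size([1, 8, 10000])
```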
Chopra et al., 2016. Abstractive Sentence Summarization with Attentive Recurrent Neural Networks:
Input: economic growth in toronto will suffer this year because of sars, a think tank said friday as health authorities insisted the illness was under control in canada's largest city.
Summary: think tank says economic growth in toronto will suffer this year
Input: an international terror suspect who had been under a controversial loose form of house arrest is on the run, british home secretary john reid said tuesday.
Summary: international terror suspect under house arrest.
## Evaluating summarisation systems
1. Evaluate against human judgements
- "Is this a good summary?"
- Use multiple subjects, measure agreement
- The best way, but expensive
2. ROUGE (Recall-Oriented Understudy for Gisting Evaluation). For each document in the dataset:
- humans produce a set of reference summaries $R_{1}, \ldots, R_{N}$
- the system generates a summary S
- compute the percentage of n-grams from the reference summaries that occur in S
- e.g. ROUGE-2 uses bigrams: compute the percentage of bigrams from the reference summaries $R_{1}, \ldots, R_{N}$ that occur in $S$
$$
\text{ROUGE-2} = \frac{\sum_{R_i} \sum_{\text{bigram}_j \in R_i} \text{count}_{\text{match}}(j, S)}{\sum_{R_i} \sum_{\text{bigram}_j \in R_i} \text{count}(j, R_i)}
$$
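A small sketch of this computation, assuming whitespace tokenisation and clipped bigram matches (a bigram counts at most as often as it appears in $S$):

```python
from collections import Counter

def bigrams(text):
    toks = text.lower().split()
    return Counter(zip(toks, toks[1:]))

def rouge_2(references, system_summary):
    """Recall of reference bigrams that also occur in the system summary."""
    sys_counts = bigrams(system_summary)
    matched = total = 0
    for ref in references:
        ref_counts = bigrams(ref)
        total += sum(ref_counts.values())
        # clipped matches: count_match(j, S) <= count of bigram j in S
        matched += sum(min(c, sys_counts[bg]) for bg, c in ref_counts.items())
    return matched / total if total else 0.0

print(rouge_2(["the cat sat on the mat"], "the cat lay on the mat"))  # 0.6
```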
Dong, 2018. A Survey on Neural Network-Based Summarization Methods:
- Extractive summarisation: highest ROUGE-2 = 0.27
- Abstractive summarisation: highest ROUGE-2 = 0.17
The tasks and datasets differ, though, so the two figures are not directly comparable.