# Text Summarization

Task: generate a short version of a text that contains the most important information.

- Single-document summarisation: given a single document, produce its short summary.
- Multi-document summarisation (a much harder problem): given a set of documents, produce a brief summary of their content. Information ordering and redundancy make it more challenging.

### Generic vs. Query-focused summarization

Generic summarisation:
- identify the important information in the document(s) and present it in a short summary

Query-focused summarisation:
- summarise the document in order to answer a specific query from a user
- sits on the boundary between [[Natural Language Processing]] and [[Information Retrieval]]
- example: information/knowledge cards on search results pages

## Extractive Summarization

Extract important / relevant sentences from the document(s) and combine them into a summary.

Three main components:
- Content selection: identify important sentences to extract from the document
- Information ordering: order the sentences within the summary
- Sentence simplification

### Content Selection

#### Unsupervised approach

- Choose sentences that contain informative words
- Informativeness can be measured by tf-idf or [[Mutual Information]]

tf-idf assigns a weight to each word $w_i$ in document $j$ as

$$\text{weight}(w_i) = tf_{ij} \cdot idf_i$$

where
- $tf_{ij}$ is the frequency of word $w_i$ in document $j$
- $idf_i = \log \frac{N}{n_i}$ is the inverse document frequency, with $N$ the total number of documents and $n_i$ the number of documents containing $w_i$
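To make the unsupervised approach concrete, here is a minimal Python sketch, not taken from any particular published system: the function names and the choice of averaging tf-idf weights over a sentence are illustrative assumptions. It scores each sentence by the average tf-idf weight of its words and extracts the top-$k$ sentences as the summary.

```python
import math
from collections import Counter

def tf_idf_weights(docs):
    """Compute weight(w_i) = tf_ij * idf_i for every word of every tokenised document."""
    n = len(docs)
    # n_i: number of documents containing word w_i
    doc_freq = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # tf_ij: frequency of w_i in document j
        weights.append({w: tf[w] * math.log(n / doc_freq[w]) for w in tf})
    return weights

def extract_summary(sentences, weights, k=2):
    """Content selection: keep the k sentences with the highest average tf-idf weight."""
    def score(sent):
        return sum(weights.get(w, 0.0) for w in sent) / max(len(sent), 1)
    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    # Information ordering (single-document case): restore original document order
    return [sentences[i] for i in sorted(top)]
```

Here a sentence is a list of tokens, and `weights` is the entry of `tf_idf_weights` for the document being summarised. Note that a word occurring in every document gets $idf = \log(N/N) = 0$ and so contributes nothing, which is exactly the intended behaviour for uninformative common words.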
#### Supervised approach

- start with a training set of documents and their summaries
- align sentences in the summaries with sentences in the documents
- extract features:
    - position of the sentence (e.g. first sentence)
    - sentence length
    - informative words
    - cue phrases, etc.
- train a binary classifier: should the sentence be included in the summary?

In practice the supervised approach does not work as well as one might expect:
- it is difficult to obtain training data
- it is difficult to align human-produced summaries with sentences in the document
- it does not perform better than the unsupervised approach

### Ordering

For single-document summarisation this is very straightforward: simply follow the order of the sentences in the original document.

An example summary:

![[example-summary-nlp.jpg]]

Problems:
- the name is repeated where pronouns could be used; the sentences are long

Solutions:
- topical ordering, coherence-based ordering, or ordering based on semantic similarity

## Abstractive Summarization

- interpret the content of the document (semantics, discourse, etc.) and generate the summary
- formulate the summary using words other than those in the document
- very hard to do!

Task: given a short article, generate a headline.

Training data: e.g. Gigaword (10M articles), the CNN dataset.

![[abstractive-summarization-exs.jpg]]

Use [[Seq2Seq]] models:
- Encoder RNN: produces a fixed-size vector representation of the input document
- Decoder RNN: generates the output summary word by word based on the input representation

Chopra et al. 2016, Abstractive Sentence Summarization with Attentive Recurrent Neural Networks:

- Input: economic growth in toronto will suffer this year because of sars, a think tank said friday as health authorities insisted the illness was under control in canada's largest city.
- Summary: think tank says economic growth in toronto will suffer this year

- Input: an international terror suspect who had been under a controversial loose form of house arrest is on the run, british home secretary john reid said tuesday.
- Summary: international terror suspect under house arrest.

## Evaluating summarisation systems

1. Evaluate against human judgements
    - "Is this a good summary?"
    - use multiple subjects and measure agreement
    - the best way, but expensive
2. ROUGE (Recall-Oriented Understudy for Gisting Evaluation)

For each document in the dataset:
- humans produce a set of reference summaries $R_1, \ldots, R_N$
- the system generates a summary $S$
- compute the percentage of n-grams from the reference summaries that occur in $S$

For example, ROUGE-2 uses bigrams: compute the percentage of bigrams from the reference summaries $R_1, \ldots, R_N$ that occur in $S$ (a code sketch is given at the end of these notes):

$$\text{ROUGE-2} = \frac{\sum_{R_i} \sum_{\text{bigram}_j \in R_i} \text{count}_{\text{match}}(j, S)}{\sum_{R_i} \sum_{\text{bigram}_j \in R_i} \text{count}(j, R_i)}$$

Dong, 2018, A Survey on Neural Network-Based Summarization Methods:
- extractive summarisation: highest ROUGE-2 = 0.27
- abstractive summarisation: highest ROUGE-2 = 0.17

The tasks and datasets differ, though, so the two numbers are not directly comparable.
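The code sketch referenced in the ROUGE section above: a minimal Python implementation of the ROUGE-2 formula. It assumes pre-tokenised input and clips matched counts at the system summary's bigram counts; stemming and the other implementation details of the official ROUGE toolkit are omitted.

```python
from collections import Counter

def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))

def rouge_2(references, system):
    """ROUGE-2: fraction of reference bigrams that also occur in the system summary S."""
    sys_bigrams = bigrams(system)
    matched = total = 0
    for ref in references:  # pool over all reference summaries R_1..R_N
        for bg, count in bigrams(ref).items():
            total += count                          # count(j, R_i)
            matched += min(count, sys_bigrams[bg])  # count_match(j, S), clipped
    return matched / total if total else 0.0

# Toy example with one reference summary (whitespace tokenisation assumed):
ref = "think tank says growth will suffer this year".split()
system = "growth will suffer this year says think tank".split()
print(round(rouge_2([ref], system), 3))  # 5 of 7 reference bigrams match -> 0.714
```

Because the denominator counts reference bigrams, the measure is recall-oriented, as the name says: a longer system summary can only raise the score, which is why ROUGE is typically reported at a fixed summary length.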