# Query-focused multi-document summarization

Example query: "Describe the coal mine accidents in China and actions taken"

Steps in summarization:

1. find a set of relevant documents
2. simplify sentences
3. identify informative sentences in the documents
4. order the sentences into a summary
5. modify the sentences as needed

## Sentence simplification

- parse sentences
- hand-code rules to decide which modifiers to prune:
    - appositives: e.g. Also on display was a painting by Sandor Landeau, ~~an artist who was living in Paris at the time~~.
    - attribution clauses: e.g. Eating too much bacon can lead to cancer, ~~the WHO reported on Monday~~.
    - PPs without proper names: e.g. Electoral support for Plaid Cymru increased ~~to a new level~~.
    - initial adverbials: e.g. ~~For example,~~ ~~On the other hand,~~
- it is also possible to train a classifier instead (e.g. satellite identification and removal)

## Content selection from multiple documents

Select informative and non-redundant sentences:

- Estimate the informativeness of each sentence (based on the informative words it contains)
- Start with the most informative sentence:
    - identify informative words based on e.g. tf-idf
    - words in the query are also considered informative
- Add further sentences to the summary based on maximal marginal relevance (MMR)

Maximal marginal relevance (MMR): an iterative method for choosing the best sentence to add to the summary so far, trading off:

- Relevance to the query: high cosine similarity between the sentence and the query
- Novelty with respect to the summary so far: low cosine similarity with the sentences already in the summary

$$
\hat{s}=\underset{s_{i} \in D}{\operatorname{argmax}}\left[\lambda \operatorname{sim}\left(s_{i}, Q\right)-(1-\lambda) \max _{s_{j} \in S} \operatorname{sim}\left(s_{i}, s_{j}\right)\right]
$$

where $D$ is the set of candidate sentences not yet in the summary, $Q$ is the query, $S$ is the summary so far, and $\lambda \in [0, 1]$ weights relevance against novelty.

Stop when the summary has reached the desired length.

## Sentence ordering in the summary

- Chronologically: e.g. by date of the document
- Coherence:
    - order based on sentence similarity (sentences next to each other should be similar, e.g. by cosine)
    - order so that sentences next to each other discuss the same entity / referent
- Topical ordering: learn a set of topics present in the documents, e.g. using topic modelling, and then order the sentences by topic.

An example summary:

![[summarization-multidocs.jpg]]

Problems: poor coherence, a sentence ordering that seems a little off, etc.
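The hand-coded pruning rules under *Sentence simplification* can be approximated very roughly with surface patterns. This is a minimal sketch only: the notes apply rules to a parse, whereas here the lists `INITIAL_ADVERBIALS` and `ATTRIBUTION_VERBS` are hypothetical, hand-picked examples and only two of the four rule types (initial adverbials and sentence-final attribution clauses) are covered.

```python
import re

# Hypothetical hand-picked lists; a real system would match on a parse tree.
INITIAL_ADVERBIALS = ("for example", "on the other hand", "however", "in addition")
ATTRIBUTION_VERBS = ("reported", "said", "announced", "stated", "claimed")

# A final ", ... <attribution verb> ..." clause that runs up to the full stop.
ATTRIBUTION_RE = re.compile(
    r",\s+[^,]*\b(?:" + "|".join(ATTRIBUTION_VERBS) + r")\b[^,]*\.$"
)


def prune_initial_adverbial(sentence):
    """Drop a listed adverbial (followed by a comma) at the start of the sentence."""
    lowered = sentence.lower()
    for adv in INITIAL_ADVERBIALS:
        if lowered.startswith(adv + ","):
            rest = sentence[len(adv) + 1:].lstrip()
            # Re-capitalize the new first word.
            return rest[:1].upper() + rest[1:]
    return sentence


def prune_attribution(sentence):
    """Drop a sentence-final attribution clause such as ', the WHO reported on Monday.'"""
    match = ATTRIBUTION_RE.search(sentence)
    if match:
        return sentence[: match.start()] + "."
    return sentence


def simplify(sentence):
    return prune_attribution(prune_initial_adverbial(sentence))


print(simplify("Eating too much bacon can lead to cancer, the WHO reported on Monday."))
```

Surface patterns like these will misfire on sentences where a listed verb appears in a non-attribution clause, which is one reason the notes parse first (or train a classifier).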
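The informativeness estimate under *Content selection* can be sketched as follows, under several assumptions not stated in the notes: term frequency is counted over the input sentences, IDF comes from a separate background collection and is smoothed as $\log\frac{1+N}{1+df} + 1$, the informative words are the `top_k` words by tf-idf plus every query word, and a sentence's score is the fraction of its tokens that are informative. The names `sentence_scores`, `background_docs` and `top_k` are hypothetical, and a real system would also remove stop words.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphabetic tokens only; no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())


def sentence_scores(sentences, background_docs, query, top_k=10):
    """Score each sentence by the fraction of its tokens that are informative."""
    # Term frequency over the sentences to be summarized.
    tf = Counter(w for s in sentences for w in tokenize(s))
    # Document frequency over a (hypothetical) background collection.
    n = len(background_docs)
    df = Counter(w for d in background_docs for w in set(tokenize(d)))
    # Smoothed tf-idf weight for every word in the input sentences.
    weight = {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in tf.items()}
    ranked = sorted(weight, key=weight.get, reverse=True)
    # Informative words: the top-k by tf-idf, plus every query word.
    informative = set(ranked[:top_k]) | set(tokenize(query))
    scores = []
    for s in sentences:
        tokens = tokenize(s)
        hits = sum(1 for t in tokens if t in informative)
        scores.append(hits / len(tokens) if tokens else 0.0)
    return scores


sentences = ["Coal mine accidents killed miners in China.", "The weather was mild."]
background = ["The weather was mild yesterday.", "The market rose."]
print(sentence_scores(sentences, background, "coal mine accidents China", top_k=3))
```

The highest-scoring sentence would be the first one placed in the summary; the rest are then added by MMR.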
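The MMR step from *Content selection* can be sketched as a greedy loop that applies the argmax formula once per added sentence. Assumptions for illustration: sentences and the query are bag-of-words count vectors compared with cosine similarity, `lam` plays the role of $\lambda$, and the "desired length" is a maximum number of sentences (`max_sentences`). When the summary is still empty, the redundancy term is taken as 0, so the first pick is the sentence most similar to the query; seeding `selected` with the most informative sentence instead would match the notes more closely. All names are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    # Bag-of-words count vector over lowercase alphabetic tokens.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, max_sentences, lam=0.7):
    """Greedily pick sentences maximizing lam*relevance - (1-lam)*redundancy."""
    q = bow(query)
    vecs = [bow(s) for s in sentences]
    selected = []                              # S: indices already in the summary
    candidates = list(range(len(sentences)))   # D: indices not yet selected
    while candidates and len(selected) < max_sentences:
        def mmr_score(i):
            relevance = cosine(vecs[i], q)
            # Max similarity to any summary sentence; 0 for an empty summary.
            redundancy = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


docs = [
    "Coal mine accident in China kills miners.",
    "Coal mine accident in China kills miners.",
    "Officials ordered safety inspections of coal mines.",
]
print(mmr_select(docs, "coal mine accident China safety", max_sentences=2))
```

Here the duplicate second sentence is skipped in favor of the less relevant but novel third one; raising `lam` towards 1 shifts the balance back towards pure query relevance.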
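The similarity-based ordering from the *Coherence* bullet can be sketched as a greedy chain: start from a given sentence (for example the earliest one chronologically) and repeatedly append the remaining sentence most similar to the last one placed. The greedy strategy, the bag-of-words cosine and the function names are assumptions for illustration; this heuristic does not guarantee the ordering that maximizes total adjacent similarity.

```python
import math
import re
from collections import Counter


def bow(text):
    # Bag-of-words count vector over lowercase alphabetic tokens.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def order_by_similarity(sentences, start=0):
    """Greedily chain sentences so that neighbours are as similar as possible."""
    vecs = [bow(s) for s in sentences]
    order = [start]
    remaining = set(range(len(sentences))) - {start}
    while remaining:
        last = order[-1]
        # Sorting makes ties resolve to the lowest index, for determinism.
        nxt = max(sorted(remaining), key=lambda i: cosine(vecs[last], vecs[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return [sentences[i] for i in order]


summary = [
    "A coal mine flooded in Shanxi.",
    "The government announced new safety rules.",
    "Rescuers pumped water out of the flooded coal mine.",
    "The new safety rules take effect next year.",
]
for sentence in order_by_similarity(summary):
    print(sentence)
```

Entity-based ordering could reuse the same loop with a similarity that counts shared entities instead of shared words.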