# Query-focused multi-document summarization
Example query: "Describe the coal mine accidents in China and actions taken"
Steps in summarization:
1. find a set of relevant documents
2. simplify sentences
3. identify informative sentences in the documents
4. order the sentences into a summary
5. modify the sentences as needed
## Sentence simplification
- parse sentences
- hand-code rules to decide which modifiers to prune:
  - appositives: e.g. Also on display was a painting by Sandor Landeau, ~~an artist who was living in Paris at the time~~.
  - attribution clauses: e.g. Eating too much bacon can lead to cancer, ~~the WHO reported on Monday~~.
  - PPs without proper names: e.g. Electoral support for Plaid Cymru increased ~~to a new level~~.
  - initial adverbials: e.g. ~~For example~~, ~~On the other hand~~,
- also possible to develop a classifier (e.g. satellite identification and removal)
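
A minimal sketch of two of the pruning rules above (initial adverbials and attribution clauses). The notes assume a parser; this sketch swaps it for surface patterns, which is only an approximation. The adverbial list, the verb list and the function name `simplify` are hypothetical, chosen for illustration.

```python
import re

# Hypothetical list of sentence-initial adverbials to strip.
INITIAL_ADVERBIALS = ("For example", "On the other hand")

# Hypothetical pattern for a trailing attribution clause such as
# ", the WHO reported on Monday." (a parser-based rule would be more robust).
ATTRIBUTION = re.compile(r",\s+[^,]*\b(?:said|reported|announced)\b[^,]*\.$")


def simplify(sentence: str) -> str:
    """Apply two surface-level pruning rules to a single sentence."""
    s = sentence.strip()
    # Rule 1: drop an initial adverbial that is followed by a comma.
    for adverbial in INITIAL_ADVERBIALS:
        prefix = adverbial + ","
        if s.startswith(prefix):
            s = s[len(prefix):].lstrip()
            s = s[:1].upper() + s[1:]  # re-capitalise the new first word
            break
    # Rule 2: drop a trailing attribution clause, keeping the final period.
    return ATTRIBUTION.sub(".", s)


print(simplify("Eating too much bacon can lead to cancer, the WHO reported on Monday."))
print(simplify("For example, the mine was closed."))
```
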
## Content selection from multiple documents
Select informative and non-redundant sentences:
- Estimate informativeness of each sentence (based on informative words):
  - identify informative words based on e.g. tf-idf
  - words in the query are also considered informative
- Start with the most informative sentence
- Add sentences to the summary based on maximal marginal relevance (MMR)
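
One way to sketch the informativeness estimate in plain Python. Several things here are assumptions rather than part of the notes: idf is computed from a separate background collection, a word counts as informative when its tf-idf weight exceeds a hand-picked `threshold`, and a sentence is scored by the fraction of its words that are informative. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenisation with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_weights(cluster_sentences, background_docs):
    """tf from the document cluster, smoothed idf from a background collection."""
    tf = Counter(w for s in cluster_sentences for w in tokenize(s))
    n = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(tokenize(doc)))
    # A word that occurs in every background document gets weight 0.
    return {w: count * math.log((n + 1) / (df[w] + 1)) for w, count in tf.items()}


def informativeness(sentence, weights, query, threshold=1.0):
    """Fraction of words that are query words or have tf-idf above the threshold."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    query_words = set(tokenize(query))
    informative = [w for w in words
                   if w in query_words or weights.get(w, 0.0) > threshold]
    return len(informative) / len(words)


cluster = ["The coal mine exploded.", "The weather was fine."]
background = ["the weather is nice", "the cat sat", "the dog ran"]
weights = tfidf_weights(cluster, background)
for sentence in cluster:
    print(sentence, informativeness(sentence, weights, "coal mine accidents"))
```
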
Maximal marginal relevance (MMR): iterative method to choose the best sentence to add to the summary so far
- Relevance to the query: high cosine similarity between the sentence and the query
- Novelty wrt the summary so far: low cosine similarity with the summary sentences
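$$
\hat{s}=\underset{s_{i} \in D}{\operatorname{argmax}}\left[\lambda \operatorname{sim}\left(s_{i}, Q\right)-(1-\lambda) \max _{s_{j} \in S} \operatorname{sim}\left(s_{i}, s_{j}\right)\right]
$$
where $D$ is the set of candidate sentences, $Q$ is the query, $S$ is the summary so far, and $\lambda \in [0, 1]$ trades off relevance against novelty.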
Stop when the summary has reached the desired length
## Sentence ordering in the summary
- Chronologically: e.g. by date of the document
- Coherence:
  - order based on sentence similarity (sentences next to each other should be similar, e.g. by cosine)
  - order so that the sentences next to each other discuss the same entity / referent
- Topical ordering: learn a set of topics present in the documents, e.g. using topic modelling, and then order sentences by topic.
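
A rough sketch of two of the ordering strategies above: chronological ordering by document date, and a greedy similarity chain for coherence. The greedy chain (start from the first sentence, then repeatedly append the remaining sentence most similar to the last one placed) is just one simple heuristic assumed here, and the function names are hypothetical.

```python
import math
from collections import Counter
from datetime import date


def bow(text):
    """Bag-of-words vector: lowercase tokens with surrounding punctuation stripped."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def order_chronologically(dated_sentences):
    """dated_sentences: list of (document_date, sentence) pairs."""
    return [s for _, s in sorted(dated_sentences, key=lambda pair: pair[0])]


def order_by_similarity(sentences):
    """Greedy chain: each next sentence is the one most similar to the previous."""
    if not sentences:
        return []
    vecs = [bow(s) for s in sentences]
    order, remaining = [0], set(range(1, len(sentences)))
    while remaining:
        last = order[-1]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda i: cosine(vecs[last], vecs[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return [sentences[i] for i in order]


selected = [
    "The mine exploded on Monday",
    "Officials promised new safety rules",
    "The mine was closed after safety checks",
]
print(order_by_similarity(selected))
print(order_chronologically([(date(2005, 2, 15), "B"), (date(2004, 11, 28), "A")]))
```
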
An example summary:
![[summarization-multidocs.jpg]]
Problems: coherence is poor, and the sentence ordering seems slightly off.
---
## References