# Discourse Processing
Modelling large text fragments (beyond individual sentences) is called discourse processing.
## Discourse structure
Most types of document are highly structured, implicitly or explicitly:
- Scientific papers: conventional structure (with differences between disciplines).
- News stories: the first sentence is a summary.
- Blogs, etc.: topics within documents, relationships between sentences.
Ambiguity can arise in the rhetorical relation between sentences. For example, "Max fell. John pushed him." can be interpreted as:
- "Max fell because John pushed him." (EXPLANATION)
- "Max fell and then John pushed him." (NARRATION)

The implicit relationship is called a discourse relation or rhetorical relation; *because* and *and then* are examples of cue phrases.
Analysis of text with rhetorical relations generally gives a binary branching structure:
- nucleus (the main phrase) and satellite (the subsidiary phrase): e.g. EXPLANATION, JUSTIFICATION
- or two phrases of equal weight: e.g. NARRATION
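The binary-branching structure above can be sketched as a small tree data type. This is a minimal illustration, not any standard library's representation; the class and field names are assumptions made here.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Segment:
    """A leaf: an elementary discourse unit (a clause or sentence)."""
    text: str

@dataclass
class Relation:
    """An internal node joining two subtrees with a rhetorical relation.

    For nucleus-satellite relations (e.g. EXPLANATION), `left` is the
    nucleus and `right` the satellite; for equal-weight (multinuclear)
    relations such as NARRATION, both children have the same status.
    """
    label: str
    left: Union[Segment, "Relation"]
    right: Union[Segment, "Relation"]
    multinuclear: bool = False

# "Max fell because John pushed him." -> EXPLANATION,
# with "Max fell." as the nucleus and the cause as the satellite.
tree = Relation("EXPLANATION", Segment("Max fell."), Segment("John pushed him."))
```

Larger discourses are built the same way, with `Relation` nodes nested inside other `Relation` nodes.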
## Coherence
Discourses have to be connected to be coherent:
- "Kim got into her car. Sandy likes apples."
Can be OK in context:
- "Kim got into her car. Sandy likes apples, so Kim thought she'd go to the farm shop and see if she could get some."
Discourse coherence assumptions can affect interpretation:
"J"ohn likes Bill. He gave him an expensive Christmas present."
If EXPLANATION - 'he' is probably Bill.
If JUSTIFICATION (supplying evidence for another sentence), 'he' is John.
## Factors influencing discourse interpretation
1. Cue phrases (e.g. because, and)
2. Punctuation (also prosody) and text structure, e.g. "Max fell, John pushed him and Kim laughed."
3. Real world content: "Max fell. John pushed him as he lay on the ground."
4. Tense and aspect: "Max fell. John had pushed him." vs. "Max was falling. John pushed him."
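Factor 1 can be illustrated with a toy heuristic: scan a clause for an explicit cue phrase and map it to a candidate relation. The cue list and relation labels below are a small illustrative assumption, not an exhaustive inventory.

```python
from typing import Optional

# Illustrative cue-phrase inventory mapping connectives to candidate relations.
CUE_PHRASES = {
    "because": "EXPLANATION",
    "and then": "NARRATION",
    "so": "RESULT",
}

def guess_relation(clause: str) -> Optional[str]:
    """Return a candidate relation if the clause opens with a known cue."""
    lowered = clause.lower().strip()
    for cue, relation in CUE_PHRASES.items():
        if lowered.startswith(cue + " "):
            return relation
    # No overt cue: the relation must be inferred from content, tense, etc.
    return None

print(guess_relation("because John pushed him"))  # EXPLANATION
```

When `guess_relation` returns `None` (as for "Max fell. John pushed him."), the ambiguity discussed above arises and the other factors must be brought to bear.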
## Discourse parsing
Discourse parsing is the process of identifying discourse structure and relations. It is a hard problem; much research has focused on labelling relations between pairs of sentences or clauses.
1. Classification with hand-engineered features, e.g. punctuation, cue phrases, syntactic and lexical features
2. Neural models
- take two sentences as input
- train a sentence encoder
- objective: predict the relation
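Approach 1 can be sketched as a feature extractor for a sentence pair; a real system would feed such features into a trained classifier over labelled pairs. The feature names and the particular feature set here are illustrative assumptions, not a published feature inventory.

```python
# Toy hand-engineered features for classifying the relation between a
# sentence pair (s1, s2): cue phrases, punctuation/segmentation signals,
# lexical overlap, and a crude tense/aspect cue.

def pair_features(s1: str, s2: str) -> dict:
    words2 = s2.lower().replace(",", " ").replace(".", " ").split()
    words1 = set(s1.lower().replace(",", " ").replace(".", " ").split())
    return {
        "cue_because": "because" in words2,
        "cue_then": "then" in words2,
        "s2_starts_lower": s2[:1].islower(),      # segmentation/punctuation cue
        "lexical_overlap": len(words1 & set(words2)),
        "s2_past_perfect": "had" in words2,       # crude tense/aspect signal
    }

# "Max fell. John had pushed him." -- the past perfect suggests EXPLANATION.
feats = pair_features("Max fell.", "John had pushed him.")
```

A neural model replaces this hand-built dictionary with learned sentence encodings, trained end-to-end on the relation-prediction objective.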
---
## References
1. Chapter 22: Coreference resolution in Jurafsky and Martin (3rd edition).
2. Chapter 23: Discourse coherence in Jurafsky and Martin (3rd edition).