# Discourse Processing

Modelling text fragments larger than a single sentence is called discourse processing.

## Discourse structure

Most types of document are highly structured, implicitly or explicitly:

- Scientific papers: conventional structure (with differences between disciplines).
- News stories: the first sentence is a summary.
- Blogs, etc.

Discourse structure concerns both topics within documents and relationships between sentences.

## Rhetorical relations

Ambiguity can arise from rhetorical relation ambiguity. For example, "Max fell. John pushed him." can be interpreted as:

- "Max fell because John pushed him." (EXPLANATION)
- "Max fell and then John pushed him." (NARRATION)

The implicit relationship between the sentences is called a discourse relation or rhetorical relation; *because* and *and then* are examples of cue phrases.

Analysis of text with rhetorical relations generally gives a binary branching structure:

- nucleus (the main phrase) and satellite (the subsidiary phrase): e.g., EXPLANATION, JUSTIFICATION
- equal weight: e.g., NARRATION

## Coherence

Discourses have to have connectivity to be coherent:

- "Kim got into her car. Sandy likes apples."

This can be OK in context:

- "Kim got into her car. Sandy likes apples, so Kim thought she'd go to the farm shop and see if she could get some."

Discourse coherence assumptions can affect interpretation:

"John likes Bill. He gave him an expensive Christmas present."

- If EXPLANATION, 'he' is probably Bill.
- If JUSTIFICATION (supplying evidence for the other sentence), 'he' is John.

## Factors influencing discourse interpretation

1. Cue phrases (e.g., because, and).
2. Punctuation (also prosody) and text structure. Example: "Max fell, John pushed him and Kim laughed."
3. Real-world content: "Max fell. John pushed him as he lay on the ground."
4. Tense and aspect: "Max fell. John had pushed him." vs. "Max was falling. John pushed him."

## Discourse parsing

Discourse parsing is the process of identifying discourse structure and relations.
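The cue phrases discussed above suggest a trivial baseline: look for an explicit cue at the start of the second clause and map it to a relation. The sketch below is a hypothetical illustration (the cue-phrase lexicon and the NARRATION default are assumptions, not a real resource):

```python
# Toy cue-phrase baseline for labelling the rhetorical relation between
# two clauses. The lexicon here is illustrative, not exhaustive.
CUE_PHRASES = {
    "because": "EXPLANATION",
    "and then": "NARRATION",
    "so": "RESULT",
    "but": "CONTRAST",
}

def label_relation(clause1, clause2, default="NARRATION"):
    """Guess the relation from a cue phrase at the start of the second
    clause; fall back to a default (NARRATION here, on the assumption
    that juxtaposed event sentences often read as a narrative sequence)."""
    text = clause2.lower()
    # Check longer cues first so "and then" wins over a hypothetical "and".
    for cue in sorted(CUE_PHRASES, key=len, reverse=True):
        if text.startswith(cue + " "):
            return CUE_PHRASES[cue]
    return default

print(label_relation("Max fell", "because John pushed him"))   # EXPLANATION
print(label_relation("Max fell", "and then John pushed him"))  # NARRATION
print(label_relation("Max fell", "John pushed him"))           # NARRATION (default)
```

Such a baseline fails exactly where the relation is implicit (no cue phrase present), which is what makes discourse parsing hard.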
It is a hard problem; much research has focused on labelling relations between pairs of sentences or clauses:

1. Classification with hand-engineered features, e.g. punctuation, cue phrases, and syntactic and lexical features.
2. Neural models:
   - take two sentences as input
   - train a sentence encoder
   - objective: predict the relation

---

## References

1. Chapter 22: Coreference Resolution, in Jurafsky and Martin (3rd edition).
2. Chapter 23: Discourse Coherence, in Jurafsky and Martin (3rd edition).