# Language regeneration

## Generation from what?

Starting points:

- Some semantic representation:
  - logical form (early work)
  - distributional representations (e.g. paraphrasing)
  - hidden states of a neural network
- Formally-defined data: databases, knowledge bases
- Numerical data: e.g., weather reports

## Regeneration: transforming text

Tasks that come under regeneration are:

- [[Machine Translation]]
- [[Text Summarization]]
- Text simplification

### Subtasks in generation

- Content selection: deciding what information to convey (selecting important or relevant content)
- Discourse structuring: deciding the overall ordering
- Aggregation: splitting information into sentence-sized chunks
- Referring expression generation: deciding when to use pronouns, which modifiers to use, etc.
- Lexical choice: deciding which lexical items convey a given concept
- Realisation: mapping from a meaning representation to a string
- Fluency ranking: discriminating between grammatically / semantically valid and invalid sentences

### Approaches to generation

- Templates: fixed text with slots, fixed rules for content selection.
- Statistical: use machine learning (supervised or unsupervised) for the various subtasks.
- Deep learning: particularly for regeneration tasks.

Large-scale dialogue and question answering systems, such as Siri, use a combination of the above techniques.
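
The template approach listed above (fixed text with slots, plus fixed rules for content selection) can be illustrated with a minimal sketch for the weather-report case. Everything here is hypothetical and chosen only for illustration: the template wording, the record fields (`city`, `sky`, `high`, `wind`), the `generate_report` function and the wind threshold are assumptions, not part of any particular system.

```python
# Fixed text with named slots (hypothetical wording).
REPORT_TEMPLATE = "Today in {city} it will be {sky} with a high of {high} degrees C."
WIND_TEMPLATE = " Expect strong winds of up to {wind} km/h."


def generate_report(record, wind_threshold=40):
    """Turn one numerical weather record into a sentence.

    `record` is assumed to be a dict with the keys
    'city', 'sky', 'high' and 'wind' (illustrative schema).
    """
    # Realisation: fill the slots of the main template.
    text = REPORT_TEMPLATE.format(
        city=record["city"], sky=record["sky"], high=record["high"]
    )
    # Content selection by a fixed rule: wind is only mentioned
    # when it reaches the (assumed) threshold.
    if record["wind"] >= wind_threshold:
        text += WIND_TEMPLATE.format(wind=record["wind"])
    return text


if __name__ == "__main__":
    calm = {"city": "Cambridge", "sky": "cloudy", "high": 14, "wind": 10}
    windy = {"city": "Cambridge", "sky": "cloudy", "high": 14, "wind": 55}
    print(generate_report(calm))
    print(generate_report(windy))
```

The sketch shows why templates are simple but rigid: content selection and wording are both hard-coded, so any new kind of information needs a new rule and a new piece of fixed text.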