Natural Language Inference

# Natural Language Inference

Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations.

## Entailment

- Entailment is a directional relation between two texts in which the information in the Premise entails the information in the Hypothesis: a reader who accepts the Premise would infer that the Hypothesis is true, but not necessarily the other way around.

## Contradiction

- Contradiction is a symmetrical relation between two texts that cannot both be true at the same time.

## Extended NLI tasks

- NLI tasks can also involve recognizing:
  - Paraphrasing - symmetrical relation between two differently worded texts with approximately the same content.
  - Specificity - directional relation between two texts in which one text is more precise and the other is more vague.

## Stanford Natural Language Inference Dataset (SNLI)

- SNLI consists of 570,152 human-annotated sentence pairs labeled for entailment, contradiction, and semantic independence. This total includes development and test sets of 10K pairs each.
- The SNLI dataset addresses the issues of size, quality, and indeterminacy that previous NLI datasets suffered from.
- NLI datasets may suffer from indeterminacy of the label in certain examples, which can only be resolved after [[Coreference Resolution]]. Example: "A boat sank in the Pacific Ocean." / "A boat sank in the Atlantic Ocean." If both sentences are taken to describe the same event, the pair is a contradiction; if they may describe two different events, it is not. The label therefore depends on whether the two sentences are assumed to co-refer.
- Before SNLI, the primary sources of annotated NLI data were the Recognizing Textual Entailment (RTE) challenge tasks and the Sentences Involving Compositional Knowledge (SICK) dataset, but these were very small and had significant indeterminacy problems.
- The Denotation Graph entailment set contains millions of examples, but it was annotated with fully automated methods, so it is noisy and is mostly used only as a supplementary source of data.
- Top published results can be found here: https://nlp.stanford.edu/projects/snli/

## Multi-Genre Natural Language Inference (MultiNLI)

- Modeled on SNLI, but differs in that it covers a range of genres of spoken and written text.

---

## References

1. A large annotated corpus for learning natural language inference, S. R. Bowman, G. Angeli, C. Potts, and C. D. Manning, 2015. https://www.aclweb.org/anthology/D15-1075.pdf
2. Decomposing and comparing meaning relations, Kovatchev et al., 2020. https://www.aclweb.org/anthology/2020.lrec-1.709.pdf
3. The MultiNLI Corpus. https://cims.nyu.edu/~sbowman/multinli/
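
## Example: representing labeled NLI pairs

The three-way labeling described for SNLI (entailment, contradiction, semantic independence) can be sketched as a small data structure. The snippet below is a minimal illustration, not an official loader. It assumes JSONL records with the field names `sentence1`, `sentence2`, and `gold_label`, and a `-` label for pairs without a usable gold label; check these against your copy of the data. The names `Label`, `NLIPair`, and `parse_pairs` are hypothetical, and the sample sentences are made up for illustration.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator


class Label(Enum):
    ENTAILMENT = "entailment"
    CONTRADICTION = "contradiction"
    NEUTRAL = "neutral"  # "semantic independence" in the notes above


@dataclass(frozen=True)
class NLIPair:
    premise: str      # the text that is assumed to be true
    hypothesis: str   # the text whose status is being judged
    label: Label


def parse_pairs(lines: Iterable[str]) -> Iterator[NLIPair]:
    """Yield labeled pairs from JSONL lines, skipping blank lines and
    records whose label is not one of the three known classes."""
    valid = {label.value for label in Label}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        raw_label = record.get("gold_label")  # assumed field name
        if raw_label not in valid:
            continue
        yield NLIPair(record["sentence1"], record["sentence2"], Label(raw_label))


# Illustrative records only; these are not taken from the dataset.
sample = [
    json.dumps({"sentence1": "A boat sank in the Pacific Ocean.",
                "sentence2": "A boat sank in the Atlantic Ocean.",
                "gold_label": "contradiction"}),  # assumes the same event
    json.dumps({"sentence1": "A man is playing a guitar.",
                "sentence2": "A person is making music.",
                "gold_label": "entailment"}),
    json.dumps({"sentence1": "x", "sentence2": "y", "gold_label": "-"}),
]
pairs = list(parse_pairs(sample))
```

Keeping `premise` and `hypothesis` as separate named fields reflects that entailment is directional: swapping the two sentences can change the label, whereas a contradiction label would be unaffected by the swap.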