# Compositional semantics and sentence representations

## Principle of compositionality

The meaning of a complex expression is determined by the meanings of its constituents and by the rules used to combine them.

- Words = morphemes + structure
- Sentences = words + structure

## Issues with semantic composition

Similar syntactic structures may have different meanings:

- "it runs"
- "it rains", "it snows" (here, "it" is a *pleonastic pronoun*)

Different syntactic structures may have the same meaning (e.g., passive constructions):

- "Eve ate the apple."
- "The apple was eaten by Eve."

Not all phrases are interpreted compositionally (e.g., idioms), but a compositional interpretation is still possible:

- "kick the bucket"
- "pull someone's leg"

Additional meaning can arise through composition (e.g., *logical metonymy*):

- "fast car"
- "fast algorithm"
- "begin a book"

Meaning transfers (e.g., *metaphor*):

- "he put a grape into his mouth and swallowed it whole"
- "he swallowed her story whole"

Additional connotations can arise through composition:

- "I can't buy this story"

Recursive composition:

- "Of course I care about how you imagined I thought you perceived I wanted you to feel."

A defining property of natural languages is productivity: they license a theoretically infinite set of possible expressions. Compositionality and recursion allow for productivity.

## Modelling compositional semantics with distributional semantics

- Composition is modelled in a vector space
- Unsupervised
- General-purpose representations

Can [[Distributional Semantics]] be extended to account for the meaning of phrases and sentences? Given a finite vocabulary, natural languages license an infinite number of sentences, so it is impossible to learn vector representations for all sentences. But we can still use distributional word representations and learn to perform semantic composition in distributional space with the following models:

- [[Vector Mixture Models]]
- [[Lexical semantics]]

## Modelling compositional semantics with Neural Networks

- Typically supervised, task-specific representations

### Bag of Words

- Additive model: does not take word order or syntax into account
- Task-specific word representations with fixed dimensionality ($d = 5$)
- Dimensions of the vector space are explicit and interpretable

Sum the word vectors to get the sentence vector. What does the bias vector do?
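A minimal numpy sketch of this additive model (illustrative, not from the lecture): it assumes a 5-class sentiment task, and the toy vocabulary, embedding matrix `E`, and bias `b` are made-up names. Each word vector has one dimension per class, so summing the word vectors and adding the bias (a word-independent offset, roughly capturing class priors) directly yields the class scores.

```python
import numpy as np

# Toy additive bag-of-words scorer: one 5-dimensional, task-specific vector per word,
# one dimension per output class. Vocabulary and weights are illustrative only.
rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3, "boring": 4}
n_classes = 5                                            # e.g. 5-way sentiment labels
E = rng.normal(scale=0.1, size=(len(vocab), n_classes))  # word vectors (learned for the task)
b = np.zeros(n_classes)                                  # bias: a word-independent offset

def score(sentence):
    ids = [vocab[w] for w in sentence.lower().split() if w in vocab]
    sentence_vec = E[ids].sum(axis=0)   # order-insensitive composition: just a sum
    return sentence_vec + b             # scores over the 5 classes

print(score("the movie was great"))
```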
### Continuous Bag of Words

- Additive model: does not take word order or syntax into account
- Task-specific word representations of arbitrary dimensionality
- Dimensions of the vector space are not interpretable
- The prediction can still be traced back to the sentence vector dimensions

![[cbow.jpg]]

### Deep Continuous Bag of Words

- Additive model: does not take word order or syntax into account
- Task-specific word representations of arbitrary dimensionality
- Dimensions of the vector space are not interpretable
- More layers and non-linear transformations: the prediction cannot easily be traced back

![[deep-cbow.jpg]]

### Deep CBOW + pre-trained embeddings

- Additive model: does not take word order or syntax into account
- Pre-trained general-purpose word representations (e.g., Skip-gram, GloVe)
    - frozen: not updated during training
    - fine-tuned: updated with the task-specific learning signal (specialised)
- Dimensions of the vector space are not interpretable
- Multiple layers and non-linear transformations: the prediction cannot easily be traced back

![[deep-cbow-pretrained.jpg]]

## Recurrent Neural Networks

[[Recurrent Neural Networks (RNN)]] contain a cycle within their connections: the current value depends on earlier outputs. They naturally handle sequential input, processing one element at a time. This allows us to drop the history-independence assumption and to capture and exploit the temporal nature of language, including word order (and syntactic structure). In practice, to avoid the limitations of the vanilla RNN, we use [[LSTM]]s.

## Tree-structured RNNs

- A more faithful operationalisation of the principle of compositionality
- Helpful in disambiguation: similar surface form, different underlying structure
- Requires syntactic parse trees (constituency or dependency)

Implementation: [[Tree LSTM]], a generalization of the LSTM to tree-structured input.
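For concreteness, a minimal numpy sketch of a single composition step of a Child-Sum Tree-LSTM node, assuming the children's hidden and cell states have already been computed. The toy dimensionality, the weight names `W`, `U`, `b`, and the random initialisation are illustrative; this sketches the composition function only, not a full trainable model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d = 4                                      # hidden size (toy value)
rng = np.random.default_rng(0)
# One weight matrix / bias per gate: input (i), forget (f), output (o), candidate (u).
W = {g: rng.normal(scale=0.1, size=(d, d)) for g in "ifou"}  # applied to the word input x
U = {g: rng.normal(scale=0.1, size=(d, d)) for g in "ifou"}  # applied to child hidden states
b = {g: np.zeros(d) for g in "ifou"}

def tree_lstm_node(x, child_h, child_c):
    """Compose a node from its word input x and its children's (h, c) states."""
    h_sum = np.sum(child_h, axis=0) if len(child_h) else np.zeros(d)
    i = sigmoid(W["i"] @ x + U["i"] @ h_sum + b["i"])
    o = sigmoid(W["o"] @ x + U["o"] @ h_sum + b["o"])
    u = np.tanh(W["u"] @ x + U["u"] @ h_sum + b["u"])
    # One forget gate per child: the node can keep or discard each subtree's memory.
    f = [sigmoid(W["f"] @ x + U["f"] @ h_k + b["f"]) for h_k in child_h]
    c = i * u + sum(f_k * c_k for f_k, c_k in zip(f, child_c))
    h = o * np.tanh(c)
    return h, c

# Leaves have no children; an internal node composes its two subtrees.
h1, c1 = tree_lstm_node(rng.normal(size=d), [], [])
h2, c2 = tree_lstm_node(rng.normal(size=d), [], [])
h, c = tree_lstm_node(np.zeros(d), [h1, h2], [c1, c2])
print(h)
```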