# Lexical Function Models
Drops the assumption that all syntactic types are similar enough to be represented in the same way, i.e. as vectors of the same dimensionality.
$
\mathbf{p}=f(\mathbf{U}, \mathbf{v}, R, K)
$
Distinguishes between words whose meaning is directly determined by their distributional profile (e.g. nouns) and words that act as functions transforming the distributional profile of other words (e.g. adjectives, adverbs).
Important work: Baroni, M. and Zamparelli, R. (2010). Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Proceedings of EMNLP 2010.
$
\mathbf{p}=f(\mathbf{U}, \mathbf{v}, \mathrm{ADJ})=\mathbf{U}\mathbf{v}
$
## Adjectives modelled as lexical functions that are applied to nouns
- Adjectives are parameter matrices ($A_{old}$, $A_{furry}$, etc.)
- Nouns are vectors ($\mathbf{house}$, $\mathbf{dog}$, etc.)
- Composition is a linear transformation: old dog = $A_{old} \times \mathbf{dog}$.
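A minimal numpy sketch of this composition, assuming a toy dimensionality and random values in place of learned parameters (the names `A_old` and `dog` mirror the notation above):

```python
import numpy as np

# Toy setup: real distributional vectors would have hundreds of dimensions,
# and A_old would be learned from corpus data (see below), not random.
d = 3
rng = np.random.default_rng(0)

A_old = rng.standard_normal((d, d))  # adjective "old" as a d x d parameter matrix
dog = rng.standard_normal(d)         # noun "dog" as a d-dimensional vector

old_dog = A_old @ dog                # "old dog" = A_old x dog
print(old_dog.shape)                 # (3,) -- the phrase is again a noun-like vector
```

Because $A_{old}$ is square, the result has the same dimensionality as the noun vector, so the output can itself be modified again (e.g. by another adjective matrix).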
### Learning adjective matrices
1. Obtain a distributional vector $\mathbf{n}_{j}$ for each noun $n_{j}$ in the vocabulary using a conventional [[Distributional Semantics]] model.
2. Collect all adjective-noun pairs $\left(a_{i}, n_{j}\right)$ from the corpus.
3. Obtain a distributional phrase vector $\mathbf{p}_{i j}$ for each pair $\left(a_{i}, n_{j}\right)$ from the same corpus using a conventional DSM, treating the phrase $a_{i}\,n_{j}$ as a single word.
4. The set of tuples $\left\{\left(\mathbf{n}_{j}, \mathbf{p}_{i j}\right)\right\}_{j}$ represents a dataset $\mathscr{D}\left(a_{i}\right)$ for the adjective $a_{i}$.
5. Learn the matrix $\mathbf{A}_{i}$ from $\mathscr{D}\left(a_{i}\right)$ using linear regression, minimising the squared-error loss (see the code sketch after the equation):
$
L\left(\mathbf{A}_{i}\right)=\sum_{j \in \mathscr{D}\left(a_{i}\right)}\left\|\mathbf{p}_{i j}-\mathbf{A}_{i} \mathbf{n}_{j}\right\|^{2}
$
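A sketch of step 5 on placeholder data, assuming the noun vectors $\mathbf{n}_{j}$ are stacked as rows of a matrix `N` and the corpus-extracted phrase vectors $\mathbf{p}_{ij}$ as rows of `P` (random here purely for illustration); the closed-form least-squares solution minimises $L(\mathbf{A}_{i})$ exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 50, 200                   # vector dimensionality, number of (a_i, n_j) pairs
N = rng.standard_normal((m, d))  # row j holds the noun vector n_j
P = rng.standard_normal((m, d))  # row j holds the observed phrase vector p_ij

# Minimising sum_j ||p_ij - A_i n_j||^2 is the least-squares problem
# N @ A_i.T ~= P, which lstsq solves in closed form.
A_iT, *_ = np.linalg.lstsq(N, P, rcond=None)
A_i = A_iT.T                     # the d x d adjective matrix

loss = np.sum((P - N @ A_i.T) ** 2)  # squared-error loss L(A_i)
print(A_i.shape, loss)
```

In practice a regularised variant (e.g. ridge regression) is often preferable when an adjective co-occurs with only a few nouns.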
## Verbs as lexical functions
Verbs, too, can be modelled as lexical functions that are applied to their arguments.
They are represented as tensors whose order is determined by the verb's subcategorisation frame (i.e., how many and what kinds of arguments the verb takes).
Intransitive verbs take a subject as their only argument and are modelled as a matrix (second-order tensor).
Example: dogs bark = $V_{bark} \times \mathbf{dogs}$
Transitive verbs take a subject and an object and are modelled as a third-order tensor.
Example: dogs eat meat = $V_{eat} \times \mathbf{meat} \times \mathbf{dogs}$
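A sketch of both cases with numpy, assuming toy random tensors in place of learned verb parameters; which tensor axis is contracted with the object and which with the subject is a convention, chosen here to follow the $V_{eat} \times \mathbf{meat} \times \mathbf{dogs}$ order above:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
dogs = rng.standard_normal(d)
meat = rng.standard_normal(d)

# Intransitive: a d x d matrix (second-order tensor).
V_bark = rng.standard_normal((d, d))
dogs_bark = V_bark @ dogs                       # "dogs bark"

# Transitive: a d x d x d third-order tensor, contracted first with the
# object ("meat") and then with the subject ("dogs").
V_eat = rng.standard_normal((d, d, d))
eat_meat = np.einsum('ijk,k->ij', V_eat, meat)  # V_eat x meat: a d x d matrix
dogs_eat_meat = eat_meat @ dogs                 # (V_eat x meat) x dogs

print(dogs_bark.shape, dogs_eat_meat.shape)     # (4,) (4,)
```

Note that after the first contraction, $V_{eat} \times \mathbf{meat}$ is itself a matrix: the verb phrase "eat meat" acts like an intransitive verb applied to the subject.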
---
## References
- Baroni, M. and Zamparelli, R. (2010). Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Proceedings of EMNLP 2010.