# Distributional Hypothesis

"You shall know a word by the company it keeps" (Firth).

"The meaning of a word is defined by the way it is used" (Wittgenstein).

These observations lead to the distributional hypothesis about word meaning:

- the context surrounding a given word provides information about its meaning;
- words are similar if they share similar linguistic contexts;
- semantic similarity $\approx$ distributional similarity.

*Distributions* are vectors in a multidimensional *semantic space*. The [[Distributional Semantics]] space has dimensions which correspond to possible contexts, also called *features*.

The whole semantic space can be represented as a large matrix of words $\times$ features, where the entry $f_{i,j}$ records how strongly $\text{word}_j$ is associated with $\text{feature}_i$ (e.g. a co-occurrence count or weight):

$$
\begin{array}{l|cccc}
 & \text{feature}_1 & \text{feature}_2 & \ldots & \text{feature}_n \\
\hline
\text{word}_1 & f_{1,1} & f_{2,1} & \ldots & f_{n,1} \\
\text{word}_2 & f_{1,2} & f_{2,2} & \ldots & f_{n,2} \\
\ldots & & & & \\
\text{word}_m & f_{1,m} & f_{2,m} & \ldots & f_{n,m}
\end{array}
$$

For our purposes, a distribution can be seen as a point in that space, the vector being defined with respect to the origin of that space. For example:

scrumpy -> [ ...pub 0.8, drink 0.7, strong 0.4, joke 0.2, mansion 0.02, zebra 0.1... ]
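As a toy illustration of how such a words $\times$ features matrix can be built, here is a minimal sketch in Python; the corpus, the window size, and the resulting counts are all invented for this example:

```python
from collections import Counter

# Toy corpus and window size, invented for illustration; real
# distributions are estimated from counts over a large corpus.
corpus = [
    "he drank scrumpy in the pub",
    "she drank strong cider in the pub",
    "the zebra grazed near the mansion",
]
window = 2  # symmetric context window

# rows[word] is one row of the words-by-features matrix:
# a mapping from context word (feature) to co-occurrence count.
rows = {}
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        row = rows.setdefault(word, Counter())
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                row[tokens[j]] += 1

print(rows["scrumpy"])
# Counter({'he': 1, 'drank': 1, 'in': 1, 'the': 1})
```

Each row is a distribution: the vector for `scrumpy` lives in a space whose dimensions are the context words it can co-occur with.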
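Distributional similarity between two such vectors is commonly measured as the cosine of the angle between them,

$$
\cos(\vec{u}, \vec{v}) = \frac{\vec{u} \cdot \vec{v}}{\lVert \vec{u} \rVert \, \lVert \vec{v} \rVert},
$$

sketched below over sparse dict-based vectors. The feature weights are hypothetical, echoing the scrumpy example above:

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical feature weights in the style of the scrumpy vector above.
scrumpy = {"pub": 0.8, "drink": 0.7, "strong": 0.4, "joke": 0.2}
cider   = {"pub": 0.7, "drink": 0.8, "strong": 0.5, "apple": 0.6}
zebra   = {"stripes": 0.9, "grass": 0.8, "zoo": 0.5}

print(cosine(scrumpy, cider))  # high (~0.87): the two words share contexts
print(cosine(scrumpy, zebra))  # 0.0: no shared contexts
```

Words that occur in similar contexts (scrumpy, cider) end up with nearby vectors, while words with disjoint contexts (scrumpy, zebra) are orthogonal, which is exactly the sense in which distributional similarity stands in for semantic similarity.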