# Distributional Hypothesis
"You shall know a word by the company it keeps" (Firth, 1957).
"The meaning of a word is defined by the way it is used" (Wittgenstein).
This leads to the distributional hypothesis about word meaning:
- the context surrounding a given word provides information about its meaning;
- words are similar if they share similar linguistic contexts;
- semantic similarity $\approx$ distributional similarity.
*Distributions* are vectors in a multidimensional *semantic space*.
The [[Distributional Semantics]] space has dimensions that correspond to possible contexts, also called *features*. The whole semantic space can be represented as a large matrix of words $\times$ features:
$
\begin{array}{l|cccc}
& \text{feature}_1 & \text{feature}_2 & \ldots & \text{feature}_n \\
\hline
\text{word}_1 & f_{1,1} & f_{2,1} & \ldots & f_{n,1} \\
\text{word}_2 & f_{1,2} & f_{2,2} & \ldots & f_{n,2} \\
\ldots & & & & \\
\text{word}_m & f_{1,m} & f_{2,m} & \ldots & f_{n,m}
\end{array}
$
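To make the matrix concrete, here is a minimal count-based sketch in which the features are simply the words occurring within a small symmetric window around each target word. The toy corpus, the window size, and the `cooccurrence_matrix` helper are illustrative assumptions, not a prescribed recipe.

```python
from collections import Counter, defaultdict

def cooccurrence_matrix(corpus, window=2):
    """Count, for each target word, the words appearing within
    +/- `window` positions of it. Each row of the result is one
    word's distribution (its feature vector in the matrix above)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for i, target in enumerate(sentence):
            lo = max(0, i - window)
            hi = min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][sentence[j]] += 1
    return counts

# Toy corpus, purely illustrative.
corpus = [
    "scrumpy is a strong drink in the pub".split(),
    "cider is a strong drink made from apples".split(),
]
print(cooccurrence_matrix(corpus)["drink"])
```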
For our purposes, a distribution can be seen as a point in that space (equivalently, as a vector from the origin of the space to that point).
For example: *scrumpy* $\rightarrow$ [... pub 0.8, drink 0.7, strong 0.4, joke 0.2, mansion 0.02, zebra 0.1 ...]
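The slogan "semantic similarity $\approx$ distributional similarity" is usually operationalised as the cosine of the angle between two distributions. Below is a minimal sketch; the weights for *cider* and *zebra* are invented for illustration, in the spirit of the *scrumpy* vector above.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse feature->weight vectors."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)

# Invented weights in the spirit of the scrumpy example above.
scrumpy = {"pub": 0.8, "drink": 0.7, "strong": 0.4, "joke": 0.2, "zebra": 0.1}
cider   = {"pub": 0.7, "drink": 0.8, "apple": 0.5, "strong": 0.3}
zebra   = {"stripes": 0.9, "savannah": 0.8, "drink": 0.1}

print(cosine(scrumpy, cider))   # high: the contexts largely overlap
print(cosine(scrumpy, zebra))   # low: the contexts are mostly disjoint
```

Words with similar contexts end up with nearby vectors, so the geometry of the space directly encodes the distributional hypothesis.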
---
## References
1. Jurafsky, D. and Martin, J. H., *Speech and Language Processing* (3rd edition), Chapter 6: Vector Semantics and Embeddings.