# Distributional Semantics
Follows from [[Distributional Hypothesis]].
## Polysemy
- Words can have multiple meanings. For example: pot.
- Some researchers have used disambiguation methods.
- Most approaches assume a single space for each word: the different meanings of the word are mixed into one vector, and context determines the use.
## Idiomatic expressions
- Fixed expressions, ranked highly by [[Pointwise Mutual Information (PMI)]].
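A minimal sketch of how PMI over adjacent bigrams ranks fixed expressions; the helper and the toy corpus are illustrative assumptions, not from the source:

```python
import math
from collections import Counter

def pmi(tokens, w1, w2):
    """PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ),
    with the joint probability estimated from adjacent bigrams.
    A pair that co-occurs more often than chance scores > 0."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_joint = bigrams[(w1, w2)] / (len(tokens) - 1)
    p1 = unigrams[w1] / len(tokens)
    p2 = unigrams[w2] / len(tokens)
    return math.log2(p_joint / (p1 * p2))

# Toy corpus: the fixed expression "kick the bucket" repeats, so its
# parts co-occur far more often than their frequencies would predict.
tokens = "kick the bucket he will kick the bucket".split()
print(pmi(tokens, "kick", "the"))  # positive: strongly associated pair
```

In practice the counts come from a large corpus and a smoothed or positive PMI variant is used, but the ranking idea is the same.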
## Synonymy
In general, true synonymy does not correspond to higher similarity scores than near-synonymy.
## Similarity
Similarity in semantic space can be checked by looking at whether the two vectors point in the same direction or not. (See [[Similarity Measures]]).
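The direction check above is cosine similarity; a minimal sketch with toy vectors (the example vectors are illustrative, not corpus counts from the source):

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two context vectors:
    1.0 = same direction (identical contexts), 0.0 = orthogonal
    (no shared contexts)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional context-count vectors.
cat = np.array([4.0, 1.0, 0.0])
dog = np.array([3.0, 2.0, 0.0])
car = np.array([0.0, 1.0, 5.0])

print(cosine(cat, dog))  # high: vectors point in similar directions
print(cosine(cat, car))  # low: mostly disjoint contexts
```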
But what does similarity capture?
In distributional semantics it is a very broad notion: synonyms, near-synonyms, hyponyms, taxonomic siblings, antonyms (a nuisance for us), etc.
Similarity scores correlate with a psychological reality: they have been tested again and again via correlation with human judgments on test sets such as Miller & Charles (1991), WordSim, MEN, and SimLex. Reported correlations are very high, 0.8 or more.
On the TOEFL synonym test, non-native English speakers applying to college in the US are reported to average 65%. The best corpus-based results are 100%!
## Distributional methods are a usage representation
Distributions are a good conceptual representation if you believe that 'the meaning of a word is given by its usage'.
Corpus-dependent (encyclopedia vs. social media), culture-dependent (American vs. British English), register-dependent. Some example cosines:
- Similarity between eggplant/aubergine: 0.11. A relatively low cosine, partly due to frequency (222 occurrences of eggplant vs. 56 of aubergine).
- Similarity between policeman/cop: 0.23
- Similarity between city/town: 0.73
## Antonyms
Antonyms have high distributional similarity, so they are hard to distinguish from near-synonyms purely by distributions. They can be identified by heuristics applied to pairs of highly similar distributions.
For instance, antonyms are frequently coordinated while synonyms are not:
- a selection of cold and hot drinks
- wanted dead or alive
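The coordination heuristic can be sketched as a simple pattern count; the helper and the toy text are illustrative assumptions, not from the source:

```python
import re

def coordination_count(text, w1, w2):
    """Count coordinations of a word pair ('w1 and w2', 'w1 or w2',
    in either order). Frequent coordination of a highly similar pair
    is a heuristic cue for antonymy rather than synonymy."""
    pattern = rf"\b({w1}|{w2})\s+(and|or)\s+({w1}|{w2})\b"
    return sum(
        1
        for m in re.finditer(pattern, text, re.IGNORECASE)
        # require two *different* words, e.g. skip "hot and hot"
        if m.group(1).lower() != m.group(3).lower()
    )

text = ("a selection of cold and hot drinks. "
        "wanted dead or alive. serve it cold or hot.")
print(coordination_count(text, "cold", "hot"))    # 2 coordinations
print(coordination_count(text, "dead", "alive"))  # 1 coordination
```

A real system would compare this count against the pair's distributional similarity over a large corpus; the toy text only shows the pattern being matched.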
## Distributions and knowledge
What kind of information do distributions encode?
- lexical knowledge
- world knowledge
- boundary between the two is blurry
- no perceptual knowledge (experiential: the way things look, feel, etc.); this gap has inspired collaborations between vision and NLP
Distributions are partial lexical semantic representations, but useful and theoretically interesting.
---
## References
1. Jurafsky, D. and Martin, J. H. *Speech and Language Processing* (3rd edition), Chapter 6: Vector Semantics and Embeddings.