# Distributional Semantics
Follows from [[Distributional Hypothesis]].
## Polysemy
- Words can have multiple meanings. For example: pot.
- Some researchers have used disambiguation methods.
- Most approaches assume a single space for each word: the different meanings of the word are mixed into one vector, and context determines the use.
## Idiomatic expressions
- Fixed expressions, ranked highly by [[Pointwise Mutual Information (PMI)]].
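A minimal sketch of how PMI over adjacent bigrams ranks fixed expressions; the helper and the toy corpus are illustrative assumptions, not from the source:

```python
import math
from collections import Counter

def pmi(tokens, w1, w2):
    """PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ),
    with the joint probability estimated from adjacent bigrams.
    A pair that co-occurs more often than chance scores > 0."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    p_joint = bigrams[(w1, w2)] / (len(tokens) - 1)
    p1 = unigrams[w1] / len(tokens)
    p2 = unigrams[w2] / len(tokens)
    return math.log2(p_joint / (p1 * p2))

# Toy corpus: the fixed expression "kick the bucket" repeats, so its
# parts co-occur far more often than their frequencies would predict.
tokens = "kick the bucket he will kick the bucket".split()
print(pmi(tokens, "kick", "the"))  # positive: strongly associated pair
```

In practice the counts come from a large corpus and a smoothed or positive PMI variant is used, but the ranking idea is the same.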
## Synonymy
In general, true synonymy does not correspond to higher similarity scores than near-synonymy.
## Similarity
Similarity in semantic space can be checked by looking at whether the two vectors point in the same direction or not. (See [[Similarity Measures]]).
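The direction check above is cosine similarity; a minimal sketch with toy vectors (the example vectors are illustrative, not corpus counts from the source):

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two context vectors:
    1.0 = same direction (identical contexts), 0.0 = orthogonal
    (no shared contexts)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional context-count vectors.
cat = np.array([4.0, 1.0, 0.0])
dog = np.array([3.0, 2.0, 0.0])
car = np.array([0.0, 1.0, 5.0])

print(cosine(cat, dog))  # high: vectors point in similar directions
print(cosine(cat, car))  # low: mostly disjoint contexts
```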
But what does similarity capture?
In distributional semantics it is a very broad notion: synonyms, near-synonyms, hyponyms, taxonomic siblings, antonyms (a nuisance for us), etc.
Similarity scores correlate with a psychological reality: they have been tested again and again via correlation with human judgments on test sets such as Miller & Charles (1991), WordSim, MEN, and SimLex. Reported correlations are very high, 0.8 or more.
On the TOEFL synonym test, non-native English speakers applying to college in the US are reported to average 65%. The best corpus-based results are 100%!
## Distributional methods are a usage representation
Distributions are a good conceptual representation if you believe that 'the meaning of a word is given by its usage'.
Corpus-dependent (encyclopedia vs. social media), culture-dependent (American vs. British English), register-dependent. Some example cosines:
- Similarity between eggplant/aubergine: 0.11. A relatively low cosine, partly due to frequency (222 occurrences of eggplant vs. 56 of aubergine).
- Similarity between policeman/cop: 0.23
- Similarity between city/town: 0.73
## Antonyms
Antonyms have high distributional similarity, so they are hard to distinguish from near-synonyms purely by distributions. They can be identified by heuristics applied to pairs of highly similar distributions.
For instance, antonyms are frequently coordinated while synonyms are not:
- a selection of cold and hot drinks
- wanted dead or alive
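The coordination heuristic can be sketched as a simple pattern count; the helper and the toy text are illustrative assumptions, not from the source:

```python
import re

def coordination_count(text, w1, w2):
    """Count coordinations of a word pair ('w1 and w2', 'w1 or w2',
    in either order). Frequent coordination of a highly similar pair
    is a heuristic cue for antonymy rather than synonymy."""
    pattern = rf"\b({w1}|{w2})\s+(and|or)\s+({w1}|{w2})\b"
    return sum(
        1
        for m in re.finditer(pattern, text, re.IGNORECASE)
        # require two *different* words, e.g. skip "hot and hot"
        if m.group(1).lower() != m.group(3).lower()
    )

text = ("a selection of cold and hot drinks. "
        "wanted dead or alive. serve it cold or hot.")
print(coordination_count(text, "cold", "hot"))    # 2 coordinations
print(coordination_count(text, "dead", "alive"))  # 1 coordination
```

A real system would compare this count against the pair's distributional similarity over a large corpus; the toy text only shows the pattern being matched.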
## Distributions and knowledge
What kind of information do distributions encode?
- lexical knowledge
- world knowledge
- boundary between the two is blurry
- no perceptual knowledge (experiential: the way things look, feel, etc.); this gap has inspired collaborations between vision and NLP
Distributions are partial lexical semantic representations, but useful and theoretically interesting.
---
## References
1. Jurafsky, D. and Martin, J. H. *Speech and Language Processing* (3rd edition), Chapter 6: Vector Semantics and Embeddings.