# Word Sense Disambiguation

Many applications need word meaning, and word meaning depends on context: for example, striped bass (the fish) vs. bass guitar. Three methods are:

1. Supervised learning
   - Assumes a predefined set of word senses, e.g. WordNet
   - Needs a large sense-tagged training corpus (difficult to construct)
   - Does not really learn a generalisation
2. Semi-supervised learning (Yarowsky, 1995)
   - Bootstraps from a few examples
3. Unsupervised sense induction
   - Assumes no pre-annotated senses; learns the patterns of contextual use by clustering the contexts in which a word occurs

## WSD by semi-supervised learning

An early approach, from Yarowsky (1995), "Unsupervised word sense disambiguation rivaling supervised methods".

Example: disambiguating *plant* (factory vs. vegetation)

1. Find contexts in the training corpus, i.e. examples containing the word 'plant'.
2. Identify some seeds that disambiguate a few uses, e.g. 'plant life' (Sense A) vs. 'manufacturing plant' (Sense B), chosen by a human researcher.
3. Train a decision-list classifier on the Sense A / Sense B examples, ranking features by the log-likelihood ratio:

   $$\log\left(\frac{P(\text{Sense}_A \mid f_i)}{P(\text{Sense}_B \mid f_i)}\right)$$

4. Apply the classifier to the training set and add reliably classified examples to the A and B sets.
5. Iterate steps 3 and 4 until convergence. (A code sketch of this loop is given after the references.)

Yarowsky reported accuracy of 95%, but the experiments were nearly all on homonyms: these principles may not hold as well for sense extension.

## Problems with WSD as supervised classification

1. Real performance is around 75% (supervised).
2. Word senses must be predefined (not theoretically sound).
3. A very large training corpus is needed (difficult to annotate; humans do not agree).
4. A model is learned per individual word, so there is no real generalisation.

A better approach is unsupervised sense induction (a minimal clustering sketch is also given after the references), though it is a very hard task.

---

## References

1. Yarowsky, David (1995). Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of ACL 1995.
2. Jurafsky, Daniel and Martin, James H. (2019). Speech and Language Processing (3rd edition draft), Chapter 19.
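---

As a concrete illustration of the decision-list bootstrapping above, here is a minimal sketch in Python. It is not Yarowsky's original implementation: the window-based bag-of-words features, the smoothing constant `ALPHA`, and the confidence `threshold` are simplified, hypothetical choices (the paper uses positional collocation features).

```python
import math
from collections import Counter

ALPHA = 0.1  # hypothetical smoothing constant to avoid division by zero

def context_features(tokens, i, window=3):
    """Bag-of-words features in a small window around the target word."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {w for j, w in enumerate(tokens[lo:hi], start=lo) if j != i}

def train_decision_list(labelled):
    """Rank each feature f by |log(P(Sense_A | f) / P(Sense_B | f))|.

    With smoothed counts, the count ratio stands in for the probability
    ratio, since P(A | f) / P(B | f) = count(A, f) / count(B, f).
    """
    counts = {"A": Counter(), "B": Counter()}
    for feats, sense in labelled:
        counts[sense].update(feats)
    dlist = []
    for f in set(counts["A"]) | set(counts["B"]):
        llr = math.log((counts["A"][f] + ALPHA) / (counts["B"][f] + ALPHA))
        dlist.append((abs(llr), "A" if llr > 0 else "B", f))
    return sorted(dlist, reverse=True)  # strongest evidence first

def classify(dlist, feats, threshold=1.0):
    """Apply the strongest matching rule; None if no confident rule fires."""
    for strength, sense, f in dlist:
        if f in feats:
            return sense if strength >= threshold else None
    return None

def bootstrap(contexts, seeds):
    """contexts: feature sets (e.g. built with context_features);
    seeds: {context index: 'A' or 'B'}, from the human-chosen seeds."""
    labels = dict(seeds)
    while True:
        dlist = train_decision_list(
            [(contexts[i], s) for i, s in labels.items()]
        )
        added = 0
        for i, feats in enumerate(contexts):
            sense = classify(dlist, feats) if i not in labels else None
            if sense is not None:
                labels[i] = sense
                added += 1
        if added == 0:  # convergence: no new reliable examples
            break
    return labels, dlist
```

For the *plant* example, `seeds` would map the few contexts containing 'plant life' to Sense A and those containing 'manufacturing plant' to Sense B; each round then labels further contexts whose strongest feature clears the confidence threshold.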
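The unsupervised alternative can be sketched just as briefly. Below, each occurrence of the target word is represented by a bag-of-words vector over its context, and the occurrences are clustered; the cluster ids play the role of induced senses. The four example sentences and the choice of two clusters are hypothetical, and a real sense-induction system must also decide how many senses a word has.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical contexts of the target word 'plant'.
contexts = [
    "the plant workers went on strike at the factory gates",
    "a flowering plant needs water soil and sunlight",
    "the manufacturing plant produces car parts",
    "plant life in the rainforest is extremely diverse",
]

# Represent each context as a bag-of-words vector, then cluster.
vectors = CountVectorizer(stop_words="english").fit_transform(contexts)
senses = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(senses)  # e.g. [0 1 0 1]: two induced senses, factory vs. vegetation
```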