# Distributional Word Clustering
[[Clustering]] techniques group objects so that similar objects end up in the same cluster and dissimilar objects in different clusters.
Clustering lets us obtain generalisations over the data, which are widely used in various NLP tasks:
- semantics (e.g. word clustering);
- summarization (e.g. sentence clustering);
- text mining (e.g. document clustering).
In distributional word clustering, we cluster words based on the contexts in which they occur. The underlying assumption (the distributional hypothesis) is that words with similar meanings occur in similar contexts, i.e. they are distributionally similar.
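Distributional similarity can be made concrete by comparing context-count vectors, commonly with cosine similarity. A minimal sketch in Python; the feature names and frequencies below are invented for illustration, not corpus data:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Context features: verbs governing the noun, with invented frequencies.
tea    = {"drink_dobj": 40, "pour_dobj": 12, "brew_dobj": 8}
coffee = {"drink_dobj": 55, "pour_dobj": 20, "brew_dobj": 15}
car    = {"drive_dobj": 60, "park_dobj": 25}

print(cosine(tea, coffee))  # high: shared contexts, distributionally similar
print(cosine(tea, car))     # 0.0: no shared contexts
```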
We will consider noun clustering as an example: clustering the 2,000 most frequent nouns in the British National Corpus into 200 clusters.
### Feature vectors
Different kinds of context can be used as features for clustering:
- window-based context;
- parsed or unparsed data;
- syntactic dependencies.
Different types of context yield different results.
Example experiment: use the verbs that take the noun as a direct object or as a subject as features for clustering.
- Feature vectors: verb lemmas, indexed by dependency type (e.g. subject or direct object);
- Feature values: corpus frequencies.
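A sketch of building these feature vectors, assuming dependency triples (noun, relation, verb lemma) have already been extracted from a parsed corpus; the triples below are invented examples:

```python
from collections import Counter, defaultdict

# Toy (noun, dependency relation, verb lemma) triples; a real experiment
# would extract these from a parsed corpus such as the BNC.
triples = [
    ("tea", "dobj", "drink"), ("tea", "dobj", "drink"),
    ("tea", "subj", "cool"), ("coffee", "dobj", "drink"),
    ("coffee", "dobj", "spill"), ("car", "dobj", "drive"),
]

# One counter per noun; each feature is a verb lemma indexed by its
# dependency type, and its value is the observed frequency.
vectors = defaultdict(Counter)
for noun, rel, verb in triples:
    vectors[noun][f"{verb}_{rel}"] += 1

print(vectors["tea"])  # Counter({'drink_dobj': 2, 'cool_subj': 1})
```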
### Clustering algorithms
Many clustering algorithms are available. [[K-Means]] is widely used, but the results do not vary much when other algorithms are used.
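A minimal sketch of the clustering step using scikit-learn (an assumed dependency, not prescribed by these notes), on toy noun vectors of the kind built above; in the experiment described earlier k would be 200:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction import DictVectorizer

# Toy noun vectors (verb-lemma features indexed by dependency type);
# the counts are invented for illustration.
vectors = {
    "tea":    {"drink_dobj": 2, "cool_subj": 1},
    "coffee": {"drink_dobj": 3, "spill_dobj": 1},
    "car":    {"drive_dobj": 4, "park_dobj": 2},
}

nouns = sorted(vectors)
dv = DictVectorizer(sparse=False)
X = dv.fit_transform([vectors[n] for n in nouns])  # nouns x features matrix

# k = 2 here for the toy data; the experiment above uses k = 200.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for noun, label in zip(nouns, kmeans.labels_):
    print(f"{noun} -> cluster {label}")
```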
Clusters could be thought of as categories in [[Prototype theory]].
Clusters are not entirely straightforward to use directly, but they are valuable as a source of lexical information:
1. Word sense induction and disambiguation
2. Modelling predicate-argument structure (e.g. semantic roles)
3. Identifying figurative language and idioms
4. Paraphrasing and paraphrase detection
5. Direct use in applications, e.g. machine translation and information retrieval.
---
## References
1. Jurafsky, D. and Martin, J. H. *Speech and Language Processing* (3rd edition), Chapter 6: Vector Semantics and Embeddings.