# Content-based recommendation

Main idea: Recommend to customer $u$ items similar to the previous items $\mathcal{F}_{u}$ that $u$ rated highly.

- Movie recommendation: Recommend movies with the same actor(s), director, genre, ...
- Websites, blogs, news: Recommend other sites with "similar" content

## Item profiles

For each item $i$, create an item profile $x_{i}$. A profile is a set (vector) of features:

- Movies: author, title, actor, director, ...
- Text: set of "important" words in the document

How to pick important features?

- The usual heuristic from text mining is the [[TF-IDF]] weight

## User profiles

Simple: (weighted) average of the profiles of the (positively) rated items, where $r_{u i}$ is the rating user $u$ gave item $i$:

$$
x_{u}=\sum_{i \in \mathcal{F}_{u}} r_{u i} x_{i}
$$

- Variant: normalize the weights using the average rating of the user
- More sophisticated aggregations are possible
- Can also build classifiers/regressors to predict whether a user likes an item

Prediction: Suggest items whose feature vector $x_{i}$ is most similar to the profile vector $x_{u}$.

- Similarity measures: Cosine Similarity, Minimum Description Length

## Advantages

- User independence
  - Does not need information from other users (Collaborative Filtering requires this)
- Can handle unique tastes of users
- Unpopular items are also recommended
- Transparency
  - Explanations are straightforward
- Cold start (for items) is a non-issue, as items are compared based on their content, not on ratings

## Drawbacks

- Feature Engineering / Domain knowledge is often needed
- Often, content is not the only factor for users interacting with items
- Overspecialisation
  - No serendipitous recommendations (unexpected items)
- Cold start
  - New users
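The pipeline described in this note (TF-IDF item profiles, a rating-weighted user profile, and ranking by cosine similarity) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the toy item descriptions, the whitespace tokenizer, the particular TF-IDF variant (term count divided by the maximum count in the item, times $\log(N / \mathrm{df})$), and the function names `tfidf_profiles`, `user_profile`, `cosine` and `recommend` are all hypothetical. The `center=True` option is one possible reading of the "normalize weights using average rating" variant: it subtracts the user's mean rating, so items rated below average push the profile away from their features.

```python
import math
from collections import Counter, defaultdict


def tfidf_profiles(docs):
    """Build one sparse TF-IDF item profile (dict: term -> weight) per document."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of items in which each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    profiles = []
    for tokens in tokenized:
        counts = Counter(tokens)
        max_count = max(counts.values())
        profiles.append({
            term: (count / max_count) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return profiles


def user_profile(ratings, profiles, center=False):
    """Rating-weighted sum of item profiles: x_u = sum_i r_ui * x_i.

    With center=True the user's mean rating is subtracted from each rating first
    (an assumed interpretation of the normalization variant).
    """
    mean = sum(ratings.values()) / len(ratings) if center else 0.0
    x_u = defaultdict(float)
    for item, rating in ratings.items():
        for term, weight in profiles[item].items():
            x_u[term] += (rating - mean) * weight
    return dict(x_u)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, profiles, k=2, center=False):
    """Return the indices of the k unrated items most similar to the user profile."""
    x_u = user_profile(ratings, profiles, center)
    scores = {
        item: cosine(x_u, profile)
        for item, profile in enumerate(profiles)
        if item not in ratings
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy catalogue (hypothetical data): each item is described by a few keywords.
items = [
    "space adventure alien robot",
    "space battle alien fleet",
    "romantic comedy wedding paris",
    "romantic drama paris letters",
    "robot uprising space station",
]
profiles = tfidf_profiles(items)
ratings = {0: 5, 2: 1}  # item index -> rating given by user u

print(recommend(ratings, profiles, k=2))
```

Because cosine similarity ignores vector length, it makes no difference to the ranking whether the user profile is a weighted sum or a true weighted average. The sketch also illustrates two points from the lists above: no other user's ratings are needed, and the user is only ever shown items that share features with what they already rated, which is the overspecialisation drawback.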