Click models - Notes on AI

# Click Models User interactions are important for - Evaluating IR systems - Improving IR systems Modes of user search interactions - Click models - Mouse hovering - Models of time between user actions (necessary because there are bias, for example postional bias like the golden triangle in google search results). ## Positon-based click model (PBM) Separates bias from relevance. But examination of document at rank r does not depend on examination of click at postions r. ![[Position based model.png]] - Terminology - Examination = reading a snippet - $E_{r}$ - binary random variable denoting examination of a snippet at rank $r$ - Attractiveness = a user wants to click on a document after eximining its snipper - $A_{q d}$ - binary random variable showing whether document $d$ is attractive to a user, given query $q$ Position-based model (PBM) - Examination depends on rank $ P\left(E_{r}=1\right)=\gamma_{r} $ - Attractiveness depends on a query-document pair $ P\left(A_{q d}=1\right)=\alpha_{q d} $ - Probability of click is therefore given as $ P\left(C_{d}=1\right)=P\left(E_{r_{d}}=1\right) \cdot P\left(A_{q d}=1\right) $ Probabilistic graphical model: ![[position based model pgm.png]] From this model, we see that the biased click probability decomposes into the bias component p(E) and the true relevance p(A). ![[Position-based model params.png]] Problems with this model: - Assumption that probability of click only depends on the position. - These models are unable to predict the attractiveness of documents that were not seen in the training data. ### Parameter estimation - [[Maximum Likelihood Estimation]] - Used when there is no dependancy between parameters. Useful for cascade models. - [[Expectation Maximization]] - Used when parameters have to iteratively estimated together. - Set parameters to some intial values - Repeat until convergence - E-step: derive the expectation of the likelihood function - M-step: maximize this expectation $ \begin{array}{l} P\left(A_{d}=1\right)=\alpha_{q d} \\ P\left(E_{r}=1\right)=\gamma_{r} \end{array} $ We want to estimate alpha. $ \alpha_{q d}^{(t+1)}=\frac{1}{\left|\mathcal{S}_{q d}\right|} \sum_{s \in \mathcal{S}_{q d}}\left(c_{d}^{(s)}+\left(1-c_{d}^{(s)}\right) \frac{\left(1-\gamma_{r}^{(t)}\right) \alpha_{q d}^{(t)}}{1-\gamma_{r}^{(t)} \alpha_{q d}^{(t)}}\right) $ where $c_d^{(s)}$ is the click for the document-query pair. and $\mathcal{S}_{q d}$ is the click log. ## Cascade model (CM) Cascade dependancy of examination at r on examinations and clicks above r. But only one click is allowed. - Start from the first document - Examine documents one by one - If click, then stop $ \begin{array}{c} E_{r}=1 \text { and } A_{d_{r}}=1 \Leftrightarrow C_{r}=1 \\ P\left(A_{d_{r}}=1\right)=\alpha_{q d_{r}} \\ \underbrace{P\left(E_{1}=1\right)}_{\text {start from first }}=1 \\ \underbrace{P\left(E_{r}{ }=1 \mid E_{r-1}=1, C_{r-1}=0\right)}_{\text {otherwise, continue }}=1 \end{array} $ Probabilistic graphical model: ![[Cascade model.png]] Limitations: - Does not go beyond one click. ## Dynamic Bayesian network model (DBN) Model that goes beyond one click. It has two sets of parameters, percieved relevance (after examining) alpha and real relevance (after clicking and reading the document) sigma. $ \begin{array}{r} P\left(A_{r}=1\right)=\alpha_{q d r} \\ P\left(E_{1}=1\right)=1 \\ P\left(E_{r}=1 \mid S_{r-1}=1\right)=0 \\ P\left(E_{r}=1 \mid S_{r-1}=0\right)=\gamma \\ P\left(S_{r}=1 \mid C_{r}=0\right)=0 \\ P\left(S_{r}=1 \mid C_{r}=1\right)=\sigma_{q d_{r}} \\ \end{array} $ Probability of satisfaction is then given as $ \begin{aligned} P\left(S_{r}=1\right) &=P\left(S_{r}=1 \mid C_{r}=1\right) \cdot P\left(C_{r}=1\right) \\ &=\sigma_{q d_{r}} \cdot P\left(C_{r}=1\right) \\ &=\sigma_{q d_{r}} \cdot \alpha_{q d_{r}} \cdot P\left(E_{r}=1\right) \\ &=\sigma_{q d_{r}} \cdot \alpha_{q d_{r}} \cdot \prod_{i=1}^{r-1}\left(\gamma \cdot\left(1-\sigma_{q d_{i}} \cdot \alpha_{q d_{i}}\right)\right)\\ &=R_{q d_{r}} \cdot \prod_{\dot{j}=1}^{r-1}\left(\gamma \cdot\left(1-R_{q d_{i}}\right)\right) \end{aligned} $ This is exactly the ERR formula from offline [[Learning to Rank]]. $ \begin{aligned} E R R &=\sum_{r} \frac{1}{r} \cdot P\left(S_{r}=1\right) \\ &=\sum_{r} \frac{1}{r} \cdot R_{q d_{r}} \cdot \prod_{i=1}^{r-1}\left(\gamma \cdot\left(1-R_{q d_{i}}\right)\right) \end{aligned} $ ## Applications of click models - Full click probabilties - Model-based metrics - Utility-based metrics $\text { uMetric }=\sum_{r=1}^{n} P\left(C_{r}=1\right) \cdot U_{r}$ - Effort-based metrics $\text { eMetric }=\sum_{r=1}^{n} P\left(S_{r}=1\right) \cdot F_{r}$ where p(S) is satisfaction probability. - Conditional click probabilities - User simulation - Parameter values - Ranking --- ## References 1. Lecture 6.1, UvA IR1 course 2021