# Query Likelihood Model
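- The [[Language Models|unigram]] language model $M_d$ of a document $d$ estimates the probability of a term $t$ from its term frequency $tf(t, d)$ and the document length $dl(d)$: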
$
P\left(t \mid M_{d}\right)=\frac{t f(t, d)}{d l(d)}
$
- Under this model, a document is represented as a multinomial distribution over words.
- If a vocabulary term $t$ does not appear in document $d$, then $P\left(t \mid M_{d}\right)=0$.
- This zero-probability problem is addressed by smoothing, e.g. [[Language Models#Laplace add 1 smoothing]].
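
The estimate above can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization; the helper name `unigram_model` and the toy document are hypothetical and not part of the course material. It also shows the zero-probability issue for terms that do not occur in the document.

```python
from collections import Counter

def unigram_model(doc_tokens):
    """Maximum-likelihood unigram model: P(t | M_d) = tf(t, d) / dl(d)."""
    tf = Counter(doc_tokens)   # term frequencies tf(t, d)
    dl = len(doc_tokens)       # document length dl(d), counted in tokens
    return {t: count / dl for t, count in tf.items()}

# Toy document (an assumption for illustration only).
doc = "the cat sat on the mat".split()
model = unigram_model(doc)

p_the = model["the"]           # tf = 2, dl = 6
p_dog = model.get("dog", 0.0)  # unseen term: probability 0 without smoothing
```

Because the probabilities are relative frequencies, they sum to 1 over the terms that occur in the document, leaving no probability mass for unseen terms.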
How do we match a query against a document's distribution? This is what the Query Likelihood Model provides.
- By Bayes' rule, the probability of a document given a query is
$
P(d \mid q)=\frac{P(q \mid d) P(d)}{P(q)}
$
- The query prior $P(q)$ is the same for every document, so it does not affect the ranking for a particular query:
$
P(d \mid q) \stackrel{\operatorname{rank}}{=} P(q \mid d) P(d)
$
- Usually, the prior distribution over documents $P(d)$ is assumed to be uniform, so it can be dropped as well. Writing $M_d$ for the language model of document $d$:
$
P(d \mid q) \stackrel{\operatorname{rank}}{=} P(q \mid d)=P\left(q \mid M_{d}\right)
$
- Under the "bag of words" assumption, query terms are generated independently given $M_d$. Therefore,
$
P\left(q \mid M_{d}\right)=\prod_{t \in q} P\left(t \mid M_{d}\right)=\prod_{t \in q} \frac{t f(t, d)}{d l(d)}
$
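
The following is a minimal sketch of ranking documents by query likelihood, assuming whitespace tokenization and non-empty documents. The function name `query_log_likelihood`, the `vocab_size` parameter, and the toy collection are hypothetical. Instead of multiplying probabilities, the sketch sums log-probabilities, which gives the same ranking and avoids numerical underflow for long queries. Passing `vocab_size` switches on add-one (Laplace) smoothing as one possible way to avoid zero scores.

```python
import math
from collections import Counter

def query_log_likelihood(query_tokens, doc_tokens, vocab_size=None):
    """Return log P(q | M_d) under the unigram model.

    Without vocab_size, the unsmoothed estimate tf(t, d) / dl(d) is used.
    With vocab_size, add-one smoothing is applied (an illustrative choice).
    Assumes doc_tokens is non-empty.
    """
    tf = Counter(doc_tokens)
    dl = len(doc_tokens)
    score = 0.0
    for t in query_tokens:
        if vocab_size is None:
            p = tf[t] / dl
        else:
            p = (tf[t] + 1) / (dl + vocab_size)
        if p == 0.0:
            # A single unseen query term makes the whole product zero.
            return float("-inf")
        score += math.log(p)
    return score

# Toy collection and query (assumptions for illustration only).
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog chased the cat".split(),
}
query = "cat mat".split()

# Rank document ids by descending query log-likelihood.
ranking = sorted(
    docs,
    key=lambda d: query_log_likelihood(query, docs[d]),
    reverse=True,
)
```

In this toy collection, `d2` does not contain "mat", so its unsmoothed score is negative infinity and it ranks below `d1`. With smoothing enabled, `d2` receives a finite score but still ranks lower.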
---
## References
1. IR1 Course 2021, UvA
2. https://course.ccs.neu.edu/cs6200sp15/slides/m03.s06%20-%20query%20likelihood%20retrieval.pdf