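# BM25

BM25 is an empirically derived ranking function that works well in practice. The score of document $d$ for query $q$ is

$$
BM25=\sum_{t \in q} \log \left[\frac{N}{\mathrm{df}(t)}\right] \cdot \frac{\left(k_{1}+1\right) \cdot \mathrm{tf}(t, d)}{k_{1} \cdot\left[(1-b)+b \cdot \frac{\mathrm{dl}(d)}{\mathrm{dl}_{\text{avg}}}\right]+\mathrm{tf}(t, d)}
$$

where

- $k_1$, $b$: parameters
- $N$: number of documents in the collection
- $\mathrm{df}(t)$: number of documents containing term $t$
- $\mathrm{tf}(t, d)$: frequency of term $t$ in document $d$
- $\mathrm{dl}(d)$: length of document $d$
- $\mathrm{dl}_{\text{avg}}$: average document length

Notes:

- $k_1$ controls the effect of term frequency, and $b$ controls the effect of document length.
- What if $k_1 \in \{0, \infty\}$? With $k_1 = 0$ the tf factor equals 1 for every query term that occurs in $d$, so BM25 simplifies to the sum of IDFs. As $k_1 \to \infty$ the factor approaches $\mathrm{tf}(t, d) / \left[(1-b)+b \cdot \mathrm{dl}(d)/\mathrm{dl}_{\text{avg}}\right]$, so BM25 becomes a sum of tf-idf with length-normalized term frequency.
- What if $b \in \{0, 1\}$? $b = 0$ means no normalization by document length; $b = 1$ means tf is fully normalized by the relative document length $\mathrm{dl}(d)/\mathrm{dl}_{\text{avg}}$.
- What if $\mathrm{tf}(t, d)$ is small/large? For small tf the contribution grows roughly linearly with tf; for large tf it saturates, approaching $(k_1+1) \cdot \log\left[N/\mathrm{df}(t)\right]$.

Typical values: $k_1 \in [1.2, 2]$, $b = 0.75$.

For long queries, another variation can be used. It also weights each term by its frequency in the query, $\mathrm{tf}(t, q)$, controlled by a third parameter $k_3$:

$$
BM25=\sum_{t \in q} \log \left[\frac{N}{\mathrm{df}(t)}\right] \cdot \frac{\left(k_{1}+1\right) \cdot \mathrm{tf}(t, d)}{k_{1} \cdot\left[(1-b)+b \cdot \frac{\mathrm{dl}(d)}{\mathrm{dl}_{\text{avg}}}\right]+\mathrm{tf}(t, d)} \cdot \frac{\left(k_{3}+1\right) \cdot \mathrm{tf}(t, q)}{k_{3}+\mathrm{tf}(t, q)}
$$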
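A minimal Python sketch of the BM25 formula, including the optional long-query factor. It assumes that documents and queries are already tokenized into lists of strings; the whitespace splitting in the example is only for illustration. The function name `bm25_score` and its signature are hypothetical. The sketch uses the simplified IDF $\log[N/\mathrm{df}(t)]$ from the formula, whereas other implementations often use a smoothed IDF instead.

```python
import math
from collections import Counter


def bm25_score(query, doc, corpus, k1=1.2, b=0.75, k3=None):
    """Hypothetical helper: BM25 score of tokenized `doc` for tokenized `query`.

    `corpus` is the list of all tokenized documents (used for N, df, dl_avg).
    Uses the simplified IDF log(N / df(t)).
    If `k3` is given, also applies the long-query factor
    (k3 + 1) * tf(t, q) / (k3 + tf(t, q)).
    """
    N = len(corpus)
    dl_avg = sum(len(d) for d in corpus) / N
    doc_sets = [set(d) for d in corpus]  # for document frequencies df(t)
    tf_d = Counter(doc)    # tf(t, d)
    tf_q = Counter(query)  # tf(t, q); keys are the distinct query terms

    # Length-normalized k1 term: k1 * [(1 - b) + b * dl(d) / dl_avg]
    norm = k1 * ((1 - b) + b * len(doc) / dl_avg)

    score = 0.0
    for t, qtf in tf_q.items():
        tf = tf_d[t]
        df = sum(1 for s in doc_sets if t in s)
        if tf == 0 or df == 0:
            continue  # term absent (from d or from the corpus) contributes 0
        idf = math.log(N / df)
        weight = idf * (k1 + 1) * tf / (norm + tf)
        if k3 is not None:
            weight *= (k3 + 1) * qtf / (k3 + qtf)
        score += weight
    return score


# Illustrative usage on a toy corpus (hypothetical data).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "cats and dogs".split(),
]
query = "cat sat".split()
for doc in corpus:
    print(" ".join(doc), round(bm25_score(query, doc, corpus), 3))
```

Setting `k1=0` makes each matching query term contribute exactly its IDF, which is the "sum of IDFs" case. Passing `k3` enables the long-query weighting. With `k3=0` the query-frequency factor is always 1, so repeated query terms count only once.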