# RankNet
RankNet [Burges et al., 2005] is a pairwise loss function, a popular choice for training neural LTR models and an industry favourite [Burges, 2015].
$
\begin{array}{l}
\text { Predicted probabilities: } P_{i j}=P\left(s_{i}>s_{j}\right) \equiv \frac{e^{\gamma \cdot s_{i}}}{e^{\gamma \cdot s_{i}}+e^{\gamma \cdot s_{j}}}=\frac{1}{1+e^{-\gamma\left(s_{i}-s_{j}\right)}} \\
\text { and } P_{j i} \equiv \frac{1}{1+e^{-\gamma\left(s_{j}-s_{i}\right)}}
\end{array}
$
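The predicted probability is just a logistic (sigmoid) function of the scaled score difference. A minimal sketch in Python (the function name and default $\gamma$ are illustrative, not from the original):

```python
import math

def pred_prob(s_i, s_j, gamma=1.0):
    """P(s_i > s_j): sigmoid of the scaled score difference."""
    return 1.0 / (1.0 + math.exp(-gamma * (s_i - s_j)))
```

By construction $P_{ij} + P_{ji} = 1$, and equal scores give probability $0.5$.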
Desired probabilities: $\bar{P}_{i j}=1$ and $\bar{P}_{j i}=0$
Computing cross-entropy between $\bar{P}$ and $P$,
$
\begin{aligned}
\mathcal{L}_{\text {RankNet }} &=-\bar{P}_{i j} \log \left(P_{i j}\right)-\bar{P}_{j i} \log \left(P_{j i}\right) \\
&=-\log \left(P_{i j}\right) \\
&=\log \left(1+e^{-\gamma\left(s_{i}-s_{j}\right)}\right)
\end{aligned}
$
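As a sanity check, the loss for a pair where $d_i$ is preferred ($\bar{P}_{ij}=1$) can be sketched in a few lines (hypothetical helper, assuming a default $\gamma=1$):

```python
import math

def ranknet_loss(s_i, s_j, gamma=1.0):
    """RankNet loss for a pair where d_i is preferred over d_j,
    i.e. the desired probability for the pair is 1."""
    return math.log(1.0 + math.exp(-gamma * (s_i - s_j)))

# Correctly ordered pairs (s_i > s_j) incur a small loss,
# inverted pairs a large one; equal scores give log(2).
```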
Let $S_{i j} \in\{-1,0,1\}$ encode the preference between $d_{i}$ and $d_{j}$: $S_{i j}=1$ if $d_{i}$ should be ranked above $d_{j}$, $-1$ if below, and $0$ if they are tied. Then the desired probability for a pair is:
$
\bar{P}\left(d_{i} \succ d_{j}\right)=\frac{1}{2}\left(1+S_{i j}\right)
$
The predicted probability is:
$
P\left(d_{i} \succ d_{j}\right)=\frac{1}{1+e^{-\gamma\left(s_{i}-s_{j}\right)}}
$
The cross-entropy loss is then:
$
\mathcal{L}_{i j}=\frac{1}{2}\left(1-S_{i j}\right) \gamma\left(s_{i}-s_{j}\right)+\log \left(1+e^{-\gamma\left(s_{i}-s_{j}\right)}\right)
$
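A minimal sketch of this generalized pairwise loss (the helper name and default $\gamma$ are assumptions, not from the original):

```python
import math

def ranknet_pair_loss(s_i, s_j, S_ij, gamma=1.0):
    """Generalized RankNet loss; S_ij in {-1, 0, 1} encodes whether
    d_i should rank above (+1), below (-1), or tie with (0) d_j."""
    diff = gamma * (s_i - s_j)
    return 0.5 * (1 - S_ij) * diff + math.log(1.0 + math.exp(-diff))
```

For $S_{ij}=1$ this reduces to $\log(1+e^{-\gamma(s_i-s_j)})$, the form derived above, and swapping the pair while negating $S_{ij}$ leaves the loss unchanged.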
The derivative w.r.t. $s_{i}$:
$
\frac{\delta \mathcal{L}_{i j}}{\delta s_{i}}=\gamma\left(\frac{1}{2}\left(1-S_{i j}\right)-\frac{1}{1+e^{\gamma\left(s_{i}-s_{j}\right)}}\right)=-\frac{\delta \mathcal{L}_{i j}}{\delta s_{j}}
$
Then we can factorize the gradient w.r.t. the model weights $w$ so that:
$
\frac{\delta \mathcal{L}_{i j}}{\delta w}=\frac{\delta \mathcal{L}_{i j}}{\delta s_{i}} \frac{\delta s_{i}}{\delta w}+\frac{\delta \mathcal{L}_{i j}}{\delta s_{j}} \frac{\delta s_{j}}{\delta w}=\gamma\left(\frac{1}{2}\left(1-S_{i j}\right)-\frac{1}{1+e^{\gamma\left(s_{i}-s_{j}\right)}}\right)\left(\frac{\delta s_{i}}{\delta w}-\frac{\delta s_{j}}{\delta w}\right)
$
We define $\lambda_{i j}$ so that:
$
\frac{\delta \mathcal{L}_{i j}}{\delta w}=\lambda_{i j}\left(\frac{\delta s_{i}}{\delta w}-\frac{\delta s_{j}}{\delta w}\right)
$
where:
$
\lambda_{i j}=\gamma\left(\frac{1}{2}\left(1-S_{i j}\right)-\frac{1}{1+e^{\gamma\left(s_{i}-s_{j}\right)}}\right)
$
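In code, $\lambda_{ij}$ is just the scalar derivative of the pairwise loss w.r.t. $s_i$. A sketch (function name assumed); note that differentiating the loss yields a positive exponent, $1/(1+e^{\gamma(s_i-s_j)})$:

```python
import math

def lambda_ij(s_i, s_j, S_ij, gamma=1.0):
    """lambda_ij = dL_ij/ds_i (and = -dL_ij/ds_j) for one labeled pair."""
    return gamma * (0.5 * (1 - S_ij) - 1.0 / (1.0 + math.exp(gamma * (s_i - s_j))))

# With equal scores and S_ij = 1, lambda_ij = -0.5: a "force"
# that pushes s_i up and s_j down.
```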
These lambdas act like forces pushing pairs of documents apart or together.
The same can be done at the document level, summing a document's lambdas over all pairs it appears in:
$
\lambda_{i}=\sum_{j} \lambda_{i j}
$
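The per-document accumulation can be sketched as follows (hypothetical helper; `pairs` holds `(i, j, S_ij)` index triples, which is an assumed interface):

```python
import math

def document_lambdas(scores, pairs, gamma=1.0):
    """Accumulate lambda_i = sum_j lambda_ij over all labeled pairs.

    scores: list of model scores s_i, one per document.
    pairs:  iterable of (i, j, S_ij) triples with S_ij in {-1, 0, 1}.
    """
    lambdas = [0.0] * len(scores)
    for i, j, S in pairs:
        lam = gamma * (0.5 * (1 - S)
                       - 1.0 / (1.0 + math.exp(gamma * (scores[i] - scores[j]))))
        lambdas[i] += lam
        lambdas[j] -= lam  # since dL_ij/ds_j = -dL_ij/ds_i
    return lambdas
```

A gradient-descent step then moves each score against its accumulated lambda ($s_i \leftarrow s_i - \eta \lambda_i$), which is what makes batching pairwise gradients per document cheap.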
Issues with RankNet:
- RankNet is based on virtual probabilities $P\left(d_{i} \succ d_{j}\right)$.
- In reality, the ranking model does not output or follow these probabilities; they exist only inside the loss.
- This is inelegant, but not a big deal in practice.
---
## References