# Dirichlet smoothing

- A unigram language model can be seen as a multinomial distribution over words, $\mathcal{L}_{d}\left(n_{1}, \ldots, n_{k} \mid p_{1}, \ldots, p_{k}\right)$, where
  $$
  n_{i}=tf\left(t_{i}, d\right), \qquad p_{i}=P\left(t_{i} \mid M_{d}\right)
  $$
- The conjugate prior for the multinomial is the Dirichlet distribution $P_{\text{prior}}\left(p_{1}, \ldots, p_{k} ; \alpha_{1}^{pr}, \ldots, \alpha_{k}^{pr}\right)$
  - $\alpha_{i}^{pr}=\mu P\left(t_{i} \mid M_{c}\right)$
  - $\mu$ is a smoothing parameter; it corresponds to the interpolation weight $\lambda=\frac{dl(d)}{dl(d)+\mu}$ on the document model
- The posterior is the Dirichlet distribution with parameters $\alpha_{i}^{po}=n_{i}+\alpha_{i}^{pr}=tf\left(t_{i}, d\right)+\mu P\left(t_{i} \mid M_{c}\right)$
- Dirichlet smoothing takes the posterior mean $\alpha_{i}^{po} / \sum_{j} \alpha_{j}^{po}$; since $\sum_{j} tf\left(t_{j}, d\right)=dl(d)$ and $\sum_{j} P\left(t_{j} \mid M_{c}\right)=1$, this gives
  $$
  P_{s}\left(t_{i} \mid M_{d}\right)=\frac{tf\left(t_{i}, d\right)+\mu P\left(t_{i} \mid M_{c}\right)}{dl(d)+\mu}
  $$
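The following is a minimal sketch of the estimate, assuming documents are already tokenized into lists of strings, $\mu > 0$, and $P(t \mid M_c)$ is the maximum-likelihood estimate over all tokens in the collection. It computes the smoothed probability three ways: the closed form, the mean of the Dirichlet posterior with parameters $\alpha_i^{po}$, and the interpolation with $\lambda = dl(d)/(dl(d)+\mu)$. The function names and the toy collection are hypothetical, chosen only for illustration.

```python
from collections import Counter


def collection_model(docs):
    """Maximum-likelihood collection model P(t | M_c) over all tokens in docs."""
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def dirichlet_smoothed(doc, p_c, mu):
    """Closed form (tf(t, d) + mu * P(t | M_c)) / (dl(d) + mu) for every t in p_c."""
    tf = Counter(doc)
    dl = len(doc)
    return {t: (tf[t] + mu * p_c[t]) / (dl + mu) for t in p_c}


def posterior_mean(doc, p_c, mu):
    """Mean of the Dirichlet posterior with alpha_po = tf(t, d) + mu * P(t | M_c)."""
    tf = Counter(doc)
    alpha_po = {t: tf[t] + mu * p_c[t] for t in p_c}
    # Equals dl(d) + mu when every term of doc is in the vocabulary of p_c.
    total = sum(alpha_po.values())
    return {t: a / total for t, a in alpha_po.items()}


def interpolated(doc, p_c, mu):
    """lambda * P_ml(t | M_d) + (1 - lambda) * P(t | M_c), lambda = dl / (dl + mu)."""
    tf = Counter(doc)
    dl = len(doc)
    lam = dl / (dl + mu)
    # For an empty document lambda is 0, so the P_ml placeholder 0.0 has no effect.
    return {t: lam * (tf[t] / dl if dl else 0.0) + (1 - lam) * p_c[t] for t in p_c}


docs = [["a", "b", "a"], ["b", "c"]]  # hypothetical toy collection
p_c = collection_model(docs)
smoothed = dirichlet_smoothed(docs[0], p_c, mu=2.0)
print(smoothed)
```

Here the collection model is $P(a)=P(b)=0.4$ and $P(c)=0.2$, so for the first document ($dl = 3$, $\mu = 2$) the smoothed probabilities are $(2+0.8)/5 = 0.56$, $(1+0.8)/5 = 0.36$ and $(0+0.4)/5 = 0.08$. The term c, absent from the document, still gets non-zero probability, and the three functions agree up to floating-point rounding.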