# Dirichlet smoothing
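- A unigram language model for a document $d$ can be seen as a multinomial distribution over words, with likelihood $\mathcal{L}_{d}\left(n_{1}, \ldots, n_{k} \mid p_{1}, \ldots, p_{k}\right)$, where $n_{i}$ is the frequency of term $t_{i}$ in $d$ and $p_{i}$ is the probability of $t_{i}$ under the document model $M_{d}$:
$
\begin{array}{l}
n_{i}=t f\left(t_{i}, d\right) \\
p_{i}=P\left(t_{i} \mid M_{d}\right)
\end{array}
$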
- The conjugate prior for the multinomial is the Dirichlet distribution $P_{\text {prior }}\left(p_{1}, \ldots, p_{k} ; \alpha_{1}^{p r}, \ldots, \alpha_{k}^{p r}\right)$
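- The prior parameters are set from the collection model $M_{c}$: $\alpha_{i}^{p r}=\mu P\left(t_{i} \mid M_{c}\right)$
- $\mu$ is the smoothing parameter; it corresponds to a document-length-dependent interpolation weight on the document model, $\lambda=\frac{d l(d)}{d l(d)+\mu}$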
- The posterior is again a Dirichlet distribution, with parameters $\alpha_{i}^{p o}=n_{i}+\alpha_{i}^{p r}=t f\left(t_{i}, d\right)+\mu P\left(t_{i} \mid M_{c}\right)$
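- Dirichlet smoothing uses the posterior mean as the smoothed estimate. The posterior parameters sum to $d l(d)+\mu$, so
$
P_{s}\left(t_{i} \mid M_{d}\right)=\frac{t f\left(t_{i}, d\right)+\mu P\left(t_{i} \mid M_{c}\right)}{d l(d)+\mu}
$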
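
The estimate above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the toy documents, and the choice of estimating $P(t_i \mid M_c)$ by relative frequency in the collection are all assumptions made for the example. The second function writes the same estimate as a linear interpolation with weight $\lambda = dl(d)/(dl(d)+\mu)$ on the document model, so the two forms can be compared numerically.

```python
from collections import Counter


def dirichlet_smoothed_prob(term, doc_tokens, collection_tokens, mu=2000.0):
    """Return (tf(term, d) + mu * P(term | M_c)) / (dl(d) + mu).

    Assumes a non-empty collection and mu > 0. P(term | M_c) is estimated
    here as the term's relative frequency in the collection (an assumption).
    """
    tf = Counter(doc_tokens)[term]          # tf(t_i, d)
    dl = len(doc_tokens)                    # dl(d)
    p_c = Counter(collection_tokens)[term] / len(collection_tokens)
    return (tf + mu * p_c) / (dl + mu)


def interpolated_prob(term, doc_tokens, collection_tokens, mu=2000.0):
    """Same estimate, written as lam * P_mle(t | M_d) + (1 - lam) * P(t | M_c)."""
    dl = len(doc_tokens)
    lam = dl / (dl + mu)                    # weight on the document model
    p_d = Counter(doc_tokens)[term] / dl if dl else 0.0
    p_c = Counter(collection_tokens)[term] / len(collection_tokens)
    return lam * p_d + (1.0 - lam) * p_c


# Toy data (hypothetical): one document plus a little extra collection text.
doc = "the cat sat on the mat".split()
collection = doc + "the dog ate the bone".split()

for term in ("the", "dog"):
    print(term,
          dirichlet_smoothed_prob(term, doc, collection, mu=10.0),
          interpolated_prob(term, doc, collection, mu=10.0))
```

A term that does not occur in the document ("dog" here) still receives non-zero probability through the collection model, and the smoothed probabilities over the collection vocabulary sum to one. Longer documents have a larger $\lambda$ and therefore rely less on the collection model.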
---
## References