# Jelinek-Mercer smoothing
- If the maximum-likelihood estimate $p\left(w_{3} \mid w_{1}, w_{2}\right)=0$ (the trigram was never seen in training), a less specific history can still give a nonzero estimate
- Here, less specific means shorter:
$p_{J M}\left(w_{3} \mid w_{1}, w_{2}\right)=\lambda_{3} p\left(w_{3} \mid w_{1}, w_{2}\right)+\lambda_{2} p\left(w_{3} \mid w_{2}\right)+\lambda_{1} p\left(w_{3}\right)$
where $\lambda_{3}+\lambda_{2}+\lambda_{1}=1$ and each $\lambda_{i} \geq 0$
- $p_{J M}$ is a weighted average of $n$-gram models of different orders
- Also known as (linear) interpolation smoothing
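- $\lambda$ values can be estimated using [[Expectation Maximization]] to maximize the likelihood of some held-out data. Estimating them on the training data itself would push all the weight onto the highest-order model.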
- In this simple form, the same $\lambda$ values are used for all $n$-grams, regardless of the history
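
The interpolation and the EM estimate of the $\lambda$ values can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`train_counts`, `mle_components`, `p_jm`, `estimate_lambdas`), the toy corpus, and the convention that `lambdas` is ordered as $(\lambda_1, \lambda_2, \lambda_3)$ are all assumptions made for the example. Histories are counted with plain $n$-gram counts, which slightly over-counts a history that occurs only at the very end of the corpus.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram, bigram and trigram counts from a list of tokens."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri


def mle_components(w1, w2, w3, counts):
    """Return the ML estimates (p(w3), p(w3|w2), p(w3|w1,w2)).

    A component is 0.0 when its history was never seen in training.
    """
    uni, bi, tri = counts
    total = sum(uni.values())
    p1 = uni[w3] / total if total else 0.0
    p2 = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p3 = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return p1, p2, p3


def p_jm(w1, w2, w3, counts, lambdas):
    """Interpolated probability; lambdas = (lambda_1, lambda_2, lambda_3)."""
    parts = mle_components(w1, w2, w3, counts)
    return sum(lam * p for lam, p in zip(lambdas, parts))


def estimate_lambdas(heldout, counts, iterations=20):
    """Estimate shared lambdas with EM on held-out tokens."""
    lambdas = [1 / 3, 1 / 3, 1 / 3]
    trigrams = list(zip(heldout, heldout[1:], heldout[2:]))
    for _ in range(iterations):
        expected = [0.0, 0.0, 0.0]
        for w1, w2, w3 in trigrams:
            parts = [lam * p for lam, p in
                     zip(lambdas, mle_components(w1, w2, w3, counts))]
            z = sum(parts)
            if z == 0.0:
                continue  # w3 never seen in training: no component explains it
            # E-step: posterior responsibility of each order for this token
            for i in range(3):
                expected[i] += parts[i] / z
        norm = sum(expected)
        if norm == 0.0:
            break
        # M-step: renormalize the expected counts into new weights
        lambdas = [e / norm for e in expected]
    return lambdas


# Toy usage: the trigram ("on", "the", "cat") is unseen, yet p_jm is nonzero.
counts = train_counts("the cat sat on the mat the cat ate".split())
print(p_jm("on", "the", "cat", counts, (0.2, 0.3, 0.5)))
print(estimate_lambdas("the cat sat on the cat".split(), counts))
```

Because the same three weights are shared by every history, the EM loop only has to track three expected counts per iteration.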
---
## References