# Jelinek-Mercer smoothing

- If $p\left(w_{3} \mid w_{1}, w_{2}\right)=0$, using less specific histories may help
- Here, less specific means shorter: $p_{JM}\left(w_{3} \mid w_{1}, w_{2}\right)=\lambda_{3} p\left(w_{3} \mid w_{1}, w_{2}\right)+\lambda_{2} p\left(w_{3} \mid w_{2}\right)+\lambda_{1} p\left(w_{3}\right)$ where $\lambda_{3}+\lambda_{2}+\lambda_{1}=1$
- $p_{JM}$ is a weighted average of $n$-gram models of different orders
- Also known as interpolation smoothing
- $\lambda$ values can be estimated using [[Expectation Maximization]] on some held-out data
- The same $\lambda$ values are used for all $n$-grams
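
The interpolation and the EM estimate of the $\lambda$ values described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`train_counts`, `mle_probs`, `p_jm`, `em_lambdas`), the toy corpus, and the fixed iteration count are all assumptions made for the example. The component models are plain maximum-likelihood estimates, and the bigram history count is approximated by the unigram count of $w_2$.

```python
from collections import Counter


def train_counts(tokens):
    """Collect unigram, bigram, and trigram counts from a token list."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri


def mle_probs(w1, w2, w3, counts, total):
    """Return (p(w3), p(w3|w2), p(w3|w1,w2)) as MLE estimates.

    An unseen history yields 0.0 instead of a division by zero.
    """
    uni, bi, tri = counts
    p1 = uni[w3] / total if total else 0.0
    p2 = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p3 = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return p1, p2, p3


def p_jm(w1, w2, w3, counts, total, lambdas):
    """Interpolated probability; lambdas = (l1, l2, l3) must sum to 1."""
    l1, l2, l3 = lambdas
    p1, p2, p3 = mle_probs(w1, w2, w3, counts, total)
    return l3 * p3 + l2 * p2 + l1 * p1


def em_lambdas(heldout, counts, total, iters=20):
    """Estimate (l1, l2, l3) by EM on held-out tokens (hypothetical helper)."""
    lambdas = [1 / 3, 1 / 3, 1 / 3]
    trigrams = list(zip(heldout, heldout[1:], heldout[2:]))
    for _ in range(iters):
        expected = [0.0, 0.0, 0.0]
        for w1, w2, w3 in trigrams:
            probs = mle_probs(w1, w2, w3, counts, total)
            mix = sum(l * p for l, p in zip(lambdas, probs))
            if mix == 0.0:
                continue  # no component model explains this token
            # E-step: responsibility of each component for this token
            for i in range(3):
                expected[i] += lambdas[i] * probs[i] / mix
        norm = sum(expected)
        if norm == 0.0:
            break
        # M-step: renormalise the expected counts into new weights
        lambdas = [e / norm for e in expected]
    return lambdas


train = "the cat sat on the mat the cat ate".split()
counts = train_counts(train)
total = len(train)

# The trigram "on the cat" never occurs in training, so its MLE is zero,
# but the interpolated estimate is still positive.
print(p_jm("on", "the", "cat", counts, total, (0.2, 0.3, 0.5)))
print(em_lambdas("the cat sat on the mat".split(), counts, total))
```

The weights are fitted on held-out data rather than on the training data because the trigram MLE always looks best on the text it was counted from, which would push $\lambda_{3}$ towards 1 and remove the smoothing effect.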