# Model Complexity and Occam's Razor
The evidence $P\left(D \mid \mathcal{H}_{i}\right)$ is the normalizing constant of Bayes' rule for the parameters $\mathbf{w}$ of model $\mathcal{H}_{i}$:
$
P\left(D \mid \mathcal{H}_{i}\right)=\int P\left(D \mid \mathbf{w}, \mathcal{H}_{i}\right) P\left(\mathbf{w} \mid \mathcal{H}_{i}\right) \mathrm{d} \mathbf{w}
$
For many problems the posterior $P\left(\mathbf{w} \mid D, \mathcal{H}_{i}\right) \propto P\left(D \mid \mathbf{w}, \mathcal{H}_{i}\right) P\left(\mathbf{w} \mid \mathcal{H}_{i}\right)$ has a strong peak at the most probable parameters $\mathbf{w}_{\mathrm{MP}}$. Then, taking for simplicity the one-dimensional case, the evidence can be approximated, using Laplace's method, by the height of the peak of the integrand $P\left(D \mid \mathbf{w}, \mathcal{H}_{i}\right) P\left(\mathbf{w} \mid \mathcal{H}_{i}\right)$ times its width, $\sigma_{w \mid D}$ :
$
P\left(D \mid \mathcal{H}_{i}\right) \simeq P\left(D \mid \mathbf{w}_{\mathrm{MP}}, \mathcal{H}_{i}\right) \times P\left(\mathbf{w}_{\mathrm{MP}} \mid \mathcal{H}_{i}\right) \, \sigma_{w \mid D} .
$
![[Model Evidence and Occam Factor.png]]
Thus the evidence is found by taking the best-fit likelihood that the model can achieve and multiplying it by an 'Occam factor', which is a term with magnitude less than one that penalizes $\mathcal{H}_{i}$ for having the parameter $\mathbf{w}$.
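To make this concrete, here is a minimal numerical sketch (the model, noise level, and prior width are illustrative assumptions, not taken from the text): for a one-parameter Gaussian model with a broad uniform prior, the peak-height-times-width estimate of the evidence can be checked against direct numerical integration.

```python
import numpy as np
from scipy.integrate import quad

# Toy one-parameter model (illustrative assumptions): n observations with
# Gaussian noise of known std `noise`, and a uniform prior on w of width
# sigma_w centred at zero.
rng = np.random.default_rng(0)
noise, n, sigma_w = 1.0, 10, 20.0
data = rng.normal(2.0, noise, size=n)                  # "true" w = 2.0

def likelihood(w):
    """P(D | w, H_i): product of Gaussian densities of the observations."""
    return np.prod(np.exp(-(data - w) ** 2 / (2 * noise ** 2))
                   / np.sqrt(2 * np.pi * noise ** 2))

prior_density = 1.0 / sigma_w                          # P(w | H_i) inside the interval

# Exact evidence: integrate the likelihood times the prior over the prior range.
evidence_exact, _ = quad(lambda w: likelihood(w) * prior_density,
                         -sigma_w / 2, sigma_w / 2)

# Laplace approximation: height of the integrand's peak times its width.
w_mp = data.mean()                                     # most probable parameter
# Effective width of the peak: sqrt(2*pi) times the posterior std noise/sqrt(n),
# i.e. the width that makes (peak height) x (width) equal the Gaussian integral.
sigma_w_given_d = np.sqrt(2 * np.pi) * noise / np.sqrt(n)
evidence_laplace = likelihood(w_mp) * prior_density * sigma_w_given_d

print(evidence_exact, evidence_laplace)                # the two agree closely
```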
The quantity $\sigma_{w \mid D}$ is the posterior uncertainty in $\mathbf{w}$. Suppose for simplicity that the prior $P\left(\mathbf{w} \mid \mathcal{H}_{i}\right)$ is uniform on some large interval $\sigma_{w}$, representing the range of values of $\mathbf{w}$ that were possible a priori, according to $\mathcal{H}_{i}$. Then $P\left(\mathbf{w}_{\mathrm{MP}} \mid \mathcal{H}_{i}\right)=1 / \sigma_{w}$, and
$
\text { Occam factor }=\frac{\sigma_{w \mid D}}{\sigma_{w}},
$
i.e. the ratio of the posterior accessible range of the parameter to its prior accessible range. <mark style="background: #FF5582A6;">The magnitude of the Occam factor is thus a measure of the complexity of the model.</mark>
In contrast to alternative measures of model complexity, the Occam factor for a model is straightforward to evaluate: it simply depends on the error bars on the parameters, which we already evaluated when fitting the model to the data.
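A short sketch in the same spirit (reusing the illustrative numbers from the sketch above, which are assumptions rather than values from the text) shows the Occam factor directly, and how it penalizes a model that spreads its prior over a wider range while achieving the same best fit.

```python
import numpy as np

# Same illustrative numbers as in the sketch above (assumptions): n observations
# with noise std `noise`, and a uniform prior of width sigma_w.
n, noise, sigma_w = 10, 1.0, 20.0
sigma_w_given_d = np.sqrt(2 * np.pi) * noise / np.sqrt(n)   # posterior peak width

occam_factor = sigma_w_given_d / sigma_w
print(occam_factor)        # ~0.04: the price the model pays for its free parameter

# A rival model whose prior is ten times wider reaches the same best-fit
# likelihood, but its Occam factor, and hence its evidence, is ten times smaller.
print((sigma_w_given_d / (10 * sigma_w)) / occam_factor)    # 0.1
```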