# Basis Functions

The simplest linear model for regression is

$ y(\mathbf{x}, \mathbf{w})=w_{0}+w_{1} x_{1}+\ldots+w_{D} x_{D} $

The key property of this model is that it is a linear function of the parameters. However, it is also a linear function of the input variables, which imposes a significant limitation on the model. Basis functions $\phi_{j}(\mathbf{x})$ extend this class of models by considering linear combinations of fixed, hand-picked nonlinear functions of the input variables:

$ y(\mathbf{x}, \mathbf{w})=w_{0}+\sum_{j=1}^{M-1} w_{j} \phi_{j}(\mathbf{x}) $

or, in vector form,

$ y(\mathbf{x}, \mathbf{w})=\mathbf{w}^{\mathrm{T}} \boldsymbol{\phi}(\mathbf{x}) $

where $ \boldsymbol{\phi}(\mathbf{x})=\left(\phi_{0}(\mathbf{x}), \phi_{1}(\mathbf{x}), \ldots, \phi_{M-1}(\mathbf{x})\right)^{\mathrm{T}} $ and $ \mathbf{w}=\left(w_{0}, \ldots, w_{M-1}\right)^{\mathrm{T}} $, with the dummy basis function $\phi_{0}(\mathbf{x})=1$ absorbing the bias $w_{0}$. Note that $\boldsymbol{\phi}: \mathbb{R}^{D} \rightarrow \mathbb{R}^{M}$ and $\phi_{j}: \mathbb{R}^{D} \rightarrow \mathbb{R}$.

Basis functions allow modeling nonlinearity in the data while keeping linearity in the parameters, which greatly simplifies the analysis of these models. Using linear combinations of different basis functions, we can construct complex functions and still use linear regression.

Examples of basis functions:

**Polynomial basis functions:**

$ \phi_{j}(x)=x^{j} $

Their limitation is that they are global functions of the input variable, so changes in one region of input space affect all other regions.

**Gaussian basis functions:**

$ \phi_{j}(x)=\exp \left\{-\frac{\left(x-\mu_{j}\right)^{2}}{2 s^{2}}\right\} $

where $\mu_{j}$ controls the location and $s$ the spatial scale. Note that they are not required to have a probabilistic interpretation, so the normalization coefficient is unimportant.

**Sigmoidal basis functions:**

$ \phi_{j}(x)=\sigma\left(\frac{x-\mu_{j}}{s}\right) $

where $\sigma(a)=\frac{1}{1+e^{-a}}$ is the logistic sigmoid function.

![[basis-functions.jpg]]

## Advantages

- Closed-form solution for the least-squares problem: $\mathbf{w}_{\mathrm{ML}}=\left(\boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}\right)^{-1} \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{t}$, where $\boldsymbol{\Phi}$ is the design matrix with entries $\Phi_{n j}=\phi_{j}\left(\mathbf{x}_{n}\right)$ (illustrated in the sketch below)
- Tractable Bayesian treatment
- Nonlinear models mapping input variables to target variables through basis functions

## Limitations

- Assumption: the basis functions $\phi_{j}(\mathbf{x})$ are fixed in advance, not learned from the data.
- Curse of dimensionality: to cover an input space of growing dimension $D$, the number of basis functions typically needs to grow rapidly, often exponentially.
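To make the closed-form solution concrete, here is a minimal sketch in Python (assuming NumPy) that builds design matrices for the three basis families above and fits the weights by least squares. The centers `mu`, the width `s`, and the sinusoidal toy target are illustrative choices, not part of these notes.

```python
import numpy as np

def polynomial_basis(x, M):
    # Columns phi_j(x) = x^j for j = 0..M-1; the j = 0 column is the bias phi_0 = 1.
    return np.vander(x, M, increasing=True)

def gaussian_basis(x, mu, s):
    # Columns phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)); a constant column is
    # prepended so that w_0 still acts as the bias.
    Phi = np.exp(-((x[:, None] - mu[None, :]) ** 2) / (2 * s**2))
    return np.hstack([np.ones((len(x), 1)), Phi])

def sigmoidal_basis(x, mu, s):
    # Columns phi_j(x) = sigma((x - mu_j) / s) with the logistic sigmoid.
    Phi = 1.0 / (1.0 + np.exp(-(x[:, None] - mu[None, :]) / s))
    return np.hstack([np.ones((len(x), 1)), Phi])

def fit_least_squares(Phi, t):
    # Closed-form least-squares weights w_ML; lstsq solves the normal
    # equations in a numerically stable way.
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    return w

# Usage: fit a nonlinear target with a model that is linear in w.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

Phi = gaussian_basis(x, mu=np.linspace(0.0, 1.0, 9), s=0.1)
w = fit_least_squares(Phi, t)
y = Phi @ w  # predictions y(x, w) = w^T phi(x) at the training inputs
```

Swapping `gaussian_basis` for `polynomial_basis` or `sigmoidal_basis` changes only the design matrix; the fitting step is identical, which is exactly the point of keeping the model linear in the parameters.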