Convolution - Notes on AI

# Convolution The convolution of two functions $f$ and $g$ is denoted by $*$ as the integral of the product of the two functions after one is reversed and shifted $ (f \cdot g)(t) \triangleq \int_{-\infty}^{\infty} f(\tau) g(t-\tau) d \tau=\int_{-\infty}^{\infty} f(t-\tau) g(\tau) d \tau $ For images $a_{r c}=x * w=\sum_{i=-a}^{a} \sum_{j=-b}^{b} x_{r-i, c-j} \cdot w_{i j}$ A convolution operation is both linear and shift invariant. Convolution has commutative property. The convolution operation reaches a maximum, only in cases where the filter is mostly similar to a specific section of the input signal. ## Convolution can be represented by Cirulant matrices In fully connected layer, the weight matrix is a dense matrix and full matrix multiplication takes place. However, in convolutional layer, the weight matrix shares elements and convolution operation can be seen as block diagonal matrix multiplication. Convolution can therefore be represented as matrices $C(w)$ called **circulant matrices**. These are multidiagonal matrices with each elements on each diagonal having the same value. ![[circulant-matix.jpg]] [Image Credit](https://towardsdatascience.com/deriving-convolution-from-first-principles-4ff124888028) One special property of circulant matrices is that unlike normal [[Linear Algebra#Basic Matrix Algebra|matrix multiplication]] *circulant matrix multiplications commute*. i.e $\mathrm{C}(\mathrm{w}) \mathrm{C}(\mathrm{u})=\mathrm{C}(\mathrm{u}) \mathrm{C}(\mathrm{w})$ This is because convolution operation is itself commutable. Note that this is amazing because circulant matrix is a special case of [[Linear Algebra#Toeplitz Matrices]] where each row is a cyclic shift of the previous row. This enables very fast computations i.e. multiplication can be done in O(nlogn)! ## The shift operator We apply convolutions by shifting the filters. For $w=[0,1, \ldots, 0]=C(w)$ the right-shift operator - Similar to convolution: shift once' to the right - Transpose for left-shift The shift operator is an orthogonal matrix. The shift operator is also circulant. ![[shiftoperator.jpg]] ## Circulant matrices enable Translation equivariance Circulant matrices enable translation equivariance to convolutions - Change the location of the input - The results will be the same (but shifted) For example, with input [1, 0, 0] and circulant matrix [[2, 1, 0], [0, 2, 1], [1, 0, 2]], the output is [2, 0, 1]. Shifting the input to [0, 1, 0] gives output [1, 2, 0]—the same pattern shifted by one position. Visual example with gaussian filter + right shift: ![[translation-eq-ciculant-matrices.jpg]] ## All circulant matrices have the same eigenvectors Convolution operations can be represented as multiplication by circulant matrices. The key insight is that all circulant matrices share the same eigenvectors—the DFT basis vectors—though they have different eigenvalues. The k-th eigenvector of any n×n circulant matrix is: $e^{(k)}=\left[\begin{array}{c}\omega_n^{0 \cdot k} \\ \omega_n^{1 \cdot k} \\ \omega_n^{2 \cdot k} \\ \vdots \\ \omega_n^{(n-1) \cdot k}\end{array}\right]$ where $\omega_n = \exp\left(\frac{2\pi i}{n}\right)$ The eigenvector matrix $\Phi = [e^{(0)} \; e^{(1)} \; \cdots \; e^{(n-1)}]$ is the DFT matrix. **Why this matters:** Since any circulant matrix C can be diagonalized as $C = \Phi D \Phi^{-1}$ where D contains the eigenvalues, convolution becomes element-wise multiplication in the frequency domain. This is the convolution theorem: $\text{conv}(x,h) = \text{IDFT}(\text{DFT}(x) \odot \text{DFT}(h))$, enabling fast O(n log n) convolution via FFT instead of O(n²) direct computation. ![[dft-conv.png]] ## References 1. Lecture 7.3, UvA Deep Learning course 2020 2. Circulant matrices and convolution intuition https://towardsdatascience.com/deriving-convolution-from-first-principles-4ff124888028