# Linear Algebra
## Basic Concepts
### Basis vectors
Basis vectors are the reference vectors that define the coordinate directions of a vector space.
Any time we describe a vector numerically, the numbers implicitly depend on the choice we are making about the basis vectors. $\hat{i}$ and $\hat{j}$ are the standard basis vectors in 2D.
Formally, a basis is a set of linearly independent (non-collinear) vectors that spans the full space.
### Span of vectors
The span of vectors $\vec{v}$ and $\vec{w}$ is the set of all linear combinations $a\vec{v} + b\vec{w}$, where $a, b \in \mathbb{R}$.
Span gives the answer to the question: "What are all the possible vectors you can reach with only scalar multiplication and vector addition?"
If $\vec{v}$ and $\vec{w}$ "line up", i.e. point along the same direction, their span is just a line. If not, their span is the whole plane.
If they "line up", they are formally called "linearly dependent".
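A small numpy check of this (the vectors below are arbitrary examples): the rank of the matrix with $\vec{v}$ and $\vec{w}$ as columns is 1 when they line up, 2 when they span the plane.
```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([2.0, 4.0])          # w = 2v, so the span is just a line

# Rank 1 -> linearly dependent; rank 2 -> the span is the whole plane
print(np.linalg.matrix_rank(np.column_stack([v, w])))   # 1
```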
### Linear transformations and matrices
An $n\times n$ matrix is a compact representation of a linear transformation of an $n$-dimensional vector space.
$
\begin{bmatrix}
3 & 2 \\
-2 & 1
\end{bmatrix}
$
Here $[3,-2]^T$ and $[2,1]^T$ represent the new positions of the basis vectors $\hat{i}$ and $\hat{j}$ respectively when the above transformation matrix is applied.
Matrices can be multiplied to compose transformations into a single matrix. The transformations are applied from right to left; for example, the combined transformation $\mathbf{A}\mathbf{B}\mathbf{C}$ is equivalent to transforming by $\mathbf{C}$, then $\mathbf{B}$, and then $\mathbf{A}$. This is why the order of matrix multiplication matters.
Thus matrices can be thought of as linear operators acting on vectors.
![[transformation-matrices.jpg]]
Note: **Affine transformations are combinations of linear transformations and translations.**
In practice, homogeneous forms are also used, e.g. representing 2D affine transformations as 3x3 matrices:
![[homogenous-transformations.jpg]]
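A minimal numpy sketch of these ideas (the rotation, scaling, and translation below are arbitrary examples): composing transformations is just matrix multiplication applied right to left, and an affine transform can be packed into a single homogeneous 3x3 matrix.
```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotate 90 degrees
S = np.array([[2.0, 0.0],
              [0.0, 1.0]])                        # scale x by 2

v = np.array([1.0, 0.0])
print(R @ (S @ v), (R @ S) @ v)    # same result: composition = matrix product
print(np.allclose(R @ S, S @ R))   # False: order matters

# Homogeneous (affine) form: rotation followed by translation by t
t = np.array([3.0, 1.0])
H = np.eye(3)
H[:2, :2] = R
H[:2, 2] = t
print(H @ np.array([1.0, 0.0, 1.0]))   # [3., 2., 1.] -> rotated then translated point
```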
### Matrix-vector operation properties
$
\begin{aligned}
(\mathbf{A B})^{-1} & =\mathbf{B}^{-1} \mathbf{A}^{-1} \\
(\mathbf{A B C} \ldots)^{-1} & =\ldots \mathbf{C}^{-1} \mathbf{B}^{-1} \mathbf{A}^{-1} \\
\left(\mathbf{A}^T\right)^{-1} & =\left(\mathbf{A}^{-1}\right)^T \\
(\mathbf{A}+\mathbf{B})^T & =\mathbf{A}^T+\mathbf{B}^T \\
(\mathbf{A B})^T & =\mathbf{B}^T \mathbf{A}^T \\
(\mathbf{A B C} \ldots)^T & =\ldots \mathbf{C}^T \mathbf{B}^T \mathbf{A}^T \\
\left(\mathbf{A}^H\right)^{-1} & =\left(\mathbf{A}^{-1}\right)^H \\
(\mathbf{A}+\mathbf{B})^H & =\mathbf{A}^H+\mathbf{B}^H \\
(\mathbf{A B})^H & =\mathbf{B}^H \mathbf{A}^H \\
(\mathbf{A B C} \ldots)^H & =\ldots \mathbf{C}^H \mathbf{B}^H \mathbf{A}^H
\end{aligned}
$
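A quick numerical sanity check of a few of the identities above, using random matrices:
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A)))  # True
print(np.allclose((A @ B).T, B.T @ A.T))                                       # True
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))                     # True
```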
### Inverse of Matrices
The inverse $\mathbf{A}^{-1}$ of a matrix $\mathbf{A} \in \mathbb{C}^{n \times n}$ is defined such that
$
\mathbf{A A}^{-1}=\mathbf{A}^{-1} \mathbf{A}=\mathbf{I},
$
where $\mathbf{I}$ is the $n \times n$ identity matrix. If $\mathbf{A}^{-1}$ exists, $\mathbf{A}$ is said to be nonsingular. Otherwise, $\mathbf{A}$ is said to be singular.
Some properties:
$
(\mathbf{A B})^{-1}=\mathbf{B}^{-1} \mathbf{A}^{-1}
$
For any square matrix $A \in \mathbb{R}^{n \times n}$ it holds that $A$ is invertible if and only if $\operatorname{det}(\boldsymbol{A}) \neq 0$.
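A short numpy illustration (with hand-picked example matrices) of the determinant test for invertibility:
```python
import numpy as np

A = np.array([[3.0, 2.0],
              [-2.0, 1.0]])
print(np.linalg.det(A))                                 # 7.0 -> nonsingular
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))     # True: A A^{-1} = I

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])                              # second row is 2x the first
print(np.isclose(np.linalg.det(B), 0.0))                # True -> singular, no inverse
```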
### Rank
When a matrix transforms a space, the number of dimensions of the output space is given by the rank of the matrix. Example: if a $3\times 3$ matrix has rank 2, the matrix squishes 3D space onto a 2D plane.
The set of vectors that land on the origin under the transformation is called the kernel (or null space) of the matrix.
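A small numpy example (a hand-picked rank-deficient matrix) showing the rank and a basis for the kernel via the SVD:
```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],     # = 2 x row 1
              [1.0, 0.0, 1.0]])

rank = np.linalg.matrix_rank(A)
print(rank)                        # 2 -> 3D space is squished onto a plane

# Right singular vectors with ~zero singular values span the kernel
U, s, Vt = np.linalg.svd(A)
kernel = Vt[rank:]                 # rows spanning the null space
print(np.allclose(A @ kernel.T, 0))   # True
```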
### Change of basis
We can change the basis of a space using change of basis matrix.
Let's say $\mathbf{A}$ is a matrix whose columns are the new basis vectors (written in the old coordinates). Then the coordinates of a vector in the new basis, $\mathbf{v}_{new}$, are given by
$
\mathbf{v}_{new}= \mathbf{A}^{-1}\mathbf{v}
$
$\mathbf{v}$ is the vector in the old basis. This matrix $\mathbf{A}$ is called the change of basis matrix.
Let's say we have a matrix $\mathbf{D}$ representing a transformation in the old basis. We can express it in the new basis as $\mathbf{A}^{-1} \mathbf{D} \mathbf{A}$.
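A short numpy sketch of both steps (the basis and transformation below are arbitrary examples):
```python
import numpy as np

# Columns of A are the new basis vectors written in the old coordinates
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])
v = np.array([3.0, 2.0])            # a vector in the old coordinates

v_new = np.linalg.inv(A) @ v        # the same vector expressed in the new basis
print(np.allclose(A @ v_new, v))    # True: converting back recovers v

# A transformation D given in the old basis, expressed in the new basis
D = np.array([[0.0, -1.0],
              [1.0,  0.0]])
D_new = np.linalg.inv(A) @ D @ A
```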
### Eigenvalues and eigenvectors
Most vectors get knocked off their span when a transformation is applied (i.e. when they are multiplied by the matrix), for example during a rotation. But some vectors may remain on their span (none do for a generic 2D rotation) and only get scaled. These vectors are called the eigenvectors of the matrix.
The factor by which an eigenvector gets squished or stretched is called the eigenvalue of that eigenvector.
Let's say we have a matrix $\mathbf{A}$ that represents a linear transformation.
Then $\vec{v}$ is an eigenvector and $\lambda$ its eigenvalue when
$
\mathbf{A}\vec{v} = \lambda \vec{v}
$
For eigenvalues, we have
$
\text{det}(\mathbf{A} - \lambda \mathbf{I}) = 0
$
this expression can be used to solve for the possible eigenvalues of the matrix $\mathbf{A}$.
**Note**: Eigenvalues aren't defined for rectangular matrices, but the singular values are closely related: for a rectangular matrix $\mathbf{M}$, the right and left singular vectors are the eigenvectors of $\mathbf{M}^T\mathbf{M}$ and $\mathbf{M}\mathbf{M}^T$ respectively, and the singular values are the square roots of their (shared) nonzero eigenvalues.
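A quick numerical check of this relationship, using a random rectangular matrix:
```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))

singular_values = np.linalg.svd(M, compute_uv=False)
eigvals_MtM = np.sort(np.linalg.eigvalsh(M.T @ M))[::-1]   # M^T M is symmetric

print(np.allclose(singular_values, np.sqrt(eigvals_MtM)))  # True
```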
For example, let $A=\left[\begin{array}{ll} -5 & 2 \\ -7 & 4 \end{array}\right]$. First we find eigenvalues of A by solving the equation $\operatorname{det}(\lambda I-A)=0$. This gives:
$
\begin{aligned}
\operatorname{det}\left(\lambda\left[\begin{array}{ll}
1 & 0 \\
0 & 1
\end{array}\right]-\left[\begin{array}{ll}
-5 & 2 \\
-7 & 4
\end{array}\right]\right) & =0 \\
\operatorname{det}\left[\begin{array}{cc}
\lambda+5 & -2 \\
7 & \lambda-4
\end{array}\right] & =0
\end{aligned}
$
Computing the determinant as usual, the result is $\lambda^2+\lambda-6=0$. Solving this equation, we find that $\lambda_1=2$ and $\lambda_2=-3$.
Now we need to find the basic eigenvectors for each $\lambda$. First we will find the eigenvectors for $\lambda_1=2$. We wish to find all vectors $X \neq 0$ such that $A X=2 X$. These are the solutions to $(2 I-A) X=0$.
$
\begin{array}{r}
\left(2\left[\begin{array}{ll}
1 & 0 \\
0 & 1
\end{array}\right]-\left[\begin{array}{ll}
-5 & 2 \\
-7 & 4
\end{array}\right]\right)\left[\begin{array}{l}
x \\
y
\end{array}\right]=\left[\begin{array}{l}
0 \\
0
\end{array}\right] \\
{\left[\begin{array}{ll}
7 & -2 \\
7 & -2
\end{array}\right]\left[\begin{array}{l}
x \\
y
\end{array}\right]=\left[\begin{array}{l}
0 \\
0
\end{array}\right]}
\end{array}
$
This gives the basic eigenvector for $\lambda_1=2$ as $\left[\begin{array}{l}2 \\ 7\end{array}\right]$.
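The same example can be verified with numpy (eigenvectors are returned normalized, so we check the defining equation rather than the raw entries):
```python
import numpy as np

A = np.array([[-5.0, 2.0],
              [-7.0, 4.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                      # [ 2. -3.] (order may vary)

v = np.array([2.0, 7.0])                # basic eigenvector found for lambda = 2
print(np.allclose(A @ v, 2 * v))        # True
```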
### Trace
The trace of a square matrix is the sum of its diagonal elements. An important property is that the trace also equals the sum of the matrix's eigenvalues:
$
\begin{gathered}
\operatorname{Tr}(\mathbf{A})=\sum_i A_{i i} \\
\operatorname{Tr}(\mathbf{A})=\sum_i \lambda_i, \quad \lambda_i=\operatorname{eig}(\mathbf{A})
\end{gathered}
$
Because it equals the sum of the eigenvalues, the trace is an aggregate measure of the scaling a linear transformation applies along its eigendirections, and it is invariant under a change of basis (similarity transformations).
Some properties of trace are:
$
\begin{aligned}
\operatorname{Tr}(\mathbf{A}) & =\operatorname{Tr}\left(\mathbf{A}^T\right) \\
\operatorname{Tr}(\mathbf{A B}) & =\operatorname{Tr}(\mathbf{B A}) \\
\operatorname{Tr}(\mathbf{A}+\mathbf{B}) & =\operatorname{Tr}(\mathbf{A})+\operatorname{Tr}(\mathbf{B}) \\
\operatorname{Tr}(\mathbf{A B C}) & =\operatorname{Tr}(\mathbf{B C A})=\operatorname{Tr}(\mathbf{C A B}) \\
\mathbf{a}^T \mathbf{a} & =\operatorname{Tr}\left(\mathbf{a a}^T\right)
\end{aligned}
$
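A numerical spot-check of two of these identities, using random matrices:
```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                 # True
print(np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real))   # True
```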
### Determinant
When a matrix transforms a space, the factor by which areas (volumes, in higher dimensions) are scaled is given by its determinant.
If the determinant of a matrix is 0, it squishes the space into a smaller dimension than the input dimension.
A negative determinant means the orientation of the transformed space is inverted (flipped).
The determinant is multiplicative: $\det(\mathbf{U}\mathbf{V})= \det(\mathbf{U})\det(\mathbf{V})$.
The determinant of an inverse matrix is the reciprocal of the determinant of the original matrix, i.e. $\operatorname{det}\left(\mathbf{U}^{-1}\right)=\frac{1}{\operatorname{det}(\mathbf{U})}$.
Some properties of determinants are:
Let $\mathbf{A}$ be an $n \times n$ matrix.
$
\begin{aligned}
\operatorname{det}(\mathbf{A}) & =\prod_i \lambda_i \quad \lambda_i=\operatorname{eig}(\mathbf{A}) \\
\operatorname{det}(c \mathbf{A}) & =c^n \operatorname{det}(\mathbf{A}), \quad \text { if } \mathbf{A} \in \mathbb{R}^{n \times n} \\
\operatorname{det}\left(\mathbf{A}^T\right) & =\operatorname{det}(\mathbf{A}) \\
\operatorname{det}(\mathbf{A B}) & =\operatorname{det}(\mathbf{A}) \operatorname{det}(\mathbf{B}) \\
\operatorname{det}\left(\mathbf{A}^{-1}\right) & =1 / \operatorname{det}(\mathbf{A}) \\
\operatorname{det}\left(\mathbf{A}^n\right) & =\operatorname{det}(\mathbf{A})^n \\
\operatorname{det}\left(\mathbf{I}+\mathbf{u v ^ { T }}\right) & =1+\mathbf{u}^T \mathbf{v}
\end{aligned}
$
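A spot-check of a few of these identities, including the last one (the matrix determinant lemma), with random matrices and vectors:
```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
u = rng.standard_normal(n)
v = rng.standard_normal(n)

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))      # True
print(np.isclose(np.linalg.det(np.eye(n) + np.outer(u, v)), 1 + u @ v))       # True
```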
### The special case of 2x2 matrix
Consider the matrix $\mathbf{A}$
$
\mathbf{A}=\left[\begin{array}{ll}
A_{11} & A_{12} \\
A_{21} & A_{22}
\end{array}\right]
$
Determinant and trace
$
\begin{gathered}
\operatorname{det}(\mathbf{A})=A_{11} A_{22}-A_{12} A_{21} \\
\operatorname{Tr}(\mathbf{A})=A_{11}+A_{22}
\end{gathered}
$
Eigenvalues
$
\lambda^2-\lambda \cdot \operatorname{Tr}(\mathbf{A})+\operatorname{det}(\mathbf{A})=0
$
$
\begin{array}{cl}
\lambda_1=\frac{\operatorname{Tr}(\mathbf{A})+\sqrt{\operatorname{Tr}(\mathbf{A})^2-4 \operatorname{det}(\mathbf{A})}}{2} & \lambda_2=\frac{\operatorname{Tr}(\mathbf{A})-\sqrt{\operatorname{Tr}(\mathbf{A})^2-4 \operatorname{det}(\mathbf{A})}}{2} \\
\lambda_1+\lambda_2=\operatorname{Tr}(\mathbf{A}) & \lambda_1 \lambda_2=\operatorname{det}(\mathbf{A})
\end{array}
$
Eigenvectors
$
\mathbf{v}_1 \propto\left[\begin{array}{c}
A_{12} \\
\lambda_1-A_{11}
\end{array}\right] \quad \mathbf{v}_2 \propto\left[\begin{array}{c}
A_{12} \\
\lambda_2-A_{11}
\end{array}\right]
$
Inverse
$
\mathbf{A}^{-1}=\frac{1}{\operatorname{det}(\mathbf{A})}\left[\begin{array}{cc}
A_{22} & -A_{12} \\
-A_{21} & A_{11}
\end{array}\right]
$
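A quick check of the closed-form 2x2 eigenvalue formulas against numpy (the example matrix is arbitrary and happens to have complex eigenvalues, so a complex square root is used):
```python
import numpy as np

A = np.array([[3.0, 2.0],
              [-2.0, 1.0]])
tr, det = np.trace(A), np.linalg.det(A)
disc = np.sqrt(complex(tr**2 - 4 * det))          # complex sqrt handles both cases
lam1, lam2 = (tr + disc) / 2, (tr - disc) / 2

print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.array([lam1, lam2]))))   # True
```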
## Special Matrices
### Symmetric matrices
A matrix is called symmetric if $A = A^{T}$.
### Orthogonal matrices
A square matrix $\mathbf{Q}$ is called orthogonal if $\mathbf{Q}^{-1} = \mathbf{Q}^T$. Basic properties of an orthogonal matrix $\mathbf{Q}$:
$
\begin{aligned}
\mathbf{Q}^{-1} & =\mathbf{Q}^T \\
\mathbf{Q}^{-T} & =\mathbf{Q} \\
\mathbf{Q Q}^T & =\mathbf{I} \\
\mathbf{Q}^T \mathbf{Q} & =\mathbf{I} \\
\operatorname{det}(\mathbf{Q}) & = \pm 1
\end{aligned}
$
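A rotation matrix is a simple example of an orthogonal matrix; a short numpy check of the properties above:
```python
import numpy as np

theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))        # True: Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))     # True: Q^{-1} = Q^T
print(np.isclose(np.linalg.det(Q), 1.0))      # det is +1 for a pure rotation
```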
### Diagonalizable matrices
If your transformation (matrix) has enough eigenvectors that you can choose a set of them that spans the full space, then you can change the basis of your coordinate system to these eigenvectors. This new basis is called an "eigenbasis". A nice consequence is that, expressed in the eigenbasis, the transformation matrix is always a diagonal matrix.
In this diagonal matrix, each column shows what happens to one eigenvector (one basis vector of the eigenbasis): it is simply scaled by the eigenvalue sitting on the diagonal.
For example, let $\mathbf{A}$ be the change of basis matrix whose columns are the eigenvectors of our transformation matrix $\mathbf{D}$. Then $\mathbf{D}$ expressed in the eigenbasis, $\mathbf{A}^{-1}\mathbf{D}\mathbf{A}$, is guaranteed to be a diagonal matrix with the eigenvalues on its diagonal.
Diagonal matrices are much easier to handle: computing the $N$-th power of a diagonal matrix boils down to raising each diagonal entry to the $N$-th power, so $\mathbf{D}^N = \mathbf{A}\,\mathrm{diag}(\lambda_1^N,\ldots,\lambda_n^N)\,\mathbf{A}^{-1}$.
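A short numpy sketch of this trick (the matrix below is an arbitrary diagonalizable example):
```python
import numpy as np

D = np.array([[2.0, 1.0],
              [0.0, 3.0]])                 # an arbitrary diagonalizable matrix
eigenvalues, A = np.linalg.eig(D)          # columns of A are the eigenvectors

N = 5
D_power = A @ np.diag(eigenvalues**N) @ np.linalg.inv(A)   # D^N via the eigenbasis

print(np.allclose(D_power, np.linalg.matrix_power(D, N)))  # True
```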
### Orthogonally diagonalizable matrices
A matrix $A$ is orthogonally diagonalizable when it can be diagonalized using orthonormal eigenvectors (perpendicular unit vectors). Formally, a matrix $A$ is said to be orthogonally diagonalizable when an orthogonal matrix $P$ can be found such that $P^{-1} A P=P^T A P$ is a diagonal matrix.
More intuitively: the linear transformation $A$ simply scales along perpendicular axes without any "mixing" between directions. Geometrically, imagine stretching or shrinking space along perpendicular coordinate axes; that's what these matrices do. All real symmetric matrices have this property (spectral theorem).
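A small numpy illustration with a hand-picked symmetric matrix: `np.linalg.eigh` returns orthonormal eigenvectors, so conjugating by them diagonalizes the matrix.
```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigenvalues, P = np.linalg.eigh(A)

print(np.allclose(P.T @ P, np.eye(2)))                   # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))    # True: diagonalized
```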
### Positive definite and Semi-positive definite matrices
- A square matrix is called positive definite if it is symmetric and all its eigenvalues $\lambda$ are positive, that is $\lambda > 0$. These matrices arise whenever optimization (maximum and minimum) problems are encountered.
- If A is positive definite, then it is invertible and det(A) > 0.
- If all $\lambda \geq 0$, then the matrix is called positive semi-definite.
**Key Properties:**
- Always invertible with $\det(A) > 0$
- Has a unique Cholesky decomposition $A = LL^T$
- All diagonal entries $a_{ii} > 0$
- Defines an inner product $\langle \mathbf{x}, \mathbf{y} \rangle_A = \mathbf{x}^T A \mathbf{y}$ and induces a norm $\|\mathbf{x}\|_A = \sqrt{\mathbf{x}^T A \mathbf{x}}$
**Why They Matter:**
- **Optimization:** Guarantee that quadratic functions $f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$ have a unique global minimum
- **Statistics:** Covariance matrices are positive semidefinite; full rank ones are positive definite
- **Numerical Analysis:** Ensure stability and convergence of many algorithms
- **Machine Learning:** Kernels in SVMs must be positive definite for valid Hilbert spaces
- **Geometry:** Define ellipsoids and metrics in multidimensional spaces
The quadratic form interpretation provides geometric intuition: the matrix "stretches" space in all directions (no negative eigenvalues that would cause reflections).
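Two common ways to test positive definiteness in practice, sketched with a hand-picked symmetric matrix: check the eigenvalues, or attempt a Cholesky factorization (which succeeds only for positive definite matrices).
```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                  # symmetric with positive eigenvalues

print(np.all(np.linalg.eigvalsh(A) > 0))    # True

try:
    L = np.linalg.cholesky(A)               # A = L L^T
    print(np.allclose(L @ L.T, A))          # True
except np.linalg.LinAlgError:
    print("not positive definite")
```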
### Toeplitz Matrices
A Toeplitz matrix $\mathbf{T}$ is a matrix in which the elements along each diagonal are the same. In the $n \times n$ square case, it has the following structure:
$
\mathbf{T}=\left[\begin{array}{cccc}
t_{11} & t_{12} & \cdots & t_{1 n} \\
t_{21} & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & t_{12} \\
t_{n 1} & \cdots & t_{21} & t_{11}
\end{array}\right]=\left[\begin{array}{cccc}
t_0 & t_1 & \cdots & t_{n-1} \\
t_{-1} & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & t_1 \\
t_{-(n-1)} & \cdots & t_{-1} & t_0
\end{array}\right]
$
The Toeplitz matrix has some computational advantages.
Computational complexities:
- Addition: O(n)
- Multiplication: O(n log n)
- Solving linear systems: O(n²)
- Matrix inversion (positive definite): O(n²)
Circulant matrices are a special case of Toeplitz matrices that arise in [[Convolution]] operations.
The inverse of a Toeplitz matrix is persymmetric. The product of two lower triangular Toeplitz matrices is a Toeplitz matrix.
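As a sketch of why multiplication is O(n log n): a Toeplitz matrix can be embedded in a circulant matrix, and circulant matrix-vector products reduce to FFTs. The matrix and vector below are random examples.
```python
import numpy as np
from scipy.linalg import toeplitz

def toeplitz_matvec_fft(c, r, x):
    """O(n log n) product of the Toeplitz matrix (first column c, first row r)
    with a vector x, via embedding into a circulant matrix and using the FFT."""
    n = len(c)
    col = np.concatenate([c, r[:0:-1]])            # first column of the (2n-1)-circulant
    x_pad = np.concatenate([x, np.zeros(n - 1)])
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(x_pad)).real
    return y[:n]

rng = np.random.default_rng(0)
n = 6
c = rng.standard_normal(n)                                # first column
r = np.concatenate([c[:1], rng.standard_normal(n - 1)])   # first row (r[0] == c[0])
x = rng.standard_normal(n)

T = toeplitz(c, r)
print(np.allclose(T @ x, toeplitz_matvec_fft(c, r, x)))   # True
```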
### Vandermonde Matrices
Vandermonde matrices are matrices with a specific power structure. For values $x_0, x_1, \ldots, x_{n-1}$, the Vandermonde matrix $V$ is:
$V = \begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
x_0 & x_1 & x_2 & \cdots & x_{n-1} \\
x_0^2 & x_1^2 & x_2^2 & \cdots & x_{n-1}^2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_0^{n-1} & x_1^{n-1} & x_2^{n-1} & \cdots & x_{n-1}^{n-1}
\end{bmatrix}$
**Why they're important:**
1. **Polynomial interpolation** - They're the key to finding unique polynomials through given points. If you want the polynomial $p(x)=c_0+c_1x+\dots+c_{n-1}x^{n-1}$ passing through points $(x_0,y_0), (x_1,y_1), \ldots$, the coefficients solve $V^T \mathbf{c} = \mathbf{y}$ with $V$ as written above (many texts define the Vandermonde matrix as the transpose, one row per point, in which case it is $V\mathbf{c}=\mathbf{y}$)
2. **Nice mathematical properties** - The determinant has a beautiful closed form: $\prod_{i<j}(x_j - x_i)$
3. **Invertibility** - When all $x_i$ values are distinct, $V$ is always invertible
**Where we see them:**
- **Signal processing**: Discrete Fourier Transform matrices are Vandermonde with roots of unity
- **Numerical analysis**: Polynomial fitting and interpolation algorithms
- **Statistics**: Polynomial regression design matrices
- **Coding theory**: Reed-Solomon error correction codes
- **Computer graphics**: Bézier curves and spline interpolation
- **Scientific computing**: Spectral methods and collocation techniques
They're fundamental because they bridge the gap between polynomials (which are easy to work with) and linear algebra (which gives us computational tools).
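A minimal interpolation sketch with numpy (the sample points are arbitrary); `np.vander(..., increasing=True)` builds one row per point, i.e. the transposed convention, so we solve $V\mathbf{c}=\mathbf{y}$ directly:
```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

V = np.vander(x, increasing=True)        # rows: [1, x_i, x_i^2, x_i^3]
c = np.linalg.solve(V, y)                # coefficients c_0 + c_1 x + c_2 x^2 + c_3 x^3

print(np.allclose(np.polyval(c[::-1], x), y))   # True: the polynomial hits every point
```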
## Matrix Factorizations
The following are known as the big six matrix factorizations:
1. [[Cholesky Factorization]]
2. [[LU Factorization]]
3. [[QR Factorization]]
4. [[Schur Decomposition]]
5. [[Eigen Decomposition]]
6. [[Singular Value Decomposition (SVD)]]
Other factorizations/decompositions
- [[Butterfly Factorization]]
## [[Matrix Calculus]]
## References
1. Orthogonal Diagonalizable matrices https://www.math.wustl.edu/~freiwald/309orthogdiag.pdf
2. 3Blue1Brown Essence of Linear Algebra https://www.youtube.com/watch?v=FCmH4MqbFGs
3. The Big Picture of Linear Algebra by Prof. Gilbert Strang https://www.youtube.com/watch?v=ggWYkes-n6E
4. The Big Six Matrix Factorizations https://nhigham.com/2022/05/18/the-big-six-matrix-factorizations/
5. The Art of Linear Algebra, Kenji Hiranabe 2021
6. Orthogonal Diagonalization https://math.emory.edu/~lchen41/teaching/2020_Fall/Section_8-2.pdf
7. The Matrix Cookbook https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf