.. _linear-algebra:

Linear algebra
==============
The following notes are based on Columbia University's Introduction to Numerical
Methods course.

Fundamentals
------------

Linear Operations
^^^^^^^^^^^^^^^^^

What does "linear" mean in this context? A map :math:`f` is linear if it
satisfies the following two properties:

1. Additivity: :math:`f(x + y) = f(x) + f(y)`
2. Homogeneity: :math:`f(\alpha x) = \alpha f(x)` for any scalar :math:`\alpha`

Core Matrix Properties
^^^^^^^^^^^^^^^^^^^^^^

What is a strictly diagonally dominant matrix? A matrix :math:`A` is strictly
diagonally dominant if for each row :math:`i`:

.. math::

   |a_{ii}| > \sum_{j \neq i} |a_{ij}|

In other words, the magnitude of the diagonal element is larger than the sum of
the magnitudes of all other elements in that row. Example:

.. math::

   \begin{bmatrix} 3 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -2 & 5 \end{bmatrix}

What is a positive definite matrix? A symmetric matrix :math:`A` is positive
definite if it satisfies any of the following equivalent conditions:

1. :math:`x^TAx > 0` for all nonzero vectors :math:`x`
2. All eigenvalues are positive
3. All leading principal minors are positive

A positive definite matrix :math:`A` that also satisfies
:math:`2|a_{ii}| > \sum_{j=1}^n |a_{ij}|` in every row (this is strict diagonal
dominance, since the sum includes the diagonal term) guarantees convergence of
the iterative methods discussed later.

Example of a positive definite matrix:

.. math::

   \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}

This matrix is positive definite because:

- It's symmetric
- Its eigenvalues (1 and 3) are both positive
- :math:`2|a_{ii}| > |a_{i1}| + |a_{i2}|` for each row :math:`i`

Basic Operations
----------------

Matrix Operations
^^^^^^^^^^^^^^^^^

How do you rotate a vector about the z-axis? A rotation about the z-axis by
angle :math:`\theta` is represented by the rotation matrix below. The matrix
multiplies a column vector to rotate it in the xy-plane:

.. math::

   \begin{bmatrix}
   \cos \theta & -\sin \theta & 0 \\
   \sin \theta & \cos \theta & 0 \\
   0 & 0 & 1
   \end{bmatrix}
   \begin{bmatrix} x \\ y \\ z \end{bmatrix}
   =
   \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}

What is matrix transposition and how is it written? The transpose :math:`A^T`
reflects a matrix across its main diagonal. Formally:

.. math::

   (A^T)_{ij} = A_{ji} \quad \text{for all } i,j

Properties:

- :math:`(A^T)^T = A`
- :math:`(AB)^T = B^T A^T`
- For symmetric matrices: :math:`A^T = A`

Complex Operations
^^^^^^^^^^^^^^^^^^

What is the complex conjugate of a matrix? Written as :math:`A^*`, it
conjugates each entry (changes :math:`i` to :math:`-i`). For example:

.. math::

   A = \begin{bmatrix} 1+2i & 3 \\ 4 & 5-i \end{bmatrix}, \qquad
   A^{*} = \begin{bmatrix} 1-2i & 3 \\ 4 & 5+i \end{bmatrix}

What is the Hermitian conjugate? The Hermitian conjugate (or adjoint)
:math:`A^{\dagger}` combines complex conjugation and transposition:

.. math::

   A^{\dagger} = (A^{*})^{T} = \begin{bmatrix} 1-2i & 4 \\ 3 & 5+i \end{bmatrix}

What is a Hermitian matrix? A matrix :math:`A` where :math:`A = A^{\dagger}`.
Key properties:

- Diagonal entries must be real
- Off-diagonal entries are complex conjugates: if :math:`a_{ij} = x+yi`, then :math:`a_{ji} = x-yi`
- Has real eigenvalues
- Has orthonormal eigenvectors
- Quadratic forms :math:`x^{\dagger}Ax` are real for any :math:`x`
- Is positive semidefinite if and only if all eigenvalues are :math:`\ge 0`
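The conjugate-transpose operations above are easy to experiment with
numerically. Below is a minimal NumPy sketch (not part of the course materials;
the Hermitian test matrix ``H`` is chosen here just for illustration) that forms
:math:`A^*` and :math:`A^{\dagger}` and checks that a Hermitian matrix has real
eigenvalues:

.. code-block:: python

   import numpy as np

   # The example matrix from the text
   A = np.array([[1 + 2j, 3],
                 [4,      5 - 1j]])

   A_conj = np.conj(A)       # complex conjugate A*
   A_dagger = A.conj().T     # Hermitian conjugate (adjoint) A†
   print(A_dagger)           # [[1.-2.j 4.+0.j]
                             #  [3.+0.j 5.+1.j]]

   # A Hermitian example: H equals its own adjoint, so its eigenvalues are real
   H = np.array([[2, 1 - 1j],
                 [1 + 1j, 3]])
   print(np.allclose(H, H.conj().T))   # True: H is Hermitian
   print(np.linalg.eigvalsh(H))        # real eigenvalues: [1. 4.]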
Matrix Analysis
---------------

Eigenvalues and Eigenvectors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What is an eigenvector, intuitively? An eigenvector :math:`v` is a nonzero
vector whose direction remains unchanged under a linear transformation
:math:`A`. The transformation only scales the vector by its eigenvalue
:math:`\lambda`:

.. math::

   Av = \lambda v

Key insight:

- eigenvector = direction preserved by :math:`A`
- eigenvalue = scaling factor applied to that direction

Why are eigenvalues important? Eigenvalues capture the essential character of a
transformation:

- In dynamics: define system stability and oscillation frequencies
- In quantum mechanics: represent observable energy states
- In data science: measure variance along principal components
- In optimization: determine convergence rates and condition numbers

Diagonalization
---------------

What does it mean to "diagonalize" a matrix? To find a basis where the matrix
acts by simple scaling. Formally: for an :math:`n\times n` matrix :math:`A`,
find matrices :math:`S` and :math:`\Lambda` where:

.. math::

   A = S\,\Lambda\,S^{-1}

Here :math:`\Lambda` is diagonal (eigenvalues) and :math:`S` has eigenvectors
as columns.

When is a matrix diagonalizable? A matrix is diagonalizable if and only if it
has :math:`n` linearly independent eigenvectors. Two key cases:

- Not all matrices are diagonalizable (these are called "defective")
- Hermitian matrices are always diagonalizable with orthonormal eigenvectors

What's special about Hermitian diagonalization? Hermitian matrices have an
especially nice diagonalization (the spectral theorem):

.. math::

   A = U\,\Lambda\,U^{\dagger}

where:

- :math:`U` is unitary (:math:`U^{\dagger}=U^{-1}`)
- :math:`\Lambda` has real eigenvalues
- The eigenvectors (columns of :math:`U`) form an orthonormal basis

Matrix inverses
---------------

How do you find a matrix inverse? For any invertible matrix, :math:`A^{-1}` can
be computed using the adjugate formula:

.. math::

   A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)

Can you show a 2×2 example? For a 2×2 matrix
:math:`A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}`:

- Determinant: :math:`\det(A) = ad - bc`
- Adjugate: :math:`\operatorname{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}`
- Inverse: :math:`A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}`

Determinant and trace
---------------------

What is the relationship between eigenvalues and determinant? The determinant
of a matrix is the product of its eigenvalues:

.. math::

   \det(A) = \prod_i \lambda_i

What is the relationship between eigenvalues and trace? The trace of a matrix
is the sum of its eigenvalues:

.. math::

   \operatorname{tr}(A) = \sum_i \lambda_i

The trace is also equal to the sum of the diagonal elements.
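As a numerical check of the relations above (a minimal NumPy sketch; the test
matrix is chosen here for illustration), we can diagonalize a matrix and confirm
that the determinant and trace equal the product and sum of its eigenvalues:

.. code-block:: python

   import numpy as np

   A = np.array([[2.0, -1.0],
                 [-1.0, 2.0]])          # symmetric, eigenvalues 1 and 3

   eigvals, S = np.linalg.eig(A)        # columns of S are eigenvectors
   Lam = np.diag(eigvals)

   # A = S Λ S^{-1}
   print(np.allclose(A, S @ Lam @ np.linalg.inv(S)))       # True

   # det(A) = product of eigenvalues, tr(A) = sum of eigenvalues
   print(np.isclose(np.linalg.det(A), np.prod(eigvals)))   # True
   print(np.isclose(np.trace(A), np.sum(eigvals)))         # True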
Special Matrices
----------------

Matrix Types
^^^^^^^^^^^^

What is a unitary matrix? A unitary matrix :math:`U` satisfies
:math:`U^{\dagger}U = I`, where :math:`I` is the identity matrix. Key
properties:

- Preserves inner products and norms
- Has orthonormal columns
- :math:`U^{-1} = U^{\dagger}`

What is a normal matrix? A matrix :math:`A` is normal if it commutes with its
adjoint: :math:`AA^{\dagger} = A^{\dagger}A`. This condition unifies several
important matrix classes:

- Unifies Hermitian (:math:`A = A^{\dagger}`), skew-Hermitian
  (:math:`A = -A^{\dagger}`), and unitary (:math:`AA^{\dagger} = I`) matrices
- Guarantees diagonalization by a unitary matrix: :math:`A = UDU^{\dagger}`
- Eigenspaces corresponding to different eigenvalues are orthogonal
- Norm invariance: :math:`\|Ax\| = \|A^{\dagger}x\|` for all vectors :math:`x`

What is a skew-symmetric matrix? A matrix :math:`A` where :math:`A^T = -A`.
Key properties:

- Diagonal elements must be zero
- Elements are antisymmetric: :math:`a_{ij} = -a_{ji}`
- All eigenvalues are either 0 or purely imaginary
- For 3×3 matrices, represents cross products: if :math:`v = [a,b,c]`, then

  .. math::

     \begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix} x
     = v \times x

Why is :math:`U^{-1} = U^{\dagger}` for unitary matrices? For a unitary matrix
:math:`U`, let's prove why the inverse equals the conjugate transpose:

1. By definition of a unitary matrix:

   .. math::

      UU^{\dagger} = U^{\dagger}U = I

2. By definition of the inverse:

   .. math::

      UU^{-1} = U^{-1}U = I

3. Therefore:

   .. math::

      UU^{\dagger} = UU^{-1} = I

4. Multiply both sides by :math:`U^{-1}` on the left:

   .. math::

      U^{-1}(UU^{\dagger}) &= U^{-1}(UU^{-1}) \\
      (U^{-1}U)U^{\dagger} &= (U^{-1}U)U^{-1} \\
      IU^{\dagger} &= IU^{-1} \\
      U^{\dagger} &= U^{-1}

This shows that for unitary matrices, the inverse and the conjugate transpose
must be equal.

Matrix Decomposition
--------------------

LU Decomposition
^^^^^^^^^^^^^^^^

What is LU decomposition? LU decomposition factors a matrix :math:`A` into a
product :math:`A = LU` where:

- :math:`L` is lower triangular (with 1's on the diagonal)
- :math:`U` is upper triangular
- The factorization mimics Gaussian elimination steps

How are L and U connected to A? The entries of L and U come from Gaussian
elimination:

- U is what you get after doing elimination (the final upper triangular form)
- L stores the multipliers used during elimination
- Each column of L records how you eliminated entries in that column of A

For example, consider eliminating in a 3×3 matrix:

- First column: subtract :math:`l_{21}` times row 1 from row 2, and
  :math:`l_{31}` times row 1 from row 3, to eliminate the entries below
  :math:`a_{11}`
- Second column: subtract :math:`l_{32}` times row 2 from row 3 to eliminate
  the entry below :math:`a_{22}`
- These multipliers form L, while the resulting matrix is U

.. math::

   \begin{bmatrix}
   a_{11} & a_{12} & a_{13} \\
   a_{21} & a_{22} & a_{23} \\
   a_{31} & a_{32} & a_{33}
   \end{bmatrix}
   =
   \begin{bmatrix}
   1 & 0 & 0 \\
   l_{21} & 1 & 0 \\
   l_{31} & l_{32} & 1
   \end{bmatrix}
   \begin{bmatrix}
   u_{11} & u_{12} & u_{13} \\
   0 & u_{22} & u_{23} \\
   0 & 0 & u_{33}
   \end{bmatrix}

The entries are related by:

- :math:`u_{11} = a_{11}`
- :math:`l_{21} = a_{21}/u_{11}`
- :math:`l_{31} = a_{31}/u_{11}`
- :math:`u_{22} = a_{22} - l_{21}u_{12}`
- And so on...

Could you share a simple step-by-step example? Let's decompose
:math:`A = \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix}`:

1. Understanding what we're looking for:

   - We want to find L and U such that A = LU
   - L will store our elimination multipliers
   - U will be the result after elimination

2. First column elimination:

   - Need to eliminate the 6 under the 2
   - Multiplier: :math:`l_{21} = 6/2 = 3` (this will go in our L matrix)
   - Operation: subtract 3 times row 1 from row 2
   - This operation transforms A into U:

   .. math::

      \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix}
      \xrightarrow{R_2 - 3R_1}
      \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} = U

3. Building L and U:

   - L captures the elimination step: put 3 below the diagonal
   - U is the result we got after elimination
   - The diagonal of L is always 1's

   .. math::

      L = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \quad
      U = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}

4. Verify:

   .. math::

      LU = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}
      \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}
      = \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix} = A
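The elimination recipe above can be written in a few lines of code. The
following is a minimal Doolittle-style sketch without pivoting (the function
name ``lu_no_pivot`` is ours, and the routine assumes every pivot is nonzero),
reproducing the hand computation for the 2×2 example:

.. code-block:: python

   import numpy as np

   def lu_no_pivot(A):
       """Return L (unit lower triangular) and U with A = L @ U.

       Assumes no pivoting is needed (all pivots nonzero)."""
       A = np.array(A, dtype=float)
       n = A.shape[0]
       L = np.eye(n)
       U = A.copy()
       for k in range(n - 1):
           for i in range(k + 1, n):
               L[i, k] = U[i, k] / U[k, k]      # elimination multiplier l_ik
               U[i, :] -= L[i, k] * U[k, :]     # subtract l_ik * (row k)
       return L, U

   A = np.array([[2.0, 1.0],
                 [6.0, 4.0]])
   L, U = lu_no_pivot(A)
   print(L)                        # [[1. 0.] [3. 1.]]
   print(U)                        # [[2. 1.] [0. 1.]]
   print(np.allclose(L @ U, A))    # True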
How do you solve systems using LU decomposition? To solve :math:`Ax = b` using
LU decomposition, we split it into two steps:

1. First solve :math:`Ly = b` for :math:`y`
2. Then solve :math:`Ux = y` for :math:`x`

Why this works:

- Since :math:`A = LU`, the equation :math:`Ax = b` becomes :math:`(LU)x = b`
- This is equivalent to :math:`L(Ux) = b`
- Let :math:`y = Ux`, then we have :math:`Ly = b`
- Once we find :math:`y`, we can solve :math:`Ux = y`

Example for a 2×2 system:

.. math::

   \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}
   =
   \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}
   \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}

Solving :math:`Ax = b` with :math:`b = \begin{bmatrix} 5 \\ 11 \end{bmatrix}`:

1. First solve :math:`Ly = b`:

   .. math::

      \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}
      \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
      = \begin{bmatrix} 5 \\ 11 \end{bmatrix}

   - From the first row: :math:`y_1 = 5`
   - From the second row: :math:`2y_1 + y_2 = 11`, so :math:`y_2 = 11 - 2(5) = 1`

2. Then solve :math:`Ux = y`:

   .. math::

      \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}
      \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
      = \begin{bmatrix} 5 \\ 1 \end{bmatrix}

   - From the second row: :math:`x_2 = 1`
   - From the first row: :math:`2x_1 + x_2 = 5`, so :math:`2x_1 + 1 = 5`, giving :math:`x_1 = 2`
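The two triangular solves can likewise be spelled out in code. This is a minimal
sketch (the helper names ``forward_sub`` and ``back_sub`` are ours) applied to
the 2×2 system above:

.. code-block:: python

   import numpy as np

   def forward_sub(L, b):
       """Solve L y = b for a lower-triangular L by forward substitution."""
       n = len(b)
       y = np.zeros(n)
       for i in range(n):
           y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
       return y

   def back_sub(U, y):
       """Solve U x = y for an upper-triangular U by back substitution."""
       n = len(y)
       x = np.zeros(n)
       for i in range(n - 1, -1, -1):
           x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
       return x

   L = np.array([[1.0, 0.0], [2.0, 1.0]])
   U = np.array([[2.0, 1.0], [0.0, 1.0]])
   b = np.array([5.0, 11.0])

   y = forward_sub(L, b)   # [5., 1.]
   x = back_sub(U, y)      # [2., 1.]
   print(x)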
What makes LU decomposition efficient?

- Triangular systems are fast to solve
- The determinant is just the product of :math:`U`'s diagonal
- The factorization can be updated efficiently when :math:`A` changes slightly
- Parallel implementations are available for large systems

Advanced Topics
---------------

Additional notation
-------------------

What does a "hat" mean over a matrix symbol? When you see notation like
:math:`\hat{A}`, the hat indicates you're working with an operator (a linear
transformation) rather than just a matrix of numbers. This is common in quantum
mechanics and functional analysis, where operators may be infinite-dimensional
or unbounded.

Iterative methods
-----------------

Why use iterative methods? For large systems, direct methods like LU
decomposition become computationally expensive. Iterative methods:

- Use less memory (don't need to store full matrices)
- Can exploit sparsity (many zero elements)
- Are often faster for large systems
- Can give approximate solutions quickly

Jacobi method
^^^^^^^^^^^^^

What is the Jacobi method? Let's understand it step by step with a small
example.

**Example: 2 Variables**

Consider these equations:

.. math::

   4x - y &= 5 \\
   -2x + 3y &= 1

1. **Why not just solve directly?**

   - These equations represent lines in the x-y plane
   - Their intersection is the solution
   - But solving algebraically requires multiple steps, which becomes
     impractical for large systems

2. **The Jacobi idea**: "Let's rearrange each equation to isolate its main
   variable." Solve for x and y:

   .. math::

      x &= \frac{5 + y}{4} \\
      y &= \frac{1 + 2x}{3}

3. **See it work**: Start with the guess :math:`x = 0, y = 0`:

   Iteration 1:

   - For x: plug in y = 0 → :math:`x = \frac{5 + 0}{4} = 1.25`
   - For y: plug in x = 0 → :math:`y = \frac{1 + 2(0)}{3} = 0.333`

   Iteration 2:

   - For x: plug in y = 0.333 → :math:`x = \frac{5 + 0.333}{4} = 1.333`
   - For y: plug in x = 1.25 → :math:`y = \frac{1 + 2(1.25)}{3} = 1.167`

   Iteration 3:

   - For x: plug in y = 1.167 → :math:`x = \frac{5 + 1.167}{4} = 1.542`
   - For y: plug in x = 1.333 → :math:`y = \frac{1 + 2(1.333)}{3} = 1.222`

   The method converges to the exact solution :math:`x = 1.6,\ y = 1.4`.

4. **Why it becomes that formula**: Let's break this down step by step using
   both the general form and a specific example.

   **Step 1: Start with a concrete example**

   Let's work with this 3×3 system:

   .. math::

      \begin{bmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix}
      \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
      =
      \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}

   **Step 2: Focus on one row**

   Let's look at the second row (:math:`i = 2`). This row shows how each
   variable relates to the others:

   .. math::

      -x_1 + 4x_2 - x_3 = 5

   This is easier to understand than all equations at once. We will solve for
   :math:`x_2`, the variable associated with row :math:`i`.

   **Step 3: Move terms around to isolate** :math:`x_2`

   Move everything except the :math:`x_2` term to the right side:

   .. math::

      4x_2 = 5 + x_1 + x_3

   **Step 4: Solve for** :math:`x_2`

   Divide both sides by the coefficient of :math:`x_2` (which is 4):

   .. math::

      x_2 = \frac{5}{4} + \frac{1}{4}(x_1 + x_3)

   **Step 5: Add iteration numbers and see the general pattern**

   Looking at what we did for :math:`x_2`, we can write row :math:`i` as:

   .. math::

      a_{i1}x_1 + a_{i2}x_2 + \dots + a_{ii}x_i + \dots + a_{in}x_n = b_i

   Move terms and solve for :math:`x_i`:

   .. math::

      x_i = \frac{b_i}{a_{ii}} - \frac{1}{a_{ii}}\sum_{j \neq i} a_{ij}x_j

   For the Jacobi method, we use the previous iteration's values for all
   variables except the one we're solving for:

   .. math::

      x_i^{(k+1)} = \frac{b_i}{a_{ii}} - \frac{1}{a_{ii}}\sum_{j \neq i} a_{ij}x_j^{(k)}

   **Why Local Minima Are Impossible**

   Unlike optimization problems (like gradient descent), the Jacobi method
   cannot get stuck, because:

   1. **Linear Systems Are Different**:

      - We're solving :math:`Ax = b`, which is a linear system
      - Linear systems have exactly one solution (if A is nonsingular)
      - There are no "hills" or "valleys" to get stuck in

   2. **Each Update is Exact**:

      - We're not minimizing a function
      - Each new value comes directly from the equations
      - If :math:`x_2 = \frac{5}{4} + \frac{1}{4}(x_1 + x_3)`, this is an exact
        relationship

   3. **Guaranteed Progress**:

      - With diagonal dominance (:math:`|a_{ii}| > \sum_{j \neq i} |a_{ij}|`)
      - Each update moves us closer to the solution
      - Like following a fixed path, not searching for a minimum

   For our specific example with :math:`x_2`, we have:

   .. math::

      x_2^{(k+1)} = \frac{5}{4} + \frac{1}{4}(x_1^{(k)} + x_3^{(k)})

   The key points:

   - Each equation is solved for one variable
   - We move everything else to the right side
   - Divide by the diagonal coefficient
   - Use previous-iteration values for the other variables

5. **Key insights**:

   - Each new value (:math:`x_i^{(k+1)}`) uses old values (:math:`x_j^{(k)}`)
   - We're essentially saying "if everything else stays the same, what should
     this value be?"
   - It's like everyone adjusting their position based on where they see
     others standing

Mathematically, this becomes:

.. math::

   x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j \neq i} a_{ij}x_j^{(k)}\right)

where:

- :math:`x_i^{(k)}` is the current guess for the variable in row :math:`i` at step :math:`k`
- :math:`a_{ii}` is the diagonal coefficient in row :math:`i`
- :math:`b_i` is the constant term in row :math:`i`
- The sum represents how the other variables influence :math:`x_i`
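Putting this update formula into code gives a very short solver. The sketch
below is illustrative only (the function name, tolerance, and iteration cap are
arbitrary choices of ours) and is applied to the two-variable system from
earlier:

.. code-block:: python

   import numpy as np

   def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
       """Jacobi iteration: x_i^(k+1) = (b_i - sum_{j != i} a_ij x_j^(k)) / a_ii."""
       A = np.asarray(A, dtype=float)
       b = np.asarray(b, dtype=float)
       x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=float)
       D = np.diag(A)                    # diagonal entries a_ii
       R = A - np.diagflat(D)            # off-diagonal part of A
       for _ in range(max_iter):
           x_new = (b - R @ x) / D       # every component updated from the old x
           if np.linalg.norm(x_new - x, ord=np.inf) < tol:
               return x_new
           x = x_new
       return x

   A = np.array([[4.0, -1.0],
                 [-2.0, 3.0]])
   b = np.array([5.0, 1.0])
   print(jacobi(A, b))       # approximately [1.6, 1.4]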
How does convergence happen visually? For our 2×2 example above, the iterates
approach the exact solution :math:`x = 1.6,\ y = 1.4` (errors are measured
against that solution):

.. math::

   \begin{array}{|c|c|c|c|c|}
   \hline
   \text{Iteration} & x & y & \text{Error in } x & \text{Error in } y \\
   \hline
   0 & 0.000 & 0.000 & 1.600 & 1.400 \\
   1 & 1.250 & 0.333 & 0.350 & 1.067 \\
   2 & 1.333 & 1.167 & 0.267 & 0.233 \\
   3 & 1.542 & 1.222 & 0.058 & 0.178 \\
   4 & 1.556 & 1.361 & 0.044 & 0.039 \\
   \hline
   \end{array}

This is exactly how Jacobi iteration works:

- Each variable updates based on the other variables' previous values
- All updates happen simultaneously
- The system gradually moves toward equilibrium

Key advantages:

- Simple to implement and understand
- Naturally parallelizable (all variables can be updated at once)
- Works well for diagonally dominant matrices (when each variable mainly depends on itself)
- Memory efficient for sparse matrices (when most variables don't affect each other)

Convergence conditions
""""""""""""""""""""""

The Jacobi method is guaranteed to converge if :math:`A` is **strictly
diagonally dominant**. For a **symmetric positive definite** :math:`A`, the
method converges provided :math:`2D - A` is also positive definite, where
:math:`D` is the diagonal part of :math:`A`; positive definiteness alone is not
sufficient. A small numerical check of these conditions is sketched at the end
of this section.

Practical applications
""""""""""""""""""""""

Common uses include:

1. **Partial Differential Equations**:

   - Solving Poisson's equation on a grid
   - Heat diffusion problems
   - Electronic structure calculations

2. **Network Analysis**:

   - Load flow analysis in power systems
   - Traffic flow in transportation networks
   - PageRank-like algorithms (modified versions)

3. **Image Processing**:

   - Image reconstruction
   - Denoising
   - Solving large linear systems in tomography
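As a supplement to the convergence conditions above, here is a small sketch (the
helper names are ours; the test matrix is chosen for illustration) that checks
strict diagonal dominance and, for a symmetric positive definite matrix, the
additional condition that :math:`2D - A` be positive definite:

.. code-block:: python

   import numpy as np

   def is_strictly_diagonally_dominant(A):
       """Check |a_ii| > sum_{j != i} |a_ij| for every row."""
       absA = np.abs(np.asarray(A, dtype=float))
       diag = np.diag(absA)
       off_diag_sums = absA.sum(axis=1) - diag
       return bool(np.all(diag > off_diag_sums))

   def jacobi_converges_for_spd(A):
       """For symmetric positive definite A, Jacobi converges iff 2D - A is positive definite."""
       A = np.asarray(A, dtype=float)
       D = np.diagflat(np.diag(A))
       return bool(np.all(np.linalg.eigvalsh(2 * D - A) > 0))

   A = np.array([[4.0, -1.0, 0.0],
                 [-1.0, 4.0, -1.0],
                 [0.0, -1.0, 4.0]])
   print(is_strictly_diagonally_dominant(A))   # True
   print(jacobi_converges_for_spd(A))          # True (A is SPD here)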