Linear algebra
The following notes are from Columbia University’s Introduction to Numerical Methods course.
Fundamentals
Linear Operations
What does “linear” mean in this context? A function \(f\) is linear if it satisfies the following two properties:
Additivity: \(f(x + y) = f(x) + f(y)\)
Homogeneity: \(f(\alpha x) = \alpha f(x)\) for any scalar \(\alpha\)
Core Matrix Properties
- What is a strictly diagonally dominant matrix?
A matrix A is strictly diagonally dominant if for each row i:
\[|a_{ii}| > \sum_{j \neq i} |a_{ij}|\]In other words, the magnitude of the diagonal element is larger than the sum of magnitudes of all other elements in that row.
- Example:
- \[\begin{split}\begin{bmatrix} 3 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -2 & 5 \end{bmatrix}\end{split}\]
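As a quick check, here is a minimal sketch (assuming NumPy, which the notes do not prescribe) that tests the definition row by row on the example matrix:

```python
# Check strict diagonal dominance: |a_ii| > sum of |a_ij| for j != i, in every row.
import numpy as np

A = np.array([[3, -1, 0],
              [-1, 4, -1],
              [0, -2, 5]], dtype=float)

def is_strictly_diagonally_dominant(A):
    diag = np.abs(np.diag(A))
    off_diag_sums = np.sum(np.abs(A), axis=1) - diag  # row sums excluding the diagonal
    return np.all(diag > off_diag_sums)

print(is_strictly_diagonally_dominant(A))  # True
```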
- What is a positive definite matrix?
A symmetric matrix A is positive definite if (these conditions are equivalent):
\(x^TAx > 0\) for all nonzero vectors x
All eigenvalues are positive
All leading principal minors are positive
If a positive definite matrix \(A\) also satisfies \(2|a_{ii}| > \sum_{j=1}^n |a_{ij}|\) for every row \(i\) (which is strict diagonal dominance written with the diagonal term included in the sum), convergence of iterative methods such as Jacobi and Gauss-Seidel is guaranteed.
- Example of a positive definite matrix:
- \[\begin{split}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\end{split}\]
- This matrix is positive definite because:
It’s symmetric
Its eigenvalues are both positive
\(2|a_{ii}| > |a_{i1}| + |a_{i2}|\) for each row i
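A small sketch (again assuming NumPy) can verify these claims numerically; a successful Cholesky factorization is itself a practical positive-definiteness test:

```python
# Verify symmetry, positive eigenvalues, and Cholesky factorizability.
import numpy as np

A = np.array([[2, -1],
              [-1, 2]], dtype=float)

print(np.allclose(A, A.T))        # symmetric: True
print(np.linalg.eigvalsh(A))      # eigenvalues: [1. 3.], both positive
np.linalg.cholesky(A)             # succeeds only for positive definite matrices
```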
Basic Operations
Matrix Operations
- How do you rotate a vector about the z-axis?
A rotation about the z-axis by angle \(\theta\) is represented by the rotation matrix below. The matrix multiplies a column vector to rotate it in the xy-plane:
\[\begin{split}\begin{bmatrix} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}\end{split}\]
- What is matrix transposition and how is it written?
The transpose \(A^T\) reflects a matrix across its main diagonal. Formally:
\[(A^T)_{ij} = A_{ji} \quad \text{for all } i,j\]
- Properties:
\((A^T)^T = A\)
\((AB)^T = B^T A^T\)
For symmetric matrices: \(A^T = A\)
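The following sketch (assuming NumPy) applies the z-axis rotation above to a vector and checks the transpose identities; the 90° angle and the test matrix are arbitrary choices for illustration:

```python
import numpy as np

theta = np.pi / 2  # rotate by 90 degrees about the z-axis
Rz = np.array([[np.cos(theta), -np.sin(theta), 0],
               [np.sin(theta),  np.cos(theta), 0],
               [0,              0,             1]])

v = np.array([1.0, 0.0, 0.0])
print(Rz @ v)                                # ~[0, 1, 0]: x-axis rotated onto y-axis

B = np.arange(9.0).reshape(3, 3)             # arbitrary test matrix
print(np.allclose((Rz @ B).T, B.T @ Rz.T))   # (AB)^T = B^T A^T -> True
print(np.allclose(Rz.T.T, Rz))               # (A^T)^T = A -> True
```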
Complex Operations
- What is the complex conjugate of a matrix?
Written as \(A^*\), it conjugates each entry (changes i to -i). For example:
\[\begin{split}A = \begin{bmatrix} 1+2i & 3 \\ 4 & 5-i \end{bmatrix}, \qquad A^{*} = \begin{bmatrix} 1-2i & 3 \\ 4 & 5+i \end{bmatrix}\end{split}\]
- What is the Hermitian conjugate?
The Hermitian (or adjoint) operation \(A^{\dagger}\) combines complex conjugation and transposition:
\[\begin{split}A^{\dagger} = (A^{*})^{T} = \begin{bmatrix} 1-2i & 4 \\ 3 & 5+i \end{bmatrix}\end{split}\]
- What is a Hermitian matrix?
A matrix \(A\) where \(A = A^{\dagger}\). Key properties:
Diagonal entries must be real
Off-diagonal entries are complex conjugates: if \(a_{ij} = x+yi\), then \(a_{ji} = x-yi\)
Has real eigenvalues
Has orthonormal eigenvectors
Quadratic forms \(x^{\dagger}Ax\) are real for any \(x\)
Is positive semidefinite if and only if all eigenvalues are \(\ge 0\)
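A short sketch of these operations and properties (assuming NumPy; the second matrix \(H\) is a made-up Hermitian example, not from the notes):

```python
import numpy as np

A = np.array([[1 + 2j, 3],
              [4, 5 - 1j]])
print(np.conj(A))            # entrywise complex conjugate A*
print(np.conj(A).T)          # Hermitian conjugate A† (conjugate, then transpose)

H = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])              # example Hermitian matrix: H = H†
print(np.allclose(H, H.conj().T))        # True
print(np.linalg.eigvalsh(H))             # eigenvalues [1. 4.], all real

x = np.array([1 + 1j, 2 - 3j])
print(np.vdot(x, H @ x).imag)            # x† H x is real, so imaginary part is ~0
```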
Matrix Analysis
Eigenvalues and Eigenvectors
- What is an eigenvector, intuitively?
An eigenvector \(v\) is a nonzero vector whose direction remains unchanged under a linear transformation \(A\). The transformation only scales the vector by its eigenvalue \(\lambda\):
\[Av = \lambda v\]
- Key insight:
eigenvector = direction preserved by \(A\)
eigenvalue = scaling factor applied to that direction
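A minimal numerical illustration (assuming NumPy; the 2×2 matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eig(A)

v = eigvecs[:, 0]                     # an eigenvector (column of eigvecs)
lam = eigvals[0]                      # its eigenvalue
print(np.allclose(A @ v, lam * v))    # True: A only rescales v by lambda
```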
- Why are eigenvalues important?
Eigenvalues capture the essential character of a transformation:
In dynamics: Define system stability and oscillation frequencies
In quantum mechanics: Represent observable energy states
In data science: Measure variance along principal components
In optimization: Determine convergence rates and condition numbers
Diagonalization
- What does it mean to “diagonalize” a matrix?
To find a basis where the matrix acts by simple scaling. Formally: for an \(n\times n\) matrix \(A\), find matrices \(S\) and \(\Lambda\) where:
\[A = S\,\Lambda\,S^{-1}\]Here \(\Lambda\) is diagonal (eigenvalues) and \(S\) has eigenvectors as columns.
- When is a matrix diagonalizable?
A matrix is diagonalizable if and only if it has n linearly independent eigenvectors. Two key cases:
Not all matrices are diagonalizable (these are called “defective”)
Hermitian matrices are always diagonalizable with orthonormal eigenvectors
- What’s special about Hermitian diagonalization?
Hermitian matrices have an especially nice diagonalization (the spectral theorem):
\[A = U\,\Lambda\,U^{\dagger}\]where:
\(U\) is unitary (\(U^{\dagger}=U^{-1}\))
\(\Lambda\) has real eigenvalues
The eigenvectors (columns of \(U\)) form an orthonormal basis
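A sketch of the spectral theorem in code (assuming NumPy; the Hermitian matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])                     # Hermitian: A = A†

eigvals, U = np.linalg.eigh(A)                  # real eigenvalues, orthonormal eigenvectors
Lambda = np.diag(eigvals)

print(np.allclose(U.conj().T @ U, np.eye(2)))   # U is unitary: U†U = I
print(np.allclose(U @ Lambda @ U.conj().T, A))  # A = U Λ U†
```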
Matrix inverses
- How do you find a matrix inverse?
For any invertible matrix, \(A^{-1}\) can be computed using the adjugate formula:
\[A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)\]
- Can you show a 2×2 example?
For a 2×2 matrix \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\):
Determinant: \(\det(A) = ad - bc\)
Adjugate: \(\operatorname{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\)
Inverse: \(A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\)
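The 2×2 formula translates directly into code; a sketch (assuming NumPy only for the comparison against the built-in inverse):

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 4.0, 3.0
A = np.array([[a, b],
              [c, d]])

det = a * d - b * c                 # determinant ad - bc
adj = np.array([[d, -b],
                [-c, a]])           # adjugate of a 2x2 matrix
A_inv = adj / det

print(A_inv)
print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```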
Determinant and trace
- What is the relationship between eigenvalues and determinant?
The determinant of a matrix is the product of its eigenvalues:
\[\det(A) = \prod_i \lambda_i\]
- What is the relationship between eigenvalues and trace?
The trace of a matrix is the sum of its eigenvalues:
\[\operatorname{tr}(A) = \sum_i \lambda_i\]The trace is also equal to the sum of diagonal elements.
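A quick numerical check of both identities (assuming NumPy; the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[4.0, -1.0],
              [-2.0, 3.0]])
lams = np.linalg.eigvals(A)          # eigenvalues 2 and 5

print(np.isclose(np.prod(lams), np.linalg.det(A)))   # det = product of eigenvalues
print(np.isclose(np.sum(lams),  np.trace(A)))        # trace = sum of eigenvalues
```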
Special Matrices
Matrix Types
- What is a unitary matrix?
A unitary matrix \(U\) satisfies \(U^{\dagger}U = I\), where \(I\) is the identity matrix. Key properties:
Preserves inner products and norms
Has orthonormal columns
\(U^{-1} = U^{\dagger}\)
- What is a normal matrix?
A matrix \(A\) is normal if it commutes with its adjoint: \(AA^{\dagger} = A^{\dagger}A\). This condition unifies several important matrix classes:
Includes Hermitian (\(A = A^{\dagger}\)), skew-Hermitian (\(A = -A^{\dagger}\)), and unitary (\(AA^{\dagger} = I\)) matrices as special cases
Guarantees diagonalization by a unitary matrix: \(A = UDU^{\dagger}\)
Eigenspaces corresponding to different eigenvalues are orthogonal
Norm invariance: \(\|Ax\| = \|A^{\dagger}x\|\) for all vectors \(x\)
- What is a skew-symmetric matrix?
A matrix \(A\) where \(A^T = -A\). Key properties:
Diagonal elements must be zero
Elements are antisymmetric: \(a_{ij} = -a_{ji}\)
All eigenvalues are either 0 or purely imaginary
For 3×3 matrices, represents cross products: if \(v = [a,b,c]\), then
\[\begin{split}\begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}x = v \times x\end{split}\]
- Why is \(U^{-1} = U^{\dagger}\) for unitary matrices?
For a unitary matrix \(U\), let’s prove why the inverse equals the conjugate transpose:
By definition of unitary matrix:
\[UU^{\dagger} = U^{\dagger}U = I\]By definition of inverse:
\[UU^{-1} = U^{-1}U = I\]Therefore:
\[UU^{\dagger} = UU^{-1} = I\]Multiply both sides by \(U^{-1}\) on the left:
\[ \begin{align}\begin{aligned}U^{-1}(UU^{\dagger}) = U^{-1}(UU^{-1})\\(U^{-1}U)U^{\dagger} = (U^{-1}U)U^{-1}\\IU^{\dagger} = IU^{-1}\\U^{\dagger} = U^{-1}\end{aligned}\end{align} \]
This shows that for unitary matrices, the inverse and conjugate transpose must be equal.
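A numerical confirmation of this result (assuming NumPy; the unitary matrix is built from the QR factorization of a random complex matrix, an arbitrary construction):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(M)                                   # the Q factor is unitary

print(np.allclose(U.conj().T @ U, np.eye(3)))            # U†U = I
print(np.allclose(np.linalg.inv(U), U.conj().T))         # U^{-1} = U†
```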
Matrix Decomposition
LU Decomposition
- What is LU decomposition?
LU decomposition factors a matrix \(A\) into a product \(A = LU\) where:
\(L\) is lower triangular (with 1’s on diagonal)
\(U\) is upper triangular
The factorization mimics Gaussian elimination steps
- How are L and U connected to A?
The entries of L and U come from Gaussian elimination:
U is what you get after doing elimination (the final upper triangular form)
L stores the multipliers used during elimination
Each column of L records how you eliminated entries in that column of A
For example, consider eliminating in a 3×3 matrix:
First column: subtract \(l_{21}\) times row 1 from row 2, and \(l_{31}\) times row 1 from row 3, to eliminate the entries below \(a_{11}\)
Second column: subtract \(l_{32}\) times row 2 from row 3 to eliminate the entry below \(a_{22}\)
These multipliers form L, while the resulting matrix is U
\[\begin{split}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}\end{split}\]The entries are related by:
\(u_{11} = a_{11}\)
\(l_{21} = a_{21}/u_{11}\)
\(l_{31} = a_{31}/u_{11}\)
\(u_{22} = a_{22} - l_{21}u_{12}\)
And so on…
- Could you share a simple step by step example?
Let’s decompose \(A = \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix}\):
Understanding what we’re looking for:
We want to find L and U such that A = LU
L will store our elimination multipliers
U will be the result after elimination
First column elimination:
Need to eliminate 6 under the 2
Multiplier: \(l_{21} = 6/2 = 3\) (this will go in our L matrix)
Operation: Subtract 3 times row 1 from row 2
This operation transforms A into U:
\[\begin{split}\begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix} \xrightarrow{R_2 - 3R_1} \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} = U\end{split}\]Building L and U:
L captures the elimination step: put 3 below the diagonal
U is the result we got after elimination
The diagonal of L is always 1’s
\[\begin{split}L = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \quad U = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}\end{split}\]Verify:
\[\begin{split}LU = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix} = A\end{split}\]
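The same elimination can be written as a short routine; a sketch of LU without pivoting (assuming NumPy), matching the hand computation above. Library routines such as scipy.linalg.lu typically add row pivoting for numerical stability, so their \(L\) and \(U\) may differ from this textbook factorization.

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU factorization with no row pivoting: A = LU."""
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # elimination multiplier l_ik
            U[i, :] -= L[i, k] * U[k, :]     # row operation R_i <- R_i - l_ik * R_k
    return L, U

A = np.array([[2.0, 1.0],
              [6.0, 4.0]])
L, U = lu_no_pivot(A)
print(L)                       # [[1, 0], [3, 1]]
print(U)                       # [[2, 1], [0, 1]]
print(np.allclose(L @ U, A))   # True
```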
- How do you solve systems using LU decomposition?
To solve \(Ax = b\) using LU decomposition, we split it into two steps:
First solve \(Ly = b\) for y
Then solve \(Ux = y\) for x
- Why this works:
Since \(A = LU\), the equation \(Ax = b\) becomes \((LU)x = b\)
This is equivalent to \(L(Ux) = b\)
Let \(y = Ux\), then we have \(Ly = b\)
Once we find y, we can solve \(Ux = y\)
Example for a 2×2 system:
\[\begin{split}\begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}\end{split}\]Solving \(Ax = b\) with \(b = \begin{bmatrix} 5 \\ 11 \end{bmatrix}\):
First solve \(Ly = b\):
\[\begin{split}\begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}\end{split}\]From first row: \(y_1 = 5\)
From second row: \(2y_1 + y_2 = 11\), so \(y_2 = 11 - 2(5) = 1\)
Then solve \(Ux = y\):
\[\begin{split}\begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}\end{split}\]From second row: \(x_2 = 1\)
From first row: \(2x_1 + x_2 = 5\), so \(2x_1 + 1 = 5\), giving \(x_1 = 2\)
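Both triangular solves can be coded in a few lines; a sketch (assuming NumPy) for the example above:

```python
import numpy as np

L = np.array([[1.0, 0.0],
              [2.0, 1.0]])
U = np.array([[2.0, 1.0],
              [0.0, 1.0]])
b = np.array([5.0, 11.0])

# Forward substitution: solve Ly = b, starting from the top row
n = len(b)
y = np.zeros(n)
for i in range(n):
    y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]

# Back substitution: solve Ux = y, starting from the bottom row
x = np.zeros(n)
for i in reversed(range(n)):
    x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]

print(y)   # [5. 1.]
print(x)   # [2. 1.]
```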
- What makes LU decomposition efficient?
Triangular systems are fast to solve
Determinant is just product of \(U\)’s diagonal
Can update factorization efficiently when \(A\) changes slightly
Parallel implementations available for large systems
Advanced Topics
Additional notation
- What does a “hat” mean over a matrix symbol?
When you see notation like \(\hat{A}\), the hat indicates you’re working with an operator (a linear transformation) rather than just a matrix of numbers. This is common in quantum mechanics and functional analysis where operators may be infinite-dimensional or unbounded.
Gradient descent
Iterative Methods
- Why use iterative methods?
For large systems, direct methods like LU decomposition become computationally expensive. Iterative methods:
Use less memory (don’t need to store full matrices)
Can exploit sparsity (many zero elements)
Often faster for large systems
Can give approximate solutions quickly
Jacobi method
- What is the Jacobi method?
Let’s understand it step by step with a small example.
Example with 2 variables: consider this system of equations:
\[\begin{split}4x - y &= 5 \\ -2x + 3y &= 1\end{split}\]
Why we don’t just solve directly:
These equations represent lines in the x-y plane
Their intersection is the solution
But solving algebraically requires multiple steps
The Jacobi idea: “Let’s rearrange each equation to isolate its main variable”
Solve for x and y:
\[\begin{split}x &= \frac{5 + y}{4} \\ y &= \frac{1 + 2x}{3}\end{split}\]See it work: Start with guess \(x = 0, y = 0\):
- Iteration 1:
For x: plug in y = 0 → \(x = \frac{5 + 0}{4} = 1.25\)
For y: plug in x = 0 → \(y = \frac{1 + 2(0)}{3} = 0.333\)
- Iteration 2:
For x: plug in y = 0.333 → \(x = \frac{5 + 0.333}{4} = 1.333\)
For y: plug in x = 1.25 → \(y = \frac{1 + 2(1.25)}{3} = 1.167\)
- Iteration 3:
For x: plug in y = 1.167 → \(x = \frac{5 + 1.167}{4} = 1.542\)
For y: plug in x = 1.333 → \(y = \frac{1 + 2(1.333)}{3} = 1.222\)
The method converges to the exact solution \(x = 1.6, y = 1.4\)
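The whole iteration fits in a few lines of plain Python (a sketch; the iteration count of 50 is an arbitrary choice):

```python
# Jacobi iteration for: 4x - y = 5 and -2x + 3y = 1
x, y = 0.0, 0.0                   # initial guess
for k in range(50):
    x_new = (5 + y) / 4           # from 4x - y = 5, solved for x
    y_new = (1 + 2 * x) / 3       # from -2x + 3y = 1, solved for y
    x, y = x_new, y_new           # update both simultaneously (Jacobi)

print(x, y)                       # approaches the exact solution x = 1.6, y = 1.4
```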
Why it becomes that formula: Let’s break this down step by step using both the general form and a specific example.
Step 1: Start with a concrete example. Let’s work with this 3×3 system:
\[\begin{split}\begin{bmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}\end{split}\]Step 2: Focus on one row. Let’s look at the second row (row \(i = 2\)). This row shows how each variable relates to the others:
\[-1x_1 + 4x_2 + (-1)x_3 = 5\]This is easier to understand than all the equations at once. We will solve for \(x_2\), which is the variable we want to solve for in row \(i\).
Step 3: Isolate \(x_2\). Move everything except the \(x_2\) term to the right side:
\[4x_2 = 5 + x_1 + x_3\]Step 4: Solve for \(x_2\). Divide both sides by the coefficient of \(x_2\) (which is 4):
\[x_2 = \frac{5}{4} + \frac{1}{4}(x_1 + x_3)\]Step 5: Add iteration numbers and see the general pattern. Looking at what we did for \(x_2\), we can write row \(i\) as:
\[a_{i1}x_1 + a_{i2}x_2 + ... + a_{ii}x_i + ... + a_{in}x_n = b_i\]Move terms and solve for \(x_i\):
\[x_i = \frac{b_i}{a_{ii}} - \frac{1}{a_{ii}}\sum_{j \neq i} a_{ij}x_j\]For the Jacobi method, we use previous-iteration values for all variables except the one we’re solving for:
\[x_i^{(k+1)} = \frac{b_i}{a_{ii}} - \frac{1}{a_{ii}}\sum_{j \neq i} a_{ij}x_j^{(k)}\]
Why Local Minima Are Impossible
Unlike optimization problems (like gradient descent), the Jacobi method cannot get stuck because:
Linear Systems Are Different:
We’re solving \(Ax = b\), which is a linear system
Linear systems have exactly one solution (if A is nonsingular)
There are no “hills” or “valleys” to get stuck in
Each Update is Exact:
We’re not minimizing a function
Each new value comes directly from the equations
If \(x_2 = \frac{5}{4} + \frac{1}{4}(x_1 + x_3)\), this is an exact relationship
Guaranteed Progress:
With diagonal dominance (\(|a_{ii}| > \sum_{j \neq i} |a_{ij}|\))
Each update moves us closer to the solution
Like following a fixed path, not searching for a minimum
For our specific example with \(x_2\), we have:
\[x_2^{(k+1)} = \frac{5}{4} + \frac{1}{4}(x_1^{(k)} + x_3^{(k)})\]
- The key points:
Each equation is solved for one variable
We move everything else to the right side
Divide by diagonal coefficient
Use previous iteration values for other variables
Key insights:
Each new value (\(x_i^{(k+1)}\)) uses old values (\(x_j^{(k)}\))
We’re essentially saying “if everything else stays the same, what should this value be?”
It’s like everyone adjusting their position based on where they see others standing
Mathematically, this becomes:
\[x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j \neq i} a_{ij}x_j^{(k)}\right)\]
- where:
\(x_i^{(k)}\) is our current guess for row i in step k
\(a_{ii}\) is the diagonal coefficient in row i
\(b_i\) is the constant term in row i
The sum represents how other variables influence \(x_i\)
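Putting the update formula into code, a general Jacobi sketch (assuming NumPy), applied to the 3×3 system from Step 1:

```python
import numpy as np

def jacobi(A, b, iterations=50):
    """Jacobi iteration: x_i^(k+1) = (b_i - sum_{j!=i} a_ij x_j^(k)) / a_ii."""
    x = np.zeros_like(b, dtype=float)
    D = np.diag(A)                       # diagonal coefficients a_ii
    R = A - np.diagflat(D)               # off-diagonal part of A
    for _ in range(iterations):
        x = (b - R @ x) / D              # simultaneous update of all components
    return x

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([1.0, 5.0, 2.0])

x = jacobi(A, b)
print(x)                       # approximate solution
print(np.allclose(A @ x, b))   # True (to iteration accuracy)
```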
- How convergence happens visually
For our 2×2 example above (errors are measured against the exact solution \(x = 1.6, y = 1.4\)):
\[\begin{split}\begin{array}{|c|c|c|c|c|} \hline \text{Iteration} & x & y & \text{Error in x} & \text{Error in y} \\ \hline 0 & 0.000 & 0.000 & 1.600 & 1.400 \\ 1 & 1.250 & 0.333 & 0.350 & 1.067 \\ 2 & 1.333 & 1.167 & 0.267 & 0.233 \\ 3 & 1.542 & 1.222 & 0.058 & 0.178 \\ 4 & 1.556 & 1.361 & 0.044 & 0.039 \\ \hline \end{array}\end{split}\]
- This is exactly how Jacobi iteration works:
Each variable updates based on the previous values of the other variables
All updates happen simultaneously
The system gradually moves toward equilibrium
- Key advantages:
Simple to implement and understand
Naturally parallelizable (all variables can be updated at the same time)
Works well for diagonally dominant matrices (when each variable mainly depends on itself)
Memory efficient for sparse matrices (when most variables don’t affect each other)
Convergence conditions
The Jacobi method is guaranteed to converge if \(A\) is strictly diagonally dominant. For a symmetric positive definite \(A\), Jacobi converges when \(2D - A\) is also positive definite (\(D\) being the diagonal part of \(A\)); positive definiteness by itself guarantees convergence of Gauss–Seidel rather than Jacobi.
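Beyond these sufficient conditions, the standard necessary-and-sufficient test is the spectral radius of the iteration matrix \(M = I - D^{-1}A\): Jacobi converges for every starting guess exactly when \(\rho(M) < 1\). A sketch (assuming NumPy), using the 2×2 example from earlier:

```python
import numpy as np

A = np.array([[4.0, -1.0],
              [-2.0, 3.0]])
D_inv = np.diag(1.0 / np.diag(A))       # inverse of the diagonal part of A
M = np.eye(2) - D_inv @ A               # Jacobi iteration matrix

spectral_radius = max(abs(np.linalg.eigvals(M)))
print(spectral_radius)                  # ~0.41 < 1, so Jacobi converges for this system
```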
Practical applications
Common uses include:
Partial Differential Equations:
Solving Poisson’s equation on a grid
Heat diffusion problems
Electronic structure calculations
Network Analysis:
Load flow analysis in power systems
Traffic flow in transportation networks
PageRank-like algorithms (modified versions)
Image Processing:
Image reconstruction
Denoising
Solving large linear systems in tomography