Linear algebra

The following is from Columbia University's Introduction to Numerical Methods course.

Fundamentals

Linear Operations

What does “linear” mean in this context? An operation \(f\) is linear if it satisfies the following two properties:

  1. Additivity: \(f(x + y) = f(x) + f(y)\)

  2. Homogeneity: \(f(\alpha x) = \alpha f(x)\) for any scalar \(\alpha\)
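
As a quick sanity check, here is a small NumPy sketch (NumPy and the particular choices of A, x, y, and alpha are illustrative assumptions, not part of the notes) verifying both properties for the map \(f(x) = Ax\):

import numpy as np

# Any matrix defines a linear map f(x) = A @ x; check additivity and
# homogeneity numerically for one particular choice of A, x, y, alpha.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
x = np.array([1.0, -2.0])
y = np.array([4.0, 0.5])
alpha = -3.0

def f(v):
    return A @ v

print(np.allclose(f(x + y), f(x) + f(y)))        # additivity  -> True
print(np.allclose(f(alpha * x), alpha * f(x)))   # homogeneity -> True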

Core Matrix Properties

What is a strictly diagonally dominant matrix?

A matrix A is strictly diagonally dominant if for each row i:

\[|a_{ii}| > \sum_{j \neq i} |a_{ij}|\]

In other words, the magnitude of the diagonal element is larger than the sum of magnitudes of all other elements in that row.

Example:
\[\begin{split}\begin{bmatrix} 3 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -2 & 5 \end{bmatrix}\end{split}\]
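
A small NumPy check of this definition against the example above (the helper name is_strictly_diagonally_dominant is illustrative):

import numpy as np

def is_strictly_diagonally_dominant(A):
    # True if |a_ii| > sum of |a_ij| over j != i, for every row i
    A = np.asarray(A, dtype=float)
    diag = np.abs(np.diag(A))
    off_diag_sums = np.sum(np.abs(A), axis=1) - diag
    return bool(np.all(diag > off_diag_sums))

A = np.array([[ 3, -1,  0],
              [-1,  4, -1],
              [ 0, -2,  5]])
print(is_strictly_diagonally_dominant(A))   # True for the example above
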
What is a positive definite matrix?

A symmetric matrix A is positive definite if any one of the following equivalent conditions holds:

  1. \(x^TAx > 0\) for all nonzero vectors x

  2. All eigenvalues are positive

  3. All leading principal minors are positive

A positive definite matrix \(A\) that also satisfies \(2|a_{ii}| > \sum_{j=1}^n |a_{ij}|\) for every row \(i\) (this is just strict diagonal dominance, since the sum includes the diagonal entry) guarantees convergence of the iterative methods discussed later.

Example of a positive definite matrix:
\[\begin{split}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\end{split}\]
This matrix is positive definite because:
  • It’s symmetric

  • Its eigenvalues are both positive

  • It satisfies \(2|a_{ii}| > |a_{i1}| + |a_{i2}|\) for each row i, i.e. it is strictly diagonally dominant
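
Here is a minimal NumPy sketch checking these points for the example matrix (the Cholesky test is an extra practical criterion, not part of the notes):

import numpy as np

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])

print(np.allclose(A, A.T))                          # symmetric
print(np.linalg.eigvalsh(A))                        # eigenvalues 1 and 3, both positive
print(np.linalg.det(A[:1, :1]), np.linalg.det(A))   # leading principal minors: 2.0, 3.0

# A Cholesky factorization exists exactly for symmetric positive definite
# matrices, so succeeding here is a common practical test.
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))                      # True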

Basic Operations

Matrix Operations

How do you rotate a vector about the z-axis?

A rotation about the z-axis by angle \(\theta\) is represented by the rotation matrix below. The matrix multiplies a column vector to rotate it in the xy-plane:

\[\begin{split}\begin{bmatrix} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}\end{split}\]
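
A short NumPy sketch of this rotation (the helper name rotation_z is illustrative):

import numpy as np

def rotation_z(theta):
    # 3x3 rotation matrix about the z-axis by angle theta (radians)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  0.0],
                     [s,   c,  0.0],
                     [0.0, 0.0, 1.0]])

v = np.array([1.0, 0.0, 2.0])
print(rotation_z(np.pi / 2) @ v)   # ~[0, 1, 2]: the x-axis rotates onto y, z is unchanged
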
What is matrix transposition and how is it written?

The transpose \(A^T\) reflects a matrix across its main diagonal. Formally:

\[(A^T)_{ij} = A_{ji} \quad \text{for all } i,j\]
Properties:
  • \((A^T)^T = A\)

  • \((AB)^T = B^T A^T\)

  • For symmetric matrices: \(A^T = A\)
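
A quick numerical check of the first two properties (NumPy and the random test matrices are assumptions made here for illustration):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.allclose((A.T).T, A))            # (A^T)^T = A
print(np.allclose((A @ B).T, B.T @ A.T))  # (AB)^T = B^T A^T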

Complex Operations

What is the complex conjugate of a matrix?

Written as \(A^*\), it conjugates each entry (changes i to -i). For example:

\[\begin{split}A = \begin{bmatrix} 1+2i & 3 \\ 4 & 5-i \end{bmatrix}, \qquad A^{*} = \begin{bmatrix} 1-2i & 3 \\ 4 & 5+i \end{bmatrix}\end{split}\]
What is the Hermitian conjugate?

The Hermitian (or adjoint) operation \(A^{\dagger}\) combines complex conjugation and transposition:

\[\begin{split}A^{\dagger} = (A^{*})^{T} = \begin{bmatrix} 1-2i & 4 \\ 3 & 5+i \end{bmatrix}\end{split}\]
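
The same two operations in NumPy (a sketch; NumPy is assumed):

import numpy as np

A = np.array([[1 + 2j, 3],
              [4,      5 - 1j]])

A_conj = np.conj(A)       # complex conjugate A*
A_dagger = A.conj().T     # Hermitian conjugate A† = (A*)^T

print(A_conj)
print(A_dagger)
print(np.allclose(A_dagger, np.conj(A).T))   # the same thing, written two ways
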
What is a Hermitian matrix?

A matrix \(A\) where \(A = A^{\dagger}\). Key properties:

  • Diagonal entries must be real

  • Off-diagonal entries are complex conjugates: if \(a_{ij} = x+yi\), then \(a_{ji} = x-yi\)

  • Has real eigenvalues

  • Has orthonormal eigenvectors

  • Quadratic forms \(x^{\dagger}Ax\) are real for any \(x\)

  • Is positive semidefinite if and only if all eigenvalues are \(\ge 0\)
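
A small sketch checking a few of these properties for one Hermitian matrix (the matrix and vector are illustrative choices):

import numpy as np

A = np.array([[2.0,     1 - 1j],
              [1 + 1j,  3.0   ]])

print(np.allclose(A, A.conj().T))         # Hermitian: A = A†
print(np.linalg.eigvalsh(A))              # eigenvalues are real (here 1 and 4)

x = np.array([1 + 2j, -1j])
quad = x.conj() @ A @ x                   # quadratic form x† A x
print(quad, np.isclose(quad.imag, 0.0))   # its imaginary part is ~0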

Matrix Analysis

Eigenvalues and Eigenvectors

What is an eigenvector, intuitively?

An eigenvector \(v\) is a nonzero vector whose direction remains unchanged under a linear transformation \(A\). The transformation only scales the vector by its eigenvalue \(\lambda\):

\[Av = \lambda v\]
Key insight:
  • eigenvector = direction preserved by \(A\)

  • eigenvalue = scaling factor applied to that direction
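
A minimal NumPy check that each computed pair really satisfies \(Av = \lambda v\) (the example matrix is an assumption for illustration):

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)
for lam, v in zip(eigvals, eigvecs.T):   # eigenvectors are the columns
    print(np.allclose(A @ v, lam * v))   # A v = lambda v holds for each pair
print(eigvals)                           # eigenvalues 3 and 1 (order may vary)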

Why are eigenvalues important?

Eigenvalues capture the essential character of a transformation:

  • In dynamics: Define system stability and oscillation frequencies

  • In quantum mechanics: Represent observable energy states

  • In data science: Measure variance along principal components

  • In optimization: Determine convergence rates and condition numbers

Diagonalization

What does it mean to “diagonalize” a matrix?

To find a basis where the matrix acts by simple scaling. Formally: for an \(n\times n\) matrix \(A\), find matrices \(S\) and \(\Lambda\) where:

\[A = S\,\Lambda\,S^{-1}\]

Here \(\Lambda\) is diagonal (eigenvalues) and \(S\) has eigenvectors as columns.
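
A short NumPy sketch of this factorization for a small diagonalizable matrix (the example matrix is an illustrative assumption):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, S = np.linalg.eig(A)    # columns of S are eigenvectors
Lambda = np.diag(eigvals)        # diagonal matrix of eigenvalues

print(np.allclose(S @ Lambda @ np.linalg.inv(S), A))   # A = S Λ S^{-1}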

When is a matrix diagonalizable?

A matrix is diagonalizable if and only if it has n linearly independent eigenvectors. Two key cases:

  • Not all matrices are diagonalizable (these are called “defective”)

  • Hermitian matrices are always diagonalizable with orthonormal eigenvectors

What’s special about Hermitian diagonalization?

Hermitian matrices have an especially nice diagonalization (the spectral theorem):

\[A = U\,\Lambda\,U^{\dagger}\]

where:

  • \(U\) is unitary (\(U^{\dagger}=U^{-1}\))

  • \(\Lambda\) has real eigenvalues

  • The eigenvectors (columns of \(U\)) form an orthonormal basis
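
A sketch of the spectral theorem in NumPy, using eigh for a Hermitian input (the example matrix is assumed for illustration):

import numpy as np

A = np.array([[2.0,    1 - 1j],
              [1 + 1j, 3.0   ]])           # Hermitian

eigvals, U = np.linalg.eigh(A)             # eigh is intended for Hermitian/symmetric input
Lambda = np.diag(eigvals)

print(np.allclose(U.conj().T @ U, np.eye(2)))    # U is unitary
print(np.allclose(U @ Lambda @ U.conj().T, A))   # A = U Λ U†
print(eigvals)                                   # real eigenvalues: [1. 4.]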

Matrix inverses

How do you find a matrix inverse?

For any invertible matrix, \(A^{-1}\) can be computed using the adjugate formula (practical mainly for small matrices; numerical software uses factorizations instead):

\[A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)\]
Can you show a 2×2 example?

For a 2×2 matrix \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\):

  • Determinant: \(\det(A) = ad - bc\)

  • Adjugate: \(\operatorname{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\)

  • Inverse: \(A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\)
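
A minimal sketch of the 2×2 formula, checked against NumPy's general inverse (the helper name inverse_2x2 is illustrative, and the code assumes a nonzero determinant):

import numpy as np

def inverse_2x2(A):
    # Inverse of a 2x2 matrix via the adjugate formula (assumes det != 0)
    a, b = A[0]
    c, d = A[1]
    det = a * d - b * c
    adj = np.array([[ d, -b],
                    [-c,  a]])
    return adj / det

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
print(inverse_2x2(A))
print(np.allclose(inverse_2x2(A), np.linalg.inv(A)))   # matches NumPy's inverse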

Determinant and trace

What is the relationship between eigenvalues and determinant?

The determinant of a matrix is the product of its eigenvalues:

\[\det(A) = \prod_i \lambda_i\]
What is the relationship between eigenvalues and trace?

The trace of a matrix is the sum of its eigenvalues:

\[\operatorname{tr}(A) = \sum_i \lambda_i\]

The trace is defined as the sum of the diagonal elements, so that sum also equals the sum of the eigenvalues.
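
Both identities are easy to check numerically (a sketch; the example matrix is assumed):

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
eigvals = np.linalg.eigvals(A)

print(np.isclose(np.prod(eigvals), np.linalg.det(A)))   # det(A) = product of eigenvalues
print(np.isclose(np.sum(eigvals),  np.trace(A)))        # tr(A)  = sum of eigenvalues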

Special Matrices

Matrix Types

What is a unitary matrix?

A unitary matrix \(U\) satisfies \(U^{\dagger}U = I\), where \(I\) is the identity matrix. Key properties:

  • Preserves inner products and norms

  • Has orthonormal columns

  • \(U^{-1} = U^{\dagger}\)
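
A sketch that builds a unitary matrix (as the Q factor of a QR factorization of a random complex matrix, an illustrative construction) and checks these properties:

import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
U, _ = np.linalg.qr(M)      # the Q factor of a QR factorization is unitary

print(np.allclose(U.conj().T @ U, np.eye(3)))       # U†U = I
print(np.allclose(np.linalg.inv(U), U.conj().T))    # U^{-1} = U†

x = rng.standard_normal(3)
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))   # norms are preserved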

What is a normal matrix?

A matrix \(A\) is normal if it commutes with its adjoint: \(AA^{\dagger} = A^{\dagger}A\). This condition unifies several important matrix classes:

  • Includes Hermitian (\(A = A^{\dagger}\)), skew-Hermitian (\(A = -A^{\dagger}\)), and unitary (\(AA^{\dagger} = I\)) matrices as special cases

  • Guarantees diagonalization by a unitary matrix: \(A = UDU^{\dagger}\)

  • Eigenspaces corresponding to different eigenvalues are orthogonal

  • Norm invariance: \(\|Ax\| = \|A^{\dagger}x\|\) for all vectors \(x\)

What is a skew-symmetric matrix?

A matrix \(A\) where \(A^T = -A\). Key properties:

  • Diagonal elements must be zero

  • Elements are antisymmetric: \(a_{ij} = -a_{ji}\)

  • All eigenvalues are either 0 or purely imaginary

  • For 3×3 matrices, represents cross products: if \(v = [a,b,c]\), then

\[\begin{split}\begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix}x = v \times x\end{split}\]
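
A short NumPy check of the cross-product identity (the helper name cross_matrix is illustrative):

import numpy as np

def cross_matrix(v):
    # Skew-symmetric matrix K such that K @ x equals np.cross(v, x)
    a, b, c = v
    return np.array([[ 0.0, -c,   b ],
                     [ c,    0.0, -a ],
                     [-b,    a,   0.0]])

v = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, -1.0, 0.5])
K = cross_matrix(v)

print(np.allclose(K, -K.T))                 # skew-symmetric
print(np.allclose(K @ x, np.cross(v, x)))   # reproduces the cross product
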
Why is \(U^{-1} = U^{\dagger}\) for unitary matrices?

For a unitary matrix \(U\), let’s prove why the inverse equals the conjugate transpose:

  1. By definition of unitary matrix:

    \[UU^{\dagger} = U^{\dagger}U = I\]
  2. By definition of inverse:

    \[UU^{-1} = U^{-1}U = I\]
  3. Therefore:

    \[UU^{\dagger} = UU^{-1} = I\]
  4. Multiply both sides by \(U^{-1}\) on the left:

    \[ \begin{align}\begin{aligned}U^{-1}(UU^{\dagger}) = U^{-1}(UU^{-1})\\(U^{-1}U)U^{\dagger} = (U^{-1}U)U^{-1}\\IU^{\dagger} = IU^{-1}\\U^{\dagger} = U^{-1}\end{aligned}\end{align} \]

This shows that for unitary matrices, the inverse and conjugate transpose must be equal.

Matrix Decomposition

LU Decomposition

What is LU decomposition?

LU decomposition factors a matrix \(A\) into a product \(A = LU\) where:

  • \(L\) is lower triangular (with 1’s on diagonal)

  • \(U\) is upper triangular

  • The factorization mimics Gaussian elimination steps

How are L and U connected to A?

The entries of L and U come from Gaussian elimination:

  • U is what you get after doing elimination (the final upper triangular form)

  • L stores the multipliers used during elimination

  • Each column of L records how you eliminated entries in that column of A

For example, consider eliminating in a 3×3 matrix:

  • First column: subtract \(l_{21}\) times row 1 from row 2 and \(l_{31}\) times row 1 from row 3 to eliminate the entries below \(a_{11}\)

  • Second column: subtract \(l_{32}\) times row 2 from row 3 to eliminate the entry below \(a_{22}\)

  • These multipliers form L, while the resulting matrix is U

\[\begin{split}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}\end{split}\]

The entries are related by:

  • \(u_{11} = a_{11}\)

  • \(l_{21} = a_{21}/u_{11}\)

  • \(l_{31} = a_{31}/u_{11}\)

  • \(u_{22} = a_{22} - l_{21}u_{12}\)

  • And so on…

Could you share a simple step-by-step example?

Let’s decompose \(A = \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix}\):

  1. Understanding what we’re looking for:

    • We want to find L and U such that A = LU

    • L will store our elimination multipliers

    • U will be the result after elimination

  2. First column elimination:

    • Need to eliminate 6 under the 2

    • Multiplier: \(l_{21} = 6/2 = 3\) (this will go in our L matrix)

    • Operation: Subtract 3 times row 1 from row 2

    • This operation transforms A into U:

    \[\begin{split}\begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix} \xrightarrow{R_2 - 3R_1} \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} = U\end{split}\]
  3. Building L and U:

    • L captures the elimination step: put 3 below the diagonal

    • U is the result we got after elimination

    • The diagonal of L is always 1’s

    \[\begin{split}L = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \quad U = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}\end{split}\]
  4. Verify:

    \[\begin{split}LU = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix} = A\end{split}\]
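
A minimal Python sketch of this factorization, assuming no pivoting is needed (the function name lu_no_pivot is illustrative; production routines such as scipy.linalg.lu add row pivoting for numerical stability):

import numpy as np

def lu_no_pivot(A):
    # Doolittle LU factorization without pivoting (sketch; assumes no zero pivots)
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]        # elimination multiplier
            U[i, k:] -= L[i, k] * U[k, k:]     # subtract multiple of the pivot row
    return L, U

A = np.array([[2.0, 1.0],
              [6.0, 4.0]])
L, U = lu_no_pivot(A)
print(L)                        # [[1, 0], [3, 1]]
print(U)                        # [[2, 1], [0, 1]]
print(np.allclose(L @ U, A))    # True
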
How do you solve systems using LU decomposition?

To solve \(Ax = b\) using LU decomposition, we split it into two steps:

  1. First solve \(Ly = b\) for y

  2. Then solve \(Ux = y\) for x

Why this works:
  • Since \(A = LU\), the equation \(Ax = b\) becomes \((LU)x = b\)

  • This is equivalent to \(L(Ux) = b\)

  • Let \(y = Ux\), then we have \(Ly = b\)

  • Once we find y, we can solve \(Ux = y\)

Example for a 2×2 system:

\[\begin{split}\begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}\end{split}\]

Solving \(Ax = b\) with \(b = \begin{bmatrix} 5 \\ 11 \end{bmatrix}\):

  1. First solve \(Ly = b\):

    \[\begin{split}\begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}\end{split}\]
    • From first row: \(y_1 = 5\)

    • From second row: \(2y_1 + y_2 = 11\), so \(y_2 = 11 - 2(5) = 1\)

  2. Then solve \(Ux = y\):

    \[\begin{split}\begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}\end{split}\]
    • From second row: \(x_2 = 1\)

    • From first row: \(2x_1 + x_2 = 5\), so \(2x_1 + 1 = 5\), giving \(x_1 = 2\)
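
As a sketch of these two triangular solves (the function name solve_lu is illustrative), applied to the same L, U, and b as above:

import numpy as np

def solve_lu(L, U, b):
    # Solve (LU)x = b: forward substitution for Ly = b, then back substitution for Ux = y
    n = len(b)
    y = np.zeros(n)
    for i in range(n):                               # forward pass (L is lower triangular)
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                   # backward pass (U is upper triangular)
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0],
              [2.0, 1.0]])
U = np.array([[2.0, 1.0],
              [0.0, 1.0]])
b = np.array([5.0, 11.0])
print(solve_lu(L, U, b))    # [2. 1.], matching the worked example above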

What makes LU decomposition efficient?
  • Triangular systems are fast to solve

  • Determinant is just product of \(U\)’s diagonal

  • Can update factorization efficiently when \(A\) changes slightly

  • Parallel implementations available for large systems

Advanced Topics

Additional notation

What does a “hat” mean over a matrix symbol?

When you see notation like \(\hat{A}\), the hat indicates you’re working with an operator (a linear transformation) rather than just a matrix of numbers. This is common in quantum mechanics and functional analysis where operators may be infinite-dimensional or unbounded.

Gradient descent

Iterative Methods

Why use iterative methods?

For large systems, direct methods like LU decomposition become computationally expensive. Iterative methods:

  • Use less memory (don’t need to store full matrices)

  • Can exploit sparsity (many zero elements)

  • Often faster for large systems

  • Can give approximate solutions quickly

Jacobi method

What is the Jacobi method?

Let’s understand it step by step with a small example.

A two-variable example. Consider this pair of equations:

\[\begin{split}4x - y &= 5 \\ -2x + 3y &= 1\end{split}\]
  1. Why we don’t just solve directly:

    • These equations represent lines in the x-y plane

    • Their intersection is the solution

    • But solving algebraically requires multiple steps, and that cost grows quickly for larger systems

  2. The Jacobi idea: “Let’s rearrange each equation to isolate its main variable”

    Solve for x and y:

    \[\begin{split}x &= \frac{5 + y}{4} \\ y &= \frac{1 + 2x}{3}\end{split}\]
  3. See it work: Start with guess \(x = 0, y = 0\):

    Iteration 1:
    • For x: plug in y = 0 → \(x = \frac{5 + 0}{4} = 1.25\)

    • For y: plug in x = 0 → \(y = \frac{1 + 2(0)}{3} = 0.333\)

    Iteration 2:
    • For x: plug in y = 0.333 → \(x = \frac{5 + 0.333}{4} = 1.333\)

    • For y: plug in x = 1.25 → \(y = \frac{1 + 2(1.25)}{3} = 1.167\)

    Iteration 3:
    • For x: plug in y = 1.167 → \(x = \frac{5 + 1.167}{4} = 1.542\)

    • For y: plug in x = 1.333 → \(y = \frac{1 + 2(1.333)}{3} = 1.222\)

    The method converges to \(x = 1.6,\ y = 1.4\), which is the exact solution of the system

  4. Why it becomes that formula: Let’s break this down step by step using both the general form and a specific example.

    Step 1: Start with a concrete example. Let’s work with this 3×3 system:

    \[\begin{split}\begin{bmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}\end{split}\]

    Step 2: Focus on one row. Let’s look at the second row (i = 2 in an n×n matrix). This row shows how each variable relates to the others:

    \[-1x_1 + 4x_2 + (-1)x_3 = 5\]

    This is easier to understand than all the equations at once. We will solve for \(x_2\), which is the variable we want to solve for in row i.

    Step 3: Move terms around to isolate \(x_2\). Move everything except the \(x_2\) term to the right side:

    \[4x_2 = 5 + x_1 + x_3\]

    Step 4: Solve for \(x_2\). Divide both sides by the coefficient of \(x_2\) (which is 4):

    \[x_2 = \frac{5}{4} + \frac{1}{4}(x_1 + x_3)\]

    Step 5: Add iteration numbers and see the general pattern. Looking at what we did for \(x_2\), we can write row i as:

    \[a_{i1}x_1 + a_{i2}x_2 + ... + a_{ii}x_i + ... + a_{in}x_n = b_i\]

    Move terms and solve for \(x_i\):

    \[x_i = \frac{b_i}{a_{ii}} - \frac{1}{a_{ii}}\sum_{j \neq i} a_{ij}x_j\]

    For the Jacobi method, we use the previous iteration’s values for all variables other than the one we’re solving for:

    \[x_i^{(k+1)} = \frac{b_i}{a_{ii}} - \frac{1}{a_{ii}}\sum_{j \neq i} a_{ij}x_j^{(k)}\]

    Why Local Minima Are Impossible. Unlike optimization problems (like gradient descent), the Jacobi method cannot get stuck because:

    1. Linear Systems Are Different:

      • We’re solving \(Ax = b\), which is a linear system

      • Linear systems have exactly one solution (if A is nonsingular)

      • There are no “hills” or “valleys” to get stuck in

    2. Each Update is Exact:

      • We’re not minimizing a function

      • Each new value comes directly from the equations

      • If \(x_2 = \frac{5}{4} + \frac{1}{4}(x_1 + x_3)\), this is an exact relationship

    3. Guaranteed Progress:

      • With diagonal dominance (\(|a_{ii}| > \sum_{j \neq i} |a_{ij}|\)), each update moves us closer to the solution

      • Like following a fixed path, not searching for a minimum

    For our specific example with \(x_2\), we have:

    \[x_2^{(k+1)} = \frac{5}{4} + \frac{1}{4}(x_1^{(k)} + x_3^{(k)})\]
    The key points:
    • Each equation is solved for one variable

    • We move everything else to the right side

    • Divide by diagonal coefficient

    • Use previous iteration values for other variables

  5. Key insights:

    • Each new value (\(x_i^{(k+1)}\)) uses old values (\(x_j^{(k)}\))

    • We’re essentially saying “if everything else stays the same, what should this value be?”

    • It’s like everyone adjusting their position based on where they see others standing

Mathematically, this becomes:

\[x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j \neq i} a_{ij}x_j^{(k)}\right)\]
where:
  • \(x_i^{(k)}\) is our current guess for row i in step k

  • \(a_{ii}\) is the diagonal coefficient in row i

  • \(b_i\) is the constant term in row i

  • The sum represents how other variables influence \(x_i\)
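
A minimal NumPy implementation of this update rule, applied to the two-variable example from earlier (the function name jacobi and the fixed iteration count are illustrative choices):

import numpy as np

def jacobi(A, b, x0=None, iterations=25):
    # Plain Jacobi iteration: x_i^(k+1) = (b_i - sum_{j != i} a_ij x_j^(k)) / a_ii
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    D = np.diag(A)                     # diagonal entries a_ii
    R = A - np.diag(D)                 # off-diagonal part
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float)
    for _ in range(iterations):
        x = (b - R @ x) / D            # every component updated from the previous x
    return x

# The two-variable example above: 4x - y = 5, -2x + 3y = 1
A = np.array([[ 4.0, -1.0],
              [-2.0,  3.0]])
b = np.array([5.0, 1.0])
print(jacobi(A, b))              # converges to [1.6, 1.4]
print(np.linalg.solve(A, b))     # direct solution, for comparison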

How convergence happens visually

For our 2×2 example above (errors are measured against the exact solution \(x = 1.6,\ y = 1.4\)):

\[\begin{split}\begin{array}{|c|c|c|c|c|} \hline \text{Iteration} & x & y & \text{Error in x} & \text{Error in y} \\ \hline 0 & 0.000 & 0.000 & 1.600 & 1.400 \\ 1 & 1.250 & 0.333 & 0.350 & 1.067 \\ 2 & 1.333 & 1.167 & 0.267 & 0.233 \\ 3 & 1.542 & 1.222 & 0.058 & 0.178 \\ 4 & 1.556 & 1.361 & 0.044 & 0.039 \\ \hline \end{array}\end{split}\]
This is exactly how Jacobi iteration works:
  • Each variable updates based on the other variables’ previous values

  • All updates happen simultaneously

  • The system gradually moves toward equilibrium

Key advantages:
  • Simple to implement and understand

  • Naturally parallelizable (every variable can be updated at the same time)

  • Works well for diagonally dominant matrices (when each variable mainly depends on itself)

  • Memory efficient for sparse matrices (when most variables don’t affect each other)

Convergence conditions

The Jacobi method is guaranteed to converge if A is strictly diagonally dominant. If A is symmetric positive definite, Jacobi converges provided \(2D - A\) is also positive definite, where \(D\) is the diagonal part of \(A\).

Practical applications

Common uses include:

  1. Partial Differential Equations:

    • Solving Poisson’s equation on a grid

    • Heat diffusion problems

    • Electronic structure calculations

  2. Network Analysis:

    • Load flow analysis in power systems

    • Traffic flow in transportation networks

    • PageRank-like algorithms (modified versions)

  3. Image Processing:

    • Image reconstruction

    • Denoising

    • Solving large linear systems in tomography