.. _linear-algebra:

Linear algebra
==============
The following notes are based on Columbia University's Introduction to Numerical
Methods course.

Fundamentals
------------

Linear Operations
^^^^^^^^^^^^^^^^^

What does "linear" mean in this context? A map :math:`f` is linear if it
satisfies the following two properties:

1. Additivity: :math:`f(x + y) = f(x) + f(y)`
2. Homogeneity: :math:`f(\alpha x) = \alpha f(x)` for any scalar :math:`\alpha`

Core Matrix Properties
^^^^^^^^^^^^^^^^^^^^^^

What is a strictly diagonally dominant matrix? A matrix :math:`A` is strictly
diagonally dominant if for each row :math:`i`:

.. math::

   |a_{ii}| > \sum_{j \neq i} |a_{ij}|

In other words, the magnitude of the diagonal element is larger than the sum of
the magnitudes of all other elements in that row. Example:

.. math::

   \begin{bmatrix} 3 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -2 & 5 \end{bmatrix}

What is a positive definite matrix? A symmetric matrix :math:`A` is positive
definite if it satisfies any of the following equivalent conditions:

1. :math:`x^TAx > 0` for all nonzero vectors :math:`x`
2. All eigenvalues are positive
3. All leading principal minors are positive

A positive definite matrix :math:`A` that also satisfies
:math:`2|a_{ii}| > \sum_{j=1}^n |a_{ij}|` in every row (this is strict diagonal
dominance, since the sum includes the diagonal term) guarantees convergence of
the iterative methods discussed later.

Example of a positive definite matrix:

.. math::

   \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}

This matrix is positive definite because:

- It's symmetric
- Its eigenvalues (1 and 3) are both positive
- :math:`2|a_{ii}| > |a_{i1}| + |a_{i2}|` for each row :math:`i`

Basic Operations
----------------

Matrix Operations
^^^^^^^^^^^^^^^^^

How do you rotate a vector about the z-axis? A rotation about the z-axis by
angle :math:`\theta` is represented by the rotation matrix below. The matrix
multiplies a column vector to rotate it in the xy-plane:

.. math::

   \begin{bmatrix}
   \cos \theta & -\sin \theta & 0 \\
   \sin \theta & \cos \theta & 0 \\
   0 & 0 & 1
   \end{bmatrix}
   \begin{bmatrix} x \\ y \\ z \end{bmatrix}
   =
   \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}

What is matrix transposition and how is it written? The transpose :math:`A^T`
reflects a matrix across its main diagonal. Formally:

.. math::

   (A^T)_{ij} = A_{ji} \quad \text{for all } i,j

Properties:

- :math:`(A^T)^T = A`
- :math:`(AB)^T = B^T A^T`
- For symmetric matrices: :math:`A^T = A`

Complex Operations
^^^^^^^^^^^^^^^^^^

What is the complex conjugate of a matrix? Written as :math:`A^*`, it
conjugates each entry (changes :math:`i` to :math:`-i`). For example:

.. math::

   A = \begin{bmatrix} 1+2i & 3 \\ 4 & 5-i \end{bmatrix}, \qquad
   A^{*} = \begin{bmatrix} 1-2i & 3 \\ 4 & 5+i \end{bmatrix}

What is the Hermitian conjugate? The Hermitian conjugate (or adjoint)
:math:`A^{\dagger}` combines complex conjugation and transposition:

.. math::

   A^{\dagger} = (A^{*})^{T} = \begin{bmatrix} 1-2i & 4 \\ 3 & 5+i \end{bmatrix}

What is a Hermitian matrix? A matrix :math:`A` where :math:`A = A^{\dagger}`.
Key properties:

- Diagonal entries must be real
- Off-diagonal entries are complex conjugates: if :math:`a_{ij} = x+yi`, then :math:`a_{ji} = x-yi`
- Has real eigenvalues
- Has orthonormal eigenvectors
- Quadratic forms :math:`x^{\dagger}Ax` are real for any :math:`x`
- Is positive semidefinite if and only if all eigenvalues are :math:`\ge 0`
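The conjugate-transpose operations above are easy to experiment with
numerically. Below is a minimal NumPy sketch (not part of the course materials;
the Hermitian test matrix ``H`` is chosen here just for illustration) that forms
:math:`A^*` and :math:`A^{\dagger}` and checks that a Hermitian matrix has real
eigenvalues:

.. code-block:: python

   import numpy as np

   # The example matrix from the text
   A = np.array([[1 + 2j, 3],
                 [4,      5 - 1j]])

   A_conj = np.conj(A)       # complex conjugate A*
   A_dagger = A.conj().T     # Hermitian conjugate (adjoint) A†
   print(A_dagger)           # [[1.-2.j 4.+0.j]
                             #  [3.+0.j 5.+1.j]]

   # A Hermitian example: H equals its own adjoint, so its eigenvalues are real
   H = np.array([[2, 1 - 1j],
                 [1 + 1j, 3]])
   print(np.allclose(H, H.conj().T))   # True: H is Hermitian
   print(np.linalg.eigvalsh(H))        # real eigenvalues: [1. 4.]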
Matrix Analysis
---------------

Eigenvalues and Eigenvectors
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What is an eigenvector, intuitively? An eigenvector :math:`v` is a nonzero
vector whose direction remains unchanged under a linear transformation
:math:`A`. The transformation only scales the vector by its eigenvalue
:math:`\lambda`:

.. math::

   Av = \lambda v

Key insight:

- eigenvector = direction preserved by :math:`A`
- eigenvalue = scaling factor applied to that direction

Why are eigenvalues important? Eigenvalues capture the essential character of a
transformation:

- In dynamics: define system stability and oscillation frequencies
- In quantum mechanics: represent observable energy states
- In data science: measure variance along principal components
- In optimization: determine convergence rates and condition numbers

Diagonalization
---------------

What does it mean to "diagonalize" a matrix? To find a basis where the matrix
acts by simple scaling. Formally: for an :math:`n\times n` matrix :math:`A`,
find matrices :math:`S` and :math:`\Lambda` where:

.. math::

   A = S\,\Lambda\,S^{-1}

Here :math:`\Lambda` is diagonal (eigenvalues) and :math:`S` has eigenvectors
as columns.

When is a matrix diagonalizable? A matrix is diagonalizable if and only if it
has :math:`n` linearly independent eigenvectors. Two key cases:

- Not all matrices are diagonalizable (these are called "defective")
- Hermitian matrices are always diagonalizable with orthonormal eigenvectors

What's special about Hermitian diagonalization? Hermitian matrices have an
especially nice diagonalization (the spectral theorem):

.. math::

   A = U\,\Lambda\,U^{\dagger}

where:

- :math:`U` is unitary (:math:`U^{\dagger}=U^{-1}`)
- :math:`\Lambda` has real eigenvalues
- The eigenvectors (columns of :math:`U`) form an orthonormal basis

Matrix inverses
---------------

How do you find a matrix inverse? For any invertible matrix, :math:`A^{-1}` can
be computed using the adjugate formula:

.. math::

   A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)

Can you show a 2×2 example? For a 2×2 matrix
:math:`A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}`:

- Determinant: :math:`\det(A) = ad - bc`
- Adjugate: :math:`\operatorname{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}`
- Inverse: :math:`A^{-1} = \frac{1}{ad-bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}`

Determinant and trace
---------------------

What is the relationship between eigenvalues and determinant? The determinant
of a matrix is the product of its eigenvalues:

.. math::

   \det(A) = \prod_i \lambda_i

What is the relationship between eigenvalues and trace? The trace of a matrix
is the sum of its eigenvalues:

.. math::

   \operatorname{tr}(A) = \sum_i \lambda_i

The trace is also equal to the sum of the diagonal elements.
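As a numerical check of the relations above (a minimal NumPy sketch; the test
matrix is chosen here for illustration), we can diagonalize a matrix and confirm
that the determinant and trace equal the product and sum of its eigenvalues:

.. code-block:: python

   import numpy as np

   A = np.array([[2.0, -1.0],
                 [-1.0, 2.0]])          # symmetric, eigenvalues 1 and 3

   eigvals, S = np.linalg.eig(A)        # columns of S are eigenvectors
   Lam = np.diag(eigvals)

   # A = S Λ S^{-1}
   print(np.allclose(A, S @ Lam @ np.linalg.inv(S)))       # True

   # det(A) = product of eigenvalues, tr(A) = sum of eigenvalues
   print(np.isclose(np.linalg.det(A), np.prod(eigvals)))   # True
   print(np.isclose(np.trace(A), np.sum(eigvals)))         # True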
Special Matrices
----------------

Matrix Types
^^^^^^^^^^^^

What is a unitary matrix? A unitary matrix :math:`U` satisfies
:math:`U^{\dagger}U = I`, where :math:`I` is the identity matrix. Key
properties:

- Preserves inner products and norms
- Has orthonormal columns
- :math:`U^{-1} = U^{\dagger}`

What is a normal matrix? A matrix :math:`A` is normal if it commutes with its
adjoint: :math:`AA^{\dagger} = A^{\dagger}A`. This condition unifies several
important matrix classes:

- Unifies Hermitian (:math:`A = A^{\dagger}`), skew-Hermitian
  (:math:`A = -A^{\dagger}`), and unitary (:math:`AA^{\dagger} = I`) matrices
- Guarantees diagonalization by a unitary matrix: :math:`A = UDU^{\dagger}`
- Eigenspaces corresponding to different eigenvalues are orthogonal
- Norm invariance: :math:`\|Ax\| = \|A^{\dagger}x\|` for all vectors :math:`x`

What is a skew-symmetric matrix? A matrix :math:`A` where :math:`A^T = -A`.
Key properties:

- Diagonal elements must be zero
- Elements are antisymmetric: :math:`a_{ij} = -a_{ji}`
- All eigenvalues are either 0 or purely imaginary
- For 3×3 matrices, represents cross products: if :math:`v = [a,b,c]`, then

  .. math::

     \begin{bmatrix} 0 & -c & b \\ c & 0 & -a \\ -b & a & 0 \end{bmatrix} x
     = v \times x

Why is :math:`U^{-1} = U^{\dagger}` for unitary matrices? For a unitary matrix
:math:`U`, let's prove why the inverse equals the conjugate transpose:

1. By definition of a unitary matrix:

   .. math::

      UU^{\dagger} = U^{\dagger}U = I

2. By definition of the inverse:

   .. math::

      UU^{-1} = U^{-1}U = I

3. Therefore:

   .. math::

      UU^{\dagger} = UU^{-1} = I

4. Multiply both sides by :math:`U^{-1}` on the left:

   .. math::

      U^{-1}(UU^{\dagger}) &= U^{-1}(UU^{-1}) \\
      (U^{-1}U)U^{\dagger} &= (U^{-1}U)U^{-1} \\
      IU^{\dagger} &= IU^{-1} \\
      U^{\dagger} &= U^{-1}

This shows that for unitary matrices, the inverse and the conjugate transpose
must be equal.

Matrix Decomposition
--------------------

LU Decomposition
^^^^^^^^^^^^^^^^

What is LU decomposition? LU decomposition factors a matrix :math:`A` into a
product :math:`A = LU` where:

- :math:`L` is lower triangular (with 1's on the diagonal)
- :math:`U` is upper triangular
- The factorization mimics Gaussian elimination steps

How are L and U connected to A? The entries of L and U come from Gaussian
elimination:

- U is what you get after doing elimination (the final upper triangular form)
- L stores the multipliers used during elimination
- Each column of L records how you eliminated entries in that column of A

For example, consider eliminating in a 3×3 matrix:

- First column: subtract :math:`l_{21}` times row 1 from row 2, and
  :math:`l_{31}` times row 1 from row 3, to eliminate the entries below
  :math:`a_{11}`
- Second column: subtract :math:`l_{32}` times row 2 from row 3 to eliminate
  the entry below :math:`a_{22}`
- These multipliers form L, while the resulting matrix is U

.. math::

   \begin{bmatrix}
   a_{11} & a_{12} & a_{13} \\
   a_{21} & a_{22} & a_{23} \\
   a_{31} & a_{32} & a_{33}
   \end{bmatrix}
   =
   \begin{bmatrix}
   1 & 0 & 0 \\
   l_{21} & 1 & 0 \\
   l_{31} & l_{32} & 1
   \end{bmatrix}
   \begin{bmatrix}
   u_{11} & u_{12} & u_{13} \\
   0 & u_{22} & u_{23} \\
   0 & 0 & u_{33}
   \end{bmatrix}

The entries are related by:

- :math:`u_{11} = a_{11}`
- :math:`l_{21} = a_{21}/u_{11}`
- :math:`l_{31} = a_{31}/u_{11}`
- :math:`u_{22} = a_{22} - l_{21}u_{12}`
- And so on...

Could you share a simple step-by-step example? Let's decompose
:math:`A = \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix}`:

1. Understanding what we're looking for:

   - We want to find L and U such that A = LU
   - L will store our elimination multipliers
   - U will be the result after elimination

2. First column elimination:

   - Need to eliminate the 6 under the 2
   - Multiplier: :math:`l_{21} = 6/2 = 3` (this will go in our L matrix)
   - Operation: subtract 3 times row 1 from row 2
   - This operation transforms A into U:

   .. math::

      \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix}
      \xrightarrow{R_2 - 3R_1}
      \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix} = U

3. Building L and U:

   - L captures the elimination step: put 3 below the diagonal
   - U is the result we got after elimination
   - The diagonal of L is always 1's

   .. math::

      L = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix} \quad
      U = \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}

4. Verify:

   .. math::

      LU = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}
      \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}
      = \begin{bmatrix} 2 & 1 \\ 6 & 4 \end{bmatrix} = A
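The elimination recipe above can be written in a few lines of code. The
following is a minimal Doolittle-style sketch without pivoting (the function
name ``lu_no_pivot`` is ours, and the routine assumes every pivot is nonzero),
reproducing the hand computation for the 2×2 example:

.. code-block:: python

   import numpy as np

   def lu_no_pivot(A):
       """Return L (unit lower triangular) and U with A = L @ U.

       Assumes no pivoting is needed (all pivots nonzero)."""
       A = np.array(A, dtype=float)
       n = A.shape[0]
       L = np.eye(n)
       U = A.copy()
       for k in range(n - 1):
           for i in range(k + 1, n):
               L[i, k] = U[i, k] / U[k, k]      # elimination multiplier l_ik
               U[i, :] -= L[i, k] * U[k, :]     # subtract l_ik * (row k)
       return L, U

   A = np.array([[2.0, 1.0],
                 [6.0, 4.0]])
   L, U = lu_no_pivot(A)
   print(L)                        # [[1. 0.] [3. 1.]]
   print(U)                        # [[2. 1.] [0. 1.]]
   print(np.allclose(L @ U, A))    # True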
How do you solve systems using LU decomposition? To solve :math:`Ax = b` using
LU decomposition, we split it into two steps:

1. First solve :math:`Ly = b` for :math:`y`
2. Then solve :math:`Ux = y` for :math:`x`

Why this works:

- Since :math:`A = LU`, the equation :math:`Ax = b` becomes :math:`(LU)x = b`
- This is equivalent to :math:`L(Ux) = b`
- Let :math:`y = Ux`, then we have :math:`Ly = b`
- Once we find :math:`y`, we can solve :math:`Ux = y`

Example for a 2×2 system:

.. math::

   \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}
   =
   \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}
   \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}

Solving :math:`Ax = b` with :math:`b = \begin{bmatrix} 5 \\ 11 \end{bmatrix}`:

1. First solve :math:`Ly = b`:

   .. math::

      \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}
      \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
      = \begin{bmatrix} 5 \\ 11 \end{bmatrix}

   - From the first row: :math:`y_1 = 5`
   - From the second row: :math:`2y_1 + y_2 = 11`, so :math:`y_2 = 11 - 2(5) = 1`

2. Then solve :math:`Ux = y`:

   .. math::

      \begin{bmatrix} 2 & 1 \\ 0 & 1 \end{bmatrix}
      \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
      = \begin{bmatrix} 5 \\ 1 \end{bmatrix}

   - From the second row: :math:`x_2 = 1`
   - From the first row: :math:`2x_1 + x_2 = 5`, so :math:`2x_1 + 1 = 5`, giving :math:`x_1 = 2`
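The two triangular solves can likewise be spelled out in code. This is a minimal
sketch (the helper names ``forward_sub`` and ``back_sub`` are ours) applied to
the 2×2 system above:

.. code-block:: python

   import numpy as np

   def forward_sub(L, b):
       """Solve L y = b for a lower-triangular L by forward substitution."""
       n = len(b)
       y = np.zeros(n)
       for i in range(n):
           y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
       return y

   def back_sub(U, y):
       """Solve U x = y for an upper-triangular U by back substitution."""
       n = len(y)
       x = np.zeros(n)
       for i in range(n - 1, -1, -1):
           x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
       return x

   L = np.array([[1.0, 0.0], [2.0, 1.0]])
   U = np.array([[2.0, 1.0], [0.0, 1.0]])
   b = np.array([5.0, 11.0])

   y = forward_sub(L, b)   # [5., 1.]
   x = back_sub(U, y)      # [2., 1.]
   print(x)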
What makes LU decomposition efficient?

- Triangular systems are fast to solve
- The determinant is just the product of :math:`U`'s diagonal
- The factorization can be updated efficiently when :math:`A` changes slightly
- Parallel implementations are available for large systems

Advanced Topics
---------------

Additional notation
-------------------

What does a "hat" mean over a matrix symbol? When you see notation like
:math:`\hat{A}`, the hat indicates you're working with an operator (a linear
transformation) rather than just a matrix of numbers. This is common in quantum
mechanics and functional analysis, where operators may be infinite-dimensional
or unbounded.

Iterative methods
-----------------

Why use iterative methods? For large systems, direct methods like LU
decomposition become computationally expensive. Iterative methods:

- Use less memory (don't need to store full matrices)
- Can exploit sparsity (many zero elements)
- Are often faster for large systems
- Can give approximate solutions quickly

Jacobi method
^^^^^^^^^^^^^

What is the Jacobi method? Let's understand it step by step with a small
example.

**Example: 2 Variables**

Consider these equations:

.. math::

   4x - y &= 5 \\
   -2x + 3y &= 1

1. **Why not just solve directly?**

   - These equations represent lines in the x-y plane
   - Their intersection is the solution
   - But solving algebraically requires multiple steps, which becomes
     impractical for large systems

2. **The Jacobi idea**: "Let's rearrange each equation to isolate its main
   variable." Solve for x and y:

   .. math::

      x &= \frac{5 + y}{4} \\
      y &= \frac{1 + 2x}{3}

3. **See it work**: Start with the guess :math:`x = 0, y = 0`:

   Iteration 1:

   - For x: plug in y = 0 → :math:`x = \frac{5 + 0}{4} = 1.25`
   - For y: plug in x = 0 → :math:`y = \frac{1 + 2(0)}{3} = 0.333`

   Iteration 2:

   - For x: plug in y = 0.333 → :math:`x = \frac{5 + 0.333}{4} = 1.333`
   - For y: plug in x = 1.25 → :math:`y = \frac{1 + 2(1.25)}{3} = 1.167`

   Iteration 3:

   - For x: plug in y = 1.167 → :math:`x = \frac{5 + 1.167}{4} = 1.542`
   - For y: plug in x = 1.333 → :math:`y = \frac{1 + 2(1.333)}{3} = 1.222`

   The method converges to the exact solution :math:`x = 1.6,\ y = 1.4`.

4. **Why it becomes that formula**: Let's break this down step by step using
   both the general form and a specific example.

   **Step 1: Start with a concrete example**

   Let's work with this 3×3 system:

   .. math::

      \begin{bmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{bmatrix}
      \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
      =
      \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}

   **Step 2: Focus on one row**

   Let's look at the second row (:math:`i = 2`). This row shows how each
   variable relates to the others:

   .. math::

      -x_1 + 4x_2 - x_3 = 5

   This is easier to understand than all equations at once. We will solve for
   :math:`x_2`, the variable associated with row :math:`i`.

   **Step 3: Move terms around to isolate** :math:`x_2`

   Move everything except the :math:`x_2` term to the right side:

   .. math::

      4x_2 = 5 + x_1 + x_3

   **Step 4: Solve for** :math:`x_2`

   Divide both sides by the coefficient of :math:`x_2` (which is 4):

   .. math::

      x_2 = \frac{5}{4} + \frac{1}{4}(x_1 + x_3)

   **Step 5: Add iteration numbers and see the general pattern**

   Looking at what we did for :math:`x_2`, we can write row :math:`i` as:

   .. math::

      a_{i1}x_1 + a_{i2}x_2 + \dots + a_{ii}x_i + \dots + a_{in}x_n = b_i

   Move terms and solve for :math:`x_i`:

   .. math::

      x_i = \frac{b_i}{a_{ii}} - \frac{1}{a_{ii}}\sum_{j \neq i} a_{ij}x_j

   For the Jacobi method, we use the previous iteration's values for all
   variables except the one we're solving for:

   .. math::

      x_i^{(k+1)} = \frac{b_i}{a_{ii}} - \frac{1}{a_{ii}}\sum_{j \neq i} a_{ij}x_j^{(k)}

   **Why Local Minima Are Impossible**

   Unlike optimization problems (like gradient descent), the Jacobi method
   cannot get stuck, because:

   1. **Linear Systems Are Different**:

      - We're solving :math:`Ax = b`, which is a linear system
      - Linear systems have exactly one solution (if A is nonsingular)
      - There are no "hills" or "valleys" to get stuck in

   2. **Each Update is Exact**:

      - We're not minimizing a function
      - Each new value comes directly from the equations
      - If :math:`x_2 = \frac{5}{4} + \frac{1}{4}(x_1 + x_3)`, this is an exact
        relationship

   3. **Guaranteed Progress**:

      - With diagonal dominance (:math:`|a_{ii}| > \sum_{j \neq i} |a_{ij}|`)
      - Each update moves us closer to the solution
      - Like following a fixed path, not searching for a minimum

   For our specific example with :math:`x_2`, we have:

   .. math::

      x_2^{(k+1)} = \frac{5}{4} + \frac{1}{4}(x_1^{(k)} + x_3^{(k)})

   The key points:

   - Each equation is solved for one variable
   - We move everything else to the right side
   - Divide by the diagonal coefficient
   - Use previous-iteration values for the other variables

5. **Key insights**:

   - Each new value (:math:`x_i^{(k+1)}`) uses old values (:math:`x_j^{(k)}`)
   - We're essentially saying "if everything else stays the same, what should
     this value be?"
   - It's like everyone adjusting their position based on where they see
     others standing

Mathematically, this becomes:

.. math::

   x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j \neq i} a_{ij}x_j^{(k)}\right)

where:

- :math:`x_i^{(k)}` is the current guess for the variable in row :math:`i` at step :math:`k`
- :math:`a_{ii}` is the diagonal coefficient in row :math:`i`
- :math:`b_i` is the constant term in row :math:`i`
- The sum represents how the other variables influence :math:`x_i`
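Putting this update formula into code gives a very short solver. The sketch
below is illustrative only (the function name, tolerance, and iteration cap are
arbitrary choices of ours) and is applied to the two-variable system from
earlier:

.. code-block:: python

   import numpy as np

   def jacobi(A, b, x0=None, tol=1e-10, max_iter=500):
       """Jacobi iteration: x_i^(k+1) = (b_i - sum_{j != i} a_ij x_j^(k)) / a_ii."""
       A = np.asarray(A, dtype=float)
       b = np.asarray(b, dtype=float)
       x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=float)
       D = np.diag(A)                    # diagonal entries a_ii
       R = A - np.diagflat(D)            # off-diagonal part of A
       for _ in range(max_iter):
           x_new = (b - R @ x) / D       # every component updated from the old x
           if np.linalg.norm(x_new - x, ord=np.inf) < tol:
               return x_new
           x = x_new
       return x

   A = np.array([[4.0, -1.0],
                 [-2.0, 3.0]])
   b = np.array([5.0, 1.0])
   print(jacobi(A, b))       # approximately [1.6, 1.4]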
How does convergence happen visually? For our 2×2 example above, the iterates
approach the exact solution :math:`x = 1.6,\ y = 1.4` (errors are measured
against that solution):

.. math::

   \begin{array}{|c|c|c|c|c|}
   \hline
   \text{Iteration} & x & y & \text{Error in } x & \text{Error in } y \\
   \hline
   0 & 0.000 & 0.000 & 1.600 & 1.400 \\
   1 & 1.250 & 0.333 & 0.350 & 1.067 \\
   2 & 1.333 & 1.167 & 0.267 & 0.233 \\
   3 & 1.542 & 1.222 & 0.058 & 0.178 \\
   4 & 1.556 & 1.361 & 0.044 & 0.039 \\
   \hline
   \end{array}

This is exactly how Jacobi iteration works:

- Each variable updates based on the other variables' previous values
- All updates happen simultaneously
- The system gradually moves toward equilibrium

Key advantages:

- Simple to implement and understand
- Naturally parallelizable (all variables can be updated at once)
- Works well for diagonally dominant matrices (when each variable mainly depends on itself)
- Memory efficient for sparse matrices (when most variables don't affect each other)

Convergence conditions
""""""""""""""""""""""

The Jacobi method is guaranteed to converge if :math:`A` is **strictly
diagonally dominant**. For a **symmetric positive definite** :math:`A`, the
method converges provided :math:`2D - A` is also positive definite, where
:math:`D` is the diagonal part of :math:`A`; positive definiteness alone is not
sufficient. A small numerical check of these conditions is sketched at the end
of this section.

Practical applications
""""""""""""""""""""""

Common uses include:

1. **Partial Differential Equations**:

   - Solving Poisson's equation on a grid
   - Heat diffusion problems
   - Electronic structure calculations

2. **Network Analysis**:

   - Load flow analysis in power systems
   - Traffic flow in transportation networks
   - PageRank-like algorithms (modified versions)

3. **Image Processing**:

   - Image reconstruction
   - Denoising
   - Solving large linear systems in tomography
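As a supplement to the convergence conditions above, here is a small sketch (the
helper names are ours; the test matrix is chosen for illustration) that checks
strict diagonal dominance and, for a symmetric positive definite matrix, the
additional condition that :math:`2D - A` be positive definite:

.. code-block:: python

   import numpy as np

   def is_strictly_diagonally_dominant(A):
       """Check |a_ii| > sum_{j != i} |a_ij| for every row."""
       absA = np.abs(np.asarray(A, dtype=float))
       diag = np.diag(absA)
       off_diag_sums = absA.sum(axis=1) - diag
       return bool(np.all(diag > off_diag_sums))

   def jacobi_converges_for_spd(A):
       """For symmetric positive definite A, Jacobi converges iff 2D - A is positive definite."""
       A = np.asarray(A, dtype=float)
       D = np.diagflat(np.diag(A))
       return bool(np.all(np.linalg.eigvalsh(2 * D - A) > 0))

   A = np.array([[4.0, -1.0, 0.0],
                 [-1.0, 4.0, -1.0],
                 [0.0, -1.0, 4.0]])
   print(is_strictly_diagonally_dominant(A))   # True
   print(jacobi_converges_for_spd(A))          # True (A is SPD here)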