Image processing
================

Affine transformations
----------------------

What is an affine transformation?

An affine transformation maps point :math:`(x, y)` to :math:`(x', y')` via matrix multiplication:

.. math::

   \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} =
   \begin{bmatrix} a & b & t_x \\ c & d & t_y \\ 0 & 0 & 1 \end{bmatrix}
   \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}

This expands to :math:`x' = ax + by + t_x` and :math:`y' = cx + dy + t_y`. The 2×3 matrix :math:`\begin{bmatrix} a & b & t_x \\ c & d & t_y \end{bmatrix}` defines the transformation.

What are common transformation examples?

**Translation** (shift by :math:`t_x, t_y`):

.. math::

   \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}

Example: shift right 50 pixels, down 30 pixels:

.. math::

   \begin{bmatrix} 1 & 0 & 50 \\ 0 & 1 & 30 \end{bmatrix}

**Scaling** (by factors :math:`s_x, s_y`):

.. math::

   \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \end{bmatrix}

Example: double the width, halve the height:

.. math::

   \begin{bmatrix} 2 & 0 & 0 \\ 0 & 0.5 & 0 \end{bmatrix}

**Rotation** (by angle :math:`\theta` counterclockwise):

.. math::

   \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \end{bmatrix}

Example: rotate 45° around the origin:

.. math::

   \begin{bmatrix} 0.707 & -0.707 & 0 \\ 0.707 & 0.707 & 0 \end{bmatrix}

**Shearing** (horizontal by factor :math:`k`):

.. math::

   \begin{bmatrix} 1 & k & 0 \\ 0 & 1 & 0 \end{bmatrix}

Example: horizontal shear with :math:`k = 0.5`:

.. math::

   \begin{bmatrix} 1 & 0.5 & 0 \\ 0 & 1 & 0 \end{bmatrix}

What does "linear" mean mathematically?

A transformation :math:`T` is linear if it satisfies:

- **Additivity**: :math:`T(\mathbf{a} + \mathbf{b}) = T(\mathbf{a}) + T(\mathbf{b})`
- **Homogeneity**: :math:`T(c\mathbf{a}) = cT(\mathbf{a})`

Translation fails this test. For :math:`T(x, y) = (x + 5, y + 3)`:

.. math::

   T((1, 2) + (3, 4)) = T(4, 6) = (9, 9)

but

.. math::

   T(1, 2) + T(3, 4) = (6, 5) + (8, 7) = (14, 12)

Therefore, translation is not linear.

Why is it called "affine" and not "linear"?

Translation breaks linearity because :math:`f(0) \neq 0` when :math:`t_x` or :math:`t_y` is non-zero. "Affine" (from Latin *affinis*, "related") means linear plus a constant shift.

**Properties preserved:**

- Parallel lines remain parallel
- Collinearity: points on a line remain on a line, and ratios of distances along that line are preserved
- Midpoints are preserved (a line segment's midpoint remains its midpoint, though its coordinates change)

**Properties not preserved:**

- The origin may be shifted
- Angles generally change (rotations, reflections, translations, and uniform scaling are exceptions)
- Distances generally change (rotations, translations, and reflections are exceptions)
- Ratios of distances in different directions change (e.g., anisotropic scaling stretches width but not height)
- Areas scale by :math:`|ad - bc|`, the determinant of the linear part (rotations, translations, and pure shears preserve area)
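These matrices are easy to experiment with directly. Below is a minimal NumPy sketch (the helpers ``affine`` and ``to3x3`` are ours, not library functions) that applies the 45° rotation and the (50, 30) translation shown above to the corners of a unit square, and composes them by promoting each 2×3 matrix to its 3×3 homogeneous form:

.. code-block:: python

   import numpy as np

   def affine(points, M):
       """Apply a 2x3 affine matrix M to an (N, 2) array of (x, y) points."""
       pts_h = np.column_stack([points, np.ones(len(points))])  # homogeneous coords
       return pts_h @ M.T  # x' = a*x + b*y + t_x, y' = c*x + d*y + t_y

   def to3x3(M):
       """Promote a 2x3 affine matrix to its 3x3 homogeneous form."""
       return np.vstack([M, [0, 0, 1]])

   square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

   # Rotate 45 degrees counterclockwise, then shift right 50 and down 30.
   theta = np.deg2rad(45)
   rotate = np.array([[np.cos(theta), -np.sin(theta), 0],
                      [np.sin(theta),  np.cos(theta), 0]])
   translate = np.array([[1.0, 0.0, 50.0],
                         [0.0, 1.0, 30.0]])

   # Composition reduces to matrix multiplication (translation after rotation).
   combined = (to3x3(translate) @ to3x3(rotate))[:2]

   print(affine(square, combined))
   # Opposite edges of the square map to parallel segments, as expected.

Because the combined matrix is still 2×3, chaining any number of affine steps never grows beyond 6 parameters.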
**Visualization of affine transformations:**

.. plot:: notebooks/plot_affine_transformations.py
   :include-source: False

The figure demonstrates various affine transformations and their properties:

- **Row 1**: Different transformation types (translation, rotation, scaling, shearing) applied to a rectangle with parallel lines and midpoints marked
- **Row 2**: More transformations, including horizontal shear, vertical shear, reflection, and combined operations
- **Row 3**: Property demonstrations: parallel lines stay parallel (blue dashed lines remain parallel after shearing), ratios along the same line are preserved (collinear points maintain a ratio of 1.00 even after 2× scaling), midpoints are preserved (the midpoint of a line segment remains the midpoint after rotation, though its absolute coordinates change), but angles change (90° becomes 117° after shearing)

What are the advantages of affine transformations?

- **Compact**: Only 6 parameters define any affine transformation
- **Composable**: Chaining transformations reduces to matrix multiplication
- **Invertible**: The inverse transformation is easy to compute

**Applications:**

- Image registration (aligning photos taken from different angles)
- Lens distortion correction
- Medical imaging (CT and MRI alignment)
- Computer graphics (sprite transformations)
- Document scanning (correcting perspective skew)

What can't affine transformations do?

Curved distortions such as lens barrel or pincushion effects require:

- Perspective (projective) transformations
- Polynomial warping
- Radial distortion models

Image alignment
---------------

Template matching
^^^^^^^^^^^^^^^^^

What is template matching?

Template matching locates a small reference image (the template) within a larger image by computing a similarity score at each position.

Why use template matching in electron microscopy?

**Drift correction**: During long acquisitions, thermal expansion, mechanical vibrations, and stage instabilities cause the specimen to shift between frames. Template matching tracks a distinctive feature (an atomic column, nanoparticle, or edge) across frames to measure the displacement.

**Two correction approaches:**

1. **Post-acquisition**: Measure drift vectors and computationally realign frames
2. **Real-time feedback**: Actively adjust the stage position during acquisition

Without drift correction, multi-frame averaging blurs atomic features.

How does template matching work?

For a 512×512 image and a 10×10 template:

1. Place the template at position :math:`(x, y)`
2. Compute a similarity score between the template and the underlying image region
3. Slide the template to the next position
4. Repeat for all :math:`503 \times 503 \approx 250{,}000` valid positions
5. Peaks in the similarity map indicate template locations

Cross-correlation
^^^^^^^^^^^^^^^^^

How is cross-correlation used in template matching?

Cross-correlation computes the similarity score at each position by multiplying corresponding pixels and summing the results.

What is cross-correlation?

Cross-correlation uses two coordinate systems:

- **Global coordinates** :math:`(x,y)`: the template's position in the large image
- **Local coordinates** :math:`(u,v)`: pixel indices within the template

For a template of size :math:`M \times N`, the cross-correlation at position :math:`(x,y)` is:

.. math::

   C(x,y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} T(u,v) \cdot I(x+u, y+v)

**Example**: Template :math:`\begin{bmatrix} 100 & 150 \\ 120 & 180 \end{bmatrix}` with image region :math:`\begin{bmatrix} 95 & 145 \\ 125 & 175 \end{bmatrix}`:

.. math::

   C = (100)(95) + (150)(145) + (120)(125) + (180)(175) = 77{,}750
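To make the arithmetic concrete, here is a small NumPy sketch that reproduces the worked example and slides the template over a toy image (``cross_correlation_map`` is an illustrative helper, not a library function; practical code would use an FFT-based or library implementation for speed):

.. code-block:: python

   import numpy as np

   template = np.array([[100, 150],
                        [120, 180]], dtype=float)
   region = np.array([[ 95, 145],
                      [125, 175]], dtype=float)

   # Cross-correlation at a single position: elementwise product, then sum.
   print(np.sum(template * region))  # 77750.0

   def cross_correlation_map(image, template):
       """Raw cross-correlation of the template at every valid position."""
       M, N = template.shape
       H, W = image.shape
       scores = np.empty((H - M + 1, W - N + 1))
       for y in range(scores.shape[0]):       # slide the template over
           for x in range(scores.shape[1]):   # every valid (y, x) offset
               scores[y, x] = np.sum(template * image[y:y + M, x:x + N])
       return scores

   rng = np.random.default_rng(0)
   image = rng.uniform(0, 80, size=(64, 64))  # background dimmer than template
   image[30:32, 40:42] = template             # plant the template
   scores = cross_correlation_map(image, template)
   print(np.unravel_index(scores.argmax(), scores.shape))  # (30, 40)

Note that the detection succeeds here only because the background is dimmer than the template; the next paragraphs show why raw cross-correlation fails when a bright featureless region is present.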
Why does higher cross-correlation mean better match?

When bright pixels align with bright pixels and dark with dark, each multiplication produces a large positive number, so a high sum indicates similarity.

What is the problem with raw cross-correlation?

Brightness variations dominate the score. A featureless bright region can outscore a perfect pattern match in a darker area.

**Example**: Template [10, 20, 30]

- Match against itself: :math:`C = 10^2 + 20^2 + 30^2 = 1400`
- Match against a bright patch [100, 100, 100]: :math:`C = 10(100) + 20(100) + 30(100) = 6000`

The uniform patch wins by more than 4×, despite capturing none of the pattern.

Why use normalized cross-correlation?

Normalized cross-correlation (NCC) measures pattern shape rather than absolute brightness by subtracting means and dividing by standard deviations:

.. math::

   \text{NCC}(x,y) = \frac{\sum_{u,v} [T(u,v) - \bar{T}][I(x+u,y+v) - \bar{I}]}{\sqrt{\sum_{u,v}[T(u,v) - \bar{T}]^2 \sum_{u,v}[I(x+u,y+v) - \bar{I}]^2}}

where :math:`\bar{T}` is the template's mean intensity and :math:`\bar{I}` is the image region's mean.

**Output range**: -1 (inverted pattern) to +1 (perfect match)

**Robust to:**

- Lighting changes
- Contrast variations
- Detector gain differences

Fourier-based alignment
^^^^^^^^^^^^^^^^^^^^^^^

What is Fourier-based alignment?

Fourier-based alignment detects image shifts by working in frequency space. The Fourier shift theorem states: **translation in real space becomes a phase shift in frequency space**.

Why is this important for beam tilt correction?

In tilt-corrected bright-field STEM (tcBF-STEM), each detector pixel corresponds to a different beam tilt angle. By reciprocity, this creates multiple shifted views of the same sample. Fourier-based alignment provides the speed and accuracy needed to align thousands of these shifted images.

How does the Fourier shift theorem work?

Translating an image by :math:`(\Delta x, \Delta y)` multiplies its Fourier transform by a phase factor:

.. math::

   \mathcal{F}\{f(x - \Delta x, y - \Delta y)\} = e^{-2\pi i(k_x \Delta x + k_y \Delta y)} \mathcal{F}\{f(x,y)\}

where :math:`k_x` and :math:`k_y` are spatial frequencies. The magnitude remains unchanged; only the phase shifts.

How do you apply the Fourier shift theorem?

To shift an image by :math:`(\Delta x, \Delta y) = (2.5, 1.3)` pixels:

1. **Compute the FFT**: Call ``np.fft.fft2()`` to get a complex array :math:`F[i,j] = a + bi`

2. **Calculate the phase shift** for each frequency :math:`(k_x, k_y)`:

   .. math::

      \theta = -2\pi(k_x \Delta x + k_y \Delta y)

   At frequency (0.1, 0.2):

   .. math::

      \theta = -2\pi(0.1 \times 2.5 + 0.2 \times 1.3) = -3.20 \text{ rad}

3. **Convert to a complex number**:

   .. math::

      e^{i\theta} = \cos(-3.20) + i\sin(-3.20) = -0.9983 + 0.0584i

4. **Multiply the Fourier coefficient**: For :math:`F[i,j] = 3.62 + 3.73i`:

   .. math::

      (3.62 + 3.73i)(-0.9983 + 0.0584i) = -3.832 - 3.512i

5. **Verify magnitude preservation**:

   - Before: :math:`\sqrt{3.62^2 + 3.73^2} = 5.20`
   - After: :math:`\sqrt{3.832^2 + 3.512^2} = 5.20` ✓

6. **Inverse FFT**: ``np.fft.ifft2()`` reconstructs the shifted image

Repeat for all 262,144 elements of a 512×512 image. The shift never moves pixels directly; it modifies phases in frequency space.
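Steps 1–6 condense to a few lines of NumPy. The sketch below (the helper name ``fourier_shift`` is ours) assumes a real-valued image and uses ``np.fft.fftfreq`` to generate the spatial frequencies :math:`k_x, k_y`:

.. code-block:: python

   import numpy as np

   def fourier_shift(image, dx, dy):
       """Shift a real-valued image by (dx, dy) pixels via the shift theorem."""
       F = np.fft.fft2(image)                             # step 1: FFT
       ky = np.fft.fftfreq(image.shape[0])[:, None]       # frequencies along rows (y)
       kx = np.fft.fftfreq(image.shape[1])[None, :]       # frequencies along columns (x)
       phase = np.exp(-2j * np.pi * (kx * dx + ky * dy))  # steps 2-3: phase factors
       # Step 4: multiply; unit-magnitude factors preserve |F| (step 5).
       return np.fft.ifft2(F * phase).real                # step 6: inverse FFT

   rng = np.random.default_rng(0)
   image = rng.standard_normal((512, 512))
   shifted = fourier_shift(image, dx=2.5, dy=1.3)         # sub-pixel shift

The imaginary part discarded by ``.real`` is numerical round-off: for a real input, the phase ramp preserves the Hermitian symmetry of the spectrum, so the inverse FFT stays real.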
What is the computational cost of real-space shifting?

For a 1024×1024 image with a known shift:

**Integer shifts** (e.g., exactly 10 pixels):

- Copy each pixel to its new position: ~1 million operations

**Sub-pixel shifts** (e.g., 10.5 pixels):

- Require interpolation between neighboring pixels
- Linear interpolation: ~4 operations per pixel
- Bicubic interpolation: ~10 operations per pixel
- Total: ~10 million operations

How does Fourier shifting compare?

**Computational cost:**

- Two FFTs + phase multiplication: ~21 million operations
- Independent of the shift amount

**When real-space wins:**

- Integer shifts: 1M operations (21× faster than Fourier)
- Sub-pixel shifts: 10M operations (2× faster than Fourier)

**Fourier advantages:**

- Simpler implementation (no coordinate transformation logic)
- No edge handling required
- Sub-pixel precision falls out naturally from the phase
- Both methods produce equivalent quality

Why use Fourier-based shifts for unknown shifts?

The advantage emerges when measuring drift between two images where the shift amount is unknown.

**Brute-force real-space search**: Testing all possible integer shifts from -50 to +50 pixels in both directions:

- Shift candidates: 101 × 101 = 10,201 positions
- Each trial: shift the image (~1M ops) + compute similarity (~1M ops) = 2M ops
- Total: ~20 billion operations

**Phase correlation in Fourier space:**

- Three FFTs: ~31 million operations
- Produces a 2D map whose peak directly reveals the shift
- All 10,000+ candidates are evaluated simultaneously
- **Speedup**: ~600×

**Sub-pixel precision comparison:**

Real-space search with 10 sub-pixel positions per integer shift:

- Search space: 1,010 × 1,010 = 1,020,100 positions
- Each sub-pixel shift requires interpolation (~10M ops)
- Total: ~10 trillion operations

Fourier phase correlation:

- Coarse shift: ~31M operations
- Sub-pixel refinement: ~1M operations
- **Speedup**: ~300,000×

Phase correlation in Fourier space
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

What is phase correlation?

Phase correlation finds the shift between two images by comparing phases in frequency space. Given a reference image :math:`I_1` and a shifted image :math:`I_2`:

1. Compute the Fourier transforms :math:`F_1` and :math:`F_2`

2. Calculate the cross-power spectrum:

   .. math::

      R = \frac{F_1 \cdot F_2^*}{|F_1 \cdot F_2^*|}

3. The inverse Fourier transform of :math:`R` produces a sharp peak at :math:`(\Delta x, \Delta y)`

The peak position directly reveals the shift.

How does this differ from template matching?

**Template matching:**

- Scans every position
- Complexity: :math:`O(N^2 M^2)` for an :math:`N \times N` image and an :math:`M \times M` template
- Can find rotations and complex patterns

**Phase correlation:**

- Uses the FFT
- Complexity: :math:`O(N^2 \log N)`
- Handles the entire image at once
- Only finds translations
- Superior speed and sub-pixel accuracy for shifts
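A minimal NumPy sketch of the procedure (the helper ``phase_correlation`` is illustrative; it recovers integer shifts and returns the correction needed to realign the second image onto the first, with sub-pixel peak refinement omitted):

.. code-block:: python

   import numpy as np

   def phase_correlation(ref, moving):
       """Return the (row, col) shift that realigns `moving` onto `ref`."""
       F1 = np.fft.fft2(ref)
       F2 = np.fft.fft2(moving)
       cross_power = F1 * np.conj(F2)
       R = cross_power / (np.abs(cross_power) + 1e-12)  # keep phase, drop magnitude
       corr = np.fft.ifft2(R).real                      # sharp peak marks the shift
       peak = np.unravel_index(np.argmax(corr), corr.shape)
       # Peak indices past the midpoint wrap around to negative shifts.
       return tuple(p - s if p > s // 2 else p for p, s in zip(peak, corr.shape))

   # Demo: drift a frame by (7, -12) pixels and recover the correction.
   rng = np.random.default_rng(0)
   ref = rng.standard_normal((256, 256))
   moving = np.roll(ref, shift=(7, -12), axis=(0, 1))
   print(phase_correlation(ref, moving))  # -> (-7, 12): undoes the drift

Because all candidate shifts are evaluated by a single inverse FFT, the cost stays at the three FFTs discussed above regardless of how large the drift is.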