{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d44cffb1",
   "metadata": {},
   "source": [
    "# NumPy\n",
    "\n",
    "NOTE: The page is currently updated.\n",
    "\n",
    "References:\n",
    "- https://www.w3schools.com/python/numpy/numpy_array_slicing.asp\n",
    "- https://numpy.org/doc/\n",
    "- ChatGPT 5"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5bc635d0",
   "metadata": {},
   "source": [
    "## Motivation\n",
    "\n",
    "A matrix represents a set of values. Matrices are used in solving a system of equations, representing graphs, etc. The more concisely and clearly we represent matrices in scripts, the less time is required for debugging.\n",
    "\n",
    "Assume ``X`` and ``Y`` represent matrices and ``vec`` is a 1-D array.\n",
    "\n",
    "\n",
    "```\n",
    "np.add(X,Y)       # Add\n",
    "np.substract(X,Y) # Substract\n",
    "np.divide(X,Y)    # Divide\n",
    "\n",
    "# Multiply, all same\n",
    "X @ Y             # recommended\n",
    "np.multiply(X,Y)\n",
    "np.matmul(X, Y)\n",
    "np.dot(X, Y)\n",
    "X.dot(Y)\n",
    "```"
   ]
  },
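  {
   "cell_type": "markdown",
   "id": "e1a7c001",
   "metadata": {},
   "source": [
    "The multiplication calls above are easy to mix up. As a minimal sketch, assuming two small matrices picked only for illustration, the element-wise product and the matrix product can be compared like this:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Example matrices chosen only for illustration\n",
    "X = np.array([[1, 2],\n",
    "              [3, 4]])\n",
    "Y = np.array([[5, 6],\n",
    "              [7, 8]])\n",
    "\n",
    "elementwise = np.multiply(X, Y)   # same as X * Y\n",
    "matrix_product = X @ Y            # same as np.matmul(X, Y)\n",
    "\n",
    "assert np.array_equal(elementwise, [[5, 12], [21, 32]])\n",
    "assert np.array_equal(matrix_product, [[19, 22], [43, 50]])\n",
    "# For 2-D inputs, np.dot and np.matmul agree\n",
    "assert np.array_equal(np.dot(X, Y), np.matmul(X, Y))\n",
    "```"
   ]
  },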
  {
   "cell_type": "markdown",
   "id": "c0374be0",
   "metadata": {},
   "source": [
    "## Matrix operations\n",
    "\n",
    "```\n",
    "X.flatten()        # Flatten\n",
    "np.sqrt(X)         # Square root all elements\n",
    "np.sum(X)          # Sum all elements\n",
    "np.sum(X,axis=0)   # Row-wise sum\n",
    "np.sum(X,axis=1)   # Column-wise sum\n",
    "np.amax(X)         # Single max value\n",
    "np.amax(X, axis=0) # Get max in each column\n",
    "np.amax(X, axis=1) # Get max in each row\n",
    "np.mean(X)         # Mean\n",
    "np.std(X)          # Standard deviation\n",
    "np.var(X)          # Variance\n",
    "np.trace(X)        # Sum of the elements on the diagonal\n",
    "np.linalg.matrix_rank(X)  # Rank of the matrix\n",
    "np.linalg.det(X)   # Determinant of the matrix\n",
    "```"
   ]
  },
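  {
   "cell_type": "markdown",
   "id": "e1a7c002",
   "metadata": {},
   "source": [
    "The `axis` argument names the axis that is collapsed, which is a common source of confusion. A small sketch, using a made-up 2x3 matrix, of how the reductions above behave along each axis:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "X = np.array([[1, 2, 3],\n",
    "              [4, 5, 6]])\n",
    "\n",
    "# axis=0 collapses the rows: one result per column\n",
    "assert np.array_equal(np.sum(X, axis=0), [5, 7, 9])\n",
    "assert np.array_equal(np.amax(X, axis=0), [4, 5, 6])\n",
    "\n",
    "# axis=1 collapses the columns: one result per row\n",
    "assert np.array_equal(np.sum(X, axis=1), [6, 15])\n",
    "assert np.array_equal(np.amax(X, axis=1), [3, 6])\n",
    "\n",
    "# keepdims=True keeps the reduced axis with length 1\n",
    "row_sums = np.sum(X, axis=1, keepdims=True)\n",
    "assert row_sums.shape == (2, 1)\n",
    "```"
   ]
  },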
  {
   "cell_type": "markdown",
   "id": "29ad3de5",
   "metadata": {},
   "source": [
    "## Slicing\n",
    "\n",
    "### 1D slicing\n",
    "\n",
    "```python\n",
    "vec = list(range(10)) # [0, ..., 9]\n",
    "vec[4:8]       # [4, 5, 6, 7]\n",
    "vec[-5:-2]     # 5th last to 2nd last => [5, 6, 7]\n",
    "\n",
    "# Get every Nth index value\n",
    "vec[::2]      # [0, 2, 4, 6, 8]\n",
    "vec[::5]      # [0, 5]\n",
    "\n",
    "# Inverse\n",
    "vec[::-1]     # Temp inverse [9, 8, ... 1, 0]\n",
    "vec.reverse() # Permanent inverse\n",
    "```\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edf4d2f7",
   "metadata": {},
   "source": [
    "\n",
    "Boolean indexing\n",
    "\n",
    "```\n",
    "cols = X[0, :] > 1  # select col(s) where first row > 1\n",
    "# => [False  True  True]\n",
    "X[:, cols]\n",
    "# => [[2 3]\n",
    "#     [5 6]\n",
    "#     [8 9]]\n",
    "```"
   ]
  },
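  {
   "cell_type": "markdown",
   "id": "e1a7c003",
   "metadata": {},
   "source": [
    "The boolean mask idea above can be made runnable as follows. This is a sketch that assumes `X` holds the values 1 to 9 in a 3x3 layout; the combined mask and the assignment are added only for illustration:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "X = np.arange(1, 10).reshape((3, 3))  # [[1 2 3], [4 5 6], [7 8 9]]\n",
    "\n",
    "cols = X[0, :] > 1                     # mask over columns\n",
    "assert np.array_equal(cols, [False, True, True])\n",
    "assert np.array_equal(X[:, cols], [[2, 3], [5, 6], [8, 9]])\n",
    "\n",
    "# Masks can be combined with & (and), | (or), ~ (not)\n",
    "mask = (X > 2) & (X % 2 == 0)\n",
    "assert np.array_equal(X[mask], [4, 6, 8])\n",
    "\n",
    "# A mask can also be used for assignment; work on a copy to keep X intact\n",
    "Z = X.copy()\n",
    "Z[Z > 5] = 0\n",
    "assert np.array_equal(Z, [[1, 2, 3], [4, 5, 0], [0, 0, 0]])\n",
    "```"
   ]
  },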
  {
   "cell_type": "markdown",
   "id": "d5bdb459",
   "metadata": {},
   "source": [
    "From the second element, :\n",
    "Recall x:y where y doesn't include it"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "72c93d1e",
   "metadata": {},
   "source": [
    "### 2D slicing\n",
    "\n",
    "```\n",
    "X =  vec.reshape((3, 3))\n",
    "X[1, :]       # get second row\n",
    "X[:, -1]      # get last col\n",
    "X[0:2, :]     # get first two rows\n",
    "X[[0, 2], :]  # get first and third rows\n",
    "X[:, 0:2]     # get first two columns\n",
    "X[:, [0, 2]]  # get first and third columns\n",
    "X[0:2, 0:2]   # get submatrix of first two rows/columns\n",
    "X[X > 5]      # get elements greater than 5\n",
    "\n",
    "# Advanced\n",
    "X[:, ::-1]    # reverse cols for each row\n",
    "# => [[3 2 1]\n",
    "#     [6 5 4]\n",
    "#     [9 8 7]]\n",
    "X[1:, ::-1]   # same as above but skip first row\n",
    "# => [[6 5 4]\n",
    "#     [9 8 7]]\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "2880a654",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "5755c5fc",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# HIDDEN\n",
    "# Helper function to display matrix\n",
    "def display(df, indent=1):\n",
    "    s = df.to_string(index=False, header=False)\n",
    "    indented = \"\\n\".join(\" \" * indent + line for line in s.splitlines())\n",
    "    print(indented)\n",
    "    print(\"\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "e4ae45ed",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Origin array:\n",
      " 1 2 3 4  5\n",
      " 6 7 8 9 10\n",
      "\n",
      "[0:1, 1:4] ->\n",
      " 2 3 4\n",
      "\n",
      "[:1, 1:4] ->\n",
      " 2 3 4\n",
      "\n",
      "[0:2, 2] ->\n",
      " 3\n",
      " 8\n",
      "\n",
      "[0:2, 1:4] ->\n",
      " 2 3 4\n",
      " 7 8 9\n",
      "\n"
     ]
    }
   ],
   "source": [
    "arr = np.array([[1, 2, 3, 4, 5],\n",
    "                [6, 7, 8, 9, 10]])\n",
    "\n",
    "df = pd.DataFrame(arr)\n",
    "print(\"Origin array:\")\n",
    "display(df)\n",
    "print(\"[0:1, 1:4] ->\")\n",
    "display(df.iloc[0:1, 1:4])\n",
    "\n",
    "print(\"[:1, 1:4] ->\")\n",
    "display(df.iloc[:1, 1:4])\n",
    "\n",
    "print(\"[0:2, 2] ->\")\n",
    "display(df.iloc[0:2, 2].to_frame())\n",
    "\n",
    "print(\"[0:2, 1:4] ->\")\n",
    "display(df.iloc[0:2, 1:4])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3aabfeb",
   "metadata": {},
   "source": [
    "## Add new dimension\n",
    "\n",
    "``none`` is used to insert a new axis or dimension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "98552b34",
   "metadata": {},
   "outputs": [],
   "source": [
    "arr = np.arange(10) \n",
    "assert arr.shape == (10,)\n",
    "# Add two new axes using [:, None, None]\n",
    "reshaped = arr[:, None, None]\n",
    "assert reshaped.shape == (10, 1, 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4d50565d",
   "metadata": {},
   "outputs": [],
   "source": [
    "arr2 = arr.reshape(2, 5)\n",
    "assert arr2.shape == (2, 5)\n",
    "# Add two new axes after the first axis (\"row\")\n",
    "assert arr2[:, None, None].shape == (2, 1, 1, 5)\n",
    "assert arr2[:, :, None, None].shape == (2, 5, 1, 1)"
   ]
  },
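  {
   "cell_type": "markdown",
   "id": "e1a7c004",
   "metadata": {},
   "source": [
    "Two equivalent spellings are worth knowing when adding axes. The sketch below, using the same `np.arange(10)` input as above, shows `np.newaxis` and `np.expand_dims` producing the same shapes as indexing with `None`:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "arr = np.arange(10)\n",
    "\n",
    "# np.newaxis is None under another name\n",
    "assert np.newaxis is None\n",
    "assert arr[:, np.newaxis].shape == (10, 1)   # column vector\n",
    "assert arr[np.newaxis, :].shape == (1, 10)   # row vector\n",
    "\n",
    "# np.expand_dims inserts the axis at the given position\n",
    "assert np.expand_dims(arr, axis=0).shape == (1, 10)\n",
    "assert np.expand_dims(arr, axis=1).shape == (10, 1)\n",
    "\n",
    "# Ellipsis (...) stands for all remaining axes\n",
    "arr2 = arr.reshape(2, 5)\n",
    "assert arr2[..., None].shape == (2, 5, 1)\n",
    "```"
   ]
  },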
  {
   "cell_type": "markdown",
   "id": "3f2eb209",
   "metadata": {},
   "source": [
    "## Create and copy tensor\n",
    "\n",
    "```text\n",
    "# Create and reshape at once\n",
    "np.matrix(np.arange(12).reshape((3,4)))\n",
    "np.zeros((5,), dtype=int)\n",
    "np.zeros((2, 1))\n",
    "\n",
    "# Rehsape\n",
    "X = np.arange(6)\n",
    "X = X.reshape((2, 3))\n",
    "\n",
    "# Copy exactly\n",
    "np.copy(X)\n",
    "\n",
    "# Copy shape\n",
    "np.ones_like(X)         # Return 1's with (2,3) shape\n",
    "np.zeros_like(X)        # Return 0's with (2,3) shape\n",
    "\n",
    "# Full\n",
    "np.full((2, 2), 10)     # Generate (2,2), all 10\n",
    "np.full((2, 2), np.inf) # Generate (2,2), all inf\n",
    "np.full((2, 2), [1, 2]) # Generate (2,2), each row of [1,2]\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bfdeb541",
   "metadata": {},
   "source": [
    "## Broadcast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a77274b9",
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.array([1,2,3])\n",
    "assert a.shape == (3,)\n",
    "b = np.array([\n",
    "    [10],\n",
    "    [20],\n",
    "    [30]])\n",
    "assert b.shape == (3,1)\n",
    "\n",
    "# In a, 1, 2, 3 duplicated across new rows\n",
    "# In b, 10, 20, 30 duplicated acorss new columns\n",
    "# And then those are added\n",
    "expected = np.array([[11,12,13],\n",
    "                     [21,22,23],\n",
    "                     [31,32,33]])\n",
    "\n",
    "assert np.array_equal(a + b, expected)\n"
   ]
  },
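  {
   "cell_type": "markdown",
   "id": "e1a7c005",
   "metadata": {},
   "source": [
    "Broadcasting aligns shapes from the trailing axis; two dimensions are compatible when they are equal or when one of them is 1. A short sketch of that rule, assuming NumPy 1.20 or later for `np.broadcast_shapes`:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "a = np.array([1, 2, 3])           # shape (3,)\n",
    "b = np.array([[10], [20], [30]])  # shape (3, 1)\n",
    "\n",
    "# (3,) is treated as (1, 3); combined with (3, 1) this gives (3, 3)\n",
    "assert np.broadcast_shapes(a.shape, b.shape) == (3, 3)\n",
    "\n",
    "# np.broadcast_to shows the virtual, repeated arrays explicitly\n",
    "a_full = np.broadcast_to(a, (3, 3))\n",
    "b_full = np.broadcast_to(b, (3, 3))\n",
    "assert np.array_equal(a_full, [[1, 2, 3], [1, 2, 3], [1, 2, 3]])\n",
    "assert np.array_equal(b_full, [[10, 10, 10], [20, 20, 20], [30, 30, 30]])\n",
    "assert np.array_equal(a_full + b_full, a + b)\n",
    "\n",
    "# Incompatible shapes raise ValueError\n",
    "try:\n",
    "    np.ones((3,)) + np.ones((4,))\n",
    "    raised = False\n",
    "except ValueError:\n",
    "    raised = True\n",
    "assert raised\n",
    "```"
   ]
  },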
  {
   "cell_type": "markdown",
   "id": "76f901d3",
   "metadata": {},
   "source": [
    "## Advanced indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "0290b984",
   "metadata": {},
   "outputs": [],
   "source": [
    "X = np.arange(9).reshape(3,3)\n",
    "assert np.array_equal(X, [[0, 1, 2],\n",
    "                          [3, 4, 5],\n",
    "                          [6, 7, 8]])\n",
    "result = X[[0,1,2], [0,1,2]]\n",
    "expected = np.array([0, 4, 8])\n",
    "assert np.array_equal(result, expected)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bab2db64",
   "metadata": {},
   "source": [
    "## Stacking\n",
    "\n",
    "- Axis 0 - rows\n",
    "- Axis 1 - columns\n",
    "- Axis 2 - depth\n",
    "- Axis 3 - so on.."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "da7ba3c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Base arrays\n",
    "a = np.array([1, 2, 3])\n",
    "b = np.array([4, 5, 6])\n",
    "\n",
    "# Stack across rows (Method 1/2)\n",
    "stacked = np.stack([a, b], axis=0) \n",
    "expected = np.array([[1,2,3],\n",
    "                     [4,5,6]])\n",
    "assert stacked.shape == (2,3)\n",
    "assert np.array_equal(stacked, expected)\n",
    "\n",
    "# Stack across rows (Method 2/2)\n",
    "vstacked = np.vstack([a, b])   # shape (2,3)\n",
    "expected = np.array([[1,2,3],\n",
    "                     [4,5,6]])\n",
    "assert np.array_equal(vstacked, expected)\n",
    "\n",
    "# Stack across columns (imagine you rotate the matrix and new rows)\n",
    "stacked_axis1 = np.stack([a, b], axis=1) \n",
    "expected_axis1 = np.array([[1,4],\n",
    "                           [2,5],\n",
    "                           [3,6]])\n",
    "assert stacked_axis1.shape == (3,2)\n",
    "assert np.array_equal(stacked_axis1, expected_axis1)\n",
    "\n",
    "# np.hstack (concatenate along columns)\n",
    "hstacked = np.hstack([a, b])   # shape (6,)\n",
    "expected = np.array([1,2,3,4,5,6])\n",
    "assert np.array_equal(hstacked, expected)\n",
    "\n",
    "# np.dstack (stack along depth / third axis)\n",
    "c = np.array([7,8,9])\n",
    "dstacked = np.dstack([a, b, c])  # shape (1,3,3)\n",
    "expected = np.array([[[1,4,7],\n",
    "                      [2,5,8],\n",
    "                      [3,6,9]]])\n",
    "assert np.array_equal(dstacked, expected)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b9d8f3f",
   "metadata": {},
   "source": [
    "Just to note that `np.vstack` is a shorthand for vertical stacking `like np.concatenate(..., axis=0)`. `np.stack` lets you choose any axis so it's more general."
   ]
  },
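  {
   "cell_type": "markdown",
   "id": "e1a7c006",
   "metadata": {},
   "source": [
    "To make the difference concrete, here is a small sketch comparing `np.concatenate`, `np.vstack`, and `np.stack` on the same inputs. The 2-D arrays `A` and `B` are made up for illustration:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "a = np.array([1, 2, 3])\n",
    "b = np.array([4, 5, 6])\n",
    "\n",
    "# concatenate joins along an existing axis: two (3,) arrays -> (6,)\n",
    "assert np.concatenate([a, b], axis=0).shape == (6,)\n",
    "\n",
    "# vstack first promotes 1-D inputs to shape (1, 3), then concatenates on axis 0\n",
    "assert np.array_equal(np.vstack([a, b]),\n",
    "                      np.concatenate([a[None, :], b[None, :]], axis=0))\n",
    "\n",
    "# stack always creates a new axis, so the result has one more dimension\n",
    "assert np.stack([a, b], axis=0).shape == (2, 3)\n",
    "assert np.stack([a, b], axis=-1).shape == (3, 2)\n",
    "\n",
    "# For 2-D inputs, vstack and concatenate(axis=0) agree, stack does not\n",
    "A = np.ones((2, 3))\n",
    "B = np.zeros((2, 3))\n",
    "assert np.vstack([A, B]).shape == (4, 3)\n",
    "assert np.stack([A, B], axis=0).shape == (2, 2, 3)\n",
    "```"
   ]
  },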
  {
   "cell_type": "markdown",
   "id": "91095d7b",
   "metadata": {},
   "source": [
    "## Performance\n",
    "\n",
    "- Vectoization - use array ops to loops\n",
    "- use ``where`` for conditional element selection ``np.where(X > 5, 1, 0)  # Replace with 1 if >5 else 0``"
   ]
  },
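  {
   "cell_type": "markdown",
   "id": "e1a7c007",
   "metadata": {},
   "source": [
    "As a rough illustration of both points, the sketch below builds the same result with a Python loop and with `np.where`. The input `np.arange(10)` is an arbitrary example, and no timing is claimed here:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "X = np.arange(10)\n",
    "\n",
    "# Loop version, shown only for comparison\n",
    "looped = np.array([1 if x > 5 else 0 for x in X])\n",
    "\n",
    "# Vectorized version: one call, no Python-level loop\n",
    "vectorized = np.where(X > 5, 1, 0)\n",
    "assert np.array_equal(looped, vectorized)\n",
    "assert np.array_equal(vectorized, [0, 0, 0, 0, 0, 0, 1, 1, 1, 1])\n",
    "\n",
    "# With a single argument, np.where returns the indices where the condition holds\n",
    "(indices,) = np.where(X > 5)\n",
    "assert np.array_equal(indices, [6, 7, 8, 9])\n",
    "```"
   ]
  },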
  {
   "cell_type": "markdown",
   "id": "687e2122",
   "metadata": {},
   "source": [
    "## Missing data\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Example array with NaN and Inf\n",
    "arr = np.array([1.0, 2.0, np.nan, np.inf, -np.inf, 3.0])\n",
    "\n",
    "# Count NaNs\n",
    "assert np.isnan(arr).sum() == 1   # only one np.nan\n",
    "\n",
    "# Count Infs\n",
    "assert np.isinf(arr).sum() == 2   # +inf and -inf\n",
    "\n",
    "# Mean ignoring NaNs\n",
    "arr2 = np.array([1.0, 2.0, np.nan, 3.0])\n",
    "assert np.nanmean(arr2) == 2.0    # (1+2+3)/3\n",
    "\n",
    "# Replace NaN/Inf with finite values\n",
    "cleaned = np.nan_to_num(arr, nan=0.0, posinf=999.0, neginf=-999.0)\n",
    "expected = np.array([1.0, 2.0, 0.0, 999.0, -999.0, 3.0])\n",
    "assert np.array_equal(cleaned, expected)\n",
    "\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3518f55f",
   "metadata": {},
   "source": [
    "## Other useful stuff\n",
    "\n",
    "### Print nicely\n",
    "\n",
    "```python\n",
    "np.set_printoptions(\n",
    "    precision=3,   # Set decimal places\n",
    "    suppress=True, # Avoid scientific notations\n",
    "    threshold=100, # Max number of elements to be printed\n",
    "    linewidth=80,\n",
    "    edgeitems=2    # Show two values per edge when truncated\n",
    ")\n",
    "```\n",
    "\n",
    "### Random geneator\n",
    "\n",
    "```python\n",
    "# Uniform [0,1)\n",
    "arr1 = np.random.rand(3, 2)\n",
    "assert arr1.shape == (3, 2)\n",
    "assert np.all((arr1 >= 0) & (arr1 < 1))\n",
    "\n",
    "# Standard normal (mean ≈ 0, std ≈ 1, but here just shape check)\n",
    "arr2 = np.random.randn(3, 2)\n",
    "assert arr2.shape == (3, 2)\n",
    "# Values can be any real number, so no bound check\n",
    "\n",
    "# Random integers between 0 and 9\n",
    "arr3 = np.random.randint(0, 10, (2, 3))\n",
    "assert arr3.shape == (2, 3)\n",
    "assert np.all((arr3 >= 0) & (arr3 < 10))\n",
    "\n",
    "# Sampling with replacement\n",
    "arr4 = np.random.choice([1, 2, 3], size=5, replace=True)\n",
    "assert arr4.shape == (5,)\n",
    "assert np.all(np.isin(arr4, [1, 2, 3]))\n",
    "\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ophus-env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}