{ "cells": [ { "cell_type": "markdown", "id": "d44cffb1", "metadata": {}, "source": [ "# NumPy\n", "\n", "NOTE: This page is currently being updated.\n", "\n", "References:\n", "- https://www.w3schools.com/python/numpy/numpy_array_slicing.asp\n", "- https://numpy.org/doc/\n", "- ChatGPT 5" ] }, { "cell_type": "markdown", "id": "5bc635d0", "metadata": {}, "source": [ "## Motivation\n", "\n", "A matrix is a rectangular grid of values. Matrices are used to solve systems of linear equations, represent graphs, and more. The more concisely and clearly we express matrix operations in scripts, the less time we spend debugging.\n", "\n", "Assume ``X`` and ``Y`` are matrices (2-D arrays) of compatible shapes and ``vec`` is a 1-D array.\n", "\n", "\n", "```\n", "np.add(X, Y) # Element-wise add, same as X + Y\n", "np.subtract(X, Y) # Element-wise subtract, same as X - Y\n", "np.divide(X, Y) # Element-wise divide, same as X / Y\n", "np.multiply(X, Y) # Element-wise multiply, same as X * Y (NOT a matrix product)\n", "\n", "# Matrix multiplication, all equivalent for 2-D arrays\n", "X @ Y # recommended\n", "np.matmul(X, Y)\n", "np.dot(X, Y)\n", "X.dot(Y)\n", "```" ] }, { "cell_type": "markdown", "id": "c0374be0", "metadata": {}, "source": [ "## Matrix operations\n", "\n", "```\n", "X.flatten() # Flatten to 1-D (returns a copy)\n", "np.sqrt(X) # Square root of every element\n", "np.sum(X) # Sum of all elements\n", "np.sum(X, axis=0) # Sum of each column (collapses the rows)\n", "np.sum(X, axis=1) # Sum of each row (collapses the columns)\n", "np.amax(X) # Single max value\n", "np.amax(X, axis=0) # Max of each column\n", "np.amax(X, axis=1) # Max of each row\n", "np.mean(X) # Mean\n", "np.std(X) # Standard deviation\n", "np.var(X) # Variance\n", "np.trace(X) # Sum of the elements on the main diagonal\n", "np.linalg.matrix_rank(X) # Rank of the matrix\n", "np.linalg.det(X) # Determinant (X must be square)\n", "```" ] }, { "cell_type": "markdown", "id": "29ad3de5", "metadata": {}, "source": [ "## Slicing\n", "\n", "### 1D slicing\n", "\n", "```python\n", "vec = list(range(10)) # [0, ..., 9]\n", "vec[4:8] # [4, 5, 6, 7]\n", "vec[-5:-2] # From 5th-last up to (not including) 2nd-last => [5, 6, 7]\n", "\n", "# Get every Nth value\n", "vec[::2] # [0, 2, 4, 6, 8]\n", "vec[::5] # [0, 5]\n", "\n", "# Reverse\n", 
"vec[::-1] # Reversed copy [9, 8, ..., 1, 0]\n", "vec.reverse() # Reverse in place (Python lists only; NumPy arrays have no .reverse())\n", "```\n", "\n" ] }, { "cell_type": "markdown", "id": "edf4d2f7", "metadata": {}, "source": [ "### Boolean indexing\n", "\n", "```\n", "X = np.arange(1, 10).reshape((3, 3)) # [[1 2 3] [4 5 6] [7 8 9]]\n", "cols = X[0, :] > 1 # Boolean mask: columns whose first-row value is > 1\n", "# => [False True True]\n", "X[:, cols] # Keep only the masked columns\n", "# => [[2 3]\n", "# [5 6]\n", "# [8 9]]\n", "```" ] }, { "cell_type": "markdown", "id": "d5bdb459", "metadata": {}, "source": [ "Recall that in a slice ``x:y`` the start ``x`` is included and the stop ``y`` is excluded. Omitting the start begins at the first element and omitting the stop runs to the end, so ``1:`` selects from the second element onward." ] }, { "cell_type": "markdown", "id": "72c93d1e", "metadata": {}, "source": [ "### 2D slicing\n", "\n", "```\n", "X = np.arange(1, 10).reshape((3, 3)) # [[1 2 3] [4 5 6] [7 8 9]]\n", "X[1, :] # get second row\n", "X[:, -1] # get last column\n", "X[0:2, :] # get first two rows\n", "X[[0, 2], :] # get first and third rows\n", "X[:, 0:2] # get first two columns\n", "X[:, [0, 2]] # get first and third columns\n", "X[0:2, 0:2] # get the 2x2 submatrix of the first two rows and columns\n", "X[X > 5] # 1-D array of elements greater than 5 => [6 7 8 9]\n", "\n", "# Advanced\n", "X[:, ::-1] # reverse the column order in every row\n", "# => [[3 2 1]\n", "# [6 5 4]\n", "# [9 8 7]]\n", "X[1:, ::-1] # same as above but skip the first row\n", "# => [[6 5 4]\n", "# [9 8 7]]\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "id": "2880a654", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "id": "5755c5fc", "metadata": { "nbsphinx": "hidden" }, "outputs": [], "source": [ "# HIDDEN\n", "# Helper function to display matrix\n", "def display(df, indent=1):\n", " s = df.to_string(index=False, header=False)\n", " indented = \"\\n\".join(\" \" * indent + line for line in s.splitlines())\n", " print(indented)\n", " print(\"\")" ] }, { "cell_type": "code", "execution_count": 3, "id": "e4ae45ed", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Original array:\n", " 1 2 3 4 5\n", " 6 7 8 9 10\n", "\n", "[0:1, 1:4] ->\n", " 2 3 4\n", "\n", 
"[:1, 1:4] ->\n", " 2 3 4\n", "\n", "[0:2, 2] ->\n", " 3\n", " 8\n", "\n", "[0:2, 1:4] ->\n", " 2 3 4\n", " 7 8 9\n", "\n" ] } ], "source": [ "arr = np.array([[1, 2, 3, 4, 5],\n", " [6, 7, 8, 9, 10]])\n", "\n", "df = pd.DataFrame(arr)\n", "print(\"Original array:\")\n", "display(df)\n", "print(\"[0:1, 1:4] ->\")\n", "display(df.iloc[0:1, 1:4])\n", "\n", "print(\"[:1, 1:4] ->\")\n", "display(df.iloc[:1, 1:4])\n", "\n", "print(\"[0:2, 2] ->\")\n", "display(df.iloc[0:2, 2].to_frame())\n", "\n", "print(\"[0:2, 1:4] ->\")\n", "display(df.iloc[0:2, 1:4])\n" ] }, { "cell_type": "markdown", "id": "a3aabfeb", "metadata": {}, "source": [ "## Add new dimension\n", "\n", "``None`` (also available as ``np.newaxis``) inserts a new axis of length 1 at its position in an index expression." ] }, { "cell_type": "code", "execution_count": 4, "id": "98552b34", "metadata": {}, "outputs": [], "source": [ "arr = np.arange(10)\n", "assert arr.shape == (10,)\n", "# Add two new axes using [:, None, None]\n", "reshaped = arr[:, None, None]\n", "assert reshaped.shape == (10, 1, 1)" ] }, { "cell_type": "code", "execution_count": 5, "id": "4d50565d", "metadata": {}, "outputs": [], "source": [ "arr2 = arr.reshape(2, 5)\n", "assert arr2.shape == (2, 5)\n", "# Add two new axes after the first axis (\"row\")\n", "assert arr2[:, None, None].shape == (2, 1, 1, 5)\n", "# Add two new axes after the second axis (\"column\")\n", "assert arr2[:, :, None, None].shape == (2, 5, 1, 1)" ] }, { "cell_type": "markdown", "id": "3f2eb209", "metadata": {}, "source": [ "## Create and copy arrays\n", "\n", "```python\n", "# Create and reshape at once (prefer plain arrays over the discouraged np.matrix)\n", "np.arange(12).reshape((3, 4)) # 3x4 array holding 0..11\n", "np.zeros((5,), dtype=int) # 1-D integer zeros\n", "np.zeros((2, 1)) # 2x1 float zeros\n", "\n", "# Reshape\n", "X = np.arange(6)\n", "X = X.reshape((2, 3))\n", "\n", "# Copy the data (changes to the copy do not affect X)\n", "np.copy(X)\n", "\n", "# Copy shape (and dtype)\n", "np.ones_like(X) # Ones with shape (2, 3)\n", "np.zeros_like(X) # Zeros with shape (2, 3)\n", "\n", "# Full\n", "np.full((2, 2), 10) # Shape (2, 2), all 10\n", "np.full((2, 2), np.inf) # Shape (2, 2), all inf\n", "np.full((2, 2), [1, 2]) # Shape (2, 2), each row is [1, 2]\n", "```" ] }, { "cell_type": "markdown", "id": "bfdeb541", "metadata": {}, "source": [ "## Broadcast\n", "\n", "Broadcasting lets element-wise operations combine arrays of different shapes. Shapes are compared from the last axis backwards; two axes are compatible when they are equal or one of them is 1, and a missing or length-1 axis is stretched to match the other array." ] }, { "cell_type": "code", "execution_count": 6, "id": "a77274b9", "metadata": {}, "outputs": [], "source": [ "a = np.array([1,2,3])\n", "assert a.shape == (3,)\n", "b = np.array([\n", " [10],\n", " [20],\n", " [30]])\n", "assert b.shape == (3,1)\n", "\n", "# a has shape (3,) and is treated as (1, 3): its row 1, 2, 3 is repeated down 3 rows\n", "# b has shape (3, 1): its column 10, 20, 30 is repeated across 3 columns\n", "# The two stretched (3, 3) arrays are then added element-wise\n", "expected = np.array([[11,12,13],\n", " [21,22,23],\n", " [31,32,33]])\n", "\n", "assert np.array_equal(a + b, expected)\n" ] }, { "cell_type": "markdown", "id": "76f901d3", "metadata": {}, "source": [ "## Advanced indexing\n", "\n", "Passing one integer array per axis selects elements pairwise: ``X[[0, 1, 2], [0, 1, 2]]`` picks ``X[0, 0]``, ``X[1, 1]`` and ``X[2, 2]``, i.e. the main diagonal." ] }, { "cell_type": "code", "execution_count": 7, "id": "0290b984", "metadata": {}, "outputs": [], "source": [ "X = np.arange(9).reshape(3,3)\n", "assert np.array_equal(X, [[0, 1, 2],\n", " [3, 4, 5],\n", " [6, 7, 8]])\n", "result = X[[0,1,2], [0,1,2]]\n", "expected = np.array([0, 4, 8])\n", "assert np.array_equal(result, expected)\n" ] }, { "cell_type": "markdown", "id": "bab2db64", "metadata": {}, "source": [ "## Stacking\n", "\n", "- Axis 0 - rows\n", "- Axis 1 - columns\n", "- Axis 2 - depth\n", "- Axis 3 and higher - further dimensions"
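, "\n", "\n", "A minimal sketch for checking which axis `np.stack` creates is to compare result shapes. Here `x`, `y` and `z` are arbitrary example arrays of shape `(2, 4)`; the new axis of length 3 (one slot per input) should appear at the position given by `axis`:\n", "\n", "```python\n", "import numpy as np\n", "\n", "# Three example arrays with the same shape (2, 4)\n", "x = np.zeros((2, 4))\n", "y = np.ones((2, 4))\n", "z = np.full((2, 4), 2.0)\n", "\n", "# The new axis of length 3 is inserted at position `axis`\n", "assert np.stack([x, y, z], axis=0).shape == (3, 2, 4)\n", "assert np.stack([x, y, z], axis=1).shape == (2, 3, 4)\n", "assert np.stack([x, y, z], axis=2).shape == (2, 4, 3)\n", "```"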
] }, { "cell_type": "code", "execution_count": 12, "id": "da7ba3c2", "metadata": {}, "outputs": [], "source": [ "# Base arrays\n", "a = np.array([1, 2, 3])\n", "b = np.array([4, 5, 6])\n", "\n", "# Stack as rows (Method 1/2)\n", "stacked = np.stack([a, b], axis=0)\n", "expected = np.array([[1,2,3],\n", " [4,5,6]])\n", "assert stacked.shape == (2,3)\n", "assert np.array_equal(stacked, expected)\n", "\n", "# Stack as rows (Method 2/2)\n", "vstacked = np.vstack([a, b]) # shape (2,3)\n", "expected = np.array([[1,2,3],\n", " [4,5,6]])\n", "assert np.array_equal(vstacked, expected)\n", "\n", "# Stack along axis 1: each input array becomes a column\n", "stacked_axis1 = np.stack([a, b], axis=1)\n", "expected_axis1 = np.array([[1,4],\n", " [2,5],\n", " [3,6]])\n", "assert stacked_axis1.shape == (3,2)\n", "assert np.array_equal(stacked_axis1, expected_axis1)\n", "\n", "# np.hstack: 1-D inputs are joined end to end (axis 0); 2-D inputs are joined along columns (axis 1)\n", "hstacked = np.hstack([a, b]) # shape (6,)\n", "expected = np.array([1,2,3,4,5,6])\n", "assert np.array_equal(hstacked, expected)\n", "\n", "# np.dstack: stack along depth (third axis); 1-D inputs are first reshaped to (1, N, 1)\n", "c = np.array([7,8,9])\n", "dstacked = np.dstack([a, b, c]) # shape (1,3,3)\n", "expected = np.array([[[1,4,7],\n", " [2,5,8],\n", " [3,6,9]]])\n", "assert np.array_equal(dstacked, expected)\n" ] }, { "cell_type": "markdown", "id": "7b9d8f3f", "metadata": {}, "source": [ "Note that `np.vstack` is shorthand for vertical stacking: it first turns 1-D inputs of shape `(N,)` into rows of shape `(1, N)` and then joins them like `np.concatenate(..., axis=0)`. `np.stack`, by contrast, always inserts a new axis at the position you choose, so all of its inputs must have the same shape."
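, "\n", "\n", "The equivalence with `np.concatenate(..., axis=0)` holds directly for 2-D inputs, while 1-D inputs are first turned into rows. A minimal sketch, with `a` and `b` redefined so it runs on its own and `M` an arbitrary example 2-D array:\n", "\n", "```python\n", "import numpy as np\n", "\n", "a = np.array([1, 2, 3])\n", "b = np.array([4, 5, 6])\n", "\n", "# 1-D inputs: concatenate joins them end to end ...\n", "assert np.concatenate([a, b], axis=0).shape == (6,)\n", "# ... while vstack first promotes each input to shape (1, 3)\n", "assert np.vstack([a, b]).shape == (2, 3)\n", "assert np.array_equal(np.vstack([a, b]), np.concatenate([a[None, :], b[None, :]], axis=0))\n", "\n", "# 2-D inputs: vstack and concatenate(axis=0) agree directly\n", "M = np.arange(6).reshape(2, 3)\n", "assert np.array_equal(np.vstack([M, M]), np.concatenate([M, M], axis=0))\n", "assert np.vstack([M, M]).shape == (4, 3)\n", "\n", "# np.stack always adds a new axis instead of joining an existing one\n", "assert np.stack([M, M], axis=0).shape == (2, 2, 3)\n", "```"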
] }, { "cell_type": "markdown", "id": "91095d7b", "metadata": {}, "source": [ "## Performance\n", "\n", "- Vectorization: use whole-array operations instead of Python loops\n", "- Use ``np.where`` for conditional element selection: ``np.where(X > 5, 1, 0)`` gives 1 where an element is greater than 5 and 0 elsewhere" ] }, { "cell_type": "markdown", "id": "687e2122", "metadata": {}, "source": [ "## Missing data\n", "\n", "```python\n", "import numpy as np\n", "\n", "# Example array with NaN and Inf\n", "arr = np.array([1.0, 2.0, np.nan, np.inf, -np.inf, 3.0])\n", "\n", "# Count NaNs\n", "assert np.isnan(arr).sum() == 1 # only one np.nan\n", "\n", "# Count Infs\n", "assert np.isinf(arr).sum() == 2 # +inf and -inf\n", "\n", "# Mean ignoring NaNs\n", "arr2 = np.array([1.0, 2.0, np.nan, 3.0])\n", "assert np.nanmean(arr2) == 2.0 # (1+2+3)/3\n", "\n", "# Replace NaN/Inf with finite values\n", "cleaned = np.nan_to_num(arr, nan=0.0, posinf=999.0, neginf=-999.0)\n", "expected = np.array([1.0, 2.0, 0.0, 999.0, -999.0, 3.0])\n", "assert np.array_equal(cleaned, expected)\n", "\n", "```" ] }, { "cell_type": "markdown", "id": "3518f55f", "metadata": {}, "source": [ "## Other useful stuff\n", "\n", "### Print nicely\n", "\n", "```python\n", "np.set_printoptions(\n", " precision=3, # Digits after the decimal point\n", " suppress=True, # Always use fixed-point instead of scientific notation\n", " threshold=100, # Summarize arrays with more than 100 elements\n", " linewidth=80, # Max characters per line\n", " edgeitems=2 # Items shown at each edge of a summarized axis\n", ")\n", "```\n", "\n", "### Random generator\n", "\n", "```python\n", "# Uniform [0,1)\n", "arr1 = np.random.rand(3, 2)\n", "assert arr1.shape == (3, 2)\n", "assert np.all((arr1 >= 0) & (arr1 < 1))\n", "\n", "# Standard normal (mean ≈ 0, std ≈ 1, but here just shape check)\n", "arr2 = np.random.randn(3, 2)\n", "assert arr2.shape == (3, 2)\n", "# Values can be any real number, so no bound check\n", "\n", "# Random integers from 0 to 9 (the upper bound 10 is excluded)\n", "arr3 = np.random.randint(0, 10, (2, 3))\n", "assert arr3.shape == (2, 3)\n", "assert np.all((arr3 >= 0) & (arr3 < 
10))\n", "\n", "# Sampling with replacement\n", "arr4 = np.random.choice([1, 2, 3], size=5, replace=True)\n", "assert arr4.shape == (5,)\n", "assert np.all(np.isin(arr4, [1, 2, 3]))\n", "\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "ophus-env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" } }, "nbformat": 4, "nbformat_minor": 5 }