NumPy

A comprehensive guide to NumPy operations with practical examples.

References:

Basics

Key points from the NumPy docs:

  • Fixed size; resizing creates a new array.

  • Same data type for all elements.

  • Element-wise ops use compiled C code (fast).

  • Vectorization: no explicit loops or indexing.

  • Broadcasting: implicit element-wise ops on arrays of different shapes.

  • Advanced indexing returns a copy; basic slicing returns a view. x[(1,2,3),]x[1,2,3].

  • Python arrays with tuples can be converted to multi-dimensional arrays.

  • Specify data type at creation: np.array([], dtype=complex).

  • Changing size is expensive; use np.zeros, np.ones for fixed-length arrays, e.g., np.ones(3, dtype=np.int32).

  • Random: rg = np.random.default_rng(1); rg.random((2,3)).

  • Upcasting: combining arrays of different types yields a more general type.

[1]:
# HIDDEN
import numpy as np
import pandas as pd
# Helper function to display matrix
def display(df, indent=1):
    s = df.to_string(index=False, header=False)
    indented = "\n".join(" " * indent + line for line in s.splitlines())
    print(indented)
    print("")

def eq(a, b):
    assert np.array_equal(a, np.array(b)), f"\nExpected:\n{b}\nGot:\n{a}"

Matrix operations

Here is a quick API summary

X.flatten()        # Flatten
np.sqrt(X)         # Square root all elements
np.sum(X)          # Sum all elements
np.sum(X,axis=0)   # Row-wise sum
np.sum(X,axis=1)   # Column-wise sum
np.amax(X)         # Single max value
np.amax(X, axis=0) # Get max in each column
np.amax(X, axis=1) # Get max in each row
np.mean(X)         # Mean
np.std(X)          # Standard deviation
np.var(X)          # Variance
np.trace(X)        # Sum of the elements on the diagonal
np.linalg.matrix_rank(X)  # Rank of the matrix
np.linalg.det(X)   # Determinant of the matrix

Slicing

1D slicing

Basic 1D slicing follows [start:stop:step] syntax:

[ ]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Basic slicing
eq(x[2:7], [2, 3, 4, 5, 6])      # Elements 2 through 6
eq(x[:5], [0, 1, 2, 3, 4])       # First 5 elements
eq(x[5:], [5, 6, 7, 8, 9])       # From index 5 to end
eq(x[-3:], [7, 8, 9])            # Last 3 elements

# With step
eq(x[::2], [0, 2, 4, 6, 8])      # Every other element
eq(x[1::2], [1, 3, 5, 7, 9])     # Every other, starting from 1
eq(x[::-1], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0])  # Reverse
eq(x[8:2:-1], [8, 7, 6, 5, 4, 3])  # Reverse from 8 to 3

Boolean indexing

Select elements based on conditions:

[ ]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# Boolean conditions
eq(x[x > 5], [6, 7, 8, 9])
eq(x[x % 2 == 0], [0, 2, 4, 6, 8])  # Even numbers
eq(x[(x > 2) & (x < 7)], [3, 4, 5, 6])  # AND condition
eq(x[(x < 3) | (x > 7)], [0, 1, 2, 8, 9])  # OR condition

# Modify elements matching condition
y = x.copy()
y[y > 5] = 0
eq(y, [0, 1, 2, 3, 4, 5, 0, 0, 0, 0])

Advanced indexing

Unlike basic slicing using : which returns a “view”, advanced indexing uses integer arrays, list, or boolean masks.

[2]:
a = np.arange(9).reshape(3,3)

# Arbitrary element selection
eq(a[[2,0,1], [1,2,0]], [7, 2, 3])  # (2,1)->7, (0,2)->2, (1,0)->3

# Specific rows
eq(a[[0,2]], [[0, 1, 2],
              [6, 7, 8]])

# Specific columns
eq(a[:, [0,2]], [[0, 2],
                 [3, 5],
                 [6, 8]])

# Boolean mask for elements
mask = a > 4
eq(a[mask], [5, 6, 7, 8])  # flattened selection

[3]:
# i:k:l - i is the starting, j the end, k the step.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
eq(x[1:7:2], [1, 3, 5])


2D slicing

Slicing returns a view, rather than a copy.

X =  np.arrange(0, 10, 1).reshape((3, 3))
X[1, :]       # get second row
X[:, -1]      # get last col
X[0:2, :]     # get first two rows
X[[0, 2], :]  # get first and third rows
X[:, 0:2]     # get first two columns
X[:, [0, 2]]  # get first and third columns
X[0:2, 0:2]   # get submatrix of first two rows/columns
X[X > 5]      # get elements greater than 5

If there are two : in each row or column, this implies that k value can be now adjusted. In other words, you can selectively choose the internal either across rows or columns, respectively.

[4]:
arr = np.array([[ 1,  2,  3,  4,  5],
                [ 6,  7,  8,  9, 10],
                [11, 12, 13, 14, 15]])

# 2. Slice with steps

# a) Every 2nd column (step of 2), i,j=all, k=2
eq(arr[:, ::2], [[ 1,  3,  5],
                 [ 6,  8, 10],
                 [11, 13, 15]])

# b) Every 2nd column starting from col=1, i=1, j=all, k=2
eq(arr[:, 1::2], [[ 2,  4],
                  [ 7,  9],
                  [12, 14]])

# c) Every 2nd row (started from i=0)
eq(arr[::2, :], [[ 1,  2,  3,  4,  5],
                 [11, 12, 13, 14, 15]])

# d) Every row, but every 3rd column only
eq(arr[:, ::3], [[ 1,  4],
                 [ 6,  9],
                 [11, 14]])

# e) From 2nd row to end, every 2nd column
eq(arr[1:, ::2], [[ 6,  8, 10],
                  [11, 13, 15]])

# f) From 1st row to last, every 2nd row and 2nd column
eq(arr[::2, ::2], [[ 1,  3,  5],
                   [11, 13, 15]])

# g) Reverse every 2nd column (step -2)
eq(arr[:, ::-2], [[ 5,  3,  1],
                  [10,  8,  6],
                  [15, 13, 11]])

# h) Reverse rows, but only every other one (step -2)
eq(arr[::-2, :], [[11, 12, 13, 14, 15],
                  [ 1,  2,  3,  4,  5]])

# i) Take a sub-slice with start, stop, and step
eq(arr[0:3:2, 1:5:2], [[ 2,  4],
                       [12, 14]])

# 3. Fancy indexing (list/array of indices)
eq(arr[[0, 1], [0, 2]], [1, 8])       # elements (0,0) and (1,2)
eq(arr[[1, 0], [4, 0]], [10, 1])      # (1,4) and (0,0)

# 4. Using np.ix_ for 2D fancy indexing
rows = [0, 1]
cols = [1, 3, 4]
eq(arr[np.ix_(rows, cols)],
   [[2, 4, 5],
    [7, 9, 10]])                      # cross-product of row/col indices


Reverse rows and columns

[5]:
# 1. Reverse rows or columns
eq(arr[::-1, :], [[11, 12, 13, 14, 15],
                  [6, 7, 8, 9, 10],
                  [1, 2, 3, 4, 5]])   # reverse rows

eq(arr[:, ::-1], [[5, 4, 3, 2, 1],
                  [10, 9, 8, 7, 6],
                  [11, 12, 13, 14, 15]])  # reverse columns

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[5], line 6
      1 # 1. Reverse rows or columns
      2 eq(arr[::-1, :], [[11, 12, 13, 14, 15],
      3                   [6, 7, 8, 9, 10],
      4                   [1, 2, 3, 4, 5]])   # reverse rows
----> 6 eq(arr[:, ::-1], [[5, 4, 3, 2, 1],
      7                   [10, 9, 8, 7, 6],
      8                   [11, 12, 13, 14, 15]])  # reverse columns

Cell In[1], line 12, in eq(a, b)
     11 def eq(a, b):
---> 12     assert np.array_equal(a, np.array(b)), f"\nExpected:\n{b}\nGot:\n{a}"

AssertionError:
Expected:
[[5, 4, 3, 2, 1], [10, 9, 8, 7, 6], [11, 12, 13, 14, 15]]
Got:
[[ 5  4  3  2  1]
 [10  9  8  7  6]
 [15 14 13 12 11]]

Slicing with intervals

[ ]:
arr = np.array([[1, 2, 3, 4, 5],
                [6, 7, 8, 9, 10]])

eq(arr[0:1, 1:4], np.array([[2, 3, 4]]))
eq(arr[:1, 1:4], np.array([[2, 3, 4]]))
eq(arr[0:2, 2], np.array([3, 8]))
eq(arr[0:2, 1:4], np.array([[2, 3, 4],
                                   [7, 8, 9]]))

Add new dimension

none is used to insert a new axis or dimension.

[ ]:
arr = np.arange(10)
assert arr.shape == (10,)
# Add two new axes using [:, None, None]
reshaped = arr[:, None, None]
assert reshaped.shape == (10, 1, 1)
[ ]:
arr2 = arr.reshape(2, 5)
assert arr2.shape == (2, 5)
# Add two new axes after the first axis ("row")
assert arr2[:, None, None].shape == (2, 1, 1, 5)
assert arr2[:, :, None, None].shape == (2, 5, 1, 1)

Create and copy tensor

[ ]:
eq(np.matrix(np.arange(12).reshape((3, 4))), np.arange(12).reshape(3, 4))
eq(np.zeros((5,), dtype=int), np.array([0, 0, 0, 0, 0], dtype=int))
eq(np.zeros((2, 1)), np.zeros((2, 1)))

# Reshape
X = np.arange(6).reshape((2, 3))
eq(X, np.array([[0, 1, 2],
                [3, 4, 5]]))

# Copy exactly
eq(np.copy(X), X)

# Copy shape
eq(np.ones_like(X),  np.array([[1, 1, 1],
                               [1, 1, 1]]))
eq(np.zeros_like(X), np.array([[0, 0, 0],
                               [0, 0, 0]]))

# Full
eq(np.full((2, 2), 10),      np.array([[10, 10],
                                       [10, 10]]))
eq(np.full((2, 2), np.inf),  np.array([[np.inf, np.inf],
                                       [np.inf, np.inf]]))
eq(np.full((2, 2), [1, 2]),  np.array([[1, 2],
                                       [1, 2]]))

Broadcast

[ ]:
a = np.array([1,2,3])
assert a.shape == (3,)
b = np.array([
    [10],
    [20],
    [30]])
assert b.shape == (3,1)

# In a, 1, 2, 3 duplicated across new rows
# In b, 10, 20, 30 duplicated acorss new columns
# And then those are added
expected = np.array([[11,12,13],
                     [21,22,23],
                     [31,32,33]])

assert np.array_equal(a + b, expected)

Stacking

  • Axis 0 - rows

  • Axis 1 - columns

  • Axis 2 - depth

  • Axis 3 - so on..

[ ]:
# Base arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Stack across rows (Method 1)
eq(np.stack([a, b], axis=0),
   np.array([[1, 2, 3],
             [4, 5, 6]]))

# Stack across rows (Method 2)
eq(np.vstack([a, b]),
   np.array([[1, 2, 3],
             [4, 5, 6]]))

# Stack across columns (like rotate 90 and row-sum)
eq(np.stack([a, b], axis=1),
   np.array([[1, 4],
             [2, 5],
             [3, 6]]))

# Concatenate along columns
eq(np.hstack([a, b]),
   np.array([1, 2, 3, 4, 5, 6]))

# Stack along depth / third axis
c = np.array([7, 8, 9])
eq(np.dstack([a, b, c]),
   np.array([[[1, 4, 7],
              [2, 5, 8],
              [3, 6, 9]]]))

Just to note that np.vstack is a shorthand for vertical stacking like np.concatenate(..., axis=0). np.stack lets you choose any axis so it’s more general.

Performance

  • Vectoization - use array ops to loops

  • use where for conditional element selection np.where(X > 5, 1, 0)  # Replace with 1 if >5 else 0

[ ]:
# Example array with NaN and Inf
arr = np.array([1.0, 2.0, np.nan, np.inf, -np.inf, 3.0])

# Count NaNs
assert np.isnan(arr).sum() == 1   # only one np.nan

# Count Infs
assert np.isinf(arr).sum() == 2   # +inf and -inf

# Mean ignoring NaNs
arr2 = np.array([1.0, 2.0, np.nan, 3.0])
assert np.nanmean(arr2) == 2.0    # (1+2+3)/3

# Replace NaN/Inf with finite values
cleaned = np.nan_to_num(arr, nan=0.0, posinf=999.0, neginf=-999.0)
expected = np.array([1.0, 2.0, 0.0, 999.0, -999.0, 3.0])
assert np.array_equal(cleaned, expected)
[ ]:
np.set_printoptions(
    precision=3,   # Set decimal places
    suppress=True, # Avoid scientific notations
    threshold=100, # Max number of elements to be printed
    linewidth=80,
    edgeitems=2    # Show two values per edge when truncated
)

Other useful stuff

Linear algebra operations

[ ]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication (dot product)
eq(A @ B, [[19, 22], [43, 50]])
eq(np.dot(A, B), [[19, 22], [43, 50]])

# Element-wise multiplication
eq(A * B, [[5, 12], [21, 32]])

# Transpose
eq(A.T, [[1, 3], [2, 4]])

# Inverse
A_inv = np.linalg.inv(A)
identity = A @ A_inv
assert np.allclose(identity, np.eye(2))  # Should be identity matrix

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

# Solve linear system Ax = b
b = np.array([1, 2])
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b)  # Verify solution

Array manipulation

[ ]:
x = np.arange(12)

# Reshape
eq(x.reshape(3, 4), [[0, 1, 2, 3],
                     [4, 5, 6, 7],
                     [8, 9, 10, 11]])

eq(x.reshape(2, 6), [[0, 1, 2, 3, 4, 5],
                     [6, 7, 8, 9, 10, 11]])

# Reshape with -1 (auto-calculate dimension)
eq(x.reshape(3, -1), [[0, 1, 2, 3],
                      [4, 5, 6, 7],
                      [8, 9, 10, 11]])

# Flatten
A = np.array([[1, 2, 3], [4, 5, 6]])
eq(A.flatten(), [1, 2, 3, 4, 5, 6])
eq(A.ravel(), [1, 2, 3, 4, 5, 6])  # Similar but returns view when possible

# Squeeze - remove axes of length 1
B = np.array([[[1], [2], [3]]])
assert B.shape == (1, 3, 1)
C = np.squeeze(B)
assert C.shape == (3,)
eq(C, [1, 2, 3])

# Expand dimensions
eq(np.expand_dims(C, axis=0), [[1, 2, 3]])
eq(np.expand_dims(C, axis=1), [[1], [2], [3]])

Statistical operations

[ ]:
data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

# Basic statistics
print(f"Mean: {np.mean(data)}")           # 5.0
print(f"Median: {np.median(data)}")       # 5.0
print(f"Std: {np.std(data)}")             # ~2.58
print(f"Var: {np.var(data)}")             # ~6.67

# Axis-wise operations
print(f"Column means: {np.mean(data, axis=0)}")  # [4, 5, 6]
print(f"Row means: {np.mean(data, axis=1)}")     # [2, 5, 8]

# Min/Max
print(f"Min: {np.min(data)}")             # 1
print(f"Max: {np.max(data)}")             # 9
print(f"Argmin: {np.argmin(data)}")       # 0 (index of min)
print(f"Argmax: {np.argmax(data)}")       # 8 (index of max)

# Percentiles
print(f"25th percentile: {np.percentile(data, 25)}")  # 2.5
print(f"75th percentile: {np.percentile(data, 75)}")  # 7.5

# Cumulative operations
print(f"Cumsum: {np.cumsum([1, 2, 3, 4])}")      # [1, 3, 6, 10]
print(f"Cumprod: {np.cumprod([1, 2, 3, 4])}")    # [1, 2, 6, 24]

Universal functions (ufuncs)

Element-wise operations that are highly optimized:

[ ]:
x = np.array([1, 4, 9, 16, 25])

# Mathematical functions
eq(np.sqrt(x), [1.0, 2.0, 3.0, 4.0, 5.0])
eq(np.square(x), [1, 16, 81, 256, 625])
eq(np.exp([0, 1, 2]), [1.0, np.e, np.e**2])
eq(np.log([1, np.e, np.e**2]), [0.0, 1.0, 2.0])

# Trigonometric
angles = np.array([0, np.pi/2, np.pi])
assert np.allclose(np.sin(angles), [0, 1, 0])
assert np.allclose(np.cos(angles), [1, 0, -1])

# Rounding
y = np.array([1.2, 2.5, 3.7, 4.9])
eq(np.round(y), [1.0, 2.0, 4.0, 5.0])
eq(np.floor(y), [1.0, 2.0, 3.0, 4.0])
eq(np.ceil(y), [2.0, 3.0, 4.0, 5.0])

# Absolute value
z = np.array([-1, -2, 3, -4])
eq(np.abs(z), [1, 2, 3, 4])

# Clip values
data = np.array([1, 5, 10, 15, 20])
eq(np.clip(data, 5, 15), [5, 5, 10, 15, 15])  # Limit between 5 and 15

Sorting and searching

[ ]:
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])

# Sort (returns new sorted array)
eq(np.sort(arr), [1, 1, 2, 3, 4, 5, 6, 9])

# Argsort (returns indices that would sort the array)
indices = np.argsort(arr)
eq(indices, [1, 3, 6, 0, 2, 4, 7, 5])
eq(arr[indices], [1, 1, 2, 3, 4, 5, 6, 9])  # Verify

# Sort 2D array
matrix = np.array([[3, 1, 4],
                   [1, 5, 9],
                   [2, 6, 5]])

eq(np.sort(matrix, axis=0),  # Sort each column
   [[1, 1, 4],
    [2, 5, 5],
    [3, 6, 9]])

eq(np.sort(matrix, axis=1),  # Sort each row
   [[1, 3, 4],
    [1, 5, 9],
    [2, 5, 6]])

# Unique values
data = np.array([1, 2, 2, 3, 3, 3, 4])
eq(np.unique(data), [1, 2, 3, 4])

# Unique with counts
unique, counts = np.unique(data, return_counts=True)
eq(unique, [1, 2, 3, 4])
eq(counts, [1, 2, 3, 1])

# Where (find indices matching condition)
x = np.array([1, 2, 3, 4, 5])
indices = np.where(x > 2)
eq(indices[0], [2, 3, 4])  # Indices where x > 2

Random number geneator

[ ]:

# Uniform [0,1) a = np.random.rand(3, 2) assert a.shape == (3, 2) assert np.all((a >= 0) & (a < 1)) # Standard normal (mean ≈ 0, std ≈ 1, but here just shape check) b = np.random.randn(3, 2) assert b.shape == (3, 2) # Values can be any real number, so no bound check # Random integers between 0 and 9 c = np.random.randint(0, 10, (2, 3)) assert c.shape == (2, 3) assert np.all((c >= 0) & (c < 10)) # Sampling with replacement d = np.random.choice([1, 2, 3], size=5, replace=True) assert d.shape == (5,) assert np.all(np.isin(d, [1, 2, 3]))