NumPy
A comprehensive guide to NumPy operations with practical examples.
References:
Basics
Key points from the NumPy docs:
Fixed size; resizing creates a new array.
Same data type for all elements.
Element-wise ops use compiled C code (fast).
Vectorization: no explicit loops or indexing.
Broadcasting: implicit element-wise ops on arrays of different shapes.
Advanced indexing returns a copy; basic slicing returns a view.
x[(1,2,3),]≠x[1,2,3].Python arrays with tuples can be converted to multi-dimensional arrays.
Specify data type at creation:
np.array([], dtype=complex).Changing size is expensive; use
np.zeros,np.onesfor fixed-length arrays, e.g.,np.ones(3, dtype=np.int32).Random:
rg = np.random.default_rng(1);rg.random((2,3)).Upcasting: combining arrays of different types yields a more general type.
[1]:
# HIDDEN
import numpy as np
import pandas as pd
# Helper function to display matrix
def display(df, indent=1):
s = df.to_string(index=False, header=False)
indented = "\n".join(" " * indent + line for line in s.splitlines())
print(indented)
print("")
def eq(a, b):
assert np.array_equal(a, np.array(b)), f"\nExpected:\n{b}\nGot:\n{a}"
Matrix operations
Here is a quick API summary
X.flatten() # Flatten
np.sqrt(X) # Square root all elements
np.sum(X) # Sum all elements
np.sum(X,axis=0) # Row-wise sum
np.sum(X,axis=1) # Column-wise sum
np.amax(X) # Single max value
np.amax(X, axis=0) # Get max in each column
np.amax(X, axis=1) # Get max in each row
np.mean(X) # Mean
np.std(X) # Standard deviation
np.var(X) # Variance
np.trace(X) # Sum of the elements on the diagonal
np.linalg.matrix_rank(X) # Rank of the matrix
np.linalg.det(X) # Determinant of the matrix
Slicing
1D slicing
Basic 1D slicing follows [start:stop:step] syntax:
[ ]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Basic slicing
eq(x[2:7], [2, 3, 4, 5, 6]) # Elements 2 through 6
eq(x[:5], [0, 1, 2, 3, 4]) # First 5 elements
eq(x[5:], [5, 6, 7, 8, 9]) # From index 5 to end
eq(x[-3:], [7, 8, 9]) # Last 3 elements
# With step
eq(x[::2], [0, 2, 4, 6, 8]) # Every other element
eq(x[1::2], [1, 3, 5, 7, 9]) # Every other, starting from 1
eq(x[::-1], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]) # Reverse
eq(x[8:2:-1], [8, 7, 6, 5, 4, 3]) # Reverse from 8 to 3
Boolean indexing
Select elements based on conditions:
[ ]:
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Boolean conditions
eq(x[x > 5], [6, 7, 8, 9])
eq(x[x % 2 == 0], [0, 2, 4, 6, 8]) # Even numbers
eq(x[(x > 2) & (x < 7)], [3, 4, 5, 6]) # AND condition
eq(x[(x < 3) | (x > 7)], [0, 1, 2, 8, 9]) # OR condition
# Modify elements matching condition
y = x.copy()
y[y > 5] = 0
eq(y, [0, 1, 2, 3, 4, 5, 0, 0, 0, 0])
Advanced indexing
Unlike basic slicing using : which returns a “view”, advanced indexing uses integer arrays, list, or boolean masks.
[2]:
a = np.arange(9).reshape(3,3)
# Arbitrary element selection
eq(a[[2,0,1], [1,2,0]], [7, 2, 3]) # (2,1)->7, (0,2)->2, (1,0)->3
# Specific rows
eq(a[[0,2]], [[0, 1, 2],
[6, 7, 8]])
# Specific columns
eq(a[:, [0,2]], [[0, 2],
[3, 5],
[6, 8]])
# Boolean mask for elements
mask = a > 4
eq(a[mask], [5, 6, 7, 8]) # flattened selection
[3]:
# i:k:l - i is the starting, j the end, k the step.
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
eq(x[1:7:2], [1, 3, 5])
2D slicing
Slicing returns a view, rather than a copy.
X = np.arrange(0, 10, 1).reshape((3, 3))
X[1, :] # get second row
X[:, -1] # get last col
X[0:2, :] # get first two rows
X[[0, 2], :] # get first and third rows
X[:, 0:2] # get first two columns
X[:, [0, 2]] # get first and third columns
X[0:2, 0:2] # get submatrix of first two rows/columns
X[X > 5] # get elements greater than 5
If there are two : in each row or column, this implies that k value can be now adjusted. In other words, you can selectively choose the internal either across rows or columns, respectively.
[4]:
arr = np.array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
# 2. Slice with steps
# a) Every 2nd column (step of 2), i,j=all, k=2
eq(arr[:, ::2], [[ 1, 3, 5],
[ 6, 8, 10],
[11, 13, 15]])
# b) Every 2nd column starting from col=1, i=1, j=all, k=2
eq(arr[:, 1::2], [[ 2, 4],
[ 7, 9],
[12, 14]])
# c) Every 2nd row (started from i=0)
eq(arr[::2, :], [[ 1, 2, 3, 4, 5],
[11, 12, 13, 14, 15]])
# d) Every row, but every 3rd column only
eq(arr[:, ::3], [[ 1, 4],
[ 6, 9],
[11, 14]])
# e) From 2nd row to end, every 2nd column
eq(arr[1:, ::2], [[ 6, 8, 10],
[11, 13, 15]])
# f) From 1st row to last, every 2nd row and 2nd column
eq(arr[::2, ::2], [[ 1, 3, 5],
[11, 13, 15]])
# g) Reverse every 2nd column (step -2)
eq(arr[:, ::-2], [[ 5, 3, 1],
[10, 8, 6],
[15, 13, 11]])
# h) Reverse rows, but only every other one (step -2)
eq(arr[::-2, :], [[11, 12, 13, 14, 15],
[ 1, 2, 3, 4, 5]])
# i) Take a sub-slice with start, stop, and step
eq(arr[0:3:2, 1:5:2], [[ 2, 4],
[12, 14]])
# 3. Fancy indexing (list/array of indices)
eq(arr[[0, 1], [0, 2]], [1, 8]) # elements (0,0) and (1,2)
eq(arr[[1, 0], [4, 0]], [10, 1]) # (1,4) and (0,0)
# 4. Using np.ix_ for 2D fancy indexing
rows = [0, 1]
cols = [1, 3, 4]
eq(arr[np.ix_(rows, cols)],
[[2, 4, 5],
[7, 9, 10]]) # cross-product of row/col indices
Reverse rows and columns
[5]:
# 1. Reverse rows or columns
eq(arr[::-1, :], [[11, 12, 13, 14, 15],
[6, 7, 8, 9, 10],
[1, 2, 3, 4, 5]]) # reverse rows
eq(arr[:, ::-1], [[5, 4, 3, 2, 1],
[10, 9, 8, 7, 6],
[11, 12, 13, 14, 15]]) # reverse columns
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[5], line 6
1 # 1. Reverse rows or columns
2 eq(arr[::-1, :], [[11, 12, 13, 14, 15],
3 [6, 7, 8, 9, 10],
4 [1, 2, 3, 4, 5]]) # reverse rows
----> 6 eq(arr[:, ::-1], [[5, 4, 3, 2, 1],
7 [10, 9, 8, 7, 6],
8 [11, 12, 13, 14, 15]]) # reverse columns
Cell In[1], line 12, in eq(a, b)
11 def eq(a, b):
---> 12 assert np.array_equal(a, np.array(b)), f"\nExpected:\n{b}\nGot:\n{a}"
AssertionError:
Expected:
[[5, 4, 3, 2, 1], [10, 9, 8, 7, 6], [11, 12, 13, 14, 15]]
Got:
[[ 5 4 3 2 1]
[10 9 8 7 6]
[15 14 13 12 11]]
Slicing with intervals
[ ]:
arr = np.array([[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10]])
eq(arr[0:1, 1:4], np.array([[2, 3, 4]]))
eq(arr[:1, 1:4], np.array([[2, 3, 4]]))
eq(arr[0:2, 2], np.array([3, 8]))
eq(arr[0:2, 1:4], np.array([[2, 3, 4],
[7, 8, 9]]))
Add new dimension
none is used to insert a new axis or dimension.
[ ]:
arr = np.arange(10)
assert arr.shape == (10,)
# Add two new axes using [:, None, None]
reshaped = arr[:, None, None]
assert reshaped.shape == (10, 1, 1)
[ ]:
arr2 = arr.reshape(2, 5)
assert arr2.shape == (2, 5)
# Add two new axes after the first axis ("row")
assert arr2[:, None, None].shape == (2, 1, 1, 5)
assert arr2[:, :, None, None].shape == (2, 5, 1, 1)
Create and copy tensor
[ ]:
eq(np.matrix(np.arange(12).reshape((3, 4))), np.arange(12).reshape(3, 4))
eq(np.zeros((5,), dtype=int), np.array([0, 0, 0, 0, 0], dtype=int))
eq(np.zeros((2, 1)), np.zeros((2, 1)))
# Reshape
X = np.arange(6).reshape((2, 3))
eq(X, np.array([[0, 1, 2],
[3, 4, 5]]))
# Copy exactly
eq(np.copy(X), X)
# Copy shape
eq(np.ones_like(X), np.array([[1, 1, 1],
[1, 1, 1]]))
eq(np.zeros_like(X), np.array([[0, 0, 0],
[0, 0, 0]]))
# Full
eq(np.full((2, 2), 10), np.array([[10, 10],
[10, 10]]))
eq(np.full((2, 2), np.inf), np.array([[np.inf, np.inf],
[np.inf, np.inf]]))
eq(np.full((2, 2), [1, 2]), np.array([[1, 2],
[1, 2]]))
Broadcast
[ ]:
a = np.array([1,2,3])
assert a.shape == (3,)
b = np.array([
[10],
[20],
[30]])
assert b.shape == (3,1)
# In a, 1, 2, 3 duplicated across new rows
# In b, 10, 20, 30 duplicated acorss new columns
# And then those are added
expected = np.array([[11,12,13],
[21,22,23],
[31,32,33]])
assert np.array_equal(a + b, expected)
Stacking
Axis 0 - rows
Axis 1 - columns
Axis 2 - depth
Axis 3 - so on..
[ ]:
# Base arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
# Stack across rows (Method 1)
eq(np.stack([a, b], axis=0),
np.array([[1, 2, 3],
[4, 5, 6]]))
# Stack across rows (Method 2)
eq(np.vstack([a, b]),
np.array([[1, 2, 3],
[4, 5, 6]]))
# Stack across columns (like rotate 90 and row-sum)
eq(np.stack([a, b], axis=1),
np.array([[1, 4],
[2, 5],
[3, 6]]))
# Concatenate along columns
eq(np.hstack([a, b]),
np.array([1, 2, 3, 4, 5, 6]))
# Stack along depth / third axis
c = np.array([7, 8, 9])
eq(np.dstack([a, b, c]),
np.array([[[1, 4, 7],
[2, 5, 8],
[3, 6, 9]]]))
Just to note that np.vstack is a shorthand for vertical stacking like np.concatenate(..., axis=0). np.stack lets you choose any axis so it’s more general.
Performance
Vectoization - use array ops to loops
use
wherefor conditional element selectionnp.where(X > 5, 1, 0) # Replace with 1 if >5 else 0
[ ]:
# Example array with NaN and Inf
arr = np.array([1.0, 2.0, np.nan, np.inf, -np.inf, 3.0])
# Count NaNs
assert np.isnan(arr).sum() == 1 # only one np.nan
# Count Infs
assert np.isinf(arr).sum() == 2 # +inf and -inf
# Mean ignoring NaNs
arr2 = np.array([1.0, 2.0, np.nan, 3.0])
assert np.nanmean(arr2) == 2.0 # (1+2+3)/3
# Replace NaN/Inf with finite values
cleaned = np.nan_to_num(arr, nan=0.0, posinf=999.0, neginf=-999.0)
expected = np.array([1.0, 2.0, 0.0, 999.0, -999.0, 3.0])
assert np.array_equal(cleaned, expected)
[ ]:
np.set_printoptions(
precision=3, # Set decimal places
suppress=True, # Avoid scientific notations
threshold=100, # Max number of elements to be printed
linewidth=80,
edgeitems=2 # Show two values per edge when truncated
)
Other useful stuff
Print nicely
Linear algebra operations
[ ]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication (dot product)
eq(A @ B, [[19, 22], [43, 50]])
eq(np.dot(A, B), [[19, 22], [43, 50]])
# Element-wise multiplication
eq(A * B, [[5, 12], [21, 32]])
# Transpose
eq(A.T, [[1, 3], [2, 4]])
# Inverse
A_inv = np.linalg.inv(A)
identity = A @ A_inv
assert np.allclose(identity, np.eye(2)) # Should be identity matrix
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")
# Solve linear system Ax = b
b = np.array([1, 2])
x = np.linalg.solve(A, b)
assert np.allclose(A @ x, b) # Verify solution
Array manipulation
[ ]:
x = np.arange(12)
# Reshape
eq(x.reshape(3, 4), [[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]])
eq(x.reshape(2, 6), [[0, 1, 2, 3, 4, 5],
[6, 7, 8, 9, 10, 11]])
# Reshape with -1 (auto-calculate dimension)
eq(x.reshape(3, -1), [[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 10, 11]])
# Flatten
A = np.array([[1, 2, 3], [4, 5, 6]])
eq(A.flatten(), [1, 2, 3, 4, 5, 6])
eq(A.ravel(), [1, 2, 3, 4, 5, 6]) # Similar but returns view when possible
# Squeeze - remove axes of length 1
B = np.array([[[1], [2], [3]]])
assert B.shape == (1, 3, 1)
C = np.squeeze(B)
assert C.shape == (3,)
eq(C, [1, 2, 3])
# Expand dimensions
eq(np.expand_dims(C, axis=0), [[1, 2, 3]])
eq(np.expand_dims(C, axis=1), [[1], [2], [3]])
Statistical operations
[ ]:
data = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
# Basic statistics
print(f"Mean: {np.mean(data)}") # 5.0
print(f"Median: {np.median(data)}") # 5.0
print(f"Std: {np.std(data)}") # ~2.58
print(f"Var: {np.var(data)}") # ~6.67
# Axis-wise operations
print(f"Column means: {np.mean(data, axis=0)}") # [4, 5, 6]
print(f"Row means: {np.mean(data, axis=1)}") # [2, 5, 8]
# Min/Max
print(f"Min: {np.min(data)}") # 1
print(f"Max: {np.max(data)}") # 9
print(f"Argmin: {np.argmin(data)}") # 0 (index of min)
print(f"Argmax: {np.argmax(data)}") # 8 (index of max)
# Percentiles
print(f"25th percentile: {np.percentile(data, 25)}") # 2.5
print(f"75th percentile: {np.percentile(data, 75)}") # 7.5
# Cumulative operations
print(f"Cumsum: {np.cumsum([1, 2, 3, 4])}") # [1, 3, 6, 10]
print(f"Cumprod: {np.cumprod([1, 2, 3, 4])}") # [1, 2, 6, 24]
Universal functions (ufuncs)
Element-wise operations that are highly optimized:
[ ]:
x = np.array([1, 4, 9, 16, 25])
# Mathematical functions
eq(np.sqrt(x), [1.0, 2.0, 3.0, 4.0, 5.0])
eq(np.square(x), [1, 16, 81, 256, 625])
eq(np.exp([0, 1, 2]), [1.0, np.e, np.e**2])
eq(np.log([1, np.e, np.e**2]), [0.0, 1.0, 2.0])
# Trigonometric
angles = np.array([0, np.pi/2, np.pi])
assert np.allclose(np.sin(angles), [0, 1, 0])
assert np.allclose(np.cos(angles), [1, 0, -1])
# Rounding
y = np.array([1.2, 2.5, 3.7, 4.9])
eq(np.round(y), [1.0, 2.0, 4.0, 5.0])
eq(np.floor(y), [1.0, 2.0, 3.0, 4.0])
eq(np.ceil(y), [2.0, 3.0, 4.0, 5.0])
# Absolute value
z = np.array([-1, -2, 3, -4])
eq(np.abs(z), [1, 2, 3, 4])
# Clip values
data = np.array([1, 5, 10, 15, 20])
eq(np.clip(data, 5, 15), [5, 5, 10, 15, 15]) # Limit between 5 and 15
Sorting and searching
[ ]:
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6])
# Sort (returns new sorted array)
eq(np.sort(arr), [1, 1, 2, 3, 4, 5, 6, 9])
# Argsort (returns indices that would sort the array)
indices = np.argsort(arr)
eq(indices, [1, 3, 6, 0, 2, 4, 7, 5])
eq(arr[indices], [1, 1, 2, 3, 4, 5, 6, 9]) # Verify
# Sort 2D array
matrix = np.array([[3, 1, 4],
[1, 5, 9],
[2, 6, 5]])
eq(np.sort(matrix, axis=0), # Sort each column
[[1, 1, 4],
[2, 5, 5],
[3, 6, 9]])
eq(np.sort(matrix, axis=1), # Sort each row
[[1, 3, 4],
[1, 5, 9],
[2, 5, 6]])
# Unique values
data = np.array([1, 2, 2, 3, 3, 3, 4])
eq(np.unique(data), [1, 2, 3, 4])
# Unique with counts
unique, counts = np.unique(data, return_counts=True)
eq(unique, [1, 2, 3, 4])
eq(counts, [1, 2, 3, 1])
# Where (find indices matching condition)
x = np.array([1, 2, 3, 4, 5])
indices = np.where(x > 2)
eq(indices[0], [2, 3, 4]) # Indices where x > 2
Random number geneator
[ ]:
# Uniform [0,1)
a = np.random.rand(3, 2)
assert a.shape == (3, 2)
assert np.all((a >= 0) & (a < 1))
# Standard normal (mean ≈ 0, std ≈ 1, but here just shape check)
b = np.random.randn(3, 2)
assert b.shape == (3, 2)
# Values can be any real number, so no bound check
# Random integers between 0 and 9
c = np.random.randint(0, 10, (2, 3))
assert c.shape == (2, 3)
assert np.all((c >= 0) & (c < 10))
# Sampling with replacement
d = np.random.choice([1, 2, 3], size=5, replace=True)
assert d.shape == (5,)
assert np.all(np.isin(d, [1, 2, 3]))