Data structures

This page covers fundamental data structures and the design principles commonly used in scientific computing, systems programming, and algorithm implementation. This section is co-authored with Claude.

Fundamental concepts

What is a data structure?

A data structure is a way of organizing and storing data to enable efficient access and modification. Different data structures are optimized for different operations (insertion, deletion, searching, sorting).

What is time complexity?

Time complexity describes how the runtime of an algorithm grows with input size, typically expressed using Big O notation:

  • O(1): Constant time, independent of input size

  • O(log n): Logarithmic time, grows slowly (e.g., binary search, sketched below)

  • O(n): Linear time, proportional to input size

  • O(n log n): Linearithmic time (efficient sorting)

  • O(n²): Quadratic time, grows rapidly with input size
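
As an illustration of logarithmic growth, here is a minimal binary search sketch (the function name is illustrative): each iteration halves the remaining range, so a million-element vector needs at most about 20 comparisons.

#include <vector>

// Binary search over a sorted vector: O(log n) comparisons.
int binary_search(const std::vector<double>& sorted, double target) {
    int lo = 0, hi = static_cast<int>(sorted.size()) - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;            // avoids overflow of (lo + hi)
        if (sorted[mid] == target) return mid;
        if (sorted[mid] < target) lo = mid + 1;  // discard lower half
        else                      hi = mid - 1;  // discard upper half
    }
    return -1;  // not found
}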

What is space complexity?

Space complexity describes how much memory an algorithm or data structure requires relative to the input size. Trade-offs between time and space complexity are common in algorithm design.

Summary table

Data Structure   │ Access    │ Insert    │ Delete    │ Use Case
─────────────────┼───────────┼───────────┼───────────┼────────────────────────────────────
Array/Vector     │ O(1)      │ O(n)      │ O(n)      │ Random access, cache-friendly
Linked List      │ O(n)      │ O(1)      │ O(1)      │ Frequent insertions, sparse data
Stack            │ O(n)      │ O(1)      │ O(1)      │ LIFO, function calls, backtracking
Queue            │ O(n)      │ O(1)      │ O(1)      │ FIFO, BFS, task scheduling
Heap             │ O(n)      │ O(log n)  │ O(log n)  │ Priority queue, k-largest/smallest
Hash Table       │ O(1) avg  │ O(1) avg  │ O(1) avg  │ Fast lookups, caching
BST (balanced)   │ O(log n)  │ O(log n)  │ O(log n)  │ Sorted data, range queries
Graph (adj list) │ O(degree) │ O(1)      │ O(degree) │ Networks, relationships

(Linked-list O(1) insert/delete assumes the position is already known; hash-table figures are averages, with O(n) worst case.)

Array-based structures

Arrays

What is an array?

A contiguous block of memory storing elements of the same type. Arrays provide O(1) random access by index but have a fixed size.

Memory layout visualization:

Index:     0      1      2      3      4
         ┌──────┬──────┬──────┬──────┬──────┐
data[]   │ 3.14 │ 2.71 │ 1.41 │ 5.00 │ 7.89 │
         └──────┴──────┴──────┴──────┴──────┘
Address: 0x100  0x108  0x110  0x118  0x120
         (each double = 8 bytes)

Access data[2]:  O(1)
Address calculation: base_address + (index × element_size)
                    0x100 + (2 × 8) = 0x110
// C++ fixed-size array
double data[5] = {3.14, 2.71, 1.41, 5.00, 7.89};
double x = data[2];              // O(1) access: 1.41

// Dynamic array (size n chosen at runtime)
double* heap_data = new double[n];  // Heap allocation
delete[] heap_data;                 // Must be freed manually
What are the advantages of arrays?
  • O(1) random access by index

  • Cache-friendly (contiguous memory)

  • Simple and efficient for fixed-size data

  • Predictable memory layout

What are the disadvantages of arrays?
  • Fixed size (in C-style arrays)

  • Expensive insertion/deletion in middle (O(n))

  • No bounds checking (in C/C++)

Dynamic arrays (vectors)

What is a dynamic array?

A resizable array that grows automatically when capacity is exceeded. In C++, this is std::vector. It provides array-like performance with dynamic sizing.

Vector growth visualization:

Initial state (capacity=2):
┌─────┬─────┐
│ 3.14│ 2.71│
└─────┴─────┘
size=2, capacity=2

After push_back(1.41) - capacity exceeded, reallocate:

Step 1: Allocate new array (capacity = 2 × 2 = 4)
┌─────┬─────┬─────┬─────┐
│     │     │     │     │
└─────┴─────┴─────┴─────┘

Step 2: Copy existing elements
┌─────┬─────┬─────┬─────┐
│ 3.14│ 2.71│     │     │
└─────┴─────┴─────┴─────┘

Step 3: Add new element
┌─────┬─────┬─────┬─────┐
│ 3.14│ 2.71│ 1.41│     │
└─────┴─────┴─────┴─────┘
size=3, capacity=4

Old array is deallocated
#include <vector>

std::vector<double> data;
data.push_back(3.14);           // O(1) amortized
data.push_back(2.71);

double x = data[0];             // O(1) access
data.resize(1000);              // Change size
How does a vector handle growth?

When capacity is exceeded, the vector allocates a new larger array (typically 1.5x or 2x the current size), copies all elements, and deallocates the old array. This makes push_back O(1) amortized time.

std::vector<double> data;
data.reserve(1000);  // Pre-allocate to avoid reallocation
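
A quick way to observe the growth policy is to print the capacity whenever it changes. The exact output is implementation-defined (libstdc++ and libc++ typically grow by 1.5x-2x); this is just a probe, not a guarantee.

#include <iostream>
#include <vector>

int main() {
    std::vector<double> v;
    std::size_t last_capacity = 0;
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {   // a reallocation happened
            last_capacity = v.capacity();
            std::cout << "size=" << v.size()
                      << " capacity=" << v.capacity() << "\n";
        }
    }
}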
When should you use vectors vs arrays?
  • Use std::vector for dynamic sizing, automatic memory management

  • Use fixed arrays for small, compile-time known sizes

  • Use std::array for type-safe fixed-size containers

Linked structures

Linked lists

What is a linked list?

A sequence of nodes where each node contains data and a pointer to the next node. Unlike arrays, linked lists store elements non-contiguously in memory.

Singly linked list visualization:

head
 ↓
┌────┬────┐    ┌────┬────┐    ┌────┬────┐
│3.14│ ●──┼───→│2.71│ ●──┼───→│1.41│ ●──┼───→ NULL
└────┴────┘    └────┴────┘    └────┴────┘
 Node 1         Node 2         Node 3
 (0x200)        (0x350)        (0x1A0)

Each node: [data | next pointer]
Memory locations are scattered (not contiguous)

Doubly linked list visualization:

head
 ↓
┌────┬────┬────┐    ┌────┬────┬────┐    ┌────┬────┬────┐
│NULL│3.14│ ●──┼───→│ ●←─┤2.71│ ●──┼───→│ ●←─┤1.41│NULL│
└────┴────┴────┘    └────┴────┴────┘    └────┴────┴────┘
               ←────────────────────────
Each node: [prev | data | next]
struct Node {
    double data;
    Node* next;
};

// Creating a linked list
Node* head = new Node{3.14, nullptr};
head->next = new Node{2.71, nullptr};
head->next->next = new Node{1.41, nullptr};

Insertion at front (O(1)):

Before inserting 5.00:
head
 ↓
┌────┬────┐    ┌────┬────┐
│3.14│ ●──┼───→│2.71│NULL│
└────┴────┘    └────┴────┘

After inserting 5.00 at front:
head
 ↓
┌────┬────┐    ┌────┬────┐    ┌────┬────┐
│5.00│ ●──┼───→│3.14│ ●──┼───→│2.71│NULL│
└────┴────┘    └────┴────┘    └────┴────┘
New node        Old head

Accessing element at index i (O(n)):

To access index 2, must traverse:
head → Node 0 → Node 1 → Node 2
(No direct address calculation like arrays)
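
A minimal traversal sketch for the Node struct above (the function name is illustrative). The loop is pure pointer chasing, which is exactly what makes indexed access O(n):

// O(n) access by index: follow next pointers from the head.
double get_at(const Node* head, int index) {
    const Node* current = head;
    for (int i = 0; i < index && current != nullptr; ++i) {
        current = current->next;   // one pointer hop per element
    }
    return current->data;          // assumes index is in range
}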
What are the advantages of linked lists?
  • O(1) insertion/deletion at front

  • No reallocation needed when growing

  • Can grow to any size (limited by available memory)

  • Efficient for frequent insertions/deletions

What are the disadvantages of linked lists?
  • O(n) access by index (must traverse)

  • Extra memory for pointers (overhead)

  • Poor cache performance (non-contiguous)

  • More complex memory management

When are linked lists used in scientific computing?

Linked lists are rarely used in numerical computing due to poor cache performance, but they appear in:

  • Sparse matrix storage (linked list of non-zero entries)

  • Memory allocation algorithms

  • Graph representations (adjacency lists)

  • Undo/redo stacks in applications

Stack (LIFO)

What is a stack?

A Last-In-First-Out (LIFO) data structure. Elements are added and removed from the same end (the “top”). Think of a stack of plates—you add and remove from the top.

Stack operations visualization:

Initial state:           After push(3.14):        After push(2.71):
┌─────────┐             ┌─────────┐              ┌─────────┐
│  Empty  │             │  3.14   │ ← top        │  2.71   │ ← top
└─────────┘             └─────────┘              ├─────────┤
                                                 │  3.14   │
                                                 └─────────┘

After push(1.41):        After pop():             After pop():
┌─────────┐             ┌─────────┐              ┌─────────┐
│  1.41   │ ← top       │  2.71   │ ← top        │  3.14   │ ← top
├─────────┤             ├─────────┤              └─────────┘
│  2.71   │             │  3.14   │
├─────────┤             └─────────┘
│  3.14   │
└─────────┘

Function call stack example:

main() called:          foo(5) called:           bar(15) called:
┌──────────┐           ┌──────────┐            ┌──────────┐
│ main     │           │ a=5      │            │ x=15     │
│ locals   │           │ b=10     │            │ c=16     │
│ return   │           │ return   │            │ return   │
└──────────┘           ├──────────┤            ├──────────┤
                       │ main     │            │ a=5      │
                       │ locals   │            │ b=10     │
                       │ return   │            │ return   │
                       └──────────┘            ├──────────┤
                                               │ main     │
                                               │ locals   │
                                               │ return   │
                                               └──────────┘
                                               Stack grows with each call

Key operations:

  • push(x): Add element to top, O(1)

  • pop(): Remove and return top element, O(1)

  • peek()/top(): View top element without removing, O(1)

How is a stack implemented?

Stacks can be implemented using arrays or linked lists:

#include <stack>

std::stack<double> s;
s.push(3.14);        // Add to top
s.push(2.71);

double top = s.top();  // View top: 2.71
s.pop();             // Remove top

// Custom implementation with vector
std::vector<double> stack;
stack.push_back(3.14);    // push
double x = stack.back();  // peek
stack.pop_back();         // pop
Where are stacks used?
  • Function call management (call stack)

  • Expression evaluation (postfix notation)

  • Backtracking algorithms (DFS, maze solving)

  • Undo mechanisms in editors

  • Recursive algorithm simulation

What is the function call stack?

When functions are called, the CPU uses a stack to store:

  • Return addresses

  • Function parameters

  • Local variables

This enables function calls and returns in LIFO order. See the memory layout section in the C++ documentation for visualization.

Queue (FIFO)

What is a queue?

A First-In-First-Out (FIFO) data structure. Elements are added at the back and removed from the front. Think of a line at a store—first person in line is served first.

Queue operations visualization:

Initial state:
Front                                          Back
┌───────────────────────────────────────────────┐
│                   Empty                        │
└───────────────────────────────────────────────┘

After enqueue(3.14):
Front                                          Back
┌───────────────────────────────────────────────┐
│  3.14                                          │
└───────────────────────────────────────────────┘

After enqueue(2.71):
Front                                          Back
┌───────────────────────────────────────────────┐
│  3.14  →  2.71                                 │
└───────────────────────────────────────────────┘

After enqueue(1.41):
Front                                          Back
┌───────────────────────────────────────────────┐
│  3.14  →  2.71  →  1.41                        │
└───────────────────────────────────────────────┘

After dequeue() - removes 3.14:
Front                                          Back
┌───────────────────────────────────────────────┐
│  2.71  →  1.41                                 │
└───────────────────────────────────────────────┘

After dequeue() - removes 2.71:
Front                                          Back
┌───────────────────────────────────────────────┐
│  1.41                                          │
└───────────────────────────────────────────────┘

Circular queue implementation (efficient use of array):

Array-based circular queue (capacity=5):

           front=0, back=0 (empty)
     ┌───┬───┬───┬───┬───┐
     │   │   │   │   │   │
     └───┴───┴───┴───┴───┘
      0   1   2   3   4

After enqueue(A), enqueue(B), enqueue(C):
           front=0, back=3
     ┌───┬───┬───┬───┬───┐
     │ A │ B │ C │   │   │
     └───┴───┴───┴───┴───┘
      ↑           ↑
    front        back

After dequeue() twice, then enqueue(D), enqueue(E), enqueue(F):
           front=2, back=1 (back wraps past the end)
     ┌───┬───┬───┬───┬───┐
     │ F │   │ C │ D │ E │
     └───┴───┴───┴───┴───┘
           ↑   ↑
        back   front
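
A minimal array-backed circular queue sketch matching the diagram (the class name is illustrative). The back index is kept implicitly as front + count, and both indices wrap with modulo arithmetic, so both operations stay O(1):

#include <cstddef>
#include <stdexcept>
#include <vector>

template <typename T>
class CircularQueue {
    std::vector<T> buffer;
    std::size_t front = 0, count = 0;
public:
    explicit CircularQueue(std::size_t capacity) : buffer(capacity) {}

    void enqueue(const T& x) {                        // O(1)
        if (count == buffer.size()) throw std::overflow_error("queue full");
        buffer[(front + count) % buffer.size()] = x;  // back index wraps
        ++count;
    }

    T dequeue() {                                     // O(1)
        if (count == 0) throw std::underflow_error("queue empty");
        T x = buffer[front];
        front = (front + 1) % buffer.size();          // front index wraps
        --count;
        return x;
    }
};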

Key operations:

  • enqueue(x) or push(x): Add element to back, O(1)

  • dequeue() or pop(): Remove element from front, O(1)

  • front(): View front element, O(1)

How is a queue implemented?
#include <queue>

std::queue<double> q;
q.push(3.14);        // Add to back
q.push(2.71);

double front = q.front();  // View front: 3.14
q.pop();             // Remove from front
Where are queues used?
  • Task scheduling (CPU scheduling, print queues)

  • Breadth-first search (BFS) in graphs

  • Buffering (data streams, I/O)

  • Simulation of real-world queues

  • Message passing systems

Priority queue (heap)

What is a priority queue?

A data structure where each element has a priority, and elements are removed in priority order (not insertion order). Typically implemented using a binary heap.

#include <queue>

// Max heap (largest first)
std::priority_queue<double> max_heap;
max_heap.push(3.14);
max_heap.push(2.71);
max_heap.push(5.0);

double largest = max_heap.top();  // 5.0
max_heap.pop();

// Min heap (smallest first)
std::priority_queue<double, std::vector<double>, std::greater<double>> min_heap;
What is a binary heap?

A complete binary tree where each parent node has a value greater than or equal to (max heap) or less than or equal to (min heap) its children.

Max heap tree visualization:

              100
             /   \
           80     90
          /  \   /  \
        70   60 85  75
       / \   /
     50  40 55

Properties:
- Parent ≥ all descendants (max heap)
- Complete binary tree (filled left to right)
- Root contains maximum value

Min heap tree visualization:

               10
             /    \
           20      15
          /  \    /  \
        30   25  40  18
       / \
     50  45

Properties:
- Parent ≤ all descendants (min heap)
- Root contains minimum value

Array representation of heap:

Max heap tree:        100
                     /   \
                   80     90
                  /  \   /  \
                70   60 85  75

Stored as array:
Index:  0    1   2   3   4   5   6
     ┌────┬───┬───┬───┬───┬───┬───┐
Array:│100 │80 │90 │70 │60 │85 │75 │
     └────┴───┴───┴───┴───┴───┴───┘

Relationships:
- Parent of i: (i-1)/2
- Left child of i: 2i+1
- Right child of i: 2i+2

Example: Element at index 1 (80)
- Parent: (1-1)/2 = 0 → 100
- Left child: 2(1)+1 = 3 → 70
- Right child: 2(1)+2 = 4 → 60

Insert operation (bubble up):

Insert 95 into max heap:

Step 1: Add at end             Step 2: 95 > 70, swap
          100                            100
         /   \                          /   \
       80     90                      80     90
      /  \   /  \                    /  \   /  \
    70   60 85  75                 95   60 85  75
   /                              /
 95                             70

Step 3: 95 > 80, swap          Step 4: 95 < 100, done!
          100                            100
         /   \                          /   \
       95     90                      95     90
      /  \   /  \                    /  \   /  \
    80   60 85  75                 80   60 85  75
   /                              /
 70                             70

Extract max operation (bubble down):

Remove root (100):

Step 1: Replace root with last    Step 2: Compare with children
         75                                75
        /  \                              /  \
      80    95                          80    95
     /  \  /  \                        /  \  /  \
   70  60 85  90                    70  60 85  90

75 < max(80, 95), swap with 95!

Step 3: Continue bubbling down    Step 4: Done (heap property restored)
         95                                95
        /  \                              /  \
      80    75                          80    90
     /  \  /  \                        /  \  /  \
   70  60 85  90                    70  60 85  75

75 < 90, swap!                     75 > 85, done!
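
A minimal max-heap sketch over a std::vector, using the index formulas above (parent (i-1)/2, children 2i+1 and 2i+2). Function names are illustrative; std::priority_queue is the production choice.

#include <utility>
#include <vector>

void bubble_up(std::vector<int>& h, std::size_t i) {
    while (i > 0 && h[i] > h[(i - 1) / 2]) {   // bigger than parent?
        std::swap(h[i], h[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

void bubble_down(std::vector<int>& h, std::size_t i) {
    for (;;) {
        std::size_t left = 2 * i + 1, right = 2 * i + 2, largest = i;
        if (left  < h.size() && h[left]  > h[largest]) largest = left;
        if (right < h.size() && h[right] > h[largest]) largest = right;
        if (largest == i) break;               // heap property restored
        std::swap(h[i], h[largest]);
        i = largest;
    }
}

void heap_insert(std::vector<int>& h, int x) {  // O(log n)
    h.push_back(x);                             // add at end
    bubble_up(h, h.size() - 1);
}

int extract_max(std::vector<int>& h) {          // O(log n)
    int max = h.front();
    h.front() = h.back();                       // move last element to root
    h.pop_back();
    if (!h.empty()) bubble_down(h, 0);
    return max;
}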
What are the time complexities for heap operations?
  • Insert: O(log n) (add at end, then bubble up)

  • Extract max/min: O(log n) (remove root, move last to root, bubble down)

  • Peek max/min: O(1) (just view root)

  • Build heap from array: O(n) (heapify algorithm)

Where are heaps used?
  • Dijkstra’s shortest path algorithm

  • Heap sort (O(n log n) sorting)

  • Finding k largest/smallest elements

  • Job scheduling with priorities

  • Huffman coding for compression

  • Event-driven simulation

Practical example in scientific computing
// Find the k smallest distances in a large dataset
std::priority_queue<double> max_heap;  // Max heap holds the k smallest seen so far
const std::size_t k = 10;

for (double distance : large_dataset) {
    if (max_heap.size() < k) {
        max_heap.push(distance);
    } else if (distance < max_heap.top()) {
        max_heap.pop();            // Evict the largest of the current k
        max_heap.push(distance);
    }
}

// max_heap now contains the k smallest distances

Hash tables

What is a hash table?

A data structure that maps keys to values using a hash function. Provides average O(1) insertion, deletion, and lookup.

Hash table visualization:

Hash function: hash(key) % table_size

Keys to insert: "pi"→3.14, "e"→2.71, "phi"→1.61, "sqrt2"→1.41

Step 1: Calculate hash values
hash("pi") = 17 → 17 % 8 = 1
hash("e") = 5 → 5 % 8 = 5
hash("phi") = 22 → 22 % 8 = 6
hash("sqrt2") = 29 → 29 % 8 = 5  (collision!)

Hash table (size=8):
Index  │ Key     │ Value
───────┼─────────┼───────
  0    │         │
  1    │  "pi"   │ 3.14
  2    │         │
  3    │         │
  4    │         │
  5    │  "e"    │ 2.71  ← collision at index 5
  6    │  "phi"  │ 1.61
  7    │         │

Collision handling - Chaining:

Using linked lists at each index:

Index  │ Bucket (linked list)
───────┼─────────────────────────────
  0    │ → NULL
  1    │ → ["pi", 3.14] → NULL
  2    │ → NULL
  3    │ → NULL
  4    │ → NULL
  5    │ → ["e", 2.71] → ["sqrt2", 1.41] → NULL
       │    ↑              ↑
       │    First          Chained (collision)
  6    │ → ["phi", 1.61] → NULL
  7    │ → NULL

Collision handling - Open Addressing (Linear Probing):

When collision occurs, probe next slots:

Try to insert "sqrt2" at index 5 (occupied):

Index  │ Key     │ Value   │ Probing
───────┼─────────┼─────────┼─────────
  0    │         │         │
  1    │  "pi"   │ 3.14    │
  2    │         │         │
  3    │         │         │
  4    │         │         │
  5    │  "e"    │ 2.71    │ Occupied, try 5+1
  6    │  "phi"  │ 1.61    │ Occupied, try 5+2
  7    │ "sqrt2" │ 1.41    │ Empty! Insert here

Hash table lookup visualization:

Looking up "sqrt2":

1. Calculate hash: hash("sqrt2") % 8 = 5
2. Check index 5: "e" ≠ "sqrt2", continue
3. Check index 6: "phi" ≠ "sqrt2", continue
4. Check index 7: "sqrt2" = "sqrt2", found! Return 1.41
#include <unordered_map>

std::unordered_map<std::string, double> constants;
constants["pi"] = 3.14159;
constants["e"] = 2.71828;

double pi = constants["pi"];        // O(1) lookup
bool exists = constants.count("e"); // Check existence
How does a hash table work?
  1. Hash function converts key to integer (hash code)

  2. Hash code mapped to array index (hash code % table_size)

  3. Value stored at that index

  4. Collisions handled by chaining or open addressing

What are hash collisions?

When two different keys hash to the same index. Common solutions:

  • Chaining: Store multiple elements at the same index using a linked list (sketched below)

  • Open addressing: Find next available slot (linear probing, quadratic probing)
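
A minimal chaining sketch to make the bucket mechanics explicit (the class name is illustrative; std::unordered_map is the production choice):

#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedTable {
    std::vector<std::list<std::pair<std::string, double>>> buckets;

    std::size_t index_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets.size();
    }
public:
    explicit ChainedTable(std::size_t size) : buckets(size) {}

    void insert(const std::string& key, double value) {
        for (auto& kv : buckets[index_of(key)])
            if (kv.first == key) { kv.second = value; return; }  // update
        buckets[index_of(key)].push_back({key, value});          // new entry
    }

    const double* find(const std::string& key) const {
        for (const auto& kv : buckets[index_of(key)])
            if (kv.first == key) return &kv.second;
        return nullptr;  // not present
    }
};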

Where are hash tables used?
  • Database indexing

  • Caching (memoization)

  • Symbol tables in compilers

  • Counting frequencies

  • Detecting duplicates

  • Implementing dictionaries and sets

Trees

Binary search tree (BST)

What is a binary search tree?

A binary tree where for each node, all left descendants have smaller values and all right descendants have larger values. Enables efficient searching.

BST structure visualization:

Insert sequence: 50, 30, 70, 20, 40, 60, 80

                50
              /    \
            30      70
           /  \    /  \
         20   40  60  80

BST property: Left < Node < Right
- 20, 30 < 50 < 60, 70, 80
- 20 < 30 < 40
- 60 < 70 < 80

BST search visualization:

Search for 60:

Step 1: Start at root        Step 2: 60 > 50, go right
      50                           50
    /    \                       /    \
  30      70 ✓                 30      70 ✓
 /  \    /  \                 /  \    /  \
20  40  60  80               20  40  60  80

Step 3: 60 < 70, go left     Step 4: Found!
      50                           50
    /    \                       /    \
  30      70                   30      70
 /  \    /  \                 /  \    /  \
20  40  60✓ 80               20  40  60✓ 80
        ↑                             ↑
     Found!                        Return 60

Comparisons: 3 (log₂ 7 ≈ 2.8)

BST insert visualization:

Insert 55 into existing BST:

Step 1: Start at root        Step 2: 55 > 50, go right
      50                           50
    /    \                       /    \
  30      70                   30      70
 /  \    /  \                 /  \    /  \
20  40  60  80               20  40  60  80

Step 3: 55 < 70, go left     Step 4: 55 < 60, go left
      50                           50
    /    \                       /    \
  30      70                   30      70
 /  \    /  \                 /  \    /  \
20  40  60  80               20  40  60  80
                                   /
                                 NULL

Step 5: Insert at NULL position
      50
    /    \
  30      70
 /  \    /  \
20  40  60  80
       /
     55 ← New node inserted

BST in-order traversal (gives sorted sequence):

      50
    /    \
  30      70
 /  \    /  \
20  40  60  80

In-order: Left → Node → Right
Sequence: 20, 30, 40, 50, 60, 70, 80 (sorted!)

BST delete visualization:

Delete 30 (node with two children):

Original tree:                Find successor (leftmost of right subtree):
      50                            50
    /    \                        /    \
  30✗     70                    30✗     70
 /  \    /  \                  /  \    /  \
20  40  60  80                20  40← 60  80
                                   ↑
                                Successor

Replace 30 with 40:           Remove 40 from original position:
      50                            50
    /    \                        /    \
  40      70                    40      70
 /  \    /  \                  /      /  \
20  40  60  80                20     60  80

Balanced vs unbalanced BST:

Balanced BST:                Unbalanced BST (worst case):
      50                           10
    /    \                           \
  30      70                          20
 /  \    /  \                           \
20  40  60  80                          30
                                          \
Height: 3                                 40
Search: O(log n)                            \
                                            50
                                Height: 5
                                Search: O(n)

Properties:

  • Left subtree < node < right subtree

  • In-order traversal gives sorted sequence

  • Average O(log n) for search, insert, delete

  • Worst case O(n) if tree becomes unbalanced
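
A minimal unbalanced-BST sketch matching the diagrams above: insert, search, and in-order traversal (function names are illustrative; no rebalancing is performed).

#include <iostream>

struct TreeNode {
    int value;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
};

// Insert: go left if smaller, right if larger. O(log n) average.
TreeNode* bst_insert(TreeNode* root, int value) {
    if (!root) return new TreeNode{value};
    if (value < root->value)      root->left  = bst_insert(root->left, value);
    else if (value > root->value) root->right = bst_insert(root->right, value);
    return root;  // duplicates are ignored
}

// Search: same walk as insert. O(log n) average, O(n) if unbalanced.
bool bst_contains(const TreeNode* root, int value) {
    while (root) {
        if (value == root->value) return true;
        root = (value < root->value) ? root->left : root->right;
    }
    return false;
}

// In-order traversal prints the values in sorted order.
void in_order(const TreeNode* root) {
    if (!root) return;
    in_order(root->left);
    std::cout << root->value << ' ';
    in_order(root->right);
}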

When is a BST useful?
  • Maintaining sorted data with frequent insertions

  • Range queries (find all values between x and y)

  • Finding predecessor/successor

  • Implementation of sets and maps

Balanced trees

What is a balanced tree?

A tree that maintains approximately equal height in left and right subtrees, ensuring O(log n) operations even in worst case. Examples: AVL trees, Red-Black trees.

C++ std::map and std::set use balanced trees (typically Red-Black trees).

#include <iostream>
#include <map>
#include <set>

std::map<std::string, double> sorted_map;  // Keys kept sorted
sorted_map["b"] = 2.0;
sorted_map["a"] = 1.0;
sorted_map["c"] = 3.0;

// Iterate in sorted order: a, b, c
for (const auto& [key, value] : sorted_map) {
    std::cout << key << ": " << value << std::endl;
}

Graphs

What is a graph?

A collection of nodes (vertices) connected by edges. Graphs model relationships, networks, and connectivity.

Graph types visualization:

Undirected graph:              Directed graph (Digraph):

    A — B — C                      A → B → C
    |   |   |                      ↑   ↓   ↓
    D — E — F                      D ← E → F

Edges have no direction         Edges have direction (arrows)

Weighted graph:                 Complete graph (all connected):

    A --5-- B                       A
   /|       |\                     /|\
 2/ |3     1| \4                  / | \
 /  |       |  \                 /  |  \
D   C --6-- E   F               B---C---D

Edges have weights/costs        Every node connected to all others

Graph representations:

Example graph:
    A — B
    |   |
    C — D

Adjacency Matrix (2D array):

      A  B  C  D
    ┌──────────────┐
  A │ 0  1  1  0   │
  B │ 1  0  0  1   │
  C │ 1  0  0  1   │
  D │ 0  1  1  0   │
    └──────────────┘

  matrix[i][j] = 1 if edge exists, 0 otherwise
  Space: O(n²), Edge check: O(1)

Adjacency List (array of lists):

  A: [B, C]
  B: [A, D]
  C: [A, D]
  D: [B, C]

  Space: O(n + m), Edge check: O(degree)
  (n = nodes, m = edges)
How are graphs represented?

Adjacency matrix (2D array):

// For n nodes
std::vector<std::vector<int>> adj_matrix(n, std::vector<int>(n, 0));
adj_matrix[i][j] = 1;  // Edge from i to j

// Space: O(n²), Edge check: O(1)

Adjacency list (array of lists):

std::vector<std::vector<int>> adj_list(n);
adj_list[i].push_back(j);  // Edge from i to j

// Space: O(n + m), Edge check: O(degree)

Weighted graph with adjacency list:

Graph with weights:
    A --5-- B
   /|       |\
 2/ |3     1| \4
 /  |       |  \
D   C --6-- E   F

Adjacency list representation:

  A: [(B,5), (C,3), (D,2)]
  B: [(A,5), (E,1), (F,4)]
  C: [(A,3), (E,6)]
  D: [(A,2)]
  E: [(B,1), (C,6)]
  F: [(B,4)]
// Weighted graph
struct Edge {
    int dest;
    double weight;
};

std::vector<std::vector<Edge>> graph(n);
graph[u].push_back({v, weight});  // Edge from u to v

Graph traversal visualization:

Breadth-First Search (BFS):     Depth-First Search (DFS):
Start at A, visit level by level   Start at A, go deep first

    A (0)                           A (1)
   / \                             / \
 B(1) C(1)                       B(2) C(5)
 /     \                         /     \
D(2)   E(2)                    D(3)   E(6)
       /                        |
     F(3)                      F(4)

Visit order: A,B,C,D,E,F        Visit order: A,B,D,F,E,C
Uses Queue                      Uses Stack (or recursion)
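
A queue-based BFS sketch over an adjacency list, as in the diagram (the function name is illustrative). DFS is the same loop with a stack instead of a queue, or plain recursion.

#include <queue>
#include <vector>

// BFS visit order from a start node, using a FIFO queue.
std::vector<int> bfs_order(const std::vector<std::vector<int>>& adj, int start) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> frontier;
    frontier.push(start);
    visited[start] = true;
    while (!frontier.empty()) {
        int u = frontier.front();
        frontier.pop();
        order.push_back(u);
        for (int v : adj[u]) {          // enqueue unvisited neighbors
            if (!visited[v]) {
                visited[v] = true;
                frontier.push(v);
            }
        }
    }
    return order;
}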

Real-world graph example:

Molecular structure (benzene ring):

    C₁ — C₂
   /      \
 C₆        C₃
   \      /
    C₅ — C₄

Vertices: Carbon atoms
Edges: Chemical bonds
Properties: Ring structure, all nodes have degree 2
Where are graphs used in scientific computing?
  • Molecular structures (atoms as nodes, bonds as edges)

  • Neural networks (nodes as neurons, edges as connections)

  • Finite element meshes (nodes as points, edges as connections)

  • Dependency graphs (task scheduling)

  • Network analysis (social networks, protein interactions)

Choosing the right data structure

General guidelines

For random access by index: use std::vector or arrays.

For frequent insertions/deletions at the ends: use std::deque (double-ended queue).

For LIFO access (Last-In-First-Out): use std::stack.

For FIFO access (First-In-First-Out): use std::queue.

For priority-based access: use std::priority_queue (heap).

For key-value lookups: use std::unordered_map (hash table) for average O(1) lookups, or std::map (balanced tree) when keys must stay sorted.

For sorted data with frequent updates: use std::set or std::map.

For large datasets in scientific computing:
  • Vectors for contiguous data (best cache performance)

  • Sparse matrices for mostly-zero data

  • Heaps for k-largest/k-smallest problems

  • Hash tables for fast lookups without order

Performance considerations

Cache performance matters:

Modern CPUs can access contiguous memory (arrays, vectors) orders of magnitude faster, often quoted as ~100x, than scattered memory (linked lists, trees), thanks to caching and prefetching. For scientific computing, prefer arrays and vectors.

Memory overhead:
  • Array/vector: minimal overhead

  • Linked list: 8-16 bytes per element (pointer overhead)

  • Hash table: typically 2x memory for good performance

  • Tree: 16-24 bytes per element (2-3 pointers)

Trade-offs:
  • Fast insertion vs fast access

  • Memory usage vs time complexity

  • Simplicity vs flexibility

Scientific computing applications

  • Arrays: Fixed-size atomic coordinates, lattice parameters, transformation matrices, grid points in finite difference methods, wave function coefficients

  • Vectors: Growing trajectory data, time series measurements, molecular dynamics positions over time, adaptive mesh refinement, particle tracking

  • Linked lists: Sparse matrix row storage, event chains in molecular dynamics, undo/redo operations in simulation software

  • Stacks: Recursion in tree traversals, backtracking algorithms, depth-first search in molecular conformations, expression parsing

  • Queues: Breadth-first search in crystal structures, task scheduling in parallel simulations, buffering data streams from instruments

  • Heaps (priority queues): Event-driven molecular dynamics, Monte Carlo sampling, A* pathfinding in energy landscapes, Dijkstra’s algorithm for phonon dispersion

  • Hash tables: Atom type lookups by atomic number, symmetry operation caching, memoization in quantum chemistry, unique structure identification, element property databases

  • Binary search trees: Range queries in spatial databases, ordered traversal of energy levels, interval trees for overlap detection

  • Graphs: Crystal structure connectivity, molecular bond networks, finite element meshes, phonon dispersion relations, reaction pathways, protein folding landscapes

Cache-friendly patterns for performance

Why cache locality matters

Modern CPUs access memory in cache lines (typically 64 bytes). Accessing data sequentially can be up to ~100x faster than random access, because sequential reads hit cache lines that are already loaded or prefetched.

CPU hierarchy (access time):

L1 Cache:  ~1 ns     (32-64 KB per core)
L2 Cache:  ~3 ns     (256-512 KB per core)
L3 Cache:  ~12 ns    (8-32 MB shared)
RAM:       ~100 ns   (8-64 GB)

Random access breaks cache locality!

Structure of arrays vs array of structures

For numerical computations with SIMD vectorization:

// Array of Structures (AoS) - BAD for vectorization
struct Particle {
    double x, y, z;      // position
    double vx, vy, vz;   // velocity
    double mass;
};
std::vector<Particle> particles(1000000);

// To update all x positions, the CPU strides over y, z, vx, vy, vz, mass:
// consecutive x values are sizeof(Particle) = 56 bytes apart.
// Hard to vectorize - the values a SIMD lane needs are not adjacent!
for (auto& p : particles) {
    p.x += p.vx * dt;  // Strided memory access
}
// Structure of Arrays (SoA) - GOOD for vectorization
struct ParticleSystem {
    std::vector<double> x, y, z;          // positions
    std::vector<double> vx, vy, vz;       // velocities
    std::vector<double> mass;
};
ParticleSystem particles;
particles.x.resize(1000000);
particles.vx.resize(1000000);

// All x positions are contiguous - perfect for SIMD!
// CPU can process 4-8 values simultaneously with AVX/AVX512
for (size_t i = 0; i < particles.x.size(); ++i) {
    particles.x[i] += particles.vx[i] * dt;  // Contiguous memory
}
Representative performance for 10^6 particles
  • AoS: ~50 ms (cache misses, no vectorization)

  • SoA: ~5 ms (cache-friendly, 8x SIMD speedup)

  • Speedup: roughly 10x with SoA

Contiguous memory access patterns

// GOOD: Row-major access (cache-friendly)
std::vector<std::vector<double>> matrix(1000, std::vector<double>(1000));

for (size_t i = 0; i < 1000; ++i) {
    for (size_t j = 0; j < 1000; ++j) {
        matrix[i][j] = compute(i, j);  // Sequential in memory
    }
}
// BAD: Column-major access (cache-unfriendly)
for (size_t j = 0; j < 1000; ++j) {
    for (size_t i = 0; i < 1000; ++i) {
        matrix[i][j] = compute(i, j);  // Jumps 1000 elements each time!
    }
}
// BETTER: Use flat 1D arrays for 2D/3D data
std::vector<double> matrix(1000 * 1000);  // Contiguous allocation

for (size_t i = 0; i < 1000; ++i) {
    for (size_t j = 0; j < 1000; ++j) {
        size_t idx = i * 1000 + j;  // Row-major indexing
        matrix[idx] = compute(i, j);
    }
}
Key principles for scientific computing
  1. Prefer arrays/vectors over linked structures for numerical data

  2. Use SoA layout when processing same property across many objects

  3. Access memory sequentially in nested loops (row-major for C++)

  4. Align data to cache line boundaries (64 bytes) for optimal performance (see the sketch below)

  5. Minimize pointer chasing - each indirection costs ~100 cycles
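
For principle 4, a minimal sketch of cache-line alignment with alignas (assuming the usual 64-byte line; the struct name is illustrative). This also prevents false sharing when threads update neighboring counters:

#include <cstddef>

// Each counter occupies its own 64-byte cache line, so threads updating
// adjacent counters do not invalidate each other's cache lines.
struct alignas(64) PaddedCounter {
    double value = 0.0;
    // the compiler pads the struct out to 64 bytes
};

static_assert(sizeof(PaddedCounter) == 64, "padded to one cache line");
static_assert(alignof(PaddedCounter) == 64, "aligned to a cache line");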

GPU and parallel computing considerations

GPU-friendly data structures

  • Arrays: ✓ Excellent. Perfect for coalesced memory access, native CUDA support, minimal overhead.

  • Vectors (std::vector): ✓ Good. Copy to the device with cudaMemcpy, use raw pointers in kernels, manage capacity carefully.

  • Hash tables: ⚠ Challenging. Require special GPU implementations (e.g., the cuCollections library or custom kernels) and atomic operations for concurrent access.

  • Linked lists: ✗ Poor. Pointer chasing kills parallelism, threads diverge, and the memory access pattern is terrible.

  • Trees (BST, heap): ✗ Poor. Similar issues to linked lists; use GPU-specific implementations from specialized libraries.

  • Graphs: ⚠ Moderate. Use the CSR (Compressed Sparse Row) format sketched below; flat adjacency arrays work well, pointer-based adjacency lists do not.
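
A minimal CSR sketch: two flat arrays replace the per-node lists, so threads can index neighbors with plain arithmetic. The struct name is illustrative, and the example graph (A-B, A-C, B-D, C-D with nodes numbered 0-3) is assumed for the literal values.

#include <vector>

// CSR adjacency: the neighbors of node u are
// col_indices[row_offsets[u] .. row_offsets[u+1])
struct CSRGraph {
    std::vector<int> row_offsets;  // size = num_nodes + 1
    std::vector<int> col_indices;  // concatenated neighbor lists
};

CSRGraph g{
    {0, 2, 4, 6, 8},               // offsets for nodes 0..3
    {1, 2, 0, 3, 0, 3, 1, 2}       // 0:[1,2] 1:[0,3] 2:[0,3] 3:[1,2]
};

// Iterating the neighbors of node u (the same pattern works in a kernel):
// for (int e = g.row_offsets[u]; e < g.row_offsets[u + 1]; ++e)
//     visit(g.col_indices[e]);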

Memory coalescing patterns

GPUs achieve peak performance when threads in a warp (32 threads) access contiguous memory:

// GOOD: Coalesced access (threads access adjacent elements)
__global__ void add_vectors(double* a, double* b, double* c, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];  // Thread 0: a[0], Thread 1: a[1], etc.
    }
}
// All 32 threads in warp access consecutive memory - FAST!
// BAD: Strided access (threads skip elements)
__global__ void transpose_naive(double* in, double* out, int width) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;

    // Reading is coalesced, but writing is strided!
    out[x * width + y] = in[y * width + x];  // Thread 0: out[0], out[width], out[2*width]...
}
// Threads in warp write to memory locations 'width' apart - SLOW!

Structure of arrays on GPU

// Host code - SoA layout
struct ParticleSystemGPU {
    double *x, *y, *z;       // Device pointers
    double *vx, *vy, *vz;
    int num_particles;
};

// Allocate device memory (n = number of particles)
ParticleSystemGPU particles;
particles.num_particles = n;
cudaMalloc(&particles.x, n * sizeof(double));
cudaMalloc(&particles.vx, n * sizeof(double));
// ...allocate y, z, vy, vz the same way

// Kernel - perfect for coalescing!
__global__ void update_positions(double* x, double* vx, double dt, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        x[idx] += vx[idx] * dt;  // All threads in warp access x[0], x[1], x[2]...
    }
}

Parallel algorithm patterns

// Pattern 1: Map (element-wise operations)
// Perfect for GPU: O(n) work, no dependencies
__global__ void square_elements(double* data, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] = data[idx] * data[idx];
}
// Pattern 2: Reduce (sum, max, min)
// Good for GPU with shared memory optimization
__global__ void reduce_sum(double* input, double* output, int n) {
    __shared__ double shared[256];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    shared[tid] = (idx < n) ? input[idx] : 0.0;
    __syncthreads();

    // Tree reduction in shared memory
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) shared[tid] += shared[tid + s];
        __syncthreads();
    }

    if (tid == 0) output[blockIdx.x] = shared[0];
}
// Pattern 3: Scan (prefix sum)
// Challenging but powerful for GPU
// Use thrust::exclusive_scan() or CUB library

When to use CPU vs GPU

Use GPU when:
  • Processing large arrays (> 10^5 elements)

  • Element-wise operations (map, element-wise math)

  • Regular memory access patterns

  • High arithmetic intensity (many FLOPs per memory access)

  • Data parallel algorithms (same operation on different data)

Use CPU when:
  • Small data sets (< 10^4 elements)

  • Irregular memory access (trees, graphs, hash tables)

  • Lots of branching and conditionals

  • Sequential dependencies

  • Frequent host-device transfers (overhead dominates)

Practical example for ptychography reconstruction

Ptychography involves computing exit waves (probe × transmission) across many probe positions, followed by FFT to generate diffraction patterns. This is perfectly suited for GPU acceleration with proper data structures.

// CPU: Use vectors for probe position management
// Irregular scanning patterns, adaptive step sizes
struct PtychographySetup {
    std::vector<double> probe_x;        // Probe positions (may be irregular)
    std::vector<double> probe_y;
    std::vector<int> measurement_order; // Adaptive ordering
    int num_positions;
    int probe_size;  // e.g., 256x256
};

void generate_probe_positions(PtychographySetup& setup, double step_size) {
    // Complex logic: spiral scan, adaptive sampling, overlapping constraints
    // May skip regions, vary step size based on convergence
    for (int i = 0; i < setup.num_positions; ++i) {
        // Branching logic, unpredictable access patterns
        if (needs_refinement(i)) {
            add_extra_positions(setup, i);
        }
    }
}
// GPU: Use SoA arrays for exit wave and FFT computation
// Regular, predictable, massively parallel across all probe positions
#include <cufft.h>

struct ProbeDataGPU {
    cufftDoubleComplex* probe;        // Incident probe (256×256 complex)
    cufftDoubleComplex* transmission; // Object transmission function
    cufftDoubleComplex* exit_waves;   // probe × transmission for each position
    cufftDoubleComplex* diffraction;  // FFT results for each position
    double* probe_positions_x;        // Flattened probe coordinates
    double* probe_positions_y;
    int num_positions;
    int probe_width, probe_height;
};

// Kernel: Compute exit wave = probe × transmission at each position
__global__ void compute_exit_waves(
    cufftDoubleComplex* probe,
    cufftDoubleComplex* transmission,
    cufftDoubleComplex* exit_waves,
    double* probe_x, double* probe_y,
    int num_positions, int probe_width, int probe_height,
    int transmission_width) {

    int pos_idx = blockIdx.z;  // Which probe position
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int x = blockIdx.x * blockDim.x + threadIdx.x;

    if (pos_idx >= num_positions || x >= probe_width || y >= probe_height) return;

    // Calculate position in transmission function
    int trans_x = (int)(probe_x[pos_idx]) + x;
    int trans_y = (int)(probe_y[pos_idx]) + y;

    // Flattened array indices
    int probe_idx = y * probe_width + x;
    int trans_idx = trans_y * transmission_width + trans_x;
    int exit_idx = pos_idx * probe_width * probe_height + probe_idx;

    // Complex multiplication: exit = probe × transmission
    // Perfect for coalescing - all threads in warp access adjacent memory
    cufftDoubleComplex p = probe[probe_idx];
    cufftDoubleComplex t = transmission[trans_idx];

    exit_waves[exit_idx].x = p.x * t.x - p.y * t.y;  // Real part
    exit_waves[exit_idx].y = p.x * t.y + p.y * t.x;  // Imaginary part
}

// Use cuFFT for batch FFT across all probe positions
void compute_diffraction_patterns(ProbeDataGPU& data) {
    cufftHandle plan;
    int n[2] = {data.probe_height, data.probe_width};
    int batch = data.num_positions;

    // Create batch FFT plan - process all positions in parallel
    cufftPlanMany(&plan, 2, n, NULL, 1, 0, NULL, 1, 0,
                  CUFFT_Z2Z, batch);

    // Execute FFT on all exit waves simultaneously
    // GPU processes 100+ probe positions in parallel!
    cufftExecZ2Z(plan, data.exit_waves, data.diffraction,
                 CUFFT_FORWARD);

    cufftDestroy(plan);
}
// Kernel: Compute error metric (compare with measured intensities)
__global__ void compute_error(
    cufftDoubleComplex* diffraction,
    double* measured_intensity,
    double* error,
    int num_positions, int probe_size) {

    int pos_idx = blockIdx.y;
    int pixel_idx = blockIdx.x * blockDim.x + threadIdx.x;

    if (pos_idx >= num_positions || pixel_idx >= probe_size) return;

    int idx = pos_idx * probe_size + pixel_idx;

    // Compute intensity: |FFT|²
    double intensity = diffraction[idx].x * diffraction[idx].x +
                      diffraction[idx].y * diffraction[idx].y;

    // Error metric (e.g., least squares)
    double diff = intensity - measured_intensity[idx];
    error[idx] = diff * diff;

    // All threads process different probe positions in parallel
}
Performance breakdown
  • Probe positions: 100-1000 positions

  • Probe size: 256×256 complex numbers (1 MB per position)

  • Total data: 100 MB - 1 GB of exit waves

  • GPU FFT: ~1 ms per batch of 100 positions (cuFFT is highly optimized)

  • CPU FFT: ~100 ms (FFTW single-threaded)

  • Speedup: 100x with GPU + batch FFT

Why this works well on GPU
  1. Arrays of complex numbers - perfect memory layout for coalescing

  2. Same operation across all positions - data parallelism

  3. Batch FFT - cuFFT processes multiple FFTs simultaneously

  4. No branching - all threads follow same code path

  5. High arithmetic intensity - complex multiplication + FFT (many FLOPs per memory access)

Key takeaways
  1. Arrays and vectors are the only truly GPU-friendly structures

  2. Memory coalescing is critical for GPU performance (32x speedup possible)

  3. SoA layout enables vectorization on both CPU (SIMD) and GPU (coalescing)

  4. Minimize host-device transfers - they’re expensive (~10 GB/s vs 1000 GB/s device memory)

  5. Hybrid approach: Use CPU for irregular algorithms, GPU for regular numerical kernels