C++

Nov 7, 2025 - This is a learning journal for C++ questions focused on scientific computing. I used C++ and C as an undergrad but never built scientific applications. Now, with GPU programming and high performance computing, I want to revisit C++ for scientific computing.

Purpose of this documentation: This page is designed as a practical reference to help you read and write C++ code faster for scientific computing. It focuses on common patterns, syntax, and concepts you’ll encounter in real codebases, with emphasis on floating-point arithmetic and numerical computing.

The following is sourced from various online references, Copilot, and my own experiments. I do not claim originality for any of the content here.

Quick reference for & and * symbols

  • In a type declaration: int& ref = x; declares a reference. ref becomes another name for x (no copy); changing ref changes x.

  • In an expression: int* ptr = &x; here & is the address-of operator. &x yields the memory address of x, which is stored in the pointer ptr.

  • Pointer dereference: *ptr = 10; here * means "value at address". This writes 10 into the memory location that ptr points to.

  • Combining both: int& alias = *ptr; *ptr yields the object ptr points to, and alias becomes another name for that object.

  • In a function parameter: void foo(int& n) passes n by reference, so the function works on the original variable, not a copy.

  • In a copy constructor: Tracker(const Tracker& other) takes the source object by const reference. Passing by value would require a copy, which would invoke the copy constructor again and recurse forever.

Pointer in practice

To get the device number in CUDA, you run this code:

int device;
cudaGetDevice(&device);

The & means “address of”. We want to pass the address of device so that cudaGetDevice can write the result into that memory location. Now, if you look at the function signature for cudaGetDevice (simplified):

cudaError_t cudaGetDevice(int* device);

Inside the function, the implementation does something like this (simplified pseudo-implementation; driver_get_current_device stands in for the real driver call):

cudaError_t cudaGetDevice(int* device) {
    // Ask the driver (lower-level API) which GPU is bound to this thread
    int currentDevice = driver_get_current_device();  // e.g. returns 1

    // Copy that value into the caller's memory
    *device = currentDevice;

    return cudaSuccess;
}

The parameter type int* device is a pointer that holds the address of the caller's integer. Inside the function, *device dereferences that pointer and writes the result into the caller's memory location.

Why const is needed for member functions

The const keyword after a member function declaration indicates that the function will not modify the object’s state. This is important for const-correctness and enables the function to be called on const objects.

class Rectangle {
    int width, height;
public:
    void setWidth(int w) { width = w; }       // modifies state (non-const)
    int area() const { return width * height; }  // doesn't modify state (const)
};

Key points

When you write int area() const, the const applies to the hidden this pointer inside the method.

C++ secretly treats member functions like this:

int area(Rectangle* const this) { return this->width * this->height; }

But if you mark the function as const, it becomes:

int area(const Rectangle* const this) { return this->width * this->height; }

Now this points to a const Rectangle, meaning the compiler won’t let you modify any member variables inside that function. This ensures:

  • The function can be called on const objects

  • The function is guaranteed not to modify the object’s state

  • Better code safety and clearer intent

In contrast with Python

Python makes self explicit:

class Rectangle:
    def area(self):
        return self.width * self.height

C++ makes this implicit - it’s there, you just don’t have to say it. When you write width in a C++ member function, the compiler automatically treats it as this->width.

Numerical data types for scientific computing

Type                  Size                                       Precision              Scientific use
────────────────────  ─────────────────────────────────────────  ─────────────────────  ────────────────────────────────────────────
float                 4 bytes (32-bit)                           ~7 decimal digits      GPU computing (faster), machine learning, graphics
double                8 bytes (64-bit)                           ~15-16 decimal digits  Default for scientific computing, general numerical work
long double           8, 12, or 16 bytes (platform-dependent)    ~18-19 decimal digits  High-precision calculations, rarely needed
                                                                 (80-bit x86 extended)
std::complex<float>   8 bytes                                    Same as float          Quantum mechanics, signal processing (GPU)
std::complex<double>  16 bytes                                   Same as double         Quantum mechanics, signal processing (default)
int                   4 bytes                                    Exact integers         Loop counters, array indices only

Key points for scientific computing:

  • Always use ``double`` by default unless you have a specific reason (GPU memory, performance)

  • Use ``float`` for GPU computing where memory bandwidth matters

  • Use ``std::complex<double>`` for complex numbers in quantum mechanics, FFTs, signal processing

  • Precision matters: float accumulates error faster in iterative algorithms

Understanding floating-point precision

Floating-point numbers have limited precision:

float x = 1.0f / 3.0f;      // 0.333333... (7 digits)
double y = 1.0 / 3.0;       // 0.333333333333333... (15 digits)

// Accumulated error example
float sum_f = 0.0f;
double sum_d = 0.0;

for (int i = 0; i < 1000000; i++) {
    sum_f += 0.1f;  // Accumulates rounding error
    sum_d += 0.1;   // Better, but still not perfect
}

std::cout << sum_f << std::endl;  // ~100958.34 (wrong by almost 1000!)
std::cout << sum_d << std::endl;  // ~100000.0 (much closer)

Rule of thumb: For scientific computing with many iterations or accumulations, prefer double. Use float only when profiling shows performance benefits.

Complex numbers in C++

For quantum mechanics, signal processing, and Fourier transforms:

#include <complex>
#include <iostream>

// Declaration and initialization
std::complex<double> z1(3.0, 4.0);        // 3 + 4i
std::complex<double> z2 = {1.0, -2.0};    // 1 - 2i
std::complex<float> z3(2.0f, 1.0f);       // Use float for GPU

// Basic operations
auto z_sum = z1 + z2;                     // (4, 2)
auto z_prod = z1 * z2;                    // (11, -2)
auto z_conj = std::conj(z1);              // 3 - 4i (complex conjugate)

// Access real and imaginary parts
double real_part = z1.real();             // 3.0
double imag_part = z1.imag();             // 4.0

// Magnitude and phase
double magnitude = std::abs(z1);          // 5.0 = sqrt(3²+4²)
double phase = std::arg(z1);              // ~0.927 radians

// Exponential form (Euler's formula: e^(iθ) = cos(θ) + i*sin(θ))
std::complex<double> z_polar = std::polar(5.0, 0.927);  // 5*e^(i*0.927)

Common in scientific computing:

// Wave functions in quantum mechanics
std::complex<double> psi(0.707, 0.707);  // normalized state

// Fourier transform (simplified)
std::vector<std::complex<double>> fft_result(1024);

// Complex exponentials for signal processing
double t = 0.001;                   // a sample time in seconds (needed below)
double omega = 2.0 * M_PI * 440.0;  // 440 Hz (M_PI is from <cmath>; POSIX, not strictly standard C++)
std::complex<double> signal = std::exp(std::complex<double>(0.0, omega * t));

Array initialization

Understanding various initialization methods is essential because different codebases prefer different styles. Some projects use classic C-style initialization, while others adopt modern C++11 brace initialization. Being familiar with all approaches helps you read, understand, and contribute to diverse codebases more effectively.

1. Classic C-style

double arr[3] = {1.0, 2.5, 3.14};

This is the most common way. Specify the size and the list.

If you make the list shorter than the size, the remaining elements are initialized to zero:

double arr[5] = {1.0, 2.5};  // becomes {1.0, 2.5, 0.0, 0.0, 0.0}

If you omit the size, it is deduced from the initializer list:

double arr[] = {1.0, 2.5, 3.14}; // size is 3

2. Default (zero) initialization

double arr[3] = {};     // all elements are 0.0
double arr2[3] = {0};   // also all elements are 0.0

3. C++11 uniform (brace) initialization

double arr[3]{1.0, 2.5, 3.14};   // same as = {1.0, 2.5, 3.14}, but modern style
double arr2[]{1.0, 2.5, 3.14};   // size deduced automatically
double arr3[5]{};                // all zeros

4. Using memset (low-level, C-style)

double arr[3];
std::memset(arr, 0, sizeof(arr));  // set all bytes to 0

Note: This does not support non-zero initialization directly. Use with caution for floating-point - it’s safer to use loop initialization or brace initialization.

5. Runtime initialization (loops or algorithms)

double arr[3];
for (int i = 0; i < 3; ++i)
    arr[i] = static_cast<double>(i) + 1.5;   // {1.5, 2.5, 3.5}

Uninitialized arrays

What happens with double arr[3]; without initialization inside a function?

void foo() {
    double arr[3];
    std::cout << arr[0];  // ❌ undefined behavior (garbage)
}

arr is a local automatic variable. It lives on stack memory, and the contents are uninitialized.

What is stack memory? Stack memory is a region of memory used for local variables and function calls. It operates in a Last-In-First-Out (LIFO) manner. When a function is called, space is allocated on the stack for its local variables. When the function returns, that space is freed.

Why is it undefined behavior? When you declare double arr[3]; without initialization, the compiler allocates space on the stack but does not clear or set those memory locations. Whatever data was previously stored in those memory locations remains there. This is called “garbage” data—it could be zeros, could be leftover values from previous function calls, or any random bit pattern. Reading this uninitialized memory leads to unpredictable program behavior, which is why it’s called undefined behavior.

What about double arr[3]; for global or static arrays?

double arr[3];        // global → all elements = 0.0

void foo() {
    static double arr2[3];  // static local → also all 0.0
}

Global and static arrays are automatically zero-initialized.

What does “static” mean here? The static keyword in this context means the variable has static storage duration. Unlike normal local variables that are created on the stack each time a function is called and destroyed when the function returns, a static local variable:

  • Is created only once, the first time the function is called

  • Persists for the entire lifetime of the program

  • Retains its value between function calls

  • Is stored in a separate memory region (not on the stack), similar to global variables

  • Is automatically zero-initialized by the compiler (just like global variables)

Example:

void counter() {
    static int count = 0;  // initialized only once
    count++;
    std::cout << count << std::endl;
}

int main() {
    counter();  // prints 1
    counter();  // prints 2
    counter();  // prints 3
}

The count variable persists between function calls, unlike a normal local variable which would reset to 0 each time.

Array initialization summary

Form              Example                                        Notes
────────────────  ─────────────────────────────────────────────  ────────────────────────
Explicit C-style  double arr[3] = {1.0, 2.5, 3.14};              Classic
Implicit size     double arr[] = {1.0, 2.5, 3.14};               Compiler deduces size
Partial init      double arr[5] = {1.0, 2.5};                    Remaining elements = 0.0
Zero init         double arr[3] = {};                            All zeros
Brace init        double arr[3]{1.0, 2.5, 3.14};                 C++11 style
Heap array        double* arr = new double[3]{1.0, 2.5, 3.14};   Manual memory
std::array        std::array<double,3> a{1.0,2.5,3.14};          Safer fixed-size
std::vector       std::vector<double> v{1.0,2.5,3.14};           Resizable container

Why std::array and std::vector exist

C++ inherited raw C arrays (double arr[5]), but they have fundamental safety issues. The Standard Library introduced std::array and std::vector to solve these problems:

The problem with C arrays

double arr[5] = {1.0, 2.5, 3.14, 4.2, 5.7};

// No size information - decays to pointer
void process(double arr[]) {  // really just double*
    int size = sizeof(arr);  // Wrong! Gives pointer size, not array size
}

// No bounds checking
arr[10] = 42.0;  // Compiles but writes to random memory (undefined behavior)

// Can't be copied or returned easily
double arr2[5] = arr;  // Compilation error

// Can't be resized
// No way to dynamically change size

std::array: Safe fixed-size arrays

Created to replace C arrays with zero runtime overhead while adding safety:

std::array<double, 5> arr = {1.0, 2.5, 3.14, 4.2, 5.7};

// Knows its size
void process(std::array<double, 5>& arr) {
    int size = arr.size();  // Returns 5
}

// Bounds checking with .at()
arr.at(10) = 42.0;  // Throws std::out_of_range (checked on every call, not just debug builds)
arr[10] = 42.0;     // Undefined behavior (same as C array, but you also have .at())

// Can be copied and returned
std::array<double, 5> arr2 = arr;  // Works

// Works with STL algorithms
std::sort(arr.begin(), arr.end());

Key limitation: Size must be known at compile time. Same memory layout as C arrays (lives on stack).

std::vector: Dynamic arrays with memory management

Created to provide dynamic sizing with automatic memory management:

std::vector<double> vec = {1.0, 2.5, 3.14, 4.2, 5.7};

// Size determined at runtime
int n;
std::cin >> n;
std::vector<double> data(n);  // Size from user input

// Can grow and shrink
vec.push_back(6.8);   // Add element
vec.pop_back();       // Remove element
vec.resize(100);      // Change size

// Automatic memory management (RAII)
{
    std::vector<double> temp(1000000);  // Allocates memory from heap
    // use temp...
}  // Destructor automatically frees memory

// Prevents stack overflow for large data
std::vector<double> big_data(10000000);  // Works (heap allocation)

Key tradeoff: Small overhead for dynamic capability (stores size, capacity, and pointer to heap memory).

Summary

  • C arrays: Fast but unsafe, no size info, can’t be copied, fixed size at compile time

  • std::array: Safe wrapper for C arrays, same performance, knows its size, but still fixed at compile time

  • std::vector: Dynamic sizing, automatic memory management, slight overhead, lives on heap

Modern C++ code should prefer std::array for fixed sizes and std::vector for dynamic data. Use raw C arrays only when interfacing with C APIs or for specific performance reasons (measure first!).

Practical tips for working with large datasets

When working with large arrays or datasets in scientific computing:

Use ``std::vector`` for dynamic data

std::vector<double> data(1000000);            // 1 million doubles, all zeros
std::vector<double> data2(1000000, 3.14);     // all initialized to π
std::vector<std::complex<double>> wavefunction(1024);  // complex array

What does “automatically manage memory” mean?

Vectors handle allocation and deallocation behind the scenes. When you create a vector, it allocates memory from the heap. When the vector goes out of scope, it automatically frees that memory. You don’t need to call new/delete or malloc/free manually.

{
    std::vector<double> data(1000);  // memory allocated
    // use data...
}  // vector destroyed, memory automatically freed

This prevents memory leaks compared to manual management:

double* data = new double[1000];  // manual allocation
// use data...
delete[] data;  // must remember to free! (easy to forget)

What about GPU computing and fine control?

For GPU computing (CUDA, OpenCL, etc.), you do need fine-grained control over memory transfers between CPU and GPU. In those cases:

  • CPU-side preparation: std::vector is commonly used to prepare data on the CPU side before transferring to GPU

  • GPU memory: You still use explicit GPU memory APIs (cudaMalloc, cudaMemcpy, etc.) for GPU-side buffers

  • Why still use vectors?: Even in GPU programming, vectors are preferred on the CPU side for their safety and convenience. You get the data pointer with data.data() when you need to copy to GPU

// Common pattern in CUDA programming
std::vector<float> host_data(1000000, 1.0f);  // CPU side, safe and easy

float* device_data;
cudaMalloc(&device_data, host_data.size() * sizeof(float));  // GPU side
cudaMemcpy(device_data, host_data.data(), /* ... */);  // transfer

// ... GPU computation ...

cudaFree(device_data);  // manual GPU cleanup
// host_data automatically cleaned up when out of scope

For high-performance computing, the pattern is: use modern C++ containers (vectors) for CPU data management, and explicit APIs for GPU/specialized memory.

Use ``std::array`` for fixed-size data

std::array<double, 1000> buffer{};  // all zeros, safer than C arrays
std::array<float, 3> position{0.0f, 0.0f, 0.0f};  // 3D coordinates

Provides bounds checking in debug mode and works with STL algorithms.

What are STL algorithms?

STL (Standard Template Library) algorithms are pre-built functions in <algorithm> for common operations on containers. Examples:

#include <algorithm>
#include <array>

std::array<double, 5> arr = {3.1, 1.4, 4.2, 1.0, 5.9};

// Sort the array
std::sort(arr.begin(), arr.end());

// Find an element
auto it = std::find(arr.begin(), arr.end(), 4.2);

// Transform elements (square each value)
std::transform(arr.begin(), arr.end(), arr.begin(),
               [](double x) { return x * x; });

// Count occurrences
int count = std::count(arr.begin(), arr.end(), 1.0);

These work seamlessly with std::array and std::vector. They can also be applied to raw C arrays via pointers or std::begin/std::end, but the container interface is more convenient and less error-prone.

Analogy to NumPy

std::array and std::vector are analogous to NumPy’s ndarray in Python:

# Python with NumPy
import numpy as np
arr = np.array([1.0, 2.5, 3.14, 4.2, 5.9])
arr.sort()           # method on array
arr.mean()           # built-in statistical function
arr * 2              # vectorized operations

// C++ with std::array
std::array<double, 5> arr = {1.0, 2.5, 3.14, 4.2, 5.9};
std::sort(arr.begin(), arr.end());           // algorithm
// No built-in mean, but std::accumulate works (requires <numeric>)
double mean = std::accumulate(arr.begin(), arr.end(), 0.0) / arr.size();
std::transform(arr.begin(), arr.end(), arr.begin(),
               [](double x) { return x * 2.0; });  // element-wise operation

Key difference: NumPy has more built-in numerical methods (mean, std, etc.), while C++ requires STL algorithms or external libraries (like Eigen or Armadillo) for numerical computing. However, both provide contiguous memory layout for performance.

Reserve capacity for vectors when size is known

std::vector<double> data;
data.reserve(1000000);  // pre-allocate memory to avoid reallocations
for (int i = 0; i < 1000000; ++i) {
    data.push_back(static_cast<double>(i) * 0.001);  // convert to double
}

This avoids multiple memory reallocations during growth.

Avoid uninitialized arrays for large data

// Bad: uninitialized, potentially dangerous
double data[1000000];

// Good: explicitly zero-initialized (but still on the stack; sizes this large risk overflow)
double data[1000000] = {};

// Better: use vector
std::vector<double> data(1000000);

For large datasets, uninitialized data can lead to hard-to-debug issues.

Why is vector better than array here?

Automatic memory management

  • Arrays like double data[1000000] allocate on the stack (limited size, typically a few MB)

  • Large arrays can cause stack overflow

  • Vectors allocate on the heap automatically, no size limit concerns

Safety and flexibility

  • If you need to change the size later, vectors can resize; arrays cannot

  • Vectors know their own size (data.size()); arrays do not

  • Vectors automatically clean up when going out of scope

Example of the problem with large stack arrays:

void foo() {
    double data[10000000];  // Stack overflow (80 MB on stack)
}

void bar() {
    std::vector<double> data(10000000);  // Works fine (heap allocation)
}

When should I use std::array vs std::vector?

Use ``std::array`` when:

  • Size is fixed and known at compile time

  • Small, performance-critical code (stack allocation is faster)

  • You want array bounds checking with .at()

std::array<int, 100> buffer;  // fixed size, stack allocated

Use ``std::vector`` when:

  • Size is determined at runtime or might change

  • Size is large (> few KB)

  • You need to resize, add, or remove elements

std::vector<double> data(n);  // n determined at runtime
data.push_back(42.5);          // can grow dynamically

Use raw C arrays when:

  • Working with legacy code or C APIs

  • Interfacing with GPU memory APIs

  • Extreme performance optimization (but measure first!)

double buffer[256];  // C-style, minimal overhead

Use memory-efficient initialization for specific patterns

// Fill with specific value
std::vector<double> data(1000000, 3.14);

// Or use std::fill
std::vector<double> data(1000000);
std::fill(data.begin(), data.end(), 3.14);

// Generate values with a pattern (std::iota requires <numeric>)
std::vector<double> data(1000);
std::iota(data.begin(), data.end(), 0.0);  // 0.0, 1.0, 2.0, ..., 999.0

Only zero-fill when you need zero

Zeroing a large buffer is O(n). If you will overwrite everything anyway, skip the zeroing and write the data once:

// Wasteful: zeros then overwrites
std::vector<double> data(1000000);  // zeros all elements
for (int i = 0; i < 1000000; ++i) {
    data[i] = compute_value(i);  // overwrites the zeros
}

// Better: allocate but don't initialize, then fill once
std::vector<double> data;
data.reserve(1000000);  // allocate space without initializing
for (int i = 0; i < 1000000; ++i) {
    data.push_back(compute_value(i));  // write data once
}

Prefer standard containers over raw arrays

Standard containers like std::vector and std::array offer significant advantages over raw C arrays:

Fixed size known at compile time:

std::array<double, 1000> data{};  // safer than double data[1000]

Large or variable size:

std::vector<double> data(1000000);  // contiguous, cache-friendly

Why prefer standard containers?

  • Automatic memory management: std::vector handles allocation and deallocation, preventing memory leaks

  • Bounds checking: std::array::at() and std::vector::at() always check bounds and throw std::out_of_range on violation

  • Size tracking: containers know their own size (data.size()), unlike raw arrays

  • STL algorithm compatibility: works seamlessly with std::sort, std::find, std::transform, etc.

  • Exception safety: proper cleanup on exceptions, unlike raw arrays with manual memory management

  • Resizable: std::vector can grow dynamically without manual reallocation

  • No pointer arithmetic errors: iterators are safer than pointer manipulation

  • Similar to NumPy arrays: just as NumPy arrays are preferred over raw Python lists for numerical work, std::vector is preferred for similar reasons—contiguous memory layout enables cache efficiency and SIMD optimizations

Pointers

What is a pointer?

A pointer in C++ is a variable that stores the memory address of another object or function. It provides direct memory access and flexibility, but requires careful management.

While powerful, pointers are more primitive and potentially dangerous compared to references. They can be null, uninitialized, or point to arbitrary memory locations, which makes them both powerful and risky to use.

Pointer basics

  • A pointer is itself a variable, so it occupies memory like any other object

  • Size: 4 bytes on 32-bit systems, 8 bytes on 64-bit systems (enough to address 2^32 or 2^64 memory locations)

  • Example memory address: 0x7ffee3bff6c8 (hexadecimal format)

Null pointers

In C++, null refers to a pointer that does not point to any valid object. It is represented by nullptr (C++11) or NULL (from C).

Why use null pointers?

  • Indicating absence: signify that a pointer is not currently pointing to any valid object

Why use pointers?

  • Dynamic memory: Allocate memory at runtime (essential for large datasets)

  • Efficiency: Pass large objects without copying

  • Data structures: Build linked lists, trees, graphs

  • Hardware access: Interface with GPU memory, C APIs, hardware registers

  • Polymorphism: Access derived classes through base class pointers

  • Error handling: Functions can return null to indicate failure

  • Memory management: Set pointers to null after deallocation to prevent dangling pointers

Declaring pointers with concrete examples

Pointers have types and can be null. Each pointer declaration specifies what type of data it points to. Here’s what actually happens in memory:

#include <iostream>

double x = 3.14159;
double* p1 = &x;        // p1 stores address of x

std::cout << "Value of x: " << x << std::endl;           // 3.14159
std::cout << "Address of x: " << &x << std::endl;        // 0x7ffeeb5c4a18 (example)
std::cout << "Value of p1: " << p1 << std::endl;         // 0x7ffeeb5c4a18 (same address)
std::cout << "Dereferenced p1: " << *p1 << std::endl;    // 3.14159 (accesses value at address)

Memory visualization:

Memory Address    Variable    Value
──────────────    ────────    ─────────────────
0x7ffeeb5c4a18    x          3.14159 (double, 8 bytes)
0x7ffeeb5c4a20    p1         0x7ffeeb5c4a18 (pointer, 8 bytes)
                              └─ points to x

Common pointer types for scientific computing:

double* ptr1;                   // pointer to double (most common in scientific code)
float* ptr2;                    // pointer to float (GPU computing)
std::complex<double>* ptr3;     // pointer to complex number
const double* ptr4;             // pointer to constant double (can't modify *ptr4)
double* const ptr5 = &x;        // constant pointer (can't change ptr5 itself)

Important: Uninitialized pointers contain garbage values (random addresses). Always initialize:

double x = 2.718;

double* ptr = nullptr;      // Good: explicitly null
double* ptr2 = &x;          // Good: points to valid object (address: e.g., 0x7ffeeb5c4b00)

double* ptr3;               // Bad: contains garbage address (e.g., 0x000deadbeef)
*ptr3 = 5.0;                // Crash - writing to random memory

The type of a pointer determines:

  • What type of data it can point to

  • How pointer arithmetic works (e.g., ptr + 1 moves by sizeof(type) bytes)

  • Whether it can be null (all pointer types can be nullptr)

Dereferencing pointers

Use the asterisk (*) operator to access the value a pointer points to:

double x = 42.5;
double* ptr = &x;           // ptr stores address of x (e.g., 0x7ffeeb5c4c10)
double value = *ptr;        // dereference: value = 42.5

*ptr = 99.9;                // modify through pointer
std::cout << x;             // x is now 99.9

Practical example: passing arrays to functions

double compute_mean(const double* data, int size) {
    double sum = 0.0;
    for (int i = 0; i < size; i++) {
        sum += data[i];  // same as *(data + i)
    }
    return sum / size;
}

std::vector<double> measurements = {1.2, 3.4, 5.6, 7.8};
double mean = compute_mean(measurements.data(), measurements.size());

Dangling pointers

A dangling pointer does not point to a valid object. This typically occurs when the object was deleted or deallocated, but the pointer was not updated. Accessing a dangling pointer leads to undefined behavior, crashes, or data corruption.

double* ptr = new double(3.14);
delete ptr;         // memory freed
// ptr still holds old address (e.g., 0x600001234567)
*ptr = 2.71;        // Crash - memory no longer valid

ptr = nullptr;      // Good practice: mark as invalid

References

What is a reference?

A reference in C++ is an alias for another variable, created using the ampersand (&) symbol. Once a reference is initialized, it cannot be changed to refer to another variable.

double x = 3.14;
double& ref = x;   // ref is an alias for x
ref = 2.71;        // modifies x directly
std::cout << x;    // prints 2.71

Why use references instead of pointers?

References provide a safer alternative to pointers:

  • Cannot be null or uninitialized (must always refer to a valid object)

  • More intuitive syntax (no need for dereferencing)

  • Prevent common pointer-related errors

  • Avoid copying large objects when passing to functions

// Pass by reference: no copy, direct access
void increment(double& value) {
    value += 1.0;  // modifies the original
}

double x = 5.5;
increment(x);  // x is now 6.5

Analogy: pointers vs references

If a pointer is like holding someone’s address written on paper, a reference is like being given a permanent portal directly into their house—you don’t need to look up the address, you just walk in.

Pointers vs References comparison

Feature            Pointer                            Reference
─────────────────  ─────────────────────────────────  ──────────────────────────────────────────────────────
Can be null?       Yes                                No (must refer to something valid)
Can be reseated?   Yes (p = &y;)                      No (always refers to the same object)
Syntax for access  *p (dereference)                   Just use the name
Used for           Dynamic memory, low-level control  Function parameters, operator overloads, clean aliasing

How do you use references in C++?

int x = 10;
int& ref = x;   // ref refers to x
ref = 20;       // modifies x directly
std::cout << x; // prints 20

What problem does a reference solve?

In C, you must use pointers to modify variables in functions:

void increment(int* p) {
    (*p)++;
}

int main() {
    int x = 5;
    increment(&x);  // must pass address
    std::cout << x; // prints 6
}

This can lead to issues like null pointers or pointer arithmetic errors.

In C++, references provide a safer and more intuitive way:

void increment(int& r) {
    r++;
}

int main() {
    int x = 5;
    increment(x);  // pass by reference
    std::cout << x; // prints 6
}

References eliminate pointer-related risks while maintaining efficiency.

CUDA C++ extensions for PyTorch

When writing custom CUDA kernels for PyTorch (e.g., for ptychography reconstruction, FFTs, or custom neural network layers), you divide code into host (CPU) and device (GPU) operations.

Host code responsibilities

1. Input validation and tensor interfacing

#include <torch/extension.h>

torch::Tensor ptychography_forward(
    torch::Tensor probe,           // complex probe function
    torch::Tensor object,          // object transmission
    torch::Tensor positions        // scan positions
) {
    // Validate inputs
    TORCH_CHECK(probe.is_cuda(), "probe must be CUDA tensor");
    TORCH_CHECK(probe.dtype() == torch::kComplexFloat,
               "probe must be complex64");

    // Get tensor dimensions
    int64_t batch_size = probe.size(0);
    int64_t probe_size = probe.size(1);

2. Memory allocation and data preparation

// Allocate output tensor on GPU
auto options = torch::TensorOptions()
    .dtype(torch::kComplexFloat)
    .device(probe.device());
auto output = torch::zeros({batch_size, probe_size, probe_size}, options);

// Get raw pointers for CUDA kernel
auto probe_ptr = probe.data_ptr<c10::complex<float>>();
auto object_ptr = object.data_ptr<c10::complex<float>>();
auto output_ptr = output.data_ptr<c10::complex<float>>();

3. CUDA kernel launch configuration

// Calculate grid and block dimensions
const int threads = 256;
const int blocks = (batch_size * probe_size * probe_size + threads - 1) / threads;

// Launch kernel
ptychography_kernel<<<blocks, threads>>>(
    probe_ptr,
    object_ptr,
    output_ptr,
    batch_size,
    probe_size
);

// Check for kernel errors
cudaError_t error = cudaGetLastError();
TORCH_CHECK(error == cudaSuccess,
           "CUDA kernel failed: ", cudaGetErrorString(error));

4. Synchronization and output handling

    // Wait for kernel completion
    cudaDeviceSynchronize();

    return output;  // Return PyTorch tensor
}

// Bind to Python
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("forward", &ptychography_forward, "Ptychography forward pass");
}

Device code (CUDA kernel)

__global__ void ptychography_kernel(
    const c10::complex<float>* probe,
    const c10::complex<float>* object,
    c10::complex<float>* output,
    int batch_size,
    int probe_size
) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int total = batch_size * probe_size * probe_size;

    if (idx < total) {
        // Complex multiplication for ptychography
        output[idx] = probe[idx] * object[idx];
    }
}

Why this division?

  • Host: Handles PyTorch tensor interface, memory management, validation, kernel orchestration

  • Device: Pure computation (complex arithmetic, FFTs, ptychography operations)

  • PyTorch integration: Host code bridges Python/C++/CUDA seamlessly

Common host operations for scientific computing

  • Input/output tensor validation (types, shapes, device placement)

  • Memory allocation (output tensors, workspace buffers)

  • Data layout conversions (contiguous memory, proper strides)

  • Kernel launch parameters (grid/block sizing)

  • Error checking and synchronization

  • Gradient computation setup (for backward pass)

  • Multi-GPU coordination (if using distributed computing)

Typical workflow pattern

// 1. Validate inputs (host)
TORCH_CHECK(input.is_cuda(), "input must be on CUDA");
TORCH_CHECK(input.is_contiguous(), "input must be contiguous");

// 2. Allocate outputs (host)
auto output = torch::empty_like(input);

// 3. Get raw pointers (host)
auto input_ptr = input.data_ptr<float>();
auto output_ptr = output.data_ptr<float>();

// 4. Launch kernel (host calls device)
const int threads = 256;
const int blocks = (input.numel() + threads - 1) / threads;
my_kernel<<<blocks, threads>>>(input_ptr, output_ptr, input.numel());

// 5. Synchronize (host)
cudaDeviceSynchronize();

// 6. Return PyTorch tensor (host)
return output;
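
The block count in step 4, (n + threads - 1) / threads, is integer ceiling division: the smallest number of blocks that covers all n elements. A standalone check of the arithmetic:

```cpp
// Ceiling division: smallest k such that k * threads >= n.
int num_blocks(int n, int threads) {
    return (n + threads - 1) / threads;
}
```

With 1000 elements and 256 threads per block this gives 4 blocks (1024 threads total), which is why the kernel needs its `if (idx < total)` guard: the 24 extra threads must do nothing.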

Building and using the extension

setup.py for compilation:

from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name='ptychography_cuda',
    ext_modules=[
        CUDAExtension(
            'ptychography_cuda',
            ['ptychography.cpp', 'ptychography_kernel.cu'],
            extra_compile_args={
                'cxx': ['-O3'],
                'nvcc': ['-O3', '--use_fast_math']
            }
        )
    ],
    cmdclass={'build_ext': BuildExtension}
)

Using in Python:

import torch
import ptychography_cuda

# Create complex tensors on GPU
probe = torch.randn(32, 256, 256, dtype=torch.complex64, device='cuda')
object_wave = torch.randn(256, 256, dtype=torch.complex64, device='cuda')
positions = torch.randn(32, 2, device='cuda')

# Call custom CUDA kernel
result = ptychography_cuda.forward(probe, object_wave, positions)

Memory management

Memory allocation in C

How does memory allocation work in C?

Memory management in C uses <stdlib.h> (or <cstdlib> in C++):

  • malloc(size): Allocates heap memory and returns a void* pointer

  • free(ptr): Deallocates memory allocated by malloc

Example: int* ptr = (int*)malloc(sizeof(int) * n);
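
A fuller sketch with the error check that real code needs (the function name is illustrative):

```cpp
#include <cstdlib>   // malloc, free

// Allocate a heap array the C way, checking for failure.
double* make_array(size_t n) {
    double* ptr = (double*)malloc(sizeof(double) * n);
    if (ptr == NULL) {              // malloc returns NULL on failure
        return NULL;
    }
    for (size_t i = 0; i < n; ++i) {
        ptr[i] = 0.0;               // malloc does not zero the memory
    }
    return ptr;                     // caller is responsible for free(ptr)
}
```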

Memory allocation in C++

What are the new and delete operators in C++?

In C++, the new operator allocates memory for an object or array on the heap, while the delete operator deallocates that memory when it is no longer needed.

  • To allocate memory for a single object:

    int* ptr = new int; // Allocates memory for a single integer
    
  • To allocate memory for an array of objects:

    int* arr = new int[10]; // Allocates memory for an array of 10 integers
    
  • To deallocate memory for a single object:

    delete ptr; // Deallocates memory for the single integer
    
  • To deallocate memory for an array of objects:

    delete[] arr; // Deallocates memory for the array of integers
    
What is the difference between malloc/free in C and new/delete in C++?

malloc/free deal with raw bytes, while new/delete handle typed objects and call constructors/destructors.

class Widget {
public:
    Widget() { std::cout << "Constructing\n"; }
    ~Widget() { std::cout << "Destructing\n"; }
};

int main() {
    Widget* w = new Widget(); // allocates and constructs
    delete w;                 // destructs and deallocates
}

The ~Widget() destructor is called when delete is used, ensuring proper cleanup of resources. The Widget object is constructed when allocated with new.
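
One way to make the difference visible is a constructor counter. This Counter struct is made up for illustration: malloc hands back raw bytes without constructing anything, while new runs the constructor:

```cpp
#include <cstdlib>   // malloc, free
#include <utility>   // std::pair

struct Counter {
    static int constructed;          // how many times the constructor ran
    Counter() { ++constructed; }
};
int Counter::constructed = 0;

std::pair<int, int> construction_demo() {
    Counter* raw = (Counter*)malloc(sizeof(Counter)); // bytes only, no constructor
    int after_malloc = Counter::constructed;          // still 0
    Counter* obj = new Counter();                     // constructor runs here
    int after_new = Counter::constructed;             // now 1
    delete obj;   // runs the destructor, then frees
    free(raw);    // frees bytes only; no destructor was ever involved
    return {after_malloc, after_new};
}
```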

Memory layout and organization

Why do we need organized memory layout?

Memory organization is fundamental to how programs execute. Memory isn’t just for storing variables and arrays—it’s also for storing function call information (parameters, return addresses, local variables), program instructions (the compiled code itself), and dynamically allocated data. The operating system and CPU work together to organize memory into different regions, each optimized for specific purposes:

  • Code segment: Stores your compiled program instructions

  • Static/global data: Stores global and static variables with fixed addresses

  • Stack: Manages function calls and local variables with automatic cleanup

  • Heap: Provides flexible dynamic memory for data whose size or lifetime isn’t known at compile time

This division allows the CPU to optimize memory access patterns, prevents different types of data from interfering with each other, and enables efficient memory management. Without this organization, programs would be slower, less reliable, and more prone to crashes.
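
These regions map directly onto how variables are declared; a small sketch (the names are illustrative):

```cpp
int global_count = 0;                    // static/global data segment

double* region_demo() {
    static double cache = 42.0;          // also static data: lives for the whole program
    double local = 1.0;                  // stack: destroyed when the function returns
    double* heap_val = new double(cache + local);  // heap: persists until delete
    return heap_val;                     // safe: heap memory outlives this call
}
```

Returning `&local` instead would be a bug (a dangling pointer), because the stack frame is gone when the function returns; returning the heap pointer is fine.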

The stack

What is the stack and why is it used for function calls?

The stack is a region of memory with Last-In-First-Out (LIFO) structure, used for managing function calls and local variables.

Complete memory layout visualization:

High memory addresses
┌─────────────────────────┐
│   Stack (grows down)    │ ← Stack pointer (SP)
│   - Function calls      │   Fast, automatic cleanup
│   - Local variables     │   Size limit (~1-8 MB)
│   - Return addresses    │
│   - Function parameters │
├─────────────────────────┤
│   (unused space)        │
├─────────────────────────┤
│   Heap (grows up)       │ ← Heap pointer
│   - Dynamic allocation  │   Flexible, manual management
│   - new/malloc          │   Size: GB+ (OS dependent)
│   - std::vector data    │
│   - Large objects       │
├─────────────────────────┤
│   Static/Global data    │
│   - Global variables    │   Fixed addresses
│   - Static variables    │   Lifetime: entire program
│   - Constants           │
├─────────────────────────┤
│   Code/Text segment     │
│   - Program instructions│   Read-only
│   - Compiled functions  │   Executable code
└─────────────────────────┘
Low memory addresses

How the stack works during function calls:

void bar(double x) {
    double c = x + 1.0;  // local variable on stack
}

void foo(double a) {
    double b = 10.0;     // local variable on stack
    bar(a + b);          // call bar
}

int main() {
    foo(5.0);            // call foo
    return 0;
}

Stack evolution:

1. main() called:
┌──────────────┐
│ return addr  │
└──────────────┘

2. foo(5.0) called:
┌──────────────┐
│ a = 5.0      │
│ b = 10.0     │
│ return addr  │
├──────────────┤
│ return addr  │ ← from main
└──────────────┘

3. bar(15.0) called:
┌──────────────┐
│ x = 15.0     │
│ c = 16.0     │
│ return addr  │
├──────────────┤
│ a = 5.0      │
│ b = 10.0     │
│ return addr  │
├──────────────┤
│ return addr  │ ← from main
└──────────────┘

4. bar() returns, its frame is removed:
┌──────────────┐
│ a = 5.0      │
│ b = 10.0     │
│ return addr  │
├──────────────┤
│ return addr  │ ← from main
└──────────────┘

5. foo() returns, its frame is removed:
┌──────────────┐
│ return addr  │ ← from main
└──────────────┘

Why use the stack for function calls?

  • Speed: Stack allocation is extremely fast—just move a pointer (no searching for free memory)

  • Automatic cleanup: When a function returns, its stack frame is automatically freed

  • Predictable lifetime: Local variables have clear scope and lifetime

  • CPU cache-friendly: Stack memory is accessed sequentially, which is optimal for CPU caches

  • No fragmentation: Memory is always allocated and freed in LIFO order

  • Simple and efficient: The hardware supports stack operations directly (push/pop instructions)

The heap

What is heap memory and how does it work?

Heap memory is a region for dynamic memory allocation where the size and lifetime of data can be determined at runtime.

Key characteristics:

  • Managed by the operating system’s memory allocator

  • Used for runtime memory allocation (new/delete in C++, malloc/free in C)

  • Much larger than stack (can be GB+, limited by available RAM)

  • Memory persists until explicitly freed

  • Slower allocation than stack (requires searching for free blocks)

  • Risk of memory leaks if not properly managed

Heap memory visualization:

Heap grows upward (toward higher addresses)

Before allocation:
┌─────────────────────────────────┐
│     Free space (unallocated)    │ Size: GB+
└─────────────────────────────────┘

After allocations:
┌─────────────────────────────────┐
│  Free space                     │
├─────────────────────────────────┤
│  Block 3: double[1000000]       │ ← ptr3 (8 MB)
│  (vector data)                  │
├─────────────────────────────────┤
│  Free (after deletion)          │ ← Fragmentation
├─────────────────────────────────┤
│  Block 2: std::complex<double>  │ ← ptr2 (16 bytes)
├─────────────────────────────────┤
│  Block 1: double                │ ← ptr1 (8 bytes)
│  Value: 3.14159                 │
└─────────────────────────────────┘
Heap base

Example with actual memory operations:

// Allocate single value on heap
double* ptr1 = new double(3.14159);
// Allocates 8 bytes on heap
// ptr1 stores heap address (e.g., 0x6000012a4b00)

// Allocate complex number
std::complex<double>* ptr2 = new std::complex<double>(1.0, 2.0);
// Allocates 16 bytes on heap
// ptr2 stores heap address (e.g., 0x6000012a4b10)

// Allocate large array via vector (internal heap allocation)
std::vector<double> data(1000000);
// Allocates 8 MB on heap for array storage
// vector manages this pointer internally

delete ptr1;  // Free the 8 bytes
delete ptr2;  // Free the 16 bytes
// data automatically freed when vector goes out of scope

Comparison with stack allocation:

void example() {
    // Stack allocation (fast, automatic)
    double x = 3.14;              // 8 bytes on stack
    double arr[100];              // 800 bytes on stack

    // Heap allocation (flexible, manual)
    double* p = new double(2.71); // 8 bytes on heap
    double* big = new double[1000000];  // 8 MB on heap

    // Stack overflow risk
    // double huge[10000000];  // Crash - too big for stack

    delete p;
    delete[] big;
}  // Stack memory automatically freed here

When to use heap vs stack:

  • Use stack: Small, fixed-size local variables, function parameters

  • Use heap: Large data, dynamic size, data that outlives function scope, shared data

Real-world scientific computing example:

void process_diffraction_data(int num_patterns) {
    // Stack: small, fixed data
    int pattern_size = 512;
    double threshold = 0.001;

    // Heap: large, runtime-sized data
    std::vector<double> patterns(num_patterns * pattern_size * pattern_size);
    std::vector<std::complex<double>> fft_result(pattern_size * pattern_size);

    // Process data...
    // All heap memory automatically cleaned up when vectors destroyed
}

Why does the OS divide memory into stack and heap?

The OS divides memory to optimize different usage patterns:

Stack memory:

  • Fast, fixed size (typically 1-8 MB)

  • LIFO (Last-In, First-Out)

  • For local variables and function calls

  • Automatic memory management

  • No fragmentation

  • Best for: small, short-lived data

Heap memory:

  • Flexible, variable size (GB+)

  • For dynamic data structures

  • Manual memory management (or RAII with smart pointers)

  • Can fragment over time

  • Best for: large data, dynamic lifetime, shared ownership

Example showing the difference:

void bad_function() {
    double data[10000000];  // Stack overflow - 80 MB on limited stack
}

void good_function() {
    std::vector<double> data(10000000);  // Heap allocation - no problem
}

Memory fragmentation

What is memory fragmentation?

Memory fragmentation is when free memory becomes divided into small, scattered blocks, making it difficult to allocate large contiguous chunks of memory.

Why does memory fragmentation occur?

Fragmentation happens when:

  • Memory blocks are allocated in different sizes

  • Blocks are freed in random order

  • Gaps form between allocated blocks

  • These gaps are too small for new allocations

How can you avoid memory fragmentation?

Key strategies:

  • Use memory pools

  • Allocate larger blocks upfront

  • Minimize frequent small allocations

  • Use fixed-size allocations when possible

  • Monitor memory usage patterns
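
For std::vector, the simplest form of "allocate larger blocks upfront" is reserve(), which replaces the grow-and-reallocate pattern with a single allocation:

```cpp
#include <vector>
#include <cstddef>

// With reserve(): one upfront allocation; push_back never reallocates
// (and never copies the existing elements) during the loop.
std::vector<double> fill_values(std::size_t n) {
    std::vector<double> data;
    data.reserve(n);                     // one allocation of n doubles
    for (std::size_t i = 0; i < n; ++i) {
        data.push_back(i * 0.5);
    }
    return data;
}
```

Without the reserve() call the behavior is identical, but the vector may reallocate several times as it grows, each time leaving a hole of freed memory behind.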

Memory pools

What are memory pools?

Memory pools are pre-allocated blocks of memory that manage smaller allocations internally:

  • Allocate one large block upfront

  • Manage smaller allocations within that block

  • Reduce fragmentation

  • Improve allocation speed

  • Simplify memory management

Can you show an example of a memory pool in C++?

Here is a simple example of a memory pool in C++:

#include <iostream>
#include <cstddef>   // size_t
#include <new>       // std::bad_alloc

class MemoryPool {
public:
    MemoryPool(size_t size) : poolSize(size), pool(new char[size]), offset(0) {}

    ~MemoryPool() {
        delete[] pool;
    }

    void* allocate(size_t size) {
        if (offset + size > poolSize) {
            throw std::bad_alloc();
        }
        void* ptr = pool + offset;
        offset += size;
        return ptr;
    }

    void deallocate(void* ptr, size_t size) {
        // In a simple memory pool, deallocation is often a no-op
    }

private:
    size_t poolSize;
    char* pool;
    size_t offset;
};

int main() {
    MemoryPool memPool(1024); // Create a memory pool of 1024 bytes

    int* intPtr = static_cast<int*>(memPool.allocate(sizeof(int)));
    *intPtr = 42;
    std::cout << "Allocated integer: " << *intPtr << std::endl;

    memPool.deallocate(intPtr, sizeof(int)); // Deallocation is a no-op in this simple example

    return 0;
}

Smart pointers

Manual memory management is error-prone. C++ provides automatic memory management through smart pointers (defined in <memory>):

  • std::unique_ptr<T>: Exclusive ownership, auto-deletion when out of scope

  • std::shared_ptr<T>: Shared ownership with reference counting

  • std::weak_ptr<T>: Non-owning reference to prevent circular references
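
A minimal sketch of the first two in use; note that no delete appears anywhere, because cleanup is automatic:

```cpp
#include <memory>
#include <vector>

int smart_pointer_demo() {
    // Exclusive ownership: freed automatically when grid goes out of scope.
    auto grid = std::make_unique<std::vector<double>>(1000, 0.0);
    (*grid)[0] = 3.14;

    // Shared ownership: freed when the last shared_ptr is destroyed.
    auto field = std::make_shared<std::vector<double>>(1000, 1.0);
    auto alias = field;                       // reference count is now 2
    return static_cast<int>(field.use_count());
}   // both vectors freed here, no delete needed
```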

Practice examples

Pointer declarations

Basic pointer with actual memory addresses:

double* p;              // pointer to double (uninitialized, contains garbage)
double x = 3.14159;
p = &x;                 // p now holds address of x (e.g., 0x7ffeeb5c4a18)

std::cout << "Value: " << x << std::endl;      // 3.14159
std::cout << "Address: " << &x << std::endl;   // 0x7ffeeb5c4a18
std::cout << "Pointer: " << p << std::endl;    // 0x7ffeeb5c4a18
std::cout << "Dereferenced: " << *p << std::endl;  // 3.14159

Multiple pointers on one line:

double *p1, *p2;  // both are pointers to double
double* p1, p2;   // p1 is a pointer to double, p2 is a double (confusing!)

Constant pointers:

double x = 3.14, y = 2.71;
const double* p1 = &x;         // pointer to const double (can't modify *p1)
double* const p2 = &x;         // const pointer (can't change address)
const double* const p3 = &x;   // both const

Generic pointer:

void* vp = &x; // can point to any type, but you must cast to use it

Pointer to pointer:

double** pp = &p; // points to a pointer (e.g., stores 0x7ffeeb5c4a20)
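
A quick sketch of what a pointer-to-pointer lets you do: reassign the inner pointer through the outer one (the function name is illustrative):

```cpp
// *pp rewrites which object p targets.
double pick_second(double a, double b) {
    double* p = &a;
    double** pp = &p;   // pp holds the address of the pointer p
    *pp = &b;           // p now points to b instead of a
    return *p;          // dereferences p, so this reads b
}
```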

Using references

How do you use references in C++?

double x = 3.14;
double& ref = x;   // ref refers to x
ref = 2.71;        // modifies x directly
std::cout << x;    // prints 2.71

What problem does reference solve?

In C, you must use pointers to modify variables in functions:

void increment(double* p) {
    (*p) += 1.0;
}

int main(void) {
    double x = 5.5;
    increment(&x);      // must pass address
    printf("%.1f", x);  // prints 6.5
}

This can lead to issues like null pointers or pointer arithmetic errors.

In C++, references provide a safer and more intuitive way:

void increment(double& r) {
    r += 1.0;
}

int main() {
    double x = 5.5;
    increment(x);  // pass by reference
    std::cout << x; // prints 6.5
}

References eliminate pointer-related risks while maintaining efficiency.

What is the concept of “null” in C++?

In C++, “null” refers to a pointer that does not point to any valid object or memory location. It is represented by the constant nullptr (introduced in C++11) or the macro NULL (from C). A null pointer is used to indicate that the pointer is not currently assigned to any object.

Why do you want to have a pointer that is “null”?

Having a null pointer is useful for several reasons:

  1. Indicating Absence: A null pointer can signify that a pointer is not currently pointing to any valid object, which is useful for optional data or uninitialized pointers.

  2. Error Handling: Functions can return a null pointer to indicate failure or that no valid object could be returned.

  3. Memory Management: It helps in managing dynamic memory by allowing pointers to be set to null after deallocation, preventing dangling pointers.
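
Point 2 in practice: a function that returns nullptr when there is nothing valid to point at (the function name is illustrative):

```cpp
#include <cstddef>

// Returns a pointer to the first negative entry, or nullptr if none exists.
const double* find_negative(const double* data, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (data[i] < 0.0) {
            return &data[i];
        }
    }
    return nullptr;     // caller must check before dereferencing
}
```

The caller checks the result against nullptr before dereferencing; dereferencing a null pointer is undefined behavior.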

What’s a good analogy to understand pointers vs references?

If a pointer is like holding someone’s address written on paper, a reference is like being given a permanent portal directly into their house. You don’t need to look up the address — you just walk in.