C++
Nov 7, 2025 - This is a learning journal for C++ questions focused on scientific computing. I used C++ and C as an undergrad but never built scientific applications. Now, with GPU programming and high performance computing, I want to revisit C++ for scientific computing.
Purpose of this documentation: This page is designed as a practical reference to help you read and write C++ code faster for scientific computing. It focuses on common patterns, syntax, and concepts you’ll encounter in real codebases, with emphasis on floating-point arithmetic and numerical computing.
The following is sourced from various online references, Copilot, and my own experiments. I do not claim originality for any of the content here.
Quick reference for & and * symbols
| Context | Code Example | Meaning | Type of Thing | Behavior / Effect |
|---|---|---|---|---|
| In a type declaration | `int& ref = x;` | Declares a reference | Reference | `ref` becomes another name for `x`. |
| In an expression | `int* p = &x;` | Address-of operator | Pointer | Yields the memory address of `x`. |
| Using pointer dereference | `*p = 5;` | Dereference operator | Pointer | Reads or writes the value at the address `p` holds. |
| Combining both | `int& r = *p;` | Reference bound to a dereferenced pointer | Reference + Pointer | `r` aliases the object `p` points to. |
| In a function parameter | `void f(int& r)` | Passed by reference | Reference | Function uses the original variable, not a copy. |
| In a copy constructor | `T(const T& other)` | Copy parameter safely and efficiently | Reference | Avoids recursion (no extra copies). |
Pointer in practice
To get the device number in CUDA, you run this code:
int device;
cudaGetDevice(&device);
The & means “address of”. We want to pass the address of device so that cudaGetDevice can write the result into that memory location. Now, if you look at the function signature for cudaGetDevice (simplified):
cudaError_t cudaGetDevice(int* device);
Conceptually, the implementation looks something like this:
cudaError_t cudaGetDevice(int* device) {
    // Ask the driver (lower-level API) which GPU is bound to this thread
    int currentDevice = driver_get_current_device(); // e.g. returns 1
    // Copy that value into the caller's memory
    *device = currentDevice;
    return cudaSuccess;
}
The parameter type int* device is a pointer that holds the address of an integer supplied by the caller. Inside the function, *device dereferences the pointer and writes the value into that memory location.
Why const is needed for member functions
The const keyword after a member function declaration indicates that the function will not modify the object’s state. This is important for const-correctness and enables the function to be called on const objects.
class Rectangle {
int width, height;
public:
void setWidth(int w) { width = w; } // modifies state (non-const)
int area() const { return width * height; } // doesn't modify state (const)
};
Key points
When you write int area() const, the const applies to the hidden this pointer inside the method.
C++ secretly treats member functions like this:
int area(Rectangle* const this) { return this->width * this->height; }
But if you mark the function as const, it becomes:
int area(const Rectangle* const this) { return this->width * this->height; }
Now this points to a const Rectangle, meaning the compiler won’t let you modify any member variables inside that function. This ensures:
The function can be called on const objects
The function is guaranteed not to modify the object’s state
Better code safety and clearer intent
In contrast with Python
Python makes self explicit:
class Rectangle:
def area(self):
return self.width * self.height
C++ makes this implicit - it’s there, you just don’t have to say it. When you write width in a C++ member function, the compiler automatically treats it as this->width.
Numerical data types for scientific computing
| Type | Size | Precision | Scientific use |
|---|---|---|---|
| `float` | 4 bytes (32-bit) | ~7 decimal digits | GPU computing (faster), machine learning, graphics |
| `double` | 8 bytes (64-bit) | ~15-16 decimal digits | Default for scientific computing, general numerical work |
| `long double` | 16 bytes (128-bit) | ~18-19 decimal digits | High-precision calculations, rarely needed |
| `std::complex<float>` | 8 bytes | Same as `float` | Quantum mechanics, signal processing (GPU) |
| `std::complex<double>` | 16 bytes | Same as `double` | Quantum mechanics, signal processing (default) |
| `int` | 4 bytes | Exact integers | Loop counters, array indices only |
Key points for scientific computing:
Always use ``double`` by default unless you have a specific reason (GPU memory, performance)
Use ``float`` for GPU computing where memory bandwidth matters
Use ``std::complex<double>`` for complex numbers in quantum mechanics, FFTs, signal processing
Precision matters:
``float`` accumulates error faster in iterative algorithms
Understanding floating-point precision
Floating-point numbers have limited precision:
float x = 1.0f / 3.0f; // 0.333333... (7 digits)
double y = 1.0 / 3.0; // 0.333333333333333... (15 digits)
// Accumulated error example
float sum_f = 0.0f;
double sum_d = 0.0;
for (int i = 0; i < 1000000; i++) {
sum_f += 0.1f; // Accumulates rounding error
sum_d += 0.1; // Better, but still not perfect
}
std::cout << sum_f << std::endl; // ~99994.5 (wrong!)
std::cout << sum_d << std::endl; // ~100000.0 (closer)
Rule of thumb: For scientific computing with many iterations or accumulations, prefer double. Use float only when profiling shows performance benefits.
Complex numbers in C++
For quantum mechanics, signal processing, and Fourier transforms:
#include <complex>
#include <iostream>
// Declaration and initialization
std::complex<double> z1(3.0, 4.0); // 3 + 4i
std::complex<double> z2 = {1.0, -2.0}; // 1 - 2i
std::complex<float> z3(2.0f, 1.0f); // Use float for GPU
// Basic operations
auto z_sum = z1 + z2; // (4, 2)
auto z_prod = z1 * z2; // (11, -2)
auto z_conj = std::conj(z1); // 3 - 4i (complex conjugate)
// Access real and imaginary parts
double real_part = z1.real(); // 3.0
double imag_part = z1.imag(); // 4.0
// Magnitude and phase
double magnitude = std::abs(z1); // 5.0 = sqrt(3²+4²)
double phase = std::arg(z1); // ~0.927 radians
// Exponential form (Euler's formula: e^(iθ) = cos(θ) + i*sin(θ))
std::complex<double> z_polar = std::polar(5.0, 0.927); // 5*e^(i*0.927)
Common in scientific computing:
// Wave functions in quantum mechanics
std::complex<double> psi(0.707, 0.707); // normalized state
// Fourier transform (simplified)
std::vector<std::complex<double>> fft_result(1024);
// Complex exponentials for signal processing
double omega = 2.0 * M_PI * 440.0; // 440 Hz (M_PI from <cmath>)
double t = 0.001;                  // time in seconds (example value)
std::complex<double> signal = std::exp(std::complex<double>(0.0, omega * t));
Array initialization
Understanding various initialization methods is essential because different codebases prefer different styles. Some projects use classic C-style initialization, while others adopt modern C++11 brace initialization. Being familiar with all approaches helps you read, understand, and contribute to diverse codebases more effectively.
1. Classic C-style
double arr[3] = {1.0, 2.5, 3.14};
This is the most common way. Specify the size and the list.
If you make the list shorter than the size, the remaining elements are initialized to zero:
double arr[5] = {1.0, 2.5}; // becomes {1.0, 2.5, 0.0, 0.0, 0.0}
If you omit the size, it is deduced from the initializer list:
double arr[] = {1.0, 2.5, 3.14}; // size is 3
2. Default (zero) initialization
double arr[3] = {}; // all elements are 0.0
double arr2[3] = {0}; // also all elements are 0.0
3. C++11 uniform (brace) initialization
double arr[3]{1.0, 2.5, 3.14}; // same as = {1.0, 2.5, 3.14}, but modern style
double arr2[]{1.0, 2.5, 3.14}; // size deduced automatically
double arr3[5]{}; // all zeros
4. Using memset (low-level, C-style)
double arr[3];
std::memset(arr, 0, sizeof(arr)); // set all bytes to 0
Note: This does not support non-zero initialization directly. Use with caution for floating-point - it’s safer to use loop initialization or brace initialization.
5. Runtime initialization (loops or algorithms)
double arr[3];
for (int i = 0; i < 3; ++i)
arr[i] = static_cast<double>(i) + 1.5; // {1.5, 2.5, 3.5}
Uninitialized arrays
What happens with double arr[3]; without initialization within a function?
void foo() {
double arr[3];
std::cout << arr[0]; // ❌ undefined behavior (garbage)
}
arr is a local automatic variable. It lives on stack memory, and the contents are uninitialized.
What is stack memory? Stack memory is a region of memory used for local variables and function calls. It operates in a Last-In-First-Out (LIFO) manner. When a function is called, space is allocated on the stack for its local variables. When the function returns, that space is freed.
Why is it undefined behavior? When you declare double arr[3]; without initialization, the compiler allocates space on the stack but does not clear or set those memory locations. Whatever data was previously stored in those memory locations remains there. This is called “garbage” data—it could be zeros, could be leftover values from previous function calls, or any random bit pattern. Reading this uninitialized memory leads to unpredictable program behavior, which is why it’s called undefined behavior.
What about double arr[3]; for global or static arrays?
double arr[3]; // global → all elements = 0.0
void foo() {
static double arr2[3]; // static local → also all 0.0
}
Global and static arrays are automatically zero-initialized.
What does “static” mean here? The static keyword in this context means the variable has static storage duration. Unlike normal local variables that are created on the stack each time a function is called and destroyed when the function returns, a static local variable:
Is created only once, the first time the function is called
Persists for the entire lifetime of the program
Retains its value between function calls
Is stored in a separate memory region (not on the stack), similar to global variables
Is automatically zero-initialized by the compiler (just like global variables)
Example:
void counter() {
static int count = 0; // initialized only once
count++;
std::cout << count << std::endl;
}
int main() {
counter(); // prints 1
counter(); // prints 2
counter(); // prints 3
}
The count variable persists between function calls, unlike a normal local variable which would reset to 0 each time.
Array initialization summary
| Form | Example | Notes |
|---|---|---|
| Explicit C-style | `double a[3] = {1.0, 2.5, 3.14};` | Classic |
| Implicit size | `double a[] = {1.0, 2.5, 3.14};` | Compiler deduces size |
| Partial init | `double a[5] = {1.0, 2.5};` | Remaining elements = 0.0 |
| Zero init | `double a[3] = {};` | All zeros |
| Brace init | `double a[3]{1.0, 2.5, 3.14};` | C++11 style |
| Heap array | `double* a = new double[3];` | Manual memory |
| `std::array` | `std::array<double, 3> a{};` | Safer fixed-size |
| `std::vector` | `std::vector<double> v(3);` | Resizable container |
Why std::array and std::vector exist
C++ inherited raw C arrays (double arr[5]), but they have fundamental safety issues. The Standard Library introduced std::array and std::vector to solve these problems:
The problem with C arrays
double arr[5] = {1.0, 2.5, 3.14, 4.2, 5.7};
// No size information - decays to pointer
void process(double arr[]) { // really just double*
int size = sizeof(arr); // Wrong! Gives pointer size, not array size
}
// No bounds checking
arr[10] = 42.0; // Compiles but writes to random memory (undefined behavior)
// Can't be copied or returned easily
double arr2[5] = arr; // Compilation error
// Can't be resized
// No way to dynamically change size
std::array: Safe fixed-size arrays
Created to replace C arrays with zero runtime overhead while adding safety:
std::array<double, 5> arr = {1.0, 2.5, 3.14, 4.2, 5.7};
// Knows its size
void process(std::array<double, 5>& arr) {
int size = arr.size(); // Returns 5
}
// Bounds checking with .at()
arr.at(10) = 42.0; // Throws std::out_of_range
arr[10] = 42.0; // Undefined behavior (same as C array, but you have .at())
// Can be copied and returned
std::array<double, 5> arr2 = arr; // Works
// Works with STL algorithms
std::sort(arr.begin(), arr.end());
Key limitation: Size must be known at compile time. Same memory layout as C arrays (lives on stack).
std::vector: Dynamic arrays with memory management
Created to provide dynamic sizing with automatic memory management:
std::vector<double> vec = {1.0, 2.5, 3.14, 4.2, 5.7};
// Size determined at runtime
int n;
std::cin >> n;
std::vector<double> data(n); // Size from user input
// Can grow and shrink
vec.push_back(6.8); // Add element
vec.pop_back(); // Remove element
vec.resize(100); // Change size
// Automatic memory management (RAII)
{
std::vector<double> temp(1000000); // Allocates memory from heap
// use temp...
} // Destructor automatically frees memory
// Prevents stack overflow for large data
std::vector<double> big_data(10000000); // Works (heap allocation)
Key tradeoff: Small overhead for dynamic capability (stores size, capacity, and pointer to heap memory).
Summary
C arrays: Fast but unsafe, no size info, can’t be copied, fixed size at compile time
std::array: Safe wrapper for C arrays, same performance, knows its size, but still fixed at compile time
std::vector: Dynamic sizing, automatic memory management, slight overhead, lives on heap
Modern C++ code should prefer std::array for fixed sizes and std::vector for dynamic data. Use raw C arrays only when interfacing with C APIs or for specific performance reasons (measure first!).
Practical tips for working with large datasets
When working with large arrays or datasets in scientific computing:
Use ``std::vector`` for dynamic data
std::vector<double> data(1000000); // 1 million doubles, all zeros
std::vector<double> data2(1000000, 3.14); // all initialized to π
std::vector<std::complex<double>> wavefunction(1024); // complex array
What does “automatically manage memory” mean?
Vectors handle allocation and deallocation behind the scenes. When you create a vector, it allocates memory from the heap. When the vector goes out of scope, it automatically frees that memory. You don’t need to call new/delete or malloc/free manually.
{
std::vector<double> data(1000); // memory allocated
// use data...
} // vector destroyed, memory automatically freed
This prevents memory leaks compared to manual management:
double* data = new double[1000]; // manual allocation
// use data...
delete[] data; // must remember to free! (easy to forget)
What about GPU computing and fine control?
For GPU computing (CUDA, OpenCL, etc.), you’re right that you need fine control over memory transfers between CPU and GPU. In those cases:
CPU-side preparation: ``std::vector`` is commonly used to prepare data on the CPU side before transferring to the GPU
GPU memory: you still use explicit GPU memory APIs (``cudaMalloc``, ``cudaMemcpy``, etc.) for GPU-side buffers
Why still use vectors?: even in GPU programming, vectors are preferred on the CPU side for their safety and convenience; you get the raw data pointer with ``data()`` when you need to copy to the GPU
// Common pattern in CUDA programming
std::vector<float> host_data(1000000, 1.0f); // CPU side, safe and easy
float* device_data;
cudaMalloc(&device_data, host_data.size() * sizeof(float)); // GPU side
cudaMemcpy(device_data, host_data.data(), /* ... */); // transfer
// ... GPU computation ...
cudaFree(device_data); // manual GPU cleanup
// host_data automatically cleaned up when out of scope
For high-performance computing, the pattern is: use modern C++ containers (vectors) for CPU data management, and explicit APIs for GPU/specialized memory.
Use ``std::array`` for fixed-size data
std::array<double, 1000> buffer{}; // all zeros, safer than C arrays
std::array<float, 3> position{0.0f, 0.0f, 0.0f}; // 3D coordinates
Provides bounds checking via ``.at()`` and works with STL algorithms.
What are STL algorithms?
STL (Standard Template Library) algorithms are pre-built functions in <algorithm> for common operations on containers. Examples:
#include <algorithm>
#include <array>
std::array<double, 5> arr = {3.1, 1.4, 4.2, 1.0, 5.9};
// Sort the array
std::sort(arr.begin(), arr.end());
// Find an element
auto it = std::find(arr.begin(), arr.end(), 4.2);
// Transform elements (square each value)
std::transform(arr.begin(), arr.end(), arr.begin(),
               [](double x) { return x * x; });
// Count occurrences
int count = std::count(arr.begin(), arr.end(), 1.0);
These work seamlessly with std::array and std::vector, but not with raw C arrays.
Analogy to NumPy
std::array and std::vector are analogous to NumPy’s ndarray in Python:
# Python with NumPy
import numpy as np
arr = np.array([1.0, 2.5, 3.14, 4.2, 5.9])
arr.sort() # method on array
arr.mean() # built-in statistical function
arr * 2 # vectorized operations
// C++ with std::array
std::array<double, 5> arr = {1.0, 2.5, 3.14, 4.2, 5.9};
std::sort(arr.begin(), arr.end()); // algorithm
// No built-in mean, but can use std::accumulate (from <numeric>)
double mean = std::accumulate(arr.begin(), arr.end(), 0.0) / arr.size();
std::transform(arr.begin(), arr.end(), arr.begin(),
[](double x) { return x * 2.0; }); // element-wise operation
Key difference: NumPy has more built-in numerical methods (mean, std, etc.), while C++ requires STL algorithms or external libraries (like Eigen or Armadillo) for numerical computing. However, both provide contiguous memory layout for performance.
Reserve capacity for vectors when size is known
std::vector<double> data;
data.reserve(1000000); // pre-allocate memory to avoid reallocations
for (int i = 0; i < 1000000; ++i) {
data.push_back(static_cast<double>(i) * 0.001); // convert to double
}
This avoids multiple memory reallocations during growth.
Avoid uninitialized arrays for large data
// Bad: uninitialized, potentially dangerous
double data[1000000];
// Good: explicitly zero-initialized
double data[1000000] = {};
// Better: use vector
std::vector<double> data(1000000);
For large datasets, uninitialized data can lead to hard-to-debug issues.
Why is vector better than array here?
Automatic memory management
Arrays like ``double data[1000000]`` allocate on the stack (limited size, typically a few MB)
Large arrays can cause stack overflow
Vectors allocate on the heap automatically, no size limit concerns
Safety and flexibility
If you need to change the size later, vectors can resize; arrays cannot
Vectors know their own size (``data.size()``); arrays do not
Vectors automatically clean up when going out of scope
Example of the problem with large stack arrays:
void foo() {
double data[10000000]; // Stack overflow (80 MB on stack)
}
void bar() {
std::vector<double> data(10000000); // Works fine (heap allocation)
}
When should I use std::array vs std::vector?
Use ``std::array`` when:
Size is fixed and known at compile time
Small, performance-critical code (stack allocation is faster)
You want array bounds checking with ``.at()``
std::array<int, 100> buffer; // fixed size, stack allocated
Use ``std::vector`` when:
Size is determined at runtime or might change
Size is large (> few KB)
You need to resize, add, or remove elements
std::vector<double> data(n); // n determined at runtime
data.push_back(42.5); // can grow dynamically
Use raw C arrays when:
Working with legacy code or C APIs
Interfacing with GPU memory APIs
Extreme performance optimization (but measure first!)
double buffer[256]; // C-style, minimal overhead
Use memory-efficient initialization for specific patterns
// Fill with specific value
std::vector<double> data(1000000, 3.14);
// Or use std::fill
std::vector<double> data(1000000);
std::fill(data.begin(), data.end(), 3.14);
// Generate values with a pattern
std::vector<double> data(1000);
std::iota(data.begin(), data.end(), 0.0); // 0.0, 1.0, 2.0, ..., 999.0 (from <numeric>)
Only zero-fill when you need zero
Zeroing a large buffer is O(n). If you will overwrite everything anyway, skip the zeroing and write the data once:
// Wasteful: zeros then overwrites
std::vector<double> data(1000000); // zeros all elements
for (int i = 0; i < 1000000; ++i) {
data[i] = compute_value(i); // overwrites the zeros
}
// Better: allocate but don't initialize, then fill once
std::vector<double> data;
data.reserve(1000000); // allocate space without initializing
for (int i = 0; i < 1000000; ++i) {
data.push_back(compute_value(i)); // write data once
}
Prefer standard containers over raw arrays
Standard containers like std::vector and std::array offer significant advantages over raw C arrays:
Fixed size known at compile time:
std::array<double, 1000> data{}; // safer than double data[1000]
Large or variable size:
std::vector<double> data(1000000); // contiguous, cache-friendly
Why prefer standard containers?
Automatic memory management: ``std::vector`` handles allocation and deallocation, preventing memory leaks
Bounds checking: ``std::array::at()`` and ``std::vector::at()`` throw ``std::out_of_range`` on invalid indices
Size tracking: containers know their own size (``data.size()``), unlike raw arrays
STL algorithm compatibility: works seamlessly with ``std::sort``, ``std::find``, ``std::transform``, etc.
Exception safety: proper cleanup on exceptions, unlike raw arrays with manual memory management
Resizable: ``std::vector`` can grow dynamically without manual reallocation
No pointer arithmetic errors: iterators are safer than pointer manipulation
Similar to NumPy arrays: just as NumPy arrays are preferred over raw Python lists for numerical work, ``std::vector`` is preferred for similar reasons—contiguous memory layout enables cache efficiency and SIMD optimizations
Pointers
What is a pointer?
A pointer in C++ is a variable that stores the memory address of another object or function. It provides direct memory access and flexibility, but requires careful management.
While powerful, pointers are more primitive and potentially dangerous compared to references. They can be null, uninitialized, or point to arbitrary memory locations, which makes them both powerful and risky to use.
Pointer basics
A pointer is itself an ordinary variable stored in memory; its value is an address
Size: 4 bytes on 32-bit systems, 8 bytes on 64-bit systems (enough to address 2^32 or 2^64 memory locations)
Example memory address: 0x7ffee3bff6c8 (hexadecimal format)
Null pointers
In C++, null refers to a pointer that does not point to any valid object. It is represented by nullptr (C++11) or NULL (from C).
Why use null pointers?
Indicating absence: signify that a pointer is not currently pointing to any valid object
Why use pointers?
Dynamic memory: Allocate memory at runtime (essential for large datasets)
Efficiency: Pass large objects without copying
Data structures: Build linked lists, trees, graphs
Hardware access: Interface with GPU memory, C APIs, hardware registers
Polymorphism: Access derived classes through base class pointers
Error handling: Functions can return null to indicate failure
Memory management: Set pointers to null after deallocation to prevent dangling pointers
Declaring pointers with concrete examples
Pointers have types and can be null. Each pointer declaration specifies what type of data it points to. Here’s what actually happens in memory:
#include <iostream>
double x = 3.14159;
double* p1 = &x; // p1 stores address of x
std::cout << "Value of x: " << x << std::endl; // 3.14159
std::cout << "Address of x: " << &x << std::endl; // 0x7ffeeb5c4a18 (example)
std::cout << "Value of p1: " << p1 << std::endl; // 0x7ffeeb5c4a18 (same address)
std::cout << "Dereferenced p1: " << *p1 << std::endl; // 3.14159 (accesses value at address)
Memory visualization:
Memory Address Variable Value
────────────── ──────── ─────────────────
0x7ffeeb5c4a18 x 3.14159 (double, 8 bytes)
0x7ffeeb5c4a20 p1 0x7ffeeb5c4a18 (pointer, 8 bytes)
└─ points to x
Common pointer types for scientific computing:
double* ptr1; // pointer to double (most common in scientific code)
float* ptr2; // pointer to float (GPU computing)
std::complex<double>* ptr3; // pointer to complex number
const double* ptr4; // pointer to constant double (can't modify *ptr4)
double* const ptr5 = &x; // constant pointer (can't change ptr5 itself)
Important: Uninitialized pointers contain garbage values (random addresses). Always initialize:
double x = 2.718;
double* ptr = nullptr; // Good: explicitly null
double* ptr2 = &x; // Good: points to valid object (address: e.g., 0x7ffeeb5c4b00)
double* ptr3; // Bad: contains garbage address (e.g., 0x000deadbeef)
*ptr3 = 5.0; // Crash - writing to random memory
The type of a pointer determines:
- What type of data it can point to
- How pointer arithmetic works (e.g., ptr + 1 moves by sizeof(type) bytes)
- Whether it can be null (all pointer types can be nullptr)
Dereferencing pointers
Use the asterisk (*) operator to access the value a pointer points to:
double x = 42.5;
double* ptr = &x; // ptr stores address of x (e.g., 0x7ffeeb5c4c10)
double value = *ptr; // dereference: value = 42.5
*ptr = 99.9; // modify through pointer
std::cout << x; // x is now 99.9
Practical example: passing arrays to functions
double compute_mean(const double* data, int size) {
    double sum = 0.0;
    for (int i = 0; i < size; i++) {
        sum += data[i]; // same as *(data + i)
    }
    return sum / size;
}
std::vector<double> measurements = {1.2, 3.4, 5.6, 7.8};
double mean = compute_mean(measurements.data(), measurements.size());
Dangling pointers
A dangling pointer does not point to a valid object. This typically occurs when the object was deleted or deallocated, but the pointer was not updated. Accessing a dangling pointer leads to undefined behavior, crashes, or data corruption.
double* ptr = new double(3.14);
delete ptr; // memory freed
// ptr still holds old address (e.g., 0x600001234567)
*ptr = 2.71; // Crash - memory no longer valid
ptr = nullptr; // Good practice: mark as invalid
References
What is a reference?
A reference in C++ is an alias for another variable, created using the ampersand (&) symbol. Once a reference is initialized, it cannot be changed to refer to another variable.
double x = 3.14;
double& ref = x; // ref is an alias for x
ref = 2.71; // modifies x directly
std::cout << x; // prints 2.71
Why use references instead of pointers?
References provide a safer alternative to pointers:
Cannot be null or uninitialized (must always refer to a valid object)
More intuitive syntax (no need for dereferencing)
Prevent common pointer-related errors
Avoid copying large objects when passing to functions
// Pass by reference: no copy, direct access
void increment(double& value) {
value += 1.0; // modifies the original
}
double x = 5.5;
increment(x); // x is now 6.5
Analogy: pointers vs references
If a pointer is like holding someone’s address written on paper, a reference is like being given a permanent portal directly into their house—you don’t need to look up the address, you just walk in.
Pointers vs References comparison
| Feature | Pointer | Reference |
|---|---|---|
| Can be null? | Yes | No (must refer to something valid) |
| Can be reseated? | Yes (`ptr = &other;`) | No (always refers to the same object) |
| Syntax for access | `*ptr` to dereference, `->` for members | Just use the name |
| Used for | Dynamic memory, low-level control | Function parameters, operator overloads, clean aliasing |
How do you use references in C++?
int x = 10;
int& ref = x; // ref refers to x
ref = 20; // modifies x directly
std::cout << x; // prints 20
What problem does reference solve?
In C, you must use pointers to modify variables in functions:
void increment(int* p) {
(*p)++;
}
int main() {
int x = 5;
increment(&x); // must pass address
std::cout << x; // prints 6
}
This can lead to issues like null pointers or pointer arithmetic errors.
In C++, references provide a safer and more intuitive way:
void increment(int& r) {
r++;
}
int main() {
int x = 5;
increment(x); // pass by reference
std::cout << x; // prints 6
}
References eliminate pointer-related risks while maintaining efficiency.
CUDA C++ extensions for PyTorch
When writing custom CUDA kernels for PyTorch (e.g., for ptychography reconstruction, FFTs, or custom neural network layers), you divide code into host (CPU) and device (GPU) operations.
Host code responsibilities
1. Input validation and tensor interfacing
#include <torch/extension.h>
torch::Tensor ptychography_forward(
torch::Tensor probe, // complex probe function
torch::Tensor object, // object transmission
torch::Tensor positions // scan positions
) {
// Validate inputs
TORCH_CHECK(probe.is_cuda(), "probe must be CUDA tensor");
TORCH_CHECK(probe.dtype() == torch::kComplexFloat,
"probe must be complex64");
// Get tensor dimensions
int64_t batch_size = probe.size(0);
int64_t probe_size = probe.size(1);
2. Memory allocation and data preparation
// Allocate output tensor on GPU
auto options = torch::TensorOptions()
.dtype(torch::kComplexFloat)
.device(probe.device());
auto output = torch::zeros({batch_size, probe_size, probe_size}, options);
// Get raw pointers for CUDA kernel
auto probe_ptr = probe.data_ptr<c10::complex<float>>();
auto object_ptr = object.data_ptr<c10::complex<float>>();
auto output_ptr = output.data_ptr<c10::complex<float>>();
3. CUDA kernel launch configuration
// Calculate grid and block dimensions
const int threads = 256;
const int blocks = (batch_size * probe_size * probe_size + threads - 1) / threads;
// Launch kernel
ptychography_kernel<<<blocks, threads>>>(
probe_ptr,
object_ptr,
output_ptr,
batch_size,
probe_size
);
// Check for kernel errors
cudaError_t error = cudaGetLastError();
TORCH_CHECK(error == cudaSuccess,
"CUDA kernel failed: ", cudaGetErrorString(error));
4. Synchronization and output handling
// Wait for kernel completion
cudaDeviceSynchronize();
return output; // Return PyTorch tensor
}
// Bind to Python
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("forward", &ptychography_forward, "Ptychography forward pass");
}
Device code (CUDA kernel)
__global__ void ptychography_kernel(
const c10::complex<float>* probe,
const c10::complex<float>* object,
c10::complex<float>* output,
int batch_size,
int probe_size
) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int total = batch_size * probe_size * probe_size;
if (idx < total) {
// Complex multiplication for ptychography
output[idx] = probe[idx] * object[idx];
}
}
Why this division?
Host: Handles PyTorch tensor interface, memory management, validation, kernel orchestration
Device: Pure computation (complex arithmetic, FFTs, ptychography operations)
PyTorch integration: Host code bridges Python/C++/CUDA seamlessly
Common host operations for scientific computing
Input/output tensor validation (types, shapes, device placement)
Memory allocation (output tensors, workspace buffers)
Data layout conversions (contiguous memory, proper strides)
Kernel launch parameters (grid/block sizing)
Error checking and synchronization
Gradient computation setup (for backward pass)
Multi-GPU coordination (if using distributed computing)
Typical workflow pattern
// 1. Validate inputs (host)
TORCH_CHECK(input.is_cuda(), "input must be on CUDA");
TORCH_CHECK(input.is_contiguous(), "input must be contiguous");
// 2. Allocate outputs (host)
auto output = torch::empty_like(input);
// 3. Get raw pointers (host)
auto input_ptr = input.data_ptr<float>();
auto output_ptr = output.data_ptr<float>();
// 4. Launch kernel (host calls device)
const int threads = 256;
const int blocks = (input.numel() + threads - 1) / threads;
my_kernel<<<blocks, threads>>>(input_ptr, output_ptr, input.numel());
// 5. Synchronize (host)
cudaDeviceSynchronize();
// 6. Return PyTorch tensor (host)
return output;
Building and using the extension
setup.py for compilation:
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension
setup(
name='ptychography_cuda',
ext_modules=[
CUDAExtension(
'ptychography_cuda',
['ptychography.cpp', 'ptychography_kernel.cu'],
extra_compile_args={
'cxx': ['-O3'],
'nvcc': ['-O3', '--use_fast_math']
}
)
],
cmdclass={'build_ext': BuildExtension}
)
Using in Python:
import torch
import ptychography_cuda
# Create complex tensors on GPU
probe = torch.randn(32, 256, 256, dtype=torch.complex64, device='cuda')
object_wave = torch.randn(256, 256, dtype=torch.complex64, device='cuda')
positions = torch.randn(32, 2, device='cuda')
# Call custom CUDA kernel
result = ptychography_cuda.forward(probe, object_wave, positions)
Memory management
Memory allocation in C
How does memory allocation work in C?
Memory management in C uses <stdlib.h> (or <cstdlib> in C++):
malloc(size): Allocates heap memory and returns a void* pointer
free(ptr): Deallocates memory allocated by malloc
Example:
int* ptr = (int*)malloc(sizeof(int) * n); // allocate space for n ints; pair with free(ptr)
Memory allocation in C++
What are the new and delete operators in C++?
In C++, the new operator allocates memory for an object or array on the heap, while the delete operator deallocates that memory when it is no longer needed.
To allocate memory for a single object:
int* ptr = new int; // Allocates memory for a single integer
To allocate memory for an array of objects:
int* arr = new int[10]; // Allocates memory for an array of 10 integers
To deallocate memory for a single object:
delete ptr; // Deallocates memory for the single integer
To deallocate memory for an array of objects:
delete[] arr; // Deallocates memory for the array of integers
What is the difference between malloc/free in C and new/delete in C++?
malloc/free deal with raw bytes, while new/delete handle typed objects and call constructors/destructors.
class Widget {
public:
Widget() { std::cout << "Constructing\n"; }
~Widget() { std::cout << "Destructing\n"; }
};
Widget* w = new Widget(); // allocates and constructs
delete w; // destructs and deallocates
The ~Widget() destructor is called when delete is used, ensuring proper cleanup of resources. The Widget object is constructed when allocated with new.
Memory layout and organization
Why do we need organized memory layout?
Memory organization is fundamental to how programs execute. Memory isn’t just for storing variables and arrays—it’s also for storing function call information (parameters, return addresses, local variables), program instructions (the compiled code itself), and dynamically allocated data. The operating system and CPU work together to organize memory into different regions, each optimized for specific purposes:
Code segment: Stores your compiled program instructions
Static/global data: Stores global and static variables with fixed addresses
Stack: Manages function calls and local variables with automatic cleanup
Heap: Provides flexible dynamic memory for data whose size or lifetime isn’t known at compile time
This division allows the CPU to optimize memory access patterns, prevents different types of data from interfering with each other, and enables efficient memory management. Without this organization, programs would be slower, less reliable, and more prone to crashes.
The stack
What is the stack and why is it used for function calls?
The stack is a region of memory with Last-In-First-Out (LIFO) structure, used for managing function calls and local variables.
Complete memory layout visualization:
High memory addresses
┌─────────────────────────┐
│ Stack (grows down) │ ← Stack pointer (SP)
│ - Function calls │ Fast, automatic cleanup
│ - Local variables │ Size limit (~1-8 MB)
│ - Return addresses │
│ - Function parameters │
├─────────────────────────┤
│ (unused space) │
├─────────────────────────┤
│ Heap (grows up) │ ← Heap pointer
│ - Dynamic allocation │ Flexible, manual management
│ - new/malloc │ Size: GB+ (OS dependent)
│ - std::vector data │
│ - Large objects │
├─────────────────────────┤
│ Static/Global data │
│ - Global variables │ Fixed addresses
│ - Static variables │ Lifetime: entire program
│ - Constants │
├─────────────────────────┤
│ Code/Text segment │
│ - Program instructions│ Read-only
│ - Compiled functions │ Executable code
└─────────────────────────┘
Low memory addresses
How the stack works during function calls:
void bar(double x) {
double c = x + 1.0; // local variable on stack
}
void foo(double a) {
double b = 10.0; // local variable on stack
bar(a + b); // call bar
}
int main() {
foo(5.0); // call foo
return 0;
}
Stack evolution:
1. main() called:
┌──────────────┐
│ return addr │
└──────────────┘
2. foo(5.0) called:
┌──────────────┐
│ a = 5.0 │
│ b = 10.0 │
│ return addr │
├──────────────┤
│ return addr │ ← from main
└──────────────┘
3. bar(15.0) called:
┌──────────────┐
│ x = 15.0 │
│ c = 16.0 │
│ return addr │
├──────────────┤
│ a = 5.0 │
│ b = 10.0 │
│ return addr │
├──────────────┤
│ return addr │ ← from main
└──────────────┘
4. bar() returns, its frame is removed:
┌──────────────┐
│ a = 5.0 │
│ b = 10.0 │
│ return addr │
├──────────────┤
│ return addr │ ← from main
└──────────────┘
5. foo() returns, its frame is removed:
┌──────────────┐
│ return addr │ ← from main
└──────────────┘
Why use the stack for function calls?
Speed: Stack allocation is extremely fast—just move a pointer (no searching for free memory)
Automatic cleanup: When a function returns, its stack frame is automatically freed
Predictable lifetime: Local variables have clear scope and lifetime
CPU cache-friendly: Stack memory is accessed sequentially, which is optimal for CPU caches
No fragmentation: Memory is always allocated and freed in LIFO order
Simple and efficient: The hardware supports stack operations directly (push/pop instructions)
The heap
What is heap memory and how does it work?
Heap memory is a region for dynamic memory allocation where the size and lifetime of data can be determined at runtime.
Key characteristics:
Managed by the operating system’s memory allocator
Used for runtime memory allocation (new/delete in C++, malloc/free in C)
Much larger than the stack (can be GB+, limited by available RAM)
Memory persists until explicitly freed
Slower allocation than stack (requires searching for free blocks)
Risk of memory leaks if not properly managed
Heap memory visualization:
Heap grows upward (toward higher addresses)
Before allocation:
┌─────────────────────────────────┐
│ Free space (unallocated) │ Size: GB+
└─────────────────────────────────┘
After allocations:
┌─────────────────────────────────┐
│ Free space │
├─────────────────────────────────┤
│ Block 3: double[1000000] │ ← ptr3 (8 MB)
│ (vector data) │
├─────────────────────────────────┤
│ Free (after deletion) │ ← Fragmentation
├─────────────────────────────────┤
│ Block 2: std::complex<double> │ ← ptr2 (16 bytes)
├─────────────────────────────────┤
│ Block 1: double │ ← ptr1 (8 bytes)
│ Value: 3.14159 │
└─────────────────────────────────┘
Heap base
Example with actual memory operations:
// Allocate single value on heap
double* ptr1 = new double(3.14159);
// Allocates 8 bytes on heap
// ptr1 stores heap address (e.g., 0x6000012a4b00)
// Allocate complex number
std::complex<double>* ptr2 = new std::complex<double>(1.0, 2.0);
// Allocates 16 bytes on heap
// ptr2 stores heap address (e.g., 0x6000012a4b10)
// Allocate large array via vector (internal heap allocation)
std::vector<double> data(1000000);
// Allocates 8 MB on heap for array storage
// vector manages this pointer internally
delete ptr1; // Free the 8 bytes
delete ptr2; // Free the 16 bytes
// data automatically freed when vector goes out of scope
Comparison with stack allocation:
void example() {
// Stack allocation (fast, automatic)
double x = 3.14; // 8 bytes on stack
double arr[100]; // 800 bytes on stack
// Heap allocation (flexible, manual)
double* p = new double(2.71); // 8 bytes on heap
double* big = new double[1000000]; // 8 MB on heap
// Stack overflow risk
// double huge[10000000]; // Crash - too big for stack
delete p;
delete[] big;
} // Stack memory automatically freed here
When to use heap vs stack:
Use stack: Small, fixed-size local variables, function parameters
Use heap: Large data, dynamic size, data that outlives function scope, shared data
Real-world scientific computing example:
void process_diffraction_data(int num_patterns) {
// Stack: small, fixed data
int pattern_size = 512;
double threshold = 0.001;
// Heap: large, runtime-sized data
std::vector<double> patterns(num_patterns * pattern_size * pattern_size);
std::vector<std::complex<double>> fft_result(pattern_size * pattern_size);
// Process data...
// All heap memory automatically cleaned up when vectors destroyed
}
Why does the OS divide memory into stack and heap?
The OS divides memory to optimize different usage patterns:
Stack Memory
Fast, fixed size (typically 1-8 MB)
LIFO (Last-In, First-Out)
For local variables and function calls
Automatic memory management
No fragmentation
Best for: small, short-lived data
Heap Memory
Flexible, variable size (GB+)
For dynamic data structures
Manual memory management (or RAII with smart pointers)
Can fragment over time
Best for: large data, dynamic lifetime, shared ownership
Example showing the difference:
void bad_function() {
double data[10000000]; // Stack overflow - 80 MB on limited stack
}
void good_function() {
std::vector<double> data(10000000); // Heap allocation - no problem
}
Memory fragmentation
What is memory fragmentation?
Memory fragmentation is when free memory becomes divided into small, scattered blocks, making it difficult to allocate large contiguous chunks of memory.
Why does memory fragmentation occur?
Fragmentation happens when:
Memory blocks are allocated in different sizes
Blocks are freed in random order
Gaps form between allocated blocks
These gaps are too small for new allocations
How can you avoid memory fragmentation?
Key strategies:
Use memory pools
Allocate larger blocks upfront
Minimize frequent small allocations
Use fixed-size allocations when possible
Monitor memory usage patterns
Memory pools
What are memory pools?
Memory pools are pre-allocated blocks of memory that manage smaller allocations internally:
Allocate one large block upfront
Manage smaller allocations within that block
Reduce fragmentation
Improve allocation speed
Simplify memory management
Can you show an example of a memory pool in C++?
Here is a simple example of a memory pool in C++:
#include <iostream>
#include <new> // for std::bad_alloc
class MemoryPool {
public:
MemoryPool(size_t size) : poolSize(size), pool(new char[size]), offset(0) {}
~MemoryPool() {
delete[] pool;
}
void* allocate(size_t size) {
if (offset + size > poolSize) {
throw std::bad_alloc();
}
void* ptr = pool + offset;
offset += size;
return ptr;
}
void deallocate(void* ptr, size_t size) {
// In a simple memory pool, deallocation is often a no-op
}
private:
size_t poolSize;
char* pool;
size_t offset;
};
int main() {
MemoryPool memPool(1024); // Create a memory pool of 1024 bytes
int* intPtr = static_cast<int*>(memPool.allocate(sizeof(int)));
*intPtr = 42;
std::cout << "Allocated integer: " << *intPtr << std::endl;
memPool.deallocate(intPtr, sizeof(int)); // Deallocation is a no-op in this simple example
return 0;
}
Smart pointers
Memory management is error prone. C++ provides automatic memory management through smart pointers:
std::unique_ptr<T>: Exclusive ownership, auto-deletion when out of scope
std::shared_ptr<T>: Shared ownership with reference counting
std::weak_ptr<T>: Non-owning reference to prevent circular references
Practice examples
Pointer declarations
Basic pointer with actual memory addresses:
double* p; // pointer to double (uninitialized, contains garbage)
double x = 3.14159;
p = &x; // p now holds address of x (e.g., 0x7ffeeb5c4a18)
std::cout << "Value: " << x << std::endl; // 3.14159
std::cout << "Address: " << &x << std::endl; // 0x7ffeeb5c4a18
std::cout << "Pointer: " << p << std::endl; // 0x7ffeeb5c4a18
std::cout << "Dereferenced: " << *p << std::endl; // 3.14159
Multiple pointers on one line:
double *p1, *p2; // both are pointers to double
double* p1, p2; // p1 is a pointer to double, p2 is a double (confusing!)
Constant pointers:
double x = 3.14, y = 2.71;
const double* p1 = &x; // pointer to const double (can't modify *p1)
double* const p2 = &x; // const pointer (can't change address)
const double* const p3 = &x; // both const
Generic pointer:
void* vp = &x; // can point to any type, but you must cast to use it
Pointer to pointer:
double** pp = &p; // points to a pointer (e.g., stores 0x7ffeeb5c4a20)
Using references
How do you use references in C++?
double x = 3.14;
double& ref = x; // ref refers to x
ref = 2.71; // modifies x directly
std::cout << x; // prints 2.71
What problem does reference solve?
In C, you must use pointers to modify variables in functions:
void increment(double* p) {
(*p) += 1.0;
}
int main() {
double x = 5.5;
increment(&x); // must pass address
std::cout << x; // prints 6.5
}
This can lead to issues like null pointers or pointer arithmetic errors.
In C++, references provide a safer and more intuitive way:
void increment(double& r) {
r += 1.0;
}
int main() {
double x = 5.5;
increment(x); // pass by reference
std::cout << x; // prints 6.5
}
References eliminate pointer-related risks while maintaining efficiency.
What is the concept of “null” in C++?
In C++, “null” refers to a pointer that does not point to any valid object or memory location. It is represented by the keyword nullptr (introduced in C++11) or the macro NULL (from C). A null pointer indicates that the pointer is not currently assigned to any object.
Why do you want to have a pointer that is “null”?
Having a null pointer is useful for several reasons:
Indicating Absence: A null pointer can signify that a pointer is not currently pointing to any valid object, which is useful for optional data or uninitialized pointers.
Error Handling: Functions can return a null pointer to indicate failure or that no valid object could be returned.
Memory Management: It helps in managing dynamic memory by allowing pointers to be set to null after deallocation, preventing dangling pointers.
What’s a good analogy to understand pointers vs references?
If a pointer is like holding someone’s address written on paper, a reference is like a permanent portal directly into their house: you don’t need to look up the address, you just walk in.