C++
Nov 7, 2025 - This is a learning journal for C++ questions focused on scientific computing. I used C++ and C as an undergrad but never built scientific applications. Now, with GPU programming and high performance computing, I want to revisit C++ for scientific computing.
Purpose of this documentation: This page is designed as a practical reference to help you read and write C++ code faster for scientific computing. It focuses on common patterns, syntax, and concepts you’ll encounter in real codebases, with emphasis on floating-point arithmetic and numerical computing.
The following is sourced from various online references, Copilot, and my own experiments. I do not claim originality for any of the content here.
Quick reference for & and * symbols
| Context | Code Example | Meaning | Type of Thing | Behavior / Effect |
|---|---|---|---|---|
| In a type declaration | `int& ref = x;` | Declares a reference | Reference | `ref` becomes another name for `x`. |
| In an expression | `int* p = &x;` | Address-of operator | Pointer | Yields the memory address of `x`. |
| Using pointer dereference | `*p = 5;` | Dereference operator | Pointer | Reads or writes the value at the address `p` holds. |
| Combining both | `int& r = *p;` | Reference bound to a dereferenced pointer | Reference + Pointer | `r` aliases the object `p` points to. |
| In a function parameter | `void f(int& r)` | Passed by reference | Reference | Function uses the original variable, not a copy. |
| In a copy constructor | `T(const T& other)` | Copy parameter safely and efficiently | Reference | Avoids recursion (no extra copies). |
Pointer in practice
To get the device number in CUDA, you run this code:
int device;
cudaGetDevice(&device);
The & means “address of”. We want to pass the address of device so that cudaGetDevice can write the result into that memory location. Now, if you look at the function signature for cudaGetDevice (simplified):
cudaError_t cudaGetDevice(int* device);
Conceptually, the implementation looks something like this:
cudaError_t cudaGetDevice(int* device) {
    // Ask the driver (lower-level API) which GPU is bound to this thread
    int currentDevice = driver_get_current_device(); // e.g. returns 1
    // Copy that value into the caller's memory
    *device = currentDevice;
    return cudaSuccess;
}
The parameter type int* device is a pointer that holds the address of an integer supplied by the caller. Inside the function, *device dereferences the pointer and writes the value into that memory location.
Why const is needed for member functions
The const keyword after a member function declaration indicates that the function will not modify the object’s state. This is important for const-correctness and enables the function to be called on const objects.
class Rectangle {
int width, height;
public:
void setWidth(int w) { width = w; } // modifies state (non-const)
int area() const { return width * height; } // doesn't modify state (const)
};
Key points
When you write int area() const, the const applies to the hidden this pointer inside the method.
C++ secretly treats member functions like this:
int area(Rectangle* const this) { return this->width * this->height; }
But if you mark the function as const, it becomes:
int area(const Rectangle* const this) { return this->width * this->height; }
Now this points to a const Rectangle, meaning the compiler won’t let you modify any member variables inside that function. This ensures:
The function can be called on const objects
The function is guaranteed not to modify the object’s state
Better code safety and clearer intent
In contrast with Python
Python makes self explicit:
class Rectangle:
def area(self):
return self.width * self.height
C++ makes this implicit - it’s there, you just don’t have to say it. When you write width in a C++ member function, the compiler automatically treats it as this->width.
Numerical data types for scientific computing
| Type | Size | Precision | Scientific use |
|---|---|---|---|
| `float` | 4 bytes (32-bit) | ~7 decimal digits | GPU computing (faster), machine learning, graphics |
| `double` | 8 bytes (64-bit) | ~15-16 decimal digits | Default for scientific computing, general numerical work |
| `long double` | 16 bytes (128-bit) | ~18-19 decimal digits | High-precision calculations, rarely needed |
| `std::complex<float>` | 8 bytes | Same as `float` | Quantum mechanics, signal processing (GPU) |
| `std::complex<double>` | 16 bytes | Same as `double` | Quantum mechanics, signal processing (default) |
| `int` | 4 bytes | Exact integers | Loop counters, array indices only |
Key points for scientific computing:
Always use ``double`` by default unless you have a specific reason (GPU memory, performance)
Use ``float`` for GPU computing where memory bandwidth matters
Use ``std::complex<double>`` for complex numbers in quantum mechanics, FFTs, signal processing
Precision matters:
``float`` accumulates error faster in iterative algorithms
Understanding floating-point precision
Floating-point numbers have limited precision:
float x = 1.0f / 3.0f; // 0.333333... (7 digits)
double y = 1.0 / 3.0; // 0.333333333333333... (15 digits)
// Accumulated error example
float sum_f = 0.0f;
double sum_d = 0.0;
for (int i = 0; i < 1000000; i++) {
sum_f += 0.1f; // Accumulates rounding error
sum_d += 0.1; // Better, but still not perfect
}
std::cout << sum_f << std::endl; // ~99994.5 (wrong!)
std::cout << sum_d << std::endl; // ~100000.0 (closer)
Rule of thumb: For scientific computing with many iterations or accumulations, prefer double. Use float only when profiling shows performance benefits.
Complex numbers in C++
For quantum mechanics, signal processing, and Fourier transforms:
#include <complex>
#include <iostream>
// Declaration and initialization
std::complex<double> z1(3.0, 4.0); // 3 + 4i
std::complex<double> z2 = {1.0, -2.0}; // 1 - 2i
std::complex<float> z3(2.0f, 1.0f); // Use float for GPU
// Basic operations
auto z_sum = z1 + z2; // (4, 2)
auto z_prod = z1 * z2; // (11, -2)
auto z_conj = std::conj(z1); // 3 - 4i (complex conjugate)
// Access real and imaginary parts
double real_part = z1.real(); // 3.0
double imag_part = z1.imag(); // 4.0
// Magnitude and phase
double magnitude = std::abs(z1); // 5.0 = sqrt(3²+4²)
double phase = std::arg(z1); // ~0.927 radians
// Exponential form (Euler's formula: e^(iθ) = cos(θ) + i*sin(θ))
std::complex<double> z_polar = std::polar(5.0, 0.927); // 5*e^(i*0.927)
Common in scientific computing:
// Wave functions in quantum mechanics
std::complex<double> psi(0.707, 0.707); // normalized state
// Fourier transform (simplified)
std::vector<std::complex<double>> fft_result(1024);
// Complex exponentials for signal processing
double omega = 2.0 * M_PI * 440.0; // 440 Hz (M_PI from <cmath>)
double t = 0.001;                  // time in seconds (example value)
std::complex<double> signal = std::exp(std::complex<double>(0.0, omega * t));
Array initialization
Understanding various initialization methods is essential because different codebases prefer different styles. Some projects use classic C-style initialization, while others adopt modern C++11 brace initialization. Being familiar with all approaches helps you read, understand, and contribute to diverse codebases more effectively.
1. Classic C-style
double arr[3] = {1.0, 2.5, 3.14};
This is the most common way. Specify the size and the list.
If you make the list shorter than the size, the remaining elements are initialized to zero:
double arr[5] = {1.0, 2.5}; // becomes {1.0, 2.5, 0.0, 0.0, 0.0}
If you omit the size, it is deduced from the initializer list:
double arr[] = {1.0, 2.5, 3.14}; // size is 3
2. Default (zero) initialization
double arr[3] = {}; // all elements are 0.0
double arr2[3] = {0}; // also all elements are 0.0
3. C++11 uniform (brace) initialization
double arr[3]{1.0, 2.5, 3.14}; // same as = {1.0, 2.5, 3.14}, but modern style
double arr2[]{1.0, 2.5, 3.14}; // size deduced automatically
double arr3[5]{}; // all zeros
4. Using memset (low-level, C-style)
double arr[3];
std::memset(arr, 0, sizeof(arr)); // set all bytes to 0
Note: This does not support non-zero initialization directly. Use with caution for floating-point - it’s safer to use loop initialization or brace initialization.
5. Runtime initialization (loops or algorithms)
double arr[3];
for (int i = 0; i < 3; ++i)
arr[i] = static_cast<double>(i) + 1.5; // {1.5, 2.5, 3.5}
Uninitialized arrays
What happens with double arr[3]; without initialization within a function?
void foo() {
double arr[3];
std::cout << arr[0]; // ❌ undefined behavior (garbage)
}
arr is a local automatic variable. It lives on stack memory, and the contents are uninitialized.
What is stack memory? Stack memory is a region of memory used for local variables and function calls. It operates in a Last-In-First-Out (LIFO) manner. When a function is called, space is allocated on the stack for its local variables. When the function returns, that space is freed.
Why is it undefined behavior? When you declare double arr[3]; without initialization, the compiler allocates space on the stack but does not clear or set those memory locations. Whatever data was previously stored in those memory locations remains there. This is called “garbage” data—it could be zeros, could be leftover values from previous function calls, or any random bit pattern. Reading this uninitialized memory leads to unpredictable program behavior, which is why it’s called undefined behavior.
What about double arr[3]; for global or static arrays?
double arr[3]; // global → all elements = 0.0
void foo() {
static double arr2[3]; // static local → also all 0.0
}
Global and static arrays are automatically zero-initialized.
What does “static” mean here? The static keyword in this context means the variable has static storage duration. Unlike normal local variables that are created on the stack each time a function is called and destroyed when the function returns, a static local variable:
Is created only once, the first time the function is called
Persists for the entire lifetime of the program
Retains its value between function calls
Is stored in a separate memory region (not on the stack), similar to global variables
Is automatically zero-initialized by the compiler (just like global variables)
Example:
void counter() {
static int count = 0; // initialized only once
count++;
std::cout << count << std::endl;
}
int main() {
counter(); // prints 1
counter(); // prints 2
counter(); // prints 3
}
The count variable persists between function calls, unlike a normal local variable which would reset to 0 each time.
Array initialization summary
| Form | Example | Notes |
|---|---|---|
| Explicit C-style | `double a[3] = {1.0, 2.5, 3.14};` | Classic |
| Implicit size | `double a[] = {1.0, 2.5, 3.14};` | Compiler deduces size |
| Partial init | `double a[5] = {1.0, 2.5};` | Remaining elements = 0.0 |
| Zero init | `double a[3] = {};` | All zeros |
| Brace init | `double a[3]{1.0, 2.5, 3.14};` | C++11 style |
| Heap array | `double* a = new double[3];` | Manual memory |
| `std::array` | `std::array<double, 3> a{};` | Safer fixed-size |
| `std::vector` | `std::vector<double> v(3);` | Resizable container |
Why std::array and std::vector exist
C++ inherited raw C arrays (double arr[5]), but they have fundamental safety issues. The Standard Library introduced std::array and std::vector to solve these problems:
The problem with C arrays
double arr[5] = {1.0, 2.5, 3.14, 4.2, 5.7};
// No size information - decays to pointer
void process(double arr[]) { // really just double*
int size = sizeof(arr); // Wrong! Gives pointer size, not array size
}
// No bounds checking
arr[10] = 42.0; // Compiles but writes to random memory (undefined behavior)
// Can't be copied or returned easily
double arr2[5] = arr; // Compilation error
// Can't be resized
// No way to dynamically change size
std::array: Safe fixed-size arrays
Created to replace C arrays with zero runtime overhead while adding safety:
std::array<double, 5> arr = {1.0, 2.5, 3.14, 4.2, 5.7};
// Knows its size
void process(std::array<double, 5>& arr) {
int size = arr.size(); // Returns 5
}
// Bounds checking with .at()
arr.at(10) = 42.0; // Throws std::out_of_range
arr[10] = 42.0; // Undefined behavior (same as C array, but you have .at())
// Can be copied and returned
std::array<double, 5> arr2 = arr; // Works
// Works with STL algorithms
std::sort(arr.begin(), arr.end());
Key limitation: Size must be known at compile time. Same memory layout as C arrays (lives on stack).
std::vector: Dynamic arrays with memory management
Created to provide dynamic sizing with automatic memory management:
std::vector<double> vec = {1.0, 2.5, 3.14, 4.2, 5.7};
// Size determined at runtime
int n;
std::cin >> n;
std::vector<double> data(n); // Size from user input
// Can grow and shrink
vec.push_back(6.8); // Add element
vec.pop_back(); // Remove element
vec.resize(100); // Change size
// Automatic memory management (RAII)
{
std::vector<double> temp(1000000); // Allocates memory from heap
// use temp...
} // Destructor automatically frees memory
// Prevents stack overflow for large data
std::vector<double> big_data(10000000); // Works (heap allocation)
Key tradeoff: Small overhead for dynamic capability (stores size, capacity, and pointer to heap memory).
Summary
C arrays: Fast but unsafe, no size info, can’t be copied, fixed size at compile time
std::array: Safe wrapper for C arrays, same performance, knows its size, but still fixed at compile time
std::vector: Dynamic sizing, automatic memory management, slight overhead, lives on heap
Modern C++ code should prefer std::array for fixed sizes and std::vector for dynamic data. Use raw C arrays only when interfacing with C APIs or for specific performance reasons (measure first!).
Practical tips for working with large datasets
When working with large arrays or datasets in scientific computing:
Use ``std::vector`` for dynamic data
std::vector<double> data(1000000); // 1 million doubles, all zeros
std::vector<double> data2(1000000, 3.14); // all initialized to π
std::vector<std::complex<double>> wavefunction(1024); // complex array
What does “automatically manage memory” mean?
Vectors handle allocation and deallocation behind the scenes. When you create a vector, it allocates memory from the heap. When the vector goes out of scope, it automatically frees that memory. You don’t need to call new/delete or malloc/free manually.
{
std::vector<double> data(1000); // memory allocated
// use data...
} // vector destroyed, memory automatically freed
This prevents memory leaks compared to manual management:
double* data = new double[1000]; // manual allocation
// use data...
delete[] data; // must remember to free! (easy to forget)
What about GPU computing and fine control?
For GPU computing (CUDA, OpenCL, etc.), you’re right that you need fine control over memory transfers between CPU and GPU. In those cases:
CPU-side preparation: ``std::vector`` is commonly used to prepare data on the CPU side before transferring to the GPU
GPU memory: you still use explicit GPU memory APIs (``cudaMalloc``, ``cudaMemcpy``, etc.) for GPU-side buffers
Why still use vectors?: even in GPU programming, vectors are preferred on the CPU side for their safety and convenience; you get the raw data pointer with ``data()`` when you need to copy to the GPU
// Common pattern in CUDA programming
std::vector<float> host_data(1000000, 1.0f); // CPU side, safe and easy
float* device_data;
cudaMalloc(&device_data, host_data.size() * sizeof(float)); // GPU side
cudaMemcpy(device_data, host_data.data(), /* ... */); // transfer
// ... GPU computation ...
cudaFree(device_data); // manual GPU cleanup
// host_data automatically cleaned up when out of scope
For high-performance computing, the pattern is: use modern C++ containers (vectors) for CPU data management, and explicit APIs for GPU/specialized memory.
Use ``std::array`` for fixed-size data
std::array<double, 1000> buffer{}; // all zeros, safer than C arrays
std::array<float, 3> position{0.0f, 0.0f, 0.0f}; // 3D coordinates
Provides bounds checking via ``.at()`` and works with STL algorithms.
What are STL algorithms?
STL (Standard Template Library) algorithms are pre-built functions in <algorithm> for common operations on containers. Examples:
#include <algorithm>
#include <array>
std::array<double, 5> arr = {3.1, 1.4, 4.2, 1.0, 5.9};
// Sort the array
std::sort(arr.begin(), arr.end());
// Find an element
auto it = std::find(arr.begin(), arr.end(), 4.2);
// Transform elements (square each value)
std::transform(arr.begin(), arr.end(), arr.begin(),
               [](double x) { return x * x; });
// Count occurrences
int count = std::count(arr.begin(), arr.end(), 1.0);
These work seamlessly with std::array and std::vector, but not with raw C arrays.
Analogy to NumPy
std::array and std::vector are analogous to NumPy’s ndarray in Python:
# Python with NumPy
import numpy as np
arr = np.array([1.0, 2.5, 3.14, 4.2, 5.9])
arr.sort() # method on array
arr.mean() # built-in statistical function
arr * 2 # vectorized operations
// C++ with std::array
std::array<double, 5> arr = {1.0, 2.5, 3.14, 4.2, 5.9};
std::sort(arr.begin(), arr.end()); // algorithm
// No built-in mean, but can use std::accumulate (from <numeric>)
double mean = std::accumulate(arr.begin(), arr.end(), 0.0) / arr.size();
std::transform(arr.begin(), arr.end(), arr.begin(),
[](double x) { return x * 2.0; }); // element-wise operation
Key difference: NumPy has more built-in numerical methods (mean, std, etc.), while C++ requires STL algorithms or external libraries (like Eigen or Armadillo) for numerical computing. However, both provide contiguous memory layout for performance.
Reserve capacity for vectors when size is known
std::vector<double> data;
data.reserve(1000000); // pre-allocate memory to avoid reallocations
for (int i = 0; i < 1000000; ++i) {
data.push_back(static_cast<double>(i) * 0.001); // convert to double
}
This avoids multiple memory reallocations during growth.
Avoid uninitialized arrays for large data
// Bad: uninitialized, potentially dangerous
double data[1000000];
// Good: explicitly zero-initialized
double data[1000000] = {};
// Better: use vector
std::vector<double> data(1000000);
For large datasets, uninitialized data can lead to hard-to-debug issues.
Why is vector better than array here?
Automatic memory management
Arrays like ``double data[1000000]`` allocate on the stack (limited size, typically a few MB)
Large arrays can cause stack overflow
Vectors allocate on the heap automatically, no size limit concerns
Safety and flexibility
If you need to change the size later, vectors can resize; arrays cannot
Vectors know their own size (``data.size()``); arrays do not
Vectors automatically clean up when going out of scope
Example of the problem with large stack arrays:
void foo() {
double data[10000000]; // Stack overflow (80 MB on stack)
}
void bar() {
std::vector<double> data(10000000); // Works fine (heap allocation)
}
When should I use std::array vs std::vector?
Use ``std::array`` when:
Size is fixed and known at compile time
Small, performance-critical code (stack allocation is faster)
You want array bounds checking with ``.at()``
std::array<int, 100> buffer; // fixed size, stack allocated
Use ``std::vector`` when:
Size is determined at runtime or might change
Size is large (> few KB)
You need to resize, add, or remove elements
std::vector<double> data(n); // n determined at runtime
data.push_back(42.5); // can grow dynamically
Use raw C arrays when:
Working with legacy code or C APIs
Interfacing with GPU memory APIs
Extreme performance optimization (but measure first!)
double buffer[256]; // C-style, minimal overhead
Use memory-efficient initialization for specific patterns
// Fill with specific value
std::vector<double> data(1000000, 3.14);
// Or use std::fill
std::vector<double> data(1000000);
std::fill(data.begin(), data.end(), 3.14);
// Generate values with a pattern
std::vector<double> data(1000);
std::iota(data.begin(), data.end(), 0.0); // 0.0, 1.0, 2.0, ..., 999.0 (from <numeric>)
Only zero-fill when you need zero
Zeroing a large buffer is O(n). If you will overwrite everything anyway, skip the zeroing and write the data once:
// Wasteful: zeros then overwrites
std::vector<double> data(1000000); // zeros all elements
for (int i = 0; i < 1000000; ++i) {
data[i] = compute_value(i); // overwrites the zeros
}
// Better: allocate but don't initialize, then fill once
std::vector<double> data;
data.reserve(1000000); // allocate space without initializing
for (int i = 0; i < 1000000; ++i) {
data.push_back(compute_value(i)); // write data once
}
Prefer standard containers over raw arrays
Standard containers like std::vector and std::array offer significant advantages over raw C arrays:
Fixed size known at compile time:
std::array<double, 1000> data{}; // safer than double data[1000]
Large or variable size:
std::vector<double> data(1000000); // contiguous, cache-friendly
Why prefer standard containers?
Automatic memory management: ``std::vector`` handles allocation and deallocation, preventing memory leaks
Bounds checking: ``std::array::at()`` and ``std::vector::at()`` throw ``std::out_of_range`` on invalid indices
Size tracking: containers know their own size (``data.size()``), unlike raw arrays
STL algorithm compatibility: works seamlessly with ``std::sort``, ``std::find``, ``std::transform``, etc.
Exception safety: proper cleanup on exceptions, unlike raw arrays with manual memory management
Resizable: ``std::vector`` can grow dynamically without manual reallocation
No pointer arithmetic errors: iterators are safer than pointer manipulation
Similar to NumPy arrays: just as NumPy arrays are preferred over raw Python lists for numerical work, ``std::vector`` is preferred for similar reasons—contiguous memory layout enables cache efficiency and SIMD optimizations
Pointers
What is a pointer?
A pointer in C++ is a variable that stores the memory address of another object or function. It provides direct memory access and flexibility, but requires careful management.
While powerful, pointers are more primitive and potentially dangerous compared to references. They can be null, uninitialized, or point to arbitrary memory locations, which makes them both powerful and risky to use.
Pointer basics
A pointer is itself an ordinary variable stored in memory; its value is an address
Size: 4 bytes on 32-bit systems, 8 bytes on 64-bit systems (enough to address 2^32 or 2^64 memory locations)
Example memory address: 0x7ffee3bff6c8 (hexadecimal format)
Null pointers
In C++, null refers to a pointer that does not point to any valid object. It is represented by nullptr (C++11) or NULL (from C).
Why use null pointers?
Indicating absence: signify that a pointer is not currently pointing to any valid object
Why use pointers?
Dynamic memory: Allocate memory at runtime (essential for large datasets)
Efficiency: Pass large objects without copying
Data structures: Build linked lists, trees, graphs
Hardware access: Interface with GPU memory, C APIs, hardware registers
Polymorphism: Access derived classes through base class pointers
Error handling: Functions can return null to indicate failure
Memory management: Set pointers to null after deallocation to prevent dangling pointers
Declaring pointers with concrete examples
Pointers have types and can be null. Each pointer declaration specifies what type of data it points to. Here’s what actually happens in memory:
#include <iostream>
double x = 3.14159;
double* p1 = &x; // p1 stores address of x
std::cout << "Value of x: " << x << std::endl; // 3.14159
std::cout << "Address of x: " << &x << std::endl; // 0x7ffeeb5c4a18 (example)
std::cout << "Value of p1: " << p1 << std::endl; // 0x7ffeeb5c4a18 (same address)
std::cout << "Dereferenced p1: " << *p1 << std::endl; // 3.14159 (accesses value at address)
Memory visualization:
Memory Address Variable Value
────────────── ──────── ─────────────────
0x7ffeeb5c4a18 x 3.14159 (double, 8 bytes)
0x7ffeeb5c4a20 p1 0x7ffeeb5c4a18 (pointer, 8 bytes)
└─ points to x
Common pointer types for scientific computing:
double* ptr1; // pointer to double (most common in scientific code)
float* ptr2; // pointer to float (GPU computing)
std::complex<double>* ptr3; // pointer to complex number
const double* ptr4; // pointer to constant double (can't modify *ptr4)
double* const ptr5 = &x; // constant pointer (can't change ptr5 itself)
Important: Uninitialized pointers contain garbage values (random addresses). Always initialize:
double x = 2.718;
double* ptr = nullptr; // Good: explicitly null
double* ptr2 = &x; // Good: points to valid object (address: e.g., 0x7ffeeb5c4b00)
double* ptr3; // Bad: contains garbage address (e.g., 0x000deadbeef)
*ptr3 = 5.0; // Crash - writing to random memory
The type of a pointer determines:
- What type of data it can point to
- How pointer arithmetic works (e.g., ptr + 1 moves by sizeof(type) bytes)
- Whether it can be null (all pointer types can be nullptr)
Dereferencing pointers
Use the asterisk (*) operator to access the value a pointer points to:
double x = 42.5;
double* ptr = &x; // ptr stores address of x (e.g., 0x7ffeeb5c4c10)
double value = *ptr; // dereference: value = 42.5
*ptr = 99.9; // modify through pointer
std::cout << x; // x is now 99.9
Practical example: passing arrays to functions
double compute_mean(const double* data, int size) {
    double sum = 0.0;
    for (int i = 0; i < size; i++) {
        sum += data[i]; // same as *(data + i)
    }
    return sum / size;
}
std::vector<double> measurements = {1.2, 3.4, 5.6, 7.8};
double mean = compute_mean(measurements.data(), measurements.size());
Dangling pointers
A dangling pointer does not point to a valid object. This typically occurs when the object was deleted or deallocated, but the pointer was not updated. Accessing a dangling pointer leads to undefined behavior, crashes, or data corruption.
double* ptr = new double(3.14);
delete ptr; // memory freed
// ptr still holds old address (e.g., 0x600001234567)
*ptr = 2.71; // Crash - memory no longer valid
ptr = nullptr; // Good practice: mark as invalid
References
What is a reference?
A reference in C++ is an alias for another variable, created using the ampersand (&) symbol. Once a reference is initialized, it cannot be changed to refer to another variable.
double x = 3.14;
double& ref = x; // ref is an alias for x
ref = 2.71; // modifies x directly
std::cout << x; // prints 2.71
Why use references instead of pointers?
References provide a safer alternative to pointers:
Cannot be null or uninitialized (must always refer to a valid object)
More intuitive syntax (no need for dereferencing)
Prevent common pointer-related errors
Avoid copying large objects when passing to functions
// Pass by reference: no copy, direct access
void increment(double& value) {
value += 1.0; // modifies the original
}
double x = 5.5;
increment(x); // x is now 6.5
Analogy: pointers vs references
If a pointer is like holding someone’s address written on paper, a reference is like being given a permanent portal directly into their house—you don’t need to look up the address, you just walk in.
Pointers vs References comparison
| Feature | Pointer | Reference |
|---|---|---|
| Can be null? | Yes | No (must refer to something valid) |
| Can be reseated? | Yes (`ptr = &other;`) | No (always refers to the same object) |
| Syntax for access | `*ptr` to dereference, `->` for members | Just use the name |
| Used for | Dynamic memory, low-level control | Function parameters, operator overloads, clean aliasing |
How do you use references in C++?
int x = 10;
int& ref = x; // ref refers to x
ref = 20; // modifies x directly
std::cout << x; // prints 20
What problem does reference solve?
In C, you must use pointers to modify variables in functions:
void increment(int* p) {
(*p)++;
}
int main() {
int x = 5;
increment(&x); // must pass address
std::cout << x; // prints 6
}
This can lead to issues like null pointers or pointer arithmetic errors.
In C++, references provide a safer and more intuitive way:
void increment(int& r) {
r++;
}
int main() {
int x = 5;
increment(x); // pass by reference
std::cout << x; // prints 6
}
References eliminate pointer-related risks while maintaining efficiency.
CUDA C++ extensions for PyTorch
When writing custom CUDA kernels for PyTorch (e.g., for ptychography reconstruction, FFTs, or custom neural network layers), you divide code into host (CPU) and device (GPU) operations.
Host code responsibilities
1. Input validation and tensor interfacing
#include <torch/extension.h>
torch::Tensor ptychography_forward(
torch::Tensor probe, // complex probe function
torch::Tensor object, // object transmission
torch::Tensor positions // scan positions
) {
// Validate inputs
TORCH_CHECK(probe.is_cuda(), "probe must be CUDA tensor");
TORCH_CHECK(probe.dtype() == torch::kComplexFloat,
"probe must be complex64");
// Get tensor dimensions
int64_t batch_size = probe.size(0);
int64_t probe_size = probe.size(1);
2. Memory allocation and data preparation
// Allocate output tensor on GPU
auto options = torch::TensorOptions()
.dtype(torch::kComplexFloat)
.device(probe.device());
auto output = torch::zeros({batch_size, probe_size, probe_size}, options);
// Get raw pointers for CUDA kernel
auto probe_ptr = probe.data_ptr<c10::complex<float>>();
auto object_ptr = object.data_ptr<c10::complex<float>>();
auto output_ptr = output.data_ptr<c10::complex<float>>();
3. CUDA kernel launch configuration
// Calculate grid and block dimensions
const int threads = 256;
const int blocks = (batch_size * probe_size * probe_size + threads - 1) / threads;
// Launch kernel
ptychography_kernel<<<blocks, threads>>>(
probe_ptr,
object_ptr,
output_ptr,
batch_size,
probe_size
);
// Check for kernel errors
cudaError_t error = cudaGetLastError();
TORCH_CHECK(error == cudaSuccess,
"CUDA kernel failed: ", cudaGetErrorString(error));
4. Synchronization and output handling
// Wait for kernel completion
cudaDeviceSynchronize();
return output; // Return PyTorch tensor
}
// Bind to Python
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
m.def("forward", &ptychography_forward, "Ptychography forward pass");
}
Device code (CUDA kernel)
__global__ void ptychography_kernel(
const c10::complex<float>* probe,
const c10::complex<float>* object,
c10::complex<float>* output,
int batch_size,
int probe_size
) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int total = batch_size * probe_size * probe_size;
if (idx < total) {
// Complex multiplication for ptychography
output[idx] = probe[idx] * object[idx];
}
}
Why this division?
Host: Handles PyTorch tensor interface, memory management, validation, kernel orchestration
Device: Pure computation (complex arithmetic, FFTs, ptychography operations)
PyTorch integration: Host code bridges Python/C++/CUDA seamlessly
Common host operations for scientific computing
Input/output tensor validation (types, shapes, device placement)
Memory allocation (output tensors, workspace buffers)
Data layout conversions (contiguous memory, proper strides)
Kernel launch parameters (grid/block sizing)
Error checking and synchronization
Gradient computation setup (for backward pass)
Multi-GPU coordination (if using distributed computing)
Typical workflow pattern
// 1. Validate inputs (host)
TORCH_CHECK(input.is_cuda(), "input must be on CUDA");
TORCH_CHECK(input.is_contiguous(), "input must be contiguous");
// 2. Allocate outputs (host)
auto output = torch::empty_like(input);
// 3. Get raw pointers (host)
auto input_ptr = input.data_ptr<float>();
auto output_ptr = output.data_ptr<float>();
// 4. Launch kernel (host calls device)
const int threads = 256;
const int blocks = (input.numel() + threads - 1) / threads;
my_kernel<<<blocks, threads>>>(input_ptr, output_ptr, input.numel());
// 5. Synchronize (host)
cudaDeviceSynchronize();
// 6. Return PyTorch tensor (host)
return output;
Building and using the extension
setup.py for compilation:
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension
setup(
name='ptychography_cuda',
ext_modules=[
CUDAExtension(
'ptychography_cuda',
['ptychography.cpp', 'ptychography_kernel.cu'],
extra_compile_args={
'cxx': ['-O3'],
'nvcc': ['-O3', '--use_fast_math']
}
)
],
cmdclass={'build_ext': BuildExtension}
)
Using in Python:
import torch
import ptychography_cuda
# Create complex tensors on GPU
probe = torch.randn(32, 256, 256, dtype=torch.complex64, device='cuda')
object_wave = torch.randn(256, 256, dtype=torch.complex64, device='cuda')
positions = torch.randn(32, 2, device='cuda')
# Call custom CUDA kernel
result = ptychography_cuda.forward(probe, object_wave, positions)
Memory management
Memory allocation in C
How does memory allocation work in C?
Memory management in C uses <stdlib.h> (or <cstdlib> in C++):
malloc(size): Allocates heap memory and returns a void* pointer
free(ptr): Deallocates memory allocated by malloc
Example:
int* ptr = (int*)malloc(sizeof(int) * n); // allocate space for n ints; pair with free(ptr)
Memory allocation in C++
What are the new and delete operators in C++?
In C++, the new operator allocates memory for an object or array on the heap, while the delete operator deallocates that memory when it is no longer needed.
To allocate memory for a single object:
int* ptr = new int; // Allocates memory for a single integer
To allocate memory for an array of objects:
int* arr = new int[10]; // Allocates memory for an array of 10 integers
To deallocate memory for a single object:
delete ptr; // Deallocates memory for the single integer
To deallocate memory for an array of objects:
delete[] arr; // Deallocates memory for the array of integers
What is the difference between malloc/free in C and new/delete in C++?
malloc/free deal with raw bytes, while new/delete handle typed objects and call constructors/destructors.
class Widget {
public:
Widget() { std::cout << "Constructing\n"; }
~Widget() { std::cout << "Destructing\n"; }
};
Widget* w = new Widget(); // allocates and constructs
delete w; // destructs and deallocates
The ~Widget() destructor is called when delete is used, ensuring proper cleanup of resources. The Widget object is constructed when allocated with new.
Memory layout and organization
Why do we need organized memory layout?
Memory organization is fundamental to how programs execute. Memory isn’t just for storing variables and arrays—it’s also for storing function call information (parameters, return addresses, local variables), program instructions (the compiled code itself), and dynamically allocated data. The operating system and CPU work together to organize memory into different regions, each optimized for specific purposes:
Code segment: Stores your compiled program instructions
Static/global data: Stores global and static variables with fixed addresses
Stack: Manages function calls and local variables with automatic cleanup
Heap: Provides flexible dynamic memory for data whose size or lifetime isn’t known at compile time
This division allows the CPU to optimize memory access patterns, prevents different types of data from interfering with each other, and enables efficient memory management. Without this organization, programs would be slower, less reliable, and more prone to crashes.
The stack
What is the stack and why is it used for function calls?
The stack is a region of memory with Last-In-First-Out (LIFO) structure, used for managing function calls and local variables.
Complete memory layout visualization:
High memory addresses
┌─────────────────────────┐
│ Stack (grows down) │ ← Stack pointer (SP)
│ - Function calls │ Fast, automatic cleanup
│ - Local variables │ Size limit (~1-8 MB)
│ - Return addresses │
│ - Function parameters │
├─────────────────────────┤
│ (unused space) │
├─────────────────────────┤
│ Heap (grows up) │ ← Heap pointer
│ - Dynamic allocation │ Flexible, manual management
│ - new/malloc │ Size: GB+ (OS dependent)
│ - std::vector data │
│ - Large objects │
├─────────────────────────┤
│ Static/Global data │
│ - Global variables │ Fixed addresses
│ - Static variables │ Lifetime: entire program
│ - Constants │
├─────────────────────────┤
│ Code/Text segment │
│ - Program instructions│ Read-only
│ - Compiled functions │ Executable code
└─────────────────────────┘
Low memory addresses
How the stack works during function calls:
void bar(double x) {
double c = x + 1.0; // local variable on stack
}
void foo(double a) {
double b = 10.0; // local variable on stack
bar(a + b); // call bar
}
int main() {
foo(5.0); // call foo
return 0;
}
Stack evolution:
1. main() called:
┌──────────────┐
│ return addr │
└──────────────┘
2. foo(5.0) called:
┌──────────────┐
│ a = 5.0 │
│ b = 10.0 │
│ return addr │
├──────────────┤
│ return addr │ ← from main
└──────────────┘
3. bar(15.0) called:
┌──────────────┐
│ x = 15.0 │
│ c = 16.0 │
│ return addr │
├──────────────┤
│ a = 5.0 │
│ b = 10.0 │
│ return addr │
├──────────────┤
│ return addr │ ← from main
└──────────────┘
4. bar() returns, its frame is removed:
┌──────────────┐
│ a = 5.0 │
│ b = 10.0 │
│ return addr │
├──────────────┤
│ return addr │ ← from main
└──────────────┘
5. foo() returns, its frame is removed:
┌──────────────┐
│ return addr │ ← from main
└──────────────┘
Why use the stack for function calls?
Speed: Stack allocation is extremely fast—just move a pointer (no searching for free memory)
Automatic cleanup: When a function returns, its stack frame is automatically freed
Predictable lifetime: Local variables have clear scope and lifetime
CPU cache-friendly: Stack memory is accessed sequentially, which is optimal for CPU caches
No fragmentation: Memory is always allocated and freed in LIFO order
Simple and efficient: The hardware supports stack operations directly (push/pop instructions)
The heap
What is heap memory and how does it work?
Heap memory is a region for dynamic memory allocation where the size and lifetime of data can be determined at runtime.
Key characteristics:
Managed by the operating system’s memory allocator
Used for runtime memory allocation (new/delete in C++, malloc/free in C)
Much larger than the stack (can be GB+, limited by available RAM)
Memory persists until explicitly freed
Slower allocation than stack (requires searching for free blocks)
Risk of memory leaks if not properly managed
Heap memory visualization:
Heap grows upward (toward higher addresses)
Before allocation:
┌─────────────────────────────────┐
│ Free space (unallocated) │ Size: GB+
└─────────────────────────────────┘
After allocations:
┌─────────────────────────────────┐
│ Free space │
├─────────────────────────────────┤
│ Block 3: double[1000000] │ ← ptr3 (8 MB)
│ (vector data) │
├─────────────────────────────────┤
│ Free (after deletion) │ ← Fragmentation
├─────────────────────────────────┤
│ Block 2: std::complex<double> │ ← ptr2 (16 bytes)
├─────────────────────────────────┤
│ Block 1: double │ ← ptr1 (8 bytes)
│ Value: 3.14159 │
└─────────────────────────────────┘
Heap base
Example with actual memory operations:
// Allocate single value on heap
double* ptr1 = new double(3.14159);
// Allocates 8 bytes on heap
// ptr1 stores heap address (e.g., 0x6000012a4b00)
// Allocate complex number
std::complex<double>* ptr2 = new std::complex<double>(1.0, 2.0);
// Allocates 16 bytes on heap
// ptr2 stores heap address (e.g., 0x6000012a4b10)
// Allocate large array via vector (internal heap allocation)
std::vector<double> data(1000000);
// Allocates 8 MB on heap for array storage
// vector manages this pointer internally
delete ptr1; // Free the 8 bytes
delete ptr2; // Free the 16 bytes
// data automatically freed when vector goes out of scope
Comparison with stack allocation:
void example() {
// Stack allocation (fast, automatic)
double x = 3.14; // 8 bytes on stack
double arr[100]; // 800 bytes on stack
// Heap allocation (flexible, manual)
double* p = new double(2.71); // 8 bytes on heap
double* big = new double[1000000]; // 8 MB on heap
// Stack overflow risk
// double huge[10000000]; // Crash - too big for stack
delete p;
delete[] big;
} // Stack memory automatically freed here
When to use heap vs stack:
Use stack: Small, fixed-size local variables, function parameters
Use heap: Large data, dynamic size, data that outlives function scope, shared data
Real-world scientific computing example:
void process_diffraction_data(int num_patterns) {
// Stack: small, fixed data
int pattern_size = 512;
double threshold = 0.001;
// Heap: large, runtime-sized data
std::vector<double> patterns(num_patterns * pattern_size * pattern_size);
std::vector<std::complex<double>> fft_result(pattern_size * pattern_size);
// Process data...
// All heap memory automatically cleaned up when vectors destroyed
}
Why does the OS divide memory into stack and heap?
The OS divides memory to optimize different usage patterns:
Stack Memory
Fast, fixed size (typically 1-8 MB)
LIFO (Last-In, First-Out)
For local variables and function calls
Automatic memory management
No fragmentation
Best for: small, short-lived data
Heap Memory
Flexible, variable size (GB+)
For dynamic data structures
Manual memory management (or RAII with smart pointers)
Can fragment over time
Best for: large data, dynamic lifetime, shared ownership
Example showing the difference:
void bad_function() {
double data[10000000]; // Stack overflow - 80 MB on limited stack
}
void good_function() {
std::vector<double> data(10000000); // Heap allocation - no problem
}
Memory fragmentation
What is memory fragmentation?
Memory fragmentation is when free memory becomes divided into small, scattered blocks, making it difficult to allocate large contiguous chunks of memory.
Why does memory fragmentation occur?
Fragmentation happens when:
Memory blocks are allocated in different sizes
Blocks are freed in random order
Gaps form between allocated blocks
These gaps are too small for new allocations
How can you avoid memory fragmentation?
Key strategies:
Use memory pools
Allocate larger blocks upfront
Minimize frequent small allocations
Use fixed-size allocations when possible
Monitor memory usage patterns
Memory pools
What are memory pools?
Memory pools are pre-allocated blocks of memory that manage smaller allocations internally:
Allocate one large block upfront
Manage smaller allocations within that block
Reduce fragmentation
Improve allocation speed
Simplify memory management
Can you show an example of a memory pool in C++?
Here is a simple example of a memory pool in C++:
#include <iostream>
#include <new> // for std::bad_alloc
class MemoryPool {
public:
MemoryPool(size_t size) : poolSize(size), pool(new char[size]), offset(0) {}
~MemoryPool() {
delete[] pool;
}
void* allocate(size_t size) {
if (offset + size > poolSize) {
throw std::bad_alloc();
}
void* ptr = pool + offset;
offset += size;
return ptr;
}
void deallocate(void* ptr, size_t size) {
// In a simple memory pool, deallocation is often a no-op
}
private:
size_t poolSize;
char* pool;
size_t offset;
};
int main() {
MemoryPool memPool(1024); // Create a memory pool of 1024 bytes
int* intPtr = static_cast<int*>(memPool.allocate(sizeof(int)));
*intPtr = 42;
std::cout << "Allocated integer: " << *intPtr << std::endl;
memPool.deallocate(intPtr, sizeof(int)); // Deallocation is a no-op in this simple example
return 0;
}
Smart pointers
Memory management is error prone. C++ provides automatic memory management through smart pointers:
std::unique_ptr<T>: Exclusive ownership, auto-deletion when out of scope
std::shared_ptr<T>: Shared ownership with reference counting
std::weak_ptr<T>: Non-owning reference to prevent circular references
Practice examples
Pointer declarations
Basic pointer with actual memory addresses:
double* p; // pointer to double (uninitialized, contains garbage)
double x = 3.14159;
p = &x; // p now holds address of x (e.g., 0x7ffeeb5c4a18)
std::cout << "Value: " << x << std::endl; // 3.14159
std::cout << "Address: " << &x << std::endl; // 0x7ffeeb5c4a18
std::cout << "Pointer: " << p << std::endl; // 0x7ffeeb5c4a18
std::cout << "Dereferenced: " << *p << std::endl; // 3.14159
Multiple pointers on one line:
double *p1, *p2; // both are pointers to double
double* p1, p2; // p1 is a pointer to double, p2 is a double (confusing!)
Constant pointers:
double x = 3.14, y = 2.71;
const double* p1 = &x; // pointer to const double (can't modify *p1)
double* const p2 = &x; // const pointer (can't change address)
const double* const p3 = &x; // both const
Generic pointer:
void* vp = &x; // can point to any type, but you must cast to use it
Pointer to pointer:
double** pp = &p; // points to a pointer (e.g., stores 0x7ffeeb5c4a20)
Using references
How do you use references in C++?
double x = 3.14;
double& ref = x; // ref refers to x
ref = 2.71; // modifies x directly
std::cout << x; // prints 2.71
What problem does reference solve?
In C, you must use pointers to modify variables in functions:
void increment(double* p) {
(*p) += 1.0;
}
int main() {
double x = 5.5;
increment(&x); // must pass address
std::cout << x; // prints 6.5
}
This can lead to issues like null pointers or pointer arithmetic errors.
In C++, references provide a safer and more intuitive way:
void increment(double& r) {
r += 1.0;
}
int main() {
double x = 5.5;
increment(x); // pass by reference
std::cout << x; // prints 6.5
}
References eliminate pointer-related risks while maintaining efficiency.
What is the concept of “null” in C++?
In C++, “null” refers to a pointer that does not point to any valid object or memory location. It is represented by the keyword nullptr (introduced in C++11) or the macro NULL (from C). A null pointer indicates that the pointer is not currently assigned to any object.
Why do you want to have a pointer that is “null”?
Having a null pointer is useful for several reasons:
Indicating Absence: A null pointer can signify that a pointer is not currently pointing to any valid object, which is useful for optional data or uninitialized pointers.
Error Handling: Functions can return a null pointer to indicate failure or that no valid object could be returned.
Memory Management: It helps in managing dynamic memory by allowing pointers to be set to null after deallocation, preventing dangling pointers.
What’s a good analogy to understand pointers vs references?
If a pointer is like holding someone’s address written on paper, a reference is like a permanent portal directly into their house: you don’t need to look up the address, you just walk in.