C++
===

Nov 7, 2025 - This is a learning journal for C++ questions focused on scientific computing. I used C++ and C as an undergrad but never built scientific applications. Now, with GPU programming and high-performance computing, I want to revisit C++ for scientific computing.

**Purpose of this documentation**: This page is designed as a practical reference to help you read and write C++ code faster for scientific computing. It focuses on common patterns, syntax, and concepts you'll encounter in real codebases, with emphasis on floating-point arithmetic and numerical computing. The following is sourced from various online references, Copilot, and my own experiments. I do not claim originality for any of the content here.

Quick reference for ``&`` and ``*`` symbols
-------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 20 30 15 15

   * - Context
     - Code Example
     - Meaning of ``&``
     - Type of Thing
     - Behavior / Effect
   * - In a type declaration
     - ``int& ref = x;``
     - Declares a reference
     - Reference
     - ``ref`` becomes another name for ``x`` (no copy). Changing ``ref`` changes ``x``.
   * - In an expression
     - ``int* ptr = &x;``
     - Address-of operator
     - Pointer
     - ``&x`` gets the memory address of ``x``, stored in ``ptr``.
   * - Using pointer dereference
     - ``*ptr = 10;``
     - ``*`` means "value at address"
     - Pointer
     - Writes ``10`` into the variable ``ptr`` points to (here, ``x``).
   * - Combining both
     - ``int& alias = *ptr;``
     - ``*ptr`` gives the value of ``x``, ``alias`` refers to it
     - Reference + Pointer
     - ``alias`` becomes another name for whatever ``ptr`` points to.
   * - In a function parameter
     - ``void foo(int& n)``
     - Passed by reference
     - Reference
     - Function uses the original variable, not a copy.
   * - In a copy constructor
     - ``Tracker(const Tracker& other)``
     - Copy parameter safely and efficiently
     - Reference
     - Avoids recursion (no extra copies).

Pointer in practice
^^^^^^^^^^^^^^^^^^^

To get the device number in CUDA, you run this code:

.. code-block:: cpp

   int device;
   cudaGetDevice(&device);

The ``&`` means "address of". We want to pass the address of ``device`` so that ``cudaGetDevice`` can write the result into that memory location.

Now, if you look at the function signature for ``cudaGetDevice`` (simplified) and sketch what its implementation conceptually does:

.. code-block:: cpp

   cudaError_t cudaGetDevice(int* device)
   {
       // Ask the driver (lower-level API) which GPU is bound to this thread
       int currentDevice = driver_get_current_device();  // e.g. returns 1

       // Copy that value into the caller's memory
       *device = currentDevice;
       return cudaSuccess;
   }

The parameter type ``int*`` is a pointer that holds the address of the caller's integer. Inside the function, ``*device`` dereferences the pointer and writes the result into that memory location.
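The same out-parameter pattern appears outside CUDA whenever a function needs to report a status and a result at the same time. Below is a minimal, hypothetical sketch (``read_sensor`` and its value are made up for illustration):

.. code-block:: cpp

   #include <iostream>

   // Hypothetical function: returns success/failure, writes the result through the pointer
   bool read_sensor(double* out) {
       double measurement = 21.5;  // pretend this came from hardware
       *out = measurement;         // write into the caller's variable
       return true;                // success
   }

   int main() {
       double value = 0.0;
       if (read_sensor(&value)) {       // pass the address of `value`
           std::cout << value << '\n';  // prints 21.5
       }
   }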
Why ``const`` is needed for member functions
--------------------------------------------

The ``const`` keyword after a member function declaration indicates that the function will not modify the object's state. This is important for const-correctness and enables the function to be called on const objects.

.. code-block:: cpp

   class Rectangle {
       int width, height;
   public:
       void setWidth(int w) { width = w; }          // modifies state (non-const)
       int area() const { return width * height; }  // doesn't modify state (const)
   };

Key points
^^^^^^^^^^

When you write ``int area() const``, the ``const`` applies to the hidden ``this`` pointer inside the method. C++ secretly treats member functions like this:

.. code-block:: cpp

   int area(Rectangle* const this) { return this->width * this->height; }

But if you mark the function as ``const``, it becomes:

.. code-block:: cpp

   int area(const Rectangle* const this) { return this->width * this->height; }

Now ``this`` points to a ``const Rectangle``, meaning the compiler won't let you modify any member variables inside that function. This ensures:

- The function can be called on const objects
- The function is guaranteed not to modify the object's state
- Better code safety and clearer intent

In contrast with Python
^^^^^^^^^^^^^^^^^^^^^^^

Python makes ``self`` explicit:

.. code-block:: python

   class Rectangle:
       def area(self):
           return self.width * self.height

C++ makes ``this`` implicit - it's there, you just don't have to say it. When you write ``width`` in a C++ member function, the compiler automatically treats it as ``this->width``.

Numerical data types for scientific computing
---------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 15 25 40

   * - Type
     - Size
     - Precision
     - Scientific use
   * - ``float``
     - 4 bytes (32-bit)
     - ~7 decimal digits
     - GPU computing (faster), machine learning, graphics
   * - ``double``
     - 8 bytes (64-bit)
     - ~15-16 decimal digits
     - Default for scientific computing, general numerical work
   * - ``long double``
     - 8-16 bytes (implementation-defined)
     - ~18-19 decimal digits (x86 80-bit extended precision)
     - High-precision calculations, rarely needed
   * - ``std::complex<float>``
     - 8 bytes
     - Same as ``float``
     - Quantum mechanics, signal processing (GPU)
   * - ``std::complex<double>``
     - 16 bytes
     - Same as ``double``
     - Quantum mechanics, signal processing (default)
   * - ``int``
     - 4 bytes
     - Exact integers
     - Loop counters, array indices only

**Key points for scientific computing**:

- **Always use ``double`` by default** unless you have a specific reason (GPU memory, performance)
- **Use ``float`` for GPU computing** where memory bandwidth matters
- **Use ``std::complex`` for complex numbers** in quantum mechanics, FFTs, signal processing
- **Precision matters**: ``float`` accumulates error faster in iterative algorithms

.. dropdown:: Understanding floating-point precision
   :icon: info
   :color: info

   Floating-point numbers have limited precision:

   .. code-block:: cpp

      float x = 1.0f / 3.0f;  // 0.333333... (7 digits)
      double y = 1.0 / 3.0;   // 0.333333333333333... (15 digits)

      // Accumulated error example
      float sum_f = 0.0f;
      double sum_d = 0.0;
      for (int i = 0; i < 1000000; i++) {
          sum_f += 0.1f;  // Accumulates rounding error
          sum_d += 0.1;   // Better, but still not perfect
      }
      std::cout << sum_f << std::endl;  // e.g. ~99994.5 — noticeably off from 100000 (wrong!)
      std::cout << sum_d << std::endl;  // ~100000.0 (closer)

   **Rule of thumb**: For scientific computing with many iterations or accumulations, prefer ``double``. Use ``float`` only when profiling shows performance benefits.
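If you want to check these limits on your own machine instead of trusting the table, ``std::numeric_limits`` from ``<limits>`` reports them; a quick sketch:

.. code-block:: cpp

   #include <iostream>
   #include <limits>

   int main() {
       // Decimal digits that are always representable without loss
       std::cout << "float digits10:       " << std::numeric_limits<float>::digits10 << '\n';        // 6
       std::cout << "double digits10:      " << std::numeric_limits<double>::digits10 << '\n';       // 15
       std::cout << "long double digits10: " << std::numeric_limits<long double>::digits10 << '\n';  // platform-dependent (often 18)

       // Machine epsilon: smallest eps such that 1 + eps != 1
       std::cout << "float epsilon:  " << std::numeric_limits<float>::epsilon() << '\n';   // ~1.19e-07
       std::cout << "double epsilon: " << std::numeric_limits<double>::epsilon() << '\n';  // ~2.22e-16
   }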
Complex numbers in C++
----------------------

For quantum mechanics, signal processing, and Fourier transforms:

.. code-block:: cpp

   #include <cmath>
   #include <complex>

   // Declaration and initialization
   std::complex<double> z1(3.0, 4.0);      // 3 + 4i
   std::complex<double> z2 = {1.0, -2.0};  // 1 - 2i
   std::complex<float> z3(2.0f, 1.0f);     // Use float for GPU

   // Basic operations
   auto z_sum = z1 + z2;         // (4, 2)
   auto z_prod = z1 * z2;        // (11, -2)
   auto z_conj = std::conj(z1);  // 3 - 4i (complex conjugate)

   // Access real and imaginary parts
   double real_part = z1.real();  // 3.0
   double imag_part = z1.imag();  // 4.0

   // Magnitude and phase
   double magnitude = std::abs(z1);  // 5.0 = sqrt(3²+4²)
   double phase = std::arg(z1);      // ~0.927 radians

   // Exponential form (Euler's formula: e^(iθ) = cos(θ) + i*sin(θ))
   std::complex<double> z_polar = std::polar(5.0, 0.927);  // 5*e^(i*0.927)

**Common in scientific computing**:

.. code-block:: cpp

   // Wave functions in quantum mechanics
   std::complex<double> psi(0.707, 0.707);  // normalized state

   // Fourier transform (simplified)
   std::vector<std::complex<double>> fft_result(1024);

   // Complex exponentials for signal processing
   double omega = 2.0 * M_PI * 440.0;  // 440 Hz
   double t = 0.0;                     // time in seconds (example value)
   std::complex<double> signal = std::exp(std::complex<double>(0.0, omega * t));

Array initialization
--------------------

Understanding various initialization methods is essential because different codebases prefer different styles. Some projects use classic C-style initialization, while others adopt modern C++11 brace initialization. Being familiar with all approaches helps you read, understand, and contribute to diverse codebases more effectively.

1. Classic C-style
^^^^^^^^^^^^^^^^^^

.. code-block:: cpp

   double arr[3] = {1.0, 2.5, 3.14};

This is the most common way. Specify the size and the list. If you make the list shorter than the size, the remaining elements are initialized to zero:

.. code-block:: cpp

   double arr[5] = {1.0, 2.5};  // becomes {1.0, 2.5, 0.0, 0.0, 0.0}

If you omit the size, it is deduced from the initializer list:

.. code-block:: cpp

   double arr[] = {1.0, 2.5, 3.14};  // size is 3

2. Default (zero) initialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: cpp

   double arr[3] = {};    // all elements are 0.0
   double arr2[3] = {0};  // also all elements are 0.0

3. C++11 uniform (brace) initialization
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: cpp

   double arr[3]{1.0, 2.5, 3.14};  // same as = {1.0, 2.5, 3.14}, but modern style
   double arr2[]{1.0, 2.5, 3.14};  // size deduced automatically
   double arr3[5]{};               // all zeros

4. Using memset (low-level, C-style)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: cpp

   double arr[3];
   std::memset(arr, 0, sizeof(arr));  // set all bytes to 0 (needs <cstring>)

Note: This does not support non-zero initialization directly. **Use with caution for floating-point** - it's safer to use loop initialization or brace initialization.

5. Runtime initialization (loops or algorithms)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: cpp

   double arr[3];
   for (int i = 0; i < 3; ++i)
       arr[i] = static_cast<double>(i) + 1.5;  // {1.5, 2.5, 3.5}
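The same runtime initialization can be written with a standard algorithm instead of a hand-rolled loop; a small sketch using ``std::generate`` from ``<algorithm>``:

.. code-block:: cpp

   #include <algorithm>
   #include <cmath>
   #include <iterator>

   int main() {
       double arr[8];
       int i = 0;
       // Fill arr with sin(0.0), sin(0.1), sin(0.2), ...
       std::generate(std::begin(arr), std::end(arr),
                     [&i]() { return std::sin(0.1 * i++); });
   }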
Uninitialized arrays
^^^^^^^^^^^^^^^^^^^^

.. dropdown:: What happens with ``double arr[3];`` without initialization within a function?
   :icon: info
   :color: warning

   .. code-block:: cpp

      void foo() {
          double arr[3];
          std::cout << arr[0];  // ❌ undefined behavior (garbage)
      }

   ``arr`` is a local automatic variable. It lives on **stack memory**, and the contents are uninitialized.

   **What is stack memory?** Stack memory is a region of memory used for local variables and function calls. It operates in a Last-In-First-Out (LIFO) manner. When a function is called, space is allocated on the stack for its local variables. When the function returns, that space is freed.

   **Why is it undefined behavior?** When you declare ``double arr[3];`` without initialization, the compiler allocates space on the stack but does not clear or set those memory locations. Whatever data was previously stored in those memory locations remains there. This is called "garbage" data—it could be zeros, could be leftover values from previous function calls, or any random bit pattern. Reading this uninitialized memory leads to unpredictable program behavior, which is why it's called undefined behavior.

.. dropdown:: What about ``double arr[3];`` for global or static arrays?
   :icon: info
   :color: info

   .. code-block:: cpp

      double arr[3];  // global → all elements = 0.0

      void foo() {
          static double arr2[3];  // static local → also all 0.0
      }

   Global and static arrays are automatically zero-initialized.

   **What does "static" mean here?** The ``static`` keyword in this context means the variable has **static storage duration**. Unlike normal local variables that are created on the stack each time a function is called and destroyed when the function returns, a static local variable:

   - Is created only once, the first time the function is called
   - Persists for the entire lifetime of the program
   - Retains its value between function calls
   - Is stored in a separate memory region (not on the stack), similar to global variables
   - Is automatically zero-initialized by the compiler (just like global variables)

   Example:

   .. code-block:: cpp

      void counter() {
          static int count = 0;  // initialized only once
          count++;
          std::cout << count << std::endl;
      }

      int main() {
          counter();  // prints 1
          counter();  // prints 2
          counter();  // prints 3
      }

   The ``count`` variable persists between function calls, unlike a normal local variable which would reset to 0 each time.

Array initialization summary
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. list-table::
   :header-rows: 1
   :widths: 25 35 40

   * - Form
     - Example
     - Notes
   * - Explicit C-style
     - ``double arr[3] = {1.0, 2.5, 3.14};``
     - Classic
   * - Implicit size
     - ``double arr[] = {1.0, 2.5, 3.14};``
     - Compiler deduces size
   * - Partial init
     - ``double arr[5] = {1.0, 2.5};``
     - Remaining elements = 0.0
   * - Zero init
     - ``double arr[3] = {};``
     - All zeros
   * - Brace init
     - ``double arr[3]{1.0, 2.5, 3.14};``
     - C++11 style
   * - Heap array
     - ``double* arr = new double[3]{1.0, 2.5, 3.14};``
     - Manual memory
   * - ``std::array``
     - ``std::array<double, 3> a{1.0, 2.5, 3.14};``
     - Safer fixed-size
   * - ``std::vector``
     - ``std::vector<double> v{1.0, 2.5, 3.14};``
     - Resizable container

.. dropdown:: Why std::array and std::vector exist
   :icon: info
   :color: info

   C++ inherited raw C arrays (``double arr[5]``), but they have fundamental safety issues. The Standard Library introduced ``std::array`` and ``std::vector`` to solve these problems:

   **The problem with C arrays**

   .. code-block:: cpp

      double arr[5] = {1.0, 2.5, 3.14, 4.2, 5.7};

      // No size information - decays to pointer
      void process(double arr[]) {  // really just double*
          int size = sizeof(arr);   // Wrong! Gives pointer size, not array size
      }

      // No bounds checking
      arr[10] = 42.0;  // Compiles but writes to random memory (undefined behavior)

      // Can't be copied or returned easily
      double arr2[5] = arr;  // Compilation error

      // Can't be resized
      // No way to dynamically change size

   **std::array: Safe fixed-size arrays**

   Created to replace C arrays with zero runtime overhead while adding safety:
   .. code-block:: cpp

      std::array<double, 5> arr = {1.0, 2.5, 3.14, 4.2, 5.7};

      // Knows its size
      void process(std::array<double, 5>& arr) {
          std::size_t size = arr.size();  // Returns 5
      }

      // Optional bounds checking
      arr.at(10) = 42.0;  // Throws std::out_of_range
      arr[10] = 42.0;     // Undefined behavior (same as C array, but you have .at())

      // Can be copied and returned
      std::array<double, 5> arr2 = arr;  // Works

      // Works with STL algorithms
      std::sort(arr.begin(), arr.end());

   **Key limitation**: Size must be known at compile time. Same memory layout as C arrays (lives on stack).

   **std::vector: Dynamic arrays with memory management**

   Created to provide dynamic sizing with automatic memory management:

   .. code-block:: cpp

      std::vector<double> vec = {1.0, 2.5, 3.14, 4.2, 5.7};

      // Size determined at runtime
      int n;
      std::cin >> n;
      std::vector<double> data(n);  // Size from user input

      // Can grow and shrink
      vec.push_back(6.8);  // Add element
      vec.pop_back();      // Remove element
      vec.resize(100);     // Change size

      // Automatic memory management (RAII)
      {
          std::vector<double> temp(1000000);  // Allocates memory from heap
          // use temp...
      }  // Destructor automatically frees memory

      // Prevents stack overflow for large data
      std::vector<double> big_data(10000000);  // Works (heap allocation)

   **Key tradeoff**: Small overhead for dynamic capability (stores size, capacity, and pointer to heap memory).

   **Summary**

   - **C arrays**: Fast but unsafe, no size info, can't be copied, fixed size at compile time
   - **std::array**: Safe wrapper for C arrays, same performance, knows its size, but still fixed at compile time
   - **std::vector**: Dynamic sizing, automatic memory management, slight overhead, lives on heap

   Modern C++ code should prefer ``std::array`` for fixed sizes and ``std::vector`` for dynamic data. Use raw C arrays only when interfacing with C APIs or for specific performance reasons (measure first!).

Practical tips for working with large datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

When working with large arrays or datasets in scientific computing:

**Use ``std::vector`` for dynamic data**

.. code-block:: cpp

   std::vector<double> data(1000000);                      // 1 million doubles, all zeros
   std::vector<double> data2(1000000, 3.14);               // all initialized to π
   std::vector<std::complex<double>> wavefunction(1024);   // complex array

.. dropdown:: What does "automatically manage memory" mean?
   :icon: info
   :color: info

   Vectors handle allocation and deallocation behind the scenes. When you create a vector, it allocates memory from the heap. When the vector goes out of scope, it automatically frees that memory. You don't need to call ``new``/``delete`` or ``malloc``/``free`` manually.

   .. code-block:: cpp

      {
          std::vector<double> data(1000);  // memory allocated
          // use data...
      }  // vector destroyed, memory automatically freed

   This prevents memory leaks compared to manual management:

   .. code-block:: cpp

      double* data = new double[1000];  // manual allocation
      // use data...
      delete[] data;  // must remember to free! (easy to forget)

.. dropdown:: What about GPU computing and fine control?
   :icon: info
   :color: info

   For GPU computing (CUDA, OpenCL, etc.), you're right that you need fine control over memory transfers between CPU and GPU. In those cases:

   - **CPU-side preparation**: ``std::vector`` is commonly used to prepare data on the CPU side before transferring to GPU
   - **GPU memory**: You still use explicit GPU memory APIs (``cudaMalloc``, ``cudaMemcpy``, etc.) for GPU-side buffers
   - **Why still use vectors?**: Even in GPU programming, vectors are preferred on the CPU side for their safety and convenience. You get the data pointer with ``data.data()`` when you need to copy to the GPU.
   .. code-block:: cpp

      // Common pattern in CUDA programming
      std::vector<float> host_data(1000000, 1.0f);  // CPU side, safe and easy

      float* device_data;
      cudaMalloc(&device_data, host_data.size() * sizeof(float));  // GPU side
      cudaMemcpy(device_data, host_data.data(),
                 host_data.size() * sizeof(float), cudaMemcpyHostToDevice);  // transfer

      // ... GPU computation ...

      cudaFree(device_data);  // manual GPU cleanup
      // host_data automatically cleaned up when out of scope

   For high-performance computing, the pattern is: use modern C++ containers (vectors) for CPU data management, and explicit APIs for GPU/specialized memory.

**Use ``std::array`` for fixed-size data**

.. code-block:: cpp

   std::array<double, 1024> buffer{};                 // all zeros, safer than C arrays
   std::array<float, 3> position{0.0f, 0.0f, 0.0f};   // 3D coordinates

Provides bounds checking via ``.at()`` and works with STL algorithms.

.. dropdown:: What are STL algorithms?
   :icon: info
   :color: info

   STL (Standard Template Library) algorithms are pre-built functions in ``<algorithm>`` for common operations on containers. Examples:

   .. code-block:: cpp

      #include <algorithm>
      #include <array>

      std::array<double, 5> arr = {3.1, 1.4, 4.2, 1.0, 5.9};

      // Sort the array
      std::sort(arr.begin(), arr.end());

      // Find an element
      auto it = std::find(arr.begin(), arr.end(), 4.2);

      // Transform elements (square each value)
      std::transform(arr.begin(), arr.end(), arr.begin(),
                     [](double x) { return x * x; });

      // Count occurrences
      auto count = std::count(arr.begin(), arr.end(), 1.0);

   These work seamlessly with ``std::array`` and ``std::vector``, but not with raw C arrays.

**Analogy to NumPy**

``std::array`` and ``std::vector`` are analogous to NumPy's ``ndarray`` in Python:

.. code-block:: python

   # Python with NumPy
   import numpy as np
   arr = np.array([1.0, 2.5, 3.14, 4.2, 5.9])
   arr.sort()   # method on array
   arr.mean()   # built-in statistical function
   arr * 2      # vectorized operations

.. code-block:: cpp

   // C++ with std::array
   std::array<double, 5> arr = {1.0, 2.5, 3.14, 4.2, 5.9};
   std::sort(arr.begin(), arr.end());  // algorithm

   // No built-in mean, but can use STL accumulate (from <numeric>)
   double mean = std::accumulate(arr.begin(), arr.end(), 0.0) / arr.size();

   std::transform(arr.begin(), arr.end(), arr.begin(),
                  [](double x) { return x * 2.0; });  // element-wise operation

Key difference: NumPy has more built-in numerical methods (mean, std, etc.), while C++ requires STL algorithms or external libraries (like Eigen or Armadillo) for numerical computing. However, both provide contiguous memory layout for performance.

**Reserve capacity for vectors when size is known**

.. code-block:: cpp

   std::vector<double> data;
   data.reserve(1000000);  // pre-allocate memory to avoid reallocations

   for (int i = 0; i < 1000000; ++i) {
       data.push_back(static_cast<double>(i) * 0.001);  // convert to double
   }

This avoids multiple memory reallocations during growth.
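A quick way to see what ``reserve`` buys you is to watch ``capacity()`` as the vector grows; a small sketch (the exact growth pattern without ``reserve`` is implementation-defined):

.. code-block:: cpp

   #include <iostream>
   #include <vector>

   int main() {
       std::vector<double> grown;
       std::vector<double> reserved;
       reserved.reserve(1000);  // one allocation up front

       for (int i = 0; i < 1000; ++i) {
           grown.push_back(i * 0.001);     // may reallocate several times as capacity grows
           reserved.push_back(i * 0.001);  // never reallocates
       }

       std::cout << "grown.capacity():    " << grown.capacity() << '\n';     // implementation-defined, >= 1000
       std::cout << "reserved.capacity(): " << reserved.capacity() << '\n';  // at least 1000, no reallocation happened
   }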
**Avoid uninitialized arrays for large data**

.. code-block:: cpp

   // Bad: uninitialized, potentially dangerous (and ~8 MB may overflow the stack)
   double data[1000000];

   // Good: explicitly zero-initialized
   double data[1000000] = {};

   // Better: use vector
   std::vector<double> data(1000000);

For large datasets, uninitialized data can lead to hard-to-debug issues.

.. dropdown:: Why is vector better than array here?
   :icon: info
   :color: info

   **Automatic memory management**

   - Arrays like ``double data[1000000]`` allocate on the stack (limited size, typically a few MB)
   - Large arrays can cause stack overflow
   - Vectors allocate on the heap automatically, no size limit concerns

   **Safety and flexibility**

   - If you need to change the size later, vectors can resize; arrays cannot
   - Vectors know their own size (``data.size()``); arrays do not
   - Vectors automatically clean up when going out of scope

   Example of the problem with large stack arrays:

   .. code-block:: cpp

      void foo() {
          double data[10000000];  // Stack overflow (80 MB on stack)
      }

      void bar() {
          std::vector<double> data(10000000);  // Works fine (heap allocation)
      }

.. dropdown:: When should I use ``std::array`` vs ``std::vector``?
   :icon: info
   :color: info

   **Use ``std::array`` when:**

   - Size is fixed and known at compile time
   - Small, performance-critical code (stack allocation is faster)
   - You want array bounds checking with ``.at()``

   .. code-block:: cpp

      std::array<double, 256> buffer{};  // fixed size, stack allocated

   **Use ``std::vector`` when:**

   - Size is determined at runtime or might change
   - Size is large (> few KB)
   - You need to resize, add, or remove elements

   .. code-block:: cpp

      std::vector<double> data(n);  // n determined at runtime
      data.push_back(42.5);         // can grow dynamically

   **Use raw C arrays when:**

   - Working with legacy code or C APIs
   - Interfacing with GPU memory APIs
   - Extreme performance optimization (but measure first!)

   .. code-block:: cpp

      double buffer[256];  // C-style, minimal overhead

**Use memory-efficient initialization for specific patterns**

.. code-block:: cpp

   // Fill with specific value
   std::vector<double> data(1000000, 3.14);

   // Or use std::fill
   std::vector<double> data(1000000);
   std::fill(data.begin(), data.end(), 3.14);

   // Generate values with a pattern (std::iota is in <numeric>)
   std::vector<double> data(1000);
   std::iota(data.begin(), data.end(), 0.0);  // 0.0, 1.0, 2.0, ..., 999.0

**Only zero-fill when you need zero**

Zeroing a large buffer is O(n). If you will overwrite everything anyway, skip the zeroing and write the data once:

.. code-block:: cpp

   // Wasteful: zeros then overwrites
   std::vector<double> data(1000000);  // zeros all elements
   for (int i = 0; i < 1000000; ++i) {
       data[i] = compute_value(i);  // overwrites the zeros
   }

   // Better: allocate but don't initialize, then fill once
   std::vector<double> data;
   data.reserve(1000000);  // allocate space without initializing
   for (int i = 0; i < 1000000; ++i) {
       data.push_back(compute_value(i));  // write data once
   }

**Prefer standard containers over raw arrays**

Standard containers like ``std::vector`` and ``std::array`` offer significant advantages over raw C arrays:

Fixed size known at compile time:

.. code-block:: cpp

   std::array<double, 1000> data{};  // safer than double data[1000]

Large or variable size:

.. code-block:: cpp

   std::vector<double> data(1000000);  // contiguous, cache-friendly

**Why prefer standard containers?**

- **Automatic memory management**: ``std::vector`` handles allocation and deallocation, preventing memory leaks
- **Bounds checking**: ``std::array::at()`` and ``std::vector::at()`` always check bounds and throw on out-of-range access
- **Size tracking**: containers know their own size (``data.size()``), unlike raw arrays
- **STL algorithm compatibility**: works seamlessly with ``std::sort``, ``std::find``, ``std::transform``, etc.
- **Exception safety**: proper cleanup on exceptions, unlike raw arrays with manual memory management
- **Resizable**: ``std::vector`` can grow dynamically without manual reallocation
- **No pointer arithmetic errors**: iterators are safer than pointer manipulation
- **Similar to NumPy arrays**: just as NumPy arrays are preferred over raw Python lists for numerical work, ``std::vector`` is preferred for similar reasons—contiguous memory layout enables cache efficiency and SIMD optimizations

Pointers
--------

**What is a pointer?**

A pointer is a variable that stores the memory address of another object or function. Pointers provide direct memory access and are essential for dynamic memory allocation, data structures, and interfacing with C libraries. They are more primitive and potentially dangerous than references: they can be null, uninitialized, or point to arbitrary memory locations, which makes them both powerful and risky to use.

**Pointer basics**

- Pointers are stored in memory managed by the operating system
- Size: 4 bytes on 32-bit systems, 8 bytes on 64-bit systems (need to address 2^32 or 2^64 memory locations)
- Example memory address: ``0x7ffee3bff6c8`` (hexadecimal format)

**Why use pointers?**

- **Dynamic memory**: Allocate memory at runtime (essential for large datasets)
- **Efficiency**: Pass large objects without copying
- **Data structures**: Build linked lists, trees, graphs
- **Hardware access**: Interface with GPU memory, C APIs, hardware registers
- **Polymorphism**: Access derived classes through base class pointers

**Null pointers**

In C++, null refers to a pointer that does not point to any valid object. It is represented by ``nullptr`` (C++11) or ``NULL`` (from C). Why use null pointers?

- **Indicating absence**: Signify that a pointer is not currently pointing to any valid object
- **Error handling**: Functions can return null to indicate failure
- **Memory management**: Set pointers to null after deallocation to prevent dangling pointers

**Declaring pointers with concrete examples**

Pointers have types and can be null. Each pointer declaration specifies what type of data it points to. Here's what actually happens in memory:

.. code-block:: cpp

   #include <iostream>

   double x = 3.14159;
   double* p1 = &x;  // p1 stores address of x

   std::cout << "Value of x: " << x << std::endl;         // 3.14159
   std::cout << "Address of x: " << &x << std::endl;      // 0x7ffeeb5c4a18 (example)
   std::cout << "Value of p1: " << p1 << std::endl;       // 0x7ffeeb5c4a18 (same address)
   std::cout << "Dereferenced p1: " << *p1 << std::endl;  // 3.14159 (accesses value at address)

**Memory visualization**:

.. code-block:: text

   Memory Address    Variable    Value
   ──────────────    ────────    ─────────────────
   0x7ffeeb5c4a18    x           3.14159 (double, 8 bytes)
   0x7ffeeb5c4a20    p1          0x7ffeeb5c4a18 (pointer, 8 bytes)
                                 └─ points to x

**Common pointer types for scientific computing**:

.. code-block:: cpp

   double* ptr1;                // pointer to double (most common in scientific code)
   float* ptr2;                 // pointer to float (GPU computing)
   std::complex<double>* ptr3;  // pointer to complex number
   const double* ptr4;          // pointer to constant double (can't modify *ptr4)
   double* const ptr5 = &x;     // constant pointer (can't change ptr5 itself)
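Because each pointer knows the size of the type it points to, adding 1 to a pointer advances by ``sizeof(type)`` bytes, not by one byte. A small sketch:

.. code-block:: cpp

   #include <iostream>

   int main() {
       double data[3] = {1.0, 2.0, 3.0};
       double* p = data;  // points to data[0]

       std::cout << *p << '\n';        // 1.0
       std::cout << *(p + 1) << '\n';  // 2.0 -- p + 1 advances by sizeof(double) = 8 bytes
       std::cout << p[2] << '\n';      // 3.0 -- p[2] is the same as *(p + 2)
   }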
**Important**: Uninitialized pointers contain garbage values (random addresses). Always initialize:

.. code-block:: cpp

   double x = 2.718;
   double* ptr = nullptr;  // Good: explicitly null
   double* ptr2 = &x;      // Good: points to valid object (address: e.g., 0x7ffeeb5c4b00)
   double* ptr3;           // Bad: contains garbage address (e.g., 0x000deadbeef)
   *ptr3 = 5.0;            // Crash - writing to random memory

The type of a pointer determines:

- What type of data it can point to
- How pointer arithmetic works (e.g., ``ptr + 1`` moves by ``sizeof(type)`` bytes)
- Whether it can be null (all pointer types can be ``nullptr``)

**Dereferencing pointers**

Use the asterisk (*) operator to access the value a pointer points to:

.. code-block:: cpp

   double x = 42.5;
   double* ptr = &x;     // ptr stores address of x (e.g., 0x7ffeeb5c4c10)
   double value = *ptr;  // dereference: value = 42.5

   *ptr = 99.9;          // modify through pointer
   std::cout << x;       // x is now 99.9

**Practical example: passing arrays to functions**

.. code-block:: cpp

   double compute_mean(const double* data, int size) {
       double sum = 0.0;
       for (int i = 0; i < size; i++) {
           sum += data[i];  // same as *(data + i)
       }
       return sum / size;
   }

   std::vector<double> measurements = {1.2, 3.4, 5.6, 7.8};
   double mean = compute_mean(measurements.data(), measurements.size());

**Dangling pointers**

A dangling pointer does not point to a valid object. This typically occurs when the object was deleted or deallocated, but the pointer was not updated. Accessing a dangling pointer leads to undefined behavior, crashes, or data corruption.

.. code-block:: cpp

   double* ptr = new double(3.14);
   delete ptr;      // memory freed
   // ptr still holds old address (e.g., 0x600001234567)
   *ptr = 2.71;     // Crash - memory no longer valid

   ptr = nullptr;   // Good practice: mark as invalid

References
----------

**What is a reference?**

A reference in C++ is an alias for another variable, created using the ampersand (&) symbol. Once a reference is initialized, it cannot be changed to refer to another variable.

.. code-block:: cpp

   double x = 3.14;
   double& ref = x;  // ref is an alias for x
   ref = 2.71;       // modifies x directly
   std::cout << x;   // prints 2.71

**Why use references instead of pointers?**

References provide a safer alternative to pointers:

- Cannot be null or uninitialized (must always refer to a valid object)
- More intuitive syntax (no need for dereferencing)
- Prevent common pointer-related errors
- Avoid copying large objects when passing to functions

.. code-block:: cpp

   // Pass by reference: no copy, direct access
   void increment(double& value) {
       value += 1.0;  // modifies the original
   }

   double x = 5.5;
   increment(x);  // x is now 6.5

**Analogy: pointers vs references**

If a pointer is like holding someone's address written on paper, a reference is like being given a permanent portal directly into their house—you don't need to look up the address, you just walk in.

**Pointers vs References comparison**

.. list-table::
   :header-rows: 1
   :widths: 20 40 40

   * - Feature
     - Pointer
     - Reference
   * - Can be null?
     - Yes
     - No (must refer to something valid)
   * - Can be reseated?
     - Yes (``p = &y;``)
     - No (always refers to the same object)
   * - Syntax for access
     - ``*p`` (dereference)
     - Just use the name
   * - Used for
     - Dynamic memory, low-level control
     - Function parameters, operator overloads, clean aliasing

.. dropdown:: How do you use references in C++?
   :icon: info
   :color: info

   .. code-block:: cpp

      int x = 10;
      int& ref = x;    // ref refers to x
      ref = 20;        // modifies x directly
      std::cout << x;  // prints 20
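References are also the standard way to pass large containers into functions without copying them; a minimal sketch using a ``const`` reference (the function name ``sum_of`` is illustrative):

.. code-block:: cpp

   #include <iostream>
   #include <vector>

   // Pass by const reference: no copy is made, and the function cannot modify the data
   double sum_of(const std::vector<double>& data) {
       double total = 0.0;
       for (double x : data) {
           total += x;
       }
       return total;
   }

   int main() {
       std::vector<double> measurements(1'000'000, 0.5);    // ~8 MB of data
       std::cout << sum_of(measurements) << '\n';           // no 8 MB copy happens here
   }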
.. dropdown:: What problem does reference solve?
   :icon: info
   :color: info

   In C, you must use pointers to modify variables in functions:

   .. code-block:: cpp

      void increment(int* p) { (*p)++; }

      int main() {
          int x = 5;
          increment(&x);   // must pass address
          std::cout << x;  // prints 6
      }

   This can lead to issues like null pointers or pointer arithmetic errors. In C++, references provide a safer and more intuitive way:

   .. code-block:: cpp

      void increment(int& r) { r++; }

      int main() {
          int x = 5;
          increment(x);    // pass by reference
          std::cout << x;  // prints 6
      }

   References eliminate pointer-related risks while maintaining efficiency.

CUDA C++ extensions for PyTorch
-------------------------------

When writing custom CUDA kernels for PyTorch (e.g., for ptychography reconstruction, FFTs, or custom neural network layers), you divide code into **host** (CPU) and **device** (GPU) operations.

Host code responsibilities
^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Input validation and tensor interfacing
"""""""""""""""""""""""""""""""""""""""""""

.. code-block:: cpp

   #include <torch/extension.h>

   torch::Tensor ptychography_forward(
       torch::Tensor probe,      // complex probe function
       torch::Tensor object,     // object transmission
       torch::Tensor positions   // scan positions
   ) {
       // Validate inputs
       TORCH_CHECK(probe.is_cuda(), "probe must be CUDA tensor");
       TORCH_CHECK(probe.dtype() == torch::kComplexFloat, "probe must be complex64");

       // Get tensor dimensions
       int64_t batch_size = probe.size(0);
       int64_t probe_size = probe.size(1);

2. Memory allocation and data preparation
"""""""""""""""""""""""""""""""""""""""""

.. code-block:: cpp

   // Allocate output tensor on GPU
   auto options = torch::TensorOptions()
       .dtype(torch::kComplexFloat)
       .device(probe.device());
   auto output = torch::zeros({batch_size, probe_size, probe_size}, options);

   // Get raw pointers for CUDA kernel
   auto probe_ptr  = probe.data_ptr<c10::complex<float>>();
   auto object_ptr = object.data_ptr<c10::complex<float>>();
   auto output_ptr = output.data_ptr<c10::complex<float>>();

3. CUDA kernel launch configuration
"""""""""""""""""""""""""""""""""""

.. code-block:: cpp

   // Calculate grid and block dimensions
   const int threads = 256;
   const int blocks = (batch_size * probe_size * probe_size + threads - 1) / threads;

   // Launch kernel
   ptychography_kernel<<<blocks, threads>>>(
       probe_ptr, object_ptr, output_ptr,
       batch_size, probe_size
   );

   // Check for kernel errors
   cudaError_t error = cudaGetLastError();
   TORCH_CHECK(error == cudaSuccess, "CUDA kernel failed: ", cudaGetErrorString(error));

4. Synchronization and output handling
""""""""""""""""""""""""""""""""""""""

.. code-block:: cpp

       // Wait for kernel completion
       cudaDeviceSynchronize();

       return output;  // Return PyTorch tensor
   }

   // Bind to Python
   PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
       m.def("forward", &ptychography_forward, "Ptychography forward pass");
   }

Device code (CUDA kernel)
^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: cuda

   __global__ void ptychography_kernel(
       const c10::complex<float>* probe,
       const c10::complex<float>* object,
       c10::complex<float>* output,
       int batch_size,
       int probe_size
   ) {
       int idx = blockIdx.x * blockDim.x + threadIdx.x;
       int total = batch_size * probe_size * probe_size;

       if (idx < total) {
           // Complex multiplication for ptychography
           output[idx] = probe[idx] * object[idx];
       }
   }

Why this division?
^^^^^^^^^^^^^^^^^^

- **Host**: Handles PyTorch tensor interface, memory management, validation, kernel orchestration
- **Device**: Pure computation (complex arithmetic, FFTs, ptychography operations)
- **PyTorch integration**: Host code bridges Python/C++/CUDA seamlessly

Common host operations for scientific computing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- Input/output tensor validation (types, shapes, device placement)
- Memory allocation (output tensors, workspace buffers)
- Data layout conversions (contiguous memory, proper strides)
- Kernel launch parameters (grid/block sizing)
- Error checking and synchronization
- Gradient computation setup (for backward pass)
- Multi-GPU coordination (if using distributed computing)

Typical workflow pattern
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: cpp

   // 1. Validate inputs (host)
   TORCH_CHECK(input.is_cuda(), "input must be on CUDA");
   TORCH_CHECK(input.is_contiguous(), "input must be contiguous");

   // 2. Allocate outputs (host)
   auto output = torch::empty_like(input);

   // 3. Get raw pointers (host)
   auto input_ptr = input.data_ptr<float>();
   auto output_ptr = output.data_ptr<float>();

   // 4. Launch kernel (host calls device)
   const int threads = 256;
   const int blocks = (input.numel() + threads - 1) / threads;
   my_kernel<<<blocks, threads>>>(input_ptr, output_ptr, input.numel());

   // 5. Synchronize (host)
   cudaDeviceSynchronize();

   // 6. Return PyTorch tensor (host)
   return output;

Building and using the extension
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**setup.py for compilation**:

.. code-block:: python

   from setuptools import setup
   from torch.utils.cpp_extension import BuildExtension, CUDAExtension

   setup(
       name='ptychography_cuda',
       ext_modules=[
           CUDAExtension(
               'ptychography_cuda',
               ['ptychography.cpp', 'ptychography_kernel.cu'],
               extra_compile_args={
                   'cxx': ['-O3'],
                   'nvcc': ['-O3', '--use_fast_math']
               }
           )
       ],
       cmdclass={'build_ext': BuildExtension}
   )

**Using in Python**:

.. code-block:: python

   import torch
   import ptychography_cuda

   # Create complex tensors on GPU
   probe = torch.randn(32, 256, 256, dtype=torch.complex64, device='cuda')
   object_wave = torch.randn(256, 256, dtype=torch.complex64, device='cuda')
   positions = torch.randn(32, 2, device='cuda')

   # Call custom CUDA kernel
   result = ptychography_cuda.forward(probe, object_wave, positions)

Memory management
-----------------

Memory allocation in C
^^^^^^^^^^^^^^^^^^^^^^

.. dropdown:: How does memory allocation work in C?
   :icon: info
   :color: info

   Memory management in C uses ``<stdlib.h>`` (or ``<cstdlib>`` in C++):

   - ``malloc(size)``: Allocates heap memory and returns a ``void*`` pointer
   - ``free(ptr)``: Deallocates memory allocated by ``malloc``

   Example: ``int* ptr = (int*)malloc(sizeof(int) * n);``

Memory allocation in C++
^^^^^^^^^^^^^^^^^^^^^^^^

.. dropdown:: What are the new and delete operators in C++?
   :icon: info
   :color: info

   In C++, the ``new`` operator allocates memory for an object or array on the heap, while the ``delete`` operator deallocates that memory when it is no longer needed.

   - To allocate memory for a single object:

     .. code-block:: cpp

        int* ptr = new int;  // Allocates memory for a single integer

   - To allocate memory for an array of objects:

     .. code-block:: cpp

        int* arr = new int[10];  // Allocates memory for an array of 10 integers

   - To deallocate memory for a single object:

     .. code-block:: cpp

        delete ptr;  // Deallocates memory for the single integer

   - To deallocate memory for an array of objects:

     .. code-block:: cpp

        delete[] arr;  // Deallocates memory for the array of integers

..
dropdown:: What is the difference between ``malloc/free`` in C and ``new/delete`` in C++? :icon: info :color: info ``malloc/free`` deal with raw bytes, while ``new/delete`` handle typed objects and call constructors/destructors. .. code-block:: cpp class Widget { public: Widget() { std::cout << "Constructing\n"; } ~Widget() { std::cout << "Destructing\n"; } }; Widget* w = new Widget(); // allocates and constructs delete w; // destructs and deallocates The ``~Widget()`` destructor is called when ``delete`` is used, ensuring proper cleanup of resources. The ``Widget`` object is constructed when allocated with ``new``. Memory layout and organization ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ **Why do we need organized memory layout?** Memory organization is fundamental to how programs execute. Memory isn't just for storing variables and arrays—it's also for storing function call information (parameters, return addresses, local variables), program instructions (the compiled code itself), and dynamically allocated data. The operating system and CPU work together to organize memory into different regions, each optimized for specific purposes: - **Code segment**: Stores your compiled program instructions - **Static/global data**: Stores global and static variables with fixed addresses - **Stack**: Manages function calls and local variables with automatic cleanup - **Heap**: Provides flexible dynamic memory for data whose size or lifetime isn't known at compile time This division allows the CPU to optimize memory access patterns, prevents different types of data from interfering with each other, and enables efficient memory management. Without this organization, programs would be slower, less reliable, and more prone to crashes. The stack """"""""" .. dropdown:: What is the stack and why is it used for function calls? :icon: info :color: info **The stack** is a region of memory with Last-In-First-Out (LIFO) structure, used for managing function calls and local variables. **Complete memory layout visualization**:: High memory addresses ┌─────────────────────────┐ │ Stack (grows down) │ ← Stack pointer (SP) │ - Function calls │ Fast, automatic cleanup │ - Local variables │ Size limit (~1-8 MB) │ - Return addresses │ │ - Function parameters │ ├─────────────────────────┤ │ (unused space) │ ├─────────────────────────┤ │ Heap (grows up) │ ← Heap pointer │ - Dynamic allocation │ Flexible, manual management │ - new/malloc │ Size: GB+ (OS dependent) │ - std::vector data │ │ - Large objects │ ├─────────────────────────┤ │ Static/Global data │ │ - Global variables │ Fixed addresses │ - Static variables │ Lifetime: entire program │ - Constants │ ├─────────────────────────┤ │ Code/Text segment │ │ - Program instructions│ Read-only │ - Compiled functions │ Executable code └─────────────────────────┘ Low memory addresses **How the stack works during function calls**: .. code-block:: cpp void bar(double x) { double c = x + 1.0; // local variable on stack } void foo(double a) { double b = 10.0; // local variable on stack bar(a + b); // call bar } int main() { foo(5.0); // call foo return 0; } **Stack evolution**:: 1. main() called: ┌──────────────┐ │ return addr │ └──────────────┘ 2. foo(5.0) called: ┌──────────────┐ │ a = 5.0 │ │ b = 10.0 │ │ return addr │ ├──────────────┤ │ return addr │ ← from main └──────────────┘ 3. bar(15.0) called: ┌──────────────┐ │ x = 15.0 │ │ c = 16.0 │ │ return addr │ ├──────────────┤ │ a = 5.0 │ │ b = 10.0 │ │ return addr │ ├──────────────┤ │ return addr │ ← from main └──────────────┘ 4. 
bar() returns, its frame is removed:

      ┌──────────────┐
      │ x = 15.0     │  (popped)
      ├──────────────┤
      │ a = 5.0      │
      │ b = 10.0     │
      │ return addr  │
      ├──────────────┤
      │ return addr  │  ← from main
      └──────────────┘

   5. foo() returns, its frame is removed:

      ┌──────────────┐
      │ return addr  │  ← from main
      └──────────────┘

   **Why use the stack for function calls?**

   - **Speed**: Stack allocation is extremely fast—just move a pointer (no searching for free memory)
   - **Automatic cleanup**: When a function returns, its stack frame is automatically freed
   - **Predictable lifetime**: Local variables have clear scope and lifetime
   - **CPU cache-friendly**: Stack memory is accessed sequentially, which is optimal for CPU caches
   - **No fragmentation**: Memory is always allocated and freed in LIFO order
   - **Simple and efficient**: The hardware supports stack operations directly (push/pop instructions)

The heap
""""""""

.. dropdown:: What is heap memory and how does it work?
   :icon: info
   :color: info

   **Heap memory** is a region for dynamic memory allocation where the size and lifetime of data can be determined at runtime.

   **Key characteristics**:

   - Managed by the operating system's memory allocator
   - Used for runtime memory allocation (``new``/``delete`` in C++, ``malloc``/``free`` in C)
   - Much larger than stack (can be GB+, limited by available RAM)
   - Memory persists until explicitly freed
   - Slower allocation than stack (requires searching for free blocks)
   - Risk of memory leaks if not properly managed

   **Heap memory visualization**::

      Heap grows upward (toward higher addresses)

      Before allocation:
      ┌─────────────────────────────────┐
      │ Free space (unallocated)        │  Size: GB+
      └─────────────────────────────────┘

      After allocations:
      ┌─────────────────────────────────┐
      │ Free space                      │
      ├─────────────────────────────────┤
      │ Block 3: double[1000000]        │  ← ptr3 (8 MB)
      │ (vector data)                   │
      ├─────────────────────────────────┤
      │ Free (after deletion)           │  ← Fragmentation
      ├─────────────────────────────────┤
      │ Block 2: std::complex<double>   │  ← ptr2 (16 bytes)
      ├─────────────────────────────────┤
      │ Block 1: double                 │  ← ptr1 (8 bytes)
      │ Value: 3.14159                  │
      └─────────────────────────────────┘
      Heap base

   **Example with actual memory operations**:

   .. code-block:: cpp

      // Allocate single value on heap
      double* ptr1 = new double(3.14159);
      // Allocates 8 bytes on heap
      // ptr1 stores heap address (e.g., 0x6000012a4b00)

      // Allocate complex number
      std::complex<double>* ptr2 = new std::complex<double>(1.0, 2.0);
      // Allocates 16 bytes on heap
      // ptr2 stores heap address (e.g., 0x6000012a4b10)

      // Allocate large array via vector (internal heap allocation)
      std::vector<double> data(1000000);
      // Allocates 8 MB on heap for array storage
      // vector manages this pointer internally

      delete ptr1;  // Free the 8 bytes
      delete ptr2;  // Free the 16 bytes
      // data automatically freed when vector goes out of scope

   **Comparison with stack allocation**:

   .. code-block:: cpp

      void example() {
          // Stack allocation (fast, automatic)
          double x = 3.14;  // 8 bytes on stack
          double arr[100];  // 800 bytes on stack

          // Heap allocation (flexible, manual)
          double* p = new double(2.71);       // 8 bytes on heap
          double* big = new double[1000000];  // 8 MB on heap

          // Stack overflow risk
          // double huge[10000000];  // Crash - too big for stack

          delete p;
          delete[] big;
      }  // Stack memory automatically freed here

   **When to use heap vs stack**:

   - **Use stack**: Small, fixed-size local variables, function parameters
   - **Use heap**: Large data, dynamic size, data that outlives function scope, shared data
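   The "outlives the function" point deserves a tiny sketch: returning the address of a stack local is a bug, while heap-backed data (here owned by ``std::vector``) can safely outlive the function that created it:

   .. code-block:: cpp

      #include <vector>

      double* broken() {
          double local = 3.14;
          return &local;  // Bug: `local` dies when the function returns (dangling pointer)
      }

      std::vector<double> fine() {
          std::vector<double> data(1000, 3.14);  // heap-backed storage
          return data;                           // safe: ownership moves to the caller
      }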
   **Real-world scientific computing example**:

   .. code-block:: cpp

      void process_diffraction_data(int num_patterns) {
          // Stack: small, fixed data
          int pattern_size = 512;
          double threshold = 0.001;

          // Heap: large, runtime-sized data
          std::vector<double> patterns(num_patterns * pattern_size * pattern_size);
          std::vector<std::complex<double>> fft_result(pattern_size * pattern_size);

          // Process data...

          // All heap memory automatically cleaned up when vectors destroyed
      }

.. dropdown:: Why does the OS divide memory into stack and heap?
   :icon: info
   :color: info

   The OS divides memory to optimize different usage patterns:

   **Stack Memory**

   - Fast, fixed size (typically 1-8 MB)
   - LIFO (Last-In, First-Out)
   - For local variables and function calls
   - Automatic memory management
   - No fragmentation
   - Best for: small, short-lived data

   **Heap Memory**

   - Flexible, variable size (GB+)
   - For dynamic data structures
   - Manual memory management (or RAII with smart pointers)
   - Can fragment over time
   - Best for: large data, dynamic lifetime, shared ownership

   **Example showing the difference**:

   .. code-block:: cpp

      void bad_function() {
          double data[10000000];  // Stack overflow - 80 MB on limited stack
      }

      void good_function() {
          std::vector<double> data(10000000);  // Heap allocation - no problem
      }

Memory fragmentation
^^^^^^^^^^^^^^^^^^^^

.. dropdown:: What is memory fragmentation?
   :icon: info
   :color: info

   Memory fragmentation is when free memory becomes divided into small, scattered blocks, making it difficult to allocate large contiguous chunks of memory.

.. dropdown:: Why does memory fragmentation occur?
   :icon: info
   :color: info

   Fragmentation happens when:

   - Memory blocks are allocated in different sizes
   - Blocks are freed in random order
   - Gaps form between allocated blocks
   - These gaps are too small for new allocations

.. dropdown:: How can you avoid memory fragmentation?
   :icon: info
   :color: info

   Key strategies:

   - Use memory pools
   - Allocate larger blocks upfront
   - Minimize frequent small allocations
   - Use fixed-size allocations when possible
   - Monitor memory usage patterns

Memory pools
^^^^^^^^^^^^

.. dropdown:: What are memory pools?
   :icon: info
   :color: info

   Memory pools are pre-allocated blocks of memory that manage smaller allocations internally:

   - Allocate one large block upfront
   - Manage smaller allocations within that block
   - Reduce fragmentation
   - Improve allocation speed
   - Simplify memory management

.. dropdown:: Can you show an example of a memory pool in C++?
   :icon: info
   :color: info

   Here is a simple example of a memory pool in C++:

   .. code-block:: cpp

      #include <cstddef>
      #include <iostream>
      #include <new>

      class MemoryPool {
      public:
          MemoryPool(size_t size) : poolSize(size), pool(new char[size]), offset(0) {}

          ~MemoryPool() { delete[] pool; }

          void* allocate(size_t size) {
              if (offset + size > poolSize) {
                  throw std::bad_alloc();
              }
              void* ptr = pool + offset;
              offset += size;
              return ptr;
          }

          void deallocate(void* ptr, size_t size) {
              // In a simple memory pool, deallocation is often a no-op
          }

      private:
          size_t poolSize;
          char* pool;
          size_t offset;
      };

      int main() {
          MemoryPool memPool(1024);  // Create a memory pool of 1024 bytes

          int* intPtr = static_cast<int*>(memPool.allocate(sizeof(int)));
          *intPtr = 42;
          std::cout << "Allocated integer: " << *intPtr << std::endl;

          memPool.deallocate(intPtr, sizeof(int));  // Deallocation is a no-op in this simple example
          return 0;
      }

Smart pointers
^^^^^^^^^^^^^^

Memory management is error prone. C++ provides automatic memory management through smart pointers:

- ``std::unique_ptr``: Exclusive ownership, auto-deletion when out of scope
- ``std::shared_ptr``: Shared ownership with reference counting
- ``std::weak_ptr``: Non-owning reference to prevent circular references
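A minimal sketch of the first two in use (``std::make_unique``/``std::make_shared`` are the standard factory functions from ``<memory>``):

.. code-block:: cpp

   #include <memory>
   #include <vector>

   int main() {
       // Exclusive ownership: freed automatically when `grid` goes out of scope
       auto grid = std::make_unique<std::vector<double>>(1'000'000, 0.0);
       (*grid)[0] = 3.14;

       // Shared ownership: freed when the last shared_ptr is destroyed
       auto params = std::make_shared<std::vector<double>>(3, 1.0);
       auto alias  = params;  // reference count is now 2
       (*alias)[1] = 2.71;    // both names see the same data

   }  // no delete needed anywhere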
Practice examples
-----------------

Pointer declarations
^^^^^^^^^^^^^^^^^^^^

Basic pointer with actual memory addresses:

.. code-block:: cpp

   double* p;           // pointer to double (uninitialized, contains garbage)
   double x = 3.14159;
   p = &x;              // p now holds address of x (e.g., 0x7ffeeb5c4a18)

   std::cout << "Value: " << x << std::endl;           // 3.14159
   std::cout << "Address: " << &x << std::endl;        // 0x7ffeeb5c4a18
   std::cout << "Pointer: " << p << std::endl;         // 0x7ffeeb5c4a18
   std::cout << "Dereferenced: " << *p << std::endl;   // 3.14159

Multiple pointers on one line:

.. code-block:: cpp

   double *p1, *p2;  // both are pointers to double
   double* p1, p2;   // p1 is a pointer to double, p2 is a double (confusing!)

Constant pointers:

.. code-block:: cpp

   double x = 3.14, y = 2.71;
   const double* p1 = &x;        // pointer to const double (can't modify *p1)
   double* const p2 = &x;        // const pointer (can't change address)
   const double* const p3 = &x;  // both const

Generic pointer:

.. code-block:: cpp

   void* vp = &x;  // can point to any type, but you must cast to use it

Pointer to pointer:

.. code-block:: cpp

   double** pp = &p;  // points to a pointer (e.g., stores 0x7ffeeb5c4a20)

Using references
^^^^^^^^^^^^^^^^

.. dropdown:: How do you use references in C++?
   :icon: info
   :color: info

   .. code-block:: cpp

      double x = 3.14;
      double& ref = x;  // ref refers to x
      ref = 2.71;       // modifies x directly
      std::cout << x;   // prints 2.71

.. dropdown:: What problem does reference solve?
   :icon: info
   :color: info

   In C, you must use pointers to modify variables in functions:

   .. code-block:: cpp

      void increment(double* p) { (*p) += 1.0; }

      int main() {
          double x = 5.5;
          increment(&x);   // must pass address
          std::cout << x;  // prints 6.5
      }

   This can lead to issues like null pointers or pointer arithmetic errors. In C++, references provide a safer and more intuitive way:

   .. code-block:: cpp

      void increment(double& r) { r += 1.0; }

      int main() {
          double x = 5.5;
          increment(x);    // pass by reference
          std::cout << x;  // prints 6.5
      }

   References eliminate pointer-related risks while maintaining efficiency.

.. dropdown:: What is the concept of "null" in C++?
   :icon: info
   :color: info

   In C++, "null" refers to a pointer that does not point to any valid object or memory location. It is represented by the constant ``nullptr`` (introduced in C++11) or the macro ``NULL`` (from C). A null pointer is used to indicate that the pointer is not currently assigned to any object.

.. dropdown:: Why do you want to have a pointer that is "null"?
   :icon: info
   :color: info

   Having a null pointer is useful for several reasons:

   1. **Indicating absence**: A null pointer can signify that a pointer is not currently pointing to any valid object, which is useful for optional data or uninitialized pointers.
   2. **Error handling**: Functions can return a null pointer to indicate failure or that no valid object could be returned.
   3. **Memory management**: It helps in managing dynamic memory by allowing pointers to be set to null after deallocation, preventing dangling pointers.

.. dropdown:: What's a good analogy to understand pointers vs references?
   :icon: info
   :color: info

   If a pointer is like holding someone's address written on paper, a reference is like being given a permanent portal directly into their house. You don't need to look up the address — you just walk in.