IO#
- class quantem.widget.io.IOResult(data: ndarray, pixel_size: float | None = None, units: str | None = None, title: str = '', labels: list[str] = <factory>, metadata: dict = <factory>, frame_metadata: list[dict] = <factory>)[source]#
Bases: object
Result of reading a file or folder.
- data#
Float32 array, 2D (H, W) or 3D (N, H, W).
- Type:
np.ndarray
- frame_metadata#
Per-frame metadata for 5D stacks. One dict per frame, extracted from each source file (e.g. defocus, tilt angle, timestamp).
- property array#
- property name#
- describe(keys: list[str] | None = None, diff: bool = True)[source]#
Print a per-frame metadata table.
- Parameters:
keys (list of str, optional) – Metadata keys to show (short names, matched against the end of full HDF5 paths). If None, shows all keys (filtered by diff when multiple frames exist).
diff (bool, default True) – Only show columns where values differ across frames. Set diff=False to show all columns, including constants.
Examples
>>> result = IO.arina_folder("/data/session/", det_bin=8)
>>> result.describe()              # diff columns only
>>> result.describe(diff=False)    # all columns
>>> result.describe(keys=["count_time", "photon_energy"])
File Loading
- static IO.file(source: str | Path | list[str | Path], *, dataset_path: str | None = None, file_type: str | None = None, recursive: bool = False) IOResult[source]#
Read one or more files.
- Parameters:
source (str, pathlib.Path, or list) – Single file path or list of file/folder paths.
dataset_path (str, optional) – Explicit HDF dataset path for .emd files.
file_type (str, optional) – Filter for folders in a list (passed to IO.folder()).
recursive (bool, default False) – For folders in a list (passed to IO.folder()).
- Return type:
IOResult
- static IO.folder(folder: str | Path | list, *, file_type: str | None = None, recursive: bool = False, dataset_path: str | None = None) IOResult[source]#
Read a folder of files into a stacked IOResult.
- Parameters:
folder (str, pathlib.Path, or list) – Folder path, or list of folder paths to merge into one stack.
file_type (str, optional) – File type to select (e.g. "png", "tiff", "dm4"). If omitted, auto-detects from the folder contents.
recursive (bool, default False) – Include files in subdirectories.
dataset_path (str, optional) – Explicit HDF dataset path for .emd files.
- Return type:
IOResult
- static IO.supported_formats() list[str][source]#
Return sorted list of supported file extensions (without dots).
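A quick pre-flight check can be built on this. The helper below is an illustration, not part of the API; the hard-coded list stands in for an actual `IO.supported_formats()` call, and it assumes only what is documented (extensions are returned without dots):

```python
from pathlib import Path

def can_load(path: str, supported: list[str]) -> bool:
    """Check a file's extension against IO.supported_formats() output."""
    ext = Path(path).suffix.lstrip(".").lower()
    return ext in supported

# Hypothetical stand-in for: supported = IO.supported_formats()
supported = ["dm3", "dm4", "npy", "png", "tiff"]
print(can_load("scan_001.DM4", supported))  # True
```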
GPU-Accelerated Loading
- static IO.arina_file(master_path: str | list[str], det_bin: int | str = 1, scan_bin: int = 1, scan_shape: tuple[int, int] | None = None, hot_pixel_filter: bool = True, backend: str = 'auto') IOResult[source]#
Load Arina 4D-STEM data with GPU-accelerated decompression.
Accepts a single master file path (returns 4D) or a list of paths (returns 5D stacked along a new leading axis).
- Parameters:
master_path (str or list of str) – Path to one Arina master HDF5 file, or a list of paths. A list of two or more files produces a 5D result.
det_bin (int or "auto", optional) – Detector binning factor (applied to both axes). "auto" picks the smallest factor that fits in available RAM. Default 1.
scan_bin (int, optional) – Scan (navigation) binning factor. Applied after loading as a mean over scan_bin x scan_bin neighborhoods. Requires 4D output (i.e. scan_shape must be set or inferred). Default 1.
scan_shape (tuple of (int, int), optional) – Reshape into (scan_rows, scan_cols, det_rows, det_cols). If None and det_bin > 1, inferred as (sqrt(n), sqrt(n)).
hot_pixel_filter (bool, optional) – Zero out hot pixels on the detector (pixels > 5σ above median in the mean diffraction pattern). Default True.
backend (str, optional) – GPU backend: "auto" (detect best available), "mps" (Apple Metal), "cuda", "intel", or "cpu". Default "auto".
- Returns:
Single file: .data has shape (scan_r, scan_c, det_r, det_c). Multiple files: .data has shape (n_files, scan_r, scan_c, det_r, det_c).
- Return type:
IOResult
Examples
Single file → 4D:
>>> result = IO.arina_file("SnMoS2s_001_master.h5", det_bin=2)
>>> result.data.shape
(512, 512, 96, 96)
Auto-detect smallest bin that fits in RAM:
>>> result = IO.arina_file("master.h5", det_bin="auto")
Cherry-pick specific files → 5D:
>>> result = IO.arina_file([
...     "scan_00_master.h5",
...     "scan_03_master.h5",
...     "scan_07_master.h5",
... ], det_bin=4)
>>> result.data.shape
(3, 256, 256, 48, 48)
Free GPU memory when done (important for large datasets):
>>> del widget   # free MPS tensor held by Show4DSTEM
>>> del result   # free numpy array from IOResult
>>> import torch, gc
>>> gc.collect()
>>> torch.mps.empty_cache()  # release MPS allocator cache
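The hot-pixel rule stated above (zero detector pixels more than 5σ above the median of the mean diffraction pattern) can be sketched in NumPy. This is an illustration of the documented criterion, not the library's implementation; the exact σ estimator used internally is an assumption here:

```python
import numpy as np

def zero_hot_pixels(data: np.ndarray, n_sigma: float = 5.0) -> np.ndarray:
    """Zero detector pixels > n_sigma above the median of the mean pattern.

    data: 4D array (scan_r, scan_c, det_r, det_c).
    """
    mean_dp = data.mean(axis=(0, 1))          # mean diffraction pattern
    med = np.median(mean_dp)
    sigma = mean_dp.std()                     # assumed estimator for σ
    hot = mean_dp > med + n_sigma * sigma     # hot-pixel mask on the detector
    out = data.copy()
    out[..., hot] = 0.0
    return out

# Demo on synthetic 4D data with one hot detector pixel
data = np.ones((4, 4, 8, 8), dtype=np.float32)
data[..., 3, 3] = 1000.0
cleaned = zero_hot_pixels(data)
```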
- static IO.arina_folder(folder: str | Path | list[str | Path], det_bin: int | str = 1, scan_bin: int = 1, scan_shape: tuple[int, int] | None = None, hot_pixel_filter: bool = True, backend: str = 'auto', max_files: int = 50, recursive: bool = False, pattern: str | None = None) IOResult[source]#
Load Arina master files from one or more folders into a 5D stack.
Finds every *_master.h5, loads each with IO.arina_file(), and stacks them along a new leading axis.
- Parameters:
folder (str, pathlib.Path, or list) – Directory (or list of directories) containing *_master.h5 files.
det_bin – Forwarded to IO.arina_file() for each file.
scan_bin – Forwarded to IO.arina_file() for each file.
scan_shape – Forwarded to IO.arina_file() for each file.
hot_pixel_filter – Forwarded to IO.arina_file() for each file.
backend – Forwarded to IO.arina_file() for each file.
max_files (int, default 50) – Maximum number of master files to load. Prevents accidentally loading hundreds of files into RAM. Set to 0 for no limit.
recursive (bool, default False) – Search subdirectories for *_master.h5 files.
pattern (str, optional) – Only load files whose stem contains this string (case-insensitive). E.g. pattern="SnMoS2" loads only files with "SnMoS2" in the name.
- Returns:
.data has shape (n_files, scan_rows, scan_cols, det_rows, det_cols) (5D) when each file produces 4D output. .labels contains the stem of each master file.
- Return type:
IOResult
Examples
Load all scans in a session folder:
>>> result = IO.arina_folder("/data/20260208/", det_bin=4)
>>> result.data.shape
(10, 256, 256, 48, 48)
Filter by sample name:
>>> result = IO.arina_folder("/data/", pattern="SnMoS2", det_bin=2)
Merge scans from multiple session folders:
>>> result = IO.arina_folder(["/data/day1/", "/data/day2/"], det_bin=8)
Search subdirectories:
>>> result = IO.arina_folder("/data/", recursive=True, pattern="focal", det_bin=4)
Free GPU memory when done (important for large datasets):
>>> del widget   # free MPS tensor held by Show4DSTEM
>>> del result   # free numpy array from IOResult
>>> import torch, gc
>>> gc.collect()
>>> torch.mps.empty_cache()  # release MPS allocator cache
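The discovery behavior documented above (find *_master.h5 files, filter by a case-insensitive substring, cap at max_files) can be sketched with pathlib. The helper name find_master_files is hypothetical, not part of the API:

```python
from pathlib import Path

def find_master_files(folder, pattern=None, recursive=False, max_files=50):
    """Collect *_master.h5 paths the way IO.arina_folder() is documented to."""
    glob = Path(folder).rglob if recursive else Path(folder).glob
    files = sorted(glob("*_master.h5"))
    if pattern:  # case-insensitive substring match against the stem
        files = [f for f in files if pattern.lower() in f.stem.lower()]
    if max_files:  # max_files=0 means no limit
        files = files[:max_files]
    return files
```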
Examples#
Read a single file (any format):
from quantem.widget import IO, Show2D
result = IO.file("gold_nanoparticles.dm4")
print(result.pixel_size, result.units) # 1.43 Å
Show2D(result, show_fft=True, log_scale=True)
Read a folder of images as a stack:
from quantem.widget import IO, Show3D
result = IO.folder("/path/to/focal_series/", file_type="dm4")
print(result.data.shape) # (20, 4096, 4096)
print(result.labels) # ['image_001', 'image_002', ...]
Show3D(result, title="Focal Series")
Auto-detect file type (omit file_type):
# Auto-detects from folder contents (raises if mixed types)
result = IO.folder("/path/to/tiff_scans/")
Read multiple files into a stack:
result = IO.file([
"sample_region_A.dm4",
"sample_region_B.dm4",
"sample_region_C.dm4",
])
Show3D(result)
Merge multiple folders into one stack:
result = IO.folder([
"/path/to/session_1/",
"/path/to/session_2/",
], file_type="dm3")
# All images across both folders stacked into one (N, H, W) array
Read 4D-STEM data:
result = IO.file("4dstem_binned.h5")
print(result.data.shape) # (256, 256, 128, 128)
from quantem.widget import Show4DSTEM
Show4DSTEM(result, title="4D-STEM")
IOResult duck typing#
IOResult forwards NumPy array methods to the underlying .data array,
so you can use it directly in expressions:
result = IO.file("image.dm4")
result.shape # (1024, 1024)
result.dtype # float32
result.mean() # 0.42
# Reduce a 4D-STEM dataset to a virtual bright-field image
result = IO.arina_file("master.h5", det_bin=2)
vbf = result.sum(axis=(2, 3))
Show2D(vbf, title="Virtual Bright Field")
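The same duck-typed access supports a virtual annular dark-field image. virtual_adf below is a hypothetical helper (it works on result.data or any 4D array), and the radii are illustration values:

```python
import numpy as np

def virtual_adf(data: np.ndarray, r_inner: float, r_outer: float) -> np.ndarray:
    """Virtual annular dark-field: sum detector pixels between two radii."""
    det_r, det_c = data.shape[-2:]
    yy, xx = np.indices((det_r, det_c))
    r = np.hypot(yy - det_r / 2, xx - det_c / 2)   # radius from detector center
    mask = (r >= r_inner) & (r < r_outer)          # annular detector mask
    return (data * mask).sum(axis=(-2, -1))

# Synthetic stand-in for result.data; radii chosen arbitrarily
data = np.random.rand(8, 8, 32, 32).astype(np.float32)
adf = virtual_adf(data, r_inner=10, r_outer=15)
print(adf.shape)  # (8, 8)
```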
print(result) gives a human-readable summary:
IOResult
shape: 512 x 512 x 96 x 96
dtype: float32
title: SnMoS2s_001
pixel_size: 1.4298 Å
labels: ['frame_001', 'frame_002', 'frame_003', ...] (20 total)
metadata: ['General', 'Signal']
IO.arina_file — GPU-accelerated 4D-STEM loading#
IO.arina_file() decompresses bitshuffle+LZ4 data on the GPU via Apple Metal,
and optionally bins detector and/or scan axes on the fly.
(IO.arina() still works as an alias for backward compatibility.)
from quantem.widget import IO
# 2x2 detector binning (most common)
data = IO.arina_file("master.h5", det_bin=2)
# Auto-select bin factor based on available RAM
data = IO.arina_file("master.h5", det_bin="auto")
# Bin both detector and scan axes
data = IO.arina_file("master.h5", det_bin=2, scan_bin=2)
# Disable hot pixel filtering (on by default)
data = IO.arina_file("master.h5", det_bin=2, hot_pixel_filter=False)
Performance benchmarks#
Benchmarked on SnMoS2 dataset (262,144 frames, 192×192 detector, Apple M5). Steady-state times (second+ call — first call adds ~0.5s for JIT/Metal warmup):
| Configuration | Output shape | Memory | Time |
|---|---|---|---|
| det_bin=2 | 512 × 512 × 96 × 96 | 9.0 GB | 1.8 s |
| det_bin=4 | 512 × 512 × 48 × 48 | 2.3 GB | 1.7 s |
| det_bin=8 | 512 × 512 × 24 × 24 | 0.6 GB | 1.8 s |
| det_bin=2, scan_bin=2 | 256 × 256 × 96 × 96 | 2.3 GB | 2.0 s |
| det_bin=2, scan_bin=4 | 128 × 128 × 96 × 96 | 0.6 GB | 2.0 s |
The pipeline is double-buffered (CPU reads chunk N+1 while GPU decompresses chunk N). The bottleneck is GPU decompression: 262k frames of bitshuffle+LZ4 takes ~1.5s on M5 regardless of bin factor. The 1.7 GB disk read (8.2 GB/s SSD) is fully hidden.
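The double-buffering idea can be illustrated with a one-slot queue: a reader thread stages chunk N+1 while the consumer works on chunk N. This is a generic sketch of the pattern, not the Metal pipeline itself:

```python
import queue
import threading

def read_chunks(n_chunks):
    """Stand-in for disk reads; yields chunk indices."""
    for i in range(n_chunks):
        yield i

def pipeline(n_chunks, process):
    buf = queue.Queue(maxsize=1)          # one chunk staged ahead
    SENTINEL = object()

    def reader():
        for chunk in read_chunks(n_chunks):
            buf.put(chunk)                # blocks while the consumer is busy
        buf.put(SENTINEL)                 # signal end of stream

    threading.Thread(target=reader, daemon=True).start()
    results = []
    while (chunk := buf.get()) is not SENTINEL:
        results.append(process(chunk))    # "GPU" work overlaps the next read
    return results

print(pipeline(4, lambda c: c * 2))  # [0, 2, 4, 6]
```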
Note
det_bin=1 (no binning) for this dataset requires ~18 GB of contiguous GPU
memory. Use det_bin="auto" to let IO pick the smallest bin factor that
fits in available RAM.
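One way such an auto selection could work, assuming float32 output and power-of-two candidate factors (the library's actual heuristic is not documented here):

```python
def auto_det_bin(n_frames, det_size, avail_bytes, factors=(1, 2, 4, 8)):
    """Smallest bin factor whose float32 output fits in avail_bytes."""
    for b in factors:
        need = n_frames * (det_size // b) ** 2 * 4  # float32 bytes
        if need <= avail_bytes:
            return b
    raise MemoryError("no bin factor fits")

# SnMoS2 numbers from above: 262,144 frames on a 192x192 detector.
# Unbinned float32 output alone is ~38.7 GB, so 16 GiB of RAM forces det_bin=2.
print(auto_det_bin(262_144, 192, 16 * 1024**3))  # 2
```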
IO.arina_folder — batch 5D-STEM loading#
IO.arina_folder() finds all *_master.h5 files in a folder, loads each
with IO.arina_file(), and stacks them into a 5D dataset (time/tilt series).
from quantem.widget import IO
# Load all scans in a folder → 5D (n_files, scan_r, scan_c, det_r, det_c)
result = IO.arina_folder("/path/to/session/", det_bin=8)
print(result.data.shape) # (10, 256, 256, 24, 24)
# Incomplete files are auto-skipped with a warning
# "SKIPPED: [Errno 2] ... data_000003.h5 ... No such file or directory"
# View as 5D-STEM time series with frame slider
from quantem.widget import Show4DSTEM
Show4DSTEM(result, frame_dim_label="Scan")
Benchmarked on 12 Arina scans (65,536 frames each, 192×192 uint32 detector, Apple M5). 2 incomplete files auto-skipped, 10 loaded:
| Configuration | Output shape | Memory | Load | |
|---|---|---|---|---|
| det_bin=8 | 10 × 256 × 256 × 24 × 24 | 1.5 GB | 9.5 s | 11.0 s |
| det_bin=4 | 10 × 256 × 256 × 48 × 48 | 6.0 GB | 10.8 s | 16.3 s |
Standard file loading performance#
Single files load in under 200 ms on any machine — no GPU required:
| Format | Size | Time |
|---|---|---|
| NPY | 1024 × 1024 | 1 ms |
| DM3 | 4096 × 4096 | 14 ms |
| DM4 | 4096 × 4096 | 14 ms |
| TIFF | 2049 × 2040 | 41 ms |
| PNG | 2048 × 2048 | 45 ms |
| EMD (Velox) | 2048 × 2048 | 105 ms |
Folder loading scales linearly:
| Folder | Stack shape | Time |
|---|---|---|
| 40 TIFFs (256×256) | 40 × 256 × 256 | 43 ms |
| 6 EMDs (2048×2048) | 6 × 2048 × 2048 | 65 ms |
| 3 PNGs (2048×2048) | 3 × 2048 × 2048 | 117 ms |
| 5 DM3s (4096×4096) | 5 × 4096 × 4096 | 150 ms |
Supported formats#
Native (no extra dependencies): PNG, JPEG, BMP, TIFF, EMD, HDF5, NPY, NPZ
Via rosettasciio (pip install rosettasciio): DM3, DM4, MRC, SER, and
60+ more formats.
GPU-accelerated: Arina 4D-STEM master files (IO.arina_file()) — requires
pyobjc-framework-Metal on macOS. CUDA and Intel GPU backends coming soon.
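A pre-flight availability check following the split above. The extension sets mirror this section (common suffix variants added), and rsciio as rosettasciio's import name is an assumption worth verifying in your environment:

```python
import importlib.util
from pathlib import Path

# Native formats per this section, plus common suffix variants (jpg, tif, h5)
NATIVE = {"png", "jpeg", "jpg", "bmp", "tiff", "tif", "emd", "h5", "hdf5", "npy", "npz"}
ROSETTA = {"dm3", "dm4", "mrc", "ser"}  # small subset of the 60+ formats

def loader_available(path: str) -> bool:
    """True if the extension is native, or rosettasciio covers it and is installed."""
    ext = Path(path).suffix.lstrip(".").lower()
    if ext in NATIVE:
        return True
    if ext in ROSETTA:
        return importlib.util.find_spec("rsciio") is not None
    return False

print(loader_available("gold_nanoparticles.png"))  # True
```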