IO#

class quantem.widget.io.IOResult(data: ndarray, pixel_size: float | None = None, units: str | None = None, title: str = '', labels: list[str] = <factory>, metadata: dict = <factory>, frame_metadata: list[dict] = <factory>)[source]#

Bases: object

Result of reading a file or folder.

data#

Float32 array, 2D (H, W) or 3D (N, H, W).

Type:

np.ndarray

pixel_size#

Pixel size in angstroms, extracted from file metadata.

Type:

float or None

units#

Unit string (e.g. “Å”, “nm”).

Type:

str or None

title#

Title derived from filename stem.

Type:

str

labels#

One label per frame for stacks.

Type:

list of str

metadata#

Raw metadata tree from file.

Type:

dict

frame_metadata#

Per-frame metadata for 5D stacks. One dict per frame, extracted from each source file (e.g. defocus, tilt angle, timestamp).

Type:

list of dict

data: ndarray#
pixel_size: float | None = None#
units: str | None = None#
title: str = ''#
labels: list[str]#
metadata: dict#
frame_metadata: list[dict]#
property array#
property name#
describe(keys: list[str] | None = None, diff: bool = True)[source]#

Print a per-frame metadata table.

Parameters:
  • keys (list of str, optional) – Metadata keys to show (short names, matched against the end of full HDF5 paths). If None, shows all keys (filtered by diff when multiple frames exist).

  • diff (bool, default True) – Only show columns where values differ across frames. Set diff=False to show all columns including constants.
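The key matching and diff filtering described in the parameters above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `select_columns` is a hypothetical helper, the metadata paths are made up, and the choice to let `keys` take precedence over `diff` is an assumption.

```python
def select_columns(frames, keys=None, diff=True):
    """Pick metadata columns to display for a list of per-frame dicts.

    Hypothetical helper that mirrors the documented behaviour of
    describe(); the real implementation may differ.
    """
    # Collect every key seen in any frame, preserving first-seen order.
    all_keys = []
    for frame in frames:
        for key in frame:
            if key not in all_keys:
                all_keys.append(key)

    if keys is not None:
        # Short names are matched against the end of the full path.
        # Assumption: an explicit key list bypasses the diff filter.
        return [k for k in all_keys if any(k.endswith(s) for s in keys)]

    if diff and len(frames) > 1:
        # Keep only columns whose value changes between frames.
        return [
            k for k in all_keys
            if len({repr(frame.get(k)) for frame in frames}) > 1
        ]
    return all_keys


# Made-up per-frame metadata: count_time is constant, defocus varies.
frames = [
    {"/entry/detector/count_time": 0.001, "/entry/sample/defocus": -10.0},
    {"/entry/detector/count_time": 0.001, "/entry/sample/defocus": -20.0},
]
print(select_columns(frames))                       # ['/entry/sample/defocus']
print(select_columns(frames, keys=["count_time"]))  # ['/entry/detector/count_time']
```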

Examples

>>> result = IO.arina_folder("/data/session/", det_bin=8)
>>> result.describe()                    # diff columns only
>>> result.describe(diff=False)          # all columns
>>> result.describe(keys=["count_time", "photon_energy"])
class quantem.widget.io.IO[source]#

Bases: object

Unified file reader for electron microscopy data.

File Loading

static IO.file(source: str | Path | list[str | Path], *, dataset_path: str | None = None, file_type: str | None = None, recursive: bool = False) IOResult[source]#

Read one or more files.

Parameters:
  • source (str, pathlib.Path, or list) – Single file path or list of file/folder paths.

  • dataset_path (str, optional) – Explicit HDF dataset path for .emd files.

  • file_type (str, optional) – Filter for folders in a list (passed to IO.folder()).

  • recursive (bool, default False) – For folders in a list (passed to IO.folder()).

Return type:

IOResult

static IO.folder(folder: str | Path | list, *, file_type: str | None = None, recursive: bool = False, dataset_path: str | None = None) IOResult[source]#

Read a folder of files into a stacked IOResult.

Parameters:
  • folder (str, pathlib.Path, or list) – Folder path, or list of folder paths to merge into one stack.

  • file_type (str, optional) – File type to select (e.g. "png", "tiff", "dm4"). If omitted, auto-detects from the folder contents.

  • recursive (bool, default False) – Include files in subdirectories.

  • dataset_path (str, optional) – Explicit HDF dataset path for .emd files.

Return type:

IOResult
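As a rough illustration of the auto-detection described for file_type, the sketch below collects the extensions in a folder and refuses to guess when more than one is present. `detect_file_type` is a hypothetical helper, and raising `ValueError` is an assumption; the documentation only states that mixed folders raise.

```python
from pathlib import Path


def detect_file_type(folder, recursive=False):
    """Return the single file extension found in a folder (without the dot).

    Hypothetical helper; the exception type and the handling of files
    without an extension are assumptions.
    """
    root = Path(folder)
    candidates = root.rglob("*") if recursive else root.iterdir()
    # Normalise case so "a.tiff" and "b.TIFF" count as the same type.
    suffixes = {
        p.suffix.lower().lstrip(".")
        for p in candidates
        if p.is_file() and p.suffix
    }
    if not suffixes:
        raise ValueError(f"no files with an extension in {root}")
    if len(suffixes) > 1:
        raise ValueError(f"mixed file types in {root}: {sorted(suffixes)}")
    return suffixes.pop()
```

Passing file_type explicitly avoids the ambiguity when a folder holds several formats.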

static IO.supported_formats() list[str][source]#

Return sorted list of supported file extensions (without dots).

GPU-Accelerated Loading

static IO.arina_file(master_path: str | list[str], det_bin: int | str = 1, scan_bin: int = 1, scan_shape: tuple[int, int] | None = None, hot_pixel_filter: bool = True, backend: str = 'auto') IOResult[source]#

Load arina 4D-STEM data with GPU-accelerated decompression.

Accepts a single master file path (returns 4D) or a list of paths (returns 5D stacked along a new leading axis).

Parameters:
  • master_path (str or list of str) – Path to one arina master HDF5 file, or a list of paths. A list of two or more files produces a 5D result.

  • det_bin (int or "auto", optional) – Detector binning factor (applied to both axes). "auto" picks the smallest factor that fits in available RAM. Default 1.

  • scan_bin (int, optional) – Scan (navigation) binning factor. Applied after loading as a mean over scan_bin x scan_bin neighborhoods. Requires 4D output (i.e. scan_shape must be set or inferred). Default 1.

  • scan_shape (tuple of (int, int), optional) – Reshape into (scan_rows, scan_cols, det_rows, det_cols). If None and det_bin > 1, inferred as (sqrt(n), sqrt(n)).

  • hot_pixel_filter (bool, optional) – Zero out hot pixels on the detector (pixels > 5σ above median in the mean diffraction pattern). Default True.

  • backend (str, optional) – GPU backend: "auto" (detect best available), "mps" (Apple Metal), "cuda", "intel", "cpu". Default "auto".

Returns:

Single file: .data shape (scan_r, scan_c, det_r, det_c). Multiple files: .data shape (n_files, scan_r, scan_c, det_r, det_c).

Return type:

IOResult
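The scan_bin parameter is described as a mean over scan_bin x scan_bin neighborhoods. The sketch below shows that block reduction on a small 2D grid of scalars, using plain lists so it runs without NumPy. In the real 4D case each grid entry would be a whole diffraction pattern. `block_mean` is a hypothetical helper, and rejecting shapes that are not divisible by the factor is an assumption.

```python
def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2D grid.

    Hypothetical helper illustrating neighborhood binning; how the
    library handles shapes not divisible by the factor is not
    documented, so this sketch raises.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid shape must be divisible by the bin factor")
    binned = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Gather one factor x factor neighborhood and average it.
            block = [
                grid[r + i][c + j]
                for i in range(factor)
                for j in range(factor)
            ]
            out_row.append(sum(block) / len(block))
        binned.append(out_row)
    return binned


grid = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15
print(block_mean(grid, 2))  # [[2.5, 4.5], [10.5, 12.5]]
```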

Examples

Single file → 4D:

>>> result = IO.arina_file("SnMoS2s_001_master.h5", det_bin=2)
>>> result.data.shape
(512, 512, 96, 96)

Auto-detect smallest bin that fits in RAM:

>>> result = IO.arina_file("master.h5", det_bin="auto")

Cherry-pick specific files → 5D:

>>> result = IO.arina_file([
...     "scan_00_master.h5",
...     "scan_03_master.h5",
...     "scan_07_master.h5",
... ], det_bin=4)
>>> result.data.shape
(3, 256, 256, 48, 48)

Free GPU memory when done (important for large datasets):

>>> del widget           # free MPS tensor held by Show4DSTEM
>>> del result           # free numpy array from IOResult
>>> import torch, gc
>>> gc.collect()
>>> torch.mps.empty_cache()  # release MPS allocator cache
static IO.arina_folder(folder: str | Path | list[str | Path], det_bin: int | str = 1, scan_bin: int = 1, scan_shape: tuple[int, int] | None = None, hot_pixel_filter: bool = True, backend: str = 'auto', max_files: int = 50, recursive: bool = False, pattern: str | None = None) IOResult[source]#

Load arina master files from one or more folders into a 5D stack.

Finds every *_master.h5, loads each with IO.arina_file(), and stacks them along a new leading axis.

Parameters:
  • folder (str, pathlib.Path, or list) – Directory (or list of directories) containing *_master.h5 files.

  • det_bin – Forwarded to IO.arina_file() for each file.

  • scan_bin – Forwarded to IO.arina_file() for each file.

  • scan_shape – Forwarded to IO.arina_file() for each file.

  • hot_pixel_filter – Forwarded to IO.arina_file() for each file.

  • backend – Forwarded to IO.arina_file() for each file.

  • max_files (int, default 50) – Maximum number of master files to load. Prevents accidentally loading hundreds of files into RAM. Set to 0 for no limit.

  • recursive (bool, default False) – Search subdirectories for *_master.h5 files.

  • pattern (str, optional) – Only load files whose stem contains this string (case-insensitive). E.g. pattern="SnMoS2" loads only files with “SnMoS2” in the name.

Returns:

.data has shape (n_files, scan_rows, scan_cols, det_rows, det_cols) (5D) when each file produces 4D output. .labels contains the stem of each master file.

Return type:

IOResult
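The file discovery implied by the folder, recursive, pattern, and max_files parameters can be sketched with pathlib. `find_master_files` is a hypothetical helper. Sorting the matches and truncating to max_files (rather than raising) are assumptions, since the documentation does not say which the library does.

```python
from pathlib import Path


def find_master_files(folders, pattern=None, recursive=False, max_files=50):
    """List *_master.h5 files under one or more folders.

    Hypothetical helper; ordering and the truncation behaviour for
    max_files are assumptions.
    """
    if isinstance(folders, (str, Path)):
        folders = [folders]
    found = []
    for folder in folders:
        root = Path(folder)
        glob = root.rglob if recursive else root.glob
        found.extend(sorted(glob("*_master.h5")))
    if pattern is not None:
        # Case-insensitive substring match against the file stem.
        needle = pattern.lower()
        found = [p for p in found if needle in p.stem.lower()]
    if max_files and len(found) > max_files:
        # max_files=0 means "no limit", so only truncate when it is set.
        found = found[:max_files]
    return found
```

The stems of the returned paths correspond to what the documentation describes for .labels.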

Examples

Load all scans in a session folder:

>>> result = IO.arina_folder("/data/20260208/", det_bin=4)
>>> result.data.shape
(10, 256, 256, 48, 48)

Filter by sample name:

>>> result = IO.arina_folder("/data/", pattern="SnMoS2", det_bin=2)

Merge scans from multiple session folders:

>>> result = IO.arina_folder(["/data/day1/", "/data/day2/"], det_bin=8)

Search subdirectories:

>>> result = IO.arina_folder("/data/", recursive=True, pattern="focal", det_bin=4)

Free GPU memory when done (important for large datasets):

>>> del widget           # free MPS tensor held by Show4DSTEM
>>> del result           # free numpy array from IOResult
>>> import torch, gc
>>> gc.collect()
>>> torch.mps.empty_cache()  # release MPS allocator cache

Examples#

Read a single file (any format):

from quantem.widget import IO, Show2D

result = IO.file("gold_nanoparticles.dm4")
print(result.pixel_size, result.units)  # 1.43 Å
Show2D(result, show_fft=True, log_scale=True)

Read a folder of images as a stack:

from quantem.widget import IO, Show3D

result = IO.folder("/path/to/focal_series/", file_type="dm4")
print(result.data.shape)   # (20, 4096, 4096)
print(result.labels)       # ['image_001', 'image_002', ...]
Show3D(result, title="Focal Series")

Auto-detect file type (omit file_type):

# Auto-detects from folder contents (raises if mixed types)
result = IO.folder("/path/to/tiff_scans/")

Read multiple files into a stack:

result = IO.file([
    "sample_region_A.dm4",
    "sample_region_B.dm4",
    "sample_region_C.dm4",
])
Show3D(result)

Merge multiple folders into one stack:

result = IO.folder([
    "/path/to/session_1/",
    "/path/to/session_2/",
], file_type="dm3")
# All images across both folders stacked into one (N, H, W) array

Read 4D-STEM data:

result = IO.file("4dstem_binned.h5")
print(result.data.shape)  # (256, 256, 128, 128)

from quantem.widget import Show4DSTEM
Show4DSTEM(result, title="4D-STEM")

IOResult duck typing#

IOResult forwards NumPy array methods to the underlying .data array, so you can use it directly in expressions:

from quantem.widget import IO, Show2D

result = IO.file("image.dm4")
result.shape        # (1024, 1024)
result.dtype        # float32
result.mean()       # 0.42

# Reduce a 4D-STEM dataset to a virtual bright-field image
result = IO.arina_file("master.h5", det_bin=2)
vbf = result.sum(axis=(2, 3))
Show2D(vbf, title="Virtual Bright Field")
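One common way to get this kind of forwarding in Python is `__getattr__` delegation. The sketch below is only an illustration of that technique; whether IOResult is implemented this way is not stated here. `ForwardingResult` is a hypothetical class, and a plain list stands in for the NumPy array so the example runs without NumPy.

```python
from dataclasses import dataclass


@dataclass
class ForwardingResult:
    """Hypothetical wrapper that forwards unknown attributes to .data."""

    data: object
    title: str = ""

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails, so
        # real fields such as .title are never forwarded.
        if name == "data":
            # Guard against infinite recursion if .data is not set yet
            # (for example while the object is being copied).
            raise AttributeError(name)
        return getattr(self.data, name)


wrapped = ForwardingResult(data=[3, 1, 2], title="demo")
print(wrapped.count(1))  # 1, forwarded to list.count
print(wrapped.title)     # demo, a regular field
```

Note that `__getattr__` does not cover operators or built-ins such as `len()`, because Python looks up special methods on the type; a wrapper that needs those has to define them explicitly.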

print(result) gives a human-readable summary:

IOResult
  shape:      512 x 512 x 96 x 96
  dtype:      float32
  title:      SnMoS2s_001
  pixel_size: 1.4298 Å
  labels:     ['frame_001', 'frame_002', 'frame_003', ...] (20 total)
  metadata:   ['General', 'Signal']

IO.arina_file — GPU-accelerated 4D-STEM loading#

IO.arina_file() decompresses bitshuffle+LZ4 data on the GPU via Apple Metal, and optionally bins detector and/or scan axes on the fly. (IO.arina() still works as an alias for backward compatibility.)

from quantem.widget import IO

# 2x2 detector binning (most common)
data = IO.arina_file("master.h5", det_bin=2)

# Auto-select bin factor based on available RAM
data = IO.arina_file("master.h5", det_bin="auto")

# Bin both detector and scan axes
data = IO.arina_file("master.h5", det_bin=2, scan_bin=2)

# Disable hot pixel filtering (on by default)
data = IO.arina_file("master.h5", det_bin=2, hot_pixel_filter=False)
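The hot_pixel_filter option is documented as zeroing pixels more than 5σ above the median of the mean diffraction pattern. A minimal sketch of that rule follows, using the standard library on nested lists. The helper names are hypothetical, and taking σ to be the standard deviation of the mean pattern is an assumption.

```python
from statistics import median, pstdev


def hot_pixel_mask(mean_pattern, n_sigma=5.0):
    """Flag pixels more than n_sigma standard deviations above the median.

    Hypothetical helper; sigma is assumed to be the population standard
    deviation of the mean diffraction pattern.
    """
    flat = [value for row in mean_pattern for value in row]
    threshold = median(flat) + n_sigma * pstdev(flat)
    return [[value > threshold for value in row] for row in mean_pattern]


def zero_hot_pixels(frame, mask):
    """Return a copy of one detector frame with flagged pixels set to 0."""
    return [
        [0 if hot else value for value, hot in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


# 10 x 10 mean pattern with one outlier at row 3, column 7.
pattern = [[1.0] * 10 for _ in range(10)]
pattern[3][7] = 1000.0
mask = hot_pixel_mask(pattern)
cleaned = zero_hot_pixels(pattern, mask)
print(cleaned[3][7])  # 0
```

The mask is computed once from the mean pattern and then applied to every frame, so a pixel that is bright in only a few frames is not flagged.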

Performance benchmarks#

Benchmarked on the SnMoS2 dataset (262,144 frames, 192×192 detector, Apple M5). Times are steady-state (second and later calls); the first call adds ~0.5 s for JIT/Metal warmup:

| Configuration | Output shape | Memory | Time |
|---|---|---|---|
| det_bin=2 | 512 × 512 × 96 × 96 | 9.0 GB | 1.8 s |
| det_bin=4 | 512 × 512 × 48 × 48 | 2.3 GB | 1.7 s |
| det_bin=8 | 512 × 512 × 24 × 24 | 0.6 GB | 1.8 s |
| det_bin=2, scan_bin=2 | 256 × 256 × 96 × 96 | 2.3 GB | 2.0 s |
| det_bin=2, scan_bin=4 | 128 × 128 × 96 × 96 | 0.6 GB | 2.0 s |

The pipeline is double-buffered: the CPU reads chunk N+1 while the GPU decompresses chunk N. The bottleneck is GPU decompression; 262k frames of bitshuffle+LZ4 take ~1.5 s on M5 regardless of bin factor. The 1.7 GB disk read (8.2 GB/s SSD) is fully hidden behind it.
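The double-buffering idea can be sketched with a single background worker that prefetches the next chunk while the current one is processed. This is an illustration of the pattern only: `read_chunk` and `decompress` are hypothetical stand-ins, and zlib on the CPU replaces the GPU bitshuffle+LZ4 step.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def read_chunk(chunks, index):
    """Stand-in for a disk read of one compressed chunk."""
    return chunks[index]


def decompress(blob):
    """Stand-in for GPU decompression (zlib here, not bitshuffle+LZ4)."""
    return zlib.decompress(blob)


def double_buffered(chunks):
    """Decompress chunks in order while prefetching the next one."""
    results = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as reader:
        pending = reader.submit(read_chunk, chunks, 0)
        for index in range(len(chunks)):
            blob = pending.result()  # wait for chunk N
            if index + 1 < len(chunks):
                # Start reading chunk N+1 before processing chunk N.
                pending = reader.submit(read_chunk, chunks, index + 1)
            results.append(decompress(blob))
    return results
```

When the read is faster than the processing step, each `pending.result()` call returns immediately, which is the situation described above where the disk read is hidden.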

Note

det_bin=1 (no binning) for this dataset requires ~18 GB of contiguous GPU memory. Use det_bin="auto" to let IO pick the smallest bin factor that fits in available RAM.
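The memory column in the benchmark table follows from the output shape and the float32 item size. The sketch below reproduces that arithmetic and shows one way a bin factor could be chosen against a budget. Both functions are hypothetical, the candidate list (1, 2, 4, 8) is an assumption, and the estimate covers only the output array, not any GPU working buffers.

```python
GIB = 2**30  # the table's GB figures match sizes expressed in GiB


def output_bytes(n_frames, det_rows, det_cols, det_bin, itemsize=4):
    """Bytes for the float32 output after det_bin x det_bin binning."""
    return n_frames * (det_rows // det_bin) * (det_cols // det_bin) * itemsize


def pick_det_bin(n_frames, det_rows, det_cols, budget_bytes,
                 candidates=(1, 2, 4, 8)):
    """Return the smallest candidate bin factor whose output fits.

    Hypothetical selection rule; the library's actual "auto" logic and
    candidate factors are not documented here.
    """
    for det_bin in candidates:
        if output_bytes(n_frames, det_rows, det_cols, det_bin) <= budget_bytes:
            return det_bin
    raise MemoryError("no candidate bin factor fits in the given budget")


print(output_bytes(262_144, 192, 192, det_bin=2) / GIB)        # 9.0
print(pick_det_bin(262_144, 192, 192, budget_bytes=16 * GIB))  # 2
```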

IO.arina_folder — batch 5D-STEM loading#

IO.arina_folder() finds all *_master.h5 files in a folder, loads each with IO.arina_file(), and stacks them into a 5D dataset (time/tilt series).

from quantem.widget import IO

# Load all scans in a folder → 5D (n_files, scan_r, scan_c, det_r, det_c)
result = IO.arina_folder("/path/to/session/", det_bin=8)
print(result.data.shape)  # (10, 256, 256, 24, 24)

# Incomplete files are auto-skipped with a warning
# "SKIPPED: [Errno 2] ... data_000003.h5 ... No such file or directory"

# View as 5D-STEM time series with frame slider
from quantem.widget import Show4DSTEM
Show4DSTEM(result, frame_dim_label="Scan")

Benchmarked on 12 Arina scans (65,536 frames each, 192×192 uint32 detector, Apple M5). 2 incomplete files auto-skipped, 10 loaded:

| Configuration | Output shape | Memory | Load | + Show4DSTEM |
|---|---|---|---|---|
| det_bin=8 (10 files) | 10 × 256 × 256 × 24 × 24 | 1.5 GB | 9.5 s | 11.0 s |
| det_bin=4 (10 files) | 10 × 256 × 256 × 48 × 48 | 6.0 GB | 10.8 s | 16.3 s |

Standard file loading performance#

Single files load in under 200 ms in the benchmarks below, and no GPU is required:

| Format | Size | Time |
|---|---|---|
| NPY | 1024 × 1024 | 1 ms |
| DM3 | 4096 × 4096 | 14 ms |
| DM4 | 4096 × 4096 | 14 ms |
| TIFF | 2049 × 2040 | 41 ms |
| PNG | 2048 × 2048 | 45 ms |
| EMD (Velox) | 2048 × 2048 | 105 ms |

Folder loading time scales linearly with the number of files:

| Folder | Stack shape | Time |
|---|---|---|
| 40 TIFFs (256×256) | 40 × 256 × 256 | 43 ms |
| 6 EMDs (2048×2048) | 6 × 2048 × 2048 | 65 ms |
| 3 PNGs (2048×2048) | 3 × 2048 × 2048 | 117 ms |
| 5 DM3s (4096×4096) | 5 × 4096 × 4096 | 150 ms |

Supported formats#

Native (no extra dependencies): PNG, JPEG, BMP, TIFF, EMD, HDF5, NPY, NPZ

Via rosettasciio (pip install rosettasciio): DM3, DM4, MRC, SER, and 60+ more formats.

GPU-accelerated: Arina 4D-STEM master files (IO.arina_file()), which require pyobjc-framework-Metal on macOS. CUDA and Intel GPU backends are coming soon.