Registry

Dataset loading, discovery, and upload functions.

| Notebook | Link |
|----------|------|
| Demo | notebooks/demo.ipynb |
| Open in Colab | |

quantem.data.available(technique: str | None = None) → list[str]

List available dataset names.

Parameters:

technique (str, optional) – Filter by technique (e.g. "4dstem", "hrtem").

Returns:

Sorted dataset names (excluding placeholders).

Return type:

list of str

quantem.data.info(name: str) → dict

Return metadata for a dataset (downloads the JSON sidecar).

Parameters:

name (str) – Dataset name (see available()).

Returns:

Metadata parsed from the dataset's JSON sidecar.

Return type:

dict

quantem.data.load(name: str, metadata: bool = False)

Download (with caching) and return a dataset as a NumPy array.

Files are cached in ~/.cache/huggingface/ and only downloaded once.

Parameters:

name (str) – Dataset name (see available()).
metadata (bool) – If True, return (array, metadata_dict) instead of just the array.

Return type:

np.ndarray or (np.ndarray, dict)
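The "only downloaded once" behaviour is a cache-on-miss pattern. The sketch below shows that idea with a temporary directory and a fake download function standing in for the network fetch; `cached_fetch` and `fake_download` are hypothetical names, and quantem itself relies on the Hugging Face cache rather than code like this.

```python
import tempfile
from pathlib import Path
from typing import Callable

import numpy as np


def cached_fetch(name: str, cache_dir: Path, download: Callable[[str, Path], None]) -> Path:
    """Return a local path for `name`, calling `download` only on a cache miss."""
    target = cache_dir / f"{name}.npy"
    if not target.exists():
        # Cache miss: fetch once and store the file under cache_dir.
        download(name, target)
    return target


calls = []


def fake_download(name: str, target: Path) -> None:
    # Stand-in for a network download: records the call and writes a small array.
    calls.append(name)
    target.parent.mkdir(parents=True, exist_ok=True)
    np.save(target, np.zeros((4, 4)))


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    first = np.load(cached_fetch("demo", cache, fake_download))
    second = np.load(cached_fetch("demo", cache, fake_download))  # served from cache

print(len(calls), first.shape)  # 1 (4, 4)
```

With the real function, `load(name, metadata=True)` returns a tuple, so unpack it as `array, meta = ...`; with the default `metadata=False` you get the array alone.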

quantem.data.load_raw(name: str) → str

Download an original instrument file and return the local path.

Raw files (e.g. .h5, .dm4, .mrc) are stored in the raw/ folder. The returned path points to the cached local copy, which can be opened with h5py, HyperSpy, or similar readers.

Parameters:

name (str) – Raw file name (without extension), e.g. "arina_lamella_master".

Returns:

Local file path (cached in ~/.cache/huggingface/).

Return type:

str
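Because load_raw returns a plain path, choosing a reader is left to the caller. A minimal sketch, assuming a hypothetical `suggest_reader` helper and a mapping that only covers the extensions and libraries named above:

```python
from pathlib import Path

# Hypothetical mapping from raw-file extension to a suitable reader library.
READERS = {".h5": "h5py", ".hdf5": "h5py", ".dm4": "hyperspy", ".mrc": "hyperspy"}


def suggest_reader(local_path: str) -> str:
    """Suggest a reader library for a raw file based on its extension."""
    suffix = Path(local_path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader suggestion for {suffix!r} files") from None


print(suggest_reader("/tmp/cache/example.h5"))  # h5py
```

For an HDF5 file you would then open the path with `h5py.File(path, "r")`. For .dm4 or .mrc files, `hyperspy.api.load(path)` is the usual entry point.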

quantem.data.preview_upload(data, name: str, technique: str, metadata: dict | str | None = None, description: str = '', contributor: str = '', license: str = 'CC-BY-4.0') → list[str]

Validate and preview an upload without actually uploading.

Checks the naming convention, metadata schema, technique, and array shape. Prints a summary of what would be uploaded. Returns a list of errors (empty if everything is valid).

Parameters:

data (array_like or str) – NumPy array, or path to a .npy file.
name (str) – Dataset name.
technique (str) – Technique (category) folder, e.g. "4dstem".
metadata (dict or str, optional) – Full metadata dict, or path to a JSON file.
description (str) – One-line description (used if metadata is None).
contributor (str) – Who is uploading (used if metadata is None).
license (str) – License string (default "CC-BY-4.0").

Returns:

Error messages. Empty list means the upload is valid.

Return type:

list of str
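The "collect every error, return an empty list on success" contract can be sketched locally. Every rule below is an assumption made for illustration: the snake_case name pattern, the technique set (only the examples named on this page, while the real set is larger), the required metadata keys, and the non-empty shape check. The sketch accepts only a metadata dict, unlike the real function, and `validate_upload` is a hypothetical name.

```python
import re

import numpy as np

# Hypothetical rules; the real naming convention and schema are not documented here.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")
KNOWN_TECHNIQUES = {"4dstem", "hrtem", "eels"}
REQUIRED_KEYS = ("description", "contributor", "license")


def validate_upload(data, name: str, technique: str, metadata: dict) -> list[str]:
    """Collect error messages; an empty list means every check passed."""
    errors = []
    if not NAME_PATTERN.match(name):
        errors.append(f"name {name!r} is not lowercase snake_case")
    if technique not in KNOWN_TECHNIQUES:
        errors.append(f"unknown technique {technique!r}")
    for key in REQUIRED_KEYS:
        if not metadata.get(key):
            errors.append(f"metadata is missing {key!r}")
    # Shape check (assumed): the array must contain at least one element.
    if np.asarray(data).size == 0:
        errors.append("data array is empty")
    return errors


meta = {"description": "Demo scan", "contributor": "A. User", "license": "CC-BY-4.0"}
print(validate_upload(np.zeros((32, 32, 8, 8)), "arina_lamella_32x32", "4dstem", meta))  # []
```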

quantem.data.upload(data, name: str, technique: str, metadata: dict | str | None = None, description: str = '', contributor: str = '', license: str = 'CC-BY-4.0', token: str | None = None, create_pr: bool = True)

Upload a dataset and its metadata to the Hugging Face (HF) Hub.

By default creates a Pull Request for review. Set create_pr=False to commit directly (requires write access).

Parameters:

data (array_like or str) – NumPy array, or path to a .npy file.
name (str) – Dataset name (becomes the filename, e.g. "arina_lamella_32x32").
technique (str) – Category folder ("4dstem", "hrtem", "eels", etc.).
metadata (dict or str, optional) – Full metadata dict, or path to a JSON file. If None, a template is created from the other parameters.
description (str) – One-line description (used if metadata is None).
contributor (str) – Who is uploading (used if metadata is None).
license (str) – License string (default "CC-BY-4.0").
token (str, optional) – HF token. If None, uses cached login.
create_pr (bool) – If True (default), create a Pull Request instead of committing directly. The PR can be reviewed and merged on HF Hub.
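The metadata argument accepts three forms: a dict, a path to a JSON file, or None, in which case a template is built from the other parameters. A minimal sketch of that dispatch follows. The template keys and the `<technique>/<name>.npy` / `.json` repository layout are hypothetical, as are the helper names `resolve_metadata` and `target_paths`.

```python
import json
import tempfile
from pathlib import Path


def resolve_metadata(metadata, name, technique, description="", contributor="", license="CC-BY-4.0"):
    """Return a metadata dict from a dict, a JSON file path, or a generated template."""
    if isinstance(metadata, dict):
        # Copy so later edits do not mutate the caller's dict.
        return dict(metadata)
    if isinstance(metadata, str):
        # Treat a string as a path to a JSON file.
        return json.loads(Path(metadata).read_text(encoding="utf-8"))
    # metadata is None: build a template from the other arguments (hypothetical keys).
    return {
        "name": name,
        "technique": technique,
        "description": description,
        "contributor": contributor,
        "license": license,
    }


def target_paths(name, technique):
    """Hypothetical layout: data file and JSON sidecar share a stem in the technique folder."""
    return f"{technique}/{name}.npy", f"{technique}/{name}.json"


template = resolve_metadata(None, "arina_lamella_32x32", "4dstem", description="Demo scan")
print(template["license"])  # CC-BY-4.0

with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "meta.json"
    sidecar.write_text(json.dumps({"description": "From file"}), encoding="utf-8")
    from_file = resolve_metadata(str(sidecar), "unused", "hrtem")
print(from_file)  # {'description': 'From file'}
```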

quantem.data.update_metadata(name: str, updates: dict, token: str | None = None, create_pr: bool = True)

Update metadata fields for an existing dataset.

Downloads the current metadata JSON, merges your changes into it, and re-uploads it. By default, this creates a Pull Request for review.

Parameters:

name (str) – Dataset name.
updates (dict) – Fields to update. Nested dicts are merged (not replaced).
token (str, optional) – HF token. If None, uses cached login.
create_pr (bool) – If True (default), create a Pull Request instead of committing directly. The PR can be reviewed and merged on HF Hub.
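"Nested dicts are merged (not replaced)" means a recursive merge: a nested key you do not mention survives the update. A minimal sketch of that rule follows; `merge_metadata` is a hypothetical helper, not quantem's code, and the metadata keys are made up.

```python
import copy


def merge_metadata(current: dict, updates: dict) -> dict:
    """Return a copy of `current` with `updates` applied; nested dicts merge recursively."""
    merged = copy.deepcopy(current)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = merge_metadata(merged[key], value)
        else:
            # Scalars, lists, and new keys overwrite or add.
            merged[key] = copy.deepcopy(value)
    return merged


current = {"description": "Old", "acquisition": {"voltage_kv": 300, "detector": "arina"}}
updates = {"description": "New", "acquisition": {"voltage_kv": 200}}
print(merge_metadata(current, updates))
# {'description': 'New', 'acquisition': {'voltage_kv': 200, 'detector': 'arina'}}
```

The original dict is left untouched, which makes it easy to diff the before and after versions ahead of opening a PR.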

quantem.data.list_files(technique: str | None = None) → list[dict]

List all files in the registry on the Hugging Face (HF) Hub, with per-file details.

Parameters:

technique (str, optional) – Filter by technique folder (e.g. "4dstem"). If None, lists all files.

Returns:

One dict per file, with the keys path, size_mb, and type ("data" or "metadata").

Return type:

list of dict
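Because each record carries size_mb and type, totals are a short aggregation away. The records below are made-up samples shaped like the documented dicts, not real registry contents, and `summarize_sizes` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_sizes(files: list[dict]) -> dict[str, float]:
    """Total size in MB per file type ("data" or "metadata")."""
    totals = defaultdict(float)
    for entry in files:
        totals[entry["type"]] += entry["size_mb"]
    return dict(totals)


# Made-up sample records for illustration only.
files = [
    {"path": "4dstem/arina_lamella_32x32.npy", "size_mb": 12.5, "type": "data"},
    {"path": "4dstem/arina_lamella_32x32.json", "size_mb": 0.25, "type": "metadata"},
    {"path": "hrtem/example_image.npy", "size_mb": 4.0, "type": "data"},
]
print(summarize_sizes(files))  # {'data': 16.5, 'metadata': 0.25}
```

With the real function you would pass the result of `quantem.data.list_files()` (optionally filtered by technique) instead of the sample list.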
