Registry

Dataset loading, discovery, and upload functions.

| Notebook | Link |
|----------|------|
| Demo | notebooks/demo.ipynb |
| Open in Colab | colab_demo |

quantem.data.available(technique: str | None = None) → list[str]

List available dataset names.

Parameters:

technique (str, optional) – Filter by technique (e.g. "4dstem", "hrtem").

Returns:

Sorted dataset names (excluding placeholders).

Return type:

list of str
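
Since `available()` returns plain strings, a name can be checked against that list before any download is attempted. The following is a minimal sketch under stated assumptions: `check_name` is a hypothetical helper (not part of `quantem.data`), and the `names` list is a stand-in for the result of `available()`, with the second entry made up for illustration.

```python
import difflib


def check_name(name, names):
    """Return `name` if it is listed, else raise with close suggestions.

    Hypothetical helper, not part of quantem.data.
    """
    if name in names:
        return name
    # Suggest up to three similar names to catch simple typos.
    suggestions = difflib.get_close_matches(name, names, n=3)
    hint = f" Did you mean: {', '.join(suggestions)}?" if suggestions else ""
    raise ValueError(f"unknown dataset {name!r}.{hint}")


# Stand-in for quantem.data.available(); "gold_hrtem" is a made-up name.
names = ["arina_lamella_32x32", "gold_hrtem"]
checked = check_name("arina_lamella_32x32", names)
```

With the library installed, `names` would instead be assigned from `quantem.data.available()`.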

quantem.data.info(name: str) → dict

Return metadata for a dataset (downloads the JSON sidecar).

Parameters:

name (str) – Dataset name (see available()).

quantem.data.load(name: str, metadata: bool = False)

Download (with caching) and return a dataset as a NumPy array.

Files are cached in ~/.cache/huggingface/ and only downloaded once.

Parameters:
  • name (str) – Dataset name (see available()).

  • metadata (bool) – If True, return (array, metadata_dict) instead of just the array.

Return type:

np.ndarray or (np.ndarray, dict)
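
Because the return shape depends on `metadata`, code that may receive either form can normalize it to a pair. This is only a sketch: `normalize_load_result` is a hypothetical helper, and nested lists stand in for the NumPy array so the example runs without the library or network access.

```python
def normalize_load_result(result):
    """Return (array, metadata) for either documented return shape of load().

    Hypothetical helper, not part of quantem.data.
    """
    # load(name, metadata=True) is documented to return (array, metadata_dict).
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0], result[1]
    # load(name) returns only the array, so pair it with an empty dict.
    return result, {}


# Stand-in values; real ones would come from quantem.data.load(...).
fake_array = [[0, 1], [2, 3]]
array_only = normalize_load_result(fake_array)
with_meta = normalize_load_result((fake_array, {"technique": "4dstem"}))
```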

quantem.data.load_raw(name: str) → str

Download an original instrument file and return the local path.

Raw files (e.g. .h5, .dm4, .mrc) are stored in the raw/ folder. This function returns the cached local path, which can then be opened with h5py, hyperspy, or a similar reader.

Parameters:

name (str) – Raw file name (without extension), e.g. "arina_lamella_master".

Returns:

Local file path (cached in ~/.cache/huggingface/).

Return type:

str
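
Because `load_raw()` returns only a path, the caller chooses the reader. One way to make that choice is to dispatch on the file extension, sketched below. The mapping is an assumption of this example: h5py and hyperspy are named in the description above, while pairing `.mrc` with `mrcfile` is this sketch's own choice. `pick_reader` and the example path are hypothetical.

```python
from pathlib import Path

# Assumed extension-to-library mapping; only h5py and hyperspy are named in
# the documentation, "mrcfile" is an illustrative choice for .mrc files.
READERS = {".h5": "h5py", ".dm4": "hyperspy", ".mrc": "mrcfile"}


def pick_reader(local_path: str) -> str:
    """Name the library to open a raw file with, based on its extension."""
    suffix = Path(local_path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"no reader configured for {suffix!r}") from None


# Hypothetical path; a real one would come from quantem.data.load_raw(name).
example_path = "/tmp/raw/arina_lamella_master.h5"
reader_name = pick_reader(example_path)
```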

quantem.data.preview_upload(data, name: str, technique: str, metadata: dict | str | None = None, description: str = '', contributor: str = '', license: str = 'CC-BY-4.0') → list[str]

Validate and preview an upload without actually uploading.

Checks naming convention, metadata schema, technique, and shape. Prints a summary of what would be uploaded. Returns a list of errors (empty if everything is valid).

Parameters:
  • data (array_like or str) – NumPy array, or path to a .npy file.

  • name (str) – Dataset name.

  • technique (str) – Technique folder.

  • metadata (dict or str, optional) – Full metadata dict, or path to a JSON file.

  • description (str) – One-line description (used if metadata is None).

  • contributor (str) – Who is uploading (used if metadata is None).

  • license (str) – License string (default "CC-BY-4.0").

Returns:

Error messages. Empty list means the upload is valid.

Return type:

list of str

quantem.data.upload(data, name: str, technique: str, metadata: dict | str | None = None, description: str = '', contributor: str = '', license: str = 'CC-BY-4.0', token: str | None = None, create_pr: bool = True)

Upload a dataset with metadata to the Hugging Face (HF) Hub.

By default, this creates a Pull Request for review. Set create_pr=False to commit directly (requires write access).

Parameters:
  • data (array_like or str) – NumPy array, or path to a .npy file.

  • name (str) – Dataset name (becomes the filename, e.g. "arina_lamella_32x32").

  • technique (str) – Category folder ("4dstem", "hrtem", "eels", etc.).

  • metadata (dict or str, optional) – Full metadata dict, or path to a JSON file. If None, a template is created from the other parameters.

  • description (str) – One-line description (used if metadata is None).

  • contributor (str) – Who is uploading (used if metadata is None).

  • license (str) – License string (default "CC-BY-4.0").

  • token (str, optional) – HF token. If None, uses cached login.

  • create_pr (bool) – If True (default), create a Pull Request instead of committing directly. The PR can be reviewed and merged on HF Hub.
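
Since `preview_upload()` and `upload()` share their first arguments, a natural pattern is to validate first and upload only when the error list is empty. The sketch below shows that control flow with the two functions passed in as callables. `upload_if_valid` is hypothetical, and the stubs (including the lowercase-name rule) are made up so the example runs offline; they do not reflect the library's actual checks.

```python
def upload_if_valid(preview, upload, shared, upload_only=None):
    """Run the preview and upload only if it reports no errors.

    `shared` holds keyword arguments accepted by both functions;
    `upload_only` holds extras such as token or create_pr.
    Hypothetical helper, not part of quantem.data.
    """
    errors = preview(**shared)
    if errors:
        # Hand the problems back to the caller instead of uploading.
        return errors
    upload(**shared, **(upload_only or {}))
    return []


# Offline stubs standing in for preview_upload() and upload().
uploads = []


def fake_preview(data, name, technique, **_ignored):
    # Made-up rule for the stub only.
    return [] if name == name.lower() else ["name must be lowercase"]


def fake_upload(**kwargs):
    uploads.append(kwargs)


shared = {"data": [1, 2, 3], "name": "demo_32x32", "technique": "4dstem"}
result = upload_if_valid(fake_preview, fake_upload, shared, {"create_pr": True})
```

With the library installed, the real functions could be passed in place of the stubs: `upload_if_valid(quantem.data.preview_upload, quantem.data.upload, shared, {"create_pr": True})`.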

quantem.data.update_metadata(name: str, updates: dict, token: str | None = None, create_pr: bool = True)

Update metadata fields for an existing dataset.

Downloads the current JSON, merges your changes into it, and re-uploads the result. By default, this creates a Pull Request for review.

Parameters:
  • name (str) – Dataset name.

  • updates (dict) – Fields to update. Nested dicts are merged (not replaced).

  • token (str, optional) – HF token. If None, uses cached login.

  • create_pr (bool) – If True (default), create a Pull Request instead of committing directly. The PR can be reviewed and merged on HF Hub.
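
To illustrate what "merged (not replaced)" could mean for nested dicts, here is a minimal recursive merge. It is a sketch of the general technique, not the library's implementation, which may differ in details such as how lists or `None` values are handled. `merge_nested` and the metadata field names are hypothetical.

```python
import copy


def merge_nested(current: dict, updates: dict) -> dict:
    """Return a copy of `current` with `updates` merged in recursively."""
    merged = copy.deepcopy(current)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key, keeping untouched keys.
            merged[key] = merge_nested(merged[key], value)
        else:
            # Anything else overwrites the existing value.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical metadata; the field names are illustrative only.
current = {"description": "old", "instrument": {"voltage_kv": 300, "detector": "arina"}}
updates = {"description": "new", "instrument": {"voltage_kv": 200}}
merged = merge_nested(current, updates)
```

Under this reading, the `detector` key survives the update because only `voltage_kv` is named inside the nested `instrument` dict.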

quantem.data.list_files(technique: str | None = None) → list[dict]

List all files on HF Hub with details.

Parameters:

technique (str, optional) – Filter by technique folder (e.g. "4dstem"). If None, lists all.

Returns:

One dict per file, with the keys path, size_mb, and type ("data" or "metadata").

Return type:

list of dict
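
The returned records are plain dicts, so they can be summarized with ordinary Python. The sketch below totals `size_mb` per file type. `total_size_mb` is a hypothetical helper, and the sample records (paths and sizes) are made up to stand in for a real `list_files()` result.

```python
def total_size_mb(files, file_type=None):
    """Sum size_mb over the records, optionally restricted to one type."""
    return sum(
        record["size_mb"]
        for record in files
        if file_type is None or record["type"] == file_type
    )


# Made-up records using the documented keys: path, size_mb, type.
sample = [
    {"path": "4dstem/example.npy", "size_mb": 12.5, "type": "data"},
    {"path": "4dstem/example.json", "size_mb": 0.5, "type": "metadata"},
    {"path": "hrtem/other.npy", "size_mb": 4.0, "type": "data"},
]
data_total = total_size_mb(sample, "data")
all_total = total_size_mb(sample)
```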