Registry
Dataset loading, discovery, and upload functions.
Demo notebook: notebooks/demo.ipynb
- quantem.data.available(technique: str | None = None) -> list[str]
List available dataset names.
- quantem.data.info(name: str) -> dict
Return metadata for a dataset (downloads the JSON sidecar).
- Parameters:
name (str) – Dataset name (see available()).
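A minimal sketch of combining available() and info() to build a small catalog. The helper name build_catalog is hypothetical, and its registry argument stands for the imported quantem.data module. The sketch also assumes that passing technique narrows the listing to that technique, which the signature suggests but the text above does not state.

```python
def build_catalog(registry, technique=None):
    """Map each available dataset name to its metadata dict.

    `registry` is expected to expose `available(technique)` and `info(name)`,
    as `quantem.data` does according to the signatures above.
    """
    catalog = {}
    for name in registry.available(technique):
        # Each info() call fetches that dataset's JSON sidecar.
        catalog[name] = registry.info(name)
    return catalog
```

With the package installed, this could be called as build_catalog(quantem.data, "4dstem").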
- quantem.data.load(name: str, metadata: bool = False)
Download (with caching) and return a dataset as a NumPy array.
Files are cached in ~/.cache/huggingface/ and are downloaded only once.
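The download-once behavior can be pictured with a generic cache-on-first-use helper. This is an illustration of the pattern only, not quantem's implementation; cached_fetch and its fetch callable are hypothetical names introduced here.

```python
from pathlib import Path
from typing import Callable


def cached_fetch(name: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the local path for `name`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # First request: download and store the bytes.
        target.write_bytes(fetch(name))
    # Later requests reuse the file already on disk.
    return target
```

Calling the helper twice with the same name invokes fetch once; the second call only returns the existing path.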
- quantem.data.load_raw(name: str) -> str
Download an original instrument file and return the local path.
Raw files (e.g. .h5, .dm4, .mrc) are stored in the raw/ folder. This returns the cached local path for use with h5py, hyperspy, etc.
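Because load_raw() returns only a path, the caller chooses the reader. A small sketch of dispatching on the file suffix follows. The suffix-to-package mapping is an assumption made for illustration (mrcfile is not mentioned above), and suggest_reader is a hypothetical helper, not part of quantem.

```python
from pathlib import Path

# Assumed mapping from file suffix to a commonly used reader package.
READERS = {".h5": "h5py", ".dm4": "hyperspy", ".mrc": "mrcfile"}


def suggest_reader(path: str) -> str:
    """Return the name of a reader package for the file at `path`."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"no reader known for {suffix!r} files")
    return READERS[suffix]
```

In practice the argument would be the string returned by quantem.data.load_raw(name).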
- quantem.data.preview_upload(data, name: str, technique: str, metadata: dict | str | None = None, description: str = '', contributor: str = '', license: str = 'CC-BY-4.0') -> list[str]
Validate and preview an upload without actually uploading.
Checks naming convention, metadata schema, technique, and shape. Prints a summary of what would be uploaded. Returns a list of errors (empty if everything is valid).
- Parameters:
data (array_like or str) – NumPy array, or path to a .npy file.
name (str) – Dataset name.
technique (str) – Technique folder.
metadata (dict or str, optional) – Full metadata dict, or path to a JSON file.
description (str) – One-line description (used if metadata is None).
contributor (str) – Who is uploading (used if metadata is None).
license (str) – License string (default "CC-BY-4.0").
- Returns:
Error messages. Empty list means the upload is valid.
- Return type:
list[str]
- quantem.data.upload(data, name: str, technique: str, metadata: dict | str | None = None, description: str = '', contributor: str = '', license: str = 'CC-BY-4.0', token: str | None = None, create_pr: bool = True)
Upload a dataset with metadata to the Hugging Face (HF) Hub.
By default creates a Pull Request for review. Set create_pr=False to commit directly (requires write access).
- Parameters:
data (array_like or str) – NumPy array, or path to a .npy file.
name (str) – Dataset name (becomes the filename, e.g. "arina_lamella_32x32").
technique (str) – Category folder ("4dstem", "hrtem", "eels", etc.).
metadata (dict or str, optional) – Full metadata dict, or path to a JSON file. If None, a template is created from the other parameters.
description (str) – One-line description (used if metadata is None).
contributor (str) – Who is uploading (used if metadata is None).
license (str) – License string (default "CC-BY-4.0").
token (str, optional) – HF token. If None, uses cached login.
create_pr (bool) – If True (default), create a Pull Request instead of committing directly. The PR can be reviewed and merged on HF Hub.
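Since preview_upload() and upload() share their leading arguments, one plausible workflow is to validate first and upload only when no errors come back. The sketch below assumes that workflow; upload_if_valid is a hypothetical wrapper, and registry stands for the imported quantem.data module.

```python
def upload_if_valid(registry, data, name, technique, token=None, create_pr=True, **shared):
    """Run preview_upload first and call upload only when it reports no errors.

    `shared` may carry metadata, description, contributor, or license,
    which both functions accept. Returns the list of error messages
    from preview_upload (empty when the upload was attempted).
    """
    errors = registry.preview_upload(data, name, technique, **shared)
    if not errors:
        # token and create_pr are accepted by upload() only.
        registry.upload(data, name, technique, token=token, create_pr=create_pr, **shared)
    return errors
```

Keeping create_pr=True as the default mirrors upload() itself, so a validated dataset still goes through review unless the caller opts out.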
- quantem.data.update_metadata(name: str, updates: dict, token: str | None = None, create_pr: bool = True)
Update metadata fields for an existing dataset.
Downloads the current JSON, merges your changes, and re-uploads the result. By default creates a Pull Request for review.
- Parameters:
name (str) – Dataset name.
updates (dict) – Fields to update. Nested dicts are merged (not replaced).
token (str, optional) – HF token. If None, uses cached login.
create_pr (bool) – If True (default), create a Pull Request instead of committing directly. The PR can be reviewed and merged on HF Hub.
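The "merged (not replaced)" rule for nested dicts can be sketched as a recursive merge. This shows the general technique under the assumption that non-dict values (scalars, lists) are overwritten; it is not taken from quantem's source, and merge_nested is a hypothetical name.

```python
import copy


def merge_nested(current: dict, updates: dict) -> dict:
    """Return a copy of `current` with `updates` merged in recursively."""
    merged = copy.deepcopy(current)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = merge_nested(merged[key], value)
        else:
            # Scalars, lists, and new keys overwrite or extend the result.
            merged[key] = copy.deepcopy(value)
    return merged
```

Under this rule, an update such as {"sample": {"thickness_nm": 40}} (field names made up for illustration) changes only that one nested field and leaves sibling keys under "sample" in place.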