Uploading datasets


All datasets in quantem.data are hosted on the Hugging Face Hub. Uploads open a pull request (PR) by default, so a maintainer reviews the data before it is merged into the public catalog.

Prerequisites

  1. Create a free Hugging Face account

  2. Create an access token with write access at huggingface.co/settings/tokens

  3. Log in from your terminal:

huggingface-cli login

Install

pip install --pre -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ quantem-data

Naming convention

Dataset names follow a material-first convention: {material}_{descriptor}.

Rule                                          Example               Bad example
Lowercase, underscores only                   srtio3_lamella        SrTiO3-Lamella
Material first                                gold_nanoparticle     nanoparticle_gold
Descriptor second (morphology, orientation)   silicon_110           110_silicon
Lab suffix only to disambiguate               srtio3_lamella_ncem   ncem_srtio3
No resolution, binning, or year in name       graphene_monolayer    graphene_256x256_2024

Resolution, binning, year, and instrument details belong in the JSON metadata, not in the name.
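The mechanical parts of these rules are easy to check before you submit. The sketch below is a hypothetical helper (name_problems is not a quantem.data function) that uses regular expressions to flag non-lowercase characters, a leading number, and tokens that look like a resolution, a binning factor (the bin2 form is an assumption), or a year. It cannot tell whether the first word is really a material, so names such as nanoparticle_gold pass; preview_upload() is the check that actually matters.

```python
import re

# Hypothetical helper, not part of quantem.data: a rough local check of the naming rules.
# Lowercase letters and digits joined by single underscores (no hyphens, spaces, or capitals).
NAME_PATTERN = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")
# Tokens that look like a resolution (256x256), an assumed binning tag (bin2), or a year (2024).
METADATA_TOKEN = re.compile(r"^(\d+x\d+|bin\d+|(19|20)\d{2})$")


def name_problems(name: str) -> list[str]:
    """Return rule violations for a proposed name; an empty list means none were found."""
    if not NAME_PATTERN.match(name):
        return ["use lowercase letters, digits, and single underscores only"]
    problems = []
    tokens = name.split("_")
    # A leading number (e.g. an orientation like 110) means the material is not first.
    if tokens[0].isdigit():
        problems.append("put the material first, e.g. silicon_110 rather than 110_silicon")
    for token in tokens:
        if METADATA_TOKEN.match(token):
            problems.append(f"move {token!r} into the JSON metadata")
    return problems


for candidate in ["silicon_110", "110_silicon", "SrTiO3-Lamella", "graphene_256x256_2024"]:
    print(candidate, name_problems(candidate))
```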

Upload from Python

import numpy as np
from quantem.data import upload

# Your data (NumPy array)
data = np.load("my_hrtem_image.npy")

# Upload — creates a PR on Hugging Face Hub
upload(
    data,
    name="silicon_110_hrtem",
    technique="hrtem",
    description="Silicon [110] zone axis, HRTEM at 200 kV",
    contributor="Jane Doe",
)

Output:

Created PR to add silicon_110_hrtem (0.2 MB)
Review: https://huggingface.co/datasets/bobleesj/quantem-data/discussions/1

The PR link opens the Hugging Face discussion page, where a maintainer reviews your data and metadata before merging them.

Preview before uploading

Use preview_upload() to validate the name and metadata and to check for duplicate names before submitting:

from quantem.data import preview_upload

errors = preview_upload(
    data,
    name="silicon_110_hrtem",
    technique="hrtem",
    description="Silicon [110] zone axis, HRTEM at 200 kV",
    contributor="Jane Doe",
)

if errors:
    for e in errors:
        print(f"  - {e}")
else:
    print("Ready to upload!")

preview_upload() checks:

  • Naming convention (lowercase, underscores, material-first)

  • Valid technique folder

  • Metadata schema compliance

  • Array shape consistency

  • Duplicate name detection on HF Hub

Fix any errors before calling upload().
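Two of these checks, shape consistency and duplicate names, are simple enough to sketch locally. In the hypothetical preflight() below, existing_names stands in for the names already in the catalog (fetching them from the Hub is not shown), and the nested layout {"data": {"shape": ...}} is assumed from the data.shape field name in the metadata schema. This is not how preview_upload() is implemented.

```python
# Hypothetical sketch of two preview_upload()-style checks; not the library's implementation.
# `existing_names` stands in for names already in the catalog (fetching them is not shown).


def preflight(name: str, array_shape: tuple, meta: dict, existing_names: set) -> list[str]:
    errors = []
    # JSON stores shapes as lists, so compare both sides as tuples.
    declared = tuple(meta.get("data", {}).get("shape", ()))
    if declared != tuple(array_shape):
        errors.append(f"data.shape {declared} does not match array shape {tuple(array_shape)}")
    if name in existing_names:
        errors.append(f"{name!r} already exists in the catalog")
    return errors


meta = {"data": {"shape": [512, 512], "dtype": "float32"}}
print(preflight("silicon_110_hrtem", (512, 512), meta, {"gold_nanoparticle"}))  # []
print(preflight("silicon_110_hrtem", (256, 256), meta, {"silicon_110_hrtem"}))
```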

Upload from the command line

quantem-data upload my_data.npy \
    --name silicon_110_hrtem \
    --technique hrtem \
    --description "Silicon [110] zone axis" \
    --contributor "Jane Doe"

By default this creates a PR. Add --direct to commit directly (requires write access to the repo).

Valid techniques

Each dataset belongs to a technique folder:

technique     data type                widget
4dstem        4D-STEM diffraction      Show4DSTEM, Show4D
hrtem         high-resolution TEM      Show2D, Mark2D
eels          electron energy loss     Show1D
tomo          tomography               Show3DVolume
diffraction   diffraction patterns     Show2D
image         virtual/derived images   Show2D, Mark2D
complex       ptychography             ShowComplex2D
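If you script uploads, keeping this table as a lookup makes a mistyped technique fail early. TECHNIQUE_WIDGETS and widgets_for are hypothetical names for this sketch; quantem.data may expose the list of techniques differently, or not at all.

```python
# Hypothetical copy of the table above; quantem.data does not necessarily export this mapping.
TECHNIQUE_WIDGETS = {
    "4dstem": ("Show4DSTEM", "Show4D"),
    "hrtem": ("Show2D", "Mark2D"),
    "eels": ("Show1D",),
    "tomo": ("Show3DVolume",),
    "diffraction": ("Show2D",),
    "image": ("Show2D", "Mark2D"),
    "complex": ("ShowComplex2D",),
}


def widgets_for(technique: str) -> tuple[str, ...]:
    """Return the viewer widgets for a technique folder, or raise on an unknown folder name."""
    try:
        return TECHNIQUE_WIDGETS[technique]
    except KeyError:
        valid = ", ".join(sorted(TECHNIQUE_WIDGETS))
        raise ValueError(f"unknown technique {technique!r}; expected one of: {valid}") from None


print(widgets_for("hrtem"))  # ('Show2D', 'Mark2D')
```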

Metadata schema

Every uploaded dataset gets a paired .json sidecar with metadata.

Required fields:

field                     description
schema_version            schema version; currently "1.0"
name                      must match the dataset name
technique                 must be one of the valid techniques above
description               one-line human-readable description
data.shape                must match the actual array shape
data.dtype                e.g. "float32"
attribution.contributor   who uploaded the data
attribution.license       must be open, e.g. "CC-BY-4.0"
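The required-field rules can be expressed as a short checker. This is a hypothetical sketch, not quantem.data's validate(): the nesting (for example {"attribution": {"license": ...}}) is assumed from the dotted field names, and OPEN_LICENSES is an assumed allowlist, since the schema only says the license must be open.

```python
# Hypothetical checker mirroring the required-field table; quantem.data's validate() may differ.
# The nesting (e.g. "data": {"shape": ...}) is assumed from the dotted field names.
VALID_TECHNIQUES = {"4dstem", "hrtem", "eels", "tomo", "diffraction", "image", "complex"}
# Assumed allowlist: the schema only says the license must be open.
OPEN_LICENSES = {"CC-BY-4.0", "CC-BY-SA-4.0", "CC0-1.0"}
REQUIRED = ("schema_version", "name", "technique", "description",
            "data.shape", "data.dtype", "attribution.contributor", "attribution.license")


def get_field(meta: dict, dotted: str):
    """Follow a dotted name such as 'attribution.license' into nested dicts; None if absent."""
    node = meta
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_required(meta: dict, name: str, array_shape: tuple) -> list[str]:
    missing = [f for f in REQUIRED if get_field(meta, f) in (None, "")]
    if missing:
        return [f"missing required field: {f}" for f in missing]
    errors = []
    if meta["schema_version"] != "1.0":
        errors.append('schema_version should be "1.0"')
    if meta["name"] != name:
        errors.append("name does not match the dataset name")
    if meta["technique"] not in VALID_TECHNIQUES:
        errors.append(f"unknown technique: {meta['technique']}")
    if tuple(meta["data"]["shape"]) != tuple(array_shape):
        errors.append("data.shape does not match the array shape")
    if meta["attribution"]["license"] not in OPEN_LICENSES:
        errors.append("license is not in the assumed open-license list")
    return errors


meta = {
    "schema_version": "1.0",
    "name": "silicon_110_hrtem",
    "technique": "hrtem",
    "description": "Silicon [110] zone axis, HRTEM at 200 kV",
    "data": {"shape": [512, 512], "dtype": "float32"},
    "attribution": {"contributor": "Jane Doe", "license": "CC-BY-4.0"},
}
print(check_required(meta, "silicon_110_hrtem", (512, 512)))  # []
```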

Optional fields:

field                         description
instrument.microscope         e.g. "JEOL JEM-2100F"
instrument.voltage_kv         accelerating voltage in kV
instrument.detector           e.g. "Gatan OneView"
calibration.pixel_size        pixel size
calibration.pixel_size_unit   unit of pixel_size, e.g. "angstrom"
processing.source             provenance information
attribution.institution       lab or university
attribution.date              upload date
attribution.doi               publication DOI, if applicable
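The dotted names describe nesting: instrument.voltage_kv is the voltage_kv key inside an instrument object, the same layout the custom metadata example builds with meta["instrument"] = {...}. Assuming that reading, a hypothetical helper (expand_dotted, not part of quantem.data) can turn a flat set of optional fields into the nested form:

```python
# Hypothetical helper: turn the dotted field names from the tables into nested JSON objects.
def expand_dotted(flat: dict) -> dict:
    nested: dict = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for key in parents:
            # Create the section (e.g. "instrument") the first time it appears.
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


optional = {
    "instrument.voltage_kv": 200,
    "instrument.detector": "Gatan OneView",
    "calibration.pixel_size": 0.15,
    "calibration.pixel_size_unit": "angstrom",
}
print(expand_dotted(optional))
# {'instrument': {'voltage_kv': 200, 'detector': 'Gatan OneView'}, 'calibration': {'pixel_size': 0.15, 'pixel_size_unit': 'angstrom'}}
```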

Custom metadata

By default, upload() generates a metadata template. For full control, pass your own metadata, either as a dict or as the path to a JSON file:

import numpy as np
from quantem.data import upload, make_template

data = np.load("my_hrtem_image.npy")

# Generate a template and customize it; the shape must match the array
meta = make_template(
    name="silicon_110_hrtem",
    technique="hrtem",
    shape=data.shape,
    description="Silicon [110] zone axis, HRTEM at 200 kV",
    contributor="Jane Doe",
)
meta["instrument"] = {
    "microscope": "JEOL JEM-2100F",
    "voltage_kv": 200,
    "detector": "Gatan OneView",
}
meta["calibration"] = {
    "pixel_size": 0.15,
    "pixel_size_unit": "angstrom",
}

upload(data, name="silicon_110_hrtem", technique="hrtem", metadata=meta)

Or from a JSON file:

upload(data, name="silicon_110_hrtem", technique="hrtem", metadata="metadata.json")
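To prepare that JSON file, write the metadata dict with Python's standard json module. The values below are placeholders taken from the examples on this page, and the nested layout is an assumption based on the dotted field names; output from make_template() is the safer starting point.

```python
import json
from pathlib import Path

# Placeholder metadata for illustration; the nesting is assumed from the dotted field names.
meta = {
    "schema_version": "1.0",
    "name": "silicon_110_hrtem",
    "technique": "hrtem",
    "description": "Silicon [110] zone axis, HRTEM at 200 kV",
    "data": {"shape": [512, 512], "dtype": "float32"},  # a list, since JSON has no tuples
    "attribution": {"contributor": "Jane Doe", "license": "CC-BY-4.0"},
    "calibration": {"pixel_size": 0.15, "pixel_size_unit": "angstrom"},
}

path = Path("metadata.json")
path.write_text(json.dumps(meta, indent=2))

# Reading the file back should give the same dict before you pass metadata="metadata.json".
print(json.loads(path.read_text()) == meta)  # True
```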

Validate before uploading

from quantem.data import validate, make_template

meta = make_template(name="test", technique="hrtem", shape=(256, 256))
errors = validate(meta)
if errors:
    for e in errors:
        print(f"  - {e}")
else:
    print("Valid!")

Update existing metadata

To update metadata for a dataset that’s already uploaded:

from quantem.data import update_metadata

update_metadata("silicon_110_hrtem", {
    "calibration": {"pixel_size": 0.148, "pixel_size_unit": "angstrom"},
})

This also creates a PR by default.
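The update above only mentions calibration, so what happens to the other sections depends on how update_metadata() applies it, which this page does not specify. The sketch below assumes a recursive, key-by-key merge; deep_merge is a hypothetical helper, and if the real function replaces whole sections instead, send the complete section in your update.

```python
import copy


# Hypothetical sketch, ASSUMING update_metadata() merges nested objects key by key.
# If it replaces whole sections instead, send the complete section in your update.
def deep_merge(base: dict, updates: dict) -> dict:
    """Return a new dict with `updates` applied to `base`, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


current = {"calibration": {"pixel_size": 0.15, "pixel_size_unit": "angstrom"},
           "instrument": {"voltage_kv": 200}}
print(deep_merge(current, {"calibration": {"pixel_size": 0.148}}))
# {'calibration': {'pixel_size': 0.148, 'pixel_size_unit': 'angstrom'}, 'instrument': {'voltage_kv': 200}}
```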

Direct commits (maintainers only)

If you have write access to the repo, you can skip the PR. In Python, pass create_pr=False:

upload(data, name="...", technique="...", create_pr=False)
update_metadata("...", {...}, create_pr=False)

On the command line, add --direct:

quantem-data upload data.npy --name ... --technique ... --direct