Uploading datasets#

All datasets in quantem.data are hosted on Hugging Face Hub. By default, an upload creates a pull request (PR), so the data is reviewed before it is merged into the public catalog.

Prerequisites#

1. Create a free Hugging Face account.
2. Create an access token at huggingface.co/settings/tokens (it needs write access).
3. Log in from your terminal:

huggingface-cli login

Install#

pip install --pre -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ quantem-data

Naming convention#

Dataset names follow a material-first convention: {material}_{descriptor}.

Rule	Example	Bad example
Lowercase, underscores only	`srtio3_lamella`	`SrTiO3-Lamella`
Material first	`gold_nanoparticle`	`nanoparticle_gold`
Descriptor second (morphology, orientation)	`silicon_110`	`110_silicon`
Lab suffix only to disambiguate	`srtio3_lamella_ncem`	`ncem_srtio3`
No resolution, binning, or year in name	`graphene_monolayer`	`graphene_256x256_2024`

Resolution, binning, year, and instrument details belong in the JSON metadata, not in the name.
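Some of the rules in the table can be checked mechanically before you pick a name. The sketch below is a hypothetical helper (check_name is not part of quantem.data, and the patterns are assumptions) that tests the character rule and flags parts that look like a resolution or a year. It cannot tell whether the material really comes first, and preview_upload() may apply different or stricter rules.

```python
import re

# Hypothetical helper for illustration; not part of quantem.data.
# Lowercase letters and digits in underscore-separated parts, at least two parts.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)+")

# A part that looks like a resolution (256x256) or a four-digit year (2024).
RESOLUTION_OR_YEAR = re.compile(r"\d+x\d+|(?:19|20)\d{2}")


def check_name(name: str) -> list:
    """Return a list of problems found in a dataset name (empty if none)."""
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("use lowercase letters, digits, and underscores: {material}_{descriptor}")
    parts = name.split("_")
    if parts[0].isdigit():
        problems.append("start with the material, not a number")
    if any(RESOLUTION_OR_YEAR.fullmatch(part) for part in parts):
        problems.append("move resolution or year into the JSON metadata")
    return problems


# Names taken from the table above.
for candidate in ["srtio3_lamella", "SrTiO3-Lamella", "110_silicon", "graphene_256x256_2024"]:
    print(candidate, check_name(candidate))
```

A name such as nanoparticle_gold passes this check even though it breaks the material-first rule, so treat the result as a first filter only.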

Upload from Python#

import numpy as np
from quantem.data import upload

# Your data (NumPy array)
data = np.load("my_hrtem_image.npy")

# Upload — creates a PR on Hugging Face Hub
upload(
    data,
    name="silicon_110_hrtem",
    technique="hrtem",
    description="Silicon [110] zone axis, HRTEM at 200 kV",
    contributor="Jane Doe",
)

Output:

Created PR to add silicon_110_hrtem (0.2 MB)
Review: https://huggingface.co/datasets/bobleesj/quantem-data/discussions/1

The PR link takes you to the Hugging Face discussion page where the maintainer can review your data and metadata, then merge it.

Preview before uploading#

Use preview_upload() to validate the name and metadata and to check for duplicates before submitting:

from quantem.data import preview_upload

# data is the NumPy array loaded in the previous example
errors = preview_upload(
    data,
    name="silicon_110_hrtem",
    technique="hrtem",
    description="Silicon [110] zone axis, HRTEM at 200 kV",
    contributor="Jane Doe",
)

if errors:
    for e in errors:
        print(f"  - {e}")
else:
    print("Ready to upload!")

preview_upload() checks:

Naming convention (lowercase, underscores, material-first)
Valid technique folder
Metadata schema compliance
Array shape consistency
Duplicate name detection on HF Hub

Fix any errors before calling upload().
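To make the "array shape consistency" item concrete, here is a minimal sketch of such a comparison. It assumes the metadata is a nested dict with a data block holding shape and dtype (inferred from the dotted field names data.shape and data.dtype). check_data_block is a hypothetical name, not a quantem.data function, and the real check may differ.

```python
# Hypothetical helper for illustration; not part of quantem.data.
def check_data_block(meta: dict, shape: tuple, dtype: str) -> list:
    """Compare the metadata's data block against the actual array."""
    problems = []
    data_block = meta.get("data", {})
    # JSON stores the shape as a list, so compare as tuples.
    recorded_shape = tuple(data_block.get("shape", ()))
    if recorded_shape != tuple(shape):
        problems.append(
            f"data.shape {recorded_shape} does not match array shape {tuple(shape)}"
        )
    if data_block.get("dtype") != dtype:
        problems.append(
            f"data.dtype {data_block.get('dtype')!r} does not match array dtype {dtype!r}"
        )
    return problems


# With a NumPy array you would pass data.shape and str(data.dtype).
meta = {"data": {"shape": [512, 512], "dtype": "float32"}}
print(check_data_block(meta, (512, 512), "float32"))  # consistent
print(check_data_block(meta, (256, 256), "float32"))  # shape mismatch
```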

Upload from the command line#

quantem-data upload my_data.npy \
    --name silicon_110_hrtem \
    --technique hrtem \
    --description "Silicon [110] zone axis" \
    --contributor "Jane Doe"

By default this creates a PR. Add --direct to commit directly (requires write access to the repo).

Valid techniques#

Each dataset belongs to a technique folder:

Technique	Data type	Widget
`4dstem`	4D-STEM diffraction	Show4DSTEM, Show4D
`hrtem`	high-resolution TEM	Show2D, Mark2D
`eels`	electron energy loss	Show1D
`tomo`	tomography	Show3DVolume
`diffraction`	diffraction patterns	Show2D
`image`	virtual/derived images	Show2D, Mark2D
`complex`	ptychography	ShowComplex2D

Metadata schema#

Every uploaded dataset gets a paired .json sidecar with metadata.

Required fields:

Field	Description
`schema_version`	current: `"1.0"`
`name`	must match the dataset name
`technique`	must be one of the valid techniques above
`description`	one-line human description
`data.shape`	must match the actual array shape
`data.dtype`	e.g. `"float32"`
`attribution.contributor`	who uploaded the data
`attribution.license`	must be open, e.g. `"CC-BY-4.0"`

Optional fields:

Field	Description
`instrument.microscope`	e.g. `"JEOL JEM-2100F"`
`instrument.voltage_kv`	accelerating voltage
`instrument.detector`	e.g. `"Gatan OneView"`
`calibration.pixel_size`	with `pixel_size_unit`
`processing.source`	provenance info
`attribution.institution`	lab or university
`attribution.date`	upload date
`attribution.doi`	publication DOI if applicable
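As an illustration of how the dotted field names map onto a sidecar, the sketch below builds a minimal metadata dict and lists any required fields that are missing. It assumes that a dotted name such as attribution.license means a nested key. missing_fields is a hypothetical helper, not the library's validate(), and it checks presence only, not values.

```python
# Dotted paths copied from the required-fields table above.
REQUIRED_FIELDS = [
    "schema_version",
    "name",
    "technique",
    "description",
    "data.shape",
    "data.dtype",
    "attribution.contributor",
    "attribution.license",
]


# Hypothetical helper for illustration; not part of quantem.data.
def missing_fields(meta: dict) -> list:
    """Return the dotted paths that are absent from a nested metadata dict."""
    missing = []
    for path in REQUIRED_FIELDS:
        node = meta
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


sidecar = {
    "schema_version": "1.0",
    "name": "silicon_110_hrtem",
    "technique": "hrtem",
    "description": "Silicon [110] zone axis, HRTEM at 200 kV",
    "data": {"shape": [512, 512], "dtype": "float32"},
    "attribution": {"contributor": "Jane Doe", "license": "CC-BY-4.0"},
}

print(missing_fields(sidecar))
```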

Custom metadata#

By default, upload() generates a metadata template. For full control, pass your own metadata dict or JSON file:

from quantem.data import upload, make_template

# Generate a template and customize it
meta = make_template(
    name="silicon_110_hrtem",
    technique="hrtem",
    shape=(512, 512),
    description="Silicon [110] zone axis, HRTEM at 200 kV",
    contributor="Jane Doe",
)
meta["instrument"] = {
    "microscope": "JEOL JEM-2100F",
    "voltage_kv": 200,
    "detector": "Gatan OneView",
}
meta["calibration"] = {
    "pixel_size": 0.15,
    "pixel_size_unit": "angstrom",
}

upload(data, name="silicon_110_hrtem", technique="hrtem", metadata=meta)

Or from a JSON file:

upload(data, name="silicon_110_hrtem", technique="hrtem", metadata="metadata.json")
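If you prefer the file route, a metadata file can be written with the standard json module. This sketch assumes the file uses the same nested layout as the dict built above; the source does not show a sample file, so treat the exact layout as an assumption. It writes to a temporary directory to stay self-contained. In practice you would write metadata.json next to your data.

```python
import json
import tempfile
from pathlib import Path

# Field names follow the schema tables; the nesting is an assumption.
meta = {
    "schema_version": "1.0",
    "name": "silicon_110_hrtem",
    "technique": "hrtem",
    "description": "Silicon [110] zone axis, HRTEM at 200 kV",
    # JSON has no tuple type, so the shape is stored as a list.
    "data": {"shape": [512, 512], "dtype": "float32"},
    "attribution": {"contributor": "Jane Doe", "license": "CC-BY-4.0"},
    "calibration": {"pixel_size": 0.15, "pixel_size_unit": "angstrom"},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata.json"
    path.write_text(json.dumps(meta, indent=2), encoding="utf-8")

    # Read the file back to confirm it round-trips unchanged.
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print(reloaded == meta)
```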

Validate before uploading#

Use validate() to check a metadata dict on its own; it returns the errors it finds:

from quantem.data import validate, make_template

meta = make_template(name="test", technique="hrtem", shape=(256, 256))
errors = validate(meta)
if errors:
    for e in errors:
        print(f"  - {e}")
else:
    print("Valid!")

Update existing metadata#

To update metadata for a dataset that’s already uploaded:

from quantem.data import update_metadata

update_metadata("silicon_110_hrtem", {
    "calibration": {"pixel_size": 0.148, "pixel_size_unit": "angstrom"},
})

This also creates a PR by default.
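The source does not say whether update_metadata() merges the new values into the existing metadata or replaces whole sections. To preview locally what a recursive merge would produce, a sketch like the following may help. merge_metadata is a hypothetical helper, and the merge behaviour is an assumption rather than documented library behaviour.

```python
import copy


# Hypothetical helper for illustration; not part of quantem.data.
def merge_metadata(existing: dict, changes: dict) -> dict:
    """Return a copy of existing with changes merged in, recursing into nested dicts."""
    merged = copy.deepcopy(existing)
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_metadata(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


existing = {
    "name": "silicon_110_hrtem",
    "instrument": {"voltage_kv": 200},
    "calibration": {"pixel_size": 0.15, "pixel_size_unit": "angstrom"},
}
changes = {"calibration": {"pixel_size": 0.148}}

merged = merge_metadata(existing, changes)
print(merged["calibration"])
```

Under this assumption, a partial update keeps pixel_size_unit intact. If the library replaces sections instead, the unit would have to be passed again, as the example above does.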

Direct commits (maintainers only)#

If you have write access to the repo, you can skip the PR:

upload(data, name="...", technique="...", create_pr=False)
update_metadata("...", {...}, create_pr=False)

Or from the command line:

quantem-data upload data.npy --name ... --technique ... --direct