Uploading datasets
All datasets in quantem.data are hosted on Hugging Face Hub. Uploads create a Pull Request by default, so the data is reviewed before it is merged into the public catalog.
Prerequisites
1. Create a free Hugging Face account.
2. Create an access token at huggingface.co/settings/tokens (it needs write access).
3. Log in from your terminal:
huggingface-cli login
Install
pip install --pre -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ quantem-data
Naming convention
Dataset names follow a material-first convention: {material}_{descriptor}.
| Rule | Example | Bad example |
|---|---|---|
| Lowercase, underscores only | | |
| Material first | | |
| Descriptor second (morphology, orientation) | | |
| Lab suffix only to disambiguate | | |
| No resolution, binning, or year in name | | |
Resolution, binning, year, and instrument details go in the JSON metadata, not the name.
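The rules above can be approximated with a small local check before calling the library. The following is a minimal sketch, not the validator that quantem.data itself uses: the helper name check_dataset_name and the specific patterns (a four-digit year, tokens such as 512x512 or bin2) are assumptions chosen for illustration. It cannot verify material-first ordering, which needs domain knowledge.

```python
import re

# Hypothetical helper, not part of quantem.data.
# The patterns below are illustrative assumptions, not the library's rules.
_NAME_RE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)+")  # lowercase parts joined by "_"
_YEAR_RE = re.compile(r"(?:19|20)\d{2}")            # e.g. 2023
_RESOLUTION_RE = re.compile(r"\d+x\d+")             # e.g. 512x512
_BINNING_RE = re.compile(r"bin\d+")                 # e.g. bin2


def check_dataset_name(name: str) -> list[str]:
    """Return a list of problems found in a dataset name (empty if none)."""
    problems = []
    if not _NAME_RE.fullmatch(name):
        problems.append(
            "use lowercase letters, digits, and underscores, "
            "with at least a material and a descriptor"
        )
    for token in name.split("_"):
        if _YEAR_RE.fullmatch(token):
            problems.append(f"'{token}' looks like a year; put it in the metadata")
        elif _RESOLUTION_RE.fullmatch(token):
            problems.append(f"'{token}' looks like a resolution; put it in the metadata")
        elif _BINNING_RE.fullmatch(token):
            problems.append(f"'{token}' looks like a binning factor; put it in the metadata")
    return problems


for candidate in ["silicon_110_hrtem", "Silicon-110", "silicon_110_2023"]:
    print(candidate, check_dataset_name(candidate))
```

A check like this only catches obvious slips early; preview_upload() remains the authoritative test.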
Upload from Python
import numpy as np
from quantem.data import upload
# Your data (NumPy array)
data = np.load("my_hrtem_image.npy")
# Upload (creates a PR on Hugging Face Hub)
upload(
    data,
    name="silicon_110_hrtem",
    technique="hrtem",
    description="Silicon [110] zone axis, HRTEM at 200 kV",
    contributor="Jane Doe",
)
Output:
Created PR to add silicon_110_hrtem (0.2 MB)
Review: https://huggingface.co/datasets/bobleesj/quantem-data/discussions/1
The PR link takes you to the Hugging Face discussion page where the maintainer can review your data and metadata, then merge it.
Preview before uploading
Use preview_upload() to validate naming, metadata, and check for duplicates before submitting:
from quantem.data import preview_upload
errors = preview_upload(
    data,
    name="silicon_110_hrtem",
    technique="hrtem",
    description="Silicon [110] zone axis, HRTEM at 200 kV",
    contributor="Jane Doe",
)
if errors:
    for e in errors:
        print(f"  - {e}")
else:
    print("Ready to upload!")
preview_upload() checks:
- Naming convention (lowercase, underscores, material-first)
- Valid technique folder
- Metadata schema compliance
- Array shape consistency
- Duplicate name detection on HF Hub
Fix any errors before calling upload().
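One way to enforce that ordering is to wrap both calls so the upload runs only when the preview reports nothing. The sketch below assumes only what the examples in this section show: preview_upload() returns a collection of error messages that is empty when everything passes, and both functions accept the same arguments. The wrapper name upload_if_clean is hypothetical, and the two functions are passed in as parameters so the wrapper makes no further assumptions about the library.

```python
from typing import Any, Callable, Sequence


def upload_if_clean(
    preview_fn: Callable[..., Sequence[str]],
    upload_fn: Callable[..., Any],
    data: Any,
    **fields: Any,
) -> bool:
    """Run preview_fn, then upload_fn only if no errors were reported.

    Hypothetical wrapper, not part of quantem.data.
    Returns True if the upload was attempted, False otherwise.
    """
    errors = preview_fn(data, **fields)
    if errors:
        # Report every problem and stop before anything is submitted.
        for e in errors:
            print(f"  - {e}")
        return False
    upload_fn(data, **fields)
    return True
```

With the library installed, this could be called as upload_if_clean(preview_upload, upload, data, name="silicon_110_hrtem", technique="hrtem", description="Silicon [110] zone axis, HRTEM at 200 kV", contributor="Jane Doe").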
Upload from the command line
quantem-data upload my_data.npy \
    --name silicon_110_hrtem \
    --technique hrtem \
    --description "Silicon [110] zone axis" \
    --contributor "Jane Doe"
By default this creates a PR. Add --direct to commit directly (requires write access to the repo).
Valid techniques
Each dataset belongs to a technique folder:
| technique | data type | widget |
|---|---|---|
| | 4D-STEM diffraction | Show4DSTEM, Show4D |
| | high-resolution TEM | Show2D, Mark2D |
| | electron energy loss | Show1D |
| | tomography | Show3DVolume |
| | diffraction patterns | Show2D |
| | virtual/derived images | Show2D, Mark2D |
| | ptychography | ShowComplex2D |
Metadata schema
Every uploaded dataset gets a paired .json sidecar with metadata.
Required fields:
| field | description |
|---|---|
| | current: |
| | must match the dataset name |
| | must be one of the valid techniques above |
| | one-line human description |
| | must match the actual array shape |
| | e.g. |
| | who uploaded the data |
| | must be open, e.g. |
Optional fields:
| field | description |
|---|---|
| | e.g. |
| | accelerating voltage |
| | e.g. |
| | with |
| | provenance info |
| | lab or university |
| | upload date |
| | publication DOI if applicable |
Custom metadata
By default, upload() generates a metadata template. For full control, pass your own metadata dict or JSON file:
from quantem.data import upload, make_template
# Generate a template and customize it
meta = make_template(
    name="silicon_110_hrtem",
    technique="hrtem",
    shape=(512, 512),
    description="Silicon [110] zone axis, HRTEM at 200 kV",
    contributor="Jane Doe",
)
meta["instrument"] = {
    "microscope": "JEOL JEM-2100F",
    "voltage_kv": 200,
    "detector": "Gatan OneView",
}
meta["calibration"] = {
    "pixel_size": 0.15,
    "pixel_size_unit": "angstrom",
}
upload(data, name="silicon_110_hrtem", technique="hrtem", metadata=meta)
Or from a JSON file:
upload(data, name="silicon_110_hrtem", technique="hrtem", metadata="metadata.json")
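A metadata.json file for this call could be written with the standard json module. This is a sketch that reuses only the keys shown in the make_template() example above; the exact schema the library expects is defined by make_template() and validate(), so treat the layout here as an assumption and validate the result before uploading. Note that JSON has no tuple type, so a shape written as (512, 512) is read back as a list.

```python
import json
from pathlib import Path

# Keys mirror the make_template() example above.
# Whether this covers the full schema is an assumption.
meta = {
    "name": "silicon_110_hrtem",
    "technique": "hrtem",
    "shape": (512, 512),
    "description": "Silicon [110] zone axis, HRTEM at 200 kV",
    "contributor": "Jane Doe",
    "instrument": {
        "microscope": "JEOL JEM-2100F",
        "voltage_kv": 200,
        "detector": "Gatan OneView",
    },
    "calibration": {
        "pixel_size": 0.15,
        "pixel_size_unit": "angstrom",
    },
}

# Write the sidecar file next to the data.
path = Path("metadata.json")
path.write_text(json.dumps(meta, indent=2), encoding="utf-8")

# Read it back to confirm the file is valid JSON.
loaded = json.loads(path.read_text(encoding="utf-8"))

# JSON stores the shape tuple as a list, so convert before comparing
# it with a NumPy array's .shape attribute.
shape_roundtrips = tuple(loaded["shape"]) == meta["shape"]
print(shape_roundtrips)
```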
Validate before uploading
from quantem.data import validate, make_template
meta = make_template(name="test", technique="hrtem", shape=(256, 256))
errors = validate(meta)
if errors:
    for e in errors:
        print(f"  - {e}")
else:
    print("Valid!")
Update existing metadata
To update metadata for a dataset that’s already uploaded:
from quantem.data import update_metadata
update_metadata("silicon_110_hrtem", {
    "calibration": {"pixel_size": 0.148, "pixel_size_unit": "angstrom"},
})
This also creates a PR by default.
Direct commits (maintainers only)
If you have write access to the repo, you can skip the PR:
upload(data, name="...", technique="...", create_pr=False)
update_metadata("...", {...}, create_pr=False)
Or from the command line:
quantem-data upload data.npy --name ... --technique ... --direct