# Create a NIfTI dataset

This page shows how to create and share a dataset of medical images in NIfTI format (`.nii` / `.nii.gz`) using the `datasets` library.

You can share a dataset with your team or with anyone in the community by creating a dataset repository on the Hugging Face Hub. Anyone with access can then load it with `load_dataset`:

```py
from datasets import load_dataset

dataset = load_dataset("<username>/my_nifti_dataset")
```

There are two common ways to create a NIfTI dataset:

- Create a dataset from local NIfTI files in Python and upload it with `Dataset.push_to_hub`.
- Use a folder-based convention (one file per example) and a small helper to convert it into a `Dataset`.

> [!TIP]
> You can control access to your dataset by requiring users to share their contact information first. Check out the [Gated datasets](https://huggingface.co/docs/hub/datasets-gated) guide for more information.

## Local files

If you already have a list of file paths to NIfTI files, the easiest workflow is to create a `Dataset` from that list and cast the column to the `Nifti` feature.

```py
from datasets import Dataset, Nifti

# Create a dataset from a list of file paths and cast the column to the Nifti feature
files = ["/path/to/scan_001.nii.gz", "/path/to/scan_002.nii.gz"]
ds = Dataset.from_dict({"nifti": files}).cast_column("nifti", Nifti())

# With decode=True (the default), ds[0]["nifti"] is a nibabel.Nifti1Image object.
# With decode=False, it is a dict such as {"bytes": None, "path": "..."}.
```
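You can add extra columns to the same dict, such as an ID derived from each file name. The sketch below does this. `nifti_stem`, `build_columns` and the `scan_id` column are hypothetical names chosen for illustration and are not part of `datasets`. The one real pitfall it shows is that `pathlib.Path.stem` strips only the last suffix, so `scan_001.nii.gz` would become `scan_001.nii` without the explicit check.

```python
from pathlib import Path


def nifti_stem(path):
    """Strip the NIfTI extension; Path.stem alone would leave '.nii' on 'scan.nii.gz'."""
    name = Path(path).name
    # Check the longer suffix first so '.nii.gz' is removed as a whole
    for suffix in (".nii.gz", ".nii"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    raise ValueError(f"not a NIfTI file: {path}")


def build_columns(files):
    """Hypothetical helper: column dict for Dataset.from_dict with paths plus a scan ID."""
    return {"nifti": list(files), "scan_id": [nifti_stem(f) for f in files]}


files = ["/path/to/scan_001.nii.gz", "/path/to/scan_002.nii"]
print(build_columns(files)["scan_id"])  # ['scan_001', 'scan_002']

# With datasets installed, the dict can be used exactly like the one above:
# ds = Dataset.from_dict(build_columns(files)).cast_column("nifti", Nifti())
```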

The `Nifti` feature supports a `decode` parameter. When `decode=True` (the default), each value is loaded as a `nibabel.nifti1.Nifti1Image` object, and you can get the image data as a NumPy array with `img.get_fdata()`. When `decode=False`, each value is a dict with the file `path` and `bytes` instead.

```py
from datasets import Dataset, Nifti

ds = Dataset.from_dict({"nifti": ["/path/to/scan.nii.gz"]}).cast_column("nifti", Nifti(decode=True))
img = ds[0]["nifti"]  # a nibabel.nifti1.Nifti1Image
arr = img.get_fdata()  # voxel data as a NumPy array (float64 by default)
```

After preparing the dataset, you can push it to the Hub:

```py
ds.push_to_hub("<username>/my_nifti_dataset")
```

This creates a dataset repository containing your NIfTI dataset, stored as Parquet shards in a `data/` folder.

## Folder conventions and metadata

If you organize your dataset in folders, splits (train/validation/test) can be created automatically from a structure like:

```
dataset/train/scan_0001.nii.gz
dataset/train/scan_0002.nii.gz
dataset/validation/scan_1001.nii.gz
dataset/test/scan_2001.nii.gz
```
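Here is a minimal sketch of the "small helper" approach for this layout. `collect_splits` is a hypothetical name, not a `datasets` function. It only maps each split folder to a sorted list of NIfTI paths. You can then pass each list to `Dataset.from_dict` and cast it to `Nifti()`, as in the local-files workflow. Sorting keeps row order deterministic across runs.

```python
from pathlib import Path

# Both plain and gzip-compressed NIfTI files are accepted
NIFTI_SUFFIXES = (".nii", ".nii.gz")


def collect_splits(root):
    """Map each split folder under `root` (e.g. train/validation/test) to sorted NIfTI paths.

    Hypothetical helper mirroring the folder convention above; not part of `datasets`.
    Folders without any NIfTI file are skipped.
    """
    splits = {}
    for split_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        files = sorted(
            str(f) for f in split_dir.iterdir()
            if f.is_file() and f.name.endswith(NIFTI_SUFFIXES)
        )
        if files:
            splits[split_dir.name] = files
    return splits


# With datasets installed, one Dataset per split could then be built like this:
# from datasets import Dataset, DatasetDict, Nifti
# dsd = DatasetDict({
#     split: Dataset.from_dict({"nifti": paths}).cast_column("nifti", Nifti())
#     for split, paths in collect_splits("dataset").items()
# })
```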

If you have labels or other metadata, add a `metadata.csv`, `metadata.jsonl`, or `metadata.parquet` file to the folder so each NIfTI file can be linked to a metadata row. The metadata must contain a `file_name` (or `*_file_name`) column holding the path of each NIfTI file relative to the metadata file.

Example `metadata.csv`:

```csv
file_name,patient_id,age,diagnosis
scan_0001.nii.gz,P001,45,healthy
scan_0002.nii.gz,P002,59,disease_x
```
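Before uploading, it can help to check that the metadata and the files on disk agree. This stdlib-only sketch reports `file_name` entries that point to missing files and NIfTI files that have no metadata row. `check_metadata` is a hypothetical helper. It assumes a CSV with a plain `file_name` column and does not handle the `*_file_name` variants.

```python
import csv
from pathlib import Path


def check_metadata(metadata_csv):
    """Return (missing, unlisted) for a metadata.csv placed next to NIfTI files.

    missing:  file_name entries that do not exist relative to the CSV's folder
    unlisted: NIfTI files under that folder that have no metadata row
    Hypothetical sanity check; not part of `datasets`.
    """
    folder = Path(metadata_csv).parent
    with open(metadata_csv, newline="") as f:
        listed = {row["file_name"] for row in csv.DictReader(f)}
    missing = sorted(name for name in listed if not (folder / name).is_file())
    # Collect NIfTI files as POSIX-style paths relative to the metadata file
    on_disk = {
        p.relative_to(folder).as_posix()
        for p in folder.rglob("*")
        if p.is_file() and p.name.endswith((".nii", ".nii.gz"))
    }
    unlisted = sorted(on_disk - listed)
    return missing, unlisted
```

An empty `missing` list and an empty `unlisted` list mean every row has a file and every file has a row.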

The `Nifti` feature also works with zipped datasets: each zip archive can contain NIfTI files and a metadata file, which is useful when uploading large datasets as archives. In addition, gzip-compressed (`.nii.gz`) and uncompressed (`.nii`) files can be mixed in the same dataset:

```
dataset/train/scan_0001.nii.gz
dataset/train/scan_0002.nii
dataset/validation/scan_1001.nii.gz
dataset/test/scan_2001.nii
```
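Mixing compressed and uncompressed files works because a `.nii.gz` file is simply a gzip-compressed `.nii` file. The stdlib-only sketch below reads a few NIfTI-1 header fields from raw file bytes (for example `open(path, "rb").read()`). It decompresses first when the bytes start with the gzip magic number. `read_nifti1_header` is a hypothetical illustration, not a replacement for nibabel. The offsets follow the NIfTI-1 header layout (`sizeof_hdr` at byte 0, `dim` at byte 40, `magic` at byte 344). NIfTI-2 files are not handled.

```python
import gzip
import struct

GZIP_MAGIC = b"\x1f\x8b"


def read_nifti1_header(data):
    """Parse a few NIfTI-1 header fields from raw .nii or .nii.gz bytes (illustrative only)."""
    if data[:2] == GZIP_MAGIC:  # .nii.gz: decompress the whole file first
        data = gzip.decompress(data)
    header = data[:348]
    if len(header) < 348:
        raise ValueError("too short to be a NIfTI-1 header")
    # sizeof_hdr (int32 at offset 0) must equal 348; it also reveals the byte order
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 header")
    dim = struct.unpack(endian + "8h", header[40:56])  # dim[0] is the number of dimensions
    return {
        "byte_order": "little" if endian == "<" else "big",
        "ndim": dim[0],
        "shape": dim[1 : 1 + dim[0]],
        "magic": header[344:348],  # b"n+1\x00" for single-file .nii images
    }
```

Because the function decompresses everything up front, it is only meant for inspecting small files. nibabel streams the data and handles both cases for you when `decode=True`.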

## Converting to PyTorch tensors

Use the [`~Dataset.set_transform`] function to apply a transformation on-the-fly whenever examples are accessed. The transform receives a batch, so `batch["nifti"]` is a list of decoded `Nifti1Image` objects:

```py
import torch

def transform_to_pytorch(batch):
    # Convert each decoded nibabel image in the batch into a torch tensor
    batch["nifti_torch"] = [torch.tensor(img.get_fdata()) for img in batch["nifti"]]
    return batch

ds.set_transform(transform_to_pytorch)
```

Accessing elements now (e.g. `ds[0]`) returns torch tensors under the `"nifti_torch"` key.

## Using `Nifti1Image` objects

NIfTI is a format for storing 3D or 4D medical images, most commonly brain scans: three spatial dimensions (x, y, z) and, optionally, a time dimension (t). The array indices are voxel coordinates on the image grid, so the header also stores a 4×4 affine matrix (available as `img.affine` in nibabel) that maps voxel indices to real-world coordinates, typically in millimeters.
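The voxel-to-world mapping is one matrix product in homogeneous coordinates: `[x, y, z, 1] = A @ [i, j, k, 1]`, where `A` is the 4×4 affine. The plain-Python sketch below spells this out. The affine values are made up for illustration (2 mm isotropic voxels with a shifted origin). With real data you would pass `img.affine` to `nibabel.affines.apply_affine` instead.

```python
def voxel_to_world(affine, ijk):
    """Map voxel indices (i, j, k) to world coordinates (x, y, z) with a 4x4 affine.

    Computes [x, y, z, 1] = affine @ [i, j, k, 1]; `affine` is a list of 4 rows.
    """
    vec = (*ijk, 1)  # homogeneous coordinates
    x, y, z, _ = (sum(a * v for a, v in zip(row, vec)) for row in affine)
    return (x, y, z)


# Hypothetical affine: 2 mm voxels, origin shifted so voxel (45, 63, 36) sits at (0, 0, 0)
affine = [
    [2.0, 0.0, 0.0, -90.0],
    [0.0, 2.0, 0.0, -126.0],
    [0.0, 0.0, 2.0, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(voxel_to_world(affine, (45, 63, 36)))  # (0.0, 0.0, 0.0)
print(voxel_to_world(affine, (0, 0, 0)))  # (-90.0, -126.0, -72.0)
```

The diagonal holds the voxel size along each axis and the last column holds the world position of voxel (0, 0, 0). Real scans often also include rotation terms off the diagonal.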

You can visualize NIfTI files with `matplotlib`, for example by plotting the middle slice along each axis:

```python
import matplotlib.pyplot as plt
from datasets import load_dataset

def show_slices(slices):
    """Display a row of 2D image slices."""
    fig, axes = plt.subplots(1, len(slices))
    for ax, img_slice in zip(axes, slices):
        ax.imshow(img_slice.T, cmap="gray", origin="lower")

# push_to_hub stores a single Dataset as the "train" split
nifti_ds = load_dataset("<username>/my_nifti_dataset", split="train")
for example in nifti_ds:
    data = example["nifti"].get_fdata()  # assumes a 3D volume
    x, y, z = (size // 2 for size in data.shape[:3])
    show_slices([data[x, :, :], data[:, y, :], data[:, :, z]])
    plt.show()
```

For further reading, see the [nibabel documentation](https://nipy.org/nibabel/index.html), especially [this nibabel tutorial on coordinate systems](https://nipy.org/nibabel/coordinate_systems.html).