First of all, thank you 🙏 for open-sourcing this repo. It’s been very useful for experimenting with landmark detection networks, and the codebase is clean and easy to extend. I think I found one small issue, of a kind commonly seen in ITK-to-NumPy conversion.
When using MaskDataset with 3D data, the mask's axis order is incorrect after loading. This leads to a consistent X–Y swap for landmarks extracted via mask[0].nonzero() and for masks generated by _create_mask, causing visual and metric misalignment between the image and its landmarks/heatmaps.
Environment / Context:
- MaskDataset from landmarker.data
- MONAI LoadImage(image_only=True, ensure_channel_first=True)
- NIfTI/ITK data (nibabel/SimpleITK pipelines)
- 3D case (spatial_dims=3)
What happens now
LoadImage(..., ensure_channel_first=True) returns shapes:
- 2D: (C, H, W)
- 3D: (C, H, W, D)
Current code then applies:
# 2D
Transpose(indices=[0, 2, 1]) # (C, W, H)
# 3D
Transpose(indices=[0, 3, 2, 1]) # (C, D, W, H)
Later code treats masks/images as (C, D, H, W) and assumes landmarks are in (z, y, x) order. Because the mask was transposed to (C, D, W, H), nonzero() returns (z, x, y) instead. This misplaces landmarks and any generated heatmaps: the Z slice looks right, but X and Y are swapped.
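To make the swap concrete, here is a minimal NumPy-only sketch. It assumes that MONAI's Transpose permutes axes the same way np.transpose does, and that the loaded mask really has the (C, H, W, D) layout described above; the sizes and the single landmark position are made up for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen to be distinct so that any axis mix-up is visible.
C, H, W, D = 1, 4, 5, 6

# Stand-in for a mask loaded as (C, H, W, D), with a single landmark voxel.
y, x, z = 1, 3, 2
mask = np.zeros((C, H, W, D), dtype=np.uint8)
mask[0, y, x, z] = 1

# Current 3D behaviour: indices [0, 3, 2, 1] produce (C, D, W, H).
current = np.transpose(mask, (0, 3, 2, 1))
current_shape = current.shape
current_coords = tuple(int(a[0]) for a in current[0].nonzero())

print(current_shape)   # (1, 6, 5, 4), i.e. (C, D, W, H)
print(current_coords)  # (2, 3, 1), i.e. (z, x, y) rather than (z, y, x)
```

With distinct H and W the swap shows up in the shape itself; with square slices it is only visible in the landmark coordinates.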
Proposed fix
 if mask_paths is not None:
     if mask_paths[0].endswith(".npy"):
         self.mask_loader = Compose(
             [
                 LoadImage(image_only=True, ensure_channel_first=True),
             ]
         )
     elif self.spatial_dims == 2:
         self.mask_loader = Compose(
             [
                 LoadImage(image_only=True, ensure_channel_first=True),
-                Transpose(indices=[0, 2, 1]),
+                # No transpose needed: LoadImage returns (C, H, W)
             ]
         )
     else:
         self.mask_loader = Compose(
             [
                 LoadImage(image_only=True, ensure_channel_first=True),
-                Transpose(indices=[0, 3, 2, 1]),
+                Transpose(indices=[0, 3, 1, 2]),  # (C, H, W, D) -> (C, D, H, W)
             ]
         )
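As a quick sanity check of the proposed indices, the following NumPy-only sketch may help. It again assumes that MONAI's Transpose behaves like np.transpose and that the loader returns (C, H, W) / (C, H, W, D); the sizes and landmark position are hypothetical.

```python
import numpy as np

# Hypothetical, distinct sizes so that axis order is unambiguous.
C, H, W, D = 1, 4, 5, 6
y, x, z = 1, 3, 2

# 3D: stand-in for a loaded mask in (C, H, W, D) with one landmark voxel.
mask3d = np.zeros((C, H, W, D), dtype=np.uint8)
mask3d[0, y, x, z] = 1

# Proposed 3D indices [0, 3, 1, 2]: (C, H, W, D) -> (C, D, H, W).
fixed3d = np.transpose(mask3d, (0, 3, 1, 2))
fixed3d_coords = tuple(int(a[0]) for a in fixed3d[0].nonzero())
assert fixed3d.shape == (C, D, H, W)
assert fixed3d_coords == (z, y, x)

# 2D: with no transpose, (C, H, W) already yields (y, x) from nonzero().
mask2d = np.zeros((C, H, W), dtype=np.uint8)
mask2d[0, y, x] = 1
coords2d = tuple(int(a[0]) for a in mask2d[0].nonzero())
assert coords2d == (y, x)

print(fixed3d.shape, fixed3d_coords, coords2d)
```

If the real loader returns a different axis layout for a given file format, the indices would need to be adjusted accordingly, so a check like this on an actual sample seems worthwhile.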