Breaking the Data Barrier: Robust Few-Shot 3D Vessel Segmentation using Foundation Models

Model Overview

Our architecture adapts a frozen 2D foundation model to 3D medical image segmentation, targeting settings where labeled data is scarce, such as few-shot vessel segmentation. The pipeline has four components:

Frozen 2D Encoder (DINOv3): Extracts slice-wise 2D patch embeddings from the input volume; its weights are not fine-tuned.
3D Pyramidal Adapter: A ConvNeXt-style module that bridges the slice-wise 2D features and 3D spatial representations.
Parallel FFA Aggregator: Captures inter-slice dependencies and global spatial context using axial and spatial self-attention.
UNETR-Lite Decoder & Refinement Heads: Reconstructs the 3D volume, using a depth-gated 3D Refine Head and a chunked 2D High-Resolution Head to produce the final masks.
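
As a rough illustration of the data flow through the first two components, the sketch below tracks tensor shapes only. It assumes a ViT patch size of 16 (suggested by the multiple-of-16 requirement on `--img_size` in Key Arguments) and a hypothetical embedding width of 768. Neither value is confirmed by this README, and the function name is illustrative.

```python
def token_volume_shape(depth, img_size=336, patch_size=16, embed_dim=768):
    """Return tokens per slice and the (C, D, H, W) shape of the stacked patch grids.

    patch_size and embed_dim are assumptions for illustration only.
    """
    if img_size % patch_size != 0:
        raise ValueError("img_size must be a multiple of patch_size")
    grid = img_size // patch_size       # patches per side in one slice
    tokens_per_slice = grid * grid      # sequence length seen by the 2D encoder
    # Each slice yields (tokens_per_slice, embed_dim). Reshaping to a grid and
    # stacking slices along depth gives a 3D feature volume for the adapter.
    return tokens_per_slice, (embed_dim, depth, grid, grid)


tokens, shape = token_volume_shape(depth=64)
print(tokens, shape)  # 441 (768, 64, 21, 21)
```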

How to Run

1. Install Dependencies

pip install -r requirements.txt

2. Prepare Data (JSON List Format)

Training and validation data are specified via JSON files.
Each entry must contain "volume" (image) and "seg" (label) paths.

[
  {
    "volume": "/path/to/images/case_001_0000.nii.gz",
    "seg":    "/path/to/labels/case_001.nii.gz"
  },
  {
    "volume": "/path/to/images/case_002_0000.nii.gz",
    "seg":    "/path/to/labels/case_002.nii.gz"
  }
]

Note

Supported format: NIfTI (.nii, .nii.gz)

"volume" and "seg" must share the same spatial shape
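
A list in this format can be generated with a small script. The sketch below is not part of the repository. It assumes the naming pattern seen in the example above (image `case_001_0000.nii.gz`, label `case_001.nii.gz`); the function name `build_list` and the `_0000` suffix handling are illustrative assumptions. It does not check that the image and label shapes match.

```python
import json
from pathlib import Path


def build_list(image_dir, label_dir, out_json, suffix="_0000"):
    """Pair images with labels by case ID and write the JSON list."""
    entries = []
    for img in sorted(Path(image_dir).glob("*.nii*")):
        # Strip ".nii" / ".nii.gz" and the assumed channel suffix to get the case ID.
        stem = img.name.split(".nii")[0]
        case_id = stem[: -len(suffix)] if suffix and stem.endswith(suffix) else stem
        # Accept either label extension; images without a label are skipped.
        for ext in (".nii.gz", ".nii"):
            seg = Path(label_dir) / f"{case_id}{ext}"
            if seg.exists():
                entries.append({"volume": str(img), "seg": str(seg)})
                break
    Path(out_json).write_text(json.dumps(entries, indent=2))
    return entries
```

For example, `build_list("/path/to/images", "/path/to/labels", "/path/to/train.json")` would write a list that can be passed to `--train_list`.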

3. Training

3D Model

python train.py \
  --train_list /path/to/train.json \
  --val_list /path/to/val.json \
  --log_dir /path/to/log_dir \
  --save_dir /path/to/save_dir \
  --num_classes 16 \
  --epochs 1000 \
  --batch_size 2 \
  --accumulation_steps 1 \
  --img_size 336 \
  --lr 5e-3 \
  --drop_empty \
  --min_fg_frac 0.0 \
  --bg_weight 0.05 \
  --use_mfb \
  --init_prior_bias \
  --aug \
  --use_3d \
  --use_3d_unetr \
  --depth_min_fg_frac 0.0005
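
Key Arguments describes `--lr` as a peak learning rate with warmup and cosine decay. A minimal sketch of such a schedule follows. The linear warmup shape, the warmup length, and the zero floor are assumptions, since this README does not state them, and `lr_at` is a hypothetical helper rather than code from `train.py`.

```python
import math


def lr_at(step, total_steps, peak_lr=5e-3, warmup_steps=100, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    warmup_steps and min_lr are illustrative assumptions.
    """
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```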

4. Inference

Predictions are produced via a sliding window over the Z axis (chunk size = 64 slices).
Each chunk is processed independently, and results are stitched back to the original volume shape.
Output segmentation masks are saved as NIfTI files in --out_dir, preserving the original affine and header.
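
The chunking along Z can be pictured as follows. This is a sketch under the assumption that chunks do not overlap and that the last chunk is simply shorter; how `inference.py` actually handles volume boundaries is not specified here, and `z_chunks`, `predict_volume`, and `predict_chunk` are hypothetical names.

```python
def z_chunks(depth, chunk_size=64):
    """Return (start, stop) index pairs that tile the Z axis in order."""
    return [(z, min(z + chunk_size, depth)) for z in range(0, depth, chunk_size)]


def predict_volume(slices, predict_chunk, chunk_size=64):
    """Run predict_chunk on each Z chunk and stitch the per-slice outputs."""
    stitched = []
    for start, stop in z_chunks(len(slices), chunk_size):
        out = predict_chunk(slices[start:stop])  # one independent forward pass
        if len(out) != stop - start:
            raise ValueError("predict_chunk must return one output per slice")
        stitched.extend(out)
    return stitched


print(z_chunks(150))  # [(0, 64), (64, 128), (128, 150)]
```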

python inference.py \
  --test_list  /path/to/test.json \
  --checkpoint /path/to/checkpoint.pt \
  --out_dir    /path/to/output \
  --num_classes 16 \
  --img_size 336

Note

--num_classes, --img_size, --vit_layers, and --decoder_up_factor must match the values used during training

Output masks are saved as <case_id>.nii.gz in --out_dir, with the same affine and header as the input volume

Key Arguments

Argument	Default	Description
`--train_list`	(required)	Path to the training JSON list
`--val_list`	same as `train_list`	Path to the validation JSON list
`--num_classes`	(required)	Number of classes including background
`--epochs`	`1000`	Number of training epochs
`--batch_size`	`16`	Batch size
`--accumulation_steps`	`1`	Gradient accumulation steps (effective batch = `batch_size × accumulation_steps`)
`--img_size`	`336`	In-plane spatial resolution (must be a multiple of 16)
`--lr`	`5e-3`	Peak learning rate (Warmup + Cosine Decay)
`--crop_depth`	`64`	Number of slices to crop along the Z axis
`--depth_min_fg_frac`	`0.0005`	Per-slice foreground ratio threshold; slices below this are excluded from the loss
`--drop_empty`	`False`	Exclude volumes with no foreground voxels
`--use_mfb`	`False`	Enable Median Frequency Balancing for class weights
`--init_prior_bias`	`False`	Initialize head bias from class frequency priors
`--aug`	`False`	Enable data augmentation (rotation, scaling, noise, etc.)
`--use_3d`	`False`	Enable 3D mode (per-slice ViT + 3D aggregator + 3D decoder)
`--vit_chunk_slices`	`8`	Chunk size for ViT inference along the slice dimension (reduces VRAM usage)
`--vit_amp`	`False`	Run ViT inference in bfloat16 AMP
`--log_dir`	(/path/to/log_dir)	Directory to save training logs
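
To make the `--use_mfb` option concrete: median frequency balancing, as commonly defined, weights each class by the median class frequency divided by that class's own frequency, where a class's frequency is its voxel count divided by the total voxels of the volumes in which it appears. The sketch below follows that common definition. Whether `train.py` uses exactly this formula, or how it interacts with `--bg_weight`, is not stated here; the function name and input layout are hypothetical.

```python
from statistics import median


def mfb_weights(per_volume_counts, num_classes):
    """per_volume_counts[v][c] = number of voxels of class c in volume v."""
    class_voxels = [0] * num_classes    # voxels of class c over all volumes
    present_total = [0] * num_classes   # total voxels of volumes containing class c
    for counts in per_volume_counts:
        total = sum(counts)
        for c in range(num_classes):
            if counts[c] > 0:
                class_voxels[c] += counts[c]
                present_total[c] += total
    freqs = [class_voxels[c] / present_total[c] if present_total[c] else None
             for c in range(num_classes)]
    med = median(f for f in freqs if f is not None)
    # Classes never observed get weight 0.0 so they do not contribute to the loss.
    return [med / f if f is not None else 0.0 for f in freqs]
```

With two volumes whose per-class counts are `[90, 10, 0]` and `[80, 0, 20]`, the class frequencies are 0.85, 0.1, and 0.2, so the rarest class receives the largest weight.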

Citation

If you find this repository or our paper useful in your research, please consider citing:

@misc{yoshihara2026fewshot3dsegmentation,
      title={Breaking the Data Barrier: Robust Few-Shot 3D Vessel Segmentation using Foundation Models}, 
      author={Kirato Yoshihara and Yohei Sugawara and Yuta Tokuoka and Lihang Hong},
      year={2026},
      eprint={2602.23782},
      archivePrefix={arXiv},
      primaryClass={eess.IV},
      url={https://arxiv.org/abs/2602.23782}, 
}
