An attempt at a faithful implementation of DINOv2-style pretraining on 3D volumes.
- the `dinov2_eva` backbone is from dynamic-network-architectures, with some minimal changes
- the augmentation library is a loosely modified batchgeneratorsv2
- normalization is mostly borrowed from nnUNetv2
- RoPE is from the DINOv3 implementation, extended to support 3D

This implementation is still incomplete: pretraining works, but no finetuning code has been written yet.
NOTE: a newer v2 backbone config exists and should generally be preferred for new runs, but the default remains the older config so that older checkpoints continue to load without config changes.
To select the newer defaults explicitly, set `model.model_type` to `"v2"` in the config:
```json
{
  "model": {
    "model_type": "v2",
    "embedding_type": "default",
    "global_crops_size": [96, 96, 96],
    "local_crops_size": [48, 48, 48]
  }
}
```

pretrain.py now supports the three-stage DINOv3-style workflow:
- base pretraining with DINO + iBOT + KoLeo
- late dense-feature refinement with Gram anchoring
- short mixed-resolution HR adaptation with Gram anchoring kept on
The new config surface is:
- top-level `gram` keys: `enabled`, `loss_weight`, `teacher_checkpoint`, `teacher_refresh_every`, `teacher_refresh_start_step`, `normalized`, `img_level`, `remove_neg`, `remove_only_teacher_neg`
- dataset keys: `gram_teacher_crop_size`, `gram_teacher_no_augmentations`, `variants`
When gram.enabled=true, the trainer builds a frozen Gram teacher backbone, loads it from gram.teacher_checkpoint when provided, refreshes it from the live EMA teacher on the configured cadence, and adds an image-level Gram loss on patch features. Gram-teacher crops are paired with each global crop from the exact same sampled 3D region, and default to normalization-only.
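The image-level Gram loss on patch features can be sketched as follows. This is a minimal numpy illustration, not the trainer's actual code: the function names are made up, and the mean-squared distance between Gram matrices is an assumption about the exact loss form.

```python
import numpy as np

def gram_matrix(feats: np.ndarray) -> np.ndarray:
    """Pairwise patch similarities; feats is (num_patches, dim).
    L2-normalizing each row makes the Gram entries cosine similarities."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    return normed @ normed.T  # (num_patches, num_patches)

def gram_loss(student_feats: np.ndarray, teacher_feats: np.ndarray) -> float:
    """Mean squared difference between the student's Gram matrix and the
    frozen Gram teacher's Gram matrix (illustrative distance choice)."""
    gs = gram_matrix(student_feats)
    gt = gram_matrix(teacher_feats)
    return float(np.mean((gs - gt) ** 2))
```

Because only pairwise similarities are compared, this loss anchors the structure of the dense feature field rather than the features themselves, which is the point of Gram anchoring.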
Mixed-resolution HR adaptation is enabled by adding dataset.variants, where each variant defines its own crop sizes and sampling ratio. The trainer builds one dataloader per variant and samples them with the configured weights. For embedding_type=deeper, the dataset automatically derives the needed overscanned *_view_size values from the patch halo.
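The weighted sampling over variants can be sketched like this. It is a minimal illustration with made-up variant entries; the real trainer draws whole batches from one dataloader per variant rather than single items.

```python
import random

# hypothetical variant list mirroring dataset.variants; ratio acts as a sampling weight
variants = [
    {"name": "lr", "ratio": 0.3, "global_crop_size": [128, 128, 128]},
    {"name": "hr", "ratio": 0.7, "global_crop_size": [160, 160, 160]},
]

def sample_variant(rng: random.Random) -> dict:
    """Pick one variant per step, weighted by its configured ratio."""
    weights = [v["ratio"] for v in variants]
    return rng.choices(variants, weights=weights, k=1)[0]

# over many steps, the draw frequencies approach the configured ratios
rng = random.Random(0)
counts = {"lr": 0, "hr": 0}
for _ in range(1000):
    counts[sample_variant(rng)["name"]] += 1
```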
dinovol_2/example_config.json remains runnable as a base-pretraining config and also includes a recipes object with three complete examples:
- `recipes.base_pretrain`
- `recipes.gram_refinement`
- `recipes.hr_adaptation`
The important stage-specific overrides are:
```json
{
  "gram": {
    "enabled": true,
    "loss_weight": 2.0,
    "teacher_checkpoint": "/path/to/previous/checkpoint.pt",
    "teacher_refresh_every": 10000
  },
  "model": {
    "pretrained_weights": "/path/to/previous/checkpoint.pt",
    "pretrained_backbone_only": false
  }
}
```

For HR adaptation, add weighted crop variants:
```json
{
  "dataset": {
    "variants": [
      {
        "ratio": 0.3,
        "global_crop_size": [128, 128, 128],
        "local_crop_size": [48, 48, 48],
        "gram_teacher_crop_size": [160, 160, 160]
      },
      {
        "ratio": 0.7,
        "global_crop_size": [160, 160, 160],
        "local_crop_size": [80, 80, 80],
        "gram_teacher_crop_size": [192, 192, 192]
      }
    ]
  }
}
```

To sanity-check one batch end to end:
```shell
uv run python -m dinovol_2.verify dinovol_2/example_config.json --no-amp
```

To run the synthetic smoke suite:

```shell
uv run python -m unittest tests.test_pretrain_smoke -v
```

pretrain.py can optionally run small downstream segmentation trainings during pretraining.
- set `task_eval_every` to a positive step cadence to enable it
- choose `eval_task` as `both`, `surfaces`, or `ink`
- set `eval_task_train_iters` to control the mini-training length (default 500)
- set `eval_task_decoder_type` to `simple` or `patch_encode_decode`
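Assuming the eval keys above sit at the top level of the config (the values here are illustrative, not recommended settings), enabling the downstream eval could look like:

```json
{
  "task_eval_every": 5000,
  "eval_task": "both",
  "eval_task_train_iters": 500,
  "eval_task_decoder_type": "simple"
}
```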
The task data is downloaded with `python -m dinovol_2.eval.download_data --task both`.
- `both` now means `surfaces` plus `ink`
- `surfaces` is resized 2x before crops are drawn
- `surfaces` and `ink` each use the first 10 sorted samples as the deterministic validation set
- `ink` is not resized before crops are drawn
- train and validation crops are taken from precomputed chunks that contain some foreground and at least 50% background in supervised voxels
- the saved validation image contains one row per validation sample, with image / label / prediction panels
- for `ink`, voxels with `supervision_mask == 0` are ignored and supervised unlabeled voxels are treated as background
- for `ink`, loss/metrics and saved previews use a max projection across Z to match the flat ink trainer
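Two of the rules above can be sketched in numpy: the chunk acceptance criterion and the flat-ink max projection across Z. Both functions are illustrative (`chunk_is_usable` and `flatten_for_ink` are hypothetical names, and `labels > 0` as the foreground convention is an assumption).

```python
import numpy as np

def chunk_is_usable(labels: np.ndarray, supervision_mask: np.ndarray) -> bool:
    """Accept a chunk only if it contains some foreground and at least
    50% background among supervised voxels."""
    supervised = supervision_mask > 0
    if not supervised.any():
        return False
    has_fg = bool(((labels > 0) & supervised).any())
    bg_frac = ((labels == 0) & supervised).sum() / supervised.sum()
    return has_fg and bg_frac >= 0.5

def flatten_for_ink(volume: np.ndarray) -> np.ndarray:
    """Collapse a (Z, Y, X) volume to 2D with a max projection across Z,
    mirroring how ink loss/metrics and previews are flattened."""
    return volume.max(axis=0)
```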
There is a small napari helper for checkpoint inspection at dinovol_2/eval/napari_visualizer.py.
Run it with:

```shell
python -m dinovol_2.eval.napari_visualizer
```

Workflow:
- open an OME-Zarr from the widget, click `Load Scales`, choose the desired scale, and click `Open Zarr`
- draw a rectangle in the generated `*_bbox` shapes layer; this 2D YX bbox is applied across the full Z span of the selected scale
- add one or more points in a `Points` layer
- choose a `pretrain.py` checkpoint, image layer, and points layer in the dock widget
- click `Cache Embeddings`
- click `Show Feature PCA` to render a 3-channel PCA view of the cached patch embeddings
- optionally enable `Otsu Foreground Mask` and set `Mask Dilation` before creating the PCA layer
- click `Similarity For Selected Points` or `Similarity For All Points`
The widget rebuilds the teacher backbone from the saved checkpoint config, computes a patch embedding grid only inside the active bbox for the selected OME-Zarr scale, and limits the PCA and cosine-similarity outputs to that same crop. The dock widget opens on the bottom of the napari window.
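The PCA view and cosine-similarity maps can be sketched on a flat `(num_patches, dim)` embedding grid. This is a numpy sketch; the widget's exact scaling, masking, and reshaping back to the 3D grid are not reproduced, and both function names are illustrative.

```python
import numpy as np

def pca_rgb(feats: np.ndarray, k: int = 3) -> np.ndarray:
    """Project (num_patches, dim) features onto their top-k principal
    components and min-max scale each channel to [0, 1] for display."""
    centered = feats - feats.mean(axis=0, keepdims=True)
    # SVD of the centered features: rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:k].T
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    return (proj - lo) / np.maximum(hi - lo, 1e-8)

def cosine_similarity_map(feats: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Cosine similarity of every patch embedding to one query embedding
    (e.g. the embedding under a selected point)."""
    fn = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    qn = query / np.linalg.norm(query)
    return fn @ qn
```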