an attempt at a faithful implementation of dinov2-style pretraining on 3d volumes.
- the dinov2_eva model is from dynamic-network-architectures, with some minimal changes
- the augmentation library is a loosely modified batchgeneratorsv2
- normalization is mostly borrowed from nnunetv2
- rope is from the dinov3 impl, extended to support 3d
this implementation is still incomplete: pretraining works, but finetuning is not yet written.
NOTE: a newer v2 backbone config exists and should generally be preferred for new runs, but the default remains the older config so that older checkpoints continue to load without config changes.
To select the newer defaults explicitly, set model.model_type to v2 in the config:
```json
{
  "model": {
    "model_type": "v2",
    "embedding_type": "default",
    "global_crops_size": [96, 96, 96],
    "local_crops_size": [48, 48, 48]
  }
}
```

`pretrain.py` can optionally run small downstream segmentation trainings during pretraining.
- set `task_eval_every` to a positive step cadence to enable it
- choose `eval_task` as `both`, `surfaces`, or `ink`
- set `eval_task_train_iters` to control the mini-training length, default `500`
- set `eval_task_decoder_type` to `simple` or `patch_encode_decode`
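assuming the eval options sit at the top level of the same JSON config shown above (the exact nesting may differ), enabling the downstream eval might look like this, with illustrative values:

```json
{
  "task_eval_every": 1000,
  "eval_task": "both",
  "eval_task_train_iters": 500,
  "eval_task_decoder_type": "simple"
}
```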
The task data is downloaded with `python -m dinovol_2.eval.download_data --task both`.
- `both` now means `surfaces` plus `ink`
- `surfaces` is resized 2x before crops are drawn
- `surfaces` and `ink` each use the first 10 sorted samples as the deterministic validation set
- `ink` is not resized before crops are drawn
- train and validation crops are taken from precomputed chunks that contain some foreground and at least 50% background in supervised voxels
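the chunk acceptance rule above can be sketched roughly as follows; the function name, the `label > 0` foreground convention, and the array layout are assumptions for illustration, not the repo's actual API:

```python
import numpy as np

def chunk_is_valid(label: np.ndarray, supervision_mask: np.ndarray) -> bool:
    """Hypothetical sketch of the chunk filter: keep a chunk only if it
    contains some foreground and at least 50% background among the
    supervised voxels. Assumes label > 0 marks foreground."""
    supervised = supervision_mask > 0
    n_supervised = supervised.sum()
    if n_supervised == 0:
        return False
    foreground = (label > 0) & supervised
    background = (label == 0) & supervised
    return bool(foreground.any()) and (background.sum() / n_supervised) >= 0.5
```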
- the saved validation image contains one row per validation sample, with image / label / prediction panels
- for `ink`, voxels with `supervision_mask == 0` are ignored and supervised unlabeled voxels are treated as background
- for `ink`, loss/metrics and saved previews use a max projection across Z to match the flat ink trainer
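the masked Z max-projection described above can be sketched like this; the function name and the (Z, Y, X) axis order are assumptions, not the repo's exact code:

```python
import numpy as np

def ink_preview_projection(pred: np.ndarray, supervision_mask: np.ndarray) -> np.ndarray:
    """Hedged sketch: drop unsupervised voxels, then max-project over Z
    so 2D loss/metrics and previews match the flat ink trainer.
    Assumes (Z, Y, X) layout; columns with no supervision become 0."""
    masked = np.where(supervision_mask > 0, pred, -np.inf)  # ignore unsupervised voxels
    proj = masked.max(axis=0)                               # collapse Z -> (Y, X)
    return np.where(np.isinf(proj), 0.0, proj)
```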
There is a small napari helper for checkpoint inspection at `dinovol_2/eval/napari_visualizer.py`.
Run it with `python -m dinovol_2.eval.napari_visualizer`.

Workflow:
- open an OME-Zarr from the widget, click `Load Scales`, choose the desired scale, and click `Open Zarr`
- draw a rectangle in the generated `*_bbox` shapes layer; this 2D YX bbox is applied across the full Z span of the selected scale
- add one or more points in a `Points` layer
- choose a `pretrain.py` checkpoint, image layer, and points layer in the dock widget
- click `Cache Embeddings`
- click `Show Feature PCA` to render a 3-channel PCA view of the cached patch embeddings
- optionally enable `Otsu Foreground Mask` and set `Mask Dilation` before creating the PCA layer
- click `Similarity For Selected Points` or `Similarity For All Points`
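a 3-channel PCA view of patch embeddings can be produced along these lines; this is an illustrative sketch with an assumed `(N, D)` embedding layout, not the widget's actual code:

```python
import numpy as np

def embeddings_to_pca_rgb(emb: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: project an (N, D) grid of patch embeddings onto
    its top 3 principal components and min-max scale each channel to [0, 1]
    so the scores can be displayed as RGB."""
    centered = emb - emb.mean(axis=0, keepdims=True)
    # top-3 right singular vectors of the centered data = top-3 principal axes
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:3].T                      # (N, 3) PCA scores
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    return (scores - lo) / np.maximum(hi - lo, 1e-8)  # per-channel [0, 1]
```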
The widget rebuilds the teacher backbone from the saved checkpoint config, computes a patch embedding grid only inside the active bbox for the selected OME-Zarr scale, and limits the PCA and cosine-similarity outputs to that same crop. The dock widget opens on the bottom of the napari window.
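the cosine-similarity output can be sketched as below; the helper name and the `(N, D)` grid-of-embeddings layout are assumptions for illustration:

```python
import numpy as np

def cosine_similarity_map(grid: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: cosine similarity between one query patch
    embedding of shape (D,) and a cached embedding grid of shape (N, D)."""
    g = grid / np.linalg.norm(grid, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return g @ q  # (N,) similarities in [-1, 1]
```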