LightlyTrain - SOTA Pretraining, Fine-tuning and Distillation

LightlyTrain is the leading framework for transforming your unlabeled data into powerful vision foundation models, from pretraining through knowledge distillation to fine-tuning.

Unlock the full potential of your data with state-of-the-art (SOTA) computer vision methods like DINOv2 and DINOv3. Train any model architecture (YOLO, transformers, and beyond) and fine-tune it for segmentation and object detection in your specific use case.

Benchmarks

Custom Foundation Model: Train your own DINOv2!

With LightlyTrain you can train your very own foundation model like DINOv2 on your data.

Implementation	Model	ImageNet k-NN	Docs
LightlyTrain	dinov2/vitl16	81.9%	🔗
DINOv2	dinov2/vitl16	81.6%	🔗

Object Detection: Fine-Tune DINOv2 or DINOv3 for detection!

COCO Dataset

Implementation	Backbone Model	AP_50:95	Latency (ms)	# Params (M)	Input Size	Checkpoint Name
LightlyTrain	dinov2/vits14-ltdetr	55.7	16.87	55.3	644×644	dinov2/vits14-ltdetr-coco
LightlyTrain	dinov3/convnext-tiny-ltdetr	54.4	13.29	61.1	640×640	dinov3/convnext-tiny-ltdetr-coco
LightlyTrain	dinov3/convnext-small-ltdetr	56.9	17.65	82.7	640×640	dinov3/convnext-small-ltdetr-coco
LightlyTrain	dinov3/convnext-base-ltdetr	58.6	24.68	121.0	640×640	dinov3/convnext-base-ltdetr-coco
LightlyTrain	dinov3/convnext-large-ltdetr	60.0	42.30	230.0	640×640	dinov3/convnext-large-ltdetr-coco

Latency is measured on a single NVIDIA T4 GPU with batch size 1. All models are compiled and optimized using tensorrt==10.13.3.9.
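The detection table reports latency in milliseconds, while the segmentation tables below report average FPS. At batch size 1, and ignoring any overhead outside the measured forward pass, the two are roughly related by FPS ≈ 1000 / latency (ms). The sketch below applies that conversion to the COCO numbers above; it is a back-of-the-envelope aid, not an official benchmark.

```python
# Latency (ms, batch size 1, NVIDIA T4, TensorRT) copied from the COCO table above.
LATENCY_MS = {
    "dinov2/vits14-ltdetr-coco": 16.87,
    "dinov3/convnext-tiny-ltdetr-coco": 13.29,
    "dinov3/convnext-small-ltdetr-coco": 17.65,
    "dinov3/convnext-base-ltdetr-coco": 24.68,
    "dinov3/convnext-large-ltdetr-coco": 42.30,
}


def latency_to_fps(latency_ms: float) -> float:
    """Approximate throughput at batch size 1: one image every `latency_ms` milliseconds."""
    return 1000.0 / latency_ms


for name, ms in LATENCY_MS.items():
    print(f"{name:36s} {ms:6.2f} ms  ~{latency_to_fps(ms):5.1f} FPS")
```

Keep in mind that the detection models are optimized with TensorRT while the segmentation models use torch.compile, so converted numbers are only a rough point of comparison across tables.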

Semantic Segmentation: Use the SOTA method from CVPR 2025!

COCO-Stuff Dataset

Implementation	Backbone Model	Val mIoU	Avg. FPS	# Params (M)	Input Size	Checkpoint Name
LightlyTrain	dinov3/vits16-eomt	0.465	88.7	21.6	518×518	dinov3/vits16-eomt-coco
LightlyTrain	dinov3/vitb16-eomt	0.520	43.3	85.7	518×518	dinov3/vitb16-eomt-coco
LightlyTrain	dinov3/vitl16-eomt	0.544	20.4	303.2	518×518	dinov3/vitl16-eomt-coco

Avg. FPS is measured on a single NVIDIA T4 GPU with batch size 1. All models are compiled and optimized using torch.compile.
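The segmentation tables report validation mIoU (mean Intersection-over-Union) as a fraction between 0 and 1. For each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c) over all pixels, and mIoU averages IoU_c over the classes. Below is a minimal pure-Python sketch on a single flattened label map. Benchmark evaluations typically accumulate the counts over the whole validation set and skip an "ignore" label, which this sketch does not do.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """mIoU over flattened per-pixel class indices.

    IoU_c = TP_c / (TP_c + FP_c + FN_c); classes absent from both pred and target are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of pixels")
    ious: List[float] = []
    for c in range(num_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    if not ious:
        raise ValueError("no class occurs in pred or target")
    return sum(ious) / len(ious)


# Four pixels, two classes: IoU_0 = 1/2, IoU_1 = 2/3, so mIoU = 7/12.
print(round(mean_iou(pred=[0, 1, 1, 1], target=[0, 0, 1, 1], num_classes=2), 3))  # 0.583
```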

Cityscapes Dataset

Implementation	Backbone Model	Val mIoU	Avg. FPS	# Params (M)	Input Size	Checkpoint Name
LightlyTrain	dinov3/vits16-eomt	0.786	18.6	21.6	1024×1024	dinov3/vits16-eomt-cityscapes
LightlyTrain	dinov3/vitb16-eomt	0.810	8.7	85.7	1024×1024	dinov3/vitb16-eomt-cityscapes
LightlyTrain	dinov3/vitl16-eomt	0.844	3.9	303.2	1024×1024	dinov3/vitl16-eomt-cityscapes
EoMT (CVPR 2025 paper, current SOTA)	dinov2/vitl16-eomt	0.842	-	319	1024×1024	-

Avg. FPS is measured on a single NVIDIA T4 GPU with batch size 1. All models are compiled and optimized using torch.compile.

ADE20k Dataset

Implementation	Backbone Model	Autolabel	Val mIoU	# Params (M)	Input Size	Checkpoint Name
LightlyTrain	dinov3/vits16-eomt	❌	0.466	21.6	518×518	-
LightlyTrain	dinov3/vits16-eomt	✅	0.533	21.6	518×518	dinov3/vits16-eomt-ade20k
LightlyTrain	dinov3/vitb16-eomt	❌	0.544	85.7	518×518	-
LightlyTrain	dinov3/vitb16-eomt	✅	0.573	85.7	518×518	dinov3/vitb16-eomt-ade20k

The improved results with auto-labeling were obtained as follows. We first fine-tuned a ViT-H+ on the ADE20k dataset, which reaches 0.595 validation mIoU. We then used this checkpoint to create pseudo masks for the SUN397 dataset (~100k images), fine-tuned the smaller models on these pseudo masks, and evaluated them on the ADE20k validation set.
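The auto-labeling recipe above is a form of pseudo-labeling: a large, accurate teacher labels an unlabeled image pool, a smaller student is trained only on those pseudo labels, and evaluation uses the original labeled validation set. The sketch below shows only that data flow. `toy_teacher`, `toy_train_student`, and the integer "images" are hypothetical toy stand-ins so the code runs without a model; they are not LightlyTrain APIs.

```python
from typing import Callable, Dict, List, Tuple

# Toy stand-ins: an "image" is an int and a "mask" is an int class label.
# Real code would use images and per-pixel segmentation masks.
Image = int
Mask = int
Model = Callable[[Image], Mask]


def toy_teacher(image: Image) -> Mask:
    """Step 1 (assumed done): a fine-tuned teacher, e.g. the ViT-H+ above. Here: perfect toy rule."""
    return image % 3


def pseudo_label(teacher: Model, unlabeled: List[Image]) -> List[Tuple[Image, Mask]]:
    """Step 2: the teacher labels every image in the unlabeled pool (e.g. SUN397)."""
    return [(image, teacher(image)) for image in unlabeled]


def toy_train_student(pairs: List[Tuple[Image, Mask]]) -> Model:
    """Step 3 (hypothetical): 'learn' the majority pseudo label for each image % 3."""
    votes: Dict[int, List[Mask]] = {}
    for image, mask in pairs:
        votes.setdefault(image % 3, []).append(mask)
    table = {key: max(set(masks), key=masks.count) for key, masks in votes.items()}
    return lambda image: table.get(image % 3, 0)


def accuracy(model: Model, val_set: List[Tuple[Image, Mask]]) -> float:
    """Step 4: evaluate on the human-labeled validation split (ADE20k above)."""
    return sum(model(image) == mask for image, mask in val_set) / len(val_set)


def auto_label_pipeline(teacher: Model, unlabeled: List[Image], val_set: List[Tuple[Image, Mask]]) -> float:
    pseudo_pairs = pseudo_label(teacher, unlabeled)  # no human labels involved
    student = toy_train_student(pseudo_pairs)        # the smaller model only sees pseudo labels
    return accuracy(student, val_set)


if __name__ == "__main__":
    val = [(image, image % 3) for image in range(1000, 1030)]
    print(auto_label_pipeline(toy_teacher, list(range(300)), val))  # 1.0 with this perfect toy teacher
```

In practice the student can only be as good as the teacher's pseudo masks, which is why a much larger model is used as the labeler.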

News

[0.12.0] - 2025-11-06: 💡 New DINOv3 Object Detection: Run inference or fine-tune DINOv3 models for object detection! 💡
[0.11.0] - 2025-08-15: 🚀 New DINOv3 Support: Pretrain your own model with distillation from DINOv3 weights, or fine-tune our SOTA EoMT semantic segmentation model with a DINOv3 backbone! 🚀
[0.10.0] - 2025-08-04: 🔥 Train state-of-the-art semantic segmentation models with our new DINOv2 semantic segmentation fine-tuning method! 🔥
[0.9.0] - 2025-07-21: DINOv2 pretraining is now out of beta and officially available!
[0.8.0] - 2025-06-10: DINOv2 pretraining is now available (beta 🔬)!
[0.7.0] - 2025-05-26: Up to 3x faster distillation and higher accuracy with Distillation v2 (new default method)!

Installation

LightlyTrain requires Python 3.8+ and runs on Windows, Linux and macOS.

pip install lightly-train
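To confirm that your interpreter meets the Python 3.8+ requirement and that the package is installed, a small stdlib-only check is enough. It assumes the installed distribution name is `lightly-train`, as in the pip command above.

```python
import sys
from importlib import metadata


def check_environment(min_python=(3, 8), dist_name="lightly-train"):
    """Return (python_ok, installed_version_or_None)."""
    python_ok = sys.version_info[:2] >= min_python
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # package not installed in this environment
    return python_ok, version


if __name__ == "__main__":
    ok, version = check_environment()
    print(f"Python {sys.version.split()[0]} (>= 3.8: {ok})")
    print(f"lightly-train: {version or 'not installed'}")
```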

🔥 Pretrain Your Own DINOv2 Foundation Model 🔥

Pretrain a DINOv2 model on your own unlabeled images. LightlyTrain's DINOv2 implementation matches or outperforms the official implementation on ImageNet-1K. See our documentation on how to get started!

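import lightly_train

if __name__ == "__main__":
    # Pretrain a DINOv2 ViT-B/14 on your own unlabeled images.
    lightly_train.train(
        out="out/my_experiment",  # output directory of the experiment
        data="my_data_dir",       # directory with unlabeled images
        model="dinov2/vitb14",
        method="dinov2",
    )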

See our documentation for more details.
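At its core, DINOv2-style pretraining is self-distillation: a student network learns to match the output distribution of a teacher network (an exponential moving average of the student) on different augmented views of the same image. The sketch below shows only the simplified DINO cross-entropy term for one pair of outputs, with the teacher centered and sharpened by a low temperature, plus the EMA update. It omits multi-crop, the iBOT masked-image term, the KoLeo regularizer, and the rest of the full DINOv2 recipe, and it is not LightlyTrain's implementation; the temperatures and momentum are illustrative values.

```python
import math
from typing import List

Vector = List[float]


def log_softmax(logits: Vector, temperature: float) -> Vector:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_norm for x in scaled]


def dino_loss(
    student_logits: Vector,
    teacher_logits: Vector,
    center: Vector,
    student_temp: float = 0.1,
    teacher_temp: float = 0.04,
) -> float:
    """Cross-entropy between the centered, sharpened teacher and the student distribution."""
    centered = [t - c for t, c in zip(teacher_logits, center)]
    p_teacher = [math.exp(v) for v in log_softmax(centered, teacher_temp)]  # no gradient in practice
    log_p_student = log_softmax(student_logits, student_temp)
    return -sum(pt * lps for pt, lps in zip(p_teacher, log_p_student))


def ema(old: Vector, new: Vector, momentum: float) -> Vector:
    """Exponential moving average, used for the teacher weights and for the center."""
    return [momentum * o + (1.0 - momentum) * n for o, n in zip(old, new)]


if __name__ == "__main__":
    center = [0.0, 0.0, 0.0]
    teacher_out = [5.0, 0.0, 0.0]
    print(dino_loss([5.0, 0.0, 0.0], teacher_out, center))  # student agrees with teacher: ~0
    print(dino_loss([0.0, 5.0, 0.0], teacher_out, center))  # student disagrees: ~50
```

Only the student is trained by gradient descent; after each step the teacher weights follow it via `ema(teacher_weights, student_weights, momentum)` with a momentum close to 1 (0.996 is a commonly used starting value).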

🔥 Distill DINOv2/v3 Into Any Model Architecture 🔥

Pretrain any model architecture with unlabeled data by distilling the knowledge from DINOv2 or DINOv3 foundation models into your model. On the COCO dataset, YOLOv8-s models pretrained with LightlyTrain achieve high performance across all tested label fractions, and these improvements also hold for other architectures such as YOLO11, RT-DETR, and Faster R-CNN. See our announcement post for more benchmarks and details, and our documentation on how to get started!

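import lightly_train

if __name__ == "__main__":
    # Pretrain an Ultralytics YOLOv8-s model by distillation on unlabeled images.
    lightly_train.train(
        out="out/my_experiment",          # output directory of the experiment
        data="my_data_dir",               # directory with unlabeled images
        model="ultralytics/yolov8s.pt",
        method="distillation",
    )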

See our documentation for more details.
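Conceptually, distillation trains your model (the student) to reproduce the embeddings that a frozen DINOv2/DINOv3 teacher produces for the same unlabeled image, so no labels are needed. LightlyTrain's exact distillation objective is not described here; the sketch below assumes a common choice, a cosine-similarity loss between a linearly projected student feature and the teacher feature, using toy vectors in place of network outputs. The projection matrix and feature sizes are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, x: Vector) -> Vector:
    """Learnable linear projection from the student feature size to the teacher feature size."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def cosine_distillation_loss(student_feat: Vector, teacher_feat: Vector, weights: Matrix) -> float:
    """1 - cos(W @ student, teacher): 0 when the directions match, 2 when they are opposite."""
    s = project(weights, student_feat)
    dot = sum(a * b for a, b in zip(s, teacher_feat))
    norm = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in teacher_feat))
    return 1.0 - dot / max(norm, 1e-12)


if __name__ == "__main__":
    # Hypothetical sizes: 2-dim student feature, 3-dim teacher feature.
    W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    print(cosine_distillation_loss([1.0, 2.0], [2.0, 4.0, 0.0], W))    # ~0: aligned
    print(cosine_distillation_loss([1.0, 2.0], [-1.0, -2.0, 0.0], W))  # ~2: opposite
```

The projection lets the student keep its own feature size; during training, the student and projection are updated while the teacher stays frozen.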

🔥 High-Performance Object Detection Models 🔥

LightlyTrain’s LT-DETR models, powered by DINOv2 and DINOv3 backbones, demonstrate strong performance across different scales.

🚀 We are actively working on new models with improved speed and accuracy. Updates coming soon, so stay tuned!

import lightly_train
import matplotlib.pyplot as plt
from torchvision import io, utils

# Load the COCO-trained LT-DETR model with a DINOv3 ConvNeXt-Tiny backbone.
model = lightly_train.load_model("dinov3/convnext-tiny-ltdetr-coco")

# Run inference on a single image.
labels, boxes, scores = model.predict("<image>.jpg").values()

# Visualize predictions.
image_with_boxes = utils.draw_bounding_boxes(
    image=io.read_image("<image>.jpg"),
    boxes=boxes,
    labels=[model.classes[i.item()] for i in labels],
)

fig, ax = plt.subplots(figsize=(30, 30))
ax.imshow(image_with_boxes.permute(1, 2, 0))  # CHW -> HWC for matplotlib
fig.savefig("predictions.png")

Or fine-tune on your own dataset:

import lightly_train

if __name__ == "__main__":
    lightly_train.train_object_detection(
        out="out/my_experiment",
        model="dinov3/convnext-tiny-ltdetr-coco",
        data={
          # data config ...
        }
    )

See our documentation for more details.
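The inference example above returns a confidence score per box, and low-confidence boxes can clutter the visualization. Below is a minimal sketch that keeps only detections at or above a threshold before drawing. It uses plain Python lists so it runs without a model, and the 0.5 threshold is an arbitrary example. If the returned values are PyTorch tensors (the `.item()` call above suggests they are), the same filter is `keep = scores >= threshold` followed by `labels[keep]`, `boxes[keep]`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), as draw_bounding_boxes expects


def filter_by_score(
    labels: Sequence[int], boxes: Sequence[Box], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[List[int], List[Box], List[float]]:
    """Keep only detections whose confidence score is >= threshold."""
    keep = [i for i, s in enumerate(scores) if s >= threshold]
    return [labels[i] for i in keep], [boxes[i] for i in keep], [scores[i] for i in keep]


if __name__ == "__main__":
    kept = filter_by_score([1, 2, 3], [(0, 0, 1, 1), (0, 0, 2, 2), (0, 0, 3, 3)], [0.9, 0.3, 0.5])
    print(kept)  # ([1, 3], [(0, 0, 1, 1), (0, 0, 3, 3)], [0.9, 0.5])
```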

🔥 Fine-tune SOTA Instance Segmentation Models 🔥

LightlyTrain's EoMT instance segmentation model based on DINOv3 achieves a new state-of-the-art on the COCO benchmark! See our documentation for more details.

import lightly_train

if __name__ == "__main__":
    lightly_train.train_instance_segmentation(
        out="out/my_experiment",
        model="dinov3/vits16-eomt-inst-coco",
        data={
          # data config ...
        }
    )

🔥 Fine-tune SOTA Semantic Segmentation Models 🔥

LightlyTrain's EoMT semantic segmentation model based on DINOv3 achieves a new state-of-the-art on the ADE20K benchmark! See our documentation for more details.

You can explore training semantic segmentation models with the example code below:

import lightly_train

if __name__ == "__main__":
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov3/vits16-eomt",
        data={
          # data config ...
        }
    )

Tutorials

Fine-tuning Your Pretrained Models: Looking for a code example for fine-tuning after pretraining your model? Head over to the Quick Start!
Embedding Example: Want to use your pretrained model to generate image embeddings instead? Check out the embed guide!
Instance Segmentation Fine-tuning: Want to train a state-of-the-art instance segmentation model? Head over to the instance segmentation guide!
Semantic Segmentation Fine-tuning: Want to train a state-of-the-art semantic segmentation model? Head over to the semantic segmentation guide!
More Tutorials: Want to get more hands-on with LightlyTrain? Check out our Tutorials for more examples!

Features

Model Pretraining (no self-supervised learning expertise required!)

Pretrain DINOv2 foundation models on your own data
Distill knowledge from DINOv2 or DINOv3 into any model architecture
Pretrain models from popular libraries such as Torchvision, TIMM, Ultralytics, SuperGradients, RT-DETR, RF-DETR, and YOLOv12
Pretrain custom models with ease
Export models in their native format for fine-tuning or inference
Generate and export image embeddings

Model Fine-tuning

Fine-tune DINOv2 and DINOv3 for object detection
Fine-tune DINOv3 for instance segmentation
Fine-tune DINOv2 and DINOv3 for semantic segmentation

MLOps

Python, Command Line, and Docker support
Built for high performance including multi-GPU and multi-node support
Monitor training progress with MLflow, TensorBoard, Weights & Biases, and more
Runs fully on-premises with no API authentication (anonymous usage events can be disabled, see Usage Events)

Supported Models

LightlyTrain supports a wide range of frameworks and models out of the box.

Framework	Model	Pretrain (Unlabeled Images)	Distill From DINOv2/v3 (Unlabeled Images)	Fine-tune: Object Detection (Labeled Images)	Fine-tune: Instance Segmentation (Labeled Images)	Fine-tune: Semantic Segmentation (Labeled Images)
LightlyTrain	DINOv3		✅ 🔗	✅ 🔗	✅ 🔗	✅ 🔗
	DINOv2	✅ 🔗	✅ 🔗	✅ 🔗		✅ 🔗
Torchvision	ResNet	✅ 🔗	✅ 🔗
	ConvNext	✅ 🔗	✅ 🔗
	ShuffleNetV2	✅ 🔗	✅ 🔗
TIMM	All models	✅ 🔗	✅ 🔗
Ultralytics	YOLOv5	✅ 🔗	✅ 🔗
	YOLOv6	✅ 🔗	✅ 🔗
	YOLOv8	✅ 🔗	✅ 🔗
	YOLO11	✅ 🔗	✅ 🔗
	YOLO12	✅ 🔗	✅ 🔗
RT-DETR	RT-DETR	✅ 🔗	✅ 🔗
	RT-DETRv2	✅ 🔗	✅ 🔗
RF-DETR	RF-DETR	✅ 🔗	✅ 🔗
YOLOv12	YOLOv12	✅ 🔗	✅ 🔗
SuperGradients	PP-LiteSeg	✅ 🔗	✅ 🔗
	SSD	✅ 🔗	✅ 🔗
	YOLO-NAS	✅ 🔗	✅ 🔗
Custom Models	Any PyTorch model	✅ 🔗	✅ 🔗

For an overview of all supported models and usage instructions, see the full model docs.

Contact us if you need support for additional models or libraries.

Supported Pretraining & Distillation Methods

See the full methods docs for details.

FAQ

Who is LightlyTrain for?

LightlyTrain is designed for engineers and teams who want to use their unlabeled data to its full potential. It is ideal if any of the following applies to you:

You want to speed up model development cycles
You have limited labeled data but abundant unlabeled data
You have slow and expensive labeling processes
You want to build your own foundation model
You work with domain-specific datasets (video analytics, robotics, medical, agriculture, etc.)
You cannot use public pretrained models
No pretrained models are available for your specific architecture
You want to leverage the latest research in self-supervised learning and distillation

How much data do I need?

We recommend a minimum of several thousand unlabeled images for training with LightlyTrain and 100+ labeled images for fine-tuning afterwards.

For best results:

Use at least 5x more unlabeled than labeled data
Even a 2x ratio of unlabeled to labeled data yields strong improvements
Larger datasets (>100,000 images) benefit from pretraining for up to 3,000 epochs
Smaller datasets (<100,000 images) benefit from longer pretraining, for up to 10,000 epochs
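The guidelines above can be turned into a quick sanity check for a planned dataset. The function below is only a transcription of those rules of thumb (several thousand unlabeled images, 100+ labeled images, a 5x unlabeled-to-labeled ratio, and the epoch ceilings by dataset size), not a LightlyTrain API. The hypothetical `min_unlabeled=2_000` stands in for "several thousand", and the epoch ceiling is assumed to depend on the size of the unlabeled pretraining set.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataPlan:
    max_pretrain_epochs: int
    warnings: List[str] = field(default_factory=list)


def plan_dataset(num_unlabeled: int, num_labeled: int, min_unlabeled: int = 2_000) -> DataPlan:
    """Apply the README's rules of thumb (illustrative thresholds, see lead-in)."""
    # Larger datasets (>100,000 images): up to 3,000 epochs; smaller: up to 10,000.
    plan = DataPlan(max_pretrain_epochs=3_000 if num_unlabeled > 100_000 else 10_000)
    if num_unlabeled < min_unlabeled:
        plan.warnings.append("fewer than several thousand unlabeled images")
    if num_labeled < 100:
        plan.warnings.append("fewer than 100 labeled images for fine-tuning")
    ratio = num_unlabeled / num_labeled if num_labeled else float("inf")
    if ratio < 5:
        plan.warnings.append(f"unlabeled:labeled ratio is {ratio:.1f}x; 5x+ recommended (2x already helps)")
    return plan


if __name__ == "__main__":
    print(plan_dataset(num_unlabeled=50_000, num_labeled=1_000))
```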

The unlabeled dataset must always be treated like a training split: never include validation images in pretraining, to avoid data leakage.

What's the difference between LightlyTrain and other self-supervised learning implementations?

LightlyTrain offers several advantages:

User-friendly: You don't need to be an SSL expert; focus on training your model instead of on implementation details.
Works with various model architectures: Integrates directly with different libraries such as Torchvision, Ultralytics, etc.
Handles complexity: Manages scaling from single GPU to multi-GPU training and optimizes hyperparameters.
Seamless workflow: Automatically pretrains the correct layers and exports models in the right format for fine-tuning.

Why should I use LightlyTrain instead of other already pretrained models?

LightlyTrain is most beneficial when:

Working with domain-specific data: When your data has a very different distribution from standard datasets (medical images, industrial data, etc.)
Facing policy or license restrictions: When you can't use models pretrained on datasets with unclear licensing
Having limited labeled data: When you have access to a lot of unlabeled data but few labeled examples
Using custom architectures: When no pretrained checkpoints are available for your model

LightlyTrain is complementary to existing pretrained models and can start from either random weights or existing pretrained weights.

Check our complete FAQ for more information.

Usage Events

LightlyTrain collects anonymous usage events to help us improve the product. We only track the training method, model architecture, and system information (OS, GPU). To opt out, set the environment variable:

export LIGHTLY_TRAIN_EVENTS_DISABLED=1
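From a Python script, the same opt-out can be applied by setting the variable in the process environment. This sketch assumes the variable may be read when LightlyTrain is imported, so it sets it before the import; exporting it in your shell, as shown above, remains the more robust option.

```python
import os

# Disable LightlyTrain's anonymous usage events for this process.
# Set it before importing lightly_train in case the value is read at import time.
os.environ["LIGHTLY_TRAIN_EVENTS_DISABLED"] = "1"

# import lightly_train  # import only after the variable is set
```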

License

LightlyTrain offers flexible licensing options to suit your specific needs:

AGPL-3.0 License: Perfect for open-source projects, academic research, and community contributions. Share your innovations with the world while benefiting from community improvements.
Commercial License: Ideal for businesses and organizations that need proprietary development freedom. Enjoy all the benefits of LightlyTrain while keeping your code and models private.
Free Community License: Available for students, researchers, startups in early stages, or anyone exploring or experimenting with LightlyTrain. Empower the next generation of innovators with full access to the world of pretraining.

We're committed to supporting both open-source and commercial users. Contact us to discuss the best licensing option for your project!
