All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.

LightlyTrain - SOTA Pretraining, Fine-tuning and Distillation

LightlyTrain is the leading framework for transforming your unlabeled data into powerful vision foundation models, covering pretraining, knowledge distillation, and fine-tuning.

Unlock the full potential of your data with state-of-the-art (SOTA) computer vision methods like DINOv2 and DINOv3. Train any model architecture (YOLO, transformers, and beyond) and fine-tune it for segmentation and object detection in your specific use case.

Benchmarks

Custom Foundation Model: Train your own DINOv2!

With LightlyTrain you can train your very own foundation model like DINOv2 on your data.

| Implementation | Model | ImageNet k-NN | Docs |
|---|---|---|---|
| LightlyTrain | dinov2/vitl16 | 81.9% | 🔗 |
| DINOv2 | dinov2/vitl16 | 81.6% | 🔗 |

Object Detection: Fine-Tune DINOv2 or DINOv3 for detection!

COCO Dataset

| Implementation | Backbone Model | AP50:95 | Latency (ms) | # Params (M) | Input Size | Checkpoint |
|---|---|---|---|---|---|---|
| LightlyTrain | dinov2/vits14-ltdetr | 55.7 | 16.87 | 55.3 | 644×644 | 🔗 |
| LightlyTrain | dinov3/convnext-tiny-ltdetr | 54.4 | 13.29 | 61.1 | 640×640 | 🔗 |
| LightlyTrain | dinov3/convnext-small-ltdetr | 56.9 | 17.65 | 82.7 | 640×640 | 🔗 |
| LightlyTrain | dinov3/convnext-base-ltdetr | 58.6 | 24.68 | 121.0 | 640×640 | 🔗 |
| LightlyTrain | dinov3/convnext-large-ltdetr | 60.0 | 42.30 | 230.0 | 640×640 | 🔗 |

Latency is measured on a single NVIDIA T4 GPU with batch size 1. All models are compiled and optimized using tensorrt==10.13.3.9.
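The AP50:95 column follows the COCO convention: average precision is computed at ten IoU (intersection over union) thresholds from 0.50 to 0.95 in steps of 0.05 and then averaged. The minimal sketch below illustrates only the matching criterion behind that metric; it is not LightlyTrain's evaluation code, the function names `box_iou` and `matched_thresholds` are hypothetical, and boxes are assumed to be in `(x1, y1, x2, y2)` pixel format.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95 (ten values).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, gt):
    """Return the IoU thresholds at which `pred` would count as a match for `gt`."""
    iou = box_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    pred, gt = (0, 0, 10, 10), (2, 0, 12, 10)
    print(round(box_iou(pred, gt), 3))     # 0.667
    print(matched_thresholds(pred, gt))    # [0.5, 0.55, 0.6, 0.65]
```

A prediction that overlaps its ground-truth box with an IoU of about 0.67 counts as correct only at the four lowest thresholds, which is why AP50:95 is a stricter score than AP50 alone.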

Semantic Segmentation: Use the SOTA method from CVPR 2025!

COCO-Stuff Dataset

| Implementation | Backbone Model | Val mIoU | Avg. FPS | # Params (M) | Input Size | Checkpoint |
|---|---|---|---|---|---|---|
| LightlyTrain | dinov3/vits16-eomt | 0.465 | 88.7 | 21.6 | 512×512 | 🔗 |
| LightlyTrain | dinov3/vitb16-eomt | 0.520 | 43.3 | 85.7 | 512×512 | 🔗 |
| LightlyTrain | dinov3/vitl16-eomt | 0.544 | 20.4 | 303.2 | 512×512 | 🔗 |

Avg. FPS is measured on a single NVIDIA T4 GPU with batch size 1. All models are compiled and optimized using torch.compile.
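The detection table reports latency in milliseconds, while the segmentation tables report average frames per second. Assuming batch size 1 and no pipelining, the two are reciprocals (latency in ms = 1000 / FPS), so a simple conversion makes them comparable. The helper names below are hypothetical and not part of LightlyTrain.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert frames per second at batch size 1 into per-image latency in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert per-image latency in milliseconds into frames per second at batch size 1."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # dinov3/vits16-eomt on COCO-Stuff runs at 88.7 FPS.
    print(f"{fps_to_latency_ms(88.7):.1f} ms")  # 11.3 ms
    # dinov2/vits14-ltdetr on COCO has a latency of 16.87 ms.
    print(f"{latency_ms_to_fps(16.87):.1f} FPS")  # 59.3 FPS
```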

Cityscapes Dataset

| Implementation | Backbone Model | Val mIoU | Avg. FPS | # Params (M) | Input Size | Checkpoint |
|---|---|---|---|---|---|---|
| LightlyTrain | dinov3/vits16-eomt | 0.786 | 18.6 | 21.6 | 1024×1024 | 🔗 |
| LightlyTrain | dinov3/vitb16-eomt | 0.810 | 8.7 | 85.7 | 1024×1024 | 🔗 |
| LightlyTrain | dinov3/vitl16-eomt | 0.844 | 3.9 | 303.2 | 1024×1024 | 🔗 |
| EoMT (CVPR 2025 paper, current SOTA) | dinov2/vitl16-eomt | 0.842 | - | 319 | 1024×1024 | - |

Avg. FPS is measured on a single NVIDIA T4 GPU with batch size 1. All models are compiled and optimized using torch.compile.
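Val mIoU is reported as a fraction (0.844 means 84.4%). Mean IoU is commonly computed per class as TP / (TP + FP + FN) from a pixel-level confusion matrix and then averaged over classes. The pure-Python sketch below illustrates that definition; it is not LightlyTrain's evaluation code, and skipping classes that appear in neither ground truth nor predictions is an assumption (benchmarks differ, for example in how they handle ignore labels).

```python
from typing import List


def mean_iou(confusion: List[List[int]]) -> float:
    """Mean intersection over union from a square confusion matrix.

    confusion[i][j] counts pixels whose ground-truth class is i and whose
    predicted class is j. Classes absent from both are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp  # ground truth is c, predicted another class
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, ground truth differs
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    cm = [
        [50, 10],
        [5, 35],
    ]
    print(round(mean_iou(cm), 3))  # 0.735
```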

Installation

LightlyTrain requires Python 3.8+ and runs on Windows, Linux, and macOS.

pip install lightly-train

🔥 Pretrain Your Own DINOv2 Foundation Model 🔥

Pretrain a DINOv2 model on your own unlabeled images. LightlyTrain's DINOv2 implementation matches or outperforms the official implementation on ImageNet-1K. See our documentation to get started!

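import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",  # output directory for checkpoints and logs
        data="my_data_dir",       # directory with unlabeled images
        model="dinov2/vitb14",    # model to pretrain
        method="dinov2",          # self-supervised pretraining method
    )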

See our documentation for more details.

🔥 Distill DINOv2/v3 Into Any Model Architecture 🔥

Pretrain any model architecture with unlabeled data by distilling the knowledge from DINOv2 or DINOv3 foundation models into your model. On the COCO dataset, YOLOv8-s models pretrained with LightlyTrain achieve high performance across all tested label fractions. These improvements hold for other architectures like YOLO11, RT-DETR, and Faster R-CNN. See our announcement post for more benchmarks and details, and our documentation to get started!

Benchmark Results

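import lightly_train

if __name__ == "__main__":
    lightly_train.train(
        out="out/my_experiment",         # output directory for checkpoints and logs
        data="my_data_dir",              # directory with unlabeled images
        model="ultralytics/yolov8s.pt",  # Ultralytics YOLOv8-s model to pretrain
        method="distillation",           # distill knowledge from a DINOv2/DINOv3 teacher
    )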

See our documentation for more details.

🔥 Predict with High-Performance Object Detection Models 🔥

LightlyTrain’s LT-DETR models, powered by DINOv2 and DINOv3 backbones, demonstrate strong performance across different scales.

🚀 We are actively working on new models with improved speed and accuracy. Updates are coming soon, so stay tuned!

wget <MODEL-WEIGHTS-URL> -O model.ckpt

import lightly_train
import matplotlib.pyplot as plt
from torchvision import io, utils

# Load the downloaded LT-DETR checkpoint.
model = lightly_train.load_model_from_checkpoint(
    checkpoint="model.ckpt",
)

# Run inference; the returned dict holds labels, boxes, and scores in that order.
labels, boxes, scores = model.predict("<image>.jpg").values()

# Visualize predictions.
image_with_boxes = utils.draw_bounding_boxes(
    image=io.read_image("<image>.jpg"),
    boxes=boxes,
    labels=[model.classes[i.item()] for i in labels],
)

fig, ax = plt.subplots(figsize=(30, 30))
ax.imshow(image_with_boxes.permute(1, 2, 0))  # CHW -> HWC for matplotlib
fig.savefig("predictions.png")

🔥 Fine-tune SOTA Semantic Segmentation Models 🔥

LightlyTrain's EoMT semantic segmentation model based on DINOv3 achieves a new state-of-the-art on the ADE20K benchmark! See our documentation for more details.

You can explore training semantic segmentation models with the example code below:

import lightly_train

if __name__ == "__main__":
    lightly_train.train_semantic_segmentation(
        out="out/my_experiment",
        model="dinov3/vits16-eomt",
        # model and dataset config
        # ...
    )

Tutorials

  • Fine-tuning Your Pretrained Models: Looking for a code example for fine-tuning after pretraining your model? Head over to the Quick Start!

  • Embedding Example: Want to use your pretrained model to generate image embeddings instead? Check out the embed guide!

  • Semantic Segmentation Fine-tuning: Want to train a state-of-the-art semantic segmentation model? Head over to the semantic segmentation guide!

  • More Tutorials: Want to get more hands-on with LightlyTrain? Check out our Tutorials for more examples!

Features

Model Pretraining (no self-supervised learning expertise required!)

Model Fine-tuning

MLOps

Supported Models

LightlyTrain supports a wide range of frameworks and models out of the box.

Framework Model Pretrain
(Unlabeled Images)
Distill From
DINOv2/v3
(Unlabeled Images)
Fine-tune
(Labeled Images)
Semantic Segmentation
LightlyTrain DINOv3 ✅ 🔗 ✅ 🔗
DINOv2 ✅ 🔗 ✅ 🔗 ✅ 🔗
Torchvision ResNet ✅ 🔗 ✅ 🔗
ConvNext ✅ 🔗 ✅ 🔗
ShuffleNetV2 ✅ 🔗 ✅ 🔗
TIMM All models ✅ 🔗 ✅ 🔗
Ultralytics YOLOv5 ✅ 🔗 ✅ 🔗
YOLOv6 ✅ 🔗 ✅ 🔗
YOLOv8 ✅ 🔗 ✅ 🔗
YOLO11 ✅ 🔗 ✅ 🔗
YOLO12 ✅ 🔗 ✅ 🔗
RT-DETR RT-DETR ✅ 🔗 ✅ 🔗
RT-DETRv2 ✅ 🔗 ✅ 🔗
RF-DETR RF-DETR ✅ 🔗 ✅ 🔗
YOLOv12 YOLOv12 ✅ 🔗 ✅ 🔗
SuperGradients PP-LiteSeg ✅ 🔗 ✅ 🔗
SSD ✅ 🔗 ✅ 🔗
YOLO-NAS ✅ 🔗 ✅ 🔗
Custom Models Any PyTorch model ✅ 🔗 ✅ 🔗

For an overview of all supported models and usage instructions, see the full model docs.

Contact us if you need support for additional models or libraries.

Supported Pretraining & Distillation Methods

See the full methods docs for details.

FAQ

Who is LightlyTrain for?

LightlyTrain is designed for engineers and teams who want to use their unlabeled data to its full potential. It is ideal if any of the following applies to you:

  • You want to speed up model development cycles
  • You have limited labeled data but abundant unlabeled data
  • You have slow and expensive labeling processes
  • You want to build your own foundation model
  • You work with domain-specific datasets (video analytics, robotics, medical, agriculture, etc.)
  • You cannot use public pretrained models
  • No pretrained models are available for your specific architecture
  • You want to leverage the latest research in self-supervised learning and distillation
How much data do I need?

We recommend a minimum of several thousand unlabeled images for training with LightlyTrain and 100+ labeled images for fine-tuning afterwards.

For best results:

  • Use at least 5x more unlabeled than labeled data
  • Even a 2x ratio of unlabeled to labeled data yields strong improvements
  • Larger datasets (>100,000 images) benefit from pretraining up to 3,000 epochs
  • Smaller datasets (<100,000 images) benefit from longer pretraining of up to 10,000 epochs
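These guidelines can be turned into a quick sanity check before launching a run. The helper below is hypothetical and not part of the LightlyTrain API; it encodes the 5x ratio and the epoch upper bounds from the list above. Two assumptions are labeled in the code: the size thresholds are applied to the unlabeled (pretraining) set, and a dataset of exactly 100,000 images is treated as "larger", since the guidelines leave that boundary open.

```python
from dataclasses import dataclass


@dataclass
class PretrainingPlan:
    unlabeled_to_labeled_ratio: float
    meets_recommended_ratio: bool  # at least 5x more unlabeled than labeled images
    max_epochs: int                # suggested upper bound on pretraining epochs


def plan_pretraining(num_unlabeled: int, num_labeled: int) -> PretrainingPlan:
    """Apply the FAQ's data-size guidelines (hypothetical helper)."""
    if num_unlabeled <= 0 or num_labeled <= 0:
        raise ValueError("image counts must be positive")
    ratio = num_unlabeled / num_labeled
    # Assumption: size thresholds refer to the unlabeled pretraining set,
    # and exactly 100,000 images counts as a "larger" dataset.
    max_epochs = 3_000 if num_unlabeled >= 100_000 else 10_000
    return PretrainingPlan(ratio, ratio >= 5, max_epochs)


if __name__ == "__main__":
    print(plan_pretraining(num_unlabeled=20_000, num_labeled=2_000))
    # PretrainingPlan(unlabeled_to_labeled_ratio=10.0, meets_recommended_ratio=True, max_epochs=10000)
```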

The unlabeled dataset must always be treated like a training split: never include validation images in pretraining, to avoid data leakage.
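One way to catch accidental leakage is to compare file contents between the pretraining directory and the validation set before training. The sketch below is a hypothetical helper, not part of LightlyTrain; it hashes files with SHA-256, so it only detects exact byte-for-byte copies (resized or re-encoded duplicates would need a perceptual hash instead), and the list of image suffixes is an assumption.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# Assumption: these suffixes cover the image formats in your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def image_digests(directory: Path) -> Dict[str, Path]:
    """Map the SHA-256 digest of each image file under `directory` to its path."""
    digests: Dict[str, Path] = {}
    for path in sorted(directory.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            digests[hashlib.sha256(path.read_bytes()).hexdigest()] = path
    return digests


def find_leaked_images(pretrain_dir: Path, val_dir: Path) -> List[Path]:
    """Return validation images whose exact bytes also appear in the pretraining data."""
    pretrain = image_digests(pretrain_dir)
    val = image_digests(val_dir)
    return sorted(path for digest, path in val.items() if digest in pretrain)
```

For example, `find_leaked_images(Path("my_data_dir"), Path("my_val_dir"))` (where `my_val_dir` is a hypothetical validation folder) should return an empty list before you start pretraining.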

What's the difference between LightlyTrain and other self-supervised learning implementations?

LightlyTrain offers several advantages:

  • User-friendly: You don't need to be an SSL expert; focus on training your model instead of implementation details.
  • Works with various model architectures: Integrates directly with different libraries such as Torchvision, Ultralytics, etc.
  • Handles complexity: Manages scaling from single GPU to multi-GPU training and optimizes hyperparameters.
  • Seamless workflow: Automatically pretrains the correct layers and exports models in the right format for fine-tuning.
Why should I use LightlyTrain instead of other already pretrained models?

LightlyTrain is most beneficial when:

  • Working with domain-specific data: When your data has a very different distribution from standard datasets (medical images, industrial data, etc.)
  • Facing policy or license restrictions: When you can't use models pretrained on datasets with unclear licensing
  • Having limited labeled data: When you have access to a lot of unlabeled data but few labeled examples
  • Using custom architectures: When no pretrained checkpoints are available for your model

LightlyTrain is complementary to existing pretrained models and can start from either random weights or existing pretrained weights.

Check our complete FAQ for more information.

License

LightlyTrain offers flexible licensing options to suit your specific needs:

  • AGPL-3.0 License: Perfect for open-source projects, academic research, and community contributions. Share your innovations with the world while benefiting from community improvements.

  • Commercial License: Ideal for businesses and organizations that need proprietary development freedom. Enjoy all the benefits of LightlyTrain while keeping your code and models private.

  • Free Community License: Available for students, researchers, early-stage startups, and anyone exploring or experimenting with LightlyTrain. Empower the next generation of innovators with full access to the world of pretraining.

We're committed to supporting both open-source and commercial users. Contact us to discuss the best licensing option for your project!

Contact

Website
Discord
GitHub
X
YouTube
LinkedIn