-
Notifications
You must be signed in to change notification settings - Fork 414
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Search before asking
- I have searched the RF-DETR issues and found no similar bug report.
Bug
β
Dataset structure validated
π Training RF-DETR SMALL for text line segmentation
============================================================
2025-10-12 06:29:19.108570: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1760250559.143950 2815 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1760250559.155516 2815 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1760250559.182967 2815 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1760250559.183006 2815 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1760250559.183021 2815 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1760250559.183030 2815 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-10-12 06:29:19.190769: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
β
Loaded RF-DETR SMALL model class
π Model description: Balanced performance and speed
βοΈ Training parameters:
- Batch size: 6
- Gradient accumulation steps: 3
- Learning rate: 0.00015
- Epochs: 100
- Total effective batch size: 18
- Early stopping: Enabled
* Patience: 10 epochs
* Min delta: 0.002
Using a different number of positional encodings than DINOv2, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Using patch size 16 instead of 14, which means we're not loading DINOv2 backbone weights. This is not a problem if finetuning a pretrained RF-DETR model.
Loading pretrain weights
β
Initialized RF-DETR SMALL model
βΉοΈ Early stopping enabled (patience: 10, min_delta: 0.002)
π― Starting training...
π Dataset: ./datasets/sam_44_mss_coco/
π Output: ./runs/sam_44_mss_coco/
------------------------------------------------------------
TensorBoard logging initialized. To monitor logs, use 'tensorboard --logdir ./runs/sam_44_mss_coco/' and open http://localhost:6006/ in browser.
Not using distributed mode
fatal: not a git repository (or any of the parent directories): .git
git:
sha: N/A, status: clean, branch: N/A
Namespace(num_classes=1, grad_accum_steps=3, amp=True, lr=0.00015, lr_encoder=0.00015, batch_size=6, weight_decay=0.0001, epochs=100, lr_drop=100, clip_max_norm=0.1, lr_vit_layer_decay=0.8, lr_component_decay=0.7, do_benchmark=False, dropout=0, drop_path=0.0, drop_mode='standard', drop_schedule='constant', cutoff_epoch=0, pretrained_encoder=None, pretrain_weights='rf-detr-small.pth', pretrain_exclude_keys=None, pretrain_keys_modify_to_load=None, pretrained_distiller=None, encoder='dinov2_windowed_small', vit_encoder_num_layers=12, window_block_indexes=None, position_embedding='sine', out_feature_indexes=[3, 6, 9, 12], freeze_encoder=False, layer_norm=True, rms_norm=False, backbone_lora=False, force_no_pretrain=False, dec_layers=3, dim_feedforward=2048, hidden_dim=256, sa_nheads=8, ca_nheads=16, num_queries=300, group_detr=13, two_stage=True, projector_scale=['P4'], lite_refpoint_refine=True, num_select=300, dec_n_points=2, decoder_norm='LN', bbox_reparam=True, freeze_batch_norm=False, set_cost_class=2, set_cost_bbox=5, set_cost_giou=2, cls_loss_coef=1.0, bbox_loss_coef=5, giou_loss_coef=2, focal_alpha=0.25, aux_loss=True, sum_group_losses=False, use_varifocal_loss=False, use_position_supervised_loss=False, ia_bce_loss=True, dataset_file='roboflow', coco_path=None, dataset_dir='./datasets/sam_44_mss_coco/', square_resize_div_64=True, output_dir='./runs/sam_44_mss_coco/', dont_save_weights=False, checkpoint_interval=10, seed=42, resume='', start_epoch=0, eval=False, use_ema=True, ema_decay=0.993, ema_tau=100, num_workers=2, device='cuda', world_size=1, dist_url='env://', sync_bn=True, fp16_eval=False, encoder_only=False, backbone_only=False, resolution=512, use_cls_token=False, multi_scale=True, expanded_scales=True, do_random_resize_via_padding=False, warmup_epochs=0.0, lr_scheduler='step', lr_min_factor=0.0, early_stopping=True, early_stopping_patience=10, early_stopping_min_delta=0.002, early_stopping_use_ema=False, gradient_checkpointing=False, patch_size=16, num_windows=2, positional_encoding_size=32, mask_downsample_ratio=4, tensorboard=True, wandb=False, project=None, run=None, class_names=['text_line'], run_test=True, segmentation_head=False, distributed=False)
number of params: 31787350
[672]
loading annotations into memory...
Done (t=2.31s)
creating index...
index created!
[672]
loading annotations into memory...
Done (t=0.32s)
creating index...
index created!
[672]
loading annotations into memory...
Done (t=0.65s)
creating index...
index created!
Get benchmark
Start training
Grad accum steps: 3
Total batch size: 18
LENGTH OF DATA LOADER: 53
UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /pytorch/aten/src/ATen/native/TensorShape.cpp:4322.)
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [74,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [0,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [1,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [2,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [3,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [4,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [5,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [6,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [7,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [8,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [9,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [10,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [11,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [12,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [13,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [14,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [15,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [16,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [17,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [18,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [19,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [20,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [21,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [22,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [23,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [24,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [25,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [26,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [27,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [28,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [29,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [30,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [58,0,0], thread: [31,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [47,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [64,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [65,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [66,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [67,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [68,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [69,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [70,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [71,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [72,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [73,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [74,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [75,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [76,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [77,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [78,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [79,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [80,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [81,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [82,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [83,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [84,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [85,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [86,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [87,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [88,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [89,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [91,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [92,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [93,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [94,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:113: operator(): block: [7,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
β Training failed: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
π₯ Training failed. Check the error messages above.
Tried on three systems (one being Google colab). Same result.
Environment
- RF-DETR: 1.3.0
- OS: Debian/sid
- Python: 3.11.13
- PyTorch: 2.8.0
- GPU: Nvidia RTX 3090 FE
Minimal Reproducible Example
#!/usr/bin/env python3
"""
RF-DETR Training Script with Model Size Selection
This script demonstrates how to fine-tune different RF-DETR model sizes
for text line segmentation on book pages.
Usage:
python train_rf_detr.py --model_size nano --dataset_dir ./coco_dataset --output_dir ./output
python train_rf_detr.py --model_size large --dataset_dir ./coco_dataset --output_dir ./output --epochs 50
"""
import argparse
import sys
from pathlib import Path
def get_model_class(model_size: str):
"""Get the appropriate RF-DETR model class based on size."""
model_size = model_size.lower()
if model_size == "nano":
from rfdetr import RFDETRNano
return RFDETRNano
elif model_size == "small":
from rfdetr import RFDETRSmall
return RFDETRSmall
elif model_size == "base":
from rfdetr import RFDETRBase
return RFDETRBase
elif model_size == "medium":
from rfdetr import RFDETRMedium
return RFDETRMedium
elif model_size == "large":
from rfdetr import RFDETRLarge
return RFDETRLarge
else:
raise ValueError(f"Unknown model size: {model_size}. Choose from: nano, small, base, medium, large")
def get_recommended_params(model_size: str):
"""Get recommended training parameters based on model size."""
params = {
"nano": {
"batch_size": 8,
"grad_accum_steps": 2,
"lr": 2e-4,
"description": "Fastest, smallest, good for edge devices"
},
"small": {
"batch_size": 6,
"grad_accum_steps": 3,
"lr": 1.5e-4,
"description": "Balanced performance and speed"
},
"base": {
"batch_size": 4,
"grad_accum_steps": 4,
"lr": 1e-4,
"description": "Default, good general performance"
},
"medium": {
"batch_size": 3,
"grad_accum_steps": 5,
"lr": 8e-5,
"description": "Higher accuracy"
},
"large": {
"batch_size": 2,
"grad_accum_steps": 8,
"lr": 5e-5,
"description": "Highest accuracy, requires more resources"
}
}
return params[model_size.lower()]
def train_model(model_size: str, dataset_dir: str, output_dir: str, epochs: int = 20,
batch_size: int = None, grad_accum_steps: int = None, lr: float = None,
use_tensorboard: bool = False, use_wandb: bool = False,
project_name: str = "rf-detr-text-lines", run_name: str = None,
early_stopping: bool = True, early_stopping_patience: int = 5,
early_stopping_min_delta: float = 0.001):
"""Train RF-DETR model with specified parameters."""
print(f"π Training RF-DETR {model_size.upper()} for text line segmentation")
print("=" * 60)
# Get model class
try:
ModelClass = get_model_class(model_size)
print(f"β
Loaded RF-DETR {model_size.upper()} model class")
except ValueError as e:
print(f"β Error: {e}")
return False
# Get recommended parameters
recommended = get_recommended_params(model_size)
print(f"π Model description: {recommended['description']}")
# Use provided parameters or defaults
final_batch_size = batch_size if batch_size is not None else recommended["batch_size"]
final_grad_accum = grad_accum_steps if grad_accum_steps is not None else recommended["grad_accum_steps"]
final_lr = lr if lr is not None else recommended["lr"]
print(f"βοΈ Training parameters:")
print(f" - Batch size: {final_batch_size}")
print(f" - Gradient accumulation steps: {final_grad_accum}")
print(f" - Learning rate: {final_lr}")
print(f" - Epochs: {epochs}")
print(f" - Total effective batch size: {final_batch_size * final_grad_accum}")
if early_stopping:
print(f" - Early stopping: Enabled")
print(f" * Patience: {early_stopping_patience} epochs")
print(f" * Min delta: {early_stopping_min_delta}")
else:
print(f" - Early stopping: Disabled")
# Initialize model
try:
model = ModelClass()
print(f"β
Initialized RF-DETR {model_size.upper()} model")
except Exception as e:
print(f"β Error initializing model: {e}")
return False
# Prepare training arguments
train_args = {
"dataset_dir": dataset_dir,
"epochs": epochs,
"batch_size": final_batch_size,
"grad_accum_steps": final_grad_accum,
"lr": final_lr,
"output_dir": output_dir
}
# Add optional logging
if use_tensorboard:
train_args["tensorboard"] = True
print("π TensorBoard logging enabled")
if use_wandb:
train_args["wandb"] = True
train_args["project"] = project_name
if run_name:
train_args["run"] = run_name
print(f"π Weights & Biases logging enabled (project: {project_name})")
# Add early stopping parameters
if early_stopping:
train_args["early_stopping"] = True
train_args["early_stopping_patience"] = early_stopping_patience
train_args["early_stopping_min_delta"] = early_stopping_min_delta
print(f"βΉοΈ Early stopping enabled (patience: {early_stopping_patience}, min_delta: {early_stopping_min_delta})")
# Start training
print(f"\nπ― Starting training...")
print(f"π Dataset: {dataset_dir}")
print(f"π Output: {output_dir}")
print("-" * 60)
try:
model.train(**train_args)
print("\nβ
Training completed successfully!")
print(f"π Checkpoints saved to: {output_dir}")
print(f"π Best model: {output_dir}/checkpoint_best_total.pth")
return True
except Exception as e:
print(f"\nβ Training failed: {e}")
return False
def main():
parser = argparse.ArgumentParser(description="Train RF-DETR model for text line segmentation")
parser.add_argument("--model_size", required=True,
choices=["nano", "small", "base", "medium", "large"],
help="RF-DETR model size to use")
parser.add_argument("--dataset_dir", required=True,
help="Path to COCO dataset directory")
parser.add_argument("--output_dir", required=True,
help="Output directory for checkpoints and logs")
parser.add_argument("--epochs", type=int, default=20,
help="Number of training epochs (default: 20)")
parser.add_argument("--batch_size", type=int,
help="Batch size (if not provided, uses recommended value)")
parser.add_argument("--grad_accum_steps", type=int,
help="Gradient accumulation steps (if not provided, uses recommended value)")
parser.add_argument("--lr", type=float,
help="Learning rate (if not provided, uses recommended value)")
parser.add_argument("--tensorboard", action="store_true",
help="Enable TensorBoard logging")
parser.add_argument("--wandb", action="store_true",
help="Enable Weights & Biases logging")
parser.add_argument("--project_name", default="rf-detr-text-lines",
help="W&B project name (default: rf-detr-text-lines)")
parser.add_argument("--run_name",
help="W&B run name (if not provided, auto-generated)")
parser.add_argument("--no_early_stopping", action="store_true",
help="Disable early stopping (default: enabled)")
parser.add_argument("--early_stopping_patience", type=int, default=5,
help="Number of epochs to wait before stopping (default: 5)")
parser.add_argument("--early_stopping_min_delta", type=float, default=0.001,
help="Minimum change in mAP to qualify as improvement (default: 0.001)")
args = parser.parse_args()
# Validate dataset directory
dataset_path = Path(args.dataset_dir)
if not dataset_path.exists():
print(f"β Error: Dataset directory {args.dataset_dir} does not exist")
sys.exit(1)
# Check for required COCO structure
required_dirs = ["train", "valid", "test"]
for split in required_dirs:
split_dir = dataset_path / split
if not split_dir.exists():
print(f"β Error: Missing {split} directory in dataset")
sys.exit(1)
coco_file = split_dir / "_annotations.coco.json"
if not coco_file.exists():
print(f"β Error: Missing _annotations.coco.json in {split} directory")
sys.exit(1)
print("β
Dataset structure validated")
# Create output directory
output_path = Path(args.output_dir)
output_path.mkdir(parents=True, exist_ok=True)
# Start training
success = train_model(
model_size=args.model_size,
dataset_dir=args.dataset_dir,
output_dir=args.output_dir,
epochs=args.epochs,
batch_size=args.batch_size,
grad_accum_steps=args.grad_accum_steps,
lr=args.lr,
use_tensorboard=args.tensorboard,
use_wandb=args.wandb,
project_name=args.project_name,
run_name=args.run_name,
early_stopping=not args.no_early_stopping,
early_stopping_patience=args.early_stopping_patience,
early_stopping_min_delta=args.early_stopping_min_delta
)
if success:
print("\nπ Training completed successfully!")
print("\nNext steps:")
print("1. Evaluate your model on the test set")
print("2. Use the best checkpoint for inference")
print("3. Consider exporting to ONNX for deployment")
else:
print("\nπ₯ Training failed. Check the error messages above.")
sys.exit(1)
if __name__ == "__main__":
main()
Additional
No response
Are you willing to submit a PR?
- Yes, I'd like to help by submitting a PR!
UPDATE: Sorry! Might be duplicate of #349
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working