
During the RFDETRSegPreview ONNX export call, why does the desired shape have to be divisible by both 24 and 14? #396

@Abdul-Mukit

Description

I was trying to export RFDETRSegPreview like the following:

from rfdetr import RFDETRSegPreview

model = RFDETRSegPreview(pretrain_weights=output_dir + "/checkpoint_best_ema.pth", device="cpu")
export_image_shape = (560, 560)
model.export(
    output_dir=output_dir,
    verbose=True,
    shape=export_image_shape,
)

I get this complaint:

File ~/projects/rf-detr/rfdetr/main.py:551, in Model.export(self, output_dir, infer_dir, simplify, backbone_only, opset_version, verbose, force, shape, batch_size, **kwargs)
    549     print(f"PyTorch inference output shape: {features.shape}")
    550 elif self.args.segmentation_head:
--> 551     outputs = model(input_tensors)
    552     dets = outputs['pred_boxes']
...
File ~/projects/rf-detr/rfdetr/models/backbone/dinov2.py:187
--> 187     assert x.shape[2] % block_size == 0 and x.shape[3] % block_size == 0, f"Backbone requires input shape to be divisible by {block_size}, but got {x.shape}"
    188     x = self.encoder(x)
    189     return list(x[0])

AssertionError: Backbone requires input shape to be divisible by 24, but got torch.Size([1, 3, 560, 560])

If I set export_image_shape=(432, 432), which is what export falls back to when shape=None is passed anyway, I get this complaint instead:

File ~/projects/rf-detr/rfdetr/main.py:539, in Model.export(self, output_dir, infer_dir, simplify, backbone_only, opset_version, verbose, force, shape, batch_size, **kwargs)
    537 else:
    538     if shape[0] % 14 != 0 or shape[1] % 14 != 0:
--> 539         raise ValueError("Shape must be divisible by 14")
    541 input_tensors = make_infer_image(infer_dir, shape, batch_size, device).to(device)
    542 input_names = ['input']

ValueError: Shape must be divisible by 14

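Unless I'm misreading the two checks, each of my attempts satisfies exactly one of them: 560 = 14 × 40 passes the patch-size check in main.py, but 560 / 24 ≈ 23.33, so the backbone assertion fails; 432 = 24 × 18 satisfies the backbone assertion, but 432 / 14 ≈ 30.86 fails the patch-size check. Any side length that satisfies both has to be a multiple of lcm(24, 14) = 168, i.e. 168, 336, 504, 672, and so on.
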
I am trying to export the model at a higher resolution. I have already trained the segmentation model and run prediction with it, and at an image resolution of 432 the output masks are too grainy for some of our robotics applications.
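For now I plan to work around this by rounding the requested side length to the nearest multiple of 168. A minimal sketch (nearest_valid_side is my own helper, not part of rf-detr, and block_size=24 / patch_size=14 are simply the values the two error messages report for this checkpoint, so they may differ for other configs):

import math  # math.lcm requires Python 3.9+

def nearest_valid_side(desired: int, block_size: int = 24, patch_size: int = 14) -> int:
    """Round a side length to the nearest one that passes both export checks."""
    # Both divisibility constraints must hold, so a valid side must be a
    # multiple of lcm(block_size, patch_size) = lcm(24, 14) = 168.
    step = math.lcm(block_size, patch_size)
    return max(step, round(desired / step) * step)

export_image_shape = (nearest_valid_side(560),) * 2  # -> (504, 504); 672 is the next step up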
