
During the RFDETRSegPreview ONNX export call, why does the desired shape have to be divisible by both 24 and 14? #396

@Abdul-Mukit

Description

I was trying to export RFDETRSegPreview like the following:

from rfdetr import RFDETRSegPreview

model = RFDETRSegPreview(pretrain_weights=output_dir + "/checkpoint_best_ema.pth", device="cpu")
export_image_shape = (560, 560)
model.export(
    output_dir=output_dir,
    verbose=True,
    shape=export_image_shape,
)

I get this complaint:

File ~/projects/rf-detr/rfdetr/main.py:551, in Model.export(self, output_dir, infer_dir, simplify, backbone_only, opset_version, verbose, force, shape, batch_size, **kwargs)
    549     print(f"PyTorch inference output shape: {features.shape}")
    550 elif self.args.segmentation_head:
--> 551     outputs = model(input_tensors)
    552     dets = outputs['pred_boxes']
...
File ~/projects/rf-detr/rfdetr/models/backbone/dinov2.py:187
--> 187     assert x.shape[2] % block_size == 0 and x.shape[3] % block_size == 0, f"Backbone requires input shape to be divisible by {block_size}, but got {x.shape}"
    188     x = self.encoder(x)
    189     return list(x[0])

AssertionError: Backbone requires input shape to be divisible by 24, but got torch.Size([1, 3, 560, 560])

If I set export_image_shape=(432, 432), which is what export falls back to when shape=None is passed anyway, I get this complaint instead:

File ~/projects/rf-detr/rfdetr/main.py:539, in Model.export(self, output_dir, infer_dir, simplify, backbone_only, opset_version, verbose, force, shape, batch_size, **kwargs)
    537 else:
    538     if shape[0] % 14 != 0 or shape[1] % 14 != 0:
--> 539         raise ValueError("Shape must be divisible by 14")
    541 input_tensors = make_infer_image(infer_dir, shape, batch_size, device).to(device)
    542 input_names = ['input']

ValueError: Shape must be divisible by 14

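Unless I'm misreading the two checks, each of my attempts satisfies exactly one of them: 560 = 14 × 40 passes the patch-size check in main.py, but 560 / 24 ≈ 23.33, so the backbone assertion fails; 432 = 24 × 18 satisfies the backbone assertion, but 432 / 14 ≈ 30.86 fails the patch-size check. Any side length that satisfies both has to be a multiple of lcm(24, 14) = 168, i.e. 168, 336, 504, 672, and so on.
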
I am trying to export the model at a higher resolution. I have already trained the segmentation model and run prediction with it, and at an image resolution of 432 the output masks are too grainy for some of our robotics applications.
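For now I plan to work around this by rounding the requested side length to the nearest multiple of 168. A minimal sketch (nearest_valid_side is my own helper, not part of rf-detr, and block_size=24 / patch_size=14 are simply the values the two error messages report for this checkpoint, so they may differ for other configs):

import math  # math.lcm requires Python 3.9+

def nearest_valid_side(desired: int, block_size: int = 24, patch_size: int = 14) -> int:
    """Round a side length to the nearest one that passes both export checks."""
    # Both divisibility constraints must hold, so a valid side must be a
    # multiple of lcm(block_size, patch_size) = lcm(24, 14) = 168.
    step = math.lcm(block_size, patch_size)
    return max(step, round(desired / step) * step)

export_image_shape = (nearest_valid_side(560),) * 2  # -> (504, 504); 672 is the next step up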
