
Using TextSpotInferencer to infer, the InstanceData from pred_instances is not compatible with textspotting_visualizer #1943

Open
@xiaomaofeng

Description


Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmocr

Environment

sys.platform: linux
Python: 3.10.7 (main, Nov 24 2022, 19:45:47) [GCC 12.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1: GeForce RTX 3090
GPU 2,3: GeForce RTX 3080 Ti
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: x86_64-linux-gnu-gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0
PyTorch: 2.0.0+cu117
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201703
Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.7
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
CuDNN 8.5
Magma 2.6.1
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.15.1+cu117
OpenCV: 4.7.0
MMEngine: 0.7.3
MMOCR: 1.0.0+964172a

Reproduces the problem - code sample

from typing import Optional

import mmcv
import numpy as np

from mmocr.utils import poly2bbox


def add_datasample(self,
                   name: str,
                   image: np.ndarray,
                   data_sample: Optional['TextDetDataSample'] = None,
                   draw_gt: bool = True,
                   draw_pred: bool = True,
                   show: bool = False,
                   wait_time: int = 0,
                   pred_score_thr: float = 0.5,
                   out_file: Optional[str] = None,
                   step: int = 0) -> np.ndarray:
    """Draw datasample and save to all backends.

    - If GT and prediction are plotted at the same time, they are
    displayed in a stitched image where the left image is the
    ground truth and the right image is the prediction.
    - If ``show`` is True, all storage backends are ignored, and
    the images will be displayed in a local window.
    - If ``out_file`` is specified, the drawn image will be
    saved to ``out_file``. This is usually used when the display
    is not available.

    Args:
        name (str): The image identifier.
        image (np.ndarray): The image to draw.
        data_sample (:obj:`TextSpottingDataSample`, optional):
            TextSpottingDataSample which contains gt and prediction.
            Defaults to None.
        draw_gt (bool): Whether to draw GT TextDetDataSample.
            Defaults to True.
        draw_pred (bool): Whether to draw Predicted TextDetDataSample.
            Defaults to True.
        show (bool): Whether to display the drawn image. Defaults to
            False.
        wait_time (float): The interval of show (s). Defaults to 0.
        out_file (str): Path to output file. Defaults to None.
        pred_score_thr (float): The threshold to visualize the bboxes
            and masks. Defaults to 0.5.
        step (int): Global step value to record. Defaults to 0.

    Returns:
        np.ndarray: The drawn image.
    """
    cat_images = []

    if data_sample is not None:
        if draw_gt and 'gt_instances' in data_sample:
            gt_bboxes = data_sample.gt_instances.get('bboxes', None)
            gt_texts = data_sample.gt_instances.texts
            gt_polygons = data_sample.gt_instances.get('polygons', None)
            gt_img_data = self._draw_instances(image, gt_bboxes,
                                               gt_polygons, gt_texts)
            cat_images.append(gt_img_data)

        if draw_pred and 'pred_instances' in data_sample:
            pred_instances = data_sample.pred_instances
            # The stock visualizer filters on ``pred_instances.scores``
            # (a tensor), which SPTS predictions do not provide:
            # pred_instances = pred_instances[
            #     pred_instances.scores > pred_score_thr].cpu().numpy()
            # Workaround: keep an instance if any of its per-character
            # ``text_scores`` exceeds the threshold.
            keep = [
                i for i, inst in enumerate(pred_instances)
                if any(s > pred_score_thr for s in inst.text_scores)
            ]
            pred_instances = pred_instances[keep].cpu().numpy()
            pred_bboxes = pred_instances.get('bboxes', None)
            pred_texts = pred_instances.texts
            pred_polygons = pred_instances.get('polygons', None)
            if pred_bboxes is None and pred_polygons is not None:
                # SPTS predicts polygons only; derive boxes from them.
                pred_bboxes = np.array(
                    [poly2bbox(poly) for poly in pred_polygons])
            if pred_bboxes is not None:
                pred_img_data = self._draw_instances(image, pred_bboxes,
                                                     pred_polygons,
                                                     pred_texts)
                cat_images.append(pred_img_data)

    cat_images = self._cat_image(cat_images, axis=0)
    if cat_images is None:
        cat_images = image

    if show:
        self.show(cat_images, win_name=name, wait_time=wait_time)
    else:
        self.add_image(name, cat_images, step)

    if out_file is not None:
        mmcv.imwrite(cat_images[..., ::-1], out_file)

    self.set_image(cat_images)
    return self.get_image()
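
The only functional change relative to the stock TextSpottingVisualizer is the prediction filter: the original line (kept commented out above) thresholds pred_instances.scores, a tensor that SPTS results do not carry, so the workaround instead keeps an instance whenever any of its per-character text_scores exceeds pred_score_thr, and derives bboxes from polygons when the model predicts none.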

Reproduces the problem - command or script

from mmocr.apis import TextSpotInferencer

# Load models into memory
inferencer = TextSpotInferencer(
    model='projects/SPTS/config/spts/spts_resnet50_8xb8-200e_icdar2015.py',
    weights='model/best_generic_hmean.pth')
inferencer('/root/icdar2015/textdet_imgs/test/image_7000.jpg',
           save_vis=True, return_vis=True)

Reproduces the problem - error message

[screenshot of the pred_instances data structure]
Please see the data structure of pred_instances in the screenshot above: it does not provide the bbox and polygon data in the form the visualizer expects, and the scores are of type list rather than tensor, so I had to change the code to adapt to this data structure.
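
A workaround that avoids patching the visualizer is to normalize the prediction before drawing. Below is a minimal sketch, assuming each entry of text_scores is a list of per-character confidences and that polygons are present; adapt_spts_instances is a hypothetical helper, not part of MMOCR:

import numpy as np
import torch

from mmocr.utils import poly2bbox


def adapt_spts_instances(pred_instances):
    """Normalize SPTS-style pred_instances for TextSpottingVisualizer."""
    # Collapse each per-character text_scores list into a single
    # confidence per instance, stored as the tensor the visualizer
    # expects to filter on.
    pred_instances.scores = torch.tensor(
        [float(np.mean(s)) for s in pred_instances.text_scores])
    # SPTS predicts polygons only; derive axis-aligned boxes from them.
    if pred_instances.get('bboxes', None) is None and \
            pred_instances.get('polygons', None) is not None:
        pred_instances.bboxes = np.array(
            [poly2bbox(p) for p in pred_instances.polygons])
    return pred_instances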

Additional information

1. The pred_instances data structure should be consistent with the one produced when inferring with separate det and recog models (a sketch of that layout is given below).
2. The visualizer should return the inference result image.
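
For reference, a minimal sketch of the pred_instances layout produced by the separate det + recog pipeline, which TextSpottingVisualizer consumes directly (field names follow MMOCR 1.x conventions; the values are dummies):

import numpy as np
import torch
from mmengine.structures import InstanceData

pred_instances = InstanceData()
# One quadrilateral polygon, flattened as [x1, y1, ..., x4, y4].
pred_instances.polygons = [
    np.array([10, 10, 100, 10, 100, 40, 10, 40], dtype=np.float32)
]
pred_instances.bboxes = torch.tensor([[10., 10., 100., 40.]])
pred_instances.scores = torch.tensor([0.95])  # a tensor, not a Python list
pred_instances.texts = ['hello']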
