
Using TextSpotInferencer to infer, the InstanceData from pred_instances is not compatible with textspotting_visualizer #1943

Open
@xiaomaofeng

Description


Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

main branch https://github.com/open-mmlab/mmocr

Environment

sys.platform: linux
Python: 3.10.7 (main, Nov 24 2022, 19:45:47) [GCC 12.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1: GeForce RTX 3090
GPU 2,3: GeForce RTX 3080 Ti
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: x86_64-linux-gnu-gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0
PyTorch: 2.0.0+cu117
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201703
Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 11.7
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
CuDNN 8.5
Magma 2.6.1
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.15.1+cu117
OpenCV: 4.7.0
MMEngine: 0.7.3
MMOCR: 1.0.0+964172a

Reproduces the problem - code sample

from typing import Optional

import mmcv
import numpy as np

from mmocr.utils import poly2bbox


def add_datasample(self,
                   name: str,
                   image: np.ndarray,
                   data_sample: Optional['TextDetDataSample'] = None,
                   draw_gt: bool = True,
                   draw_pred: bool = True,
                   show: bool = False,
                   wait_time: int = 0,
                   pred_score_thr: float = 0.5,
                   out_file: Optional[str] = None,
                   step: int = 0) -> np.ndarray:
    """Draw datasample and save to all backends.

    - If GT and prediction are plotted at the same time, they are
    displayed in a stitched image where the left image is the
    ground truth and the right image is the prediction.
    - If ``show`` is True, all storage backends are ignored, and
    the images will be displayed in a local window.
    - If ``out_file`` is specified, the drawn image will be
    saved to ``out_file``. This is usually used when the display
    is not available.

    Args:
        name (str): The image identifier.
        image (np.ndarray): The image to draw.
        data_sample (:obj:`TextSpottingDataSample`, optional):
            TextSpottingDataSample which contains gt and prediction.
            Defaults to None.
        draw_gt (bool): Whether to draw GT TextDetDataSample.
            Defaults to True.
        draw_pred (bool): Whether to draw Predicted TextDetDataSample.
            Defaults to True.
        show (bool): Whether to display the drawn image. Defaults to
            False.
        wait_time (float): The interval of show (s). Defaults to 0.
        out_file (str): Path to output file. Defaults to None.
        pred_score_thr (float): The threshold to visualize the bboxes
            and masks. Defaults to 0.5.
        step (int): Global step value to record. Defaults to 0.

    Returns:
        np.ndarray: The drawn image.
    """
    cat_images = []

    if data_sample is not None:
        if draw_gt and 'gt_instances' in data_sample:
            gt_bboxes = data_sample.gt_instances.get('bboxes', None)
            gt_texts = data_sample.gt_instances.texts
            gt_polygons = data_sample.gt_instances.get('polygons', None)
            gt_img_data = self._draw_instances(image, gt_bboxes,
                                               gt_polygons, gt_texts)
            cat_images.append(gt_img_data)

        if draw_pred and 'pred_instances' in data_sample:
            pred_instances = data_sample.pred_instances
            # The stock visualizer filters on ``pred_instances.scores``
            # (a tensor), which SPTS predictions do not provide:
            # pred_instances = pred_instances[
            #     pred_instances.scores > pred_score_thr].cpu().numpy()
            # Workaround: keep an instance if any of its per-character
            # ``text_scores`` exceeds the threshold.
            keep = [
                i for i, inst in enumerate(pred_instances)
                if any(s > pred_score_thr for s in inst.text_scores)
            ]
            pred_instances = pred_instances[keep].cpu().numpy()
            pred_bboxes = pred_instances.get('bboxes', None)
            pred_texts = pred_instances.texts
            pred_polygons = pred_instances.get('polygons', None)
            if pred_bboxes is None and pred_polygons is not None:
                # SPTS predicts polygons only; derive boxes from them.
                pred_bboxes = np.array(
                    [poly2bbox(poly) for poly in pred_polygons])
            if pred_bboxes is not None:
                pred_img_data = self._draw_instances(image, pred_bboxes,
                                                     pred_polygons,
                                                     pred_texts)
                cat_images.append(pred_img_data)

    cat_images = self._cat_image(cat_images, axis=0)
    if cat_images is None:
        cat_images = image

    if show:
        self.show(cat_images, win_name=name, wait_time=wait_time)
    else:
        self.add_image(name, cat_images, step)

    if out_file is not None:
        mmcv.imwrite(cat_images[..., ::-1], out_file)

    self.set_image(cat_images)
    return self.get_image()
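
The only functional change relative to the stock TextSpottingVisualizer is the prediction filter: the original line (kept commented out above) thresholds pred_instances.scores, a tensor that SPTS results do not carry, so the workaround instead keeps an instance whenever any of its per-character text_scores exceeds pred_score_thr, and derives bboxes from polygons when the model predicts none.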

Reproduces the problem - command or script

from mmocr.apis import TextSpotInferencer

# Load models into memory
inferencer = TextSpotInferencer(
    model='projects/SPTS/config/spts/spts_resnet50_8xb8-200e_icdar2015.py',
    weights='model/best_generic_hmean.pth')
inferencer('/root/icdar2015/textdet_imgs/test/image_7000.jpg',
           save_vis=True, return_vis=True)

Reproduces the problem - error message

[screenshot of the pred_instances data structure]
Please see the data structure of pred_instances in the screenshot above: it does not provide the bbox and polygon data in the form the visualizer expects, and the scores are of type list rather than tensor, so I had to change the code to adapt to this data structure.
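
A workaround that avoids patching the visualizer is to normalize the prediction before drawing. Below is a minimal sketch, assuming each entry of text_scores is a list of per-character confidences and that polygons are present; adapt_spts_instances is a hypothetical helper, not part of MMOCR:

import numpy as np
import torch

from mmocr.utils import poly2bbox


def adapt_spts_instances(pred_instances):
    """Normalize SPTS-style pred_instances for TextSpottingVisualizer."""
    # Collapse each per-character text_scores list into a single
    # confidence per instance, stored as the tensor the visualizer
    # expects to filter on.
    pred_instances.scores = torch.tensor(
        [float(np.mean(s)) for s in pred_instances.text_scores])
    # SPTS predicts polygons only; derive axis-aligned boxes from them.
    if pred_instances.get('bboxes', None) is None and \
            pred_instances.get('polygons', None) is not None:
        pred_instances.bboxes = np.array(
            [poly2bbox(p) for p in pred_instances.polygons])
    return pred_instances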

Additional information

1. The pred_instances data structure should be consistent with the one produced when inferring with separate det and recog models (a sketch of that layout is given below).
2. The visualizer should return the inference result image.
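
For reference, a minimal sketch of the pred_instances layout produced by the separate det + recog pipeline, which TextSpottingVisualizer consumes directly (field names follow MMOCR 1.x conventions; the values are dummies):

import numpy as np
import torch
from mmengine.structures import InstanceData

pred_instances = InstanceData()
# One quadrilateral polygon, flattened as [x1, y1, ..., x4, y4].
pred_instances.polygons = [
    np.array([10, 10, 100, 10, 100, 40, 10, 40], dtype=np.float32)
]
pred_instances.bboxes = torch.tensor([[10., 10., 100., 40.]])
pred_instances.scores = torch.tensor([0.95])  # a tensor, not a Python list
pred_instances.texts = ['hello']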
