Returning single-character coordinates from PP-OCR in C# inference

### 🔎 Search before asking

- [x] I have searched the PaddleOCR [Docs](https://paddlepaddle.github.io/PaddleOCR/) and found no similar bug report.
- [x] I have searched the PaddleOCR [Issues](https://github.com/PaddlePaddle/PaddleOCR/issues) and found no similar bug report.
- [x] I have searched the PaddleOCR [Discussions](https://github.com/PaddlePaddle/PaddleOCR/discussions) and found no similar bug report.

### 🐛 Bug (Problem description)

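I was glad to see that the Paddle 3.2.0 update can return single-character coordinates during inference, and I have this feature working with the official Python source code. My use case is running the PaddleOCRv5 model with OpenVINO from C#, and the CTC decoding code there does not support returning single-character coordinates. I don't know how to return single-character coordinates in C#. Any pointers would be much appreciated, thanks!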

### 🏃‍♂️ Environment

C# with OpenVINO, running inference on the PaddleOCRv5 model

### 🌰 Minimal Reproducible Example

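        // Public method predict: runs text recognition on a list of images.
        // Parameters:
        // - img_list: input images, each a Mat object (typically from OpenCV)
        // - rec_texts: output list that receives the recognized text strings
        // - rec_text_scores: output list that receives the confidence score of each recognized text
        // Results are written by index, so both output lists must already hold img_list.Count entries.
        public void predict(List<Mat> img_list, List<string> rec_texts, List<float> rec_text_scores)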
        {
            // Number of input images
            int img_num = img_list.Count;

            // Aspect ratio (width / height) of each image
            List<float> width_list = new List<float>();
            for (int i = 0; i < img_num; i++)
            {
                // img_list[i].Cols is the image width, img_list[i].Rows is the image height
                width_list.Add((float)(img_list[i].Cols) / img_list[i].Rows);
            }
            // Sort the image indices by aspect ratio, ascending
            // PaddleOcrUtility.argsort returns the sorted index list
            List<int> indices = PaddleOcrUtility.argsort(width_list);

            // Process the images in batches of at most m_rec_batch_num
            // beg_img_no is the index of the first image in the current batch
            for (int beg_img_no = 0; beg_img_no < img_num; beg_img_no += m_rec_batch_num)
            {
                // End index of the current batch, clamped to the total image count
                int end_img_no = Math.Min(img_num, beg_img_no + m_rec_batch_num);
                // Number of images in the current batch
                int batch_num = end_img_no - beg_img_no;
                // Image size expected by the model (height and width)
                // m_rec_image_shape is assumed to be [channels, height, width]
                int imgH = m_rec_image_shape[1]; // target image height
                int imgW = m_rec_image_shape[2]; // target image width
                // Start from the aspect ratio the model expects (imgW / imgH)
                float max_wh_ratio = (imgW * 1.0f) / imgH;

                // Find the largest aspect ratio among the images in the current batch
                for (int ino = beg_img_no; ino < end_img_no; ino++)
                {
                    // Height and width of the current image
                    int h = img_list[indices[ino]].Rows;
                    int w = img_list[indices[ino]].Cols;
                    // Aspect ratio of the current image
                    float wh_ratio = (w * 1.0f) / h;
                    // Keep the largest ratio so the widest image in the batch is not squashed
                    max_wh_ratio = Math.Max(max_wh_ratio, wh_ratio);
                }

                // Largest image width in the batch (used to size the input tensor)
                int batch_width = 0;
                // Normalized images of the current batch
                List<Mat> norm_img_batch = new List<Mat>();

                // Preprocess the images of the current batch
                for (int ino = beg_img_no; ino < end_img_no; ino++)
                {
                    // Work on a copy of the current image
                    Mat srcimg = new Mat();
                    img_list[indices[ino]].CopyTo(srcimg);

                    //Cv2.ImShow("srcimg", srcimg);
                    //Cv2.WaitKey(0);

                    // Resize the image with PreProcess.crnn_resize_img
                    // - max_wh_ratio makes every image in the batch use the same aspect ratio
                    // - m_rec_image_shape provides the target size
                    Mat resize_img = PreProcess.crnn_resize_img(srcimg, max_wh_ratio, m_rec_image_shape);

                    //Cv2.ImShow("resize_img", resize_img);
                    //Cv2.WaitKey(0);

                    // Normalize the resized image
                    // - m_mean: mean used for normalization
                    // - m_scale: scale factor
                    // - m_is_scale: whether scaling is enabled
                    PreProcess.normalize(resize_img, m_mean, m_scale, m_is_scale);

                    // Add the normalized image to the batch
                    norm_img_batch.Add(resize_img);
                    // Track the largest width across the batch
                    batch_width = Math.Max(resize_img.Cols, batch_width);
                }

                // Flatten the batch into the float array the model takes as input
                // PreProcess.permute_batch typically reorders the data to [batch, channels, height, width]
                float[] input_data = PreProcess.permute_batch(norm_img_batch);

                //DateTime start = DateTime.Now;
                float[] predict_batch = infer(input_data, new long[] { batch_num, 3, m_input_size[2], batch_width });
                //DateTime end = DateTime.Now;
                //Console.WriteLine("time: " + (end - start).TotalMilliseconds);

                // Length of the predicted sequence (number of time steps) per image
                // predict_batch.Length is the total number of elements in the model output
                // m_output_shape.Last<long>() is the last output dimension (typically the number of classes)
                int batch_len = (int)(predict_batch.Length / batch_num / m_output_shape.Last<long>());

                // Decode the prediction of each image in the current batch
                for (int m = 0; m < batch_num; m++)
                {
                    // Recognized text
                    string str_res = "";
                    // Class index predicted at the current time step
                    int argmax_idx;
                    // Class index predicted at the previous time step, used to collapse repeats
                    int last_index = 0;
                    // Accumulated confidence
                    float score = 0.0f;
                    // Number of emitted characters (used to average the confidence)
                    int count = 0;
                    // Highest probability at the current time step
                    float max_value = 0.0f;

                    // Offset of the current image's data inside predict_batch
                    int pre_data = (int)(m * batch_len * m_output_shape.Last<long>());
                    // Walk the time steps of the current image's prediction
                    for (int n = 0; n < batch_len; n++)
                    {
                        // Copy out the class probability distribution of this time step
                        float[] res = new float[m_output_shape.Last<long>()];
                        Array.Copy(predict_batch, m_output_shape.Last<long>() * n + pre_data, res, 0, m_output_shape.Last<long>());

                        // Get the index of the most probable class and its probability
                        // PaddleOcrUtility.argmax returns the index of the maximum value
                        argmax_idx = (int)(PaddleOcrUtility.argmax(res, out max_value));

                        // Emit a character only when:
                        // - argmax_idx > 0: not the blank class (index 0 is normally the CTC blank)
                        // - !(n > 0 && argmax_idx == last_index): not a repeat of the previous step (CTC collapse)
                        // - argmax_idx < m_label_list.Count: the index is inside the label list
                        if (argmax_idx > 0 && (!(n > 0 && argmax_idx == last_index)) && argmax_idx < m_label_list.Count)
                        {
                            // Accumulate the confidence
                            score += max_value;
                            // Count the emitted character
                            count += 1;
                            // Append the matching label (character) to the result
                            str_res += m_label_list[argmax_idx];
                        }
                        last_index = argmax_idx;
                    }

                    // Skip images with no emitted characters; dividing by a zero count would yield NaN
                    if (count == 0)
                    {
                        continue;
                    }
                    // Average confidence over the emitted characters
                    score /= count;
                    rec_texts[indices[beg_img_no + m]] = str_res;
                    rec_text_scores[indices[beg_img_no + m]] = score;
                }

            }
        }
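
One possible direction, offered as a rough sketch rather than a port of the official Python logic: the greedy CTC loop above already knows which time step emits each character, so it could record that step and convert it to an approximate x position. `CtcCharPositions`, `CharHit`, `Decode`, and `StepToX` are hypothetical names that do not exist in PaddleOCR or OpenVINO. The mapping assumes the time steps are evenly spaced across the input tensor width and that padding is added on the right only; neither is confirmed for this model.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of PaddleOCR or OpenVINO.
public static class CtcCharPositions
{
    // One decoded character plus the time step that emitted it.
    public sealed class CharHit
    {
        public CharHit(string text, int step, float score)
        {
            Text = text;
            Step = step;
            Score = score;
        }
        public string Text { get; }
        public int Step { get; }
        public float Score { get; }
    }

    // Greedy CTC decode of one sequence stored row-major as [steps, classes],
    // starting at 'offset' inside 'probs'. Index 0 is assumed to be the blank class.
    public static List<CharHit> Decode(float[] probs, int offset, int steps, int classes,
                                       IReadOnlyList<string> labels)
    {
        var hits = new List<CharHit>();
        int last = 0;
        for (int t = 0; t < steps; t++)
        {
            int rowStart = offset + t * classes;
            int best = 0;
            float bestVal = probs[rowStart];
            for (int c = 1; c < classes; c++)
            {
                if (probs[rowStart + c] > bestVal)
                {
                    bestVal = probs[rowStart + c];
                    best = c;
                }
            }
            // Same emit rule as the loop above: skip blanks, repeats, and out-of-range indices.
            if (best > 0 && best != last && best < labels.Count)
            {
                hits.Add(new CharHit(labels[best], t, bestVal));
            }
            last = best;
        }
        return hits;
    }

    // Approximate center x of a time step, in pixels of the original text crop.
    // Assumes evenly spaced time steps over tensorWidth and right-side padding only.
    public static float StepToX(int step, int steps, int tensorWidth, int resizedWidth, int cropWidth)
    {
        float xInTensor = (step + 0.5f) * tensorWidth / steps;   // center of the step's column
        float xInCrop = xInTensor * cropWidth / resizedWidth;    // undo the resize
        return Math.Min(xInCrop, cropWidth);                     // clamp steps that fall in the padding
    }
}
```

In terms of the `predict` method above, `steps` would be `batch_len`, `tensorWidth` would be `batch_width`, `resizedWidth` the `Cols` of that image's `resize_img`, and `cropWidth` the `Cols` of the original image. This yields only a center x per character; a box would still need a width (for example, half the distance to the neighboring characters) and could reuse the full crop height.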
