PDF formula recognition issue #1877
Comments
I also added the following to cut_image.py: def ocr_cut_image_and_table(spans, page, page_id, pdf_bytes_md5, imageWriter): for span in spans:
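I can now see image files for the interline equations in the images directory, but after a run the interline equations are still recognized as formulas. If I want them to be emitted as a link, like an ordinary image, where should I make the change?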
In ocr_mkcontent.py, how should the elif para_type == BlockType.InterlineEquation: branch of ocr_mk_markdown_with_para_core_v2 be changed so that the formula image is placed at that position? elif para_type == BlockType.InterlineEquation:
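One possible way to approach the question above, as a minimal standalone sketch rather than the project's actual code: the helper name `interline_equation_to_markdown` is hypothetical, and the span keys `image_path` and `content` are assumptions about what an interline equation span carries after the cropping step. If the span has a cropped image, it emits a Markdown image link; otherwise it falls back to the recognized LaTeX.

```python
import posixpath


def interline_equation_to_markdown(span: dict, img_bucket_path: str = "images") -> str:
    """Render an interline equation span as Markdown.

    Hypothetical helper. Assumes the span dict may contain:
      - "image_path": file name of the cropped equation image (assumed key)
      - "content": recognized LaTeX for the equation (assumed key)
    """
    image_path = span.get("image_path")
    if image_path:
        # Emit a plain Markdown image link instead of the recognized formula.
        return f"![]({posixpath.join(img_bucket_path, image_path)})"
    # No image available: fall back to a display-math block with the LaTeX.
    latex = span.get("content", "")
    return f"$$\n{latex}\n$$"
```

Under these assumptions, the InterlineEquation branch would call such a helper for each span in the block in place of the usual formula text. The real span layout should be checked in the project before relying on these key names.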
Also, formulas that contain text come out garbled after parsing. Does anyone have a fix for this?
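We need to process a large number of academic documents that contain many mathematical formulas. The tool already recognizes many of them, but quite a few are still recognized incorrectly. I would like to know: 1. Is it possible to improve formula recognition accuracy by changing the formula recognition model? 2. If not, can formulas be extracted as images? I am not very familiar with the project as a whole or the techniques it uses, so any guidance would be appreciated.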