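Description of the bug
For a table that spans two pages, the header (and only the header) is on the first page, while all of the table contents are on the next page. The header cannot be found anywhere in content_list.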
How to reproduce the bug
Configuration used:
  "device-mode": "cuda",
  "layout-config": {
    "model": "doclayout_yolo"
  },
  "formula-config": {
    "mfd_model": "yolo_v8_mfd",
    "mfr_model": "unimernet_small",
    "enable": false
  },
  "table-config": {
    "model": "rapid_table",
    "sub_model": "slanet_plus",
    "enable": true,
    "max_time": 400
  },
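Parse the following file; the table is at the very bottom of page 17:
2024年度河南航空港投资集团有限公司信用评级报告.pdf
Output obtained:
content_list.json
layout.json
model.json
Operating system
Linux
Python version
3.10
Software version (magic-pdf --version)
1.0.x
Device mode
cuda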
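The header of Table 12 on page 17 sits too close to the bottom of the page, so it was recognized as a page footer. This is not easy to fix; you will have to handle it manually.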
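How should I handle it manually? The first row of Table 12, i.e. the row containing "产业园名称" (industrial park name) and "项目状态" (project status), does not appear in content_list, middle, or the md file. In other words, this row is missing from every output file. The middle file only contains table_caption, with no table_body, so I don't know how to handle it. Is there some other way to get the information in this row? Thanks.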
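Handling it manually means adding the table header and caption by hand to the table information on the following page. When the output is not directly usable, manual adjustment is unavoidable.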
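A minimal sketch of the manual patch described above, in Python. It assumes content_list.json is a JSON list of blocks in which a table block has `"type": "table"`, an HTML string in `table_body`, a list in `table_caption`, and a 0-based `page_idx`; check these field names against your own output before relying on it. The function `patch_table_header`, the `main` helper, the page index, and the script name are hypothetical. Only the two column names 产业园名称 and 项目状态 come from this thread; the remaining header cells of Table 12 would have to be copied from the PDF by hand.

```python
import html
import json
import re
import sys

# Matches the opening <table ...> tag, with or without attributes.
_TABLE_OPEN = re.compile(r"<table\b[^>]*>", re.IGNORECASE)


def patch_table_header(entries, page_idx, header_cells, caption=None):
    """Prepend a header row (and optionally a caption) to the first table
    block found on page_idx. Mutates that block in place and returns it."""
    for entry in entries:
        if entry.get("type") != "table" or entry.get("page_idx") != page_idx:
            continue
        body = entry.get("table_body", "")
        match = _TABLE_OPEN.search(body)
        if match is None:
            raise ValueError("table_body contains no <table> tag")
        # Escape cell text so characters such as < or & stay literal.
        # Plain <td> cells are used here (an assumption) rather than <th>.
        row = "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in header_cells) + "</tr>"
        entry["table_body"] = body[: match.end()] + row + body[match.end():]
        # Only fill in the caption if the block does not already have one.
        if caption and not entry.get("table_caption"):
            entry["table_caption"] = [caption]
        return entry
    raise LookupError(f"no table block with page_idx={page_idx}")


def main(in_path, out_path):
    with open(in_path, encoding="utf-8") as f:
        entries = json.load(f)
    # Hypothetical values: page 18 of the PDF is page_idx 17 if page_idx is
    # 0-based. Add the other column names of Table 12 from the PDF.
    patch_table_header(entries, page_idx=17, header_cells=["产业园名称", "项目状态"])
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```

Usage (hypothetical script name): `python patch_header.py content_list.json content_list_fixed.json`. Writing to a separate file keeps the original output intact in case the page index or column list turns out to be wrong.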