Releases: DS4SD/docling
Releases · DS4SD/docling
v2.25.0
Feature
- [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054) (
3c9fe76
) - cli: Add option for downloading all models, refine help messages (#1061) (
ab683e4
)
Fix
- Vlm using artifacts path (#1057) (
e197225
) - html: Parse text in div elements as TextItem (#1041) (
1b0ead6
)
Documentation
v2.24.0
v2.23.1
v2.23.0
v2.22.0
Feature
- Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) (
00d9405
) - Introduce the enable_remote_services option to allow remote connections while processing (#941) (
2716c7d
) - Allow artifacts_path to be defined as ENV (#940) (
5101e25
)
Fix
- Update Pillow constraints (#958) (
af19c03
) - Fix the initialization of the TesseractOcrModel (#935) (
c47ae70
)
Documentation
v2.21.0
v2.20.0
v2.19.0
Feature
Fix
- markdown: Handle nested lists (#910) (
90b766e
) - Test cases for RTL programmatic PDFs and fixes for the formula model (#903) (
9114ada
) - msword_backend: Handle conversion error in label parsing (#896) (
722a6eb
) - Enrichment models batch size and expose picture classifier (#878) (
5ad6de0
)
Documentation
v2.18.0
Feature
- Expose equation exports (#869) (
6a76b49
) - Add option to define page range (#852) (
70d68b6
) - docx: Support of SDTs in docx backend (#853) (
d727b04
) - Python 3.13 support (#841) (
4df085a
)
Fix
- markdown: Fix parsing if doc ending with table (#873) (
5ac2887
) - markdown: Add support for HTML content (#855) (
94751a7
) - docx: Merged table cells not properly converted (#857) (
0cd81a8
) - Processing of placeholder shapes in pptx that have text but no bbox (#868) (
eff16b6
) - KeyError in tableformer prediction (#854) (
b1cf796
) - Fixed docx import with headers that are also lists (#842) (
2c037ae
) - Use new add_code in html backend and add more typing hints (#850) (
2a1f8af
) - markdown: Fix empty block handling (#843) (
bccb022
) - Fix for the crash when encountering WMF images in pptx and docx (#837) (
fea0a99
)
Documentation
v2.17.0
Feature
- CLI: Expose code and formula models in the CLI (#820) (
6882e6c
) - Add platform info to CLI version printout (#816) (
95b293a
) - ocr: Expose
rec_keys_path
in RapidOcrOptions to support custom dictionaries (#786) (5332755
) - Introduce automatic language detection in TesseractOcrCliModel (#800) (
3be2fb5
)
Fix
- Fix single newline handling in MD backend (#824) (
5aed9f8
) - Use file extension if filetype fails with PDF (#827) (
adf6353
) - Parse html with omitted body tag (#818) (
a112d7a
)
Documentation
- Document Docling JSON parsing (#819) (
6875913
) - Add SSL verification error mitigation (#821) (
5139b48
) - backend XML: Do not delete temp file in notebook (#817) (
4d41db3
) - Typo (#814) (
8a4ec77
) - Added markdown headings to enable TOC in github pages (#808) (
b885b2f
) - Description of supported formats and backends (#788) (
c2ae1cc
)