Releases · DS4SD/docling

26 Feb 14:16

deep-search-ops

v2.25.0 Latest

37dd8c1

Feature

[Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054) (3c9fe76)
cli: Add option for downloading all models, refine help messages (#1061) (ab683e4)

Fix

VLM: use artifacts path (#1057) (e197225)
html: Parse text in div elements as TextItem (#1041) (1b0ead6)

Documentation

Extend chunking docs, add FAQ on token limit (#1053) (c84b973)

20 Feb 18:31

deep-search-ops

v2.24.0

d8a81c3

Feature

Implement new reading-order model (#916) (c93e369)

20 Feb 16:26

deep-search-ops

v2.23.1

c031a7a

Fix

Runtime error when Pandas Series is not always of string type (#1024) (6796f0a)

Documentation

Revamp picture description example (#1015) (27c0400)

17 Feb 14:22

deep-search-ops

v2.23.0

75db611

Feature

Support cuda:n GPU device allocation (#694) (77eb77b)
xml-jats: Parse XML JATS documents (#967) (428b656)
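
The `cuda:n` entry above concerns selecting a specific GPU by index. As a rough illustration only, the sketch below shows how a device string of that shape could be split into a backend name and an optional index. `parse_device` is a hypothetical helper written for this note; it is not Docling's API and does not claim to mirror its implementation.

```python
from typing import Optional, Tuple


def parse_device(spec: str) -> Tuple[str, Optional[int]]:
    """Split a device string such as "cuda:1" into (backend, index).

    Hypothetical helper for illustration; not part of Docling.
    """
    backend, sep, index = spec.strip().lower().partition(":")
    if not backend:
        raise ValueError(f"empty device backend in {spec!r}")
    if not sep:
        # No explicit index was given, e.g. "cpu" or "cuda".
        return backend, None
    if not index.isdecimal():
        raise ValueError(f"device index must be a non-negative integer: {spec!r}")
    return backend, int(index)


print(parse_device("cuda:1"))
print(parse_device("cpu"))
```

Returning `None` for a missing index leaves the choice of default device to the caller, which is one possible design among several.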

Fix

Revise DocTags, fix iterate_items to output content_layer in items (#965) (6e75f0b)

14 Feb 08:53

deep-search-ops

v2.22.0

ffbde1d

Feature

Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) (00d9405)
Introduce the enable_remote_services option to allow remote connections while processing (#941) (2716c7d)
Allow artifacts_path to be defined as ENV (#940) (5101e25)
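
The last entry above lets `artifacts_path` come from the environment. A minimal sketch of that kind of lookup follows, under stated assumptions: the variable name `DOCLING_ARTIFACTS_PATH`, the helper `resolve_artifacts_path`, and the precedence (explicit argument first, then environment) are all assumptions made for illustration, not confirmed by these release notes.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_artifacts_path(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    env_var: str = "DOCLING_ARTIFACTS_PATH",  # assumed variable name
) -> Optional[Path]:
    """Return the explicit path if given, else the env value, else None.

    Hypothetical helper for illustration; not part of Docling.
    """
    value = explicit or env.get(env_var)
    return Path(value).expanduser() if value else None


# An explicit argument wins over the environment in this sketch.
print(resolve_artifacts_path("/opt/models", env={"DOCLING_ARTIFACTS_PATH": "/tmp/models"}))
```

Passing the environment as a mapping keeps the helper easy to exercise without modifying the real process environment.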

Fix

Update Pillow constraints (#958) (af19c03)
Fix the initialization of the TesseractOcrModel (#935) (c47ae70)

Documentation

Update example Dockerfile with download CLI (#929) (7493d5b)
Examples for picture descriptions (#951) (2d66e99)

10 Feb 11:43

deep-search-ops

v2.21.0

de46209

Feature

Add content_layer property to items to address body, furniture and other roles (#735) (cf78d5b)

07 Feb 17:46

deep-search-ops

v2.20.0

3e26597

Feature

Describe pictures using vision models (#259) (4cc6e3e)

Fix

Remove unused httpx (#919) (c18f47c)

07 Feb 13:36

deep-search-ops

v2.19.0

fba3cf9

Feature

New artifacts path and CLI utility (#876) (ed74fe2)

Fix

markdown: Handle nested lists (#910) (90b766e)
Test cases for RTL programmatic PDFs and fixes for the formula model (#903) (9114ada)
msword_backend: Handle conversion error in label parsing (#896) (722a6eb)
Enrichment models batch size and expose picture classifier (#878) (5ad6de0)

Documentation

Introduce example with custom models for RapidOCR (#874) (6d3fea0)

03 Feb 14:58

deep-search-ops

v2.18.0

b5da408

Feature

Expose equation exports (#869) (6a76b49)
Add option to define page range (#852) (70d68b6)
docx: Support of SDTs in docx backend (#853) (d727b04)
Python 3.13 support (#841) (4df085a)
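
The page-range option listed above restricts processing to a span of pages. The sketch below illustrates one plausible reading of such an option, assuming a 1-based, inclusive `(start, end)` pair that is clipped to the document length. `select_pages` is a hypothetical helper, and the indexing convention is an assumption rather than something these notes state.

```python
from typing import List, Tuple


def select_pages(num_pages: int, page_range: Tuple[int, int]) -> List[int]:
    """Return the 1-based page numbers covered by an inclusive range.

    Hypothetical helper for illustration; not part of Docling.
    """
    start, end = page_range
    if start < 1 or end < start:
        raise ValueError(f"invalid page range: {page_range!r}")
    # Clip the upper bound so an open-ended range stops at the last page.
    return list(range(start, min(end, num_pages) + 1))


print(select_pages(10, (2, 4)))
```

A range that starts beyond the last page yields an empty list in this sketch rather than an error.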

Fix

markdown: Fix parsing when a document ends with a table (#873) (5ac2887)
markdown: Add support for HTML content (#855) (94751a7)
docx: Merged table cells not properly converted (#857) (0cd81a8)
Processing of placeholder shapes in pptx that have text but no bbox (#868) (eff16b6)
KeyError in tableformer prediction (#854) (b1cf796)
Fixed docx import with headers that are also lists (#842) (2c037ae)
Use new add_code in html backend and add more typing hints (#850) (2a1f8af)
markdown: Fix empty block handling (#843) (bccb022)
Fix for the crash when encountering WMF images in pptx and docx (#837) (fea0a99)

Documentation

Updated the readme with upcoming features (#831) (d7c0828)
Add example for inspection of picture content (#624) (f9144f2)

28 Jan 18:37

deep-search-ops

v2.17.0

4d11d87

Feature

CLI: Expose code and formula models in the CLI (#820) (6882e6c)
Add platform info to CLI version printout (#816) (95b293a)
ocr: Expose rec_keys_path in RapidOcrOptions to support custom dictionaries (#786) (5332755)
Introduce automatic language detection in TesseractOcrCliModel (#800) (3be2fb5)

Fix

Fix single newline handling in MD backend (#824) (5aed9f8)
Use file extension if filetype fails with PDF (#827) (adf6353)
Parse html with omitted body tag (#818) (a112d7a)

Documentation

Document Docling JSON parsing (#819) (6875913)
Add SSL verification error mitigation (#821) (5139b48)
backend XML: Do not delete temp file in notebook (#817) (4d41db3)
Typo (#814) (8a4ec77)
Added markdown headings to enable TOC in github pages (#808) (b885b2f)
Description of supported formats and backends (#788) (c2ae1cc)
