Skip to content

Releases: DS4SD/docling

v2.25.0

26 Feb 14:16
Compare
Choose a tag to compare

Feature

  • [Experimental] Introduce VLM pipeline using HF AutoModelForVision2Seq, featuring SmolDocling model (#1054) (3c9fe76)
  • cli: Add option for downloading all models, refine help messages (#1061) (ab683e4)

Fix

Documentation

  • Extend chunking docs, add FAQ on token limit (#1053) (c84b973)

v2.24.0

20 Feb 18:31
Compare
Choose a tag to compare

Feature

v2.23.1

20 Feb 16:26
Compare
Choose a tag to compare

Fix

  • Runtime error when Pandas Series is not always of string type (#1024) (6796f0a)

Documentation

v2.23.0

17 Feb 14:22
Compare
Choose a tag to compare

Feature

Fix

  • Revise DocTags, fix iterate_items to output content_layer in items (#965) (6e75f0b)

v2.22.0

14 Feb 08:53
Compare
Choose a tag to compare

Feature

  • Add support for CSV input with new backend to transform CSV files to DoclingDocument (#945) (00d9405)
  • Introduce the enable_remote_services option to allow remote connections while processing (#941) (2716c7d)
  • Allow artifacts_path to be defined as ENV (#940) (5101e25)

Fix

Documentation

  • Update example Dockerfile with download CLI (#929) (7493d5b)
  • Examples for picture descriptions (#951) (2d66e99)

v2.21.0

10 Feb 11:43
Compare
Choose a tag to compare

Feature

  • Add content_layer property to items to address body, furniture and other roles (#735) (cf78d5b)

v2.20.0

07 Feb 17:46
Compare
Choose a tag to compare

Feature

  • Describe pictures using vision models (#259) (4cc6e3e)

Fix

v2.19.0

07 Feb 13:36
Compare
Choose a tag to compare

Feature

Fix

  • markdown: Handle nested lists (#910) (90b766e)
  • Test cases for RTL programmatic PDFs and fixes for the formula model (#903) (9114ada)
  • msword_backend: Handle conversion error in label parsing (#896) (722a6eb)
  • Enrichment models batch size and expose picture classifier (#878) (5ad6de0)

Documentation

  • Introduce example with custom models for RapidOCR (#874) (6d3fea0)

v2.18.0

03 Feb 14:58
Compare
Choose a tag to compare

Feature

Fix

  • markdown: Fix parsing if doc ending with table (#873) (5ac2887)
  • markdown: Add support for HTML content (#855) (94751a7)
  • docx: Merged table cells not properly converted (#857) (0cd81a8)
  • Processing of placeholder shapes in pptx that have text but no bbox (#868) (eff16b6)
  • KeyError in tableformer prediction (#854) (b1cf796)
  • Fixed docx import with headers that are also lists (#842) (2c037ae)
  • Use new add_code in html backend and add more typing hints (#850) (2a1f8af)
  • markdown: Fix empty block handling (#843) (bccb022)
  • Fix for the crash when encountering WMF images in pptx and docx (#837) (fea0a99)

Documentation

  • Updated the readme with upcoming features (#831) (d7c0828)
  • Add example for inspection of picture content (#624) (f9144f2)

v2.17.0

28 Jan 18:37
Compare
Choose a tag to compare

Feature

  • CLI: Expose code and formula models in the CLI (#820) (6882e6c)
  • Add platform info to CLI version printout (#816) (95b293a)
  • ocr: Expose rec_keys_path in RapidOcrOptions to support custom dictionaries (#786) (5332755)
  • Introduce automatic language detection in TesseractOcrCliModel (#800) (3be2fb5)

Fix

  • Fix single newline handling in MD backend (#824) (5aed9f8)
  • Use file extension if filetype fails with PDF (#827) (adf6353)
  • Parse html with omitted body tag (#818) (a112d7a)

Documentation