Add DICOM (.dcm) Support via Plugin #2072

@timburman

Summary

I would like to propose adding support for DICOM (.dcm) files through the plugin system.

DICOM is a widely used format not only in medical imaging but also in industrial radiography, non-destructive testing (NDT), CT inspection, aerospace inspection, and manufacturing quality assurance workflows.

MarkItDown's goal is to convert diverse file formats into LLM-friendly Markdown. DICOM files contain a large amount of structured metadata that maps naturally to Markdown and could be valuable for indexing, search, retrieval, and RAG workflows.

Why This Fits MarkItDown

MarkItDown already handles formats that contain both metadata and binary content (e.g. images, audio, and documents) by prioritizing textual representations that are useful for LLMs.

DICOM follows the same pattern:

  • Structured metadata (study information, acquisition parameters, equipment information, descriptions, timestamps, etc.)
  • Binary pixel data

A DICOM converter could focus on extracting and organizing metadata into Markdown, similar to how other converters transform format-specific information into text.

Potential Output

Example output could include:

  • Study information
  • Series information
  • Acquisition parameters
  • Equipment metadata
  • Image properties (dimensions, bit depth, modality, photometric interpretation, etc.)
  • Relevant textual tags such as descriptions and comments

This would make DICOM datasets searchable and usable in RAG pipelines without requiring users to manually parse DICOM headers.
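As a rough illustration of the output shape, the sketch below renders grouped metadata as one Markdown heading and table per group. The function name `render_markdown`, the group names, and the sample values are all hypothetical; the actual layout would be decided during implementation.

```python
from typing import Mapping


def render_markdown(title: str, sections: Mapping[str, Mapping[str, object]]) -> str:
    """Render grouped metadata as Markdown: one heading and one table per group."""
    lines = [f"# {title}", ""]
    for heading, fields in sections.items():
        # Skip empty values so missing tags do not produce blank rows.
        rows = [(k, v) for k, v in fields.items() if v not in (None, "")]
        if not rows:
            continue
        lines += [f"## {heading}", "", "| Field | Value |", "| --- | --- |"]
        for key, value in rows:
            # Escape pipes so free-text values cannot break the table.
            cell = str(value).replace("|", "\\|")
            lines.append(f"| {key} | {cell} |")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    print(render_markdown("sample.dcm", {
        "Study": {"Study Description": "Weld inspection", "Study Date": "20240101"},
        "Image": {"Rows": 512, "Columns": 512, "Image Comments": None},
    }))
```

Groups with no populated fields are dropped entirely, so sparse headers would not produce empty sections.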

Scope

For an initial implementation, I suggest limiting support to:

  • Metadata extraction
  • Image property metadata (dimensions, bit depth, modality, photometric interpretation)
  • Structured Markdown generation

Out of scope:

  • Pixel-level image analysis
  • Medical interpretation
  • Defect detection
  • Vision-based captioning of scans

The pixel array itself would not be converted into text or embedded in Markdown. Instead, the converter would extract basic image characteristics such as dimensions, modality, photometric interpretation, bit depth, and related metadata.
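A minimal sketch of that header-only extraction, assuming the dataset object exposes standard DICOM keywords as attributes (as a pydicom `Dataset` does). The helper name `image_properties` and the display labels are hypothetical.

```python
from typing import Any, Dict

# Standard DICOM keywords for basic image characteristics, mapped to
# display labels (the labels are an illustrative choice).
IMAGE_KEYWORDS = {
    "Modality": "Modality",
    "Rows": "Rows",
    "Columns": "Columns",
    "BitsAllocated": "Bits allocated",
    "BitsStored": "Bits stored",
    "PhotometricInterpretation": "Photometric interpretation",
    "SamplesPerPixel": "Samples per pixel",
}


def image_properties(dataset: Any) -> Dict[str, str]:
    """Collect image characteristics from header attributes only.

    `dataset` is anything exposing DICOM keywords as attributes, such as a
    pydicom Dataset. Pixel data is never accessed.
    """
    props = {}
    for keyword, label in IMAGE_KEYWORDS.items():
        # Absent tags are simply omitted rather than reported as empty.
        value = getattr(dataset, keyword, None)
        if value is not None:
            props[label] = str(value)
    return props
```

With pydicom, the dataset could be obtained via `pydicom.dcmread(path, stop_before_pixels=True)`, which reads the header without loading the pixel data.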

Implementation Direction

I propose contributing this as a new plugin package (e.g. markitdown-dicom) within the packages/ directory of the monorepo, similar to existing plugin packages. This would keep the pydicom dependency isolated from the core library while allowing users who work with DICOM files to install the functionality when needed.

The implementation would likely be built on top of pydicom, the most widely used Python library for reading DICOM files.
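One piece the plugin would presumably need is a cheap way to recognize DICOM input before handing it to pydicom. A possible sketch, relying only on the DICOM Part 10 file layout (a 128-byte preamble followed by the ASCII prefix `DICM`); the function name `looks_like_dicom` is hypothetical, and how it would be wired into MarkItDown's converter interface is not shown here.

```python
import io
from typing import BinaryIO

PREAMBLE_LENGTH = 128  # DICOM Part 10 files start with a 128-byte preamble
MAGIC = b"DICM"        # followed by this 4-byte prefix


def looks_like_dicom(stream: BinaryIO) -> bool:
    """Return True if the stream carries the DICOM Part 10 prefix.

    The stream position is restored so a later reader starts from the
    same offset.
    """
    start = stream.tell()
    try:
        header = stream.read(PREAMBLE_LENGTH + len(MAGIC))
    finally:
        stream.seek(start)
    # A stream shorter than 132 bytes yields a short slice and fails the check.
    return header[PREAMBLE_LENGTH:] == MAGIC


if __name__ == "__main__":
    sample = io.BytesIO(b"\x00" * PREAMBLE_LENGTH + MAGIC)
    print(looks_like_dicom(sample))
```

Some DICOM files are written without the preamble, so this check alone would miss them; falling back to the `.dcm` extension is one option for those cases.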

Testing

To check compatibility across different DICOM variants, I plan to test against publicly available DICOM datasets and sample files covering different modalities and metadata structures.

Contribution

I would be interested in contributing this if the maintainers feel it aligns with the project's goals.

I'd also be happy to implement it either as an in-tree plugin package within the repository or as a community-maintained plugin, depending on the maintainers' preference.
