convert_document_into_docling_document hits MCP client timeout on cold start (no preload path) #98

@nicholasjayantylearns

Symptom

A user installs docling-mcp via uvx (e.g. in Claude Desktop's
mcpServers config), authorizes a folder containing a PDF, and asks the
client to convert it. The server starts, the handshake completes, all 19
tools register, and the convert call times out:

McpError: MCP error -32001: Request timed out

The server-side "Processing document from source: ..." log line is reached
(the call passed all server-side validation), but DocumentConverter.convert()
doesn't return inside the client's 60-second request timeout because
Docling's first call has to import PyTorch, load OCR models, initialize the
layout detector, and prime the table-structure pipeline. On macOS this
takes anywhere from 30 to >120 seconds depending on hardware and whether
models are already cached locally.

Reproducible against main. Not specific to any PDF — the very first
conversion in a fresh server process always pays the cold-start cost.

Why progress notifications don't fix it

The obvious workaround is to use notifications/progress as a keepalive,
the way convert_directory_files_into_docling_document already does for
multi-file work. But: the MCP Python SDK's BaseSession enforces request
timeouts via anyio.fail_after(timeout) — a hard wall-clock — and
dispatches progress notifications through _progress_callbacks without
resetting that timeout. The TypeScript client used by Claude Desktop appears
to behave the same way (untested, but the reference behavior is the
contract clients implement against).

So even if we added ctx.report_progress() calls inside the single-document
tool, the deadline would not move: the underlying convert() call still
blocks for >60s and the client still cancels the request.
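
To make the hard-deadline behavior concrete, here is a minimal stdlib-only sketch. It assumes asyncio.wait_for as a stand-in for anyio.fail_after (both enforce a fixed wall-clock limit); slow_tool, call_with_hard_deadline, and run_demo are hypothetical names for illustration, not MCP SDK code.

```python
import asyncio


async def slow_tool(progress, steps, interval):
    """Stand-in for a long tool call that reports progress while it works."""
    for step in range(1, steps + 1):
        await asyncio.sleep(interval)
        # Analogous to sending a notifications/progress message.
        progress.append(step)
    return "done"


async def call_with_hard_deadline(steps, interval, timeout):
    """Client side: a fixed deadline that progress updates never extend."""
    progress = []
    try:
        result = await asyncio.wait_for(
            slow_tool(progress, steps, interval), timeout
        )
    except asyncio.TimeoutError:
        result = "timed out"
    # Return how many progress updates arrived before the outcome.
    return result, len(progress)


def run_demo(steps, interval, timeout):
    return asyncio.run(call_with_hard_deadline(steps, interval, timeout))
```

Calling run_demo(100, 0.01, 0.1) ends with "timed out" even though progress updates were arriving the whole time, which is the same shape as the failure described above.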

Proposed fix: eager model preload at server startup

Move the cold-start cost from "first user-facing call" to "server boot,"
behind a CLI flag so users who don't want a slow startup can opt out.

Sketch:

# docling_mcp/servers/mcp_server.py — new option, defaults False
preload_models: Annotated[
    bool,
    typer.Option(
        "--preload-models",
        help="Eagerly initialize the Docling converter at server startup. "
             "Adds 30-120s to server boot but makes the first conversion "
             "call return inside the client's request timeout.",
    ),
] = False,

# Before mcp.run():
if preload_models:
    logger.info("--preload-models: warming up DocumentConverter...")
    from docling_mcp.tools.conversion import _get_converter
    _get_converter()  # triggers @lru_cache, loads all models
    logger.info("DocumentConverter ready")

This pushes the cost into the server-process startup window. Claude
Desktop's spawn-and-handshake timeout for MCP servers is more generous
than its per-request timeout (in practice, 2-3 minutes), so eager load
fits within it for typical model sizes.

For container/Llama Stack deployments, the same flag means the container is
"warm" by the time its health check passes — net benefit, not regression.
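
The effect of the flag can be illustrated without Docling installed. The sketch below is an assumption-laden model of the idea, not the real server: FakeConverter, get_converter, start_server, and first_call_latency are hypothetical stand-ins, and time.sleep simulates model loading.

```python
import time
from functools import lru_cache


class FakeConverter:
    """Hypothetical stand-in for docling's DocumentConverter."""

    def __init__(self, load_seconds):
        time.sleep(load_seconds)  # simulates importing PyTorch, loading models

    def convert(self, source):
        return f"converted:{source}"


@lru_cache(maxsize=1)
def get_converter():
    # The first call pays the load cost; later calls reuse the cached instance.
    return FakeConverter(load_seconds=0.05)


def start_server(preload_models):
    if preload_models:
        get_converter()  # warm the cache during boot
    # The real server would call mcp.run() here.


def first_call_latency(preload_models):
    """Return the first conversion's result and how long that call took."""
    get_converter.cache_clear()  # simulate a fresh server process
    start_server(preload_models)
    started = time.perf_counter()
    result = get_converter().convert("a.pdf")
    return result, time.perf_counter() - started
```

With preload_models=True the simulated load happens inside start_server, so the first conversion only pays for convert() itself; with False the first conversion absorbs the whole load.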

What I'd want from maintainers before opening a PR

  1. Is the proposed CLI flag the right shape? Or would you prefer an env
    var (DOCLING_MCP_PRELOAD=1), a config-file setting, or default-on with
    --lazy-models as the opt-out?
  2. Is the underlying assumption (cold-start fits in spawn timeout)
    verified for the actual model set? I haven't measured it across model
    variants. If a particular config (large OCR model, multiple language
    packs) blows past the spawn timeout too, the preload approach needs
    smarter chunking or a separate warmup tool the client can call
    on-demand.
  3. Relation to #86 (Not able to get docling mcp working on Claude Desktop) and #95 (feat: implement MCP Roots protocol for dynamic allowed-directories). #95 implements MCP Roots support and is
    currently marked draft because it can't deliver a complete end-to-end
    experience without this cold-start fix. Would you prefer:
    • Roots PR merges independently, this cold-start fix lands as a separate
      PR shortly after; or
    • Cold-start fix lands first, Roots PR updates to depend on it and ships
      a verified end-to-end demo in its receipts.

Happy to write whichever PR shape the maintainers prefer.
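
For the "separate warmup tool" alternative mentioned in question 2, one possible shape is a background load that returns immediately and can be polled. This is only a sketch under assumptions: the Warmup class and its method names are hypothetical, and the loader argument stands in for whatever actually initializes the converter.

```python
import threading
import time


class Warmup:
    """Runs a slow loader once in a background thread and reports status."""

    def __init__(self, loader):
        self._loader = loader
        self._lock = threading.Lock()
        self._thread = None
        self._done = threading.Event()
        self._error = None

    def start(self):
        """Kick off loading if not already started; never blocks on the load."""
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._run, daemon=True)
                self._thread.start()
        return self.status()

    def _run(self):
        try:
            self._loader()
        except Exception as exc:  # keep the failure so status() can report it
            self._error = exc
        finally:
            self._done.set()

    def status(self):
        if self._thread is None:
            return "idle"
        if not self._done.is_set():
            return "loading"
        return "failed" if self._error else "ready"

    def wait(self, timeout=None):
        """Block up to timeout seconds, then report the current status."""
        self._done.wait(timeout)
        return self.status()


def simulated_load():
    time.sleep(0.1)  # stand-in for model loading
```

A tool built this way would return "loading" well inside the request timeout, and the client could poll status until it reads "ready" before asking for a conversion.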

Repro environment

  • macOS 14 (Apple Silicon)
  • Claude Desktop, latest
  • uvx --from git+https://github.com/<fork>/docling-mcp.git@main docling-mcp-server --transport stdio
  • Test PDF: any single-page text PDF; reproduces regardless of content
