convert_document_into_docling_document hits MCP client timeout on cold start (no preload path)

## Symptom

A user installs `docling-mcp` via `uvx` (e.g. in Claude Desktop's
`mcpServers` config), authorizes a folder containing a PDF, and asks the
client to convert it. The server starts, the handshake completes, all 19
tools register, and the convert call times out:

```
McpError: MCP error -32001: Request timed out
```

The server-side `Processing document from source: ...` log line is reached
(the call passed all server-side validation), but `DocumentConverter.convert()`
doesn't return inside the client's 60-second request timeout because
Docling's first call has to import PyTorch, load OCR models, initialize the
layout detector, and prime the table-structure pipeline. On macOS this
takes anywhere from 30 to >120 seconds depending on hardware and whether
models are already cached locally.

Reproducible against `main`. Not specific to any PDF: the very first
conversion in a fresh server process always pays the cold-start cost.

## Why progress notifications don't fix it

The obvious workaround is to use `notifications/progress` as a keepalive,
the way `convert_directory_files_into_docling_document` already does for
multi-file work. That does not help here: the MCP Python SDK's `BaseSession`
enforces request timeouts via `anyio.fail_after(timeout)`, a hard wall-clock
deadline, and dispatches progress notifications through `_progress_callbacks`
without resetting that deadline. The TypeScript client used by Claude Desktop
appears to behave the same way (untested, but the reference behavior is the
contract clients implement against).

So even if we added `ctx.report_progress()` calls inside the single-document
tool, the underlying `convert()` call still blocks one thread for >60s and
the client still cancels.
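
To make the point concrete, here is a minimal sketch of a fixed deadline that progress callbacks do not extend. It is not the SDK's code: `asyncio.wait_for` stands in for `anyio.fail_after`, and `slow_convert`, `call_with_deadline`, and the durations are hypothetical names and values chosen for illustration. The real `convert()` also blocks a thread rather than yielding cooperatively, which this sketch does not model.

```python
import asyncio


async def slow_convert(report_progress, duration: float) -> str:
    """Hypothetical stand-in for a long conversion that reports progress."""
    steps = 10
    for i in range(steps):
        await asyncio.sleep(duration / steps)
        report_progress(i + 1, steps)
    return "converted"


async def call_with_deadline(timeout: float, duration: float):
    """Run slow_convert under a fixed deadline.

    Progress callbacks are recorded, but nothing here resets the deadline,
    which mirrors the behavior described above.
    """
    seen = []

    def on_progress(done: int, total: int) -> None:
        seen.append((done, total))

    try:
        result = await asyncio.wait_for(slow_convert(on_progress, duration), timeout)
    except asyncio.TimeoutError:
        result = None  # deadline hit even though progress was flowing
    return result, len(seen)


# Work takes longer than the deadline: cancelled despite progress updates.
timed_out = asyncio.run(call_with_deadline(timeout=0.2, duration=0.6))

# Work fits inside the deadline: completes normally.
finished = asyncio.run(call_with_deadline(timeout=2.0, duration=0.2))
```

In the first call the result is `None` even though several progress updates were delivered before the deadline, which is the failure mode a keepalive was supposed to prevent.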

## Proposed fix: eager model preload at server startup

Move the cold-start cost from "first user-facing call" to "server boot,"
behind a CLI flag so users who don't want a slow startup can opt out.

Sketch:

```python
# docling_mcp/servers/mcp_server.py — new option, defaults False
preload_models: Annotated[
    bool,
    typer.Option(
        "--preload-models",
        help="Eagerly initialize the Docling converter at server startup. "
             "Adds 30-120s to server boot but makes the first conversion "
             "call return inside the client's request timeout.",
    ),
] = False,
```

```python
# Before mcp.run():
if preload_models:
    logger.info("--preload-models: warming up DocumentConverter...")
    from docling_mcp.tools.conversion import _get_converter
    _get_converter()  # triggers @lru_cache, loads all models
    logger.info("DocumentConverter ready")
```

This pushes the cost into the server-process startup window. Claude
Desktop's spawn-and-handshake timeout for MCP servers is more generous
(2-3 minutes in practice) than its 60-second per-request timeout, so eager
load fits within it for typical model sizes.
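
The effect of warming an `@lru_cache`-backed factory before serving can be sketched in isolation. Everything below is a hypothetical stand-in, not Docling's code: `FakeConverter` simulates expensive initialization with a short sleep, and `get_converter` plays the role that `_get_converter` has in the snippet above.

```python
import time
from functools import lru_cache

CONSTRUCTIONS = 0  # counts how many times the expensive init actually runs


class FakeConverter:
    """Hypothetical converter whose constructor simulates model loading."""

    def __init__(self, load_seconds: float) -> None:
        global CONSTRUCTIONS
        time.sleep(load_seconds)
        CONSTRUCTIONS += 1

    def convert(self, source: str) -> str:
        return f"converted:{source}"


@lru_cache(maxsize=1)
def get_converter() -> FakeConverter:
    # First call pays the load cost; later calls return the cached instance.
    return FakeConverter(load_seconds=0.3)


def timed(fn):
    """Return (value, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    value = fn()
    return value, time.perf_counter() - start


# Server boot: warm the cache before accepting requests.
_, boot_cost = timed(get_converter)

# First user-facing request: reuses the warmed instance.
result, first_call_cost = timed(lambda: get_converter().convert("a.pdf"))
```

The load cost shows up in `boot_cost` and the first request only pays for the cache lookup plus the conversion itself. Whether the real boot cost fits inside the client's spawn window is the open question raised below.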

For container/Llama Stack deployments, the same flag means the container is
"warm" by the time its health check passes: a net benefit, not a regression.

## What I'd want from maintainers before opening a PR

1. **Is the proposed CLI flag the right shape?** Or would you prefer an env
   var (`DOCLING_MCP_PRELOAD=1`), a config-file setting, or default-on with
   `--lazy-models` as the opt-out?
2. **Is the underlying assumption (cold-start fits in spawn timeout)
   verified for the actual model set?** I haven't measured it across model
   variants. If a particular config (large OCR model, multiple language
   packs) blows past the spawn timeout too, the preload approach needs
   smarter chunking or a separate `warmup` tool the client can call
   on-demand.
3. **Relation to #86 / #95.** #95 implements MCP Roots support and is
   currently marked draft because it can't deliver a complete end-to-end
   experience without this cold-start fix. Would you prefer:
   * Roots PR merges independently, this cold-start fix lands as a separate
     PR shortly after; or
   * Cold-start fix lands first, Roots PR updates to depend on it and ships
     a verified end-to-end demo in its receipts.
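
For question 1, one possible shape is a CLI flag that falls back to the environment variable when the flag is not given. This is a sketch under assumptions: `resolve_preload` and the set of truthy strings are hypothetical, and only the variable name `DOCLING_MCP_PRELOAD` comes from the question above.

```python
import os
from typing import Mapping, Optional

# Assumed set of values treated as "enabled"; adjust to project conventions.
TRUTHY = {"1", "true", "yes", "on"}


def resolve_preload(
    cli_flag: Optional[bool],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Decide whether to preload models.

    An explicit CLI value wins; otherwise fall back to DOCLING_MCP_PRELOAD;
    otherwise default to lazy loading.
    """
    if cli_flag is not None:
        return cli_flag
    return env.get("DOCLING_MCP_PRELOAD", "").strip().lower() in TRUTHY


# Example resolutions against explicit environments.
from_env = resolve_preload(None, {"DOCLING_MCP_PRELOAD": "1"})
flag_overrides_env = resolve_preload(False, {"DOCLING_MCP_PRELOAD": "1"})
default_lazy = resolve_preload(None, {})
```

Typer's `Option` also accepts an `envvar` argument, so a hand-written resolver like this may be unnecessary if the flag stays in Typer.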

Happy to write whichever PR shape the maintainers prefer.

## Repro environment

* macOS 14 (Apple Silicon)
* Claude Desktop, latest
* `uvx --from git+https://github.com/<fork>/docling-mcp.git@main docling-mcp-server --transport stdio`
* Test PDF: any single-page text PDF; reproduces regardless of content

