Symptom
A user installs docling-mcp via uvx (e.g. in Claude Desktop's mcpServers config), authorizes a folder containing a PDF, and asks the
client to convert it. The server starts, the handshake completes, all 19
tools register, and the convert call times out:
McpError: MCP error -32001: Request timed out
The server-side "Processing document from source: ..." log line is reached
(the call passed all server-side validation), but DocumentConverter.convert()
doesn't return inside the client's 60-second request timeout because
Docling's first call has to import PyTorch, load OCR models, initialize the
layout detector, and prime the table-structure pipeline. On macOS this
takes anywhere from 30 to >120 seconds depending on hardware and whether
models are already cached locally.
Reproducible against main. Not specific to any PDF — the very first
conversion in a fresh server process always pays the cold-start cost.
Why progress notifications don't fix it
The obvious workaround is to use notifications/progress as a keepalive,
the way convert_directory_files_into_docling_document already does for
multi-file work. The catch is that the MCP Python SDK's BaseSession enforces
request timeouts via anyio.fail_after(timeout), a hard wall-clock deadline, and
dispatches progress notifications through _progress_callbacks without
resetting that timeout. The TypeScript client used by Claude Desktop appears
to behave the same way (untested, but the reference SDK's behavior is the
contract other clients implement against).
So even if we added ctx.report_progress() calls inside the single-document
tool, the underlying convert() call still blocks one thread for >60s and
the client still cancels.
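The timeout behavior described above can be modeled in a few lines. This is a minimal sketch that uses asyncio.wait_for as a stand-in for anyio.fail_after; slow_convert and send_request are hypothetical names, not SDK code. It shows a request that keeps emitting progress and is still cancelled, because the progress callback never touches the deadline.

```python
import asyncio
from typing import Callable, List, Tuple

ProgressFn = Callable[[int, int], None]


async def slow_convert(report_progress: ProgressFn, seconds: float, steps: int = 10) -> str:
    """Hypothetical stand-in for a long conversion that reports progress as it goes."""
    for done in range(1, steps + 1):
        await asyncio.sleep(seconds / steps)
        report_progress(done, steps)
    return "converted"


async def send_request(timeout: float, work_seconds: float) -> Tuple[str, List[Tuple[int, int]]]:
    """Client side: one fixed deadline around the whole request."""
    progress_seen: List[Tuple[int, int]] = []

    def on_progress(done: int, total: int) -> None:
        # Progress is recorded, but nothing here extends the deadline below.
        progress_seen.append((done, total))

    try:
        # Hard wall-clock deadline, analogous to anyio.fail_after(timeout).
        result = await asyncio.wait_for(slow_convert(on_progress, work_seconds), timeout)
    except asyncio.TimeoutError:
        result = "timed out"
    return result, progress_seen


if __name__ == "__main__":
    # Work takes about 1.0 s, deadline is 0.45 s: a few progress events, then cancellation.
    print(asyncio.run(send_request(timeout=0.45, work_seconds=1.0)))
    # The same kind of work under a generous deadline completes normally.
    print(asyncio.run(send_request(timeout=5.0, work_seconds=0.2)))
```

In the real server the conversion blocks a worker thread rather than awaiting, but the client-side outcome is the same: the deadline fires no matter how many progress notifications arrived before it.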
Proposed fix: eager model preload at server startup
Move the cold-start cost from "first user-facing call" to "server boot,"
behind a CLI flag so users who don't want a slow startup can opt out.
Sketch:
# docling_mcp/servers/mcp_server.py: new option, defaults to False
preload_models: Annotated[
    bool,
    typer.Option(
        "--preload-models",
        help="Eagerly initialize the Docling converter at server startup. "
        "Adds 30-120s to server boot but makes the first conversion "
        "call return inside the client's request timeout.",
    ),
] = False,

# Before mcp.run():
if preload_models:
    logger.info("--preload-models: warming up DocumentConverter...")
    from docling_mcp.tools.conversion import _get_converter

    _get_converter()  # triggers @lru_cache, loads all models
    logger.info("DocumentConverter ready")
This pushes the cost into the server-process startup window. Claude
Desktop's spawn-and-handshake timeout for MCP servers is more generous
than its per-request timeout (in practice, 2-3 minutes), so eager load
fits within it for typical model sizes.
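The effect of the preload can be shown without Docling at all. This minimal sketch assumes that _get_converter is an argument-free function memoized with @lru_cache, as the comment in the sketch implies; get_converter, convert, and serve here are hypothetical stand-ins, and the 0.3 s sleep substitutes for the real 30-120 s model load.

```python
import time
from functools import lru_cache

LOAD_SECONDS = 0.3  # hypothetical stand-in for the 30-120 s cold start


@lru_cache(maxsize=1)
def get_converter() -> dict:
    """Hypothetical stand-in for _get_converter(): slow on the first call, cached after."""
    time.sleep(LOAD_SECONDS)  # in reality: import PyTorch, load OCR/layout/table models
    return {"ready": True}


def convert(source: str) -> str:
    converter = get_converter()  # a cache hit once the converter has been preloaded
    return f"converted {source} (ready={converter['ready']})"


def serve(preload_models: bool) -> float:
    """Simulate a fresh server process and return how long the first convert call takes."""
    get_converter.cache_clear()  # fresh process: nothing cached yet
    if preload_models:
        get_converter()  # pay the cold-start cost at boot, before mcp.run()
    start = time.perf_counter()
    convert("report.pdf")
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"lazy first call:    {serve(preload_models=False):.2f}s")
    print(f"preload first call: {serve(preload_models=True):.2f}s")
```

The total work is identical in both runs; the flag only moves it from the first user-facing request into process startup, which is the window governed by the more generous spawn timeout.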
For container/Llama Stack deployments, the same flag means the container is
"warm" by the time its health check passes — net benefit, not regression.
What I'd want from maintainers before opening a PR
Is the proposed CLI flag the right shape? Or would you prefer an env
var (DOCLING_MCP_PRELOAD=1), a config-file setting, or default-on with --lazy-models as the opt-out?
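If the env-var shape is preferred, it does not have to replace the flag; both can coexist. This is a minimal sketch of one possible precedence rule (explicit flag wins, then DOCLING_MCP_PRELOAD, then off). resolve_preload and the accepted truthy spellings are hypothetical, and the sketch assumes the CLI layer passes None when neither --preload-models nor --lazy-models was given.

```python
import os
from typing import Mapping, Optional

# Hypothetical set of values treated as "on" for DOCLING_MCP_PRELOAD.
TRUTHY = {"1", "true", "yes", "on"}


def resolve_preload(cli_flag: Optional[bool], environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical precedence: explicit CLI flag > DOCLING_MCP_PRELOAD > default off."""
    if cli_flag is not None:
        # --preload-models (True) or --lazy-models (False) was passed explicitly.
        return cli_flag
    return environ.get("DOCLING_MCP_PRELOAD", "").strip().lower() in TRUTHY


if __name__ == "__main__":
    print(resolve_preload(None, {"DOCLING_MCP_PRELOAD": "1"}))
    print(resolve_preload(False, {"DOCLING_MCP_PRELOAD": "1"}))
```

An env var is convenient for mcpServers configs and containers, where editing args is awkward; letting an explicit flag override it keeps a one-off lazy run possible.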
Is the underlying assumption (cold-start fits in spawn timeout)
verified for the actual model set? I haven't measured it across model
variants. If a particular config (large OCR model, multiple language
packs) blows past the spawn timeout too, the preload approach needs
smarter chunking or a separate warmup tool the client can call
on-demand.
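A separate warmup tool would itself hit the 60-second request timeout if it blocked until the models were loaded, so any sketch of that fallback has to return immediately and let the client poll. The Warmup class here is hypothetical (nothing like it exists in docling-mcp today); load stands in for the cached _get_converter, and exposing start() and status() as MCP tools is omitted.

```python
import threading
import time
from typing import Callable, Optional


class Warmup:
    """Hypothetical non-blocking warmup: start() returns at once, status() is cheap to poll."""

    def __init__(self, load: Callable[[], object]) -> None:
        self._load = load
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self._done = threading.Event()
        self._error: Optional[Exception] = None

    def _run(self) -> None:
        try:
            self._load()  # e.g. the lru_cached converter factory
        except Exception as exc:  # keep the failure so status() can report it
            self._error = exc
        finally:
            self._done.set()

    def start(self) -> str:
        # Only the first call spawns a thread; later calls just report status.
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._run, daemon=True)
                self._thread.start()
        return self.status()

    def status(self) -> str:
        if self._thread is None:
            return "idle"
        if not self._done.is_set():
            return "warming"
        return "failed" if self._error is not None else "ready"

    def wait(self, timeout: Optional[float] = None) -> bool:
        # True once loading has finished, successfully or not.
        return self._done.wait(timeout)


if __name__ == "__main__":
    warmup = Warmup(lambda: time.sleep(0.3))  # stand-in for the slow model load
    print(warmup.status())  # idle
    print(warmup.start())   # warming: returns without waiting for the load
    warmup.wait()
    print(warmup.status())  # ready
```

One design caveat: functools.lru_cache does not stop two threads from running the wrapped function at the same time, so a convert call that arrives mid-warmup could trigger a second model load. A real version would need the conversion path to wait on the same event or lock.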
currently marked draft because it can't deliver a complete end-to-end
experience without this cold-start fix. Would you prefer:
PR shortly after; or
a verified end-to-end demo in its receipts.
Happy to write whichever PR shape the maintainers prefer.
Repro environment
uvx --from git+https://github.com/<fork>/docling-mcp.git@main docling-mcp-server --transport stdio