[RFC] Replace HTTP+SSE with new "Streamable HTTP" transport #206
I am not sure we should prescribe it being an "independent process", for two reasons: (1) I am not fully sure what "independent process" refers to, and (2) if we were to deprecate stdio in favor of HTTP everywhere, it might still be beneficial to launch the server from a parent process in order to control its lifecycle easily, in which case the server process is not independent.
This language was here before, but happy to revise it.
Since some servers will now return "plain" JSON, two questions occur to me:
I think we have a larger TODO here: putting some recommendations about timeouts into the spec, and maybe codifying conventions like "issuing a progress notification should reset default timeouts." This is definitely important, but it should probably happen separately from this PR.
There's a bigger discussion here about how best to support streaming chunks—on the input side too. I'm inclined to leave mentions out of the current spec but tackle it in follow-ups.
My take here is that (a) timeouts depend on the SDKs and can be freely chosen, though I am okay with giving guidance, and (b) clients should only be concerned with starting to parse responses once a newline is detected. Clients that CAN do streamed parsing may do so, but clients that cannot should be fine too. I certainly don't want to require a streaming JSON parser, as it would raise the bar significantly.
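A minimal sketch of that buffer-until-newline approach; the function and variable names are illustrative, not from any SDK:

```typescript
// Accumulate incoming chunks and only call JSON.parse once a complete,
// newline-terminated message has arrived - no streaming JSON parser needed.
let buffer = "";

function onChunk(chunk: string, handle: (msg: unknown) => void): void {
  buffer += chunk;
  let newline: number;
  while ((newline = buffer.indexOf("\n")) !== -1) {
    const line = buffer.slice(0, newline).trim();
    buffer = buffer.slice(newline + 1);
    if (line.length > 0) {
      handle(JSON.parse(line)); // each complete line is one JSON-RPC message
    }
  }
}
```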
Makes sense.
Ideally, timeouts should be exposed as configuration on the client so that they can be tuned in extenuating circumstances; e.g., users in APAC calling a server deployed in us-east regions may need to set higher read timeouts.
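For illustration, the kind of client-side configuration surface this suggests might look like the following; the option names are hypothetical, not taken from any MCP SDK:

```typescript
// Hypothetical client transport options; names and defaults are assumptions.
interface TransportTimeouts {
  connectTimeoutMs?: number; // time allowed to establish the connection
  readTimeoutMs?: number;    // max silence tolerated on an open stream
  requestTimeoutMs?: number; // end-to-end budget for a single request
}

// e.g. a client in APAC talking to a us-east deployment raises its read timeout:
const timeouts: TransportTimeouts = { readTimeoutMs: 120_000 };
```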
Given that EventSource already does not support POST requests, meaning that fetch will have to be used by browser-based clients, why not go all the way and allow more than one JSON-RPC message in the POST request's streaming request body? That's certainly where my mind goes when renaming the transport from "HTTP with SSE" to "Streamable HTTP". While this wouldn't solve the resumability issue by itself, it would vastly simplify the transport. It would be much closer to the stdio transport, and it could potentially support binary data.
And I think it would help with the resumability issue. It greatly simplifies resumability to only have one logical stream per connection, like the stdio transport does. That way, you're not stuck managing multiple last-message-IDs on the server, which seems like a pain.
If a core design principle is that clients should handle complexity where it exists, I'd suggest forcing the client to only have one resumable server-to-client message stream at a time.
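To make the multi-message body idea above concrete, here is a sketch of what such a body could contain; this framing is not part of the PR, and the newline delimiting mirrors the stdio transport:

```typescript
// NOT part of this PR's spec: several JSON-RPC messages in one POST body,
// newline-delimited like the stdio transport. The tool name and arguments
// are made up for illustration.
const messages = [
  { jsonrpc: "2.0", id: 1, method: "tools/call", params: { name: "search", arguments: { q: "mcp" } } },
  { jsonrpc: "2.0", method: "notifications/cancelled", params: { requestId: 1 } },
];
const body = messages.map((m) => JSON.stringify(m)).join("\n");
// A simple client sends this in one shot; a streaming client could instead
// write each line to the request body as it becomes available.
```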
JSON-RPC supports batching. You can pass an array of requests as a JSON-RPC body.
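Concretely, a JSON-RPC 2.0 batch is just an array of request objects in the body; the method names below are MCP's, and the /mcp endpoint path follows the example used later in this thread:

```typescript
// A JSON-RPC batch: one HTTP POST carrying two requests.
const batch = [
  { jsonrpc: "2.0", id: 1, method: "tools/list" },
  { jsonrpc: "2.0", id: 2, method: "resources/list" },
];

await fetch("/mcp", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(batch),
});
```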
FWIW, in OpenID Provider Commands we are proposing a POST that returns either application/json or text/event-stream: https://openid.github.io/openid-provider-commands/main.html
Supporting batched client -> server messages makes sense! I honestly had forgotten that aspect of JSON-RPC, since we ignored it for so long. 😬
I would like to extend support to streaming request bodies, but I think we should kick the can on this a bit, as it will probably involve significant discussion of its own.
I'll punt on batching in this PR as well, as it also affects the stdio transport and has some wider-ranging implications, but basically I agree we should support it.
I think this relationship needs to be clearly explained; for example, what kinds of notifications are related to the request.
Here is my understanding:
Is this understanding correct?
This is purposely left a bit unspecified at the moment. Your interpretation seems reasonable. Another example would be progress or logging notifications related to a tool call request.
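As a concrete instance of the tool-call example, a related progress notification would look something like this; the shape follows MCP's notifications/progress, while the token value is just whatever the client supplied in the original request's _meta.progressToken:

```typescript
// A progress notification tied to an in-flight tool call; the token echoes
// the one the client sent with the original request.
const progressNotification = {
  jsonrpc: "2.0",
  method: "notifications/progress",
  params: {
    progressToken: "call-42", // illustrative token value
    progress: 3,
    total: 10,
  },
};
```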
Possibly dumb question, but: what should the client do if it wants to cancel a request after a disconnection, in the non-resumable case?
It can POST the CancelledNotification to the MCP endpoint. The notification contains a request ID as a parameter to refer to the thing that it's cancelling, and this ID would have been previously assigned when submitting the long-running request.
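A sketch of that cancellation POST; the method and requestId parameter follow MCP's CancelledNotification, while the endpoint path and reason string are illustrative:

```typescript
// Cancel request 1 after the original stream disconnected.
await fetch("/mcp", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    jsonrpc: "2.0",
    method: "notifications/cancelled",
    params: { requestId: 1, reason: "client disconnected" },
  }),
});
```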
Are there any alternatives here, like having support for clients to cancel or discover ongoing requests? I know that's a lot of extra complexity, but uncanceled requests can cause a lot of challenges.
Servers and clients should be responsible for automatically timing out requests from each other. We may add more language around this to the spec in the future.
How should the server know which session this event ID corresponds to? After reading the GitHub discussion, I was expecting the client to remember the sessionId from the first InitializeResult and pass it to this request, but I don't see that mentioned here. As a server author, I could choose to embed the session ID in every event ID, but that seems wasteful.
Further complicated by the IDs needing to uniquely identify the stream as well.
I mentioned this in another thread, but I think it got buried:
@jspahrsummers is there a reason we don't use a single stream (/mcp) as the server channel? Clients send via POST; servers send in the stream.
I may be missing something, but it seems this would simplify things greatly. It would eliminate the complexity and overhead of multiple streams and ensure that replays are consistent up to a point of failure.
Compared to a single stream, multi-stream adds non-trivial complications to the backend for stream resumption/re-delivery.
In the current spec's single SSE/stream world, backend resumption/redelivery would be done for a single stream. The existing backend complexity of routing messages to the correct stream on the correct machine remains, with the added complexity of buffering messages for that single stream.
In a multi-stream world, however, these backend complexities are multiplied by the number of individual streams within a session. The backend would now have to keep track of which streams are active on possibly different machines to determine session liveness, and routing messages to the correct streams for a given session is a much more complicated mapping/routing problem. Then, in addition to this more complex routing, mapping, and liveness tracking, you still have to figure out how to implement per-stream buffering for resumption/re-delivery.
A single stream is 100% the way to go if you plan on keeping the server's complexity minimal.
Allowing POST requests to turn into a stream permits a gradual increase in complexity in server design. For example, a server design might evolve like this (starting from the simplest version to the most complex):
It's pretty important that 2 can be done without any additional server infrastructure. Resumability is a nice feature but far from required, and this permits servers that don't require it to remain quite simple.
If we had only a single stream (like the existing SSE transport), then all streaming would require a message bus.
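A sketch of that gradual path: the same POST endpoint answers simple requests with plain JSON and only opts into an SSE stream when it has intermediate notifications to emit. The helper functions are stand-ins, not anything from the spec or an SDK:

```typescript
import express from "express";

// Stand-in application logic; only the handler's shape matters here.
declare function needsStreaming(req: unknown): boolean;
declare function handleRequest(req: unknown): Promise<unknown>;
declare function notificationsFor(req: unknown): AsyncIterable<unknown>;

const app = express();
app.use(express.json());

app.post("/mcp", async (req, res) => {
  const request = req.body;

  if (!needsStreaming(request)) {
    // Simplest case: a single JSON-RPC response, no stream, no message bus.
    res.json(await handleRequest(request));
    return;
  }

  // Streaming case: related notifications as SSE events, then the final
  // response, all within this one request/response cycle.
  res.setHeader("Content-Type", "text/event-stream");
  for await (const notification of notificationsFor(request)) {
    res.write(`data: ${JSON.stringify(notification)}\n\n`);
  }
  res.write(`data: ${JSON.stringify(await handleRequest(request))}\n\n`);
  res.end();
});
```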
@nagl-temporal As described in the "session management" section, the session ID header should be attached to all subsequent HTTP requests.
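Spelled out, that flow might look like the following; the Mcp-Session-Id header name and the event ID value are assumptions for illustration, and the initialize params are elided:

```typescript
// Initialization: the server assigns a session ID via a response header.
const initResponse = await fetch("/mcp", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ jsonrpc: "2.0", id: 0, method: "initialize", params: { /* ... */ } }),
});
const sessionId = initResponse.headers.get("Mcp-Session-Id")!;

// Later, after a disconnection: resume the stream, attaching the session ID
// (as on every request) plus the SSE Last-Event-ID to pick up where we left off.
await fetch("/mcp", {
  method: "GET",
  headers: {
    Accept: "text/event-stream",
    "Mcp-Session-Id": sessionId,
    "Last-Event-ID": "evt-1234", // hypothetical last-received event ID
  },
});
```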
I agree that a gradual increase in complexity for server design is a goal the protocol should optimize for. However, per-request streams are orthogonal to letting a JSON-RPC POST request return the corresponding JSON-RPC response. You can still support that without adopting multi-stream.
This isn't true in any multi-server production system. The instant any other request/event can impact the results streamed from another production machine, a message bus will be required.
Alternative approaches:
Single stream, multi-subscription. Keep a streaming endpoint, make it available only if the client/server agree on it as a supported capability, and expose cancellable subscriptions through that single streaming endpoint.
Not sure that's needed. Is there a world where we can just always let the server start a session at any point by sending a session ID?
I don't want clients to have to figure out what it means for servers to return a session ID at any time. It introduces synchronization challenges for both sides, since (e.g.) it could happen on 1 out of N concurrent streams, or even multiple times on different streams.
Here we only limit the valid format of the session ID; can we also limit its maximum length?
Most backend services and API gateways (Kong, ingress-nginx) limit the maximum data they will accept in order to protect the backend application from attacks or abuse.
If we can clarify this limitation at the specification level, we will have a reference for subsequent deployment in production environments.
I considered this, but I found it hard to pick a good value—something arbitrary like 100 or 255 could easily be too short, while specifying a very large number feels equally arbitrary and constraining. HTTP also doesn't define a maximum length for header values AFAIK, and implementations treat it differently.
I think it might be best to allow implementations to define this themselves, and if we see incompatibilities emerge, we can revisit it in the spec.
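If implementations do pick their own limits, enforcing one is straightforward; a sketch with express middleware, where the 512-character cap and the Mcp-Session-Id header name are assumptions, not spec values:

```typescript
import type { Request, Response, NextFunction } from "express";

const MAX_SESSION_ID_LENGTH = 512; // arbitrary illustrative cap

function validateSessionId(req: Request, res: Response, next: NextFunction): void {
  const sessionId = req.header("Mcp-Session-Id");
  if (sessionId && sessionId.length > MAX_SESSION_ID_LENGTH) {
    res.status(400).json({ error: "session ID exceeds maximum length" });
    return;
  }
  next();
}
```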
As another point of reference, it seems OAuth also does not specify a maximum length for tokens.
OAuth does not. While that might have been an oversight (I'm the editor), I do not know of any incidents where it has been an issue.
There have been some security issues in the Kubernetes community caused by sending large payloads.
Or could we add a suggestion? For example, recommending that it usually not exceed 1 MB.
I care about this mainly because I believe that many managed MCP servers will emerge in the future, and this point is very important for deployment in a production environment.