When serving agents, allow the user to specify how thinking blocks are handled: disabled, truncated to a maximum length, function calls only (no tool outputs), full, etc. Some tools produce very long outputs (for example, big tables), and such output should not be streamed back to the client as part of the thinking block; only the outcome of the LLM's reasoning over it should be.
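To make the request concrete, here is a minimal sketch of what such a setting could look like. Everything in it is hypothetical and not an existing API: the names `ThinkingMode`, `ThinkingPolicy`, `ThinkingEvent`, and `apply_policy`, the three event kinds, and the choice to measure the maximum length in characters. It also assumes that "function calls only" means tool-call events are streamed while reasoning text and tool outputs are dropped.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ThinkingMode(Enum):
    """Hypothetical handling modes for thinking blocks."""
    DISABLED = "disabled"      # stream nothing from the thinking block
    TRUNCATED = "truncated"    # stream every event, cut to a maximum length
    CALLS_ONLY = "calls_only"  # stream function calls only, no outputs
    FULL = "full"              # stream everything unchanged


@dataclass(frozen=True)
class ThinkingEvent:
    """One piece of a thinking block (hypothetical shape)."""
    kind: str  # assumed to be "reasoning", "tool_call", or "tool_output"
    text: str


@dataclass(frozen=True)
class ThinkingPolicy:
    """User-facing setting chosen when serving the agent (hypothetical)."""
    mode: ThinkingMode = ThinkingMode.FULL
    max_chars: int = 200  # only used when mode is TRUNCATED


def apply_policy(event: ThinkingEvent, policy: ThinkingPolicy) -> Optional[ThinkingEvent]:
    """Return the event to stream to the client, or None to suppress it."""
    if policy.mode is ThinkingMode.DISABLED:
        return None
    if policy.mode is ThinkingMode.CALLS_ONLY:
        # Assumption: keep tool calls, drop reasoning text and tool outputs.
        return event if event.kind == "tool_call" else None
    if policy.mode is ThinkingMode.TRUNCATED and len(event.text) > policy.max_chars:
        # Cut long text (such as a big table) and mark the cut with "...".
        return ThinkingEvent(event.kind, event.text[: policy.max_chars] + "...")
    return event  # FULL, or TRUNCATED with text already short enough
```

With a policy such as `ThinkingPolicy(ThinkingMode.TRUNCATED, max_chars=500)`, a tool output containing a large table would reach the client only as its first 500 characters, while the model itself would still see the full output when reasoning.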