Fix Structured Output for GPT-OSS Models#4386
windreamer wants to merge 1 commit into InternLM:main
Conversation
Pull request overview
This PR fixes structured output for GPT-OSS models by avoiding Guided Decoding (which conflicts with Harmony response parsing) and instead injecting the requested response schema into the prompt using Harmony’s native # Response Formats section.
Changes:
- Detect GPT-OSS (`arch == 'GptOssForCausalLM'`) requests with a non-text `response_format`.
- Inject the serialized `response_format` schema into the system message under `# Response Formats` (creating a system message if missing).
- Disable guided decoding for GPT-OSS by clearing the local `response_format` passed into `GenerationConfig`.
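The three steps above can be sketched in Python. This is an illustrative outline of the PR's approach, not lmdeploy's actual code; the function name and message shape are assumptions.

```python
import json

# Architecture name for GPT-OSS models, as named in this PR.
HARMONY_ARCH = 'GptOssForCausalLM'

def inject_response_format(arch, messages, response_format):
    """Sketch: move a JSON-schema response_format into the Harmony
    '# Response Formats' section of the system message, instead of
    relying on token-level guided decoding."""
    if arch != HARMONY_ARCH or not response_format:
        return messages, response_format
    if response_format.get('type') == 'text':
        # Plain-text responses need no constraint.
        return messages, response_format
    section = '\n\n# Response Formats\n\n' + json.dumps(response_format)
    if messages and messages[0].get('role') == 'system':
        messages[0]['content'] += section
    else:
        # No system message yet: create one holding only the schema section.
        messages.insert(0, {'role': 'system', 'content': section.strip()})
    # Returning None for response_format disables guided decoding downstream.
    return messages, None
```

For non-GPT-OSS architectures the request passes through unchanged, so guided decoding keeps working as before.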
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
It's been a while since I compiled from source, does
No need to recompile it, you can just patch the python part.
After patching this
Motivation
GPT-OSS models use the Harmony response format, which conflicts with Guided Decoding (a token-level JSON constraint) when `response_format` is specified. This breaks `message.parsed` results for existing OpenAI SDK clients using `client.beta.chat.completions.parse()`.

Modification
Approach: Replace Guided Decoding with Harmony-native structured output.
- Inject the serialized `response_format` schema into the Harmony `# Response Formats` section of the system prompt.
- Clear the local `response_format` so Guided Decoding is not applied.

closes: #4347
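The conflict the Motivation describes can be illustrated with a small parsing sketch. The channel token names below follow the published Harmony format, but this exact raw string is an assumption for illustration only.

```python
import json
import re

# Illustrative Harmony-style raw completion: free-form reasoning in the
# 'analysis' channel, the structured answer in the 'final' channel.
raw = ('<|channel|>analysis<|message|>reasoning about the weather...<|end|>'
       '<|start|>assistant<|channel|>final<|message|>{"temp_c": 21.5}<|return|>')

# Token-level guided decoding would force the *entire* stream to be valid
# JSON, clobbering the channel markers the Harmony parser needs. Prompt-level
# schema injection leaves the markers intact, so the final channel can be
# extracted first and only its payload json-loaded.
match = re.search(r'<\|channel\|>final<\|message\|>(.*?)<\|return\|>', raw)
parsed = json.loads(match.group(1))
```

This is why the PR injects the schema into the prompt rather than constraining decoding: the model is still free to emit Harmony channel tokens, while the final-channel payload follows the requested schema.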