Fix Structured Output for GPT-OSS Models#4386

Open
windreamer wants to merge 1 commit into InternLM:main from windreamer:fix_gpt_oss_guided_decoding
Conversation

@windreamer
Collaborator

Motivation

GPT-OSS models use the Harmony response format, which conflicts with Guided Decoding (a token-level JSON constraint) when response_format is specified. This causes:

  • Harmony parse errors
  • Request hangs
  • Empty message.parsed results

This breaks existing OpenAI SDK clients that use client.beta.chat.completions.parse().
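For reference, the failing client pattern looks roughly like this (the model name and schema are illustrative, and the actual call needs a running server, so it is shown commented out; only the schema the SDK derives is executed):

```python
from pydantic import BaseModel


# Illustrative schema; any Pydantic model passed as response_format
# is converted to a JSON schema by the OpenAI SDK.
class Weather(BaseModel):
    city: str
    temperature_c: float


# With a server running, the previously-breaking call looks like:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")
#   resp = client.beta.chat.completions.parse(
#       model="gpt-oss-120b",                       # illustrative model name
#       messages=[{"role": "user", "content": "Weather in Paris?"}],
#       response_format=Weather,
#   )
#   resp.choices[0].message.parsed  # was empty before this fix

# The JSON schema the SDK sends, which previously drove guided decoding:
schema = Weather.model_json_schema()
print(sorted(schema["properties"]))
```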

Modification

Approach: Replace Guided Decoding with Harmony-native structured output.

  1. Detect GPT-OSS architecture with active response_format
  2. Inject JSON schema into system message under # Response Formats section
  3. Disable Guided Decoding by clearing response_format
  4. Create system message automatically if none exists

closes: #4347

Copilot AI review requested due to automatic review settings March 2, 2026 06:23
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes structured output for GPT-OSS models by avoiding Guided Decoding (which conflicts with Harmony response parsing) and instead injecting the requested response schema into the prompt using Harmony’s native # Response Formats section.

Changes:

  • Detect GPT-OSS (arch == 'GptOssForCausalLM') requests with non-text response_format.
  • Inject the serialized response_format schema into the system message under # Response Formats (creating a system message if missing).
  • Disable guided decoding for GPT-OSS by clearing the local response_format passed into GenerationConfig.


… Harmony/JSON mode conflict for GPT-OSS

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



@jingyibo123
Contributor

It's been a while since I compiled from source; does CUDA 12.1 + GCC 9.4 work?

@windreamer
Collaborator Author

It's been a while since I compiled from source; does CUDA 12.1 + GCC 9.4 work?

No need to recompile; you can just patch the Python part.

@jingyibo123
Contributor

jingyibo123 commented Mar 3, 2026

After patching this


Development

Successfully merging this pull request may close these issues.

[Bug] GPT-OSS-120B + openai-python empty result from client.beta.chat.completions.parse with response_format

3 participants