Fix Structured Output for GPT-OSS Models#4386

Open
windreamer wants to merge 1 commit into InternLM:main from windreamer:fix_gpt_oss_guided_decoding
Conversation

@windreamer
Collaborator

Motivation

GPT-OSS models use the Harmony response format, which conflicts with Guided Decoding (a token-level JSON constraint) when response_format is specified. This causes:

  • Harmony parse errors
  • Request hangs
  • Empty message.parsed results

This breaks existing OpenAI SDK clients that use client.beta.chat.completions.parse().
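For reference, the failing client pattern looks roughly like this (the model name and schema are illustrative, and the actual call needs a running server, so it is shown commented out; only the schema the SDK derives is executed):

```python
from pydantic import BaseModel


# Illustrative schema; any Pydantic model passed as response_format
# is converted to a JSON schema by the OpenAI SDK.
class Weather(BaseModel):
    city: str
    temperature_c: float


# With a server running, the previously-breaking call looks like:
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:23333/v1", api_key="none")
#   resp = client.beta.chat.completions.parse(
#       model="gpt-oss-120b",                       # illustrative model name
#       messages=[{"role": "user", "content": "Weather in Paris?"}],
#       response_format=Weather,
#   )
#   resp.choices[0].message.parsed  # was empty before this fix

# The JSON schema the SDK sends, which previously drove guided decoding:
schema = Weather.model_json_schema()
print(sorted(schema["properties"]))
```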

Modification

Approach: Replace Guided Decoding with Harmony-native structured output.

  1. Detect GPT-OSS architecture with active response_format
  2. Inject JSON schema into system message under # Response Formats section
  3. Disable Guided Decoding by clearing response_format
  4. Create system message automatically if none exists

closes: #4347

Copilot AI review requested due to automatic review settings March 2, 2026 06:23
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes structured output for GPT-OSS models by avoiding Guided Decoding (which conflicts with Harmony response parsing) and instead injecting the requested response schema into the prompt using Harmony’s native # Response Formats section.

Changes:

  • Detect GPT-OSS (arch == 'GptOssForCausalLM') requests with non-text response_format.
  • Inject the serialized response_format schema into the system message under # Response Formats (creating a system message if missing).
  • Disable guided decoding for GPT-OSS by clearing the local response_format passed into GenerationConfig.


… Harmony/JSON mode conflict for GPT-OSS

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.



@jingyibo123
Contributor

It's been a while since I compiled from source; does CUDA 12.1 + GCC 9.4 work?

@windreamer
Collaborator Author

It's been a while since I compiled from source; does CUDA 12.1 + GCC 9.4 work?

No need to recompile; you can just patch the Python part.

@jingyibo123
Contributor

jingyibo123 commented Mar 3, 2026

After patching this


Development

Successfully merging this pull request may close these issues.

[Bug] GPT-OSS-120B + openai-python empty result from client.beta.chat.completions.parse with response_format

3 participants