sampling : different parameters during thinking/reasoning #13225

ggerganov · 2025-05-01T06:29:41Z

ggerganov
May 1, 2025
Maintainer

In the context of the new Qwen 3 models, I'm wondering whether there is an elegant way to support different sampling parameters while the model is thinking. The intent is to use non-greedy sampling (for example, the recommended high-temperature, top-k, etc. parameters) while thinking and, once the thinking is over, to switch to greedy sampling (or, more generally, to some other set of sampling parameters). I think this should produce the best quality: we want the model to explore more random ideas while it is thinking, but to be precise once it starts generating the final answer.
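To make the idea concrete, here is a minimal, self-contained Python sketch. It is not llama.cpp code and does not use any llama.cpp API: the names `SamplingParams`, `sample`, and `PhasedSampler` are hypothetical. It also assumes that generation starts in the thinking phase and that the end of thinking is marked by a single token (`end_think_id`, standing in for something like `</think>`).

```python
import math
import random
from dataclasses import dataclass


@dataclass
class SamplingParams:
    temperature: float  # <= 0 means greedy (argmax)
    top_k: int          # 0 means no top-k truncation


def sample(logits, params, rng):
    """Pick a token index from raw logits under the given parameters."""
    if params.temperature <= 0.0:
        # Greedy: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Order candidates by logit and keep only the top_k best (all if top_k == 0).
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if params.top_k > 0:
        order = order[:params.top_k]
    # Temperature-scaled softmax weights over the surviving candidates.
    top = logits[order[0]]
    weights = [math.exp((logits[i] - top) / params.temperature) for i in order]
    return rng.choices(order, weights=weights, k=1)[0]


class PhasedSampler:
    """Uses `thinking` params until `end_think_id` is sampled, then `answer` params."""

    def __init__(self, thinking, answer, end_think_id):
        self.thinking = thinking
        self.answer = answer
        self.end_think_id = end_think_id
        self.in_thinking = True  # assumption: the model starts out thinking

    def next_token(self, logits, rng):
        params = self.thinking if self.in_thinking else self.answer
        token = sample(logits, params, rng)
        # The end-of-thinking token itself is still sampled with thinking params;
        # every token after it uses the answer params.
        if self.in_thinking and token == self.end_think_id:
            self.in_thinking = False
        return token
```

In this sketch the only extra state is a single boolean, so one could create `PhasedSampler(SamplingParams(0.6, 20), SamplingParams(0.0, 0), end_think_id=...)` and call `next_token` once per decoding step. A real implementation would also have to handle end markers that span several tokens and models that skip thinking entirely, which this sketch ignores.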

Seems like supporting this would be too much extra logic, both in the UI and the server implementation, to be worth it. But in case you have some thoughts, please share.

Replies: 0 comments
