Add support for reasoning in the UI #4559

Open · wants to merge 2 commits into base: main
Conversation

@FallDownTheSystem (Contributor) commented Mar 9, 2025

Description

This PR adds support for both showing thinking tokens in the chat as well as controlling the reasoning effort for supported models. I'm opening this PR to get feedback from the Continue team, I'm sure there are some design decisions here that you may disagree with and want to change. I'm also okay with the Continue team taking over this branch and developing on top of it.

I'm adding comments in the PR to explain some of the changes.

  • Added UI settings to set reasoning effort / token budget
  • Added UI to show reasoning tokens from Anthropic Claude 3.7 Sonnet and DeepSeek R1
    • Supports tool use with thinking and redacted_thinking message types
  • Improved UI scaling on smaller sizes to better fit the new Thinking button
  • Added support for requestOptions headers to be passed to the Anthropic provider, so that the "anthropic-beta": "output-128k-2025-02-19" header can be set and the 128k maxOutput can be enabled.

Checklist

  • The relevant docs, if any, have been updated or created
    • Updated yaml/json references for new config options.
  • The relevant tests, if any, have been updated or created
    • Couldn't get tests to run on a Windows 11 machine at all, regardless of whether it was this branch or main. But the tests on the PR passed.
    • If there are new tests that should be created, let me know.

Screenshots

This shows most of the changes.

Recording.2025-03-09.135230.mp4

Testing instructions

Test the following models:

  • "provider": "deepseek", "model": "deepseek-reasoner"
  • "provider": "openai", "model": "o3-mini"
  • "provider": "openai", "model": "o1"
  • "provider": "anthropic", "model": "claude-3-7-sonnet-20250219"
  • A non-thinking model like "provider": "openai", "model": "gpt-4o"

completionOptions should have schema definitions for reasoning_effort for the OpenAI models and thinking for Anthropic:

"thinking": {
    "type": "enabled",
    "budget_tokens": 4096
}
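For reference, a hedged sketch of how these options might look together in config.json (apiKey fields omitted; field names follow the schema definitions this PR adds, but the exact entries are illustrative):

```json
{
  "models": [
    {
      "title": "Claude 3.7 Sonnet",
      "provider": "anthropic",
      "model": "claude-3-7-sonnet-20250219",
      "completionOptions": {
        "thinking": {
          "type": "enabled",
          "budget_tokens": 4096
        }
      }
    },
    {
      "title": "o3-mini",
      "provider": "openai",
      "model": "o3-mini",
      "completionOptions": {
        "reasoning_effort": "high"
      }
    }
  ]
}
```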

Tests:

  • The Thinking button is not disabled for these models.
  • The thinking output is shown for Anthropic Claude 3.7 Sonnet and DeepSeek Reasoner
  • Thinking can only be toggled off for Sonnet 3.7
  • Sonnet 3.7 works along with tool use for both thinking and redacted thinking, in multi-turn conversations.
    • For redacted thinking, prompt the API with this magic string ANTHROPIC_MAGIC_STRING_TRIGGER_REDACTED_THINKING_46C9A13E193C177646C7398A98432ECCCE4C1253D5E2D82641AC0E52CC2876CB
  • Test that the thinking and redacted_thinking works even if stream is set to false for Sonnet 3.7. (Tool use is not supported in Continue when not streaming afaik)
  • Thinking options popover shows up and shows the correct settings for Anthropic and OpenAI
  • The UI scales nicely even at the smallest view sizes
    • This includes the Tool use popover, which now stacks the options when below the xs breakpoint
  • Test that non-thinking models still work as expected, specifically in the UI considering the numerous changes made to sessionSlice
    • Because of the new Message API types, some type annotations were added to the FreeTrial, Gemini, and WatsonX core/llm implementations, so those should be tested
  • DeepSeek Reasoner no longer uses promptTemplates, because that limits completions to strings only, meaning both content and reasoning_content couldn't be passed down to the UI.
  • Test that the Reasoning tokens are still shown even for non-thinking models if you ask the AI to put some text inside think tags like:
<think>Put something here</think>
Rest of the message here
  • Test that config.json autocompletes the completion options correctly based on the schema definitions.

netlify bot commented Mar 9, 2025

Deploy Preview for continuedev ready!

🔨 Latest commit: 09c32b7
🔍 Latest deploy log: https://app.netlify.com/sites/continuedev/deploys/67ce97aef1cbd600086c1660
😎 Deploy Preview: https://deploy-preview-4559--continuedev.netlify.app

@@ -174,7 +174,8 @@ function autodetectTemplateType(model: string): TemplateType | undefined {
lower.includes("pplx") ||
lower.includes("gemini") ||
lower.includes("grok") ||
lower.includes("moonshot")
lower.includes("moonshot") ||
lower.includes("deepseek-reasoner")

This is to avoid deepseek-reasoner using _streamComplete in core/llm/index.ts so that the ChatMessage with content and reasoning_content are both preserved.

@@ -373,11 +374,45 @@ function autodetectPromptTemplates(
return templates;
}

const PROVIDER_SUPPORTS_THINKING: string[] = ["anthropic", "openai", "deepseek"];

const MODEL_SUPPORTS_THINKING: string[] = [

I think support for other proxy providers like OpenRouter could be added as well. I haven't looked into it.

title: string | undefined,
capabilities: ModelCapability | undefined,
): boolean {
if (capabilities?.thinking !== undefined) {

I'm not sure if the capabilities.thinking override is necessary. Thinking support needs to be hardcoded in some places anyway, so I don't know if there's a reasonable way to force it to be enabled.
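For illustration, the override-then-fallback logic described above could be sketched like this (the function name and list contents are assumptions, not the actual implementation):

```typescript
// Hardcoded support lists, mirroring PROVIDER_SUPPORTS_THINKING /
// MODEL_SUPPORTS_THINKING from the diff (entries here are illustrative).
const PROVIDER_SUPPORTS_THINKING: string[] = ["anthropic", "openai", "deepseek"];
const MODEL_SUPPORTS_THINKING: string[] = [
  "claude-3-7-sonnet",
  "o1",
  "o3-mini",
  "deepseek-reasoner",
];

// An explicit capabilities.thinking override wins; otherwise fall back
// to the hardcoded provider/model lists.
function modelSupportsThinking(
  provider: string,
  model: string,
  thinkingCapability?: boolean,
): boolean {
  if (thinkingCapability !== undefined) {
    return thinkingCapability;
  }
  return (
    PROVIDER_SUPPORTS_THINKING.includes(provider) &&
    MODEL_SUPPORTS_THINKING.some((m) => model.includes(m))
  );
}
```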

return (await encoding.encode(part.thinking ?? "")).length;
} else if (part.type === "redacted_thinking") {
// For redacted thinking, don't count any tokens
return 0;

"All extended thinking tokens (including redacted thinking tokens) are billed as output tokens and count toward your rate limits."

But they would have to be counted from the API's response:

"usage": {
    "input_tokens": 2095,
    "output_tokens": 503
}
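A rough sketch of that counting rule follows; whitespace splitting stands in for the real tokenizer (`encoding.encode` in the diff), and the part types are trimmed-down stand-ins for the actual Message API types:

```typescript
type MessagePart =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string }
  | { type: "redacted_thinking"; data: string };

// Approximate per-part token count. Whitespace tokenization is a
// placeholder for the model's real encoder.
function approxPartTokens(part: MessagePart): number {
  if (part.type === "thinking") {
    // Thinking text is billed as output tokens, so count it.
    return (part.thinking ?? "").split(/\s+/).filter(Boolean).length;
  }
  if (part.type === "redacted_thinking") {
    // Redacted thinking is encrypted and can't be tokenized locally;
    // count zero here and rely on the API's usage.output_tokens.
    return 0;
  }
  return part.text.split(/\s+/).filter(Boolean).length;
}
```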

@@ -124,7 +125,7 @@ class FreeTrial extends BaseLLM {
}
return {
type: "text",
text: part.text,
text: (part as TextMessagePart).text,

The new Message API has thinking and redacted_thinking types now, so wherever the types were causing errors, I just assumed they'd be TextMessageParts as they've previously been.

import { ChatMessage, CompletionOptions, TextMessagePart } from "..";

// Extend OpenAI API types to support DeepSeek reasoning_content field
interface DeepSeekDelta {

The types for OpenAI's messages are imported from an external library, so to support DeepSeek's reasoning_content I needed to create those interfaces elsewhere. I'm not sure if this is the best place for them, but it works.
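A sketch of what those local interfaces might look like, plus how the extra field gets accumulated from a stream (field names match DeepSeek's API; everything else is illustrative):

```typescript
// Hypothetical shape of a streamed DeepSeek chat delta. The official
// OpenAI types don't include reasoning_content, so it's declared here.
interface DeepSeekDelta {
  role?: "assistant";
  content?: string | null;
  reasoning_content?: string | null;
}

// Merge a stream of deltas into separate content and reasoning strings,
// so both can be passed down to the UI.
function accumulate(deltas: DeepSeekDelta[]): {
  content: string;
  reasoning: string;
} {
  let content = "";
  let reasoning = "";
  for (const d of deltas) {
    content += d.content ?? "";
    reasoning += d.reasoning_content ?? "";
  }
  return { content, reasoning };
}
```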

@@ -17,12 +17,19 @@ export function stripImages(messageContent: MessageContent): string {
.join("\n");
}

export function stripThinking(content: string): string {

Think tags are handled differently now: they're always included in the message, but stripped from the UI.
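The body of stripThinking isn't shown in the hunk; one plausible implementation, assuming the tags arrive as literal text in the message content:

```typescript
// Strip <think>…</think> blocks before rendering. The second replace
// handles an unclosed trailing tag, which can appear mid-stream.
export function stripThinking(content: string): string {
  return content
    .replace(/<think>[\s\S]*?<\/think>/g, "")
    .replace(/<think>[\s\S]*$/g, "")
    .trimStart();
}
```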

@@ -36,6 +36,8 @@ Each model has specific configuration options tailored to its provider and funct
- `engine`: Engine for Azure OpenAI requests.
- `capabilities`: Override auto-detected capabilities:
- `uploadImage`: Boolean indicating if the model supports image uploads.
- `tools`: Boolean indicating if the model supports tool use.

tools was missing, so I added it along with the new thinking capability.

@@ -59,6 +61,19 @@ Example:
"title": "GPT-4o",
"provider": "openai",
"apiKey": "<YOUR_OPENAI_API_KEY>"
},
{

A new example showcasing a model with thinking capabilities

@@ -284,6 +285,8 @@ const Layout = () => {
/>

<GridDiv className="">
{/* Initialize model-specific settings when model changes */}
<ModelSettingsInitializer />

Kind of a hack. I needed the UI to fetch the reasoning_effort and thinking options on the initial load, so that they get set in the UI based on the user's config, but so that the user could still change them without changing the config. AI generated this and put them here. It worked but it might be a silly place to do something like this.


<StyledMarkdownPreview
isRenderingInStepContainer
source={stripImages(props.item.message.content)}
source={renderChatMessage(props.item.message)}

renderChatMessage calls stripImages but now also strips think tags.

@@ -106,123 +119,123 @@ function InputToolbar(props: InputToolbarProps) {
<StyledDiv
isHidden={props.hidden}
onClick={props.onClick}
className="find-widget-skip flex"
className="find-widget-skip flex flex-col"

Model selection / enter button are now on their own row, to make room for the rest of the buttons.

<ToggleThinkingButton disabled={!thinkingSupported} />
</div>
<div className="-mb-1 flex w-full items-center gap-2 whitespace-nowrap">
<ModelSelect />

The model select now takes the full width of the remaining space in the container, meaning that there's no need to try to set a reasonable max width.

return (
<Transition
show={show}
enter="transition duration-100 ease-out"

This is a copy of the PopoverTransition component. I made the popovers on the thinking and tool use toggle buttons relative to the Chat box instead of the buttons, so that they can be positioned easily and fit better at small view sizes. However, the scale transform can't calculate the position while scaling, causing them to jump, so I removed the scaling animation from those buttons.


Basically a copy of ToggleToolsButton


// Get provider from default model
const provider = defaultModel?.provider || "";
const hasThinkingOptions = provider !== "deepseek";

There is some provider/model-specific logic in here that could probably live somewhere else. Basically, some models can't be toggled off, while others don't have configuration options, so certain elements/interactions are disabled based on the provider/model.

@@ -83,93 +83,93 @@ export default function ToolDropdown(props: ToolDropdownProps) {
{useTools && !isDisabled && (
<>
<span className="hidden align-top sm:flex">Tools</span>

Mostly formatting changes, seemingly from Prettier. There were a few real changes: the hover events are now bound on the parent container rather than the icon/content; previously you could hover the edges and not see the background change.

<StyledListboxButton
data-testid="model-select-button"
ref={buttonRef}
className="h-[18px] overflow-hidden"
style={{ padding: 0 }}
onClick={calculatePosition}
>
<div className="flex max-w-[33vw] items-center gap-0.5 text-gray-400 transition-colors duration-200">
<span className="truncate">
<div className="flex w-fit min-w-0 items-center gap-0.5 text-gray-400 transition-colors duration-200">

These changes allow the model select to scale to the parent container

@@ -433,54 +433,60 @@ export function Chat() {
contextItems={item.contextItems}
toolCallId={item.message.toolCallId}
/>

This change makes sure that if the API returns a message along with tool use, that both are shown.

for (const message of action.payload) {
const lastItem = state.history[state.history.length - 1];
const lastMessage = lastItem.message;
// Simplified condition to keep thinking blocks and tool calls together in the same message

This is the major change. The session slice should now handle all Message API types and keep collecting different parts into the same assistant message, so that thinking and tool use work together properly, since Anthropic requires that you send the thinking message back along with tool use.
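The accumulation idea can be sketched roughly as follows; the types and the helper name are illustrative stand-ins, not the actual sessionSlice code:

```typescript
type Part =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };

interface AssistantMessage {
  role: "assistant";
  content: Part[];
}

// Append a streamed part onto the last assistant message. Consecutive
// chunks of the same type are concatenated; a new type starts a new
// part, so thinking and text stay together in one message.
function appendPart(history: AssistantMessage[], part: Part): void {
  const last = history[history.length - 1];
  if (!last) {
    history.push({ role: "assistant", content: [part] });
    return;
  }
  const lastPart = last.content[last.content.length - 1];
  if (lastPart && lastPart.type === part.type) {
    if (lastPart.type === "text" && part.type === "text") {
      lastPart.text += part.text;
    } else if (lastPart.type === "thinking" && part.type === "thinking") {
      lastPart.thinking += part.thinking;
    }
  } else {
    last.content.push(part);
  }
}
```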

if (messageContent.includes("<think>")) {
// Check if the message content is an array with parts
if (
Array.isArray(message.content) &&

This part is basically handling the content as parts, aka the Messages API

}

// For other content types, use renderChatMessage
const messageContent = renderChatMessage(message);

This part handles what's more typical of the OpenAI / DeepSeek APIs.

const fullContent = lastMessage.content as string;

// If we find <think> tags, extract the content for the reasoning field
if (

Lastly we handle think tags
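The extraction step could look something like this (the helper name is hypothetical; only the <think> tag convention comes from the PR):

```typescript
// Split content on <think> tags: the tag body becomes the reasoning
// field, the remainder stays as visible content.
function splitThinking(full: string): { reasoning: string; content: string } {
  const match = full.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) {
    return { reasoning: "", content: full };
  }
  return {
    reasoning: match[1],
    content: full.replace(match[0], "").trimStart(),
  };
}
```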

};

// Handle the special case for anthropic-beta
this.setBetaHeaders(headers, shouldCacheSystemMessage);

This change allows the config to add headers to the request and intelligently merge them with the beta headers that Continue adds for caching.
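The merging idea can be sketched as below; the function name and signature are illustrative, not the actual setBetaHeaders implementation:

```typescript
// Merge user-supplied requestOptions.headers with beta features Continue
// wants to enable. anthropic-beta takes comma-separated values, so user
// betas (e.g. output-128k-2025-02-19) and Continue's caching beta are
// combined and deduplicated into one header.
function mergeAnthropicHeaders(
  userHeaders: Record<string, string>,
  betaFeatures: string[],
): Record<string, string> {
  const headers = { ...userHeaders };
  const userBetas = (headers["anthropic-beta"] ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  const merged = [...new Set([...userBetas, ...betaFeatures])];
  if (merged.length > 0) {
    headers["anthropic-beta"] = merged.join(",");
  }
  return headers;
}
```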

@FallDownTheSystem commented:

Resolves #4339

Add UI settings to set reasoning effort / token budget
Add UI to show reasoning tokens from Anthropic Claude 3.7 Sonnet and DeepSeek R1
Fix thinking icon color not switching back to gray