1 change: 1 addition & 0 deletions NEWS.md
@@ -1,5 +1,6 @@
# ellmer (development version)

* `params()` gains new `reasoning_effort` and `reasoning_tokens` parameters so you can control the amount of effort a model spends on thinking. Initial support is provided for `chat_claude()`, `chat_google_gemini()`, and `chat_openai()` (#720).
* `chat_anthropic()` gains a new `cache` parameter to control caching. By default it is set to "5m", which should (on average) reduce the cost of your chats (#584).
* `chat_openai_responses()` gains a `service_tier` argument (#712).
* `Chat$get_tokens()` now also returns the cost, and returns one row for each assistant turn, better representing the underlying data received from LLM APIs. Similarly, the `print()` method now reports costs on each assistant turn, rather than trying to parse out individual costs.
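For readers of the NEWS entry above, a minimal usage sketch of the new parameters (not part of the diff; the model names are placeholders and assume reasoning-capable models):

library(ellmer)

# Providers that take an effort level (e.g. OpenAI) use `reasoning_effort`
chat <- chat_openai(
  model = "o4-mini",  # placeholder; any reasoning-capable model
  params = params(reasoning_effort = "high")
)

# Providers that take a token budget (e.g. Claude, Gemini) use `reasoning_tokens`
chat <- chat_claude(
  params = params(reasoning_tokens = 2048)
)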
10 changes: 10 additions & 0 deletions R/params.R
@@ -18,6 +18,10 @@
#' @param max_tokens Maximum number of tokens to generate.
#' @param log_probs Include the log probabilities in the output?
#' @param stop_sequences A character vector of tokens to stop generation on.
#' @param reasoning_effort,reasoning_tokens How much effort should the model
#' spend on thinking? `reasoning_effort` is a string, like "low", "medium", or "high".
#' `reasoning_tokens` is an integer, giving a maximum token budget.
#' Each provider accepts only one of these two parameters.
#' @param ... Additional named parameters to send to the provider.
#' @export
params <- function(
@@ -30,6 +34,8 @@ params <- function(
max_tokens = NULL,
log_probs = NULL,
stop_sequences = NULL,
reasoning_effort = NULL,
reasoning_tokens = NULL,
...
) {
check_number_decimal(temperature, allow_null = TRUE, min = 0)
@@ -41,6 +47,8 @@ params <- function(
check_number_whole(max_tokens, allow_null = TRUE, min = 1)
check_bool(log_probs, allow_null = TRUE)
check_character(stop_sequences, allow_null = TRUE)
check_string(reasoning_effort, allow_null = TRUE)
check_number_whole(reasoning_tokens, min = 0, allow_null = TRUE)

compact(list2(
temperature = temperature,
@@ -52,6 +60,8 @@ params <- function(
max_tokens = max_tokens,
log_probs = log_probs,
stop_sequences = stop_sequences,
reasoning_effort = reasoning_effort,
reasoning_tokens = reasoning_tokens,
extra_args = list2(...)
))
}
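A small sketch of what the updated `params()` now returns: `compact()` drops the `NULL` entries, so only the arguments you actually set are forwarded to the provider (output shown is approximate):

p <- params(temperature = 0.7, reasoning_tokens = 1024)
# roughly list(temperature = 0.7, reasoning_tokens = 1024)

# Validation happens up front; for example, a negative budget errors immediately
# because check_number_whole(reasoning_tokens, min = 0, allow_null = TRUE) rejects it:
# params(reasoning_tokens = -5)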
15 changes: 13 additions & 2 deletions R/provider-anthropic.R
@@ -214,6 +214,15 @@ method(chat_body, ProviderAnthropic) <- function(
tools <- as_json(provider, unname(tools))

params <- chat_params(provider, provider@params)
if (has_name(params, "budget_tokens")) {
thinking <- list(
type = "enabled",
budget_tokens = params$budget_tokens
)
params$budget_tokens <- NULL
} else {
thinking <- NULL
}

compact(list2(
model = provider@model,
@@ -222,6 +231,7 @@ method(chat_body, ProviderAnthropic) <- function(
stream = stream,
tools = tools,
tool_choice = tool_choice,
thinking = thinking,
!!!params
))
}
@@ -234,7 +244,8 @@ method(chat_params, ProviderAnthropic) <- function(provider, params) {
top_p = "top_p",
top_k = "top_k",
max_tokens = "max_tokens",
stop_sequences = "stop_sequences"
stop_sequences = "stop_sequences",
budget_tokens = "reasoning_tokens"
)
)

@@ -262,7 +273,7 @@ method(stream_parse, ProviderAnthropic) <- function(provider, event) {
}
method(stream_text, ProviderAnthropic) <- function(provider, event) {
if (event$type == "content_block_delta") {
event$delta$text
event$delta$text %||% event$delta$thinking
}
}
method(stream_merge_chunks, ProviderAnthropic) <- function(
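To make the Anthropic wiring concrete, a sketch of the request body that `chat_body()` now assembles when `reasoning_tokens` is set (values and the model name are made up; unrelated fields are elided):

# chat_params() renames reasoning_tokens -> budget_tokens, and chat_body()
# lifts it into Anthropic's `thinking` block, roughly:
body <- list(
  model = "claude-sonnet-4-20250514",  # placeholder model name
  max_tokens = 8192,
  thinking = list(type = "enabled", budget_tokens = 2048),
  messages = list()  # turns elided
)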
15 changes: 12 additions & 3 deletions R/provider-google.R
@@ -210,6 +210,14 @@ method(chat_body, ProviderGoogleGemini) <- function(
generation_config$response_schema <- as_json(provider, type)
}

if (has_name(generation_config, "thinkingBudget")) {
generation_config$thinkingConfig <- list(
thinkingBudget = generation_config$thinkingBudget,
includeThoughts = TRUE
)
generation_config$thinkingBudget <- NULL
}

contents <- as_json(provider, turns)

# https://ai.google.dev/api/caching#Tool
@@ -240,7 +248,8 @@ method(chat_params, ProviderGoogleGemini) <- function(provider, params) {
seed = "seed",
maxOutputTokens = "max_tokens",
responseLogprobs = "log_probs",
stopSequences = "stop_sequences"
stopSequences = "stop_sequences",
thinkingBudget = "reasoning_tokens"
)
)
}
@@ -274,8 +283,8 @@ method(value_tokens, ProviderGoogleGemini) <- function(provider, json) {
cached <- usage$cachedContentTokenCount %||% 0

tokens(
input = (usage$promptTokenCount %||% 0) - cached,
output = usage$candidatesTokenCount,
input = (usage$promptTokenCount %||% 0) - cached,
output = usage$candidatesTokenCount + (usage$thoughtsTokenCount %||% 0),
cached_input = cached
)
}
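The new Gemini token accounting is easiest to see with made-up numbers; a small sketch, assuming rlang's `%||%` (or base R >= 4.4). The `thinkingConfig` block itself is built exactly as in the hunk above, wrapping the budget with `includeThoughts = TRUE`:

`%||%` <- function(x, y) if (is.null(x)) y else x  # or rlang::`%||%`

usage <- list(
  promptTokenCount = 1200,
  cachedContentTokenCount = 1000,
  candidatesTokenCount = 300,
  thoughtsTokenCount = 450
)

cached <- usage$cachedContentTokenCount %||% 0
input <- (usage$promptTokenCount %||% 0) - cached  # 200 uncached input tokens
output <- usage$candidatesTokenCount + (usage$thoughtsTokenCount %||% 0)  # 750: answer + thinking tokens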
34 changes: 21 additions & 13 deletions R/provider-openai-responses.R
@@ -126,6 +126,16 @@ method(chat_body, ProviderOpenAIResponses) <- function(
# https://platform.openai.com/docs/api-reference/responses/create#responses-create-include
params <- chat_params(provider, provider@params)

if (has_name(params, "reasoning_effort")) {
reasoning <- list(
effort = params$reasoning_effort,
summary = "auto"
)
params$reasoning_effort <- NULL
} else {
reasoning <- NULL
}

include <- c(
if (isTRUE(params$log_probs)) "message.output_text.logprobs",
if (is_openai_reasoning(provider@model)) "reasoning.encrypted_content"
@@ -140,6 +150,7 @@ method(chat_body, ProviderOpenAIResponses) <- function(
stream = stream,
tools = tools,
text = text,
reasoning = reasoning,
store = FALSE,
service_tier = provider@service_tier
))
@@ -155,17 +166,24 @@ method(chat_params, ProviderOpenAIResponses) <- function(provider, params) {
frequency_penalty = "frequency_penalty",
max_tokens = "max_output_tokens",
log_probs = "log_probs",
top_logprobs = "top_k"
top_logprobs = "top_k",
reasoning_effort = "reasoning_effort"
)
)
}

# OpenAI -> ellmer --------------------------------------------------------------

method(stream_text, ProviderOpenAIResponses) <- function(provider, event) {
# https://platform.openai.com/docs/api-reference/responses-streaming/response/output_text/delta
if (event$type == "response.output_text.delta") {
# https://platform.openai.com/docs/api-reference/responses-streaming/response/output_text/delta
event$delta
} else if (event$type == "response.reasoning_summary_text.delta") {
# https://platform.openai.com/docs/api-reference/responses-streaming/response/reasoning_summary_text/delta
event$delta
} else if (event$type == "response.reasoning_summary_text.done") {
# https://platform.openai.com/docs/api-reference/responses-streaming/response/reasoning_summary_text/done
"\n\n"
Comment on lines +179 to +186
Collaborator


Looking at this section reminded me that it would be nice if this generic returned Content objects, too. In this case it could return either ContentText() or ContentThinking() objects. This would let us both display streamed thinking tokens differently in the console and handle streaming thinking tokens in shinychat with an entirely different treatment than regular text.

I also recognize that this would probably be best done as a follow up PR rather than here.

(At one point I had thought we'd return ContentDelta* objects here, e.g. ContentDeltaText() or ContentDeltaThinking(), but now that we're here I think it's fine to use the existing types.)

Member Author


Yes, good point. Issue at #828

}
}
method(stream_merge_chunks, ProviderOpenAIResponses) <- function(
@@ -214,17 +232,7 @@ method(value_turn, ProviderOpenAIResponses) <- function(
arguments <- jsonlite::parse_json(output$arguments)
ContentToolRequest(output$id, output$name, arguments)
} else if (output$type == "reasoning") {
# {
# id: str,
# summary: str,
# type: "reasoning",
# content: [
# { text: str, type: "reasoning_text" }
# ],
# encrypted_content: str,
# status: "in_progress" | "completed" | "incomplete"
# }
thinking <- paste0(map_chr(output$content, "[[", "text"), collapse = "")
thinking <- paste0(map_chr(output$summary, "[[", "text"), collapse = "")
ContentThinking(thinking = thinking, extra = output)
} else if (output$type == "image_generation_call") {
mime_type <- switch(
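A sketch of how the reasoning summary is now assembled from `output$summary` rather than `output$content` (the payload is made up; ellmer uses its internal `map_chr()` helper where this sketch uses `vapply()`):

output <- list(
  type = "reasoning",
  summary = list(
    list(type = "summary_text", text = "Compared the two schema options. "),
    list(type = "summary_text", text = "Settled on a single tool call.")
  )
)

thinking <- paste0(
  vapply(output$summary, function(x) x$text, character(1)),
  collapse = ""
)
# "Compared the two schema options. Settled on a single tool call."
# ellmer then wraps this as ContentThinking(thinking = thinking, extra = output)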
7 changes: 7 additions & 0 deletions man/params.Rd

Some generated files are not rendered by default.