feat: record model thoughts #203

Merged
dannykopping merged 14 commits into main from dk/model-thoughts
Mar 16, 2026

@dannykopping
Collaborator

@dannykopping dannykopping commented Mar 5, 2026

Required for coder/coder#22676
Closes #168

This PR adds recording of model "thoughts" (sometimes called "reasoning"). Recording is only available for /v1/messages and /v1/responses.

/v1/messages

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"The"}              }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" user is asking me to get"}       }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" stock prices for Apple and Google and compare"}           }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" their"}               }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" performance over the last week."}        }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"\n\nLet me search"}    }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" for both"}      }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" stock prices in"}    }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":" parallel"} }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"thinking_delta","thinking":"."}       }

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"signature_delta","signature":"..."}      }

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: content_block_start
data: {"type":"content_block_start","index":1,"content_block":{"type":"tool_use","id":"toolu_01KLHTi1EnM4RxxG9VfFBwDt","name":"WebSearch","input":{},"caller":{"type":"direct"}}     }

...

event: content_block_start
data: {"type":"content_block_start","index":2,"content_block":{"type":"tool_use","id":"toolu_01T9we4nEzyv7WVoiazyK8VE","name":"WebSearch","input":{},"caller":{"type":"direct"}}          }

/v1/responses

{
    "type": "response.completed",
    "response": {
        ...
        "output": [
            {
                "id": "rs_0a139fe840c9c4640169b296e3eb088195886c12786026df1a",
                "type": "reasoning",
                "encrypted_content": "...",
                "summary": []
            },
            {
                "id": "msg_0a139fe840c9c4640169b296ebad1081958b3077bf7ae243ee",
                "type": "message",
                "status": "completed",
                "content": [
                    {
                        "type": "output_text",
                        "annotations": [],
                        "logprobs": [],
                        "text": "I’m reading both paths directly so I can return their contents or tell you if either one is missing or not a regular file."
                    }
                ],
                "phase": "commentary",
                "role": "assistant"
            },
            {
                "id": "fc_0a139fe840c9c4640169b296ed6ebc8195b5cc1641dac78918",
                "type": "function_call",
                "status": "completed",
                "arguments": "{\"cmd\":\"sed -n '1,250p' /tmp/foo\",\"yield_time_ms\":1000,\"max_output_tokens\":6000,\"workdir\":\"/home/coder/coder\"}",
                "call_id": "call_JmiKwclDrwqU1w2Ojk0E9wQB",
                "name": "exec_command"
            },
            {
                "id": "fc_0a139fe840c9c4640169b296ed6ecc819586447dbe18e23fff",
                "type": "function_call",
                "status": "completed",
                "arguments": "{\"cmd\":\"sed -n '1,250p' /home/bar\",\"yield_time_ms\":1000,\"max_output_tokens\":6000,\"workdir\":\"/home/coder/coder\"}",
                "call_id": "call_juUonJYUHxtCJb6ckr9tNTv5",
                "name": "exec_command"
            }
        ],
        "parallel_tool_calls": true,
        ...
    }
}
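For the Responses payload above, a comparable sketch would walk `response.output` and keep only `reasoning` items that have readable summary text. The names `completedEvent`, `outputItem`, and `reasoningSummaries` are hypothetical, and the shape of a summary entry (an object with a `text` field) is an assumption, since the sample only shows an empty `summary` array alongside `encrypted_content`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// completedEvent and outputItem model only the fields this sketch reads.
type completedEvent struct {
	Type     string `json:"type"`
	Response struct {
		Output []outputItem `json:"output"`
	} `json:"response"`
}

type outputItem struct {
	ID   string `json:"id"`
	Type string `json:"type"`
	// Assumed entry shape: each summary part exposes its text in "text".
	Summary []struct {
		Text string `json:"text"`
	} `json:"summary"`
}

// reasoningSummaries is a hypothetical helper. It returns one string per
// reasoning item that has at least one non-empty summary part.
func reasoningSummaries(raw []byte) ([]string, error) {
	var ev completedEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return nil, fmt.Errorf("decode response.completed: %w", err)
	}
	var out []string
	for _, item := range ev.Response.Output {
		if item.Type != "reasoning" {
			continue // messages and function calls are not thoughts
		}
		var parts []string
		for _, s := range item.Summary {
			if s.Text != "" {
				parts = append(parts, s.Text)
			}
		}
		if len(parts) == 0 {
			continue // only encrypted content, nothing readable to record
		}
		out = append(out, strings.Join(parts, "\n"))
	}
	return out, nil
}

func main() {
	raw := []byte(`{"type":"response.completed","response":{"output":[
		{"id":"rs_1","type":"reasoning","encrypted_content":"...","summary":[]},
		{"id":"rs_2","type":"reasoning","summary":[{"type":"summary_text","text":"Read both files."}]},
		{"id":"fc_1","type":"function_call","name":"exec_command"}
	]}}`)
	summaries, err := reasoningSummaries(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(summaries), summaries)
}
```

With the sample from this PR description, where `summary` is empty, this sketch would record nothing for that reasoning item.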

Collaborator Author

dannykopping commented Mar 5, 2026

@dannykopping dannykopping changed the base branch from dk/session-id-tracking to graphite-base/203 March 6, 2026 11:52
@graphite-app graphite-app bot changed the base branch from graphite-base/203 to main March 6, 2026 11:53
Comment thread intercept/responses/base.go Outdated
@dannykopping dannykopping marked this pull request as ready for review March 6, 2026 14:30
Comment thread recorder/types.go Outdated
InvocationError error
Metadata Metadata
CreatedAt time.Time
ModelThoughts []*ModelThoughtRecord
Contributor

How much space could ModelThoughts take?
Maybe we should consider making recording configurable, not only for compliance reasons but also to limit space usage.

Collaborator Author

It all depends on how many thinking tokens the model is given, and even then we're only capturing a summary, so I suspect it won't be much. Having this information will help provide insight into why agents take the course of action they do, which may end up being valuable in an audit scenario.

In any case, disk is cheap.

We can later add a flag to conditionally skip storing these.
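Purely as a sketch of what such a follow-up could look like: the `thoughtPolicy` type, its `Enabled` and `MaxBytes` fields, and `truncateUTF8` are all hypothetical and do not exist in this PR. `ModelThoughtRecord` here is a local stand-in for `recorder.ModelThoughtRecord`, reduced to two fields.

```go
package main

import (
	"fmt"
	"time"
	"unicode/utf8"
)

// ModelThoughtRecord is a local stand-in for recorder.ModelThoughtRecord.
type ModelThoughtRecord struct {
	Content   string
	CreatedAt time.Time
}

// thoughtPolicy is hypothetical: no such configuration exists in this PR.
type thoughtPolicy struct {
	Enabled  bool // false drops all thoughts
	MaxBytes int  // 0 means no cap on Content size
}

// truncateUTF8 cuts s to at most max bytes without splitting a rune.
func truncateUTF8(s string, max int) string {
	if max <= 0 || len(s) <= max {
		return s
	}
	for max > 0 && !utf8.RuneStart(s[max]) {
		max--
	}
	return s[:max]
}

// apply returns the records that should be persisted under the policy.
// Input records are copied, never mutated.
func (p thoughtPolicy) apply(in []*ModelThoughtRecord) []*ModelThoughtRecord {
	if !p.Enabled {
		return nil
	}
	out := make([]*ModelThoughtRecord, 0, len(in))
	for _, r := range in {
		if r == nil {
			continue
		}
		cp := *r
		cp.Content = truncateUTF8(cp.Content, p.MaxBytes)
		out = append(out, &cp)
	}
	return out
}

func main() {
	in := []*ModelThoughtRecord{{Content: "Let me search for both stock prices.", CreatedAt: time.Now()}}
	kept := thoughtPolicy{Enabled: true, MaxBytes: 13}.apply(in)
	fmt.Printf("%d record(s), content %q\n", len(kept), kept[0].Content)
}
```

A byte cap addresses the space concern raised above, while the boolean covers deployments that do not want thoughts stored at all.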

Comment thread intercept/messages/blocking.go Outdated
Comment thread intercept/messages/blocking.go Outdated
Comment thread intercept/responses/base.go
Comment thread intercept/messages/blocking.go Outdated
@pawbana
Contributor

pawbana commented Mar 9, 2026

Maybe I'm missing something, but is there a reason thinking blocks are merged into RecordToolUsage and recorded only on a tool call? (inner loop abstraction?)
I understand they are usually connected, but I'd think there could be some reasoning without any tool call, or the provider could call a tool directly.
Maybe thinking blocks should be part of RecordInterceptionEnded?

Comment thread fixtures/anthropic/simple.txtar
Comment thread fixtures/fixtures.go
Comment thread intercept/messages/blocking.go Outdated
Comment thread intercept/messages/streaming.go Outdated
case anthropic.ThinkingBlock:
thoughtRecords = append(thoughtRecords, &recorder.ModelThoughtRecord{
Content: variant.Thinking,
CreatedAt: time.Now(),
Contributor

AFAIK, with streaming, the code waits until we get a stop block and processes all thinking blocks at that point, meaning they'll all have the same CreatedAt. I assume the ordering is still preserved by their position in the slice, so this probably doesn't matter, but worth noting that CreatedAt won't reflect when each block actually arrived. Could this be an issue?

Collaborator Author

This is actually true of tool usages as well, which are also recorded at this stage.

Postgres is microsecond-precise, so some records may be persisted with the same timestamp.

The timestamp for thoughts doesn't really matter, though; as long as a thought is associated with a tool call, it'll be displayed correctly (i.e. before the tool call).
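To illustrate the point about identical timestamps, here is a small sketch showing that a stable sort on `CreatedAt` keeps records in slice order when the timestamps tie. The `thought` type and `orderForDisplay` are hypothetical; how thoughts are actually ordered for display is not shown in this thread.

```go
package main

import (
	"fmt"
	"slices"
	"time"
)

// thought is a stand-in for a recorded model thought.
type thought struct {
	Content   string
	CreatedAt time.Time
}

// orderForDisplay sorts a copy of records by CreatedAt. Because the sort is
// stable, records with equal timestamps keep their original slice order.
func orderForDisplay(records []thought) []thought {
	out := slices.Clone(records)
	slices.SortStableFunc(out, func(a, b thought) int {
		return a.CreatedAt.Compare(b.CreatedAt)
	})
	return out
}

func main() {
	now := time.Now() // all blocks are processed together at the stop event
	records := []thought{
		{Content: "first", CreatedAt: now},
		{Content: "second", CreatedAt: now},
		{Content: "third", CreatedAt: now},
	}
	for _, r := range orderForDisplay(records) {
		fmt.Println(r.Content)
	}
}
```

An unstable sort would give no such guarantee, so any consumer that orders by timestamp alone would need either a stable sort or an explicit sequence number.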

@dannykopping dannykopping changed the base branch from main to graphite-base/203 March 11, 2026 08:27
@dannykopping dannykopping changed the base branch from graphite-base/203 to dk/parallel-tool-calls March 11, 2026 08:27
@dannykopping dannykopping marked this pull request as draft March 11, 2026 08:28
@dannykopping dannykopping force-pushed the dk/parallel-tool-calls branch from a0e150e to b2e4f03 Compare March 11, 2026 12:15
@dannykopping dannykopping force-pushed the dk/model-thoughts branch 2 times, most recently from 8a13f42 to a492e77 Compare March 11, 2026 12:27
@dannykopping dannykopping force-pushed the dk/parallel-tool-calls branch from b2e4f03 to 732f3d2 Compare March 11, 2026 12:27
@dannykopping dannykopping force-pushed the dk/model-thoughts branch 3 times, most recently from 158d8b0 to 25b5478 Compare March 12, 2026 12:18
@dannykopping dannykopping marked this pull request as ready for review March 12, 2026 13:44
@dannykopping dannykopping changed the base branch from dk/parallel-tool-calls to graphite-base/203 March 12, 2026 13:48
@graphite-app graphite-app bot changed the base branch from graphite-base/203 to main March 12, 2026 13:49
Comment thread fixtures/anthropic/simple.txtar
Comment thread intercept/messages/blocking.go Outdated
Comment thread intercept/messages/base.go

// Handle tool calls for non-streaming.
// Capture any thinking blocks that were returned.
for _, t := range i.extractModelThoughts(resp) {
Contributor

nit: in responses there is a separate base.recordModelThoughts method. I like this approach, as it keeps the ProcessRequest function in blocking.go and streaming.go shorter and higher level, making it easier to understand and reason about. It would be nice to be consistent between interceptors (either both have this base method or neither does).

Comment thread internal/integrationtest/bridge_test.go Outdated
// We can't guarantee the order of model thoughts since they're recorded separately, so
// we have to scan all thoughts for a match.

for _, expected := range tc.expectedThoughts {
Contributor

This could be part of mockRecorder, e.g. something like a verifyThoughts method. That would remove the copy-paste from TestResponsesModelThoughts. It could ignore the InterceptionID and CreatedAt fields, and then tests could use []recorder.ModelThoughtRecord as the expectedThoughts field.

I believe mockRecorder stores thoughts in a slice, so they should be stored in the order they were added, which should be deterministic? I'm OK with not checking the exact order, but I think the comment is wrong?

nit: when comparing slices without caring about order, I find that sorting the slices and then comparing for equality helps keep the code more concise.
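A minimal sketch of the sort-then-compare idea, assuming only the thought content is compared (InterceptionID and CreatedAt ignored). `sameContents` is a hypothetical helper name, not something in mockRecorder.

```go
package main

import (
	"fmt"
	"slices"
)

// sameContents reports whether expected and actual hold the same strings,
// ignoring order. Inputs are cloned so callers' slices are left untouched.
func sameContents(expected, actual []string) bool {
	e, a := slices.Clone(expected), slices.Clone(actual)
	slices.Sort(e)
	slices.Sort(a)
	return slices.Equal(e, a)
}

func main() {
	expected := []string{"Let me search", "The user is asking"}
	actual := []string{"The user is asking", "Let me search"}
	fmt.Println(sameContents(expected, actual))
}
```

Unlike scanning for each expected thought, this also catches unexpected extra or duplicated thoughts, since the two sorted slices must match element for element.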

Comment thread recorder/recorder.go
Signed-off-by: Danny Kopping <danny@coder.com>
@dannykopping dannykopping merged commit 5c071a7 into main Mar 16, 2026
5 checks passed
dannykopping added a commit to coder/coder that referenced this pull request Mar 17, 2026
Depends on coder/aibridge#203
Closes coder/internal#1337

---------

Signed-off-by: Danny Kopping <danny@coder.com>

Development

Successfully merging this pull request may close these issues.

Record model thinking/reasoning

3 participants