Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
8682f8d to
56ff5d0
Compare
1d5e6ac to
e559e5e
Compare
56ff5d0 to
11d859c
Compare
11d859c to
855e3a7
Compare
d09d5b0 to
a0ad392
Compare
| InvocationError error | ||
| Metadata Metadata | ||
| CreatedAt time.Time | ||
| ModelThoughts []*ModelThoughtRecord |
There was a problem hiding this comment.
How much space could ModelThoughts take?
Maybe we should consider configuring recording things not only due to compliance reasons but also to limit space usage.
There was a problem hiding this comment.
It all depends on how many thinking tokens are given to the model, and even then we're only capturing a summary, so I suspect it won't be much. In any case, having this information will help provide insight into why agents take the course of action they do, which may end up being valuable for an audit scenario.
In any case, disk is cheap.
We can later add a flag to not store these conditionally.
|
Maybe I'm missing something but is there a reason thinking blocks are merged into |
| case anthropic.ThinkingBlock: | ||
| thoughtRecords = append(thoughtRecords, &recorder.ModelThoughtRecord{ | ||
| Content: variant.Thinking, | ||
| CreatedAt: time.Now(), |
There was a problem hiding this comment.
AFAIK, with streaming, the code waits until we get a stop block and processes all thinking blocks at that point, meaning they'll all have the same CreatedAt. I assume the ordering is still preserved by their position in the slice, so this probably doesn't matter, but worth noting that CreatedAt won't reflect when each block actually arrived. Could this be an issue?
There was a problem hiding this comment.
This is actually true of tool usages as well which are recorded at this stage, as well.
Postgres is microsecond-precise, so some records may be persisted with the same timestamp.
The timestamp for thoughts doesn't really matter, though; as long as it's associated to a tool call it'll be displayed correctly (i.e. before the tool call).
a0ad392 to
c2f9c79
Compare
a0e150e to
b2e4f03
Compare
8a13f42 to
a492e77
Compare
b2e4f03 to
732f3d2
Compare
158d8b0 to
25b5478
Compare
0773ea6 to
8db8a1c
Compare
8db8a1c to
7a0d206
Compare
3466ab6 to
4fd7ded
Compare
7a0d206 to
87044c1
Compare
87044c1 to
0c56c33
Compare
|
|
||
| // Handle tool calls for non-streaming. | ||
| // Capture any thinking blocks that were returned. | ||
| for _, t := range i.extractModelThoughts(resp) { |
There was a problem hiding this comment.
nit: in responses there is separate base.recordModelThoughts method. I like this approach as it helps to keep ProcessRequest function in blocking.go and streaming.go shorter / "higher level" making it easier to understand and reason about. Would be nice to be consistent between interceptors (either both have this base method or not).
| // We can't guarantee the order of model thoughts since they're recorded separately, so | ||
| // we have to scan all thoughts for a match. | ||
|
|
||
| for _, expected := range tc.expectedThoughts { |
There was a problem hiding this comment.
This could be part of mockRecorder, eg. something like verifyThoughts method. Would remove copy paste from TestResponsesModelThoughts. It could ignore InterceptionID and CreatedAt fields then tests could use []recorder.ModelThoughtRecord as expectedThoughts field.
I believe mockRecorder stores thoughts in slice so they should be stored in order they where added which should be deterministic? I'm ok with not checking exact order but I think comment is wrong?
nit: when comparing slices without caring about order I find sorting slices then comparing equality helps to keep code more concise.
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
0c56c33 to
33cec2a
Compare
Signed-off-by: Danny Kopping <danny@coder.com>
Depends on coder/aibridge#203 Closes coder/internal#1337 --------- Signed-off-by: Danny Kopping <danny@coder.com>

Required for coder/coder#22676
Closes #168
This PR adds recording of model "thoughts" (sometimes call "reasoning"). This is only available for
/v1/messagesand/v1/responses./v1/messages/v1/responses