-
Notifications
You must be signed in to change notification settings - Fork 572
feat(instrumentation-aws-sdk): add gen ai conventions for converse stream span #2769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,6 +29,10 @@ import { | |
export interface RequestMetadata { | ||
// isIncoming - if true, then the operation callback / promise should be bind with the operation's span | ||
isIncoming: boolean; | ||
// isStream - if true, then the response is a stream so the span should not be ended by the middleware. | ||
// the ServiceExtension must end the span itself, generally by wrapping the stream and ending after it is | ||
// consumed. | ||
isStream?: boolean; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you know if the GenAI SIG discussed/documented wanting this behaviour of ending the span after the full stream is consumed? I've seen opinions vary when discussing HTTP streaming. See the guidance for HTTP client span duration here: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/http/http-spans.md#http-client-span-duration Is there any equivalent in the Python GenAI-related instrumentations? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the link, that's interesting indeed in terms of response streaming. There isn't any guideline on gen ai spans, however I feel it's sort of implied by the conventions for token usage - there really isn't a way to populate them without keeping the span until the end of the stream. While the duration could keep streaming out of it, that would then need to override the end time of that span with the earlier timestamp, which I don't think the JS SDK supports. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Correct, it does not support that. The (documented) behaviour then would be that the Span duration would be just up until the first response from the server, effectively TTFB. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah do you mean that we should use TTFB here? That would mean we couldn't record usage information though. BTW, I realized that this might be closer to RPC than HTTP which has some specification for streaming https://opentelemetry.io/docs/specs/semconv/rpc/rpc-spans/#message-event
I think this also implies the overall span is for the whole stream. FWIW python keeps the span for the entire stream too. |
||
spanAttributes?: SpanAttributes; | ||
spanKind?: SpanKind; | ||
spanName?: string; | ||
|
@@ -45,10 +49,11 @@ export interface ServiceExtension { | |
// called before request is sent, and after span is started | ||
requestPostSpanHook?: (request: NormalizedRequest) => void; | ||
|
||
// called after response is received. If value is returned, it replaces the response output. | ||
responseHook?: ( | ||
response: NormalizedResponse, | ||
span: Span, | ||
tracer: Tracer, | ||
config: AwsSdkInstrumentationConfig | ||
) => void; | ||
) => any | undefined; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,6 +34,10 @@ import { | |
NormalizedRequest, | ||
NormalizedResponse, | ||
} from '../types'; | ||
import type { | ||
ConverseStreamOutput, | ||
TokenUsage, | ||
} from '@aws-sdk/client-bedrock-runtime'; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This adds a runtime dependency on the library being instrumented. In general, I don't think we can have that, because it adds coupling between the instrumentation (which is meant to work for a range of aws-sdk versions) and a particular version of the aws-sdk dependency. These are just types. My understanding is the typical handling is to add a (possibly simplified) copy of the types from the module being instrumented, and add those to a local types ts file in the instrumentation. See, for example, the existing "src/aws-sdk.types.ts" in this instrumentation package, which says: /*
These are slightly modified and simplified versions of the actual SQS types included
in the official distribution:
https://github.com/aws/aws-sdk-js/blob/master/clients/sqs.d.ts
These are brought here to avoid having users install the `aws-sdk` whenever they
require this instrumentation.
*/ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oops, nice catch. I just changed to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like instrumentation-graphql and some others are using |
||
|
||
export class BedrockRuntimeServiceExtension implements ServiceExtension { | ||
requestPreSpanHook( | ||
|
@@ -43,7 +47,9 @@ export class BedrockRuntimeServiceExtension implements ServiceExtension { | |
): RequestMetadata { | ||
switch (request.commandName) { | ||
case 'Converse': | ||
return this.requestPreSpanHookConverse(request, config, diag); | ||
return this.requestPreSpanHookConverse(request, config, diag, false); | ||
case 'ConverseStream': | ||
return this.requestPreSpanHookConverse(request, config, diag, true); | ||
} | ||
|
||
return { | ||
|
@@ -54,7 +60,8 @@ export class BedrockRuntimeServiceExtension implements ServiceExtension { | |
private requestPreSpanHookConverse( | ||
request: NormalizedRequest, | ||
config: AwsSdkInstrumentationConfig, | ||
diag: DiagLogger | ||
diag: DiagLogger, | ||
isStream: boolean | ||
): RequestMetadata { | ||
let spanName = GEN_AI_OPERATION_NAME_VALUE_CHAT; | ||
const spanAttributes: Attributes = { | ||
|
@@ -90,6 +97,7 @@ export class BedrockRuntimeServiceExtension implements ServiceExtension { | |
return { | ||
spanName, | ||
isIncoming: false, | ||
isStream, | ||
spanAttributes, | ||
}; | ||
} | ||
|
@@ -107,6 +115,8 @@ export class BedrockRuntimeServiceExtension implements ServiceExtension { | |
switch (response.request.commandName) { | ||
case 'Converse': | ||
return this.responseHookConverse(response, span, tracer, config); | ||
case 'ConverseStream': | ||
return this.responseHookConverseStream(response, span, tracer, config); | ||
} | ||
} | ||
|
||
|
@@ -117,6 +127,49 @@ export class BedrockRuntimeServiceExtension implements ServiceExtension { | |
config: AwsSdkInstrumentationConfig | ||
) { | ||
const { stopReason, usage } = response.data; | ||
|
||
BedrockRuntimeServiceExtension.setStopReason(span, stopReason); | ||
BedrockRuntimeServiceExtension.setUsage(span, usage); | ||
} | ||
|
||
private responseHookConverseStream( | ||
response: NormalizedResponse, | ||
span: Span, | ||
tracer: Tracer, | ||
config: AwsSdkInstrumentationConfig | ||
) { | ||
return { | ||
...response.data, | ||
stream: this.wrapConverseStreamResponse(response.data.stream, span), | ||
}; | ||
} | ||
|
||
private async *wrapConverseStreamResponse( | ||
response: AsyncIterable<ConverseStreamOutput>, | ||
span: Span | ||
) { | ||
try { | ||
for await (const item of response) { | ||
BedrockRuntimeServiceExtension.setStopReason( | ||
span, | ||
item.messageStop?.stopReason | ||
); | ||
BedrockRuntimeServiceExtension.setUsage(span, item.metadata?.usage); | ||
yield item; | ||
} | ||
} finally { | ||
span.end(); | ||
} | ||
} | ||
|
||
private static setStopReason(span: Span, stopReason: string | undefined) { | ||
if (stopReason !== undefined) { | ||
console.log(stopReason); | ||
span.setAttribute(ATTR_GEN_AI_RESPONSE_FINISH_REASONS, [stopReason]); | ||
} | ||
} | ||
|
||
private static setUsage(span: Span, usage: TokenUsage | undefined) { | ||
if (usage) { | ||
const { inputTokens, outputTokens } = usage; | ||
if (inputTokens !== undefined) { | ||
|
@@ -126,9 +179,5 @@ export class BedrockRuntimeServiceExtension implements ServiceExtension { | |
span.setAttribute(ATTR_GEN_AI_USAGE_OUTPUT_TOKENS, outputTokens); | ||
} | ||
} | ||
|
||
if (stopReason !== undefined) { | ||
span.setAttribute(ATTR_GEN_AI_RESPONSE_FINISH_REASONS, [stopReason]); | ||
} | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
[ | ||
{ | ||
"scope": "https://bedrock-runtime.us-east-1.amazonaws.com:443", | ||
"method": "POST", | ||
"path": "/model/amazon.titan-text-lite-v1/converse-stream", | ||
"body": { | ||
"inferenceConfig": { | ||
"maxTokens": 10, | ||
"stopSequences": [ | ||
"|" | ||
], | ||
"temperature": 0.8, | ||
"topP": 1 | ||
}, | ||
"messages": [ | ||
{ | ||
"content": [ | ||
{ | ||
"text": "Say this is a test" | ||
} | ||
], | ||
"role": "user" | ||
} | ||
] | ||
}, | ||
"status": 200, | ||
"response": "00000081000000526cc176930b3a6576656e742d7479706507000c6d65737361676553746172740d3a636f6e74656e742d747970650700106170706c69636174696f6e2f6a736f6e0d3a6d6573736167652d747970650700056576656e747b2270223a2261626364222c22726f6c65223a22617373697374616e74227df512a005000000c600000057f67806450b3a6576656e742d74797065070011636f6e74656e74426c6f636b44656c74610d3a636f6e74656e742d747970650700106170706c69636174696f6e2f6a736f6e0d3a6d6573736167652d747970650700056576656e747b22636f6e74656e74426c6f636b496e646578223a302c2264656c7461223a7b2274657874223a2248692120486f772061726520796f753f20486f77227d2c2270223a226162636465666768696a6b6c6d6e6f70717273747576777879227da88f22a40000009500000056fecc83c80b3a6576656e742d74797065070010636f6e74656e74426c6f636b53746f700d3a636f6e74656e742d747970650700106170706c69636174696f6e2f6a736f6e0d3a6d6573736167652d747970650700056576656e747b22636f6e74656e74426c6f636b496e646578223a302c2270223a226162636465666768696a6b6c6d6e6f7071227d8dc2956d00000097000000511a68450b0b3a6576656e742d7479706507000b6d65737361676553746f700d3a636f6e74656e742d747970650700106170706c69636174696f6e2f6a736f6e0d3a6d6573736167652d747970650700056576656e747b2270223a226162636465666768696a6b6c6d6e6f7071727374222c2273746f70526561736f6e223a226d61785f746f6b656e73227ddb5bf387000000ce0000004ea263e5440b3a6576656e742d747970650700086d657461646174610d3a636f6e74656e742d747970650700106170706c69636174696f6e2f6a736f6e0d3a6d6573736167652d747970650700056576656e747b226d657472696373223a7b226c6174656e63794d73223a3736357d2c2270223a226162636465666768696a6b6c6d6e6f222c227573616765223a7b22696e707574546f6b656e73223a382c226f7574707574546f6b656e73223a31302c22746f74616c546f6b656e73223a31387d7d98eada7f", | ||
"rawHeaders": [ | ||
"Date", | ||
"Fri, 21 Mar 2025 02:04:20 GMT", | ||
"Content-Type", | ||
"application/vnd.amazon.eventstream", | ||
"Transfer-Encoding", | ||
"chunked", | ||
"Connection", | ||
"keep-alive", | ||
"x-amzn-RequestId", | ||
"c01898c3-00ef-43e3-b015-e7458e9afc84" | ||
], | ||
"responseIsBinary": true | ||
} | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered whether this should be done for the user hook too but didn't think there's enough use case for it. Currently the change is only internal since AFAIK, users can't define service extensions