Description
The GenAI SIG has been discussing how to capture prompts and completion for a while and there are several issues that are blocked on this discussion (#1913, #1883, #1556)
What we have today on OTel semconv is a set of structured events exported separately from spans. This was implemented in #980 and discussed in #834 and #829. The motivation to use events was to
- overcome size limits on attribute values by using event body
- use a signal that supports structured body and attributes
- have a clear 1:1 relationship between event name and structure (as opposed to polymorphic types or arrays of heterogeneous objects)
- make it possible and easy to consume individual events and prompts/completions without spans
- have verbosity controls
Turns out that:
- after ~9 months events are still not adopted by GenAI-focused tracing tools and their external instrumentation libs including Arize, Traceloop, Langtrace - all these providers use span attributes to capture prompts and completions.
- These backends consume prompts and completions along with spans and don't envision separating them - they store and visualize this data altogether
So, the GenAI SIG is re-litigating this decision taking into account backends' feedback and other relevant issues: #1621, #1912, open-telemetry/opentelemetry-specification#4414
The fundamental question is whether this data belongs on the span or is a separate thing useful without a span.
How it can be useful without a span:
- audit logs - https://cloud-native.slack.com/archives/C06KR7ARS3X/p1742322601090389?thread_ts=1741895340.932419&cid=C06KR7ARS3X - we could capture them on the request-response payloads where they are not unified/filtered/altered in other ways. But also audit logs have different delivery guarantees/storage/retention needs than telemetry
- some applications don't use tracing and rely on logs
To be useful without a span, events should probably duplicate some of the span attributes - endpoint, model used, input parameters, etc - it's not the case today
Are prompts/completions point-in-time telemetry?
- they don't really have a timestamp - prompts are input parameters, completion comes at the end of non-streaming call, buffered completion comes at the end of the streaming call. (Timestamp granularity and log ordering #1701, Please revisit decision to emit an event for every message #1621 (comment))
- Streaming chunks, if captured at all, would have timestamps (Add the option of streaming gen_ai.choice events. #1964)
Arguably, from what we've seen so far, GenAI prompts and completion are used along with the spans and there is no great use-case for standalone events
Another fundamental question is how and if to capture unbounded (text, video, audio, etc) data on telemetry
It's problematic because of:
- privacy - prompts can contain health concerns, ssns, addresses, names, etc. Apps that remain compliant with different regulators would have a problem of sharing this data with a broad audience of DevOps humans. The data should be accessible for evaluations, audit, but access should be restricted
- size - non-GenAI specific backends are not optimized for this and it's expensive to store such data in hot storage.
Imagine, we had a solution that allowed us to store chat history somewhere and added a deep-link to that specific conversation to the telemetry - would we consider reporting this link as an event? We might, but we'd most likely have added this link as attribute on the span.
Arguably, the long term solution to this problem is having this data stored separately from telemetry, but recorded by reference (e.g. URL on span that points to the chat history)
TL;DR:
- current approach doesn't work, we're blocked and need to find path forward.
- GenAI-focused backends, innerloop scenarios, non-production apps would benefit from having prompts/completions stamped on the spans directly
- General-purpose observability backends and high-scale applications would have a problem with sensitive/large/binary data coming from end-users on telemetry anyway
Metadata
Metadata
Assignees
Type
Projects
Status