Skip to content

Capture GenAI prompts and completions as events or attributes #2010

Open
@lmolkova

Description

@lmolkova

The GenAI SIG has been discussing how to capture prompts and completion for a while and there are several issues that are blocked on this discussion (#1913, #1883, #1556)

What we have today on OTel semconv is a set of structured events exported separately from spans. This was implemented in #980 and discussed in #834 and #829. The motivation to use events was to

  • overcome size limits on attribute values by using event body
  • use a signal that supports structured body and attributes
  • have a clear 1:1 relationship between event name and structure (as opposed to polymorphic types or arrays of heterogeneous objects)
  • make it possible and easy to consume individual events and prompts/completions without spans
  • have verbosity controls

Turns out that:

  • after ~9 months events are still not adopted by GenAI-focused tracing tools and their external instrumentation libs including Arize, Traceloop, Langtrace - all these providers use span attributes to capture prompts and completions.
  • These backends consume prompts and completions along with spans and don't envision separating them - they store and visualize this data altogether

So, the GenAI SIG is re-litigating this decision taking into account backends' feedback and other relevant issues: #1621, #1912, open-telemetry/opentelemetry-specification#4414


The fundamental question is whether this data belongs on the span or is a separate thing useful without a span.

How it can be useful without a span:

To be useful without a span, events should probably duplicate some of the span attributes - endpoint, model used, input parameters, etc - it's not the case today

Are prompts/completions point-in-time telemetry?

Arguably, from what we've seen so far, GenAI prompts and completion are used along with the spans and there is no great use-case for standalone events


Another fundamental question is how and if to capture unbounded (text, video, audio, etc) data on telemetry

It's problematic because of:

  • privacy - prompts can contain health concerns, ssns, addresses, names, etc. Apps that remain compliant with different regulators would have a problem of sharing this data with a broad audience of DevOps humans. The data should be accessible for evaluations, audit, but access should be restricted
  • size - non-GenAI specific backends are not optimized for this and it's expensive to store such data in hot storage.

Imagine, we had a solution that allowed us to store chat history somewhere and added a deep-link to that specific conversation to the telemetry - would we consider reporting this link as an event? We might, but we'd most likely have added this link as attribute on the span.

Arguably, the long term solution to this problem is having this data stored separately from telemetry, but recorded by reference (e.g. URL on span that points to the chat history)


TL;DR:

  • current approach doesn't work, we're blocked and need to find path forward.
  • GenAI-focused backends, innerloop scenarios, non-production apps would benefit from having prompts/completions stamped on the spans directly
  • General-purpose observability backends and high-scale applications would have a problem with sensitive/large/binary data coming from end-users on telemetry anyway

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    Status

    How to model prompts and completions

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions