Skip to content

feat(datadog-aws-lambda): add root invocation span with OTel tracing#213

Merged
ygree merged 24 commits into
mainfrom
david.ogbureke/lambda-root-invocation
May 20, 2026
Merged

feat(datadog-aws-lambda): add root invocation span with OTel tracing#213
ygree merged 24 commits into
mainfrom
david.ogbureke/lambda-root-invocation

Conversation

@Dogbu-cyber
Copy link
Copy Markdown
Contributor

@Dogbu-cyber Dogbu-cyber commented Apr 21, 2026

PR Stack: #194 (workspace setup) -> #213 (lambda root span) -> #190#219 (lambda inferred spans)

What does this PR do?

Implements the core Lambda handler wrapper for datadog-aws-lambda. Each invocation is automatically instrumented with an aws.lambda root span carrying cold_start, request_id, function metadata, and error recording.

This PR intentionally does not include trigger extraction or inferred spans. Those are layered on in #190#219 with minimal API surface changes.

Usage

lambda_runtime::run(TracedService::new(my_handler, Config::default())).await

What's included

  • TracedService - tower::Service that wraps lambda handler with OTel tracing
  • LambdaSpan - aws.lambda root span with cold_start, request_id, function_arn, function_version, functionname, _dd.origin=lambda
  • Invocation - start/handler_context/finish lifecycle with error recording
  • Config - service/env/version or full DatadogTracingBuilder control
  • Lambda-appropriate OTel defaults: synchronous writes, no client-side stats

Motivation

This PR establishes the root invocation tracing that #190#219 builds inferred spans on top of.

Ref: #221

@Dogbu-cyber Dogbu-cyber changed the title David.ogbureke/lambda root invocation feat(datadog-aws-lambda): add root invocation span with OTel tracing Apr 21, 2026
@Dogbu-cyber Dogbu-cyber force-pushed the david.ogbureke/lambda-root-invocation branch from 4520f81 to 3347bdc Compare April 21, 2026 19:35
@Dogbu-cyber Dogbu-cyber marked this pull request as ready for review April 21, 2026 19:47
@Dogbu-cyber Dogbu-cyber requested a review from a team as a code owner April 21, 2026 19:47
Comment thread instrumentation/datadog-aws-lambda/src/invocation.rs
Comment thread instrumentation/datadog-aws-lambda/src/invocation.rs

let parent_cx = Context::current();

let mut builder = tracer.span_builder(TRACER_NAME);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the operation name I think

Copy link
Copy Markdown
Contributor

@ygree ygree May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. BTW, why do we need both the span name and the operation attribute?

Comment thread instrumentation/datadog-aws-lambda/src/invocation.rs
Comment on lines +140 to +142
impl<F, Fut, E, R> Service<LambdaEvent<Box<RawValue>>> for WrappedHandler<F, E, R>
where
F: Fn(LambdaEvent<E>) -> Fut + Send + Sync + 'static,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the wrapped handler to be a Service<E> and not a simple function, if users want to use middlewares after tracing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'm still working on how to test this. Soon I will push that commit.

Copy link
Copy Markdown
Contributor

@ygree ygree May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Address this in 70f982c

Comment on lines +26 to +27
#[derive(Default)]
pub struct Config {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need specific config for this integration or can we just pass a datadog_tracing::Configuration object?

All of the keys here are in the main config.

Copy link
Copy Markdown
Contributor Author

@Dogbu-cyber Dogbu-cyber Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought process was that I wanted to provide a way to set service, env, and version without having to also remember to set client-side stats to false. But I think it's just better to accept a datadog_opentelemetry::Config and emphasize that users should set Lambda-specific defaults themselves if they pass in their own.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just ask them to pass a datadog_opentelemetry::ConfigBuilder and then override the specific config fields

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Copy Markdown
Contributor

@ygree ygree May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it so that it uses datadog_opentelemetry::ConfigBuilder directly. Therefore, there is no need for a separate Config structure for now. See 2dbf552

Comment on lines +28 to +33
/// Service name. Overrides `DD_SERVICE`. Ignored when [`tracing`](Self::tracing) is `Some`.
pub service: Option<String>,
/// Deployment environment. Overrides `DD_ENV`. Ignored when [`tracing`](Self::tracing) is
/// `Some`.
pub env: Option<String>,
/// Service version. Overrides `DD_VERSION`. Ignored when [`tracing`](Self::tracing) is `Some`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't expose the public struct like this. Adding config field here would be a breaking change, so I think we should provide a builder style API

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed it so that it uses datadog_opentelemetry::ConfigBuilder directly. Therefore, there is no need for a separate Config structure for now. See 19feab9

}
};
let typed_event = LambdaEvent::new(typed_payload, event.context);
let fut = inner.call(typed_event);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you're json deserializing the event payload from a string. Is that correct? Why do we need to do this? Shouldn't the event come to us as an object in the first place?

Copy link
Copy Markdown
Contributor Author

@Dogbu-cyber Dogbu-cyber Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could take LambdaEvent<E> directly, but that would mean deserialization errors would be handled by the Lambda runtime before our code runs, meaning an invocation with a payload that couldn't be deserialized into an E wouldn't be traced at all.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes sense. What I wonder though is, how often is there going to be an error deserializing event payloads? Probably rarely if never. Were you ever able to produce an error during deserialization? If so, how were you able to? I suspect it's not possible.

Copy link
Copy Markdown
Contributor Author

@Dogbu-cyber Dogbu-cyber Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could happen if someone invoked a Lambda directly with a custom payload that doesn't match the handler's expected type.

A much bigger reason I forgot to mention though is looking ahead at #190 .

Let's say we accept a LambdaEvent<E>.

The lambda runtime will deserialize raw bytes into that E, but now we need to perform trace extraction.

The problem is the inferrer has no clue what E is (since it's designed to in the future to be generic across runtimes), it only knows how to interpret a json payload and match a trigger from that point, which is why the options are to either accept a Value, or a RawValue.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that's helpful. And the crate that you and Jordan were working on for trace extraction. It accepts an event. What is the type of that event? Does it take a string?

Copy link
Copy Markdown
Contributor Author

@Dogbu-cyber Dogbu-cyber Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the 3 methods exposed by the trace-inferrer crate for trace extraction:

pub fn infer_span_from_bytes(&self, payload: &[u8])
pub fn infer_span_from_value(&self, payload: &Value)
pub fn infer_span(&self, payload: &str)

Also to clarify, I didn't work on this crate.

/// use lambda_runtime::service_fn;
///
/// // Zero-config (Lambda defaults applied automatically)
/// lambda_runtime::run(WrappedHandler::new(service_fn(my_handler), None)).await
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about this and would like to learn more before I form an opinion.

It is my understanding that for a customer handler function my_handler, you must wrap it with service_fn(my_handler), then start it with lambda_runtime::run.

So, the question is, which wrapper wraps who? Do we wrap the service_fn or does the service_fn wrap us? I see you've chosen the former. I'm curious to learn more about the benefits/drawbacks of each method and why you think this is the preferred option.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WrappedHandler is a Service that wraps an inner Service. which means we can place tower middleware between tracing and the handler:

let inner = ServiceBuilder::new()
    .layer(auth_middleware)
    .service(service_fn(my_handler));

lambda_runtime::run(WrappedHandler::new(inner, None)).await

If we were to instead let service_fn wrap us, my understanding is the api would look something like this

let traced = datadog_lambda::wrap(my_handler);
lambda_runtime::run(service_fn(traced)).await

wrap returns a new async function that starts the span, calls the original handler, ends
the span, and flushes. The API is simpler, but because the result is a function, not a Service, you can't insert tower middleware between tracing and the handler:

// wrap(my_handler) is a function, not a Service. So this is not possible
let traced = datadog_lambda::wrap(my_handler);
let inner = ServiceBuilder::new()
    .layer(auth_middleware)
    .service(???); // traced is a fn, not a Service

This is the sole reason I went with the code as it's currently written.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's tower middleware and why is it important that we support it?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooooh, I see.

Comment thread instrumentation/datadog-aws-lambda/src/invocation.rs
Dogbu-cyber added a commit that referenced this pull request Apr 23, 2026
> **PR Stack:** **#194 (workspace setup)** -> #213 (lambda root span) ->
#189 (aws-sdk injection) -> #190 (lambda inferred spans)

# What does this PR do?

Establishes the `instrumentation/` Cargo workspace that houses Datadog
AWS instrumentation crates. This is the foundation PR -- #213, #189, and
#190 add the implementations on top.

## Workspace structure

```
instrumentation/
├── Cargo.toml                  # shared workspace (resolver = "2", MSRV 1.91.1)
├── datadog-aws-core/           # stub -- internal interceptor machinery (added in #189)
├── datadog-aws-sqs/            # stub -- SqsInterceptor (added in #189)
├── datadog-aws-sns/            # stub -- SnsInterceptor (added in #189)
├── datadog-aws-eventbridge/    # stub -- EventBridgeInterceptor (added in #189)
└── datadog-aws-lambda/         # distributed tracing for Rust Lambda functions (added in #213, #190)
```
Base automatically changed from david.ogbureke/ci-integration-msrv to main April 23, 2026 13:38
Implements the core Lambda handler wrapper with Datadog tracing:
- WrappedHandler: tower::Service that wraps user handlers with OTel spans
- LambdaSpan: aws.lambda root span with cold_start, request_id, function metadata
- Invocation lifecycle: start/handler_context/finish with error recording
- Config: service/env/version or full DatadogTracingBuilder control
- Lambda-appropriate OTel defaults (sync writes, no client-side stats)

Trigger extraction and inferred spans will follow in a subsequent PR.
…ervice

- Change OTel span name from tracer scope name to "aws.lambda"
- Remove redundant "language" tag
- Remove logging from LambdaSpan (error info captured in span attributes)
- Accept tower::Service instead of Fn for inner handler, enabling
  middleware composition inside the traced span
- Replace custom Config struct with Option<datadog_opentelemetry::Config>,
  applying Lambda defaults (stats disabled, sync writes) when None
- Verify payload flows through WrappedHandler to inner service and back
- Verify tower middleware composed between tracing and handler executes
@Dogbu-cyber Dogbu-cyber force-pushed the david.ogbureke/lambda-root-invocation branch from c2f0ee8 to 2cd84c3 Compare April 23, 2026 15:27
@Dogbu-cyber
Copy link
Copy Markdown
Contributor Author

datadog-opentelemetry is pulled in with features = ["test-utils"] as a production dependency because set_trace_writer_synchronous_write is gated behind #[cfg(feature = "test-utils")] upstream. Once that setter is ungated, the test-utils feature from [dependencies] should be dropped to avoid inflating the binary.

Comment on lines +99 to +102
S: Service<LambdaEvent<E>, Response = R, Error = lambda_runtime::Error>
+ Clone
+ Send
+ 'static,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the bound on the error should be something like
`S::Error: Into + Debug, instead of lambda_runtime::Error, to match the signature of lambda_runtime::run

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b9e8297

Comment on lines +142 to +144
if let Err(err) = provider.force_flush() {
tracing::error!("flush failed: {err}");
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should not be necessary if the tracer is in synchronous mode

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in aa5fc55

}
};
let typed_event = LambdaEvent::new(typed_payload, event.context);
let fut = inner.call(typed_event);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inner.call should be wrapper in the invocation.handler_context()

Suggested change
let fut = inner.call(typed_event);
let fut = {
let _ctx = invocation.handler_context().attach();
inner.call(typed_event)
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Fixed in 817e9ea. to ensure that the sync part is within the same invocation span and capture the same active context for the future execution.

// Copyright 2025-Present Datadog, Inc. https://www.datadoghq.com/
// SPDX-License-Identifier: Apache-2.0

pub(crate) const OPERATION_NAME: &str = "operation.name";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: We need to double check this. When this is operation_name then the invocation doesn't show up on the serverless page.

@ygree ygree force-pushed the david.ogbureke/lambda-root-invocation branch 3 times, most recently from a19ea9f to 1154f95 Compare May 6, 2026 01:38
ygree added 8 commits May 6, 2026 11:26
Remove the local config wrapper and accept Datadog's ConfigBuilder
directly for customized tracing setup.

Add a zero-config WrappedHandler::new constructor and move the explicit
builder path to WrappedHandler::with_config.

Force the Lambda-required tracing defaults internally, clean up
Datadog/OpenTelemetry imports, and fix the WrappedHandler rustdoc
examples to be rendered as ignored examples instead of failing doctests.
WrappedHandler was too generic, and the type's actual contract is a
tower::Service over LambdaEvent rather than a handler function.

Rename it to TracedService to better reflect both its tracing behavior
and its service-based API, and update the docs/examples accordingly,
including setting version in the config example.
…ambda_runtime

TracedService previously required inner service errors to convert into
lambda_runtime::Error, which was narrower than lambda_runtime::run.

Relax the bound to Into<lambda_runtime::Diagnostic> + Debug and
introduce TracedServiceError to normalize wrapped service errors and
local deserialization failures into a single outer error type that is
compatible with both Lambda diagnostics and invocation span reporting.
With synchronous trace writes enabled, the Datadog exporter already
waits for the completed trace chunk to flush when the root span ends.

Remove the extra provider.force_flush() calls, drop the now-redundant
stored SdkTracerProvider from TracedService, and update the comment to
describe the actual flush behavior.

Also add a TODO in Cargo.toml to drop the test-utils feature from the
production datadog-opentelemetry dependency once
ConfigBuilder::set_trace_writer_synchronous_write is ungated upstream.
…rvice call

Attach the invocation context around inner.call(...) and use
with_current_context() so both the synchronous call phase and the
returned future run under the same active Lambda invocation context.

Add a regression test covering services that inspect the active span in
call().
@ygree
Copy link
Copy Markdown
Contributor

ygree commented May 6, 2026

datadog-opentelemetry is pulled in with features = ["test-utils"] as a production dependency because set_trace_writer_synchronous_write is gated behind #[cfg(feature = "test-utils")] upstream. Once that setter is ungated, the test-utils feature from [dependencies] should be dropped to avoid inflating the binary.

Added a TODO into Cargo.toml and cleaned up flush calls in aa5fc55
Ref #221

datadog-opentelemetry = { version = "0.3", path = "../../datadog-opentelemetry" }
# TODO: Drop `test-utils` from this production dependency once
# `ConfigBuilder::set_trace_writer_synchronous_write` is ungated upstream.
datadog-opentelemetry = { version = "0.3", path = "../../datadog-opentelemetry", features = ["test-utils"] }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we need to remove the path before releasing this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to remove the path. Cargo uses the local path for development and the version when this crate is published. The version will also remain at 0.3, meaning it's the latest published version in the [0.3.0, 0.4.0) interval.

ygree added a commit that referenced this pull request May 12, 2026
# What does this PR do?

Make `ConfigBuilder::set_trace_writer_synchronous_write` and related
`set_trace_writer_synchronous_timeout` available outside test-utils so
short-lived AWS Lambda invocations can force trace flushes before
freeze.

# Motivation

This needs to be a public setting for [the datadog-aws-lambda
instrumentation](#213).

# Additional Notes

See #213 (comment)
ygree added 2 commits May 12, 2026 11:20
dependency

This is possible since #221 has been merged to the main branch.

Keep test-only functionality out of the runtime dependency graph.
Refresh Cargo.lock for the updated dependency resolution.
@ygree
Copy link
Copy Markdown
Contributor

ygree commented May 12, 2026

datadog-opentelemetry is pulled in with features = ["test-utils"] as a production dependency because set_trace_writer_synchronous_write is gated behind #[cfg(feature = "test-utils")] upstream. Once that setter is ungated, the test-utils feature from [dependencies] should be dropped to avoid inflating the binary.

This was addressed once #221 was merged into the main branch.

@ygree ygree merged commit 4e77aa6 into main May 20, 2026
31 checks passed
@ygree ygree deleted the david.ogbureke/lambda-root-invocation branch May 20, 2026 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants