feat(datadog-aws-lambda): add trigger extraction and inferred spans#219

Open: ygree wants to merge 24 commits into main from ygree/lambda-integration

@ygree ygree commented May 6, 2026

NOTE: This PR mirrors the original PR #189, which was closed by GH because it was accidentally pushed into base PR #213. There is no way to restore it other than to reopen it as a new PR.

PR Stack: #194 (workspace setup) -> #213 (lambda root span) -> ~~#190~~ #219 (lambda inferred spans)

What does this PR do

Adds trigger extraction and inferred spans to datadog-aws-lambda. Building on the root invocation tracing from #213, this PR integrates libdd-trace-inferrer to parse Lambda event payloads, extract upstream trace context, and create inferred trigger spans that parent the aws.lambda root span.

Trigger detection and carrier extraction are delegated to libdd-trace-inferrer, an experimental shared crate in development in libdatadog. This crate is a PoC implementation based on the work outlined in the Serverless Rust tracing design doc, originally started by @duncanista on the jordan.gonzalez/libdd-trace-inferrer branch. This PR depends on a fork of that work at david.ogbureke/libdd-trace-inferrer to unblock the consumer side while the upstream crate matures.

Dependency note: libdd-trace-inferrer is pulled via a git dependency on david.ogbureke/libdd-trace-inferrer in libdatadog. This will be updated to a stable release once the crate lands in libdatadog main.

Supported triggers (as implemented by libdd-trace-inferrer):

| Trigger | Inferred Span(s) |
| --- | --- |
| SQS | aws.sqs |
| SNS | aws.sns |
| EventBridge | aws.eventbridge |
| SNS -> SQS | aws.sns -> aws.sqs |
| EventBridge -> SQS | aws.eventbridge -> aws.sqs |
| EventBridge -> SNS | aws.eventbridge -> aws.sns |
| API Gateway REST (v1) | aws.apigateway |
| API Gateway HTTP (v2) | aws.httpapi |
| API Gateway WebSocket | aws.apigateway.websocket |
| Lambda Function URL | aws.lambda.url |
| Kinesis | aws.kinesis |
| DynamoDB | aws.dynamodb |
| S3 | aws.s3 |
| MSK (Kafka) | aws.msk |

For all trigger types, trace context carrier extraction is also handled by libdd-trace-inferrer. A header-based fallback covers payloads not matched by any known trigger shape.

Motivation

Completes the consumer side of distributed tracing through AWS managed services for Rust Lambdas. The producer side is handled by datadog-aws (#189).

Notes

  • MSRV is 1.85.0 (not the repo-wide 1.84.1), as required by the lambda_runtime crate.
  • datadog-opentelemetry is pulled in with features = ["test-utils"] because set_trace_writer_synchronous_write is currently gated behind that feature. Synchronous writes flush spans from the handler's in-process buffer to the local Datadog extension before the handler returns, which reduces span loss when the process freezes. As a side effect, test-only dependencies (criterion, gRPC and HTTP exporters) are compiled into the production binary, which increases its size and may lengthen cold starts.

Dogbu-cyber and others added 13 commits April 23, 2026 11:27
Implements the core Lambda handler wrapper with Datadog tracing:
- WrappedHandler: tower::Service that wraps user handlers with OTel spans
- LambdaSpan: aws.lambda root span with cold_start, request_id, function metadata
- Invocation lifecycle: start/handler_context/finish with error recording
- Config: service/env/version or full DatadogTracingBuilder control
- Lambda-appropriate OTel defaults (sync writes, no client-side stats)

Trigger extraction and inferred spans will follow in a subsequent PR.
…ervice

- Change OTel span name from tracer scope name to "aws.lambda"
- Remove redundant "language" tag
- Remove logging from LambdaSpan (error info captured in span attributes)
- Accept tower::Service instead of Fn for inner handler, enabling
  middleware composition inside the traced span
- Replace custom Config struct with Option<datadog_opentelemetry::Config>,
  applying Lambda defaults (stats disabled, sync writes) when None
- Verify payload flows through WrappedHandler to inner service and back
- Verify tower middleware composed between tracing and handler executes
@ygree ygree requested a review from a team as a code owner May 6, 2026 02:04
@ygree ygree changed the title Ygree/lambda integration feat(datadog-aws-lambda): add trigger extraction and inferred spans May 6, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50b6d0f601

Comment on lines +199 to +202
let trace_id = result.carrier.get("x-datadog-trace-id").map(String::as_str);
let has_upstream_trace = trace_id
    .and_then(|id| id.parse::<u64>().ok())
    .is_some_and(|id| id != 0);

P2: Preserve trace_context when carrier headers are absent

For triggers where libdd_trace_inferrer returns a pre-extracted trace_context instead of Datadog carrier headers (for example AWSTraceHeader/Step Functions contexts), this sentinel only looks at result.carrier, so has_upstream_trace is false and the invocation falls back to Context::current(). Those invocations then start a new trace instead of parenting the inferred and lambda spans to the upstream context; please convert/use result.trace_context when the carrier is empty.
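
A minimal sketch of the suggested fallback, using stand-in types. `InferenceResult`, `TraceContext`, and `upstream_trace_id` are hypothetical names introduced for illustration and are not the libdd-trace-inferrer API; the real `trace_context` field may have a different shape. The idea is to consult the carrier first and only then the pre-extracted context, treating a zero trace ID as absent in both cases.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a pre-extracted upstream trace context.
#[derive(Debug, Clone, PartialEq)]
struct TraceContext {
    trace_id: u64,
}

/// Hypothetical stand-in for the inferrer's result (assumed field shapes).
#[derive(Debug, Default)]
struct InferenceResult {
    carrier: HashMap<String, String>,
    trace_context: Option<TraceContext>,
}

/// Returns the upstream trace ID, preferring Datadog carrier headers and
/// falling back to the pre-extracted context when the carrier has none.
/// A trace ID of 0 is treated as "no upstream trace" in both sources.
fn upstream_trace_id(result: &InferenceResult) -> Option<u64> {
    result
        .carrier
        .get("x-datadog-trace-id")
        .and_then(|id| id.parse::<u64>().ok())
        .filter(|id| *id != 0)
        .or_else(|| {
            result
                .trace_context
                .as_ref()
                .map(|ctx| ctx.trace_id)
                .filter(|id| *id != 0)
        })
}
```

With this shape, `has_upstream_trace` would become `upstream_trace_id(&result).is_some()`, so an invocation that carries only a pre-extracted context no longer falls through to `Context::current()`.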

Comment on lines +127 to +130
let end_time = if span.is_async {
    invocation_start
} else {
    invocation_end

P2: End wrapped inferred spans at the inner event time

When an event is wrapped (for example SNS/EventBridge delivered through SQS), the outer wrapped span represents the time until the inner event, not the whole Lambda invocation. Because wrapped results are constructed by the inferrer with default is_async == false, this branch ends the outer span at invocation_end, so a long Lambda handler incorrectly inflates the SNS/EventBridge span duration and makes it cover Lambda execution. The outer wrapped span should end at the inner span start time (or equivalent), while the inner span keeps the async/sync invocation timing.
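
The timing rule described here could be sketched as follows. `InferredSpan`, `inner_end_ns`, and `outer_end_ns` are hypothetical names with assumed nanosecond timestamps, not the types used in this PR. Clamping the outer end to the outer start is an extra assumption, added so that a skewed inner timestamp cannot produce a negative duration.

```rust
/// Hypothetical, simplified view of an inferred span. Timestamps are
/// nanoseconds since the Unix epoch (an assumption for this sketch).
#[derive(Debug, Clone, Copy)]
struct InferredSpan {
    start_ns: u64,
    is_async: bool,
}

/// End time for the inferred span that directly parents `aws.lambda`.
/// Same rule as the reviewed snippet: async triggers end when the
/// invocation starts, sync triggers end when it finishes.
fn inner_end_ns(span: &InferredSpan, invocation_start_ns: u64, invocation_end_ns: u64) -> u64 {
    if span.is_async {
        invocation_start_ns
    } else {
        invocation_end_ns
    }
}

/// End time for the outer span of a wrapped event (for example SNS
/// delivered through SQS): it ends where the inner span starts, and
/// never before its own start.
fn outer_end_ns(outer: &InferredSpan, inner: &InferredSpan) -> u64 {
    inner.start_ns.max(outer.start_ns)
}
```

Under this rule the outer span's duration depends only on the two event timestamps, so a long-running handler no longer inflates it.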

ygree and others added 11 commits May 6, 2026 11:26
Remove the local config wrapper and accept Datadog's ConfigBuilder
directly for customized tracing setup.

Add a zero-config WrappedHandler::new constructor and move the explicit
builder path to WrappedHandler::with_config.

Force the Lambda-required tracing defaults internally, clean up
Datadog/OpenTelemetry imports, and fix the WrappedHandler rustdoc
examples to be rendered as ignored examples instead of failing doctests.
WrappedHandler was too generic, and the type's actual contract is a
tower::Service over LambdaEvent rather than a handler function.

Rename it to TracedService to better reflect both its tracing behavior
and its service-based API, and update the docs/examples accordingly,
including setting version in the config example.
…ambda_runtime

TracedService previously required inner service errors to convert into
lambda_runtime::Error, which was narrower than lambda_runtime::run.

Relax the bound to Into<lambda_runtime::Diagnostic> + Debug and
introduce TracedServiceError to normalize wrapped service errors and
local deserialization failures into a single outer error type that is
compatible with both Lambda diagnostics and invocation span reporting.
With synchronous trace writes enabled, the Datadog exporter already
waits for the completed trace chunk to flush when the root span ends.

Remove the extra provider.force_flush() calls, drop the now-redundant
stored SdkTracerProvider from TracedService, and update the comment to
describe the actual flush behavior.

Also add a TODO in Cargo.toml to drop the test-utils feature from the
production datadog-opentelemetry dependency once
ConfigBuilder::set_trace_writer_synchronous_write is ungated upstream.
…rvice call

Attach the invocation context around inner.call(...) and use
with_current_context() so both the synchronous call phase and the
returned future run under the same active Lambda invocation context.

Add a regression test covering services that inspect the active span in
call().
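
To illustrate why both phases need the context, here is a std-only toy model. It does not use the opentelemetry API: the thread-local `ACTIVE` slot, `attach`, `Guard`, `WithContext`, `call`, and `traced_call` are all hypothetical stand-ins. The guard covers the synchronous `call` phase, and the wrapper re-attaches the same context on every poll of the returned future.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    // Toy "active invocation" slot standing in for the current context.
    static ACTIVE: RefCell<Option<&'static str>> = RefCell::new(None);
}

/// Restores the previously active value when dropped.
struct Guard(Option<&'static str>);

fn attach(name: &'static str) -> Guard {
    Guard(ACTIVE.with(|a| a.replace(Some(name))))
}

impl Drop for Guard {
    fn drop(&mut self) {
        let prev = self.0.take();
        ACTIVE.with(|a| *a.borrow_mut() = prev);
    }
}

fn active() -> Option<&'static str> {
    ACTIVE.with(|a| *a.borrow())
}

/// Future wrapper that re-attaches the context for the duration of each poll.
struct WithContext<F> {
    inner: Pin<Box<F>>,
    name: &'static str,
}

impl<F: Future> Future for WithContext<F> {
    type Output = F::Output;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let _guard = attach(self.name);
        self.inner.as_mut().poll(cx)
    }
}

/// Toy inner service: reports what it sees in the synchronous call phase
/// and returns a future that reports what it sees when polled.
fn call() -> (Option<&'static str>, impl Future<Output = Option<&'static str>>) {
    let seen_in_call = active();
    (seen_in_call, async { active() })
}

/// Attaches the context around `call`, then carries it into the future.
fn traced_call() -> (Option<&'static str>, Option<&'static str>) {
    let (seen_in_call, mut fut) = {
        let _guard = attach("invocation");
        let (seen, fut) = call();
        (seen, WithContext { inner: Box::pin(fut), name: "invocation" })
    }; // The guard is dropped here, so the thread has no active context.

    // Poll once by hand with a no-op waker; the toy future never suspends.
    struct NoopWake;
    impl Wake for NoopWake {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    match Pin::new(&mut fut).poll(&mut cx) {
        Poll::Ready(seen_in_future) => (seen_in_call, seen_in_future),
        Poll::Pending => unreachable!("toy future completes on first poll"),
    }
}
```

Without the guard, `call` would observe no active context; without the wrapper, the future would observe none once the guard is dropped. The regression test mentioned above targets the first of these two cases.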
Adds libdd-trace-inferrer integration to parse Lambda event payloads and
create inferred spans for upstream triggers (SQS, SNS, EventBridge, API
Gateway, Lambda Function URLs).

- span_inferrer module: bridges libdd-trace-inferrer with OTel SDK
- TriggerExtraction: parses event payload, extracts carrier headers
- InferredSpanScope: manages 0-2 inferred spans per invocation
- Root span gains trigger metadata (event_source, async_invocation)
- Correct span timing: async spans end at start, sync at end
@ygree ygree force-pushed the ygree/lambda-integration branch from 50b6d0f to c5fd42f Compare May 7, 2026 01:48
ygree added a commit that referenced this pull request May 20, 2026
…213)

> **PR Stack:** #194 (workspace setup) -> **#213 (lambda root span)** ->
~~#190~~#219 (lambda inferred spans)

# What does this PR do?

Implements the core Lambda handler wrapper for `datadog-aws-lambda`.
Each invocation is automatically instrumented with an `aws.lambda` root
span carrying cold_start, request_id, function metadata, and error
recording.

This PR intentionally does **not** include trigger extraction or
inferred spans. Those are layered on in ~~#190~~#219 with minimal API
surface changes.

## Usage

```rust
lambda_runtime::run(TracedService::new(my_handler, Config::default())).await
```

## What's included

- **`TracedService`** - `tower::Service` that wraps lambda handler with
OTel tracing
- **`LambdaSpan`** - `aws.lambda` root span with `cold_start`,
`request_id`, `function_arn`, `function_version`, `functionname`,
`_dd.origin=lambda`
- **`Invocation`** - start/handler_context/finish lifecycle with error
recording
- **`Config`** - `service`/`env`/`version` or full
`DatadogTracingBuilder` control
- Lambda-appropriate OTel defaults: synchronous writes, no client-side
stats

# Motivation

This PR establishes the root invocation tracing that ~~#190~~#219 builds
inferred spans on top of.


Ref: #221

---------

Co-authored-by: Yury Gribkov <yury.gribkov@gmail.com>
Base automatically changed from david.ogbureke/lambda-root-invocation to main May 20, 2026 17:38