feat(datadog-aws-lambda): add trigger extraction and inferred spans #219
ygree wants to merge 24 commits into
Implements the core Lambda handler wrapper with Datadog tracing:
- WrappedHandler: tower::Service that wraps user handlers with OTel spans
- LambdaSpan: aws.lambda root span with cold_start, request_id, function metadata
- Invocation lifecycle: start/handler_context/finish with error recording
- Config: service/env/version or full DatadogTracingBuilder control
- Lambda-appropriate OTel defaults (sync writes, no client-side stats)

Trigger extraction and inferred spans will follow in a subsequent PR.
…ervice
- Change OTel span name from tracer scope name to "aws.lambda"
- Remove redundant "language" tag
- Remove logging from LambdaSpan (error info captured in span attributes)
- Accept tower::Service instead of Fn for inner handler, enabling middleware composition inside the traced span
- Replace custom Config struct with Option<datadog_opentelemetry::Config>, applying Lambda defaults (stats disabled, sync writes) when None
- Verify payload flows through WrappedHandler to inner service and back
- Verify tower middleware composed between tracing and handler executes
…as and apply nightly fmt
…ction dependency" This reverts commit ff2cb0f.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50b6d0f601
    let trace_id = result.carrier.get("x-datadog-trace-id").map(String::as_str);
    let has_upstream_trace = trace_id
        .and_then(|id| id.parse::<u64>().ok())
        .is_some_and(|id| id != 0);
Preserve trace_context when carrier headers are absent
For triggers where libdd_trace_inferrer returns a pre-extracted trace_context instead of Datadog carrier headers (for example AWSTraceHeader/Step Functions contexts), this sentinel only looks at result.carrier, so has_upstream_trace is false and the invocation falls back to Context::current(). Those invocations then start a new trace instead of parenting the inferred and lambda spans to the upstream context; please convert/use result.trace_context when the carrier is empty.
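One way to picture the suggested fallback is the std-only sketch below. It is not the PR's code: `ExtractionResult`, `TraceContext`, `Upstream`, and `upstream_of` are hypothetical stand-ins, and the real `libdd_trace_inferrer` result type may be shaped differently. The only detail taken from the snippet above is the `x-datadog-trace-id` carrier key and the non-zero `u64` check.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a context the inferrer has already extracted
/// (for example from an AWSTraceHeader or a Step Functions payload).
#[derive(Debug, Clone, PartialEq)]
pub struct TraceContext {
    pub trace_id: u64,
    pub parent_id: u64,
}

/// Hypothetical stand-in for the inferrer's result; field names are assumptions.
#[derive(Debug, Default)]
pub struct ExtractionResult {
    pub carrier: HashMap<String, String>,
    pub trace_context: Option<TraceContext>,
}

/// Where the upstream parent for this invocation comes from.
#[derive(Debug, PartialEq)]
pub enum Upstream<'a> {
    /// A valid, non-zero trace id was found in the Datadog carrier headers.
    Carrier(u64),
    /// No usable carrier, but a pre-extracted context is available.
    PreExtracted(&'a TraceContext),
    /// Nothing upstream: the invocation starts a new trace.
    Absent,
}

pub fn upstream_of(result: &ExtractionResult) -> Upstream<'_> {
    // Same sentinel as the reviewed code: parse the header and reject zero.
    let carrier_id = result
        .carrier
        .get("x-datadog-trace-id")
        .and_then(|id| id.parse::<u64>().ok())
        .filter(|id| *id != 0);

    if let Some(id) = carrier_id {
        return Upstream::Carrier(id);
    }

    // Fallback requested by the review: consult trace_context before giving up.
    match result.trace_context.as_ref() {
        Some(ctx) if ctx.trace_id != 0 => Upstream::PreExtracted(ctx),
        _ => Upstream::Absent,
    }
}
```

With this shape, only the `Absent` case would fall back to `Context::current()`; the other two cases carry enough information to parent the inferred and lambda spans to the upstream trace.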
    let end_time = if span.is_async {
        invocation_start
    } else {
        invocation_end
End wrapped inferred spans at the inner event time
When an event is wrapped (for example SNS/EventBridge delivered through SQS), the outer wrapped span represents the time until the inner event, not the whole Lambda invocation. Because wrapped results are constructed by the inferrer with default is_async == false, this branch ends the outer span at invocation_end, so a long Lambda handler incorrectly inflates the SNS/EventBridge span duration and makes it cover Lambda execution. The outer wrapped span should end at the inner span start time (or equivalent), while the inner span keeps the async/sync invocation timing.
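The timing rule described in this comment can be sketched as two small functions. This is an illustration under assumptions, not the PR's implementation: `InferredSpan`, its `start_ns` field, and both function names are hypothetical, and timestamps are treated as plain nanosecond counters.

```rust
/// Hypothetical, simplified view of an inferred span.
#[derive(Debug, Clone, Copy)]
pub struct InferredSpan {
    /// Span start, in nanoseconds since an arbitrary epoch.
    pub start_ns: u64,
    /// Whether the trigger invoked the function asynchronously.
    pub is_async: bool,
}

/// End time for the inferred span that directly parents `aws.lambda`:
/// async triggers end when the invocation starts, sync triggers when it ends.
pub fn inner_end_ns(span: &InferredSpan, invocation_start_ns: u64, invocation_end_ns: u64) -> u64 {
    if span.is_async {
        invocation_start_ns
    } else {
        invocation_end_ns
    }
}

/// End time for an outer wrapped span (for example SNS delivered through SQS):
/// it ends where the inner span begins, regardless of handler duration.
pub fn outer_end_ns(outer: &InferredSpan, inner: &InferredSpan) -> u64 {
    // Clamp so that clock skew never yields an end time before the start time.
    inner.start_ns.max(outer.start_ns)
}
```

Keeping the outer rule separate from the async/sync branch means a long-running handler can only lengthen the inner span, never the outer one.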
Remove the local config wrapper and accept Datadog's ConfigBuilder directly for customized tracing setup. Add a zero-config WrappedHandler::new constructor and move the explicit builder path to WrappedHandler::with_config. Force the Lambda-required tracing defaults internally, clean up Datadog/OpenTelemetry imports, and fix the WrappedHandler rustdoc examples to be rendered as ignored examples instead of failing doctests.
WrappedHandler was too generic, and the type's actual contract is a tower::Service over LambdaEvent rather than a handler function. Rename it to TracedService to better reflect both its tracing behavior and its service-based API, and update the docs/examples accordingly, including setting version in the config example.
…ambda_runtime TracedService previously required inner service errors to convert into lambda_runtime::Error, which was narrower than lambda_runtime::run. Relax the bound to Into<lambda_runtime::Diagnostic> + Debug and introduce TracedServiceError to normalize wrapped service errors and local deserialization failures into a single outer error type that is compatible with both Lambda diagnostics and invocation span reporting.
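A rough, std-only sketch of what such a normalizing error type could look like follows. The variant names, the `error_kind` helper, and the message formats are assumptions for illustration; the conversion into `lambda_runtime::Diagnostic` mentioned in the commit is deliberately left out so the example has no external dependencies.

```rust
use std::fmt;

/// Hypothetical outer error type: one variant for errors returned by the
/// wrapped service, one for failures that happen locally before it is called.
#[derive(Debug)]
pub enum TracedServiceError<E> {
    /// The inner service returned an error.
    Service(E),
    /// The event payload could not be deserialized.
    Deserialize(String),
}

impl<E> TracedServiceError<E> {
    /// Short label that could be recorded on the invocation span (assumed naming).
    pub fn error_kind(&self) -> &'static str {
        match self {
            Self::Service(_) => "service",
            Self::Deserialize(_) => "deserialize",
        }
    }
}

// Only `Debug` is required of the inner error, mirroring the relaxed bound
// described in the commit message.
impl<E: fmt::Debug> fmt::Display for TracedServiceError<E> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Service(e) => write!(f, "service error: {:?}", e),
            Self::Deserialize(msg) => write!(f, "failed to deserialize event: {}", msg),
        }
    }
}

impl<E: fmt::Debug> std::error::Error for TracedServiceError<E> {}
```

Both failure paths then surface as a single type, so span error recording and the runtime-facing conversion each need to handle only one error type.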
With synchronous trace writes enabled, the Datadog exporter already waits for the completed trace chunk to flush when the root span ends. Remove the extra provider.force_flush() calls, drop the now-redundant stored SdkTracerProvider from TracedService, and update the comment to describe the actual flush behavior. Also add a TODO in Cargo.toml to drop the test-utils feature from the production datadog-opentelemetry dependency once ConfigBuilder::set_trace_writer_synchronous_write is ungated upstream.
…rvice call Attach the invocation context around inner.call(...) and use with_current_context() so both the synchronous call phase and the returned future run under the same active Lambda invocation context. Add a regression test covering services that inspect the active span in call().
Adds libdd-trace-inferrer integration to parse Lambda event payloads and create inferred spans for upstream triggers (SQS, SNS, EventBridge, API Gateway, Lambda Function URLs).
- span_inferrer module: bridges libdd-trace-inferrer with OTel SDK
- TriggerExtraction: parses event payload, extracts carrier headers
- InferredSpanScope: manages 0-2 inferred spans per invocation
- Root span gains trigger metadata (event_source, async_invocation)
- Correct span timing: async inferred spans end at invocation start, sync inferred spans end at invocation end
…trigger extraction
50b6d0f to c5fd42f
…213)

> **PR Stack:** #194 (workspace setup) -> **#213 (lambda root span)** -> ~~#190~~#219 (lambda inferred spans)

# What does this PR do?

Implements the core Lambda handler wrapper for `datadog-aws-lambda`. Each invocation is automatically instrumented with an `aws.lambda` root span carrying cold_start, request_id, function metadata, and error recording.

This PR intentionally does **not** include trigger extraction or inferred spans. Those are layered on in ~~#190~~#219 with minimal API surface changes.

## Usage

```rust
lambda_runtime::run(TracedService::new(my_handler, Config::default())).await
```

## What's included

- **`TracedService`** - `tower::Service` that wraps lambda handler with OTel tracing
- **`LambdaSpan`** - `aws.lambda` root span with `cold_start`, `request_id`, `function_arn`, `function_version`, `functionname`, `_dd.origin=lambda`
- **`Invocation`** - start/handler_context/finish lifecycle with error recording
- **`Config`** - `service`/`env`/`version` or full `DatadogTracingBuilder` control
- Lambda-appropriate OTel defaults: synchronous writes, no client-side stats

# Motivation

This PR establishes the root invocation tracing that ~~#190~~#219 builds inferred spans on top of.

Ref: #221

---------

Co-authored-by: Yury Gribkov <yury.gribkov@gmail.com>
What does this PR do

Adds trigger extraction and inferred spans to `datadog-aws-lambda`. Building on the root invocation tracing from #213, this PR integrates `libdd-trace-inferrer` to parse Lambda event payloads, extract upstream trace context, and create inferred trigger spans that parent the `aws.lambda` root span.

Trigger detection and carrier extraction are delegated to `libdd-trace-inferrer`, an experimental shared crate in development in `libdatadog`. This crate is a PoC implementation based on the work outlined in the Serverless Rust tracing design doc, originally started by @duncanista on the `jordan.gonzalez/libdd-trace-inferrer` branch. This PR depends on a fork of that work at `david.ogbureke/libdd-trace-inferrer` to unblock the consumer side while the upstream crate matures.

Supported triggers (as implemented by `libdd-trace-inferrer`), listed by inferred span name:

- `aws.sqs`
- `aws.sns`
- `aws.eventbridge`
- `aws.sns` -> `aws.sqs`
- `aws.eventbridge` -> `aws.sqs`
- `aws.eventbridge` -> `aws.sns`
- `aws.apigateway`
- `aws.httpapi`
- `aws.apigateway.websocket`
- `aws.lambda.url`
- `aws.kinesis`
- `aws.dynamodb`
- `aws.s3`
- `aws.msk`

For all trigger types, trace context carrier extraction is also handled by `libdd-trace-inferrer`. A header-based fallback covers payloads not matched by any known trigger shape.

Motivation

Completes the consumer side of distributed tracing through AWS managed services for Rust Lambdas. The producer side is handled by `datadog-aws` (#189).

Notes

- `lambda_runtime` crate.
- `datadog-opentelemetry` is pulled in with `features = ["test-utils"]` because `set_trace_writer_synchronous_write` is currently gated behind that feature. Synchronous flush ensures spans are flushed from the handler's in-process buffer to the local Datadog extension before the handler returns, reducing span loss when the process freezes. Enabling `test-utils` also compiles test-only deps (criterion, gRPC and HTTP exporters) into the production binary, which increases binary size and can affect cold starts.