feat(datadog-aws-lambda): add root invocation span with OTel tracing#213
Conversation
4520f81 to
3347bdc
Compare
|
|
||
| let parent_cx = Context::current(); | ||
|
|
||
| let mut builder = tracer.span_builder(TRACER_NAME); |
There was a problem hiding this comment.
This should be the operation name I think
There was a problem hiding this comment.
Fixed. BTW, why do we need both the span name and the operation attribute?
| impl<F, Fut, E, R> Service<LambdaEvent<Box<RawValue>>> for WrappedHandler<F, E, R> | ||
| where | ||
| F: Fn(LambdaEvent<E>) -> Fut + Send + Sync + 'static, |
There was a problem hiding this comment.
I would expect the wrapped handler to be a Service<E> and not a simple function, if users want to use middlewares after tracing
There was a problem hiding this comment.
Agreed. I'm still working on how to test this. Soon I will push that commit.
| #[derive(Default)] | ||
| pub struct Config { |
There was a problem hiding this comment.
Do we need specific config for this integration or can we just pass a datadog_tracing::Configuration object?
All of the keys here are in the main config.
There was a problem hiding this comment.
My thought process was that I wanted to provide a way to set service, env, and version without having to also remember to set client-side stats to false. But I think it's just better to accept a datadog_opentelemetry::Config and emphasize that users should set Lambda-specific defaults themselves if they pass in their own.
There was a problem hiding this comment.
You can just ask them to pass a datadog_opentelemetry::ConfigBuilder and then override the specific config fields
There was a problem hiding this comment.
I have changed it so that it uses datadog_opentelemetry::ConfigBuilder directly. Therefore, there is no need for a separate Config structure for now. See 2dbf552
| /// Service name. Overrides `DD_SERVICE`. Ignored when [`tracing`](Self::tracing) is `Some`. | ||
| pub service: Option<String>, | ||
| /// Deployment environment. Overrides `DD_ENV`. Ignored when [`tracing`](Self::tracing) is | ||
| /// `Some`. | ||
| pub env: Option<String>, | ||
| /// Service version. Overrides `DD_VERSION`. Ignored when [`tracing`](Self::tracing) is `Some`. |
There was a problem hiding this comment.
I think we shouldn't expose the public struct like this. Adding config field here would be a breaking change, so I think we should provide a builder style API
There was a problem hiding this comment.
I have changed it so that it uses datadog_opentelemetry::ConfigBuilder directly. Therefore, there is no need for a separate Config structure for now. See 19feab9
| } | ||
| }; | ||
| let typed_event = LambdaEvent::new(typed_payload, event.context); | ||
| let fut = inner.call(typed_event); |
There was a problem hiding this comment.
It looks like you're json deserializing the event payload from a string. Is that correct? Why do we need to do this? Shouldn't the event come to us as an object in the first place?
There was a problem hiding this comment.
We could take LambdaEvent<E> directly, but that would mean deserialization errors would be handled by the Lambda runtime before our code runs, meaning an invocation with a payload that couldn't be deserialized into an E wouldn't be traced at all.
There was a problem hiding this comment.
Okay, that makes sense. What I wonder though is, how often is there going to be an error deserializing event payloads? Probably rarely if never. Were you ever able to produce an error during deserialization? If so, how were you able to? I suspect it's not possible.
There was a problem hiding this comment.
It could happen if someone invoked a Lambda directly with a custom payload that doesn't match the handler's expected type.
A much bigger reason I forgot to mention though is looking ahead at #190 .
Let's say we accept a LambdaEvent<E>.
The lambda runtime will deserialize raw bytes into that E, but now we need to perform trace extraction.
The problem is the inferrer has no clue what E is (since it's designed to in the future to be generic across runtimes), it only knows how to interpret a json payload and match a trigger from that point, which is why the options are to either accept a Value, or a RawValue.
There was a problem hiding this comment.
Okay, that's helpful. And the crate that you and Jordan were working on for trace extraction. It accepts an event. What is the type of that event? Does it take a string?
There was a problem hiding this comment.
These are the 3 methods exposed by the trace-inferrer crate for trace extraction:
pub fn infer_span_from_bytes(&self, payload: &[u8])
pub fn infer_span_from_value(&self, payload: &Value)
pub fn infer_span(&self, payload: &str)
Also to clarify, I didn't work on this crate.
| /// use lambda_runtime::service_fn; | ||
| /// | ||
| /// // Zero-config (Lambda defaults applied automatically) | ||
| /// lambda_runtime::run(WrappedHandler::new(service_fn(my_handler), None)).await |
There was a problem hiding this comment.
I'm curious about this and would like to learn more before I form an opinion.
It is my understanding that for a customer handler function my_handler, you must wrap it with service_fn(my_handler), then start it with lambda_runtime::run.
So, the question is, which wrapper wraps who? Do we wrap the service_fn or does the service_fn wrap us? I see you've chosen the former. I'm curious to learn more about the benefits/drawbacks of each method and why you think this is the preferred option.
There was a problem hiding this comment.
WrappedHandler is a Service that wraps an inner Service. which means we can place tower middleware between tracing and the handler:
let inner = ServiceBuilder::new()
.layer(auth_middleware)
.service(service_fn(my_handler));
lambda_runtime::run(WrappedHandler::new(inner, None)).awaitIf we were to instead let service_fn wrap us, my understanding is the api would look something like this
let traced = datadog_lambda::wrap(my_handler);
lambda_runtime::run(service_fn(traced)).awaitwrap returns a new async function that starts the span, calls the original handler, ends
the span, and flushes. The API is simpler, but because the result is a function, not a Service, you can't insert tower middleware between tracing and the handler:
// wrap(my_handler) is a function, not a Service. So this is not possible
let traced = datadog_lambda::wrap(my_handler);
let inner = ServiceBuilder::new()
.layer(auth_middleware)
.service(???); // traced is a fn, not a ServiceThis is the sole reason I went with the code as it's currently written.
There was a problem hiding this comment.
what's tower middleware and why is it important that we support it?
> **PR Stack:** **#194 (workspace setup)** -> #213 (lambda root span) -> #189 (aws-sdk injection) -> #190 (lambda inferred spans) # What does this PR do? Establishes the `instrumentation/` Cargo workspace that houses Datadog AWS instrumentation crates. This is the foundation PR -- #213, #189, and #190 add the implementations on top. ## Workspace structure ``` instrumentation/ ├── Cargo.toml # shared workspace (resolver = "2", MSRV 1.91.1) ├── datadog-aws-core/ # stub -- internal interceptor machinery (added in #189) ├── datadog-aws-sqs/ # stub -- SqsInterceptor (added in #189) ├── datadog-aws-sns/ # stub -- SnsInterceptor (added in #189) ├── datadog-aws-eventbridge/ # stub -- EventBridgeInterceptor (added in #189) └── datadog-aws-lambda/ # distributed tracing for Rust Lambda functions (added in #213, #190) ```
Implements the core Lambda handler wrapper with Datadog tracing: - WrappedHandler: tower::Service that wraps user handlers with OTel spans - LambdaSpan: aws.lambda root span with cold_start, request_id, function metadata - Invocation lifecycle: start/handler_context/finish with error recording - Config: service/env/version or full DatadogTracingBuilder control - Lambda-appropriate OTel defaults (sync writes, no client-side stats) Trigger extraction and inferred spans will follow in a subsequent PR.
…ervice - Change OTel span name from tracer scope name to "aws.lambda" - Remove redundant "language" tag - Remove logging from LambdaSpan (error info captured in span attributes) - Accept tower::Service instead of Fn for inner handler, enabling middleware composition inside the traced span - Replace custom Config struct with Option<datadog_opentelemetry::Config>, applying Lambda defaults (stats disabled, sync writes) when None
- Verify payload flows through WrappedHandler to inner service and back - Verify tower middleware composed between tracing and handler executes
…as and apply nightly fmt
c2f0ee8 to
2cd84c3
Compare
…ction dependency" This reverts commit ff2cb0f.
|
|
| S: Service<LambdaEvent<E>, Response = R, Error = lambda_runtime::Error> | ||
| + Clone | ||
| + Send | ||
| + 'static, |
There was a problem hiding this comment.
I think the bound on the error should be something like
`S::Error: Into + Debug, instead of lambda_runtime::Error, to match the signature of lambda_runtime::run
| if let Err(err) = provider.force_flush() { | ||
| tracing::error!("flush failed: {err}"); | ||
| } |
There was a problem hiding this comment.
this should not be necessary if the tracer is in synchronous mode
| } | ||
| }; | ||
| let typed_event = LambdaEvent::new(typed_payload, event.context); | ||
| let fut = inner.call(typed_event); |
There was a problem hiding this comment.
inner.call should be wrapper in the invocation.handler_context()
| let fut = inner.call(typed_event); | |
| let fut = { | |
| let _ctx = invocation.handler_context().attach(); | |
| inner.call(typed_event) | |
| }; |
There was a problem hiding this comment.
Thanks! Fixed in 817e9ea. to ensure that the sync part is within the same invocation span and capture the same active context for the future execution.
| // Copyright 2025-Present Datadog, Inc. https://www.datadoghq.com/ | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| pub(crate) const OPERATION_NAME: &str = "operation.name"; |
There was a problem hiding this comment.
TODO: We need to double check this. When this is operation_name then the invocation doesn't show up on the serverless page.
a19ea9f to
1154f95
Compare
1154f95 to
70f982c
Compare
Remove the local config wrapper and accept Datadog's ConfigBuilder directly for customized tracing setup. Add a zero-config WrappedHandler::new constructor and move the explicit builder path to WrappedHandler::with_config. Force the Lambda-required tracing defaults internally, clean up Datadog/OpenTelemetry imports, and fix the WrappedHandler rustdoc examples to be rendered as ignored examples instead of failing doctests.
WrappedHandler was too generic, and the type's actual contract is a tower::Service over LambdaEvent rather than a handler function. Rename it to TracedService to better reflect both its tracing behavior and its service-based API, and update the docs/examples accordingly, including setting version in the config example.
…ambda_runtime TracedService previously required inner service errors to convert into lambda_runtime::Error, which was narrower than lambda_runtime::run. Relax the bound to Into<lambda_runtime::Diagnostic> + Debug and introduce TracedServiceError to normalize wrapped service errors and local deserialization failures into a single outer error type that is compatible with both Lambda diagnostics and invocation span reporting.
With synchronous trace writes enabled, the Datadog exporter already waits for the completed trace chunk to flush when the root span ends. Remove the extra provider.force_flush() calls, drop the now-redundant stored SdkTracerProvider from TracedService, and update the comment to describe the actual flush behavior. Also add a TODO in Cargo.toml to drop the test-utils feature from the production datadog-opentelemetry dependency once ConfigBuilder::set_trace_writer_synchronous_write is ungated upstream.
…rvice call Attach the invocation context around inner.call(...) and use with_current_context() so both the synchronous call phase and the returned future run under the same active Lambda invocation context. Add a regression test covering services that inspect the active span in call().
Added a TODO into |
| datadog-opentelemetry = { version = "0.3", path = "../../datadog-opentelemetry" } | ||
| # TODO: Drop `test-utils` from this production dependency once | ||
| # `ConfigBuilder::set_trace_writer_synchronous_write` is ungated upstream. | ||
| datadog-opentelemetry = { version = "0.3", path = "../../datadog-opentelemetry", features = ["test-utils"] } |
There was a problem hiding this comment.
Will we need to remove the path before releasing this?
There was a problem hiding this comment.
There is no need to remove the path. Cargo uses the local path for development and the version when this crate is published. The version will also remain at 0.3, meaning it's the latest published version in the [0.3.0, 0.4.0) interval.
# What does this PR do? Make `ConfigBuilder::set_trace_writer_synchronous_write` and related `set_trace_writer_synchronous_timeout` available outside test-utils so short-lived AWS Lambda invocations can force trace flushes before freeze. # Motivation This needs to be a public setting for [the datadog-aws-lambda instrumentation](#213). # Additional Notes See #213 (comment)
dependency This is possible since #221 has been merged to the main branch. Keep test-only functionality out of the runtime dependency graph. Refresh Cargo.lock for the updated dependency resolution.
This was addressed once #221 was merged into the main branch. |
What does this PR do?
Implements the core Lambda handler wrapper for
datadog-aws-lambda. Each invocation is automatically instrumented with anaws.lambdaroot span carrying cold_start, request_id, function metadata, and error recording.This PR intentionally does not include trigger extraction or inferred spans. Those are layered on in
#190#219 with minimal API surface changes.Usage
What's included
TracedService-tower::Servicethat wraps lambda handler with OTel tracingLambdaSpan-aws.lambdaroot span withcold_start,request_id,function_arn,function_version,functionname,_dd.origin=lambdaInvocation- start/handler_context/finish lifecycle with error recordingConfig-service/env/versionor fullDatadogTracingBuildercontrolMotivation
This PR establishes the root invocation tracing that
#190#219 builds inferred spans on top of.Ref: #221