Skip to content

feat: API Gateway Tracing #3095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 47 commits into from
Closed

Conversation

PROFeNoM
Copy link
Contributor

@PROFeNoM PROFeNoM commented Feb 19, 2025

Description

This PR adds the possibility to create a new span on top of the existing request span based on received headers. This feature is opt-in through DD_TRACE_INFERRED_PROXY_SERVICES_ENABLED.

The newly created span (the "inferred span") will be a root span. The original request span will also be a root span. The latter will be child of the former.

Runtime information (e.g., memory peak) will be set on the inferred span. It won't be set on both spans.

While the inferred span being the root span of the stack, it won't be returned by using DDTrace\root_span. Instead, the child will be returned to preserve the current behavior. The inferred span can still be accessed by using the parent property.

Errors set on the original request span will be set on the inferred span as well.

Implementation Doc: link

Telemetry PR: https://github.com/DataDog/dd-go/pull/171137

image

Reviewer checklist

  • Test coverage seems ok.
  • Appropriate labels assigned.

'A simple GET request returning a string',
'/simple?key=value&pwd=should_redact',
[
'x-dd-proxy: aws.apigateway',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way we implemented it in other languages was to have a map w/the header value of aws-apigateway to map to a span name value of aws.apigateway. We want the header value to remain unchanged and we want the span name to be slightly different which can be represented by a key value map.

That way we can add more values in the future if we have more proxy headers that correspond to different operation names

Copy link

@zarirhamza zarirhamza Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to this as an example

Note, they also include the component value in the map which would change for future proxies as well that may not necessarily be equal to the proxy header

ext/ddtrace.c Outdated
UNUSED(old_value, new_str);

ddtrace_span_properties *pspan = ddtrace_active_span_props();
while (pspan) {
if (skip_inferred) {
ddtrace_span_data *span = SPANDATA(pspan);
if (span && span->type == DDTRACE_INFERRED_SPAN) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (span && span->type == DDTRACE_INFERRED_SPAN) {
if (span->type == DDTRACE_INFERRED_SPAN) {

Nit, but SPANDATA() is just an offset calculation from pspan and will never be NULL.

Comment on lines 23 to 26
ddtrace_proxy_info *info;
ZEND_HASH_FOREACH_PTR(&proxy_info_map, info) {
efree(info);
} ZEND_HASH_FOREACH_END();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the idiomatic way would be using:

static dd_proxy_info_dtor(zval *zv) {
    efree(Z_PTR_P(zv));
}

and passing it to zend_hash_init as destructor.

}

void ddtrace_init_proxy_info_map(void) {
zend_hash_init(&proxy_info_map, 8, NULL, NULL, 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 0 as last parameter right? persistent=1 is for memory supposed to survive the request cycle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, indeed. I initially wanted to do the initialization/free in MINIT/MSHUTDOWN, but faced issues. Nice catch, ty.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be in minit/mshutdown though. No point in reinitializing that every request.

ext/serializer.c Outdated
Comment on lines 1191 to 1195
} else if (span->std.ce == ddtrace_ce_root_span_data) {
ddtrace_root_span_data *root = ROOTSPANDATA(&span->std);
if (root->child_root) { // Can't check for DDTRACE_INFERRED_SPAN because the span is now closed
zval *child_exception_zv = &root->child_root->span.property_exception;
has_exception = Z_TYPE_P(child_exception_zv) == IS_OBJECT && instanceof_function(Z_OBJCE_P(child_exception_zv), zend_ce_throwable);
Copy link
Collaborator

@bwoebi bwoebi Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this before the if (has_exception) above?

// inherit exception on inferred span from child root
if (!has_exception && span->std.ce == ddtrace_ce_root_span_data) {
    ddtrace_root_span_data *root = ROOTSPANDATA(&span->std);
    if (root->child_root) { // Can't check for DDTRACE_INFERRED_SPAN because the span is now closed
        exception_zv = &root->child_root->span.property_exception;
        has_exception = Z_TYPE_P(child_exception_zv) == IS_OBJECT && instanceof_function(Z_OBJCE_P(child_exception_zv), zend_ce_throwable);
    }
}

The duplication was confusing me

ext/span.h Outdated
@@ -86,6 +87,7 @@ struct ddtrace_span_data {
bool notify_user_req_end;
struct ddtrace_span_data *next;
struct ddtrace_root_span_data *root;
bool is_child_of_inferred_span;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have this property at all? Isn't it redundant with child_root != NULL on the root_span_data?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see only one usage of it, and it's trivially replaced.

ext/span.h Outdated
@@ -107,6 +109,7 @@ struct ddtrace_root_span_data {
ddtrace_rule_result sampling_rule;
bool explicit_sampling_priority;
enum ddtrace_trace_limited trace_is_limited;
struct ddtrace_root_span_data *child_root; // Only used when inferring proxy services (type: DDTRACE_INFERRED_SPAN)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's rename it to inferred_root

@PROFeNoM PROFeNoM marked this pull request as ready for review February 27, 2025 09:25
@PROFeNoM PROFeNoM requested review from a team as code owners February 27, 2025 09:25
if (!proxy_info) {
zend_string_release(result.system);
zend_string_release(result.start_time_ms);
return NULL;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're leaking the other strings in result here, there are more than two.

Comment on lines +31 to +34
read_header((zai_str)ZAI_STRL("X_DD_PROXY_PATH"), "x-dd-proxy-path", &result.path, data);
read_header((zai_str)ZAI_STRL("X_DD_PROXY_HTTPMETHOD"), "x-dd-proxy-httpmethod", &result.http_method, data);
read_header((zai_str)ZAI_STRL("X_DD_PROXY_DOMAIN_NAME"), "x-dd-proxy-domain-name", &result.domain, data);
read_header((zai_str)ZAI_STRL("X_DD_PROXY_STAGE"), "x-dd-proxy-stage", &result.stage, data);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should probably put this in an if (result.system || result.path) to short-circuit this away.

ext/span.c Outdated
ZVAL_STR(&span->property_resource, strpprintf(0, "%s %s", ZSTR_VAL(result.http_method), ZSTR_VAL(result.path)));
}

span->start = (zend_long)zend_strtod(ZSTR_VAL(result.start_time_ms), NULL) * 1000000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fishy. Either you should directly parse it as a long, or you need parenthesis (if the number is actually a decimal number). The (zend_long) cast is evaluated before the multiplication.
If the goal is string to zend_long, use ZEND_ATOL().

@bwoebi
Copy link
Collaborator

bwoebi commented Feb 27, 2025

How is this supposed to interact with sampling decisions and distributed tracing?
Currently it's going to use the api gateway root span for sampling decisions (e.g. DD_TRACE_SAMPLING_RULES is going to match against the inferred root rather the "normal" root). Is that intentional? Also, it's two root spans, whose sampling wins? Especially when manually setting sampling decisions...
Same for distributed tracing, where are the propagated root tags to be taken from?

Also: ddtrace_get_root_span() is the function appsec uses to attach sampling info and root tags to. Is that supposed to be the inferred span or the proper root?


if (result.domain) {
ZVAL_STR_COPY(&zv, result.domain);
ddtrace_assign_variable(&span->property_service, &zv);
Copy link
Collaborator

@bwoebi bwoebi Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this if (result.domain) rather a safeguard or something which will regularly happen? Because if it can happen that the domain is not set, the inferred span will have the generic service name (http.request), while frameworks will have the service name assigned by the integrations on the root span. Which will certainly be unexpected.

@PROFeNoM
Copy link
Contributor Author

PROFeNoM commented Feb 27, 2025

If I refer to the implementation doc, it would appear that it is expected for the sampling rules to be matched against the new, inferred root span. I assume this is intentional.


About the other questions, I don't know. At a glance, I am more concerned about the appsec question, as this is an internal behavior that the customer won't have power upon at all. I'd have to ask to know whether a decision was already made.

@bwoebi
Copy link
Collaborator

bwoebi commented Feb 27, 2025

@PROFeNoM Could you please ask whether that's supposed to affect all metadata (like, "tags" or "resource"), or only the service name? Because if it's the former it will be impossible to define sampling rules according to runtime behaviour (i.e. "if this tag is set, then apply only this reduced sampling rate), essentially.

@PROFeNoM PROFeNoM marked this pull request as draft February 28, 2025 08:50
@PROFeNoM
Copy link
Contributor Author

PROFeNoM commented Mar 4, 2025

Superseded by #3116

@PROFeNoM PROFeNoM closed this Mar 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants