feat: [SVLS-9168] add aws.durable.operation_attempt tag to durable operations span #18191
lym953 wants to merge 5 commits
Add aws.durable.operation_retry_attempt to aws.durable.step and aws.durable.wait_for_condition spans, sourced from the SDK's StepDetails.attempt checkpoint field. 0 = original attempt, N = Nth retry. Set as a numeric metric so it supports range queries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Benchmarks
Benchmark execution time: 2026-05-20 21:44:25
Comparing candidate commit c363d36 in PR branch
Found 0 performance improvements and 4 performance regressions! Performance is the same for 372 metrics, 9 unstable metrics.
- scenario:httppropagationinject-ids_only
- scenario:span-start
- scenario:telemetryaddmetric-1-count-metric-1-times
- scenario:tracer-small
…try_attempt

Match the style of the other span-attribute setters in the same subscriber. The wire result is identical: _set_attribute dispatches by value type, so an int still lands in the metrics dict as a float.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
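The dispatch behavior this commit relies on can be pictured with a toy model. The sketch below is not ddtrace's implementation: `ToySpan`, its `set_attribute` method, and the `example.string_tag` key are hypothetical stand-ins, written only to show how a type-dispatching setter could route an `int` into a metrics dict as a `float`.

```python
from typing import Dict, Union


class ToySpan:
    """Hypothetical stand-in for a span; not the real ddtrace Span."""

    def __init__(self) -> None:
        self.meta: Dict[str, str] = {}       # string tags
        self.metrics: Dict[str, float] = {}  # numeric tags

    def set_attribute(self, key: str, value: Union[str, int, float]) -> None:
        # bool is a subclass of int, so keep it out of the numeric path.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            self.metrics[key] = float(value)
        else:
            self.meta[key] = str(value)


span = ToySpan()
span.set_attribute("aws.durable.operation_retry_attempt", 2)
span.set_attribute("example.string_tag", "hello")
```

Under this assumption, the attempt value is stored as `2.0` in `metrics`, which is what makes numeric range queries possible, while the string value goes to `meta`.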
…-indexed in prod

The AWS Lambda Durable service reports `step_details.attempt` 1-indexed (1 for the first attempt, 2 after the first retry), not 0-indexed like the SDK's own documented semantic. Subtract 1 (clamped to 0) so the tag emits the user-facing retry count: 0 for the original attempt, 1 for the first retry, etc.

Updates the test_step_with_retry snapshot: the retry-success span's value drops from 1 to 0 because the SDK testing framework already returns step_details.attempt as the retry count directly. The test framework and prod disagree on the semantics; the AIDEV-NOTE in patch.py captures the discrepancy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ttempt

Rename `aws.durable.operation_retry_attempt` to `aws.durable.operation_attempt`. Semantics remain 0-indexed: 0 = original attempt, 1 = first retry, etc.

- TAG_OPERATION_RETRY_ATTEMPT renamed to TAG_OPERATION_ATTEMPT
- AwsDurableOperationEvent field renamed accordingly
- Snapshot keys updated; values unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…indexed

Match the AWS UI's attempt-count convention: 1 = original attempt, 2 = first retry, etc. Pass step_details.attempt through directly (it's already 1-indexed in production); default to 1 when no checkpoint exists yet, and clamp with max(1, …) to handle the SDK testing framework's 0-indexed values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c363d36cfd
    if isinstance(event, AwsDurableOperationEvent) and event.operation in _RETRYABLE_OPERATIONS:
        operation = checkpoint.operation
        if operation is not None and operation.step_details is not None:
            event.operation_attempt = max(1, operation.step_details.attempt)
Convert retry attempt to 1-based before tagging
For retrying step/wait_for_condition operations, using max(1, operation.step_details.attempt) collapses attempts 0 and 1 to the same value (1) when the SDK reports attempts as 0-based, so the first retry is indistinguishable from the initial attempt. This directly undermines the new tag’s stated purpose of distinguishing retries (1, 2, 3, ...); in practice you can already see both attempts tagged 1 in the updated test_step_with_retry snapshot. Consider converting with attempt + 1 (or otherwise normalizing by source) instead of clamping with max.
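To make the review comment concrete, here is a minimal sketch comparing the clamp in the diff with a source-aware normalization. `AttemptSource`, `clamp_attempt`, and `normalize_attempt` are hypothetical names introduced for illustration; the PR does not show any way for the subscriber to know which source produced a value, so that input is an assumption.

```python
from enum import Enum


class AttemptSource(Enum):
    """Hypothetical label for where step_details.attempt came from."""

    PRODUCTION = "production"      # described in this PR as 1-indexed
    TESTING_FRAMEWORK = "testing"  # described in this PR as 0-indexed


def clamp_attempt(raw: int) -> int:
    """The approach in the reviewed diff: pass through, floor at 1."""
    return max(1, raw)


def normalize_attempt(raw: int, source: AttemptSource) -> int:
    """Alternative: shift only 0-indexed sources so every value is 1-indexed."""
    if source is AttemptSource.TESTING_FRAMEWORK:
        return raw + 1
    return max(1, raw)


zero_indexed = [0, 1, 2]  # original attempt, first retry, second retry
clamped = [clamp_attempt(a) for a in zero_indexed]
normalized = [
    normalize_attempt(a, AttemptSource.TESTING_FRAMEWORK) for a in zero_indexed
]
print(clamped)     # [1, 1, 2]
print(normalized)  # [1, 2, 3]
```

With 0-indexed input, the clamp maps both the original attempt and the first retry to `1`, while the shifted version keeps them distinct. For 1-indexed input both functions return the value unchanged.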
Description
Add `aws.durable.operation_attempt` to `aws.durable.step` and `aws.durable.wait_for_condition` spans. The value is the 1-indexed attempt number, matching the AWS UI's own attempt-count convention:

- `1`: original attempt
- `2`: first retry (second attempt)
- `3`: second retry (third attempt)

Sourced from the Durable Execution SDK's `StepDetails.attempt` field in the operation checkpoint (which the AWS Lambda Durable service reports 1-indexed). When no checkpoint exists yet (the very first execution before the START checkpoint), the tag defaults to `1`.

This tag will be used by the UI to display attempt count and group attempts for the same operation.
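The defaulting rule described above can be sketched as a small pure function. The `StepDetails` and `Operation` dataclasses below are simplified stand-ins for the SDK's checkpoint types (only the fields named in this PR are modeled), and `resolve_operation_attempt` is a hypothetical helper, not a function from `patch.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StepDetails:
    """Stand-in for the SDK type; only the attempt field is modeled."""

    attempt: int


@dataclass
class Operation:
    """Stand-in for a checkpointed operation; step_details may be absent."""

    step_details: Optional[StepDetails] = None


def resolve_operation_attempt(operation: Optional[Operation]) -> int:
    """Return the 1-indexed attempt number for the span tag."""
    # No checkpoint yet (first execution before the START checkpoint):
    # this must be the original attempt.
    if operation is None or operation.step_details is None:
        return 1
    # Pass the service value through, never reporting less than 1.
    return max(1, operation.step_details.attempt)
```

For example, `resolve_operation_attempt(None)` yields `1`, and an operation whose checkpoint carries `attempt=3` yields `3`.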
Testing
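Installed the tracer on a durable function and invoked it. The tag shows up for the `aws.durable.step` span. (link)

![image](https://private-user-images.githubusercontent.com/7014871/591823424-46a2d2cd-1382-4f10-9f1f-be9ddb53b541.png?jwt=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3NzkzMjY5MzMsIm5iZiI6MTc3OTMyNjYzMywicGF0aCI6Ii83MDE0ODcxLzU5MTgyMzQyNC00NmEyZDJjZC0xMzgyLTRmMTAtOWYxZi1iZTlkZGI1M2I1NDEucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI2MDUyMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNjA1MjFUMDEyMzUzWiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9M2ZmODYzMmNlOTZlNjk1NjhkMTQxZWFmZDUxZDU3MTZkZDllMjQzZGE4MWIzMzk1NTZiMGIwNjllMTE4NTY3YSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QmcmVzcG9uc2UtY29udGVudC10eXBlPWltYWdlJTJGcG5nIn0.sd90xYNt5ebr_46n3N60s_zuJFmp9KQZExTuVCbyi2k)

Why only `step` and `wait_for_condition` spans?
The SDK has six `OperationExecutor` subclasses; only two use the `StepDetails.attempt` retry mechanism:

| Executor | Details field |
| --- | --- |
| `StepOperationExecutor` | `step_details` |
| `WaitForConditionOperationExecutor` | `step_details` (polling iterations) |
| `CallbackOperationExecutor` | `callback_details` |
| `InvokeOperationExecutor` | `chained_invoke_details` |
| `ChildOperationExecutor` | |
| `WaitOperationExecutor` | `wait_details` |

`wait_for_callback` isn't its own executor; it's a helper that internally calls `create_callback` + `step`, so any retries appear on the inner `aws.durable.step` child span (already covered). `map` and `parallel` also have no executor; their work is decomposed into MAP_ITERATION / PARALLEL_BRANCH child operations, which `_is_top_level_for_span` filters out anyway.

Note on test snapshots vs. production
The SDK's `aws_durable_execution_sdk_python_testing` framework reports `step_details.attempt` 0-indexed (matching the SDK's documented semantic for "completed prior attempts"), while the production AWS Lambda Durable service reports it 1-indexed (matching the AWS UI). The code passes the value through directly, with `max(1, …)` guarding the case where the test framework yields 0. As a result, test snapshots show `1` for both the original attempt and the first retry; they verify tag presence, not the distinction between attempt numbers. See the `AIDEV-NOTE` in `patch.py`. Production traces correctly distinguish the attempts (verified end-to-end on a deployed Lambda).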