@newlinedeveloper

Description

Fixes an issue where retrying a CloudFormation deployment that uses a custom resource with an async waiter fails with an ExecutionAlreadyExists error.

Root Cause

The custom resource provider framework uses CloudFormation's RequestId as the Step Functions execution name when starting the waiter state machine. When CloudFormation retries a failed deployment, it reuses the same RequestId. Because Step Functions requires execution names to be unique per state machine for 90 days, the retry attempts fail with ExecutionAlreadyExists.
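The collision described above can be reproduced with a small in-memory sketch. `FakeStateMachine` is a hypothetical stand-in, not the AWS SDK client, and the RequestId value is made up. It also simplifies the real service, which may treat a repeated name with identical input on a still-running execution as idempotent rather than as an error.

```typescript
// Hypothetical in-memory stand-in for a state machine; NOT the AWS SDK client.
class FakeStateMachine {
  private readonly usedNames = new Set<string>();
  private generated = 0;

  // A caller-supplied name must be unused; an omitted name is generated.
  // (The real service generates a UUID; a counter is enough for this sketch.)
  startExecution(params: { name?: string; input: string }): string {
    const executionName = params.name ?? `generated-${++this.generated}`;
    if (this.usedNames.has(executionName)) {
      throw new Error(`ExecutionAlreadyExists: ${executionName}`);
    }
    this.usedNames.add(executionName);
    return executionName;
  }
}

// Made-up CloudFormation event for illustration.
const cfnEvent = { RequestId: "req-1234", RequestType: "Create" };
const fakeWaiter = new FakeStateMachine();

// First deployment attempt: the name is free, so the execution starts.
const firstAttempt = fakeWaiter.startExecution({
  name: cfnEvent.RequestId,
  input: JSON.stringify(cfnEvent),
});

// Retry with the same RequestId: the name is already taken, so the call throws.
let retryError = "";
try {
  fakeWaiter.startExecution({
    name: cfnEvent.RequestId,
    input: JSON.stringify(cfnEvent),
  });
} catch (e) {
  retryError = (e as Error).message;
}

console.log(firstAttempt, retryError);
```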

Solution

Removed the name parameter from the startExecution call, allowing Step Functions to auto-generate unique execution names. This follows the AWS Step Functions StartExecution API Reference, in which the name parameter is optional: if it is not provided, Step Functions automatically generates a universally unique identifier (UUID) as the execution name.
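As a rough before/after sketch of the request parameters: the interfaces, function names, and ARN below are hypothetical and trimmed down for illustration, and the actual code in framework.ts may be shaped differently.

```typescript
// Hypothetical, trimmed-down shapes; not the real SDK or framework types.
interface WaiterExecutionParams {
  stateMachineArn: string;
  name?: string; // optional in the StartExecution API
  input: string;
}

interface CfnRequest {
  RequestId: string;
  RequestType: string;
}

// Before: the execution name is pinned to the CloudFormation RequestId.
function paramsBeforeFix(arn: string, req: CfnRequest): WaiterExecutionParams {
  return { stateMachineArn: arn, name: req.RequestId, input: JSON.stringify(req) };
}

// After: no name is passed, so Step Functions assigns one itself.
function paramsAfterFix(arn: string, req: CfnRequest): WaiterExecutionParams {
  return { stateMachineArn: arn, input: JSON.stringify(req) };
}

// Made-up ARN and request for illustration.
const exampleArn = "arn:aws:states:us-east-1:123456789012:stateMachine:Waiter";
const exampleReq: CfnRequest = { RequestId: "req-1234", RequestType: "Update" };

console.log(paramsBeforeFix(exampleArn, exampleReq));
console.log(paramsAfterFix(exampleArn, exampleReq));
```

The RequestId still travels inside the serialized input, so it remains available to the waiter even though it no longer names the execution.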

Changes

  • Removed name: resourceEvent.RequestId from the waiter state machine execution call in framework.ts
  • Updated the log statement to remove the name field
  • Added a unit test to verify that name is not included in the startExecution call

Testing

  • Added the unit test "waiter state machine execution does not include name field (allows retries)" to verify the fix
  • All existing unit tests pass
  • Verified that the mock assertion checks that name is undefined
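The shape of that check can be sketched without a test framework. Everything here is a hypothetical stand-in: the recording stub replaces the repository's mocks, `startWaiterSketch` replaces the framework code, and the call is synchronous for brevity even though the real startExecution call is asynchronous.

```typescript
// Hypothetical argument shape; not the real SDK type.
type WaiterStartArgs = { stateMachineArn: string; name?: string; input: string };

// Recording stub standing in for the mocked startExecution call.
const recordedStarts: WaiterStartArgs[] = [];
function startExecutionStub(args: WaiterStartArgs): void {
  recordedStarts.push(args);
}

// Simplified stand-in for the framework code that starts the waiter.
function startWaiterSketch(
  start: (args: WaiterStartArgs) => void,
  waiterArn: string,
  resourceEvent: { RequestId: string },
): void {
  // No name is passed; only the ARN and the serialized event.
  start({ stateMachineArn: waiterArn, input: JSON.stringify(resourceEvent) });
}

startWaiterSketch(startExecutionStub, "arn:example:waiter", { RequestId: "req-1234" });

// The assertion the unit test makes: exactly one call, and no name on it.
const nameWasOmitted =
  recordedStarts.length === 1 && recordedStarts.every((call) => call.name === undefined);
console.log(nameWasOmitted);
```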

Related Issue

Fixes #35957

Verification

The fix was verified by:

  1. Running unit tests to ensure the name field is not included
  2. Confirming that existing tests continue to pass
  3. Confirming that the change aligns with AWS Step Functions best practices for execution naming

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions bot added the beginning-contributor, bug, effort/medium, and p1 labels on Nov 8, 2025
@aws-cdk-automation requested a review from a team on November 8, 2025 11:14
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

(This review is outdated)

@newlinedeveloper
Author

Exemption Request

This fix is in runtime code (Lambda function execution) and does not change CloudFormation templates or infrastructure. The existing integration tests verify infrastructure creation, which is unaffected by this change. Unit tests provide comprehensive coverage of the runtime behavior change.

@aws-cdk-automation added the pr-linter/exemption-requested label on Nov 8, 2025
@newlinedeveloper force-pushed the fix/custom-resources-waiter-retry-execution-name branch from 2a0d935 to 6d329d8 on November 8, 2025 13:29
@vvigilante

Alternatively, we could forward the request ID from the Lambda invocation. That should never repeat.
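A minimal sketch of that alternative, assuming the handler has access to the Lambda context object, whose awsRequestId property identifies the invocation in the Node.js runtime. The function name and shapes are hypothetical, and whether the ID is unique across every retry path is the assumption stated in the comment above, not something verified here.

```typescript
// Minimal subset of the Lambda context object used by this sketch.
interface LambdaContextLike {
  awsRequestId: string;
}

// Hypothetical helper: name the execution after the Lambda invocation
// instead of the CloudFormation RequestId.
function paramsWithLambdaRequestId(
  arn: string,
  resourceEvent: { RequestId: string },
  context: LambdaContextLike,
): { stateMachineArn: string; name: string; input: string } {
  return {
    stateMachineArn: arn,
    name: context.awsRequestId,
    input: JSON.stringify(resourceEvent),
  };
}

// Two invocations handling the same CloudFormation RequestId (made-up values).
const sameCfnRequest = { RequestId: "req-1234" };
const firstInvocation = paramsWithLambdaRequestId("arn:example:waiter", sameCfnRequest, {
  awsRequestId: "invoke-a",
});
const secondInvocation = paramsWithLambdaRequestId("arn:example:waiter", sameCfnRequest, {
  awsRequestId: "invoke-b",
});

console.log(firstInvocation.name, secondInvocation.name);
```

Compared with omitting the name, this keeps a traceable link between the execution and the Lambda invocation that started it.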

@Abogical self-assigned this on Nov 12, 2025
@Abogical
Member

Abogical commented Nov 13, 2025

I have confirmed that this PR fixes the issue.

@Abogical added the pr-linter/exempt-integ-test and pr-linter/exemption-requested labels and removed the pr-linter/exemption-requested label on Nov 13, 2025
@aws-cdk-automation dismissed their stale review on November 13, 2025 14:57

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

@mrgrain
Contributor

mrgrain commented Nov 17, 2025

Integration test failures are expected due to the changed asset. They are not caused by the new integ-runner engine. You'll need to work with your PR reviewer to update all snapshots. For framework changes like this, I'd typically recommend that a CDK team member do this for you.

Development

Successfully merging this pull request may close these issues.

CustomResource Provider: WaiterStateMachine can't start when stack deployment is retried
