
Add Azure Managed on-demand sandbox preview APIs #151

Merged
berndverst merged 60 commits into main from wangbill/serverless-op1 on Jun 18, 2026
Conversation

@YunchuWang YunchuWang commented Jun 9, 2026

Member

The scenario

A Python app using Durable Task Scheduler should be able to keep its normal orchestration worker local, but run selected activities in a DTS-started sandbox worker container.

The intended flow is the same shape as the .NET preview:

  1. The local app registers orchestrators and local activities with the existing Azure Managed worker/client APIs.
  2. The local app declares a sandbox worker profile: activity names, container image, provider kind, CPU/memory, managed identities, environment variables, and concurrency.
  3. DTS stores that declaration and later starts a sandbox worker container from the declared image.
  4. The sandbox worker process imports the preview sandbox package, connects back to DTS, registers the activities it can run, and reports activity capacity through the sandbox service.
  5. DTS routes only the declared sandbox activity work to that live sandbox worker.

In the sample, the main Python app owns the orchestration and local work; the separate sandbox worker image owns the remote activity implementation.
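
The split described above can be modeled in a few lines of plain Python. This is only an illustrative sketch and does not use the preview package: `WorkerProfile`, `route_activity`, the field names, and the image reference are hypothetical stand-ins that mirror the profile fields listed in step 2 and the routing rule in step 5.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerProfile:
    """Hypothetical stand-in for a declared sandbox worker profile."""

    name: str
    activity_names: frozenset[str]
    image: str                          # single full container image reference
    cpu: str = "1"                      # assumed Kubernetes-style quantity
    memory: str = "2048Mi"              # assumed Kubernetes-style quantity
    max_concurrent_activities: int = 100
    environment: dict[str, str] = field(default_factory=dict)


def route_activity(activity_name: str, profiles: list[WorkerProfile]) -> str:
    """Return the profile that should run the activity, or "local"."""
    for profile in profiles:
        if activity_name in profile.activity_names:
            return profile.name
    # Anything not declared in a sandbox profile stays on the local worker.
    return "local"


profiles = [
    WorkerProfile(
        name="sandbox-default",
        activity_names=frozenset({"process_untrusted_input"}),
        image="example.azurecr.io/sandbox-worker:1.0",  # placeholder image
    )
]

print(route_activity("process_untrusted_input", profiles))  # sandbox-default
print(route_activity("send_email", profiles))               # local
```

In the real flow the routing decision is made by Durable Task Scheduler, not by the app; the sketch only shows which information the declaration has to carry for that decision to be possible.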

What's missing

The Python preview surface needed to line up with the renamed protobuf and .NET SDK surface.

The earlier naming still exposed on_demand_sandbox concepts and substrate terminology. That made the Python API read differently from the shared contract and from the .NET packages, even though all three are describing the same DTS sandbox worker model.

Reviewers also needed one clear place to understand the Python shape: which package an app imports, which API declares profiles, which worker type runs in the sandbox image, and how the generated protobuf names map to the public API.

The change

The preview package is now durabletask.azuremanaged.preview.sandboxes.

A good review path is:

  1. Start with durabletask-azuremanaged/durabletask/azuremanaged/internal/sandbox_service_pb2*. These files are regenerated from sandbox_service.proto and expose SandboxActivities, SandboxProviderKind, and sandbox_provider.
  2. Read durabletask-azuremanaged/durabletask/azuremanaged/preview/sandboxes/. This is the public preview API: SandboxActivitiesClient, SandboxWorker, SandboxWorkerProfileOptions, and sandbox_worker_profile.
  3. Read examples/sandboxes/ last. It shows the intended app split between the main orchestration process and the sandbox worker process, using the renamed DTS_SANDBOX_* settings.

The old preview.on_demand_sandbox package and old example path are replaced by the sandboxes naming so the Python PR matches the protobuf and .NET PRs.

YunchuWang and others added 15 commits May 22, 2026 10:37
- Introduced new protobuf definitions for serverless activities in `serverless_activities_service_pb2.py`.
- Added type hints and interface definitions in `serverless_activities_service_pb2.pyi`.
- Implemented gRPC service methods in `serverless_activities_service_pb2_grpc.py`.
- Created a Dockerfile for building a remote worker image for serverless activities.
- Developed a declarer application (`main_app.py`) to register serverless activities and start orchestrations.
- Implemented a remote worker (`remote_worker.py`) to execute activities in a serverless environment.
- Added a README for the serverless example, detailing setup and usage instructions.
- Created unit tests for serverless extension functionalities in `test_serverless_extension.py`.
Rename the Serverless activity RPC surface and messages to OnDemandSandbox equivalents across the proto, generated pb2/pb2_grpc stubs, and client usage. Add CPU and memory normalization/validation helpers to client (kubernetes-style quantities parsed using Decimal) and wire those into activity declaration construction. Remove DefaultAzureCredential usage in the worker (token credential set to None). Update examples, tests, and markdown tooling (enable front-matter in .pymarkdown.json), and add a new .github agent doc. Generated protobuf/grpc files and version/warning text were updated to reflect the renamed package and expected grpc tool/runtime versions.
Stop relying on an "accepted" response flag for serverless worker registration. The proto removed the accepted field from OnDemandSandboxActivityWorkerSessionResult and the generated _pb2.py/_pb2.pyi were updated accordingly. The worker now simply invokes connect_serverless_activity_worker and relies on gRPC status/transport behavior instead of checking a boolean, and the changelog documents this change. (Regenerated protobuf offsets in pb2 file as part of the proto change.)
Move the Azure Managed on-demand sandbox SDK implementation under the preview ondemand_sandbox package while keeping the legacy serverless import path as a compatibility shim. Update the sample and tests to use the new canonical API names.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the legacy serverless extension package and place the Azure Managed on-demand sandbox APIs under durabletask.azuremanaged.preview.on_demand_sandbox. Also remove the unrelated elementary PR teacher agent file from the branch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restore unrelated repository configuration and example index changes, and remove the unused azuremanaged extensions package stub so the PR only carries on-demand sandbox preview changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove unrelated pymarkdown, Makefile, and example index changes from the on-demand sandbox PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add lifecycle hooks around activity execution and adapt the on-demand sandbox worker to use them. TaskHubGrpcWorker now defines _durabletask_on_activity_execution_started/completed and invokes them around activity execution; _execute_activity was refactored to adjust payload (de/externalization) and error handling. OnDemandSandboxWorker was updated to use new hook semantics, track active activity counts, expose add_activity wrapper for name resolution, use the internal shared logger, and consolidate on-demand-specific host/secure-channel attributes. Tests updated accordingly and a new test file verifies the activity hook behavior and active-activity counting.
Remove the checked-in on-demand sandbox proto source and extend proto generation to fetch it from durabletask-protobuf using the recorded source hash. Simplify the preview changelog entry and use Durable Task Scheduler in new public prose instead of DTS shorthand.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use a single full container image reference for on-demand sandbox declarations and remove split registry/repository/tag/digest configuration.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 9, 2026 21:21
@YunchuWang YunchuWang changed the title from "Wangbill/serverless op1" to "Add Azure Managed on-demand sandbox preview APIs" Jun 9, 2026
@YunchuWang YunchuWang marked this pull request as draft June 9, 2026 21:23

Copilot AI left a comment

Contributor


Pull request overview

Adds a new durabletask.azuremanaged.preview.on_demand_sandbox surface for declaring and running Durable Task Scheduler on-demand sandbox activities, including generated gRPC stubs and an end-to-end sample. The PR also introduces internal worker activity execution hooks in the core SDK to support sandbox worker heartbeating.

Changes:

  • Added preview on-demand sandbox client/worker APIs (worker profiles, activity declarations, worker registration/heartbeats).
  • Added core TaskHubGrpcWorker activity execution hook points and tests validating hook ordering.
  • Added on-demand sandbox sample (declarer app + remote worker container) and updated azuremanaged changelog/version.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `durabletask/worker.py` | Adds activity execution hook points invoked around `_execute_activity`. |
| `tests/durabletask/test_worker_activity_hooks.py` | New tests asserting hooks run on success and failure. |
| `durabletask-azuremanaged/durabletask/azuremanaged/preview/on_demand_sandbox/client.py` | Implements activity declaration builders, worker profile decorator, and management client. |
| `durabletask-azuremanaged/durabletask/azuremanaged/preview/on_demand_sandbox/worker.py` | Implements sandbox worker that registers/heartbeats and restricts activity dispatch via filters. |
| `durabletask-azuremanaged/durabletask/azuremanaged/preview/on_demand_sandbox/__init__.py` | Defines public preview exports (`__all__`). |
| `durabletask-azuremanaged/durabletask/azuremanaged/preview/__init__.py` | Adds preview package marker. |
| `durabletask-azuremanaged/durabletask/azuremanaged/internal/ON_DEMAND_SANDBOX_PROTO_SOURCE_COMMIT_HASH` | Pins protobuf source commit for sandbox service proto generation. |
| `durabletask-azuremanaged/durabletask/azuremanaged/internal/on_demand_sandbox_activities_service_pb2.py` | Generated protobuf message definitions for sandbox activities service. |
| `durabletask-azuremanaged/durabletask/azuremanaged/internal/on_demand_sandbox_activities_service_pb2.pyi` | Generated typing stubs for protobuf messages. |
| `durabletask-azuremanaged/durabletask/azuremanaged/internal/on_demand_sandbox_activities_service_pb2_grpc.py` | Generated gRPC stub/servicer definitions and RPC paths. |
| `tests/durabletask-azuremanaged/test_on_demand_sandbox_extension.py` | New tests for declaration building, environment parsing, and worker behavior. |
| `durabletask-azuremanaged/pyproject.toml` | Bumps package version to 1.6.0. |
| `durabletask-azuremanaged/CHANGELOG.md` | Documents new preview on-demand sandbox APIs under Unreleased. |
| `Makefile` | Extends proto generation to fetch/generate sandbox service stubs. |
| `examples/on_demand_sandbox/README.md` | Documents how to build/run the on-demand sandbox sample. |
| `examples/on_demand_sandbox/main_app.py` | Declarer sample that registers declarations and runs an orchestration calling the remote activity. |
| `examples/on_demand_sandbox/remote_worker.py` | Remote worker entrypoint that runs inside the sandbox container. |
| `examples/on_demand_sandbox/Containerfile` | Builds a container image for the remote sandbox worker sample. |
| `examples/on_demand_sandbox/activity_names.py` | Shared activity name constant between declarer and remote worker. |
Files not reviewed (1)
  • durabletask-azuremanaged/durabletask/azuremanaged/internal/on_demand_sandbox_activities_service_pb2.py: Language not supported
Comments suppressed due to low confidence (2)

durabletask/worker.py:990

  • _durabletask_on_activity_execution_started() is called outside of any exception handling. If an override raises, _execute_activity will exit before sending an ActivityResponse back to the sidecar, potentially leaving the work item uncompleted. Consider treating hook failures as non-fatal and logging them instead.
                if stream_outcome is _WorkItemStreamOutcome.GRACEFUL_CLOSE_AFTER_MESSAGE:
                    self._logger.info("Work item stream closed after receiving messages")
                    invalidate_connection(close_channel=True)

durabletask/worker.py:1040

  • Exceptions from _durabletask_on_activity_execution_completed() can currently propagate out of the finally block and potentially crash the worker thread after the ActivityResponse is already sent. Hook errors should be logged and swallowed to avoid destabilizing the worker.
                    )
                conn_retry_count += 1
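
One way to make hook failures non-fatal, as both suppressed comments suggest, is to route every hook call through a small guard that logs and swallows exceptions. The sketch below is not the SDK's implementation: `HookedWorker`, `_run_hook`, `execute_activity`, and the hook signature taking an activity name are assumptions for illustration; only the two hook method names come from the PR.

```python
import logging

logger = logging.getLogger("hook_sketch")


class HookedWorker:
    """Hypothetical worker; only the two hook names come from the PR."""

    def _durabletask_on_activity_execution_started(self, name: str) -> None:
        pass  # subclasses may override

    def _durabletask_on_activity_execution_completed(self, name: str) -> None:
        pass  # subclasses may override

    def _run_hook(self, hook, name: str) -> None:
        # Log and swallow hook errors so they can never prevent the
        # activity result from being produced and reported.
        try:
            hook(name)
        except Exception:
            logger.exception("Activity hook %s failed for %s", hook.__name__, name)

    def execute_activity(self, name: str, fn, arg):
        self._run_hook(self._durabletask_on_activity_execution_started, name)
        try:
            return fn(arg)
        finally:
            # The completed hook runs on success and on failure.
            self._run_hook(self._durabletask_on_activity_execution_completed, name)


class NoisyWorker(HookedWorker):
    def _durabletask_on_activity_execution_started(self, name: str) -> None:
        raise RuntimeError("hook bug")


# The started hook raises, but the activity still runs and returns a result.
result = NoisyWorker().execute_activity("double", lambda x: x * 2, 21)
print(result)  # 42
```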

Comment thread tests/durabletask-azuremanaged/test_on_demand_sandbox_extension.py Outdated
Comment thread examples/on_demand_sandbox/README.md Outdated
Comment thread examples/on_demand_sandbox/README.md Outdated
Comment thread examples/on_demand_sandbox/README.md Outdated
YunchuWang and others added 2 commits June 9, 2026 15:32
Record activity names as registered and leave normalization/deduplication to the existing registration boundary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep worker registration on an internal gRPC client while exposing only declaration management APIs through the public on-demand sandbox client.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
YunchuWang and others added 5 commits June 9, 2026 15:46
Move declaration helpers and gRPC transport out of the public management client module, and rename the worker transport to OnDemandSandboxActivitiesGrpcTransport.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make declaration helpers private, fail empty worker profiles, keep still-running registration thread handles, add worker typing, and add Bash examples to the sample README.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reuse the internal declaration normalization helper from the on-demand sandbox management client instead of keeping a duplicate copy in client.py.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep shared on-demand sandbox normalization logic out of the public client and declaration modules by centralizing it in a private normalization module.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the requested private helper module name for shared on-demand sandbox helper functions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread durabletask/worker.py Outdated
@YunchuWang

Member Author

The current sample uses examples/sandboxes and the API surface is named sandboxes to match the merged Durable Task Scheduler sandbox proto. I kept this as a single sandbox activity sample rather than adding/combining another serverless sample.

Comment thread durabletask-azuremanaged/durabletask/azuremanaged/internal/sandbox_service_pb2.py Dismissed

@berndverst berndverst left a comment

Member


Review: Azure Managed sandbox preview APIs

Overall this is a solid, well-tested preview that faithfully mirrors the .NET design (env-var contract, defaults, profile-builder validation, activity-only worker model). I cross-checked the wire contract and resource validation against both the canonical sandbox proto and the actual server/Backend behavior the worker registers with.

Verified parity (looks good):

  • Sandbox proto is pinned to the same commit the server vendors, so the wire contract is aligned.
  • Client-side CPU/memory validation matches the server constant-for-constant (250m–16000m in 250m steps; 2048 MiB/core; same accepted formats). Nice that this is validated at declare time so users get the same error locally that the server would return.
  • Env var names/defaults match .NET (heartbeat 2s, retry 1s→30s full-jitter, max concurrent activities 100, CPU tiers, memory-per-core).
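
For readers unfamiliar with the resource rules, here is a rough sketch of what declare-time CPU validation along these lines could look like. The constants (250m to 16000m, 250m steps, 2048 MiB per core) come from the review above; the accepted input formats (`"500m"` and decimal cores such as `"0.5"`), the function names, and the reading of the memory rule as "memory implied by the CPU size" are assumptions, not the SDK's actual helpers.

```python
from decimal import Decimal, InvalidOperation

MIN_MILLICORES = 250
MAX_MILLICORES = 16000
STEP_MILLICORES = 250
MEMORY_MIB_PER_CORE = 2048


def parse_cpu_millicores(value: str) -> int:
    """Parse "500m" or "0.5" style CPU quantities into whole millicores."""
    text = value.strip()
    try:
        if text.endswith("m"):
            millicores = Decimal(text[:-1])
        else:
            millicores = Decimal(text) * 1000
    except InvalidOperation as exc:
        raise ValueError(f"Invalid CPU quantity: {value!r}") from exc
    if not millicores.is_finite() or millicores != millicores.to_integral_value():
        raise ValueError(f"CPU quantity is not a whole number of millicores: {value!r}")
    return int(millicores)


def validate_cpu(value: str) -> int:
    """Return millicores if the value is in range and on a 250m step."""
    millicores = parse_cpu_millicores(value)
    if not MIN_MILLICORES <= millicores <= MAX_MILLICORES:
        raise ValueError(f"CPU must be between 250m and 16000m, got {value!r}")
    if millicores % STEP_MILLICORES:
        raise ValueError(f"CPU must be a multiple of 250m, got {value!r}")
    return millicores


def expected_memory_mib(millicores: int) -> int:
    """Memory implied by the 2048 MiB per core ratio (assumed interpretation)."""
    return millicores * MEMORY_MIB_PER_CORE // 1000


print(validate_cpu("500m"))       # 500
print(validate_cpu("0.5"))        # 500
print(expected_memory_mib(2000))  # 4096
```

Using `Decimal` rather than `float` keeps inputs such as `"0.25"` exact, so the step check cannot fail because of binary rounding.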

Priority items (inline comments below):

  1. DTS_SANDBOX_ID is not validated although the server requires a non-empty value — see worker.py L69 / worker_messages.py L38.
  2. The registration loop retries permanent misconfigurations forever and silently — see worker.py L143.
  3. The DTS_SANDBOX_PROVIDER strictness is inverted relative to what the server actually enforces — see worker.py L219.

Together, (1)+(2) mean a worker started without DTS_SANDBOX_ID (or with a profile/activity/concurrency mismatch) enters an invisible infinite reconnect loop instead of failing fast.
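
A fail-fast shape for (1) and (2) might look like the following sketch. It is not the worker's code: `require_sandbox_id`, `PermanentRegistrationError`, `register_with_retry`, and the attempt limit are hypothetical, and the backoff formula is one common reading of "1s→30s full-jitter". In a gRPC-based worker, the permanent/transient split would presumably be driven by status codes (for example, treating an invalid-argument style rejection as permanent and an unavailable server as transient).

```python
import os
import random


def require_sandbox_id(environ=os.environ) -> str:
    """Reject a missing or blank DTS_SANDBOX_ID before connecting."""
    sandbox_id = environ.get("DTS_SANDBOX_ID", "").strip()
    if not sandbox_id:
        raise ValueError("DTS_SANDBOX_ID must be set to a non-empty value")
    return sandbox_id


def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** min(attempt, 16))))


class PermanentRegistrationError(Exception):
    """Hypothetical marker for misconfigurations that retrying cannot fix."""


def register_with_retry(connect, sleep, max_attempts: int = 5) -> int:
    """Call connect() until it succeeds; return the number of attempts used."""
    for attempt in range(max_attempts):
        try:
            connect()
            return attempt + 1
        except PermanentRegistrationError:
            raise  # surface misconfiguration immediately instead of looping
        except ConnectionError:
            sleep(full_jitter_delay(attempt))
    raise TimeoutError(f"registration failed after {max_attempts} attempts")


# Demo with a fake connect() that fails once with a transient error.
outcomes = iter([ConnectionError("transient"), None])


def fake_connect() -> None:
    outcome = next(outcomes)
    if outcome is not None:
        raise outcome


delays: list[float] = []
print(register_with_retry(fake_connect, delays.append))  # 2
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; production code would pass `time.sleep` or an awaitable equivalent.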

Minor / Pythonic notes:

  • Two activity record types in the public surface (nested Activity vs the exported SandboxActivity) — inline on worker_profiles.py L38.
  • SandboxActivitiesClient has close() but no context manager, and the sample never closes it — inline on client.py L41.
  • enable_sandbox_activities() raises when no profiles are registered; .NET silently no-ops. Worth deciding if the raise is intended for a preview.
  • resolve_activities casefolds the activity name for its dedup key but not the version; note the server matches the declared vs. registered activity sets case-sensitively, so client-side case-folding of names could (in rare mixed-case cases) collapse entries the server treats as distinct.
  • Tests reset the global _worker_profiles registry via per-test try/finally pop(...). An autouse fixture that snapshots/restores the dict would be more robust against a test forgetting to clean up.
  • The root CHANGELOG entry for the activity-execution hooks describes an internal underscore-prefixed mechanism; per the repo's changelog guidance, consider reframing by user impact or dropping it as internal-only.

Two server couplings worth documenting (not bugs, but they only surface today as server rejections retried forever):

  • The activity names the worker registers (Python function names) must exactly, case-sensitively match the names declared in the profile (incl. version).
  • The worker's resolved max activity count must equal the declared profile's max concurrency. This holds at runtime because the env is server-injected, but there's no client-side guard.
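
A client-side guard for these two couplings could be as small as the sketch below. `ActivityKey` and `find_profile_mismatches` are hypothetical names, and treating the requirement as exact set equality on (name, version) pairs is an assumption based on the wording above.

```python
from typing import NamedTuple


class ActivityKey(NamedTuple):
    """Hypothetical (name, version) pair used to compare activity sets."""

    name: str
    version: str = ""


def find_profile_mismatches(
    declared: list[ActivityKey],
    registered: list[ActivityKey],
    declared_max_concurrency: int,
    worker_max_activities: int,
) -> list[str]:
    """Return human-readable problems; an empty list means the worker matches."""
    problems: list[str] = []
    declared_set, registered_set = set(declared), set(registered)
    # Comparison is deliberately case-sensitive: no casefold() on names.
    for key in sorted(declared_set - registered_set):
        problems.append(f"declared but not registered: {key.name!r} (version {key.version!r})")
    for key in sorted(registered_set - declared_set):
        problems.append(f"registered but not declared: {key.name!r} (version {key.version!r})")
    if declared_max_concurrency != worker_max_activities:
        problems.append(
            f"max concurrency mismatch: profile declares {declared_max_concurrency}, "
            f"worker resolved {worker_max_activities}"
        )
    return problems


problems = find_profile_mismatches(
    declared=[ActivityKey("ProcessOrder")],
    registered=[ActivityKey("processorder")],
    declared_max_concurrency=100,
    worker_max_activities=100,
)
for problem in problems:
    print(problem)
```

Here the two names differ only by case, so the guard reports both a missing and an unexpected activity instead of leaving the mismatch to surface as a server rejection.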

Comment thread durabletask-azuremanaged/durabletask/azuremanaged/preview/sandboxes/worker.py Outdated
Comment thread durabletask-azuremanaged/durabletask/azuremanaged/preview/sandboxes/worker.py Outdated
@YunchuWang YunchuWang requested a review from berndverst June 18, 2026 01:02
Comment thread CHANGELOG.md Outdated
Comment thread durabletask-azuremanaged/CHANGELOG.md Outdated

@berndverst berndverst left a comment

Member


Please update the CHANGELOG :)

@YunchuWang YunchuWang requested a review from berndverst June 18, 2026 03:02
@berndverst berndverst merged commit dce23a4 into main Jun 18, 2026
18 checks passed
@berndverst berndverst deleted the wangbill/serverless-op1 branch June 18, 2026 03:23
3 participants