Add Azure Managed on-demand sandbox preview APIs #151
- Introduced new protobuf definitions for serverless activities in `serverless_activities_service_pb2.py`.
- Added type hints and interface definitions in `serverless_activities_service_pb2.pyi`.
- Implemented gRPC service methods in `serverless_activities_service_pb2_grpc.py`.
- Created a Dockerfile for building a remote worker image for serverless activities.
- Developed a declarer application (`main_app.py`) to register serverless activities and start orchestrations.
- Implemented a remote worker (`remote_worker.py`) to execute activities in a serverless environment.
- Added a README for the serverless example, detailing setup and usage instructions.
- Created unit tests for serverless extension functionalities in `test_serverless_extension.py`.
…rker and update related imports
Rename the Serverless activity RPC surface and messages to OnDemandSandbox equivalents across the proto, generated pb2/pb2_grpc stubs, and client usage. Add CPU and memory normalization/validation helpers to client (kubernetes-style quantities parsed using Decimal) and wire those into activity declaration construction. Remove DefaultAzureCredential usage in the worker (token credential set to None). Update examples, tests, and markdown tooling (enable front-matter in .pymarkdown.json), and add a new .github agent doc. Generated protobuf/grpc files and version/warning text were updated to reflect the renamed package and expected grpc tool/runtime versions.
Stop relying on an "accepted" response flag for serverless worker registration. The proto removed the accepted field from OnDemandSandboxActivityWorkerSessionResult and the generated _pb2.py/_pb2.pyi were updated accordingly. The worker now simply invokes connect_serverless_activity_worker and relies on gRPC status/transport behavior instead of checking a boolean, and the changelog documents this change. (Regenerated protobuf offsets in pb2 file as part of the proto change.)
Move the Azure Managed on-demand sandbox SDK implementation under the preview ondemand_sandbox package while keeping the legacy serverless import path as a compatibility shim. Update the sample and tests to use the new canonical API names. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the legacy serverless extension package and place the Azure Managed on-demand sandbox APIs under durabletask.azuremanaged.preview.on_demand_sandbox. Also remove the unrelated elementary PR teacher agent file from the branch. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Restore unrelated repository configuration and example index changes, and remove the unused azuremanaged extensions package stub so the PR only carries on-demand sandbox preview changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove unrelated pymarkdown, Makefile, and example index changes from the on-demand sandbox PR. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add lifecycle hooks around activity execution and adapt the on-demand sandbox worker to use them. TaskHubGrpcWorker now defines _durabletask_on_activity_execution_started/completed and invokes them around activity execution; _execute_activity was refactored to adjust payload (de/externalization) and error handling. OnDemandSandboxWorker was updated to use new hook semantics, track active activity counts, expose add_activity wrapper for name resolution, use the internal shared logger, and consolidate on-demand-specific host/secure-channel attributes. Tests updated accordingly and a new test file verifies the activity hook behavior and active-activity counting.
Remove the checked-in on-demand sandbox proto source and extend proto generation to fetch it from durabletask-protobuf using the recorded source hash. Simplify the preview changelog entry and use Durable Task Scheduler in new public prose instead of DTS shorthand. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use a single full container image reference for on-demand sandbox declarations and remove split registry/repository/tag/digest configuration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new durabletask.azuremanaged.preview.on_demand_sandbox surface for declaring and running Durable Task Scheduler on-demand sandbox activities, including generated gRPC stubs and an end-to-end sample. The PR also introduces internal worker activity execution hooks in the core SDK to support sandbox worker heartbeating.
Changes:
- Added preview on-demand sandbox client/worker APIs (worker profiles, activity declarations, worker registration/heartbeats).
- Added core `TaskHubGrpcWorker` activity execution hook points and tests validating hook ordering.
- Added on-demand sandbox sample (declarer app + remote worker container) and updated azuremanaged changelog/version.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `durabletask/worker.py` | Adds activity execution hook points invoked around `_execute_activity`. |
| `tests/durabletask/test_worker_activity_hooks.py` | New tests asserting hooks run on success and failure. |
| `durabletask-azuremanaged/durabletask/azuremanaged/preview/on_demand_sandbox/client.py` | Implements activity declaration builders, worker profile decorator, and management client. |
| `durabletask-azuremanaged/durabletask/azuremanaged/preview/on_demand_sandbox/worker.py` | Implements sandbox worker that registers/heartbeats and restricts activity dispatch via filters. |
| `durabletask-azuremanaged/durabletask/azuremanaged/preview/on_demand_sandbox/__init__.py` | Defines public preview exports (`__all__`). |
| `durabletask-azuremanaged/durabletask/azuremanaged/preview/__init__.py` | Adds preview package marker. |
| `durabletask-azuremanaged/durabletask/azuremanaged/internal/ON_DEMAND_SANDBOX_PROTO_SOURCE_COMMIT_HASH` | Pins protobuf source commit for sandbox service proto generation. |
| `durabletask-azuremanaged/durabletask/azuremanaged/internal/on_demand_sandbox_activities_service_pb2.py` | Generated protobuf message definitions for sandbox activities service. |
| `durabletask-azuremanaged/durabletask/azuremanaged/internal/on_demand_sandbox_activities_service_pb2.pyi` | Generated typing stubs for protobuf messages. |
| `durabletask-azuremanaged/durabletask/azuremanaged/internal/on_demand_sandbox_activities_service_pb2_grpc.py` | Generated gRPC stub/servicer definitions and RPC paths. |
| `tests/durabletask-azuremanaged/test_on_demand_sandbox_extension.py` | New tests for declaration building, environment parsing, and worker behavior. |
| `durabletask-azuremanaged/pyproject.toml` | Bumps package version to 1.6.0. |
| `durabletask-azuremanaged/CHANGELOG.md` | Documents new preview on-demand sandbox APIs under Unreleased. |
| `Makefile` | Extends proto generation to fetch/generate sandbox service stubs. |
| `examples/on_demand_sandbox/README.md` | Documents how to build/run the on-demand sandbox sample. |
| `examples/on_demand_sandbox/main_app.py` | Declarer sample that registers declarations and runs an orchestration calling the remote activity. |
| `examples/on_demand_sandbox/remote_worker.py` | Remote worker entrypoint that runs inside the sandbox container. |
| `examples/on_demand_sandbox/Containerfile` | Builds a container image for the remote sandbox worker sample. |
| `examples/on_demand_sandbox/activity_names.py` | Shared activity name constant between declarer and remote worker. |
Files not reviewed (1)
- durabletask-azuremanaged/durabletask/azuremanaged/internal/on_demand_sandbox_activities_service_pb2.py: Language not supported
Comments suppressed due to low confidence (2)
durabletask/worker.py:990
- _durabletask_on_activity_execution_started() is called outside of any exception handling. If an override raises, _execute_activity will exit before sending an ActivityResponse back to the sidecar, potentially leaving the work item uncompleted. Consider treating hook failures as non-fatal and logging them instead.
if stream_outcome is _WorkItemStreamOutcome.GRACEFUL_CLOSE_AFTER_MESSAGE:
self._logger.info("Work item stream closed after receiving messages")
invalidate_connection(close_channel=True)
durabletask/worker.py:1040
- Exceptions from _durabletask_on_activity_execution_completed() can currently propagate out of the finally block and potentially crash the worker thread after the ActivityResponse is already sent. Hook errors should be logged and swallowed to avoid destabilizing the worker.
)
conn_retry_count += 1
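A minimal sketch of the non-fatal hook handling suggested in the two suppressed comments above. The names `HookedWorker`, `_run_hook`, `on_started`, `on_completed`, and `execute` are hypothetical stand-ins that only mirror the shape of the hooks; this is not the SDK's `_execute_activity`.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("hook_sketch")


class HookedWorker:
    """Hypothetical stand-in for a worker with activity lifecycle hooks."""

    def __init__(self) -> None:
        self.active_activities = 0

    def on_started(self) -> None:
        self.active_activities += 1

    def on_completed(self) -> None:
        self.active_activities -= 1

    def _run_hook(self, hook: Callable[[], None]) -> None:
        # A failing hook is logged and swallowed so it cannot stop
        # the activity response from being produced.
        try:
            hook()
        except Exception:
            logger.exception("Activity hook %r failed", getattr(hook, "__name__", hook))

    def execute(self, activity: Callable[[Any], Any], payload: Any) -> Dict[str, Any]:
        self._run_hook(self.on_started)
        try:
            return {"result": activity(payload), "error": None}
        except Exception as exc:
            # Activity failures are still reported in the response.
            return {"result": None, "error": str(exc)}
        finally:
            # The completed hook always runs, and its errors never propagate.
            self._run_hook(self.on_completed)
```

With this shape, an override that raises in either hook still yields a response for the work item, at the cost of the hook's side effects (such as the active count) possibly being skipped.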
Record activity names as registered and leave normalization/deduplication to the existing registration boundary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep worker registration on an internal gRPC client while exposing only declaration management APIs through the public on-demand sandbox client. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move declaration helpers and gRPC transport out of the public management client module, and rename the worker transport to OnDemandSandboxActivitiesGrpcTransport. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make declaration helpers private, fail empty worker profiles, keep still-running registration thread handles, add worker typing, and add Bash examples to the sample README. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Reuse the internal declaration normalization helper from the on-demand sandbox management client instead of keeping a duplicate copy in client.py. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Keep shared on-demand sandbox normalization logic out of the public client and declaration modules by centralizing it in a private normalization module. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use the requested private helper module name for shared on-demand sandbox helper functions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
The current sample uses |
berndverst left a comment
Review: Azure Managed sandbox preview APIs
Overall this is a solid, well-tested preview that faithfully mirrors the .NET design (env-var contract, defaults, profile-builder validation, activity-only worker model). I cross-checked the wire contract and resource validation against both the canonical sandbox proto and the actual server/Backend behavior the worker registers with.
Verified parity (looks good):
- Sandbox proto is pinned to the same commit the server vendors, so the wire contract is aligned.
- Client-side CPU/memory validation matches the server constant-for-constant (250m–16000m in 250m steps; 2048 MiB/core; same accepted formats). Nice that this is validated at declare time so users get the same error locally that the server would return.
- Env var names/defaults match .NET (heartbeat 2s, retry 1s→30s full-jitter, max concurrent activities 100, CPU tiers, memory-per-core).
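For reference, the CPU rule described above (250m to 16000m in 250m steps, parsed with `Decimal`) can be sketched as follows. The function name `parse_cpu_millicores` and the exact set of accepted spellings (`"500m"`, `"0.5"`, `"2"`) are assumptions for illustration; the real helper and the server may accept a different set of formats.

```python
from decimal import Decimal, InvalidOperation

MIN_MILLICORES = 250
MAX_MILLICORES = 16000
STEP_MILLICORES = 250


def parse_cpu_millicores(quantity: str) -> int:
    """Parse a Kubernetes-style CPU quantity into validated millicores."""
    text = quantity.strip()
    try:
        if text.endswith("m"):
            millicores = Decimal(text[:-1])      # "500m" -> 500
        else:
            millicores = Decimal(text) * 1000    # "0.5" -> 500, "2" -> 2000
    except InvalidOperation:
        raise ValueError(f"Invalid CPU quantity: {quantity!r}") from None

    # Reject NaN/Infinity and fractional millicores such as "0.0001".
    if not millicores.is_finite() or millicores != millicores.to_integral_value():
        raise ValueError(f"CPU quantity must be whole millicores: {quantity!r}")

    value = int(millicores)
    if not MIN_MILLICORES <= value <= MAX_MILLICORES:
        raise ValueError(
            f"CPU must be between {MIN_MILLICORES}m and {MAX_MILLICORES}m: {quantity!r}"
        )
    if value % STEP_MILLICORES != 0:
        raise ValueError(f"CPU must be a multiple of {STEP_MILLICORES}m: {quantity!r}")
    return value
```

Using `Decimal` rather than `float` keeps inputs like `"0.3"` exact, so the step check rejects them for the right reason instead of tripping over binary rounding.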
Priority items (inline comments below):
1. `DTS_SANDBOX_ID` is not validated although the server requires a non-empty value; see `worker.py` L69 / `worker_messages.py` L38.
2. The registration loop retries permanent misconfigurations forever and silently; see `worker.py` L143.
3. The `DTS_SANDBOX_PROVIDER` strictness is inverted relative to what the server actually enforces; see `worker.py` L219.
Together, (1)+(2) mean a worker started without DTS_SANDBOX_ID (or with a profile/activity/concurrency mismatch) enters an invisible infinite reconnect loop instead of failing fast.
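One possible shape for failing fast, as a sketch only. `require_sandbox_id`, `full_jitter_delay`, `register_with_retry`, and `PermanentRegistrationError` are hypothetical names, and which failures count as permanent is an assumption here; in the real worker that decision would come from the gRPC status returned by the server.

```python
import logging
import os
import random
import time
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")
logger = logging.getLogger("registration_sketch")

BASE_DELAY_SECONDS = 1.0   # retry starts at 1s ...
MAX_DELAY_SECONDS = 30.0   # ... and is capped at 30s


class PermanentRegistrationError(Exception):
    """Hypothetical marker for misconfigurations that retrying cannot fix."""


def require_sandbox_id(env: Mapping[str, str] = os.environ) -> str:
    # Validate up front instead of letting the server reject every attempt.
    sandbox_id = env.get("DTS_SANDBOX_ID", "").strip()
    if not sandbox_id:
        raise ValueError("DTS_SANDBOX_ID must be set to a non-empty value")
    return sandbox_id


def full_jitter_delay(attempt: int) -> float:
    """Full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    exponent = min(attempt, 10)  # bound the exponent so the power stays small
    ceiling = min(MAX_DELAY_SECONDS, BASE_DELAY_SECONDS * (2 ** exponent))
    return random.uniform(0.0, ceiling)


def register_with_retry(
    register: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    attempt = 0
    while True:
        try:
            return register()
        except PermanentRegistrationError:
            # Surface the misconfiguration instead of looping invisibly.
            logger.error("Registration rejected permanently; not retrying")
            raise
        except ConnectionError as exc:
            delay = full_jitter_delay(attempt)
            logger.warning("Registration failed (%s); retrying in %.1fs", exc, delay)
            sleep(delay)
            attempt += 1
```

Transient failures still retry indefinitely, but each attempt is logged, and a permanent rejection ends the loop with an error the operator can see.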
Minor / Pythonic notes:
- Two activity record types in the public surface (nested `Activity` vs the exported `SandboxActivity`); inline on `worker_profiles.py` L38.
- `SandboxActivitiesClient` has `close()` but no context manager, and the sample never closes it; inline on `client.py` L41.
- `enable_sandbox_activities()` raises when no profiles are registered; .NET silently no-ops. Worth deciding if the raise is intended for a preview.
- `resolve_activities` casefolds the activity name for its dedup key but not the version; note the server matches the declared vs. registered activity sets case-sensitively, so client-side case-folding of names could (in rare mixed-case cases) collapse entries the server treats as distinct.
- Tests reset the global `_worker_profiles` registry via per-test `try`/`finally` `pop(...)`. An autouse fixture that snapshots/restores the dict would be more robust against a test forgetting to clean up.
- The root CHANGELOG entry for the activity-execution hooks describes an internal underscore-prefixed mechanism; per the repo's changelog guidance, consider reframing by user impact or dropping it as internal-only.
Two server couplings worth documenting (not bugs, but they only surface today as server rejections retried forever):
- The activity names the worker registers (Python function names) must exactly, case-sensitively match the names declared in the profile (incl. version).
- The worker's resolved max activity count must equal the declared profile's max concurrency. This holds at runtime because the env is server-injected, but there's no client-side guard.
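To make the first coupling concrete, here is a small sketch of a client-side guard. `find_mismatches` and `casefold_collisions` are hypothetical helpers, and modelling an activity as a `(name, version)` tuple is an assumption for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

ActivityKey = Tuple[str, str]  # (name, version), compared exactly


def find_mismatches(
    declared: Iterable[ActivityKey],
    registered: Iterable[ActivityKey],
) -> Tuple[Set[ActivityKey], Set[ActivityKey]]:
    """Return (declared but not registered, registered but not declared)."""
    declared_set, registered_set = set(declared), set(registered)
    return declared_set - registered_set, registered_set - declared_set


def casefold_collisions(keys: Iterable[ActivityKey]) -> Dict[ActivityKey, Set[ActivityKey]]:
    """Group entries that a casefolded dedup key would merge."""
    groups: Dict[ActivityKey, Set[ActivityKey]] = {}
    for name, version in keys:
        groups.setdefault((name.casefold(), version), set()).add((name, version))
    # Only groups with more than one distinct spelling are collisions.
    return {key: members for key, members in groups.items() if len(members) > 1}
```

A worker could run a check like this before registering and raise a local error, so a case mismatch shows up as a clear message rather than as a server rejection.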
berndverst left a comment
Please update the CHANGELOG :)
The scenario
A Python app using Durable Task Scheduler should be able to keep its normal orchestration worker local, but run selected activities in a DTS-started sandbox worker container.
The intended flow is the same shape as the .NET preview:
In the sample, the main Python app owns the orchestration and local work; the separate sandbox worker image owns the remote activity implementation.
What's missing
The Python preview surface needed to line up with the renamed protobuf and .NET SDK surface.
The earlier naming still exposed `on_demand_sandbox` concepts and `substrate` terminology. That made the Python API read differently from the shared contract and from the .NET packages, even though all three are describing the same DTS sandbox worker model.

Reviewers also needed one clear place to understand the Python shape: which package an app imports, which API declares profiles, which worker type runs in the sandbox image, and how the generated protobuf names map to the public API.
The change
The preview package is now `durabletask.azuremanaged.preview.sandboxes`.

A good review path is:

1. `durabletask-azuremanaged/durabletask/azuremanaged/internal/sandbox_service_pb2*`. These files are regenerated from `sandbox_service.proto` and expose `SandboxActivities`, `SandboxProviderKind`, and `sandbox_provider`.
2. `durabletask-azuremanaged/durabletask/azuremanaged/preview/sandboxes/`. This is the public preview API: `SandboxActivitiesClient`, `SandboxWorker`, `SandboxWorkerProfileOptions`, and `sandbox_worker_profile`.
3. `examples/sandboxes/` last. It shows the intended app split between the main orchestration process and the sandbox worker process, using the renamed `DTS_SANDBOX_*` settings.

The old `preview.on_demand_sandbox` package and old example path are replaced by the `sandboxes` naming so the Python PR matches the protobuf and .NET PRs.