DurableAgent known limitations: sandbox CPU billing, abort signal, double billing, verify loop context #1762

Description

@shtefcs

Context

We are building a production multi-agent AI orchestration platform using DurableAgent from @workflow/ai. The migration from a monolithic "use step" approach to DurableAgent (per-LLM-call + per-tool step isolation) is working but has several architectural limitations that affect correctness and billing accuracy.

Related: #1737 (development mode performance), #1315 (linear step overhead growth), #1160 (queue delay)

Environment

  • workflow: 4.2.1
  • @workflow/ai: 4.1.1
  • @workflow/core: 4.2.1
  • Next.js: 16.1.6
  • AI SDK: 6.x

Limitation 1: No AbortSignal in V8 Sandbox (Credit Circuit Breaker)

The "use workflow" function runs in a V8 sandbox where AbortController does not exist. When a user's credit balance crosses the overdraft limit during onStepFinish, we cannot abort the DurableAgent mid-execution. We use a flag checked in prepareStep instead, which means the agent completes its current LLM call before stopping, potentially overspending by one response.

The original non-durable orchestrator uses AbortController.abort() to kill streamText immediately. DurableAgent accepts abortSignal in its stream() options, but creating an AbortController at workflow level throws ReferenceError.
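For reference, the flag-based workaround has roughly the shape below. This is a minimal sketch, not our production code: `createCreditBreaker`, `StepUsage`, and `PrepareStepResult` are hypothetical names, and the types are local stand-ins rather than the actual @workflow/ai signatures. It only illustrates why the breaker cannot take effect until the next `prepareStep` call.

```typescript
// Local stand-in types; NOT the actual @workflow/ai signatures.
type StepUsage = { creditsUsed: number };
type PrepareStepResult = { toolChoice?: "none" | "auto" };

// Hypothetical circuit breaker: a plain flag instead of an AbortController,
// since AbortController is unavailable in the workflow sandbox.
function createCreditBreaker(initialBalance: number, overdraftLimit: number) {
  let balance = initialBalance;
  let tripped = false;
  return {
    // Called after each step: deduct credits and trip once past the limit.
    onStepFinish(usage: StepUsage): void {
      balance -= usage.creditsUsed;
      if (balance < -overdraftLimit) tripped = true;
    },
    // Called before each step: once tripped, disallow further tool calls.
    prepareStep(): PrepareStepResult {
      return tripped ? { toolChoice: "none" } : {};
    },
    isTripped: (): boolean => tripped,
  };
}

const breaker = createCreditBreaker(10, 5);
breaker.onStepFinish({ creditsUsed: 8 });  // balance is now 2, breaker stays closed
breaker.onStepFinish({ creditsUsed: 12 }); // balance is now -10, past the -5 limit
console.log(breaker.prepareStep());
```

Because the flag is only read in `prepareStep`, a step that is already in flight when the balance crosses the limit still runs to completion.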

Question: Is there a way to trigger DurableAgent's abort from within onStepFinish or prepareStep? Or could the V8 sandbox expose AbortController?

Limitation 2: Double Billing Risk on Lambda Crash + Retry

onStepFinish is a side-effect callback, not a workflow step. If a Lambda crashes after onStepFinish fires (credits deducted) but before the step result is persisted, the SDK retries the step. The retried step's onStepFinish fires again, deducting credits a second time.

The creditsAlreadyDeducted counter resets per-agent (necessary for chained agents), so it cannot detect the duplicate.

Question: Does DurableAgent suppress onStepFinish for cached/replayed steps, or does it fire every time, even on replay? If the SDK could pass an "isReplay" flag to onStepFinish, we could skip billing on replays.
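One mitigation we could apply on our side is to make the deduction itself idempotent. The sketch below is only an illustration under assumptions: `CreditLedger` and `billingKey` are hypothetical, the ledger is in-memory (a real one would be a database table with a unique constraint on the key), and it assumes a run ID, agent ID, and step index that stay stable across retries, which we have not confirmed the SDK provides.

```typescript
// Hypothetical in-memory ledger; in production this would be a database
// table with a unique constraint on the idempotency key.
class CreditLedger {
  private readonly applied = new Map<string, number>();
  private balance: number;

  constructor(initialBalance: number) {
    this.balance = initialBalance;
  }

  // Deducts at most once per key. Returns true only when credits were
  // actually deducted, false when the key was already billed.
  deductOnce(key: string, credits: number): boolean {
    if (this.applied.has(key)) return false; // duplicate from a retry or replay
    this.applied.set(key, credits);
    this.balance -= credits;
    return true;
  }

  getBalance(): number {
    return this.balance;
  }
}

// Assumption: runId, agentId, and stepIndex are stable across retries.
function billingKey(runId: string, agentId: string, stepIndex: number): string {
  return `${runId}:${agentId}:${stepIndex}`;
}

const ledger = new CreditLedger(100);
const stepKey = billingKey("run-1", "coder", 0);
ledger.deductOnce(stepKey, 7); // first attempt: deducted
ledger.deductOnce(stepKey, 7); // retried step: ignored
console.log(ledger.getBalance()); // prints 93
```

Including the agent ID in the key keeps chained agents from colliding, which is the same reason the creditsAlreadyDeducted counter resets per agent.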

Limitation 3: Verify Loop Context (Coding Agent Fix Cycles)

After the coding agent completes, a verify loop checks the sandbox for errors and runs fix cycles using streamText. The fix cycle needs tool access (sandboxBash, sandboxWriteFile, etc.) to fix errors.

The problem: runVerifyLoopStep is a "use step" function that calls streamText with tools from buildDurableTools(config.toolSchemas). Those tools are themselves "use step" functions. This creates nested steps (a step calling sub-steps), which is not officially supported.

Additionally, the fix cycle uses config.messages (pre-agent messages) instead of result.messages (post-agent with all tool call results), so the fix agent cannot see what was already built.
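The message fix on its own is small. As a rough sketch, with `Message` as a simplified local stand-in for the AI SDK message type and `buildFixMessages` as a hypothetical helper, the fix cycle would start from the finished agent's transcript and append the verification errors:

```typescript
// Simplified local stand-in for a chat message; NOT the AI SDK's message type.
type Message = {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
};

// Builds the input for a fix cycle from the finished agent's transcript
// (result.messages), so the fix agent sees earlier tool calls and results.
function buildFixMessages(
  resultMessages: readonly Message[],
  errors: readonly string[],
): Message[] {
  const report = errors.map((e, i) => `${i + 1}. ${e}`).join("\n");
  return [
    ...resultMessages,
    { role: "user", content: `Verification found these errors:\n${report}\nPlease fix them.` },
  ];
}

const transcript: Message[] = [
  { role: "user", content: "Build a todo app" },
  { role: "assistant", content: "Created src/app.ts" },
];
const fixInput = buildFixMessages(transcript, ["TS2304: Cannot find name 'Todo'"]);
console.log(fixInput.length);
```

This does not address the nested-step problem; it only makes sure the fix agent starts with the full history.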

Question: Is there a supported pattern for running a secondary agent loop after the primary DurableAgent completes? Could the verify loop be restructured as a separate workflow-level agent call instead of a nested step?

Limitation 4: Sandbox CPU Billing (Upstash Box)

Each sandbox tool wrapper reconnects to the Upstash Box via reconnectUpstashBox(boxId). The reconnected adapter creates a fresh activeCpuMs counter starting at 0. The original orchestrator maintains a single adapter across the entire session, accumulating CPU time accurately.

The Upstash Box client SDK does not expose server-side CPU metrics. The only way to track CPU is the adapter's local counter, which resets on each reconnection.

As a result, the CPU time measured by each short-lived adapter is lost when its step ends, and sandbox compute goes unbilled in durable mode. In production, this is a revenue leak.

Question: This is primarily an Upstash Box SDK limitation, but is there a way to maintain persistent state (like a CPU counter) across workflow steps without serializing the adapter itself?
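One direction we are considering is to have each tool step return the CPU time its own adapter measured and to sum those plain numbers outside the adapter. The sketch below is hypothetical: `ToolStepResult` and `accumulateCpuMs` are illustrative names, and it assumes tool return values can be read somewhere the running total can be kept, which we have not verified for DurableAgent.

```typescript
// Hypothetical result shape: each tool step reports the CPU time its own
// short-lived adapter measured, alongside its normal output.
type ToolStepResult<T> = { output: T; cpuMs: number };

// Adds the per-step CPU deltas to a running total. Only serializable numbers
// cross the step boundary; the adapter itself is never persisted.
function accumulateCpuMs(
  totalSoFar: number,
  results: readonly ToolStepResult<unknown>[],
): number {
  return results.reduce((sum, r) => sum + r.cpuMs, totalSoFar);
}

// Two tool steps, each with a fresh adapter whose counter started at 0.
const stepResults: ToolStepResult<string>[] = [
  { output: "installed dependencies", cpuMs: 1200 },
  { output: "tests passed", cpuMs: 800 },
];
const sessionCpuMs = accumulateCpuMs(0, stepResults);
console.log(sessionCpuMs); // prints 2000
```

This still relies on the adapter's local counter for each individual step, so it inherits whatever inaccuracy that counter has.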

Summary

| Limitation | Impact | Workaround Available? |
| --- | --- | --- |
| No AbortSignal in V8 sandbox | One extra LLM call before credit circuit breaker stops agent | Flag + prepareStep toolChoice:"none" |
| Double billing on retry | Potential 2x charge for crashed steps | None currently |
| Nested steps in verify loop | Not officially supported, fix agent lacks context | Works in practice but fragile |
| Sandbox CPU billing | Compute goes unbilled | None without server-side metrics |

We are happy to contribute fixes or test proposed solutions. These are the last blockers before production deployment.
