feat(code_executors): add opt-in sandbox flag to LocalCommandLineCodeExecutor (#7462) by xr843 · Pull Request #7611 · microsoft/autogen

xr843 · 2026-04-20T09:44:06Z

Summary

Draft PR addressing #7462 — adds an opt-in sandbox parameter to LocalCommandLineCodeExecutor so users who cannot run Docker still get best-effort in-process hardening. This supersedes #7467 with broader scope (env-scrub + rlimits + Windows degrade path + config round-trip).

Opening as draft because I'd like maintainer input on the Windows strategy (see below) before investing further.

Three modes for `sandbox: Optional[bool]`

Value	Behavior
`None`	Default. Emits `DeprecationWarning` + `logger.warning` that a future release will make the parameter required. Execution unchanged — fully backward compatible. Replaces the existing `UserWarning`.
`False`	Explicit acknowledgement of unsandboxed execution. No warning.
`True`	POSIX: `preexec_fn` applying `RLIMIT_CPU` (= `timeout + 5`s) and `RLIMIT_AS` (default 512 MiB, configurable via `sandbox_memory_bytes`) + credential env-var scrub on both the code-execution subprocess and the pip-install subprocess in `_setup_functions`. Windows: logs a warning that rlimits are unavailable; env-scrub still applies.

Env-var scrub patterns (case-insensitive)

*_API_KEY, *_TOKEN, *_SECRET, *PASSWORD*, AWS_*, AZURE_*, GCP_*, GOOGLE_*, OPENAI_*, ANTHROPIC_*, HF_*, HUGGINGFACE_*, GITHUB_TOKEN, GH_TOKEN, NPM_TOKEN, PYPI_*.

What is intentionally out of scope for this draft

Windows resource limits (Job Objects / SetInformationJobObject). The Windows path here is degrade-only. I'd like maintainer direction on whether you want a proper Windows implementation in this PR, a follow-up, or left documented as-is.
README changes. Docstring-only for now to keep scope tight.
Network isolation, filesystem chroot, seccomp — all correctly belong in DockerCommandLineCodeExecutor; the docstring says so explicitly.

Tests

New tests/code_executors/test_local_sandbox.py:

sandbox=None emits DeprecationWarning (not UserWarning)
sandbox=False emits no warning
sandbox=True emits no DeprecationWarning
Config round-trip preserves sandbox + sandbox_memory_bytes for all three values
Config model defaults
POSIX behavioral: FAKE_API_KEY in env is scrubbed, harmless var survives
POSIX behavioral: with sandbox=False, FAKE_API_KEY reaches the subprocess
sandbox=True constructs cleanly (Windows smoke test)

Locally: uv run pytest tests/code_executors/test_local_sandbox.py — 8 passed. The existing test_commandline_code_executor.py — 13 passed, 1 skipped (Windows-only). One unrelated pre-existing failure in test_user_defined_functions.py::test_can_load_function_with_reqs due to pip missing in my uv-created venv, not touched by this PR.

Decisions worth a second look

Default RLIMIT_AS = 512 MiB. Large enough for most small Python/shell, tight enough to catch runaway allocations. Configurable via sandbox_memory_bytes.
RLIMIT_CPU = timeout + 5. 5s grace above the async wait-for timeout so we don't race it and mask the existing timeout behavior.
RLIMIT_AS refused on some platforms (e.g. macOS) — swallowed silently, CPU cap + env-scrub still apply. Let me know if you'd rather hard-fail there.
preexec_fn runs in the child between fork and exec. Standard pattern for subprocess rlimit enforcement; asyncio.create_subprocess_exec passes it through.

Links: issue #7462 · supersedes #7467

cc @msaleme @hanselhansel @polterguy from the issue collaboration trail.

🤖 Generated with Claude Code

xr843 · 2026-05-06T09:34:59Z

Marking ready for review. cc @ekzhu — this is the opt-in sandbox flag implementation discussed in #7462 (alternative direction to #7467). Happy to iterate if you'd like a different surface for the flag, or if it should be wired through differently in the agent factory.

@hanselhansel

@hanselhansel Reframe the sandbox=True posture from "feature with fallback" to "contract with explicit failure" per microsoft#7611 review feedback. * Add PlatformExecutionScopeError(RuntimeError) — raised when sandbox=True is requested but the platform cannot honor the in-process isolation guarantees (no preexec_fn / no resource module on Windows). * Replace the silent UserWarning + env-scrub-only Windows fallback with an explicit raise at __init__ that names which guarantees are unavailable (RLIMIT_AS / RLIMIT_CPU / RLIMIT_NOFILE / RLIMIT_NPROC + preexec_fn). * Defense-in-depth: also raise at the subprocess-exec site so a future refactor cannot silently spawn without the rlimit contract. * Rewrite the sandbox docstring as a per-platform contract: every POSIX guarantee is enumerated (env-scrub patterns + each rlimit + cap value), and the Windows path documents "raises PlatformExecutionScopeError" — not "falls back to". * Add test coverage: Windows raise path (sys.platform mocked) plus a RuntimeError-subclass assertion so existing handlers keep working. Procurement / threat-model reviewers can now audit the failure surface from the docstring and exception message alone, without inferring intent from "falls back to" prose.

xr843 · 2026-05-09T04:35:13Z

@hanselhansel addressed your contract framing in 4df5f7a: replaced the silent platform fallback (Windows previously emitted a UserWarning and degraded to env-scrub-only) with an explicit PlatformExecutionScopeError raise at __init__, and rewrote the sandbox: docstring per-platform as a contract rather than a feature note.

What changed:

New exception PlatformExecutionScopeError(RuntimeError) lives alongside the executor and is exported from autogen_ext.code_executors.local.
POSIX docstring section enumerates every guarantee that's part of the contract: env-scrub patterns + RLIMIT_CPU / RLIMIT_AS (2 GiB) / RLIMIT_NOFILE (256) / RLIMIT_NPROC (64).
Windows docstring section reads: "Raises PlatformExecutionScopeError — preexec_fn and the resource-module rlimits used to enforce the POSIX hardening contract are not available on Windows." No "falls back to".
The error message names which guarantees are unavailable (RLIMIT_AS / RLIMIT_CPU / RLIMIT_NOFILE / RLIMIT_NPROC + preexec_fn) and points callers at DockerCommandLineCodeExecutor.
Defense-in-depth: also raises at the subprocess-exec site so a future refactor can't regress the construction-time guard.

Test coverage added:

test_sandbox_true_on_windows_raises_platform_scope_error (uses sys.platform mock).
test_platform_execution_scope_error_is_runtime_error (subclass invariant — keeps existing except RuntimeError: handlers working).

Local run: 19 passed, 1 skipped on test_commandline_code_executor.py. Ready for another look.

msaleme · 2026-05-09T20:49:25Z

Replacing the silent platform fallback with explicit behavior is the right shape — the original Issue #7462 concern was that "sandboxed" was a property the caller couldn't reliably observe. Explicit-over-silent makes the security boundary inspectable, which is what an enterprise procurement review is going to ask for. Thanks for picking this up.

…odeExecutor Refs microsoft#7462. Supersedes closed PR microsoft#7467. The legacy `UserWarning` at construction was easily suppressed by production warning filters and `python -W ignore`, leaving unsandboxed execution of LLM-generated code as the silent default. This change introduces an explicit three-state sandbox posture parameter: - sandbox=None (default, legacy behavior for backward compatibility): DeprecationWarning + logger.warning() surface the risk in both Python warning channels and structured logging pipelines. A future release will make this parameter required. - sandbox=False Caller explicitly acknowledges unsandboxed execution; no warning is emitted. - sandbox=True Best-effort in-process hardening: * Environment entries whose name contains credential patterns (TOKEN, SECRET, API_KEY, PASSWORD, PRIVATE_KEY, CREDENTIAL, SESSION, COOKIE, AUTH) are stripped from the child process. * On POSIX, per-child rlimits (RLIMIT_CPU, RLIMIT_AS, RLIMIT_NOFILE, RLIMIT_NPROC) are applied via preexec_fn so runaway memory/fork-bomb payloads are capped. * On Windows, env scrub applies but preexec is unavailable; a UserWarning directs callers to the Docker executor for strong isolation. Docstring and LocalCommandLineCodeExecutorConfig are updated to round-trip the posture through serialization so declarative deployments cannot silently downgrade. This is NOT a substitute for DockerCommandLineCodeExecutor — adversarial payloads can still read files, make outbound connections, and write to work_dir. The docstring states this explicitly. Tests cover: default DeprecationWarning, explicit opt-out silence, env scrubbing on POSIX, and config round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@hanselhansel

@hanselhansel Reframe the sandbox=True posture from "feature with fallback" to "contract with explicit failure" per microsoft#7611 review feedback. * Add PlatformExecutionScopeError(RuntimeError) — raised when sandbox=True is requested but the platform cannot honor the in-process isolation guarantees (no preexec_fn / no resource module on Windows). * Replace the silent UserWarning + env-scrub-only Windows fallback with an explicit raise at __init__ that names which guarantees are unavailable (RLIMIT_AS / RLIMIT_CPU / RLIMIT_NOFILE / RLIMIT_NPROC + preexec_fn). * Defense-in-depth: also raise at the subprocess-exec site so a future refactor cannot silently spawn without the rlimit contract. * Rewrite the sandbox docstring as a per-platform contract: every POSIX guarantee is enumerated (env-scrub patterns + each rlimit + cap value), and the Windows path documents "raises PlatformExecutionScopeError" — not "falls back to". * Add test coverage: Windows raise path (sys.platform mocked) plus a RuntimeError-subclass assertion so existing handlers keep working. Procurement / threat-model reviewers can now audit the failure surface from the docstring and exception message alone, without inferring intent from "falls back to" prose.

xr843 · 2026-05-13T01:27:37Z

Thanks @msaleme — "explicit-over-silent makes the security boundary inspectable" is exactly the framing I want this PR to land on. The PlatformExecutionScopeError raise-at-construction behavior in 5a0b2c8 should match that contract; happy to tighten the docstring further if you want it to name the failure mode in the class header rather than just __init__.

@ekzhu — flagging for your eyes since this extends the security posture you introduced in #7035 (Docker-default + warnings). The PR keeps backward compatibility (default sandbox=None preserves current behavior), and only opts callers into hardening when they explicitly request it. Would appreciate a maintainer review pass when you have a window.

xr843 · 2026-05-19T00:12:05Z

@msaleme thanks again for the review feedback — glad the "explicit-over-silent makes the security boundary inspectable" framing landed. I believe the PR is now in its final shape: the opt-in sandbox flag with PlatformExecutionScopeError raised at construction addresses the original #7462 concern that "sandboxed" was an unobservable property. CI is green. Would you be able to give a formal review approval, or let me know if anything else is needed before this can land? Happy to make adjustments.

xr843 · 2026-05-19T23:33:57Z

Pushed cd58017da reverting the sandbox notice from DeprecationWarning back to UserWarning, per @Ilya0527's PEP 565 analysis on #7462. Library-level DeprecationWarning is silenced by stock filters (default::DeprecationWarning:__main__ + ignore::DeprecationWarning), so the factory-module instantiation shape that #7462 targets never saw the warning. UserWarning has no ignore filter; logger.warning(...) still runs in parallel.

The "will become required" intent stays in the message — that's the actionable channel; the category is the visibility channel. Tests updated and passing locally (5/5 sandbox tests).

The orthogonal follow-up — flipping the default so bare LocalCommandLineCodeExecutor() requires an explicit sandbox= and raises otherwise — I'll file as a separate issue once this lands, since it's a breaking change that wants its own deprecation cycle.

xr843 · 2026-05-21T06:32:50Z

Pushed ce4c538 to make the branch match the discussion here:

the default unsandboxed path now emits UserWarning again, not DeprecationWarning, so library-level construction remains visible under Python's stock filters;
logger.warning(...) remains a parallel visibility lane;
added a regression test for warnings.simplefilter("ignore") to pin that the log record still emits even when Python warnings are globally suppressed.

On the follow-on question: I am treating this PR as strong best-effort visibility without adding an unconditional stderr-direct side effect. If an application deliberately suppresses both Python warnings and this logger, the non-bypassable enforcement step should be the follow-up breaking change where bare LocalCommandLineCodeExecutor() requires an explicit sandbox=... posture and raises otherwise. That keeps this PR backward-compatible while still making the current default risk visible in normal library usage.

xr843 · 2026-06-27T02:14:41Z

Closing this one myself. autogen has been in maintenance mode since early April (the most recent merge, #7521 on 2026-04-06, was the maintenance-mode README banner itself), so a new executor feature like this opt-in sandbox flag is out of scope for the project's current direction.

The design discussion in #7462 and in this thread may still be a useful reference if the silent-platform-fallback concern is ever revisited. Thanks @msaleme for the thoughtful review feedback earlier.

This was referenced Apr 21, 2026

[Security] LocalCommandLineCodeExecutor executes LLM-generated code without sandboxing #7462

Open

feat(local-executor): add opt-in sandbox posture to LocalCommandLineCodeExecutor #7598

Closed

xr843 marked this pull request as ready for review May 6, 2026 09:34

xr843 and others added 2 commits May 10, 2026 10:43

xr843 force-pushed the fix/lce-sandbox-flag branch from 5b4d26e to 5a0b2c8 Compare May 10, 2026 02:43

xr843 mentioned this pull request May 12, 2026

[Security] Critical: Unauthenticated RCE via WebSocket team_config (CVSS 9.8) #7662

Open

fix(local-executor): restore UserWarning sandbox notice

ce4c538

xr843 closed this Jun 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(code_executors): add opt-in sandbox flag to LocalCommandLineCodeExecutor (#7462)#7611

feat(code_executors): add opt-in sandbox flag to LocalCommandLineCodeExecutor (#7462)#7611
xr843 wants to merge 3 commits into
microsoft:mainfrom
xr843:fix/lce-sandbox-flag

xr843 commented Apr 20, 2026

Uh oh!

xr843 commented May 6, 2026

Uh oh!

xr843 commented May 9, 2026

Uh oh!

msaleme commented May 9, 2026

Uh oh!

xr843 commented May 13, 2026

Uh oh!

xr843 commented May 19, 2026

Uh oh!

xr843 commented May 19, 2026

Uh oh!

xr843 commented May 21, 2026

Uh oh!

xr843 commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

xr843 commented Apr 20, 2026

Summary

Three modes for sandbox: Optional[bool]

Env-var scrub patterns (case-insensitive)

What is intentionally out of scope for this draft

Tests

Decisions worth a second look

Uh oh!

xr843 commented May 6, 2026

Uh oh!

xr843 commented May 9, 2026

Uh oh!

msaleme commented May 9, 2026

Uh oh!

xr843 commented May 13, 2026

Uh oh!

xr843 commented May 19, 2026

Uh oh!

xr843 commented May 19, 2026

Uh oh!

xr843 commented May 21, 2026

Uh oh!

xr843 commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Three modes for `sandbox: Optional[bool]`