Skip to content

feat(sandbox): add /delete endpoint + auto-delete background scan#1038

Draft
zhangjaycee wants to merge 10 commits into
alibaba:masterfrom
zhangjaycee:feature/delete
Draft

feat(sandbox): add /delete endpoint + auto-delete background scan#1038
zhangjaycee wants to merge 10 commits into
alibaba:masterfrom
zhangjaycee:feature/delete

Conversation

@zhangjaycee
Copy link
Copy Markdown
Collaborator

close #1032

When `--storage-opt size=` is in effect, docker allocates an XFS project id
on the container's overlay2 upper dir and binds a bhard limit to it.
Previously the sandbox log dir got an independently-allocated prjid + its
own bhard (disk_limit_log), so one sandbox had two separate quotas —
log writes could exhaust the log quota while rootfs sat idle (or vice
versa) even though there was no operational reason to split them.

Reuse docker's prjid for the log dir instead. Both paths now share a single
bhard (the rootfs limit) — the user-facing knob becomes "this sandbox's
total disk write budget", which matches how operators reason about it.

Implementation (rock/deployments/docker.py):
- _get_docker_rootfs_prjid_and_mountpoint (new): reads the prjid docker
  assigned to the container's upper dir via `docker inspect` +
  `xfs_io -r -c lsproj`, plus the mountpoint via findmnt. Only meaningful
  after `docker create` — the upper dir doesn't exist before that.
- _setup_log_dir_quota_shared (new): runs in the create→start gap added
  by the previous commit. If the log dir lives on the same XFS mountpoint
  as the rootfs upper dir, attaches the same prjid to it with
  `xfs_quota project -s` (binds prjid + sets inherit flag).
  Does NOT call `limit -p` — the bhard belongs to docker; a second limit
  would clobber it. Cleanup is also docker's responsibility: `docker rm`
  resets the prjid's limit when the upper dir is torn down.
- start() invokes the shared path unconditionally. On any failure
  (no rootfs prjid, log dir on a different XFS, subprocess error) the
  log dir simply has no quota in effect — there is no independent fallback.
  Deployments that need shared-quota enforcement must keep the log dir
  and docker root on the same filesystem.
- _effective_disk_limit_log mirrors the rootfs limit on success, stays
  None otherwise.

Drops the legacy independent-prjid code path entirely:
- _try_set_log_dir_quota / _cleanup_log_dir_xfs_quota methods removed
- XFS_PRJID_MIN / XFS_PRJID_RANGE constants removed
- log_dir_xfs_prjid / log_dir_xfs_mountpoint fields removed (no longer
  need to track the prjid for manual cleanup)
…ive key

Add pick_sandbox_info_fields() that keeps only keys declared in the
SandboxInfo TypedDict, and apply it inside SandboxMetaStore.create/update
before json_set on the alive key.

Why: SandboxRecord.to_dict() returns extra DB-only keys (spec, status)
that the DB-fallback read path (meta_store.get(check_db=True)) surfaces.
Without filtering, callers that round-trip the dict back through update()
would leak those keys into Redis. Letting the schema drive the allowlist
also auto-protects future DB-only columns and removes the need for ad-hoc
blocklists at every call site.

DB writes are unchanged — sandbox_table has its own column-based filter.

(cherry picked from commit b0191a1)
Signed-off-by: Jiachen Zhang <zjc462490@alibaba-inc.com>
…e semantics

archive_cache() reads the current Redis alive-key, merges final_info on
top (e.g. stop_time, state), persists the snapshot to DB, then evicts
both Redis keys. The DB always receives a complete snapshot regardless
of what the caller passes.
When sm is None, _get_current_statemachine already checked both Redis
(alive key) and DB (check_db=True) and found nothing. archive_cache in
this path would: read Redis (empty), merge (empty), skip DB write
(nothing to write), delete Redis keys (already absent) — a pure no-op.

The operator.stop() call is kept to kill any lingering Ray actor.
restore_cache() re-seeds Redis alive + timeout keys from stored state,
making an archived sandbox live again. Used by restart to fill the
cache after on_stop has evicted it via archive_cache.
The previous shell script spawned the watchdog via `nohup ... &`, so the
process saved in `self._clean_container_background_process` was only the
short-lived setup script (which exited within milliseconds after detaching
the real watchdog). The subsequent `.kill()` call in `actor.stop()` was a
no-op and the real watchdog was unreachable from Python.

Run the watchdog as the foreground process of the Popen call and add
`start_new_session=True` to preserve the SIGHUP isolation that `nohup`
provided. With this, the Popen handle points at the actual watchdog and
`.kill()` works, which is the prerequisite for the upcoming restart fix
(restart must terminate the old watchdog before docker start, otherwise
the old watchdog races and `docker stop`s the freshly started container).
restart brings a stopped sandbox back up by running `docker start` on
the original container, preserving its filesystem state across the
stop/restart cycle. start() remains the path for fresh containers via
`docker run`.

- POST /restart admin route + SDK Sandbox.restart()
- AbstractOperator / RayOperator / SandboxActor / DockerDeployment each
  expose a restart() method; DockerDeployment.restart() runs
  `docker start` and validates the container is running.
- SandboxStateMachine adds a stopped -> pending transition. on_restart
  rebuilds DockerDeploymentConfig from the spec snapshot in
  sandbox_record.spec (DockerDeploymentConfig.model_dump written once
  by sandbox_table.create), so the new actor wraps the existing
  container with the same image / memory / cpus / auto_clear.
  Sandboxes without a spec snapshot fall back to flat sandbox_info
  fields plus pydantic field defaults.
- SandboxManager.restart_async validates the transition and dispatches
  the SM event; symmetric with stop().

perf(ports): find_free_port resolves docker-published host ports from a
module-level cache. do_port_mapping refreshes the cache once per call
so its three find_free_port lookups share a single docker scan;
standalone find_free_port callers lazy-refresh when the cache is empty.

Tests: integration suites for find_free_port and DockerDeployment
diagnostics; unit tests for the SM restart transition and the manager
restart path.
…rror

_stop() calls _cleanup_kata_disk() which deletes the host .img file bound
to the container via -v. On restart, docker start cannot mount the missing
volume and kata-agent's createContainer fails. Until a dedicated delete API
moves disk cleanup out of _stop(), restart is blocked for kata containers.

Raise NotImplementedError early in DockerDeployment.restart() so callers
get an actionable error instead of a cryptic kata-agent gRPC failure.

Signed-off-by: Jiachen Zhang <zjc462490@alibaba-inc.com>
- Add DELETE /sandbox/{id} API that permanently removes stopped containers
- State machine guarantees delete only triggers from stopped (actor already killed),
  so delete reuses the standard `sandbox-{id}` actor name
- DockerDeployment.delete() issues `docker rm -f` for permanent removal
- SandboxStateMachine: new `deleted` terminal state with `delete` transition
- MetaStore: hard-delete Redis key + DB record on delete
- Auto-delete scheduler scans for sandboxes stopped > N hours and deletes them
- SDK client: add `delete()` method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature] Support sandbox delete and auto-delete background scan

1 participant