fix: clear device cache when a queue item is cancelled by plz12345 · Pull Request #9223 · invoke-ai/InvokeAI

plz12345 · 2026-05-22T04:33:26Z

Summary

When an image generation job is cancelled mid-denoising, the PyTorch CUDA/MPS allocator retains its memory pool and never returns it to the OS. This causes RAM/VRAM usage to accumulate across cancellations and never drop — even as more jobs run — until the app is quit and restarted.

Root cause: TorchDevice.empty_cache() is called at the end of successful invocations (e.g. at line 957 of denoise_latents.py), but a CanceledException raised during the denoising step callback causes execution to jump directly to the except CanceledException: pass handler in run_node(), bypassing that cleanup entirely. PyTorch's allocator holds the freed tensor pool (intermediate latents, activations, noise tensors) indefinitely without an explicit empty_cache() call.

Two fixes:

1. run_node() except CanceledException handler: add gc.collect() + TorchDevice.empty_cache() so GPU/MPS memory from a cancelled invocation is returned to the OS immediately, not deferred until app restart.
2. _process() pre-job cleanup: add TorchDevice.empty_cache() alongside the existing gc.collect() call so any residual allocator memory from the previous job (whether it completed normally, errored, or was cancelled) is cleared before the next job begins.
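
The control flow described above can be sketched in plain Python. This is an illustration only, not the patch itself: `FakeTorchDevice` is a hypothetical stub that counts calls instead of touching a real allocator, and `denoise`, `run_node_before`, `run_node_after`, and `process_after` are simplified stand-ins for the real functions.

```python
import gc


class CanceledException(Exception):
    """Stand-in for the exception raised when a queue item is cancelled."""


class FakeTorchDevice:
    """Hypothetical stub: counts calls instead of touching a real allocator."""

    empty_cache_calls = 0

    @classmethod
    def empty_cache(cls) -> None:
        cls.empty_cache_calls += 1


def denoise(cancel_at_step=None, steps=4):
    for step in range(steps):
        if step == cancel_at_step:
            # The step callback raises, so nothing below this loop runs.
            raise CanceledException()
    # Cleanup that only a successful invocation reaches.
    FakeTorchDevice.empty_cache()


def run_node_before(cancel_at_step=None):
    try:
        denoise(cancel_at_step)
    except CanceledException:
        pass  # the cleanup at the end of denoise() was skipped


def run_node_after(cancel_at_step=None):
    try:
        denoise(cancel_at_step)
    except CanceledException:
        # Fix 1: collect garbage and empty the cache on cancellation.
        gc.collect()
        FakeTorchDevice.empty_cache()


def process_after(jobs):
    for cancel_at_step in jobs:
        # Fix 2: the pre-job cleanup now also empties the device cache.
        gc.collect()
        FakeTorchDevice.empty_cache()
        run_node_after(cancel_at_step)
```

With the stub, a cancelled `run_node_before(cancel_at_step=2)` leaves the call counter unchanged, while `run_node_after(cancel_at_step=2)` increments it once.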

Related Issues / Discussions

Closes #6759

QA Instructions

Start InvokeAI and queue one or more image generation jobs.
Cancel a job mid-generation (during denoising).
Observe RAM/VRAM in Activity Monitor (macOS), nvidia-smi, or equivalent — memory should drop back toward the pre-generation baseline within a few seconds of cancellation.
Before this fix: memory stays elevated permanently and accumulates with each cancellation, only recovering on app restart. After this fix: it drops promptly after each cancel.
Verify that a normal (non-cancelled) generation still completes correctly and produces expected output.

Tested on: macOS (Apple Silicon / MPS unified memory), cancelling single jobs and mid-queue jobs. Memory pressure confirmed to return to baseline after each cancellation.

Merge Plan

No database changes. Single file, two small additions. Safe to merge at any time.

Checklist

The PR has a short but descriptive title, suitable for a changelog
Tests added / updated (if applicable)
❗Changes to a redux slice have a corresponding migration
Documentation added / updated (if applicable)
Updated What's New copy (if doing a release after this PR)

PyTorch's CUDA/MPS allocator holds freed tensors in a pool and never returns them to the OS unless empty_cache() is called explicitly. Before this change, TorchDevice.empty_cache() was only called inside successful invocations (e.g. at the end of denoise_latents). A CanceledException raised during denoising skips that cleanup path, leaving working memory (intermediate latents, activations, noise tensors) stuck in the allocator pool for the lifetime of the process.

Two fixes:

1. Call gc.collect() + TorchDevice.empty_cache() in the CanceledException handler in run_node(), so GPU/MPS memory is returned to the OS immediately when a node is cancelled.
2. Add TorchDevice.empty_cache() alongside the existing gc.collect() in _process() so any residual memory from the previous job (completed or cancelled) is cleared before starting the next one.

lstein · 2026-05-25T19:45:27Z

The referenced bug report is from 2024. Is this still a problem? If so, could you provide a recipe for reproducing the memory leak? Thanks.

plz12345 · 2026-05-25T21:20:50Z

Yes it is, at least on Mac.

Run a generation job.
Cancel it halfway through.
The RAM it was using is not freed.
If you repeat this, the Python process grows until RAM is exhausted.

I've been running with this patch since I submitted, and see the desired RAM flush via Activity Monitor, reliably.

lstein

Sorry for the long wait! I've spent some time working with your PR. Unfortunately the memory leak doesn't appear to happen on my Linux development system. It may be a Mac-specific issue and I can ask our Mac developer to give it a spin.

Before I do that, though, I'd like to draw your attention to a couple of issues I spotted looking at your proposed patch. I see two potential issues:

The calls to gc.collect() and TorchDevice.empty_cache() at lines 460-462 occur within the denoiser's exception block, while the local execution frames are still active and still reference the in-flight latents and activation buffers. The garbage collection calls shouldn't be able to clean up these data structures; they will only be released when the call stack unwinds. Would it be more effective to set a flag in the exception block and then make the garbage collection calls in a finally block outside the exception handler?
Line 462 is calling TorchDevice.empty_cache() before the execution of each and every queue item, regardless of whether the previous one completed successfully. This is defeating the purpose of the torch cache and may bring a performance penalty. I think that if you move the GC operations out of the exception block as described above, you won't need to make this call. However, if this is necessary to avoid the memory leak on your system, could you do a little benchmarking to see if it has a noticeable impact on generation speed?
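
The first point can be illustrated without PyTorch. The following is a minimal sketch, assuming CPython's reference counting; `Buffer`, `invoke`, and both `run_node_*` functions are hypothetical stand-ins, not InvokeAI code. A weak reference is used to observe whether the buffer allocated inside the cancelled call is still alive when cleanup runs.

```python
import gc
import weakref


class CanceledException(Exception):
    """Stand-in for the cancellation exception."""


class Buffer:
    """Stand-in for a large tensor held in a local variable."""


def invoke(probe):
    buf = Buffer()
    # Record a weak reference so the caller can check liveness later.
    probe.append(weakref.ref(buf))
    raise CanceledException()


def run_node_cleanup_inside():
    """Cleanup inside the handler: the traceback still pins invoke()'s frame."""
    probe = []
    still_alive = None
    try:
        invoke(probe)
    except CanceledException:
        gc.collect()
        still_alive = probe[0]() is not None
    return still_alive


def run_node_cleanup_after():
    """Flag in the handler, cleanup in finally after the exception is cleared."""
    probe = []
    cancelled = False
    try:
        invoke(probe)
    except CanceledException:
        cancelled = True
    finally:
        if cancelled:
            gc.collect()
    return probe[0]() is not None
```

In this sketch `run_node_cleanup_inside()` reports the buffer as still alive, because the exception being handled keeps the traceback and therefore the callee's frame and its locals. `run_node_cleanup_after()` reports it as gone, because the handler has finished before the `finally` body runs. Whether the real patch behaves the same way depends on what else references the tensors.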

Also a minor nit: the comment on line 457 says that Python never cedes memory back to the OS (which is true), but it is contradicted by line 461, which says the "memory....is returned to the OS". A more accurate description is that the memory is returned to the Python pool for reallocation.

keturn · 2026-06-01T04:02:54Z

The described behavior is also a symptom of something else going on. Memory not returned to the OS isn't unusual for some allocator implementations. But even if it doesn't go back to the OS, it should be re-used by the allocator for the next generation.

If instead you see it continue to accumulate with every subsequent cancellation, that could be a different kind of memory leak.
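
One rough way to tell the two cases apart is to record process memory after each of several cancellations and look at the trend. The helper below is only a sketch of that idea: `classify_memory_trend`, its `tolerance_mb` default, and the three labels are all made up for illustration, and the readings would have to come from whatever tool is already in use (Activity Monitor, nvidia-smi, or equivalent).

```python
def classify_memory_trend(samples_mb, tolerance_mb=64.0):
    """Classify memory readings taken after successive cancellations.

    samples_mb[0] is the pre-generation baseline; later entries are readings
    taken after each cancellation. The first jump is ignored, because an
    allocator pool is expected to fill once and then be reused.
    """
    if len(samples_mb) < 3:
        raise ValueError("need a baseline plus at least two readings")
    growth = [b - a for a, b in zip(samples_mb, samples_mb[1:])]
    later = growth[1:]  # skip the initial pool fill
    if all(g <= tolerance_mb for g in later):
        return "plateau"  # consistent with the pool being reused
    if all(g > tolerance_mb for g in later):
        return "accumulating"  # consistent with a genuine leak
    return "inconclusive"
```

A series that jumps once and then stays flat would be classified as "plateau", while one that grows by a similar amount after every cancellation would be classified as "accumulating". The threshold is arbitrary and would need tuning for the model size in use.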

My two cents: If it's not obviously a bug in the app and there's a chance it's Python or PyTorch, it could be really dissatisfying to sink time into trying to suss out the intricacies of the current behavior only to discover those implementation details have been fixed or changed in the last five versions of torch… Could be a good thing to table until after the runtime updates go in. (Soon™)

plz12345 · 2026-06-01T06:40:58Z

So my gripe was around not freeing up memory when a job was cancelled. I have since become aware of the invokeai.yaml settings to force cache eviction more quickly. My workflow was flipping between Invoke and ltx-2-mlx for video gen, so Invoke being bad at freeing RAM was a pain point. I think Invoke's hot cache solution is fine where needed, but on Mac with MLX, you don't even need it with sub-30GB models because the Metal loading is that fast.

Since then, I ended up just vibe coding an app that actually uses native MLX models for image gen, via mflux. I know there is likely not even a whiff of Invoke supporting MLX until a miracle happens in PyTorch/Apple land, and Apple releasing MLX as an architecture tells me that's never happening unless Apple and Nvidia get in bed like Microsoft and Nvidia are.

If you want me to revise this as noted, I will, but I'm likely moving on since native MLX is nearly twice as fast as the MPS/PyTorch hand-off that has to happen in Invoke without MLX.

Also the Flux.2 Klein 9b Q8 MLX model actually works, unlike the Q8 GGUF model, which is broken on Mac (or is in Invoke, I didn't get far).

plz12345 requested review from JPPhoto, blessedcoolant, dunkeroni and lstein as code owners May 22, 2026 04:33

github-actions bot added the python (PRs that change python files) and services (PRs that change app services) labels May 22, 2026

Merge branch 'main' into fix/clear-device-cache-on-cancel

b8d8df4

lstein self-assigned this May 30, 2026

lstein added the 6.13.5 Library Updates label May 30, 2026

lstein added this to Invoke - Community Roadmap May 30, 2026

lstein moved this to 6.13.5 LIBRARY UPDATES in Invoke - Community Roadmap May 30, 2026

lstein requested changes Jun 1, 2026

plz12345 wants to merge 2 commits into invoke-ai:main from plz12345:fix/clear-device-cache-on-cancel
