Fix BO mode wired as 'none' in non-wavelet SURE samplers #486

Open
pycms-nube wants to merge 106 commits into Panchovix:main from pycms-nube:main

@pycms-nube

What changed

All 8 non-wavelet SURE samplers (sample_sure, sample_dpmpp_2s_a_sure, sample_dpmpp_2s_a_sure_adaptive, sample_dpmpp_2m_sure, sample_dpmpp_2m_sde_sure, sample_dpmpp_3m_sde_sure, sample_dpmpp_2m_sde_sure_adaptive, sample_sure_adaptive) were initialising _adam_state with:

_adam_state = {'optimizer': None, 'param': None} if sure_adam_mode != 'none' else None

This allocated a real dict when sure_adam_mode='bo' and passed it as a non-None adam_state to _sure_correct_x0.

Why it was wrong

_sure_correct_x0 only activates Adam when adam_mode in ('adam', 'adamw'). When adam_mode='bo', the adam block is skipped and the function falls through to plain SGD — the non-None dict is silently ignored. BO mode in the non-wavelet path ran as plain SGD, wasting an allocation each call and masking the mismatch.

The wavelet sampler sample_sure_wavelet already had the correct check (sure_adam_mode in ('adam', 'adamw')) so _adam_state=None for BO mode.
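
The mismatch can be illustrated with a simplified stand-in. This is a sketch, not the real `_sure_correct_x0`: the function `pick_update_rule` and its return strings are hypothetical, and only the mode check `adam_mode in ('adam', 'adamw')` and the old initialisation expression are taken from the description above.

```python
def old_init(sure_adam_mode):
    """Initialisation used by the non-wavelet samplers before the fix."""
    return {'optimizer': None, 'param': None} if sure_adam_mode != 'none' else None


def pick_update_rule(adam_mode, adam_state):
    """Hypothetical stand-in for the branch taken inside _sure_correct_x0."""
    if adam_mode in ('adam', 'adamw') and adam_state is not None:
        return 'adam'
    # Any other mode, including 'bo', falls through to the plain SGD step,
    # whether or not a state dict was passed in.
    return 'sgd'


state = old_init('bo')
print(state is not None)              # True: a dict was allocated
print(pick_update_rule('bo', state))  # sgd: the dict is never used
```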

Fix

Changed all 8 non-wavelet samplers to match the wavelet pattern:

_adam_state = {'optimizer': None, 'param': None} if sure_adam_mode in ('adam', 'adamw') else None

Test plan

  • Run a generation with each SURE sampler variant and sure_adam_mode='bo' — confirm no runtime errors
  • Verify sure_adam_mode='adam' and 'adamw' still initialise _adam_state correctly (non-None)
  • Confirm sure_adam_mode='none' keeps _adam_state=None
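
The second and third checks can be sketched without running a generation. The helper names `init_adam_state` and `modes_with_state` are hypothetical; the body of `init_adam_state` copies the one-line expression from the Fix section. The first check still needs a real sampler run.

```python
def init_adam_state(sure_adam_mode):
    """Fixed initialisation, as given in the Fix section."""
    return {'optimizer': None, 'param': None} if sure_adam_mode in ('adam', 'adamw') else None


def modes_with_state(modes=('none', 'adam', 'adamw', 'bo')):
    """Return the modes for which a state dict is allocated."""
    return [m for m in modes if init_adam_state(m) is not None]


print(modes_with_state())  # ['adam', 'adamw']
```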

🤖 Generated with Claude Code

pycms-nube and others added 30 commits March 28, 2026 13:47
Add diffuser support for UNet and LoRA.
Claude can also now handle complex usage.
And we are now ready for some fun features.
The MPS backend has some updates now, so let's bump it.
This commit fixes all NoneType and other issues caused by a weird loading problem. It also adds typing.

The diffuser pipeline can now compile with LoRA.
This commit also introduces the first version of forge auto offloading (maximum fit offloading by layers) using diffuser automatic device-map inference.
The diffuser pipeline can now load models using diffuser. This allows advanced model support and better model loading.
Assisted by Claude Opus.
We kind of fixed a problem in the original DoRA support? I am not sure... Claude says the original is wrong, but at this point both the reForge pipeline and diffuser have support.

Restore original support for multi-chunk CLIP for diffuser.
This commit introduces a fix for not using the pipeline.

Also create a stub for future optimization.
This should solve the problem of SDP being inefficient for non-technical users, and makes it easier to check whether we hit the performance maximum.
On macOS, memory footprint is essential. This commit uses autocast so we can run on bf16 or, in the extreme, fp16.
This commit supports mapping against diffuser schedulers and adding diffuser schedulers.
Extend the LRU cache to high-cost diffuser functions.
Fix the noise scale on EDM for Euler A2. Also add DC-sampler, SURE, a trajectory sampler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This fix introduces some performance improvements around the SURE sampler.
A proper preheat is added, along with variants of DPM++ 2M/2S a.
I suggest you use DPM++ 2S a SURE. It is the best one: it roughly matches the paper without introducing weird artifacts.
For SURE it's sure now.
SURE reimplemented after noticing that noise is added back later.
And we can now use model metadata to infer what is right.
This fix allows SURE to actually run under the sampling assumption.
The problem is that sampling needs high sigma while SURE does not like it.
We also injected noise incorrectly, so something blew up.
Sure this is a SURE fix;)
Though mostly you should not change but...
Yeah why not?
Replace the fixed sigma-scaled SGD step in _sure_correct_x0 with an
optional Adam/AdamW optimiser whose state (m, v, t) persists across
diffusion steps, mapping each denoising step to one optimizer iteration.

Adam normalises each pixel's gradient by its historical variance so that
alpha becomes a true scale-invariant learning rate rather than a raw
gradient magnitude knob.  The sigma-scaling heuristic (alpha / (1 +
sigma_t)) is therefore dropped when Adam is active — it is redundant
and would double-suppress early steps that Adam already handles via its
second moment.

AdamW adds decoupled weight decay applied directly to x0_hat, pulling
the corrected estimate toward zero each step without contaminating the
moment estimates.
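
A minimal sketch of the update described above, on a flat list of floats rather than a tensor. The function name `adam_correct`, the `eps` term, and the choice to scale the decay by `alpha` (as in the usual AdamW formulation) are assumptions, not taken from the commit. The `state` dict holds (m, v, t) and is meant to be reused across diffusion steps.

```python
import math


def adam_correct(x0_hat, grad, state, alpha, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=0.0):
    """One Adam/AdamW iteration; `state` persists m, v and t between calls."""
    if not state:
        state.update(m=[0.0] * len(x0_hat), v=[0.0] * len(x0_hat), t=0)
    state['t'] += 1
    t = state['t']
    out = []
    for i, (x, g) in enumerate(zip(x0_hat, grad)):
        m = state['m'][i] = beta1 * state['m'][i] + (1 - beta1) * g
        v = state['v'][i] = beta2 * state['v'][i] + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
        # Decoupled decay (AdamW): shrink x directly, leaving m and v untouched.
        x = x * (1 - alpha * weight_decay)
        out.append(x - alpha * m_hat / (math.sqrt(v_hat) + eps))
    return out


state = {}
out = adam_correct([1.0, -2.0], [0.5, -50.0], state, alpha=0.1)
print([round(v, 3) for v in out])  # [0.9, -1.9]
```

On the first step both elements move by about `alpha` even though their gradients differ by a factor of 100, which is the scale invariance the commit message refers to.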

Four new UI options are exposed under Settings:
  sure_adam_mode   — none / adam / adamw  (Radio)
  sure_adam_beta1  — first-moment decay   (Slider, default 0.9)
  sure_adam_beta2  — second-moment decay  (Slider, default 0.999)
  sure_adam_wd     — AdamW weight decay   (Slider, default 0.01)

All eight SURE samplers receive the new parameters via the existing
signature-inspection path in sd_samplers_common.py.  Diagnostic logging
now reports eff_grad_rms and adam_ratio (eff/raw) so the per-pixel
adaptation can be verified at runtime.

Empirically the adam_ratio and eff_grad_rms trajectories are nearly
identical across a 5× range of alpha values, confirming that Adam has
absorbed the scale sensitivity that previously required manual tuning.

Signed-off-by: PYCMS <zenghongyi2004@gmail.com>
Co-developed-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Claude Sonnet 4.6 <noreply@anthropic.com>
pycms-nube and others added 30 commits June 2, 2026 17:45
Now we can track what can be upgraded and what needs an urgent upgrade.
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/dependency-review-action](https://github.com/actions/dependency-review-action) from 4 to 5.
- [Release notes](https://github.com/actions/dependency-review-action/releases)
- [Commits](actions/dependency-review-action@v4...v5)

---
updated-dependencies:
- dependency-name: actions/dependency-review-action
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [github/codeql-action](https://github.com/github/codeql-action) from 3 to 4.
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](github/codeql-action@v3...v4)

---
updated-dependencies:
- dependency-name: github/codeql-action
  dependency-version: '4'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/cache](https://github.com/actions/cache) from 4 to 5.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](actions/cache@v4...v5)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-version: '5'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 6.
- [Release notes](https://github.com/actions/setup-node/releases)
- [Commits](actions/setup-node@v4...v6)

---
updated-dependencies:
- dependency-name: actions/setup-node
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
1. open_clip: GitHub archive ZIP for commit bb6e834 contains duplicate
   entries with conflicting content for ViT-g-14.json, which uv (stricter
   than pip) rejects with exit code 2.  Switch the default OPENCLIP_PACKAGE
   to the PyPI wheel `open_clip_torch>=2.24.0`, which installs cleanly via
   uv and provides the same `open_clip` import.

2. SDXL CPU/MPS tensor mismatch in get_learned_conditioning: the spatial
   conditioning tensors (original_size_as_tuple, crop_coords_top_left,
   target_size_as_tuple, aesthetic_score) were created on
   `clip.patcher.model.device`, which is always the *offload* device (CPU)
   in ldm_patched's lazy-load scheme — even after move_clip_to_gpu() has
   dispatched the model to MPS.  The CLIP text-encoder outputs (both the
   standard torch path and the MLX hook path) land on MPS because tokens
   are always created on `devices.device`.  The subsequent torch.cat of the
   pooled CLIP-G vector with the CPU spatial-cond embeddings therefore
   raised "Passed CPU tensor to MPS op".

   Fix: create all spatial conditioning tensors on `devices.device`
   (the authoritative compute device, always MPS on Apple Silicon),
   which matches where the text-encoder embeddings land regardless of
   VRAM state, separate-TE usage, or whether the MLX pipeline is active.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Bumps [open-clip-torch](https://github.com/mlfoundations/open_clip) from 2.30.0 to 3.3.0.
- [Release notes](https://github.com/mlfoundations/open_clip/releases)
- [Changelog](https://github.com/mlfoundations/open_clip/blob/main/HISTORY.md)
- [Commits](mlfoundations/open_clip@v2.30.0...v3.3.0)

---
updated-dependencies:
- dependency-name: open-clip-torch
  dependency-version: 3.3.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.20.0 to 7.35.0.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Commits](https://github.com/protocolbuffers/protobuf/commits)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 7.35.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Updates the requirements on [pytest-cov](https://github.com/pytest-dev/pytest-cov) to permit the latest version.
- [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst)
- [Commits](pytest-dev/pytest-cov@v4.0.0...v7.1.0)

---
updated-dependencies:
- dependency-name: pytest-cov
  dependency-version: 7.1.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
- Bump CI to Python 3.14 (allow-prereleases) in run_tests and linter
- Update README: Python 3.14 is the target, brew install python@3.14
- Incorporate all pending dependabot updates:
  pillow 10→12.2.0, accelerate 0.21→1.13.0, psutil 5.9→7.2.2,
  pytorch-lightning 1.9→2.6.5, albumentations 1.4→2.0.8,
  pytest 7.3→9.0, pydantic custom-wheel→2.13.4,
  diffusers 0.32→0.37.1, huggingface-hub 0.25→0.36.2,
  fastapi 0.94→0.136.3, starlette pinned 1.2.1, timm 1.0.17→1.0.27
- Fix Pydantic v2 compat: Optional fields, ConfigDict, model_fields,
  model_dump; remove _OptionsModelProxy, use lazy _build_options_model()
- Add debug_image_probe.py instrumentation module

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The debug endpoint /debug/last-output served arbitrary local files as
inline base64 and the /file= override bypassed Gradio's security gates.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
allow-licenses and deny-licenses are mutually exclusive in
actions/dependency-review-action@v5. Keep only deny-licenses.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes two high-severity CVEs in diffusers 0.37.1:
- CVE-2026-44513 / GHSA-98h9-4798-4q5v: trust_remote_code bypass via
  custom_pipeline and local custom components
- CVE-2026-45804 / GHSA-7wx4-6vff-v64p: TOCTOU race condition enabling
  silent RCE in DiffusionPipeline.from_pretrained()

Both are patched in diffusers 0.38.0, which requires safetensors>=0.8.0-rc.0.
Accepting the pre-release safetensors intentionally — security takes
priority over stability here. Added inline warnings in requirements_versions.txt
to document the trade-off and flag for revisit once 0.8.0 stable ships.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
zoom.js:
- Remove trailing whitespace on blank lines (8 occurrences)
- Add space after `if` keyword (lines 520, 818)

extraNetworks.js:
- Remove trailing whitespace on blank lines (7 occurrences)
- Remove duplicate `extraNetworksSearchButton` declaration (no-redeclare)
- Add missing `extraNetworksTreeProcessFileClick` stub (no-undef)
- Add trailing newline (eol-last)

hires_fix.js:
- Remove trailing whitespace (line 36)
- Add trailing newline (eol-last)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
pyproject.toml:
- Bump target-version py39 → py310 (match statement support)
- Move exclude to [tool.ruff] (was silently ignored under [tool.ruff.lint])
- Add per-file-ignores for upstream k-diff, mlx optional imports,
  type stubs, star-import preprocessor_compiled, late-import patterns
- Add fastapi.Body to extend-immutable-calls

Code fixes across diff_pipeline/, cst-auto/, mlx_pipeline/, scripts/,
extensions-builtin/ (Lora, forge_legacy_preprocessors, sd_forge_controlnet,
sd_forge_ipadapter, sd_forge_latent_modifier, sd_forge_multidiffusion,
sd_forge_neveroom, sd_webui_random_resolutions, reForge-*, mahiro_reforge):
- E741: rename ambiguous `l` variables
- E701: split multi-statement if-colon lines
- B006: mutable default args → None + guard
- F841: remove unused variable assignments
- E722: bare except → except Exception
- B007: unused loop vars → _
- B904: raise in except → raise ... from err
- B011: assert False → raise AssertionError
- C416: unnecessary dict/list comprehensions → dict()/list()
- B005: multi-char strip fix
- F821: add missing imports (typing.Any, traceback, devices)
- W291/W293: trailing whitespace

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
blendmodes 2024.1.1 caps pillow<11 which conflicts with pillow==12.2.0.
blendmodes==2025 requires only pillow>=10.4.0 with no upper bound.

CI: increase wait-for-it timeout 20s→60s to give the server more time
to boot on slow runners. Make kill-server step non-fatal with || true
so a connection-refused (server never started) does not fail the job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
numpy 2.0+ is required by both scikit-image==0.25.1 and
blendmodes==2025. numpy 1.26.4 was already incompatible with our
current scikit-image pin; this resolves the full dependency chain.
torch 2.11.0 supports numpy 2.x.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
modules/launch_utils.py:
- Add _select_requirements_file(): auto-selects requirements_versions.txt
  (Python 3.11–3.13) or requirements_versions_py314.txt (Python 3.14+)
  based on sys.version_info; overridable via REQS_FILE env var
- Update check_python_version(): supported range is now 3.11–3.14;
  drop 3.10 (PyWavelets>=1.9.0 requires >=3.11)
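
A sketch of how the selection described above could look. The public name `select_requirements_file` and its two injectable parameters are hypothetical (added so the logic can be exercised without changing the interpreter); the file names, the 3.14 cut-off, and the `REQS_FILE` override come from this commit message.

```python
import os
import sys


def select_requirements_file(version_info=None, environ=None):
    """Pick a requirements file from the Python version, unless REQS_FILE is set."""
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ
    override = environ.get('REQS_FILE')
    if override:
        return override  # explicit user choice wins
    if tuple(version_info[:2]) >= (3, 14):
        return 'requirements_versions_py314.txt'
    return 'requirements_versions.txt'


print(select_requirements_file((3, 12, 0), {}))  # requirements_versions.txt
print(select_requirements_file((3, 14, 0), {}))  # requirements_versions_py314.txt
```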

requirements_versions.txt (Python 3.11–3.13):
- Revert pillow 12.2.0 → 10.4.0  (gradio 3.41.2 caps pillow<11)
- Revert numpy 2.2.6 → 1.26.4    (gradio 3.41.2 caps numpy<2)
- Revert blendmodes 2025 → 2024.1.1 (blendmodes 2025 needs numpy>=2)
- Bump scikit-image 0.25.1 → 0.25.2, kornia 0.8.0 → 0.8.2
- Retain security pins: diffusers 0.38.0, safetensors 0.8.0rc0

requirements_versions_py314.txt (Python 3.14+, new file):
- Same baseline with 3.14-specific uv resolutions:
  scikit-image 0.26.0, kornia 0.8.3, PyWavelets 1.9.0
- Marks where gradio/numpy/pillow constraints can be lifted
  when gradio is eventually upgraded on this branch

pyproject.toml: bump ruff target-version py310 → py311
README: update minimum Python 3.10+ → 3.11+

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
webui.py: inject pytorch_lightning.utilities.distributed compatibility
shim before any module that triggers ldm imports.
pytorch-lightning 2.x removed that module; ldm/ddpm.py and
generative-models still import rank_zero_only from the old path.
Falls back through pytorch_lightning.utilities.rank_zero →
lightning_fabric.utilities.rank_zero so it works across 2.x versions.
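
The fallback chain can be sketched with the standard import machinery. `install_shim` is a hypothetical helper, not the code in webui.py, and the demo uses throwaway module names so it runs without pytorch-lightning installed.

```python
import importlib
import sys
import types


def install_shim(old_name, candidates, attr):
    """Make `old_name` importable by borrowing `attr` from the first usable candidate."""
    try:
        return importlib.import_module(old_name)  # old path still exists: nothing to do
    except ImportError:
        pass
    for candidate in candidates:
        try:
            source = importlib.import_module(candidate)
        except ImportError:
            continue
        if hasattr(source, attr):
            shim = types.ModuleType(old_name)
            setattr(shim, attr, getattr(source, attr))
            sys.modules[old_name] = shim  # later imports of old_name resolve here
            return shim
    return None


# Demo with made-up names: 'demo_new_home' plays the role of the new location.
donor = types.ModuleType('demo_new_home')
donor.rank_zero_only = lambda fn: fn
sys.modules['demo_new_home'] = donor
shim = install_shim('demo_old_home', ['demo_missing', 'demo_new_home'], 'rank_zero_only')
```

For the case in this commit the call would presumably be `install_shim('pytorch_lightning.utilities.distributed', ['pytorch_lightning.utilities.rank_zero', 'lightning_fabric.utilities.rank_zero'], 'rank_zero_only')`, made before anything imports ldm.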

requirements_versions*.txt: add audioop-lts to both requirements files.
audioop was removed in Python 3.13 (PEP 594); pydub (pulled in by
gradio) imports it and crashes on 3.13+ without this backport.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
preprocessor_compiled.py: add explicit 'import functools'.
The module was previously available via star-import re-export from
preprocessor.py, but the ruff lint pass removed the unused import
there, silently breaking the re-export. functools.partial is used
directly in this file so the import must be explicit.

launch_utils.py: add _apply_repository_patches() called after all
git_clone steps. First patch fixes CPU/MPS tensor mismatch in
repositories/generative-models/sgm/modules/encoders/modules.py:
emb.to(output[out_key].device) before torch.cat so all embedder
outputs are on the same device on Apple Silicon MPS.
repositories/ is gitignored so the patch is re-applied idempotently
on every fresh clone instead of being committed directly.
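
An idempotent text patch of this kind can be sketched as below. `apply_patch` is a hypothetical helper, and the `torch.cat` line in the demo is only illustrative of the shape of the change; the exact upstream source line is not quoted in this commit message.

```python
def apply_patch(source, old, new):
    """Replace `old` with `new` once; return (text, changed)."""
    if new in source:
        return source, False   # already patched: do nothing
    if old not in source:
        return source, False   # upstream changed: leave the file alone
    return source.replace(old, new, 1), True


# Illustrative strings only; they are held as text, so torch is not needed.
old = "torch.cat((output[out_key], emb), dim)"
new = "torch.cat((output[out_key], emb.to(output[out_key].device)), dim)"
src = "output[out_key] = " + old + "\n"

once, changed_first = apply_patch(src, old, new)
twice, changed_second = apply_patch(once, old, new)
print(changed_first, changed_second)  # True False
```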

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When the MLX pipeline fails to load on an MPS-capable machine, emit a
visible boxed warning so the user knows they are running on the slower
MPS backend and how to install MLX to fix it.

Previously the failure was silently swallowed at DEBUG log level,
leaving users confused about why generation was slow or hitting
device-placement errors (like the CPU/MPS tensor mismatch in
generative-models that requires the post-clone patch).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously the warning only fired in the except block, but
maybe_activate() returns False silently (no exception) when mlx is
not installed or Metal is unavailable — the most common case.

Now check the return value: if is_apple_silicon() is True but
maybe_activate() returned False, set the reason string and print
the warning. Exception path still sets the reason from the error
message, keeping both cases covered.
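
The two cases can be sketched as a single function that takes the two checks as callables. The name `mlx_fallback_reason` and the reason strings are hypothetical; `is_apple_silicon` and `maybe_activate` are the names used in this commit message.

```python
def mlx_fallback_reason(is_apple_silicon, maybe_activate):
    """Return a warning reason, or None when no warning is needed."""
    try:
        activated = maybe_activate()
    except Exception as err:
        # Exception path: the reason comes from the error message.
        return f'MLX pipeline failed to load: {err}'
    if is_apple_silicon() and not activated:
        # Silent-False path: nothing raised, but MLX is not active.
        return 'MLX is not installed or Metal is unavailable'
    return None


print(mlx_fallback_reason(lambda: True, lambda: False))
```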

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses two "Uncontrolled data used in path expression" findings
reported by CodeQL in modules/ui_extensions.py.

save_config_state (line ~70):
- os.path.basename() alone only prevents directory traversal but leaves
  characters and length unconstrained; an attacker with UI access could
  write semi-controlled filenames into config_states_dir.
- Added: re.sub whitelist (word chars / spaces / hyphens / dots only),
  64-char length cap, and a realpath() containment check that rejects
  the path if it resolves outside config_states_dir after all steps.

apply_and_restart (line ~29-34):
- JSON list elements were only checked to be a list, not to be strings.
  Non-string elements could reach extension-name comparisons that
  derive filesystem paths.
- Added: assert all(isinstance(x, str) for x in ...) on both
  disable_list and update_list after json.loads.
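
A sketch of the element-type check. `parse_name_list` is a hypothetical helper; it raises `ValueError` rather than using `assert` as the commit does, since assertions are skipped when Python runs with `-O`.

```python
import json


def parse_name_list(raw):
    """Parse a JSON array and require every element to be a string."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(x, str) for x in value):
        raise ValueError('expected a JSON list of strings')
    return value


print(parse_name_list('["ext-a", "ext-b"]'))  # ['ext-a', 'ext-b']
```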

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previous fix sanitised 'name' but still used it in os.path.join(),
which CodeQL's taint analysis correctly flags — any user-derived byte
in a path expression is a finding regardless of sanitisation guards.

config_states.list_config_states() reads the display name from JSON
content (cs.get('name')), NOT from the filename, so the filename
needs zero user input.

New approach:
- Display name: sanitised (basename + whitelist + 64-char cap) and
  stored only inside the JSON body as 'name'.
- File path: timestamp + uuid4 hex suffix — entirely server-generated,
  no taint from user input whatsoever.

This closes the CodeQL 'Uncontrolled data used in path expression'
finding at line 70 definitively.
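
A sketch of the split between display name and file path. Both function names and the exact timestamp format are assumptions; the whitelist, the 64-character cap, and the timestamp plus uuid4 hex scheme follow the description above.

```python
import os
import re
import time
import uuid


def sanitise_display_name(name, max_len=64):
    """Clean a user-supplied name for storage inside the JSON body only."""
    base = os.path.basename(name)
    cleaned = re.sub(r'[^\w \-.]', '', base)  # word chars, spaces, hyphens, dots
    return cleaned[:max_len]


def new_state_filename():
    """Build a file name from server-side data only (no user input)."""
    stamp = time.strftime('%Y_%m_%d-%H_%M_%S')
    return f'{stamp}_{uuid.uuid4().hex}.json'


print(sanitise_display_name('../../etc/my state!.v2'))  # my state.v2
```

The payload written to disk would then carry the sanitised value under `'name'`, while the path is `os.path.join(config_states_dir, new_state_filename())`.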

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CodeQL flags 'Uncontrolled data used in path expression' at line 29
because Path(filename) is constructed directly from user-supplied data.

check_tmp_file is a security gatekeeper that checks whether a
user-requested file lives inside an allowed temp dir, so the filename
must be handled — but Path(user_input) is unnecessary.

Fix: canonicalise both sides to plain strings via os.path.realpath
+ os.path.abspath, then do a string startswith() containment check.
This is functionally equivalent (both resolve symlinks and check
whether the file is within an allowed directory) but avoids
constructing any Path object from user-controlled data.
Explicit try/except rejects malformed inputs cleanly.
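
A sketch of the string-based containment check. `is_inside` is a hypothetical name, and the trailing separator added to each root is a detail of this sketch (not stated in the commit) so that `/tmp/gradio_evil` does not pass as being inside `/tmp/gradio`.

```python
import os


def is_inside(filename, allowed_dirs):
    """True if `filename` resolves to a path under one of `allowed_dirs`."""
    try:
        target = os.path.realpath(os.path.abspath(filename))
    except (TypeError, ValueError):
        return False  # malformed input, e.g. None or an embedded NUL byte
    for directory in allowed_dirs:
        root = os.path.realpath(os.path.abspath(directory))
        # Compare against root plus separator to avoid prefix false positives.
        if target.startswith(root.rstrip(os.sep) + os.sep):
            return True
    return False


base = os.path.join(os.sep, 'tmp', 'gradio')
print(is_inside(os.path.join(base, 'a.png'), [base]))             # True
print(is_inside(os.path.join(base, '..', 'secret.txt'), [base]))  # False
```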

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces `sample_sure_restart` ("Restart SURE"), a Heun-based sampler
that uses the SURE residual ratio as an adaptive restart signal instead
of triggering restarts blindly at predetermined sigma thresholds.

The SURE residual ratio is computed for free at every step:

    r_ratio = sqrt(mean((x − Dθ(x, σ))²)) / σ

For a healthy trajectory r_ratio ≈ 1 (denoiser residual ≈ injected
noise).  When the trajectory drifts off-manifold r_ratio rises above
`sure_threshold` (default 1.5) and the sampler:
  1. Re-injects noise to reach σ_restart = min(σ_cur × scale, σ_max)
  2. Runs `restart_steps` Heun sub-steps on a Karras schedule back to σ_cur
  3. Refreshes the denoised estimate and continues the main step

Mathematical backing: the stop-gradient SURE approximation gives
  SURE_approx/(n·σ²) = r_ratio² − 1
so r_ratio > T ↔ SURE_approx/(n·σ²) > T²−1.  The Lean theorem
`cond4_step_closer` (SURE_verification.lean) confirms that positive
SURE_approx signals correctable denoising error, making the restart
productive.  No extra model calls are incurred on steps where no
restart is triggered (2 NFE/step, same as standard Heun).
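
The identity follows in a few lines if the stop-gradient approximation is read as dropping the divergence term of the standard SURE expression. That reading is an assumption here; $n$ is the number of elements of $x$.

```latex
\begin{aligned}
\mathrm{SURE} &= \lVert x - D_\theta(x,\sigma)\rVert^2 - n\sigma^2
                 + 2\sigma^2\,\nabla_x \cdot D_\theta(x,\sigma),\\
\mathrm{SURE}_{\text{approx}} &= \lVert x - D_\theta(x,\sigma)\rVert^2 - n\sigma^2,\\
\frac{\mathrm{SURE}_{\text{approx}}}{n\sigma^2}
  &= \frac{\tfrac{1}{n}\lVert x - D_\theta(x,\sigma)\rVert^2}{\sigma^2} - 1
   = r_{\text{ratio}}^2 - 1 .
\end{aligned}
```

With the default threshold $T = 1.5$, a restart corresponds to $\mathrm{SURE}_{\text{approx}}/(n\sigma^2) > 1.25$.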

Parameters: sure_threshold=1.5, restart_max_times=2, restart_steps=9,
restart_scale=2.0, sure_preheat_steps=3.
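
The restart signal and the restart schedule can be sketched on plain lists. All function names are hypothetical, `rho=7.0` is the commonly used Karras default rather than a value stated here, and treating the schedule as `n` sigma points from σ_restart down to σ_cur is an assumption about how `restart_steps` is counted.

```python
import math


def r_ratio(x, denoised, sigma):
    """RMS of the denoiser residual divided by sigma."""
    mse = sum((a - b) ** 2 for a, b in zip(x, denoised)) / len(x)
    return math.sqrt(mse) / sigma


def should_restart(x, denoised, sigma, sure_threshold=1.5):
    return r_ratio(x, denoised, sigma) > sure_threshold


def restart_sigma(sigma_cur, sigma_max, scale=2.0):
    """Noise level to jump back to, capped at sigma_max."""
    return min(sigma_cur * scale, sigma_max)


def karras_sigmas(sigma_hi, sigma_lo, n, rho=7.0):
    """n >= 2 sigmas from sigma_hi down to sigma_lo on a Karras-style schedule."""
    hi, lo = sigma_hi ** (1 / rho), sigma_lo ** (1 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


x, denoised = [3.0, 3.0], [0.0, 0.0]
print(should_restart(x, denoised, sigma=1.0))  # True: r_ratio is 3.0
print(restart_sigma(4.0, sigma_max=14.6))      # 8.0
```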

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>