Rendering performance: GLTF bind caching, shadow tunables, framebuffer-blit copies#320
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
💤 Files with no reviewable changes (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis PR introduces ChangesRendering optimizations
lltrunc and ll_round math fixes
Sequence DiagramsequenceDiagram
participant generateSunShadow
participant sUnionShadowResult
participant bucketShadowCull
participant renderShadow
alt RenderShadowCullMode == 1
generateSunShadow->>sUnionShadowResult: single union octree cull (all cascades)
loop per cascade
generateSunShadow->>bucketShadowCull: re-bucket by cascade frustum AABB
bucketShadowCull-->>generateSunShadow: per-cascade result[j]
generateSunShadow->>renderShadow: renderShadow(do_cull=false)
note over renderShadow: skips updateCull, uses pre-bucketed result
end
else RenderShadowCullMode == 0
loop per cascade
generateSunShadow->>renderShadow: renderShadow(do_cull=true)
renderShadow->>renderShadow: updateCull(shadow_cam, result)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
pushGLTFBatch rebound the full LLFetchedGLTFMaterial (up to 5 textures + ~7 uniforms) for every draw. Thread a (material, media-texture) cache through the GLTF push paths and skip the rebind when unchanged, and sort PBR faces by GLTF material in CompareBatchBreaker so same-material draws are adjacent and actually coalesce. The media-override texture stays in the cache key; renderAlphaObjects invalidates the cache where the simple pool rebinds texture units. Also drop the duplicate hero-probe occlusion pass in doOcclusion -- the first block already issues both reflection and hero queries. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sses The avatar alpha-blend shadow pass is the most expensive and least visually important avatar shadow pass, and it is rendered across every cascade. RenderAvatarShadowDetail (default 2 = unchanged) lets it (and optionally the alpha-mask pass) be skipped, speeding up crowd scenes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ss cascades Sun shadows cull and sort the scene once per cascade -- 4 octree walks + state sorts per frame. RenderShadowCullMode 1 does the octree cull once against a frustum spanning all cascades, then each cascade re-buckets the union's visible groups by its own frustum (bucketShadowCull, using the same AABBInFrustumObjectBounds test the per-cascade cull uses) and sorts only its own render map. Each cascade still renders only its slice, so it is GPU-neutral versus per-cascade culling while saving 3 of 4 octree walks -- a win on CPU-bound targets. Default 0 = unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The deferred pass loop binds a shader per pass with no emptiness check, and the materials pool exposes 24 passes (12 legacy material types x 2 rigged). Modern PBR/simple scenes use none of them, wasting up to 24 bindDeferredShader calls per frame. Add LLDrawPoolMaterials::isPassEmpty (consulted by begin/render/endDeferredPass so the bind/unbind stays balanced) to skip passes whose render map is empty. Visually a no-op -- empty passes had nothing to draw -- so it is pure CPU/GL-call reduction that helps weak hardware on the common modern-content case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The bump pool's deferred render bound gDeferredBumpProgram and re-bound the bump-image texture for every draw, even with no bump geometry (the common case in modern PBR scenes). Skip each static/rigged pass when its render map is empty, and cache the alpha-mask cutoff + bump-image bind across runs of faces that share them (faces are sorted bumpmap-then-texture). bindBumpMap's only side effect (addTextureStats) is max-based on the source texture, so skipping a repeat is a no-op there. Visually unchanged -- pure CPU/GL-call reduction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
indra/llmath/llmath.h (1)
158-166: ⚡ Quick winConsider adding range/finiteness guards to
lltruncfor defensive robustness (Lines 158–166).Casting non-finite (
NaN/Inf) or out-of-range floats toS32is technically undefined behavior in C++. While current callers (mouse input, scroll residue, dimension calculations) consistently pass in-range finite values and tests only cover normal ranges, adding explicit guards would harden the API against misuse. This is optional defensive programming rather than addressing a current issue.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@indra/llmath/llmath.h` around lines 158 - 166, The `lltrunc(F32)` and `lltrunc(F64)` functions lack defensive guards against non-finite values (NaN, Inf) and out-of-range floats before casting to S32, which results in undefined behavior. Add explicit checks in both overloads to verify the input is finite and within the valid range for S32 before performing the cast, returning a safe default value (such as 0) for any invalid inputs that fail the guard conditions.indra/newview/lldrawpoolmaterials.cpp (1)
72-82: ⚡ Quick winAdd explicit pass-range validation in
isPassEmpty.
isPassEmptyindexessMaterialPassType[pass]without a bounds guard. It’s currently safe by caller contract, but this helper now sits on begin/render/end paths, so a future pass-count mismatch turns into UB immediately.Proposed hardening
bool LLDrawPoolMaterials::isPassEmpty(S32 pass) { + if (pass < 0 || pass >= getNumDeferredPasses()) + { + llassert(false && "Invalid deferred material pass"); + return true; + } + bool rigged = false; if (pass >= 12) { rigged = true; pass -= 12; } + llassert(pass < (S32)(sizeof(sMaterialPassType) / sizeof(sMaterialPassType[0]))); U32 type = sMaterialPassType[pass] + (rigged ? 1 : 0); return gPipeline.beginRenderMap(type) == gPipeline.endRenderMap(type); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@indra/newview/lldrawpoolmaterials.cpp` around lines 72 - 82, Add explicit bounds validation in the isPassEmpty function after adjusting the pass value for the rigged flag. The function currently indexes sMaterialPassType[pass] without checking if pass is within valid bounds, which could cause undefined behavior. Insert a bounds check after the rigged adjustment (after pass -= 12) to verify that pass is a valid index for sMaterialPassType array, and return true if the pass is out of bounds, before attempting to access sMaterialPassType[pass].indra/newview/lldrawpoolavatar.cpp (1)
400-410: ⚡ Quick winShort-circuit disabled shadow passes in
beginShadowPass/endShadowPasstoo.The new gate in
renderShadowskips draws, but pass setup/teardown still runs for disabled alpha passes. Guarding begin/end with the same setting avoids the remaining per-pass shader/texture overhead.Proposed follow-up
void LLDrawPoolAvatar::beginShadowPass(S32 pass) { + static LLCachedControl<S32> avatar_shadow_detail(gSavedSettings, "RenderAvatarShadowDetail", 2); + if ((pass == SHADOW_PASS_AVATAR_ALPHA_BLEND && avatar_shadow_detail() < 2) || + (pass == SHADOW_PASS_AVATAR_ALPHA_MASK && avatar_shadow_detail() < 1)) + { + sVertexProgram = NULL; + return; + } + if (pass == SHADOW_PASS_AVATAR_OPAQUE) { ... } } void LLDrawPoolAvatar::endShadowPass(S32 pass) { - if (sShaderLevel > 0) + if (sShaderLevel > 0 && sVertexProgram) { sVertexProgram->unbind(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@indra/newview/lldrawpoolavatar.cpp` around lines 400 - 410, The shadow pass gates added to renderShadow skip the draw calls for disabled alpha passes, but the beginShadowPass and endShadowPass functions still perform setup and teardown for those passes, wasting shader and texture overhead. Apply the same avatar_shadow_detail() conditional guards to both beginShadowPass and endShadowPass functions that check SHADOW_PASS_AVATAR_ALPHA_BLEND and SHADOW_PASS_AVATAR_ALPHA_MASK, returning early if the detail level is too low, matching the logic already implemented in renderShadow.indra/newview/pipeline.cpp (1)
11679-11679: Add explicitdo_cullargument to spot shadow renderShadow call for clarity.Line 11679 currently relies on the default
do_cull = true. Make this explicit:Change
- renderShadow(view[i + 4], proj[i + 4], shadow_cam, result[i], false); + renderShadow(view[i + 4], proj[i + 4], shadow_cam, result[i], false, true);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@indra/newview/pipeline.cpp` at line 11679, The renderShadow function call on line 11679 is missing an explicit do_cull argument and relies on the default value of true. Add true as the final argument to the renderShadow call after the existing false argument to make the do_cull parameter explicit and improve code clarity. This change affects the spot shadow rendering code where renderShadow is called with view[i + 4], proj[i + 4], shadow_cam, result[i], and false.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@indra/llrender/llrendertarget.cpp`:
- Line 668: The glCopyTexSubImage2D call in the copyTexSubImage method has its
parameters in the wrong order. The function signature requires (target, level,
xoffset, yoffset, x, y, width, height) where xoffset and yoffset are destination
texture offsets, x and y are source framebuffer coordinates, and the last two
are dimensions (width and height). Currently, the parameters are swapped: source
coordinates (srcX0, srcY0) are passed as destination offsets, destination
offsets (dstX0, dstY0) are passed as source coordinates, and endpoint
coordinates (dstX1, dstY1) are passed instead of dimensions. Fix this by
reordering the parameters to pass dstX0 and dstY0 as the third and fourth
arguments (destination offsets), srcX0 and srcY0 as the fifth and sixth
arguments (source coordinates), and calculate the width and height as the
difference between the destination endpoint coordinates and their starting
offsets (dstX1 - dstX0 and dstY1 - dstY0).
---
Nitpick comments:
In `@indra/llmath/llmath.h`:
- Around line 158-166: The `lltrunc(F32)` and `lltrunc(F64)` functions lack
defensive guards against non-finite values (NaN, Inf) and out-of-range floats
before casting to S32, which results in undefined behavior. Add explicit checks
in both overloads to verify the input is finite and within the valid range for
S32 before performing the cast, returning a safe default value (such as 0) for
any invalid inputs that fail the guard conditions.
In `@indra/newview/lldrawpoolavatar.cpp`:
- Around line 400-410: The shadow pass gates added to renderShadow skip the draw
calls for disabled alpha passes, but the beginShadowPass and endShadowPass
functions still perform setup and teardown for those passes, wasting shader and
texture overhead. Apply the same avatar_shadow_detail() conditional guards to
both beginShadowPass and endShadowPass functions that check
SHADOW_PASS_AVATAR_ALPHA_BLEND and SHADOW_PASS_AVATAR_ALPHA_MASK, returning
early if the detail level is too low, matching the logic already implemented in
renderShadow.
In `@indra/newview/lldrawpoolmaterials.cpp`:
- Around line 72-82: Add explicit bounds validation in the isPassEmpty function
after adjusting the pass value for the rigged flag. The function currently
indexes sMaterialPassType[pass] without checking if pass is within valid bounds,
which could cause undefined behavior. Insert a bounds check after the rigged
adjustment (after pass -= 12) to verify that pass is a valid index for
sMaterialPassType array, and return true if the pass is out of bounds, before
attempting to access sMaterialPassType[pass].
In `@indra/newview/pipeline.cpp`:
- Line 11679: The renderShadow function call on line 11679 is missing an
explicit do_cull argument and relies on the default value of true. Add true as
the final argument to the renderShadow call after the existing false argument to
make the do_cull parameter explicit and improve code clarity. This change
affects the spot shadow rendering code where renderShadow is called with view[i
+ 4], proj[i + 4], shadow_cam, result[i], and false.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: eeb1576f-9ed1-48e5-9ba7-a8237b231c81
📒 Files selected for processing (18)
indra/llmath/llmath.hindra/llrender/llrendertarget.cppindra/llrender/llrendertarget.hindra/newview/app_settings/settings_alchemy.xmlindra/newview/lldrawpool.cppindra/newview/lldrawpool.hindra/newview/lldrawpoolavatar.cppindra/newview/lldrawpoolbump.cppindra/newview/lldrawpoolmaterials.cppindra/newview/lldrawpoolmaterials.hindra/newview/lldrawpoolwater.cppindra/newview/llgltfmaterialpreviewmgr.cppindra/newview/llviewershadermgr.cppindra/newview/llviewershadermgr.hindra/newview/llviewertexture.cppindra/newview/llvovolume.cppindra/newview/pipeline.cppindra/newview/pipeline.h
💤 Files with no reviewable changes (3)
- indra/newview/llviewershadermgr.cpp
- indra/newview/llviewershadermgr.h
- indra/newview/llviewertexture.cpp
…s with blits Add LLRenderTarget::copyContents (FBO->FBO glBlitFramebuffer, with a glCopyTexSubImage2D fallback when depth/stencil layouts differ) and the static copyContentsToFramebuffer helper. Use it to replace the gCopyProgram/gCopyDepthProgram fullscreen-triangle copy passes with direct framebuffer blits: - water distortion depth copy (LLDrawPoolWater::beginPostDeferredPass) - screen-space reflection scene copy (copyScreenSpaceReflections) - atmospherics / water haze depth copy (doAtmospherics, doWaterHaze) - auto-exposure history copy (generateExposure) Removes the now-unused gCopyDepthProgram (shader load + extern declaration) and the stray gCopyProgram extern in llviewertexture.cpp. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lback The stencil-mismatch fallback in LLRenderTarget::copyContents passed source coordinates as the destination texel offset, destination offsets as the source framebuffer origin, and endpoint coordinates where width/height were expected. Reorder to the correct signature (target, level, xoffset, yoffset, x, y, width, height): destination offset = dstX0/dstY0, source origin = srcX0/srcY0, dimensions = srcX1-srcX0 / srcY1-srcY0. No behavior change for the existing call sites (all copy full, same-size targets at origin), but the fallback is now correct for sub-rect / mismatched-size depth copies. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Description
A batch of rendering performance improvements, staying on the current GL baseline (no bindless / MDI). Quality-affecting changes ship as opt-in settings with defaults unchanged; the rest are visual no-ops.
GLTF/PBR draw path
LLRenderPass::pushGLTFBatch/pushRiggedGLTFBatch— skip the (up to 5 textures + uniforms)LLFetchedGLTFMaterial::bindwhen the material and media-override texture are unchanged.CompareBatchBreakernow groups faces by GLTF render material so same-material draws are adjacent and actually coalesce.doOcclusion(the prior block already ran both reflection + hero managers).Empty-pass / redundant-bind skips (visual no-ops)
Shadow tunables (opt-in, default = current behavior)
RenderAvatarShadowDetail(default 2 = unchanged): optionally drop the expensive avatar alpha-blend / alpha-mask shadow passes across all cascades — big win on crowd scenes.RenderShadowCullMode(default 0 = unchanged): mode 1 culls the sun cascades once against a union frustum and re-buckets per cascade, saving 3 of 4 octree walks per frame while remaining GPU-neutral (helps CPU-bound targets).Render-target copies
LLRenderTarget::copyContents(FBO→FBOglBlitFramebuffer, with aglCopyTexSubImage2Dfallback when depth/stencil layouts differ) andcopyContentsToFramebuffer. Replace thegCopyProgram/gCopyDepthProgramfullscreen-triangle copy passes (water distortion, SSR scene copy, atmospherics, water haze, auto-exposure history) with direct blits, and remove the now-unusedgCopyDepthProgram.Build fix
lltrunc(F32)/lltrunc(F64)were defined in a header withoutinline(ODR violation →LNK2005on relink of some targets); markedinlineand switched tostd::trunc.Related Issues
No tracking issue — opportunistic rendering performance work.
Issue Link: N/A
Checklist
Additional Notes
RenderAvatarShadowDetail,RenderShadowCullMode) still warrant broader A/B on weak-CPU/GPU hardware; they're default-off so they can't regress anyone.