-
Notifications
You must be signed in to change notification settings - Fork 63
renderer: implement complete overBright in GLSL and high-precision frame-buffer #1050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We cannot overBright separate stage as stage are clamped within [0.0, 1.0] when blending multiple stages together, meaning the result of the multiplication of a separate light stage with a light factor will be clamped before blending it with the color map. This implementation implements overBright in the camera shader, but to only apply this overBright to surfaces having received lights, surfaces that do not receive lights gets divided by the light factor in a way re-multiplying them cancels the division. Stages and surfaces to be blended over other stages or surfaces are also divided to avoid multiplying multiple time the same surface, in order to have the final result being (m * a) + b instead of m * (a + b) wich would be (m * a) + (m * b) and then sump up the factor, something we don't want to do. The legacy code for overbrighting (but clamping) the light at BSP load is kept for when the feature is disabled, but rewritten to make the code better. This feature can be enabled by disabling the r_mapClampOverBright cvar.
This is needed to avoid color-banding with the various division/multiplication round trips of the complete overBright implementation. This will also be needed for sRGB colors in the future too. It is enabled by default if a feature requires it. This is optional and can be disabled with r_highPrecisionRendering, or automatically disabled if the hardware doesn't support the feature.
bd18597
to
3710c76
Compare
Complete overBright in GLSL is currently disabled by default, the code is considered good enough to be enabled by users to evaluate it and report issues. This may be improved incrementally in the future but I haven't spot a single error myself yet. Maps like |
Future sRGB code will require this complete overBright implementation and will also need high precision frame buffer for better results. All this code will be enabled by default when loading a map with sRGB, even if the |
3710c76
to
f2d2d78
Compare
This was tested with maps like:
No visual regressions and glitches were noticed. |
f2d2d78
to
d116d3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as #1035 (review) - among lightmaps/light grids/vertex lighting which ones is the new algorithm expected to apply to?
src/engine/renderer/tr_init.cpp
Outdated
@@ -77,6 +77,7 @@ Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA | |||
cvar_t *r_dynamicLightCastShadows; | |||
cvar_t *r_precomputedLighting; | |||
Cvar::Cvar<int> r_mapOverBrightBits("r_mapOverBrightBits", "default map light color shift", Cvar::NONE, 2); | |||
Cvar::Cvar<bool> r_mapClampOverBright("r_mapClampOverBright", "clamp over bright of legacy maps (enable multiplied color clamping and normalization)", Cvar::NONE, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the word "map" in these cvar names? Seems backwards since these are only used when a value is NOT provided by the map. I suggest r_defaultOverbrightBits
and r_defaultClampOverbright
with the word "default" emphasizing that changes to their value might be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was originally two cvars, r_overBrightBits
and r_mapOverBrightBits
. The first one was configuring some hardware feature, and the engine was multiplying the lights with r_mapOverBrightBits
and r_overBrightBits
.
Let's imagine we had r_mapOverBrightBits
set to 2, if there was no hardware overbright bits (r_overBrightBits 0
), the lights were fully multiplied by 2, if there was 1 hardware overbright bit (r_overBrightBits 1
), the lights were only multiplied by 1, and if there was 1 hardware overbright bits (r_overBrightBits 2
), the lights were not multiplied at all.
The related rgbGen identity
and rgbGen identityLighting
were also related to this, as one was related to blending something with multiplied light or premultiplied light, meaning one of them had to cancel the hardware overbright multiplication, with rgbGen identity
and rgbGen identityLighting
being equal if there was no hardware overbright bit (nothing to compensate)
But since our engine doesn't implement hardware overbright bit, the only cvar left is r_mapOverBrightBits
and rgbGen identity
and rgbGen identityLighting
are equal (nothing to compensate).
With that being said, one may claim that to be fully compatible with hardware overbrightbits, one may only have to multiply the end result. But the life is more complicated than that. This official Quake 3 documentation said on 6.3.1 RgbGen identityLighting
chapter:
6.3.1 RgbGen identityLighting
Colors will be (1.0,1.0,1.0) if running without overbright bits (NT, linux, windowed modes), or (0.5, 0.5, 0.5) if running with overbright. Overbright allows a greater color range at the expense of a loss of precision. Additive and blended stages will get this by default.
According to this documentation, the last platform to support hardware overbright bits was Windows 98 when playing fullscreen. Also, it is said in other places that the feature was not compatible with LCD screens and then required a CRT screen.
It happens that some more recent may have provided the feature more recently, as mentionned here, but for most people the feature is lost since forever.
It means than since 20 years most mappers and developers were testing their maps and graphical effects without that feature, and may have implemented things without testing them with hardware overbright. And since we don't do hardware overbright, we have to face the ultimate limitation: the precision of the user display, which is 24-bit, 8 bit per color. One good example of thing we cannot multiply is the Wolf:ET global fog, the screen itself is clamping the multiplied global fog, and it's permitted to have very strong doubts developer expected that fog to be multiplied, it looks wrong even on non-clamped values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map
prefix in r_mapClampOverBright
is mainly cargo cult, but it means that option sets a value that can differ per map.
For r_mapOverBrightBits
, this is the default per-map value for when the map doesn't set mapOverBrightBits
key in worldspawn, so I did the same for r_mapClampOverBright
, the default per-map value for when the map doesn't set mapClampOverBright
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can reword the explanation in the way mapOverBrightBits
is a BSP worldspawn entity key, and the r_mapOverBrightBits
cvar carries the engine default value for it when the BSP doesn't set it, hence the same name.
In the same way new r_mapClampOverBright
engine cvar carries the default value for the new mapClampOverBright
BSP worldspawn key. I can name those cvar and key as r_clampOverBright
and clampOverBright
if people prefer. It's fine.
But mapOverBrightBits
and related cvar should be kept that way for legacy compatibility, even if the extra map
prefix for a map key is superfluous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can name those cvar and key as r_clampOverBright and clampOverBright if people prefer. It's fine.
At least for the engine cvar, the map
prefix reminds this value can be overriden by the map itself, which is not bad and may be better. If the map sets the value, it will help people understand why tweaking the engine cvar does nothing.
All of them. There should be overbright anytime there is a precomputed light. With that reverse implementation there is overbright cancelling anytime there is no precomputed light, in a way multiplying the final result reverts non-lit surface to their original colors, and multiplies lit surfaces to their overbright colors. That just makes me think we may have to divide the dynamic lights, as those are expected to not be multiplied. The overbright light factor is for precomputed lights only. |
Dynamic lights are now divided to cancel overBright on them. |
So, what people can do (just to exclude bugs from other buggy features):
Then:
And play public games this way, and report which map break. This way we can decide to disable it by default or not. I expect maps from Tremulous being all fine, but some maps built for Unvanquished having bugs. The thing is that mappers have tested against their maps with a broken lighting and then, mappers were not aware they introduced some bugs. Example of maps I spotted:
The two maps ported from Urban Terror are also buggy, but that's expected: we set
|
Actually freeway having “too bright windows“ may be due to something wrong: the glow maps are multiplied but we may not want to do this. Problem is, with collapsed stages, we know what is a glow map, but with legacy stages, especially if the engine fails to collapse them, we don't know what is a glow map. I thought today of an alternate implementation for GLSL not-clamped overBrightBits: the reason why I multiply everything at the end but divide what should not be multiplied to cancel the multiplication is that framebuffers are clamped to 1, so if we multiply a legacy light stage, it will be clamped to 1 before blending the texture over it, so it's as bad as if we clamp the lightmap at load time. But, what I thought today is that since the engine max overBrightBits is 3, we may divide all stages by |
183d0ce
to
e76f648
Compare
I cancelled overBright on collapsed glow maps. It looks like I already cancelled on separate stages one, in fact they are simply detected as other blended surfaces that should not be overBright, without needing to know they are glow stages. The remaining problem with freeway map seems to be that bloom is multiplied. There is no issue in freeway map anymore. |
Bloom overbright is now properly cancelled. |
Here is a combination of this branch and the ATCSHD remaster @RedSkyByte is working on: |
9d7c7dd
to
5014f4e
Compare
Been running this for a bit, and I haven't noticed any regression. The only oddness I have seen is in the map Archao, but that oddness was present on master too (flickering textures at the spectator spawn point). |
What do you think about renaming If you don't mind, I think I will merge this soon if no one object. Not having this is blocking the work on sRGB/linear colorspace: The clamping will be automatically disabled on maps using the sRGB/linear colorspace in all cases, meaning all newly built maps will use that code. After this is merged, there can be other improvements done:
|
I insist that we should use "default" in the cvar names lest users be surprised when the cvar fails to override the value set in the worldspawn. |
We better not want to rename |
@@ -27,6 +27,7 @@ uniform sampler2D u_MaterialMap; | |||
uniform sampler2D u_GlowMap; | |||
|
|||
uniform float u_AlphaThreshold; | |||
uniform float u_LightFactor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you could optimize it a bit by inverting (x -> 1/x) this uniform so that you can use multiplication instead of division. Also by setting it to 0 instead of 1 when there is no need for multiplications so then the operations can be skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. I now do the division once at map load time instead, and replaced all the GLSL divisions by a multiplication.
It is assumed most GPUs do not know how to do jumps and then never skip a computation in a test but just discard the result if the test is false, so something *= a
with a = 1
is better than if ( a != 0 ) { something *= a; }
with a = 0
for when there is nothing to multiply, as we avoid the test instruction in all cases and we cannot avoid the multiplication in all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is assumed most GPUs do not know how to do jumps and then never skip a computation in a test but just discard the result if the test is false
My concept is a bit different. I think they can jump, but all threads must execute the same code. With if (x) { foo(); } else { bar(); }
if x
is true at least once and false at least once, all threads have to execute both branches. But if x
is false in all threads it can jump directly to bar()
.
I agree that x *= 1;
may very well be faster than if ( false ) { x *= 1; }
. But my contention is that if ( false ) { x *= 1; }
is likely faster than if ( true ) { x *= 1; }
. (Here by false
, true
and 1
I mean expressions derived from uniforms that always have the same value, rather than compile-time constants.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe, but, that complexifies the code for the reader for a benefit that may be very minor. Maybe @VReaperV has an opinion on it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here by false, true and 1 I mean expressions derived from uniforms that always have the same value
If it's based only on uniforms and constant expressions (which it is unless I'm missing something), then it will be statically uniform so there wouldn't be any divergence and the unused code path will likely be eliminated.
Also, keep in mind that a compiler might optimise something like if( condition > 0.0f ) { value *= condition }
into value *= condition > 0.0f ? condition : 1.0f
anyway.
At the end of the day though, trying to optimise a single vec3 multiplication that is done once per fragment is a waste of time. (If you really want to know which one will be faster you'd need to profile both and it would be hw/driver specific and you likely won't see any significant difference, and it would just be a giant waste of time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanations! ❤️
5014f4e
to
10fa968
Compare
Ah right, the fix for it was lost in my multiple rewrites. This is now fixed. |
10fa968
to
332338b
Compare
For The map variable would tell the renderer to use the legacy code path as a workaround for maps that has may look better with the old buggy code because they were tested with the old buggy code, and the cvar would be usable by the user to test a map with the old buggy code to test the workaround before setting the variable in the map. Actually, I now don't see good reasons to not disable clamping by default in all maps. Once people start to play with not-clamped overbright, any legacy map with clamped overbright looks broken, much more than the quirks that may be revealed by not clamping the lights. We were not seeing the bugs because we never seen a map properly rendered before, but once we see a map being properly renderer, the eyes then see the bugs on almost every surface when clamping is enabled. We can't unseen those light bugs once seen, and they are everywhere. Configuring a map to force the legacy code would require such map to be strongly broken, and for now the only maps that are that broken are maps from another game that used a game using different renderer default values. Also a |
54c12a6
to
649b94b
Compare
So, the cvar to enable old clamped code base is The map related option is The complete overbright is now enabled by default for all maps. One can't unseen the buggy clamping after having played on maps without buggy clamping. The I now consider this PR totally ready (after I squash some fixup commits). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
4cf1855
to
42bc94f
Compare
This branch is addictive. Since I discovered it a few days ago, I have spent hours just looking at maps. |
We cannot overBright separate stage as stage are clamped within
[0.0, 1.0]
when blending multiple stages together, meaning the result of the multiplication of a separate light stage with a light factor will be clamped before blending it with the color map.This implementation implements overBright in the camera shader, but to only apply this overBright to surfaces having received lights, surfaces that do not receive lights gets divided by the light factor in a way re-multiplying them cancels the division.
Stages and surfaces to be blended over other stages or surfaces are also divided to avoid multiplying multiple time the same surface, in order to have the final result being
(m * a) + b
instead ofm * (a + b)
wich would be(m * a) + (m * b)
and then sump up the factor, something we don't want to do.The legacy code for overbrighting (but clamping) the light at BSP load is kept for when the feature is disabled, but rewritten to make the code better.
This feature can be enabled by disabling the
r_mapClampOverBright
cvar.This is needed to avoid color-banding with the various division/multiplication round trips of the complete overBright implementation. This will also be needed for sRGB colors in the future too. It is enabled by default if a feature requires it.
This is optional and can be disabled with
r_highPrecisionRendering
, or automatically disabled if the hardware doesn't support the feature.