I've extracted the left and right frames from a true stereoscopic movie and I'd like to compare the different models and settings against these.
How is this best achieved?
Are there built-in utils/functions to do this? Some of the files in training seem promising and I see some eval functions.
I thought I could give iw3 the left image and then just compare the original vs the output right image, but it doesn't seem to be as straightforward as that.
I saw that there is an --export option which outputs the depth map for the given image at the default or specified resolution, but if I do this I would need to compute a depth map for the original left & right images, which doesn't seem straightforward (most solutions rely on OpenCV and there seem to be variables for this).
I also tried outputting a Full SBS image and then overlaying it against the original, and I noticed differences in both the left and right frames, with the right frame containing a much larger difference. I would have thought the left frame would be untouched. That doesn't seem to be the case, however, and I can't tell if it's because of encoding/decoding or if the way the depth map is applied actually generates two new images (this would actually make a lot of sense to me).
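One way to put numbers on the overlay comparison is to split the Full SBS output into its two halves and diff each half against the matching ground-truth frame. A minimal sketch (the file names are hypothetical, and all three images are assumed to have matching dimensions, each SBS half the same size as a ground-truth frame):

```python
import numpy as np
from PIL import Image

sbs = np.asarray(Image.open("iw3_full_sbs.png").convert("RGB"), dtype=np.float32)
left_gt = np.asarray(Image.open("movie_left.png").convert("RGB"), dtype=np.float32)
right_gt = np.asarray(Image.open("movie_right.png").convert("RGB"), dtype=np.float32)

# Full SBS packs the left view into the left half of the frame and the
# right view into the right half.
half = sbs.shape[1] // 2
left_gen, right_gen = sbs[:, :half], sbs[:, half:]

def diff_percent(a, b):
    # Mean absolute difference as a percentage of the 0-255 range.
    return float(np.abs(a - b).mean()) / 255.0 * 100.0

print("left diff %: ", diff_percent(left_gen, left_gt))
print("right diff %:", diff_percent(right_gen, right_gt))
```

If both views are synthesized (see the first reply below), the left diff should come out clearly nonzero as well, independent of any encode/decode noise.

Replies: 4 comments 8 replies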
-
There are two different types of stereo view synthesis:

1. The original image is used as the left-eye frame and only the right-eye frame is synthesized (post-production 3D movies).
2. Both the left-eye and right-eye frames are synthesized from the input image, treated as the center view.

iw3 is 2; a 3D movie (post production) is 1. That is why the right-eye frame is so different. Also, there is an option to output various parameters for comparison: --find-param (Line 1742 in e830a9a). Adding an option to output using method 1 to this function should achieve it.
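To make the two conventions concrete, here is a toy horizontal-shift warp. This is only a sketch under simplifying assumptions (normalized depth, naive forward warping with holes); it is not iw3's actual algorithm:

```python
import numpy as np

def shift_view(image, depth, divergence_percent):
    # Toy forward warp: shift each pixel horizontally in proportion to
    # its depth. `image` is an (H, W, 3) uint8 array and `depth` is
    # assumed normalized to 0..1. This illustrates the geometry only;
    # iw3's real renderer, parameter meanings, and sign conventions
    # are more involved.
    h, w = depth.shape
    out = np.zeros_like(image)
    max_shift = divergence_percent / 100.0 * w
    for y in range(h):
        for x in range(w):
            tx = int(np.clip(x + round(depth[y, x] * max_shift), 0, w - 1))
            out[y, tx] = image[y, x]
    return out

# Method 1 (post-production 3D): the original stays as the left eye and
# only the right eye is synthesized, with the full divergence:
#   left  = original
#   right = shift_view(original, depth, -divergence)
#
# Method 2 (iw3): the original is treated as the center view and BOTH
# eyes are synthesized, each with half the divergence:
#   left  = shift_view(original, depth, +divergence / 2)
#   right = shift_view(original, depth, -divergence / 2)
```

Under method 2, neither generated view matches the source frame exactly, which is consistent with the left-frame differences observed in the question.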
-
I added
-
I finally got around to testing some of this and uploaded scripts to https://github.com/alexmi256/2D3DTestingScript

Here is what I found:

Scripts for testing 2D to 3D image generation. The data used was from live indoor 3D video keyframes and cannot be shared.

TL;DR
For the right-image-only synthetic view:

Summary

Results
A lower score is better because it means there is a smaller % difference between the ground truth and the generated image.

Baseline
1093 images from video keyframes were converted and compared with imgcompare.image_diff_percent.

Grayscale

RGB
Same, but images were kept as RGB instead of being converted to grayscale.

Tuning Options
All of these were done in RGB mode.

--edge-dilation
Edge dilation is only applicable for models with

--depth-aa
Edge dilation is only applicable for models with

--tta
This dropped most processing by 2-3/s.

--method
I used

--find-params
I used

--edge-dilation --depth-aa --convergence 0.5 --divergence 4.0
--edge-dilation --depth-aa --convergence 0.5 --divergence 4.0 --method mlbw_l4s --tta
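For reference, a minimal sketch of the scoring loop described above; the directory layout and file names are hypothetical, and it assumes each generated frame has the same dimensions as its ground-truth counterpart:

```python
from pathlib import Path

from PIL import Image
from imgcompare import image_diff_percent  # pip install imgcompare

truth_dir = Path("frames/right_truth")    # hypothetical layout: ground-truth
generated_dir = Path("frames/right_iw3")  # and generated frames share names

scores = []
for truth_path in sorted(truth_dir.glob("*.png")):
    gen_path = generated_dir / truth_path.name
    # "L" reproduces the grayscale run; use "RGB" for the RGB runs.
    truth = Image.open(truth_path).convert("L")
    gen = Image.open(gen_path).convert("L")
    scores.append(image_diff_percent(truth, gen))

# Lower is better: a smaller mean % difference from ground truth.
print(f"mean diff: {sum(scores) / len(scores):.3f}% over {len(scores)} images")
```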
-
All those numbers and results... the listing seems very confusing... lower score is better?

While going back and deleting old, outdated AI and 3D renders, I came across some Depth Anything (I don't know which model was used) vs Apple's higher-resolution Depth Pro comparisons, and with fresh eyes months later, Depth Pro was looking good. Not absolute perfection with every detail right, but it looked better overall. I'd be interested in seeing Depth Pro get added to iw3 again... unless, obviously, there's been some newer model that's surpassed it. If I remember correctly, Depth Pro didn't get fully implemented because of flickering in video? So we don't use it for video then... but it could be an improvement for photos, or maybe there's some new implementation method to fix the flickering it had before.

Back to the rankings/numbers shown... I don't see how any Zoe-based model rated well. Zoe causes mid-range bulging, foreground crush, etc., so I don't understand how Zoe models are mixed in randomly in those results. And if lower numbers are better, then it shows Depth Anything v2 is worse than v1? Idk about that... And it shows Large is worse than Base, which is worse than Small?? That's definitely not the case. The results are highly questionable and confusing.