tests: add Khronos VVS test framework (cherry-picked from upstream) #215

Merged: zlatinski merged 20 commits into main from khronos-test-framework-cherry-picked on Mar 30, 2026

@zlatinski
Contributor

Cherry-pick the complete Vulkan Video Samples test framework from the Khronos upstream repository (KhronosGroup/Vulkan-Video-Samples, branch main, commits 9d588d9e..fc17607f).

This brings in the unified Python test runner and all incremental improvements, without modifying any library/decoder/encoder/filter C++ code (those changes were reverted to avoid mixing test infrastructure with code changes that need separate review).

tests/vvs_test_runner.py - unified entry point for encode + decode tests
tests/decode_samples.json - ~40 decoder test definitions (H.264/H.265/AV1/VP9)
tests/encode_samples.json - ~15 encoder test definitions (H.264/H.265/AV1)
tests/skipped_samples.json - per-driver skip list (nvidia, nvk, anv, radv, amd)
tests/libs/ - framework library modules:

  • video_test_framework_base.py (base classes)

  • video_test_framework_decode.py (decoder framework)

  • video_test_framework_encode.py (encoder framework)

  • video_test_driver_detect.py (GPU driver auto-detection)

  • video_test_fetch_sample.py (asset downloading + SHA256 verification)

  • video_test_result_reporter.py (result reporting + JSON export)

  • video_test_config_base.py, video_test_platform_utils.py, video_test_utils.py

tests/unit_tests/ - pytest self-tests (CLI, skip list, filtering, configs, status)
tests/generate_sample_md5.py - MD5 generation for new test samples
tests/manage_samples_list.py - sample list management
tests/README.md - comprehensive documentation (532 lines)
tests/conftest.py - pytest configuration
.github/workflows/test.yml - CI workflow (lint + unit tests + codec tests)

  • 9d588d9e tests: introduce testing framework

  • b6088f42 tests: rename video_test_framework_codec.py to vvs_test_runner.py

  • ad7ed0e4 tests: add extended test framework support

  • 4a657d88 tests: add encode resolution boundary tests

  • 4748f8ee tests: only download resources for tests that will actually run

  • f50a3ebb tests: bypass skip list when test is explicitly requested with -t

  • b815a5d8 tests: display skipped tests in running list and fix summary counts

  • a57bc453 tests: add codec filter support to --list-samples

  • c2343556 tests: update skip list after film grain and error handling fixes (and 21 other incremental improvements)

  • fc17607f filter: fix YCBCR2RGBA shader compilation error

  • 6114990a common: Fix 10-bit/12-bit sample normalization

  • 0dbd2ba7 Config: rename --no-device-fallback to --noDeviceFallback (test-side changes from these commits ARE included)

@zlatinski force-pushed the khronos-test-framework-cherry-picked branch from 5a12ae5 to 979b869 on March 23, 2026 23:28
The Khronos test framework (vvs_test_runner.py) hard-codes --verbose
in every decoder and encoder command via video_test_framework_base.py
and video_test_framework_encode.py. The NVIDIA fork's binaries did not
recognize this flag, causing all 76 Khronos test samples to fail with
"Unknown argument --verbose" (exit code 255).

Both binaries already had the `verbose` member variable and used it
throughout for conditional output — they just lacked the CLI argument
entry to set it.

Decoder (DecoderConfig.h):
  - Add {"--verbose", ...} entry after --verboseValidate, matching the
    Khronos DecoderConfig.h layout exactly.

Encoder (VkEncoderConfig.cpp):
  - Add --verbose to the printHelp() usage string.
  - Add argument parsing (args[i] == "--verbose") before the catch-all
    else block.
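
A minimal sketch of the parsing pattern described above, assuming a
hypothetical SketchConfig struct and ParseArgs function. The real parser in
VkEncoderConfig.cpp handles many more flags and is structured differently;
only the flag name and the catch-all behavior come from this commit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the encoder/decoder configuration object.
struct SketchConfig {
    bool verbose = false;
    std::string unknownArg;  // first unrecognized argument, if any
};

// Returns 0 on success and -1 when an unknown argument is seen. A process
// that returns -1 from main() is reported as exit code 255 on POSIX systems.
inline int ParseArgs(const std::vector<std::string>& args, SketchConfig& cfg) {
    for (const std::string& arg : args) {
        if (arg == "--verbose") {
            cfg.verbose = true;  // the flag only sets the existing member
        } else {
            cfg.unknownArg = arg;  // catch-all branch
            return -1;
        }
    }
    return 0;
}
```

Placing the new branch ahead of the catch-all is what keeps the flag from
being rejected as unknown.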

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
The Khronos test framework (vvs_test_runner.py) hard-codes
--noDeviceFallback in every decoder and encoder command to prevent
GPU fallback when testing with --deviceID. The NVIDIA fork's binaries
did not recognize this flag, causing all 73 Khronos tests to fail
with "Unknown argument --noDeviceFallback" (exit 255) — same class
of issue as the --verbose fix in the previous commit.

Decoder (DecoderConfig.h):
  - Add noDeviceFallback member variable (uint32_t : 1)
  - Initialize to false in reset()
  - Add CLI flag entry {"--noDeviceFallback", ...} after --deviceUuid

Encoder (VkEncoderConfig.h + VkEncoderConfig.cpp):
  - Add noDeviceFallback member variable (uint32_t : 1)
  - Initialize to false in constructor
  - Add argument parsing and help text entry

Note: The actual device fallback logic from the Khronos
VulkanDeviceContext is not ported — the flag is accepted and stored
but the selection behavior remains unchanged. This is sufficient for
Khronos test framework compatibility since the flag's purpose is to
prevent fallback on multi-GPU systems, which the NVIDIA fork does
not implement.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
On GPU-less CI runners (GitHub Actions), every Khronos test failed
with exit 255 because the NVIDIA fork's binaries returned -1 (=255)
on Vulkan init failure. The test framework maps exit 69 to
NOT_SUPPORTED (pass), but treats 255 as FAIL.

Port the exit code mechanism from the Khronos repo:

VkVSCommon.h (new file):
  - VVS_EXIT_UNSUPPORTED = EX_UNAVAILABLE (69)
  - IsVideoUnsupportedResult() — checks for VK_ERROR_FEATURE_NOT_PRESENT,
    VK_ERROR_INCOMPATIBLE_DRIVER, VK_ERROR_EXTENSION_NOT_PRESENT, and
    all video-specific KHR errors
  - ExitCodeFromVkResult() — maps VkResult to exit code
  - CHECK_VULKAN_FEATURE macro

Encoder Main.cpp:
  - Include VkVSCommon.h
  - Replace all 'return -1' with proper exit codes:
    - VVS_EXIT_UNSUPPORTED for VkResult indicating missing HW/driver
    - EXIT_FAILURE for other errors
  - Replace assert()s with fprintf(stderr, ...) for CI-friendly output
  - 7 VVS_EXIT_UNSUPPORTED return points matching Khronos layout

Decoder Main.cpp:
  - Include VkVSCommon.h
  - Same pattern: IsVideoUnsupportedResult check at every VkResult
    failure → VVS_EXIT_UNSUPPORTED
  - 4 VVS_EXIT_UNSUPPORTED return points (InitVulkanDecoderDevice,
    InitPhysicalDevice display path, InitPhysicalDevice headless path,
    CreateVulkanDevice headless path)
  - Replace assert()s with fprintf(stderr, ...)

This ensures that on CI without a GPU, the test framework sees exit 69
and reports NOT_SUPPORTED instead of FAIL, making the CI green for
GPU-less runners.
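
The mapping can be sketched roughly as follows. This is an illustration, not
the contents of VkVSCommon.h: the SketchVkResult enum is a stand-in that
copies a few numeric values from the Vulkan headers so the example compiles
without vulkan.h, the function names are hypothetical, and the video-specific
KHR error codes are omitted.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-in for VkResult; values match the Vulkan headers.
enum SketchVkResult : int {
    SKETCH_VK_SUCCESS = 0,
    SKETCH_VK_ERROR_DEVICE_LOST = -4,
    SKETCH_VK_ERROR_EXTENSION_NOT_PRESENT = -7,
    SKETCH_VK_ERROR_FEATURE_NOT_PRESENT = -8,
    SKETCH_VK_ERROR_INCOMPATIBLE_DRIVER = -9,
};

constexpr int kExitUnsupported = 69;  // EX_UNAVAILABLE from <sysexits.h>

// True when the failure means "no suitable hardware or driver".
inline bool IsUnsupported(SketchVkResult result) {
    switch (result) {
        case SKETCH_VK_ERROR_EXTENSION_NOT_PRESENT:
        case SKETCH_VK_ERROR_FEATURE_NOT_PRESENT:
        case SKETCH_VK_ERROR_INCOMPATIBLE_DRIVER:
            return true;
        default:
            return false;
    }
}

// Maps a result to the process exit code the test runner inspects.
inline int ExitCodeFor(SketchVkResult result) {
    if (result == SKETCH_VK_SUCCESS) return EXIT_SUCCESS;
    return IsUnsupported(result) ? kExitUnsupported : EXIT_FAILURE;
}
```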

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
The VVS_EXIT_UNSUPPORTED (exit 69) fix from commit 39435dc was only
applied to the demo apps (vk-video-dec-test, vk-video-enc-test). The
officially supported test apps still returned -1 (=255) on Vulkan init
failure, causing GPU-less CI to report FAIL instead of NOT_SUPPORTED.

Apply the same pattern to all 3 test apps:
- vulkan-video-dec-test: 10 return points fixed
- vulkan-video-simple-dec-test: 4 return points fixed
- vulkan-video-enc-test: 1 return point fixed + early exit on failure

Pattern: IsVideoUnsupportedResult(result) → VVS_EXIT_UNSUPPORTED,
other errors → EXIT_FAILURE. Replaced assert() with fprintf(stderr)
for CI-friendly output.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Cherry-pick the complete Vulkan Video Samples test framework from the
Khronos upstream repository (KhronosGroup/Vulkan-Video-Samples, branch
main, commits 9d588d9e..fc17607f).

This brings in the unified Python test runner and all incremental
improvements, without modifying any library/decoder/encoder/filter C++
code (those changes were reverted to avoid mixing test infrastructure
with code changes that need separate review).

tests/vvs_test_runner.py — unified entry point for encode + decode tests
tests/decode_samples.json — ~40 decoder test definitions (H.264/H.265/AV1/VP9)
tests/encode_samples.json — ~15 encoder test definitions (H.264/H.265/AV1)
tests/skipped_samples.json — per-driver skip list (nvidia, nvk, anv, radv, amd)
tests/libs/ — framework library modules:
  - video_test_framework_base.py (base classes)
  - video_test_framework_decode.py (decoder framework)
  - video_test_framework_encode.py (encoder framework)
  - video_test_driver_detect.py (GPU driver auto-detection)
  - video_test_fetch_sample.py (asset downloading + SHA256 verification)
  - video_test_result_reporter.py (result reporting + JSON export)
  - video_test_config_base.py, video_test_platform_utils.py, video_test_utils.py
tests/unit_tests/ — pytest self-tests (CLI, skip list, filtering, configs, status)
tests/generate_sample_md5.py — MD5 generation for new test samples
tests/manage_samples_list.py — sample list management
tests/README.md — comprehensive documentation (532 lines)
tests/conftest.py — pytest configuration
.github/workflows/test.yml — CI workflow (lint + unit tests + codec tests)

- 9d588d9e tests: introduce testing framework
- b6088f42 tests: rename video_test_framework_codec.py to vvs_test_runner.py
- ad7ed0e4 tests: add extended test framework support
- 4a657d88 tests: add encode resolution boundary tests
- 4748f8ee tests: only download resources for tests that will actually run
- f50a3ebb tests: bypass skip list when test is explicitly requested with -t
- b815a5d8 tests: display skipped tests in running list and fix summary counts
- a57bc453 tests: add codec filter support to --list-samples
- c2343556 tests: update skip list after film grain and error handling fixes
  (and 21 other incremental improvements)

- fc17607f filter: fix YCBCR2RGBA shader compilation error
- 6114990a common: Fix 10-bit/12-bit sample normalization
- 0dbd2ba7 Config: rename --no-device-fallback to --noDeviceFallback
  (test-side changes from these commits ARE included)

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
The Khronos test framework passes --profile <name> for all non-default
encoder tests. The H.264 and H.265 codec-specific
parsers (DoParseArguments) did not handle this flag, causing 11/21
encoder tests to fail with exit 255.

AV1 already had --profile parsing (no change needed).

H.264 (VkEncoderConfigH264.cpp):
  Accept baseline/main/high/high444 (or 0/1/2/3) and set profileIdc.

H.265 (VkEncoderConfigH265.cpp):
  Accept main/main10/mainstill/range/scc (or 0/1/2/3/4) and set
  profile.
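
For the H.264 case, the name-or-index lookup might look like the sketch
below. ParseH264Profile is a hypothetical helper, and the assumption that
indices 0..3 map to the profile_idc values 66/77/100/244 is mine; the commit
only states which spellings are accepted and that profileIdc is set.

```cpp
#include <cassert>
#include <optional>
#include <string>

// profile_idc values from the H.264 specification.
constexpr int kH264Baseline = 66;
constexpr int kH264Main = 77;
constexpr int kH264High = 100;
constexpr int kH264High444 = 244;

// Accepts either a profile name or its numeric index (assumed mapping).
inline std::optional<int> ParseH264Profile(const std::string& value) {
    if (value == "baseline" || value == "0") return kH264Baseline;
    if (value == "main" || value == "1") return kH264Main;
    if (value == "high" || value == "2") return kH264High;
    if (value == "high444" || value == "3") return kH264High444;
    return std::nullopt;  // caller decides how to report the error
}
```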

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
The H.264 encoder's adaptiveTransformMode defaulted to ENABLE, which
set transform_8x8_mode_flag=true in the PPS regardless of profile.
H.264 Main profile only supports 4x4 transform (spec Table A-2).
This triggered NVIDIA driver assertion:
  "Main profile doesn't support Adaptive 8x8 transform"
  "Invalid PPS ID used when fetching the encoded PPS"

Fix: check profile before setting transform_8x8_mode_flag. For
profiles below High, force transform_8x8_mode_flag=false. Also
changed InitProfileLevel() default from use8x8Transform=true to
false, with autoselect only enabling it for High profile and above.

Fixes encode_h264_main_profile, encode_h264_ip_gop, and
encode_h264_small_frame Khronos test failures.
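
The profile gate can be sketched as below. The enum and both function names
are hypothetical; the sketch only encodes the rule stated above, that the
flag may be set for High profile and above and is forced off otherwise.

```cpp
#include <cassert>

// Hypothetical profile enum, limited to the profiles named in this PR.
enum class SketchH264Profile { Baseline, Main, High, High444 };

// Only High and above may use the adaptive 8x8 transform.
inline bool SupportsTransform8x8(SketchH264Profile profile) {
    return profile == SketchH264Profile::High ||
           profile == SketchH264Profile::High444;
}

// Value to write into transform_8x8_mode_flag for the PPS.
inline bool ResolveTransform8x8Flag(SketchH264Profile profile, bool requested) {
    return requested && SupportsTransform8x8(profile);
}
```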

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
…d::terminate

VkVideoEncoder::DeinitEncoder() now calls WaitForThreadsToComplete()
before destroying resources. Without this, when the encoder test app
exits, the shared_ptr destructor chain destroys the std::vector<std::thread>
with joinable threads still running, which per C++ spec calls
std::terminate().

The demo encoder app was unaffected because its main() explicitly calls
DeinitEncoder() before the shared_ptr drops. The test app
(vulkan-video-enc-test) relies on shared_ptr destructor ordering, which
never reached DeinitEncoder() before the thread vector destructor.

This fixes all 17 encoder test crashes in the Khronos VVS test suite
when running with the officially supported test binaries.

Tested: H.264/H.265/AV1 encoder tests × 5 runs each — all clean exits.
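
The failure mode and the fix can be reproduced in miniature. The class below
is a hypothetical stand-in, not VkVideoEncoder: it owns a vector of worker
threads and joins them in its deinit path, so the vector is never destroyed
while a thread is still joinable.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

class SketchEncoder {
public:
    explicit SketchEncoder(std::atomic<int>& done) {
        for (int i = 0; i < 2; ++i) {
            m_workers.emplace_back([&done] { done.fetch_add(1); });
        }
    }

    void WaitForThreadsToComplete() {
        for (std::thread& worker : m_workers) {
            if (worker.joinable()) worker.join();
        }
    }

    // Join first, then release resources. Destroying a joinable
    // std::thread calls std::terminate().
    void Deinit() {
        WaitForThreadsToComplete();
        m_workers.clear();
    }

    ~SketchEncoder() { Deinit(); }

private:
    std::vector<std::thread> m_workers;
};
```

Because Deinit() is idempotent here, an explicit call from main() followed by
the destructor call is harmless.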

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
For single-tile OBU_FRAME types where tile_start_and_end_present_flag
is not set, consumed_bits() returns 0 (valid). Changed assert > 0
to assert >= 0. consumedBytes=0 means entire payload is tile data.

Fixes decode_av1_argon_test787 and decode_av1_720x480_tile_group.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
The YCBCR2RGBA compute shader failed to compile with multiple errors:
  - 'outputImageRGB' undeclared identifier
  - 'normalizeYCbCr' no matching overloaded function
  - 'shiftCbCr' no matching overloaded function

Root cause: refactored InitYCBCR2RGBA had two bugs:

1. Missing output format override (GetOutputFormat):
   The YCBCR2RGBA filter converts YCbCr→RGBA, so the output format must
   be RGBA (R8G8B8A8_UNORM or R16G16B16A16_UNORM). Without the override,
   the output format was the decoder's NV12, causing
   ShaderGenerateImagePlaneDescriptors to generate 'outputImageY' and
   'outputImageCbCr' plane bindings instead of 'outputImageRGB'.

2. normalizeYCbCr(uvec3) vs vec3 mismatch:
   The refactored shader generated normalizeYCbCr(uvec3 yuv) but called
   it with vec3 from imageLoad (which returns float). The Khronos version
   uses vec3 throughout for the image-based YCBCR2RGBA path.

After the shader compilation failed, the shader module was left null;
vkCreateComputePipelines was then called with it, and the driver
dereferenced it at offset 0x118 → SIGSEGV in
__VkShaderModule::GetShaderCodeHash().

Fix:
- Add GetOutputFormat() that overrides output to RGBA for YCBCR2RGBA
- Call it in the constructor to set m_outputFormat correctly
- Rewrite InitYCBCR2RGBA main() to match the proven Khronos version:
  inline fetch/convert/store, vec3 normalizeYCbCr signature

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Replace fence-based file dump sync with timeline semaphores and add
an async threaded dump pool to eliminate decode pipeline stalls.

Problem:
The file dump (VkVideoFrameToFile) waited on frameCompleteFence, which
gets wait+reset by QueuePictureForDecode when the DPB recycles a slot.
If the decoder pipeline runs deep enough, the fence is reset for a new
frame before the dump reads — causing an infinite wait on the wrong
frame's fence.

Fix 1 — TL semaphore for dump sync:
Wait on the forward timeline semaphore (frameCompleteDoneSemValue)
instead of the fence. TL values are monotonically increasing and tied
to decode order — they cannot be reset by slot recycling. After reading
the frame, signal the consumer-done TL semaphore so the decoder knows
the frame is released.

Fix 2 — Threaded dump pool (VkVideoDumpPool):
Add a thread pool (4 workers) for async frame dumping, matching the
TRV FileDumper pattern:
- Non-blocking queueFrame() returns immediately from decode loop
- Workers: TL semaphore wait → pixel read → ordered file write
  → signal release TL semaphore
- Display order enforced via m_nextWriteOrder + condition variable
- Dedicated TL semaphore registered as external consumer via
  AddExternalConsumer for proper slot reuse waiting
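
The display-order enforcement in the list above can be sketched with a
counter and a condition variable. This is a CPU-only illustration with
hypothetical names (OrderedWriter, DumpFramesOutOfOrder); it leaves out the
semaphore waits and the pixel readback and is not the VkVideoDumpPool code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

class OrderedWriter {
public:
    // Blocks until every frame with a lower order has been written.
    void Write(uint64_t order, int frameId) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [&] { return order == m_nextWriteOrder; });
        m_written.push_back(frameId);  // stands in for the file write
        ++m_nextWriteOrder;
        lock.unlock();
        m_cv.notify_all();  // wake the worker holding the next frame
    }

    std::vector<int> Written() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_written;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    uint64_t m_nextWriteOrder = 0;
    std::vector<int> m_written;
};

// Starts four workers in reverse order; output still follows display order.
inline std::vector<int> DumpFramesOutOfOrder() {
    OrderedWriter writer;
    std::vector<std::thread> workers;
    for (int order = 3; order >= 0; --order) {
        workers.emplace_back([&writer, order] {
            writer.Write(static_cast<uint64_t>(order), 100 + order);
        });
    }
    for (std::thread& worker : workers) worker.join();
    return writer.Written();
}
```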

Fix 3 — ClearParent() pool node state reset:
ClearParent() unconditionally set m_cmdBufState = CmdBufStateReset,
discarding CmdBufStateSubmitted without resetting the fence. On pool
node reacquire, ResetCommandBuffer() saw Reset and skipped the fence
wait+reset, leaving it signaled from the previous submission. This
triggered the videoDecodeCompleteFence assertion. Fix: preserve actual
command buffer state across pool release/reacquire cycles.

Fix 4 — Consumer semaphore deadlock in dump-only path:
The dump pool incorrectly set hasConsummerSignalSemaphore, causing the
decode submit to wait on consumerCompleteSemaphore. In the dump-only
path (--noPresent), no graphics consumer signals this semaphore →
infinite deadlock. The dump pool uses its own dedicated TL release
semaphore via AddExternalConsumer; the hasConsummerSignalSemaphore flag
is only needed when graphics presentation is active.

Fix 5 — Debug serialization flags rewritten for TL semaphores:
- checkDecodeIdleSync: now waits on both the decode fence and TL value
  (previously skipped fence wait when TL semaphore was non-null)
- syncCpuAfterStaging: now waits on frameCompleteFence + filter TL
  value (previously waited on the wrong fence — pool node fence
  signaled by decode, not filter)

Files changed:
- VkVideoFrameToFile.cpp: TL semaphore wait instead of fence
- VkVideoDumpPool.h: New threaded dump pool class
- VulkanVideoProcessor.cpp: Dump pool creation + external consumer
- VulkanCommandBufferPool.h: ClearParent() state fix
- VulkanVideoFrameBuffer.cpp: Phase 2 comments
- VkVideoDecoder.cpp: Debug flag rewrites, remove instrumentation

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
…phore

The graphics presentation now creates its own timeline semaphore and
registers it via AddExternalConsumer(SEM_SYNC_TYPE_IDX_PRESENTER).
QueuePictureForDecode waits on this semaphore before reusing a DPB
slot, ensuring the graphics pipeline has finished reading the frame.

This follows the same pattern as the dump pool's external consumer
registration, providing consistent slot reuse protection for all
consumers (dump, presentation, and potentially encoder).

Changes:
- VulkanFrame.h: added m_presenterReleaseSemaphore and consumer index
- VulkanFrame.cpp: create TL semaphore in AttachShell (not AttachQueue,
  so --noPresent mode doesn't register a semaphore nobody signals).
  Signal it from the graphics queue submit using the value from
  externalConsumerDoneValues[].
- VkVideoQueue.h: added virtual AddExternalConsumer() to interface
- VulkanVideoProcessor.h: override with uint64_t signature

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Fence routing:
- videoDecodeCompleteFence = VK_NULL_HANDLE when filter enabled
- Filter signals frameCompleteFence as the last producer
- Filter pool node fence not used by decoder (syncWithHost=false)
- Fence assertion guarded for VK_NULL_HANDLE
- fieldPic debug uses frameCompleteFence

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
The compute filter output images were created with only a combined
image view (no per-plane views). The compute shader writes Y and CbCr
through separate bindings (5=Y, 6=CbCr) which require per-plane
storage views. Without these, the shader wrote through the combined
view for both bindings, corrupting the CbCr channel (greenish/random
chroma on display).

Fix: create a YCbCr conversion for the filter output image spec with
VK_IMAGE_CREATE_EXTENDED_USAGE_BIT | VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT.
This enables the 6-param VkImageResourceView::Create to produce both:
1. A combined sampled view (for display YCbCr sampling)
2. Per-plane storage views (for compute shader write)

Also pass planeUsageOverride=VK_IMAGE_USAGE_STORAGE_BIT when the
image spec has ycbcrConversion AND storage usage, so the per-plane
views are created with the correct usage flags.

Validation layer: no new errors introduced. Pre-existing VUID errors
(tiling-08717, image-01762) unchanged.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
…al fix)

The decoder and filter shared a single TL semaphore but
signaled from different queues (decode family 3, compute family 0).
With pipelining, the decode queue ran ahead, pushing the TL value
past what the filter needed to signal next. This backward signal
(e.g., filter tries to signal 114 but TL is already at 337) was
silently dropped, leaving frameCompleteFence unsignaled forever.

Fix: use the filter pool node's BINARY semaphore for decode→filter
handoff. The decoder signals it, the filter waits on it. Each
decode/filter pair has its own binary semaphore — no ordering conflict.
The TL semaphore is now signaled ONLY by the filter (compute queue),
so values are always monotonically increasing.

  Decode submit → signals binary semaphore (pool node GetSemaphore())
  Filter submit → waits binary semaphore
               → signals TL @ filterCompleteTimelineValue

Without the filter, the decoder signals the TL directly (unchanged).
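
A toy CPU-side model of the ordering conflict, using the values quoted above.
TimelineModel is hypothetical and only tracks a forward-moving counter; it
does not call Vulkan, and rejecting a backward signal is a modelling choice
that mirrors the dropped signal described in this commit.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of a timeline value: it only ever moves forward.
class TimelineModel {
public:
    // Returns false when the requested value is not ahead of the current one.
    bool Signal(uint64_t value) {
        if (value <= m_value) return false;
        m_value = value;
        return true;
    }

    // A waiter on `value` is released once the counter reaches it.
    bool Reached(uint64_t value) const { return m_value >= value; }

    uint64_t Value() const { return m_value; }

private:
    uint64_t m_value = 0;
};
```

With two signalers sharing one counter, the queue that runs ahead makes the
other queue's signal backward. With a single signaler the values stay
strictly increasing.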

Verified on NVIDIA RTX 5080:
- av1_superres 1080p: PASS (481 frames, 188 FPS) — was ALWAYS CRASH
- h265_itu_slist_a: PASS (66 frames, 828 FPS) — was ALWAYS CRASH
- h264_4k: PASS (27 frames, 73 FPS)
- h265_2160p: PASS (31 frames, 1416 FPS)
- HEVC 10-bit 4K: PASS (30 frames, 8.5 FPS)
- vp9_tile_1x2, vp9_svc: PASS
- h264_clip_a: PASS (31 frames, 3725 FPS)

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Replace assert with bounds check for ColorPrimaries,
TransferCharacteristics, and MatrixCoefficients arrays.
AV1 streams (e.g., argon_test1019) can have values beyond the
array bounds. Now prints "Unknown" instead of crashing.

Fixes decode_av1_argon_test1019 (was OOB crash, now PASS).
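
The bounds-checked lookup can be sketched as follows. The table is a
shortened, hypothetical stand-in for the real name arrays, and
ColorPrimariesName is an illustrative helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Shortened stand-in table; the real arrays have more entries.
static const char* const kColorPrimariesNames[] = {
    "Reserved", "BT.709", "Unspecified"};

// Returns "Unknown" instead of asserting when the stream value is out of range.
inline const char* ColorPrimariesName(std::size_t index) {
    constexpr std::size_t count =
        sizeof(kColorPrimariesNames) / sizeof(kColorPrimariesNames[0]);
    return index < count ? kColorPrimariesNames[index] : "Unknown";
}
```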

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Fixes:
1. VkVideoDecoder.cpp: removed stale checkDecodeIdleSync block (was
   incorrectly placed before filter submit). Added pipeline stall
   (checkIdleSync=true) at end of DecodePictureWithParameters — waits
   on frameCompleteFence + filter TL after all stages complete using
   WaitAndResetFence. Added SYNC-FAIL diagnostic for fence timeout.

2. VkVideoDumpPool.h: moved bytesWritten inside the callback block,
   added zero-write guard with printf warning.

With pipeline stall enabled, filter+dump produces correct frames but
at reduced FPS (serialized). The stall confirms the sync issue is in
the pipelined path — the display/dump reads incomplete filter output
when no CPU wait is present.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
…pute

The filter output images started with VK_IMAGE_LAYOUT_UNDEFINED and were
never transitioned to VK_IMAGE_LAYOUT_GENERAL before the compute shader
wrote to them. On NVIDIA, a transition from UNDEFINED clears/invalidates
image contents, causing the green/corrupt CbCr seen in display and dump.

Every frame triggered [LAYOUT-BUG] — all 17 DPB slots had UNDEFINED
layout on the filter output image.

Fix: record a VkImageMemoryBarrier2 (UNDEFINED → GENERAL) in the
filter's command buffer (compute queue) before RecordCommandBuffer.
Only triggered when currentImageLayout == UNDEFINED (first use or
after InvalidateImageLayout on reconfigure).

The DPB images had proper UNDEFINED → DECODE_DPB transitions (line 1046),
but the filter output images were missed.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Replace the COM-style VkVideoRefCountBase intrusive ref-counting with
standard C++ std::shared_ptr across the entire codebase (68 files,
448 insertions, 1001 deletions).

Core changes:
- VkSharedBaseObj<T> is now a using-alias for std::shared_ptr<T>
- VkVideoRefCountBase reduced to a plain polymorphic base with
  virtual destructor (no enable_shared_from_this, no refcount)
- AddRef/Release/m_refCount removed from all ~21 subclasses

Pool lifecycle:
- Custom deleters return nodes to pool via weak_ptr<Pool>
- Explicit bitmask tracking replaces use_count() polling for
  node availability (use_count is formally approximate per C++ std)
- Pool nodes hold strong parent ref during checkout to prevent
  use-after-free if pool is destroyed while nodes are checked out
- All pool custom deleters hardened: node access moved inside
  poolWeak.lock() guard to prevent use-after-free
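
A minimal sketch of the pool lifecycle points above, with hypothetical names
(NodePool, PoolNode) and no locking, so it is single-threaded only. The
deleter touches the pool only inside the poolWeak.lock() guard, availability
is tracked with a bitmask, and a captured strong reference keeps the node
itself alive if the pool is destroyed during checkout. The real pools keep a
strong parent reference instead, which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

struct PoolNode { int index = 0; };

class NodePool : public std::enable_shared_from_this<NodePool> {
public:
    static constexpr int kNodeCount = 4;

    static std::shared_ptr<NodePool> Create() {
        return std::shared_ptr<NodePool>(new NodePool());
    }

    // Returns nullptr when every node is checked out.
    std::shared_ptr<PoolNode> Acquire() {
        for (int i = 0; i < kNodeCount; ++i) {
            const uint32_t bit = 1u << i;
            if ((m_inUseMask & bit) != 0) continue;
            m_inUseMask |= bit;
            std::weak_ptr<NodePool> poolWeak = weak_from_this();
            std::shared_ptr<PoolNode> keepAlive = m_nodes[i];
            // The handle never deletes the node; it only returns it.
            return std::shared_ptr<PoolNode>(
                keepAlive.get(), [poolWeak, keepAlive, bit](PoolNode*) {
                    if (auto pool = poolWeak.lock()) {
                        pool->m_inUseMask &= ~bit;  // pool is still alive
                    }
                });
        }
        return nullptr;
    }

    uint32_t InUseMask() const { return m_inUseMask; }

private:
    NodePool() {
        for (int i = 0; i < kNodeCount; ++i) {
            m_nodes[i] = std::make_shared<PoolNode>();
            m_nodes[i]->index = i;
        }
    }

    std::shared_ptr<PoolNode> m_nodes[kNodeCount];
    uint32_t m_inUseMask = 0;
};
```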

API boundary cleanup:
- Eliminated makeSharedFromRaw() bridge at encoder and decoder
  call sites using shared_ptr aliasing constructors
- ReferencedObjectsInfo changed from raw pointers to shared_ptrs
- FindByRawPtr() searches all registered parameter objects instead
  of just the last-of-type array (fixes lossy PPS/SPS lookup)
- dependency_data_s: replaced memset with value-initialization
  (struct now contains shared_ptr, memset was UB)

Circular reference fix:
- VkParserVideoPictureParameters → StdVideoPictureParametersSet →
  client back-reference formed a cycle that leaked on shutdown
- Fix: ReleaseClientObject() virtual breaks the cycle in Reset()
- VkVideoDecoder::Deinitialize() calls Reset() during shutdown
- Verified zero NVDBG_MALLOC leaks for all codecs

Pure virtual destructor fix:
- VulkanVideoDecoder::~VulkanVideoDecoder() called Deinitialize()
  which invokes pure virtual FreeContext() — UB after vtable unwind
- Fix: call Deinitialize() in each derived class destructor
  (H264, H265, AV1, VP9) where the vtable is still intact
- Same fix applied to IVulkanVideoParser base class
- Fixes 10KB context leak from failed FreeContext() dispatch
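
The destructor ordering issue can be shown in a few lines. The classes are
hypothetical stand-ins for the parser hierarchy: the base destructor leaves
cleanup alone, and the derived destructor calls Deinitialize() while its
FreeContext() override is still reachable.

```cpp
#include <cassert>

class SketchParserBase {
public:
    // Must not call Deinitialize() here: by this point the derived part is
    // gone and FreeContext() would resolve to the pure virtual.
    virtual ~SketchParserBase() = default;

protected:
    void Deinitialize() {
        if (m_initialized) {
            FreeContext();
            m_initialized = false;
        }
    }
    virtual void FreeContext() = 0;
    bool m_initialized = true;
};

class SketchH264Parser : public SketchParserBase {
public:
    explicit SketchH264Parser(int& freed) : m_freed(freed) {}
    ~SketchH264Parser() override { Deinitialize(); }  // vtable still intact

private:
    void FreeContext() override { ++m_freed; }
    int& m_freed;
};
```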

Parser PPS ownership:
- H264 parser m_pps changed from a non-owning shared_ptr with a no-op
  deleter to an owning copy taken from the PPS table

Minor:
- DecoderFrameProcessorState exposes typed VulkanFrame accessor
- FilterTestApp: revert dynamic_cast back to static_cast (type
  is compile-time invariant, no RTTI needed)
- DeviceWaitIdle() added to VulkanDeviceContext destructor

Validated with ASan: zero leaks across H264, H265, AV1, VP9
decode and all encoder paths. Filter tests 49/49 pass.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
The decoder library (vulkan_video_decoder.cpp) previously created its
own VulkanDeviceContext internally, which called LoadVk() to load a
separate Vulkan loader dispatch table. When the test app passed its
own VkDevice handle — created through the app's loader — the library
tried to call vkGetDeviceQueue through its own incompatible dispatch
table, causing a VUID-vkGetDeviceQueue-device-parameter crash.

This dual-loader problem manifested as the decode_h264_clip_a_hw_load_balancing
test crash, where --enableHwLoadBalancing triggered numDecodeQueues=-1,
exercising the vkGetDeviceQueue path more aggressively.

Fix: extend CreateVulkanVideoDecoder() with a new first parameter
VulkanDeviceContext* pVkDevCtxt. When non-null, the library uses the
caller's device context directly — sharing the Vulkan loader dispatch
table, device, queues, and all state. When null, the library creates
its own VulkanDeviceContext internally (preserving the old behavior for
standalone usage like vulkan-video-simple-dec).

Implementation details:
- VulkanVideoDecoderImpl::m_vkDevCtxt changed from a by-value member
  to a pointer (m_pVkDevCtxt) + std::unique_ptr<VulkanDeviceContext>
  (m_ownDevCtxt) for the self-managed case
- Constructor takes optional VulkanDeviceContext*; when null, allocates
  its own and points m_pVkDevCtxt at it
- Initialize() checks m_pVkDevCtxt->getDevice() != VK_NULL_HANDLE to
  skip device creation when the context is already fully initialized
- All internal usage changed from m_vkDevCtxt. to m_pVkDevCtxt->
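
The ownership arrangement described in this list can be sketched as follows.
SketchDeviceContext and SketchDecoderImpl are hypothetical stand-ins; the
deviceCreated flag plays the role of the getDevice() != VK_NULL_HANDLE check.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for VulkanDeviceContext.
struct SketchDeviceContext {
    bool deviceCreated = false;
};

class SketchDecoderImpl {
public:
    // Non-null: borrow the caller's context. Null: allocate and own one.
    explicit SketchDecoderImpl(SketchDeviceContext* external = nullptr) {
        if (external != nullptr) {
            m_ctx = external;
        } else {
            m_owned = std::make_unique<SketchDeviceContext>();
            m_ctx = m_owned.get();
        }
    }

    // Skips device creation when the context is already initialized.
    void Initialize() {
        if (!m_ctx->deviceCreated) {
            m_ctx->deviceCreated = true;  // stands in for device creation
            m_createdDevice = true;
        }
    }

    bool OwnsContext() const { return m_owned != nullptr; }
    bool CreatedDevice() const { return m_createdDevice; }
    SketchDeviceContext* Context() const { return m_ctx; }

private:
    SketchDeviceContext* m_ctx = nullptr;          // always valid after construction
    std::unique_ptr<SketchDeviceContext> m_owned;  // set only in the self-managed case
    bool m_createdDevice = false;
};
```

All internal access goes through the pointer, so the two cases differ only in
who is responsible for destroying the context.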

Callers updated:
- vulkan-video-dec-test: passes &vkDevCtxt (app-created device)
- vulkan-video-simple-dec: passes nullptr (library creates device)

Tested: Khronos VVS test suite with test apps — 98.6% pass rate
(0 crashes, up from 71.6% before encoder+decoder fixes).
The hw_load_balancing test now passes.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
@zlatinski force-pushed the khronos-test-framework-cherry-picked branch from 979b869 to f74ac39 on March 27, 2026 19:35
@zlatinski merged commit 72c8a8e into main on Mar 30, 2026
7 checks passed