
[GPU][TESTS] Fix race condition in GpuCacheDirWithDotsParamTest #35698

Open
zhanmyz wants to merge 1 commit into openvinotoolkit:master from zhanmyz:cvs-182616-fix-race-issue

Conversation


@zhanmyz zhanmyz commented May 7, 2026

Details:

  • Fix a race condition in GpuCacheDirWithDotsParamTest that causes the intermittent error "[GPU] Failed to write 4 bytes to stream! Wrote 0" when CI runs the tests in parallel.

Description of the issue (symptom, root-cause, how it was resolved)

  • Symptom: CacheDirDotVariants/GpuCacheDirWithDotsParamTest.smoke_PopulateAndReuseCache/1 fails intermittently in CI with:

    Check 'written_size == size' failed at binary_buffer.hpp:27:
    [GPU] Failed to write 4 bytes to stream! Wrote 0
    
  • Root Cause:

    • The test constructs cacheDir from std::hash<std::string>{}(test_name) + GetParam(), which is fully deterministic — every process resolves to the same on-disk path (7815f5b1e52eaecf/test_encoder/test_encoder.encrypted/).
    • When gtest-parallel dispatches multiple workers concurrently, one process's removeDir(cacheDir) in SetUp()/TearDown() deletes the .blob file while another process is mid-write via write_cache_entry() → export_model() → sputn().
    • On Docker overlay2 (CI), the unlink invalidates the open fd's writable backing, causing sputn() to return 0.
  • Resolution:

    • Replace the deterministic hash prefix with ov::test::utils::generateTestFilePrefix(), which incorporates thread ID and timestamp to guarantee a unique cache directory per process invocation.
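
The difference between the two prefixes can be sketched in standard C++. This is only an illustration under stated assumptions: `deterministic_prefix` mirrors the hashing done in the original SetUp(), while `unique_prefix` is a hypothetical stand-in that combines a thread ID and a timestamp as described above. It is not the actual implementation of ov::test::utils::generateTestFilePrefix(), and the extra counter is an addition of this sketch so that repeated calls in one process also differ.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <sstream>
#include <string>
#include <thread>

// Mirrors the original SetUp(): the result depends only on the test name,
// so every run of the same binary with the same test yields the same path.
std::string deterministic_prefix(const std::string& test_name) {
    std::stringstream ss;
    ss << std::hex << std::hash<std::string>{}(test_name);
    return ss.str();
}

// Hypothetical stand-in (NOT the OpenVINO helper): mixes in the thread ID,
// a timestamp, and a per-process counter so each call gets a distinct prefix.
std::string unique_prefix(const std::string& test_name) {
    static std::atomic<unsigned long long> counter{0};
    const auto ticks = std::chrono::system_clock::now().time_since_epoch().count();
    std::stringstream ss;
    ss << std::hex << std::hash<std::string>{}(test_name) << '_'
       << std::this_thread::get_id() << '_' << ticks << '_' << counter.fetch_add(1);
    return ss.str();
}
```

With the deterministic variant, two workers running the same test case compute the same directory name, which is the precondition for the race. The unique variant gives each invocation its own directory, so one worker's cleanup cannot touch another worker's files.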

The code and line that caused this issue (if it is not changed directly)

void SetUp() override {
    std::stringstream ss;
    ss << std::hex << std::hash<std::string>{}(std::string(::testing::UnitTest::GetInstance()->current_test_info()->name()));
    // Base (no trailing slash first)
    cacheDir = ss.str() + GetParam();
    // Clean previous
    ov::test::utils::removeFilesWithExt(cacheDir, "blob");
    ov::test::utils::removeFilesWithExt(cacheDir, "cl_cache");
    ov::test::utils::removeDir(cacheDir);
    core.set_property(ov::cache_dir(cacheDir));
}
void TearDown() override {
    ov::test::utils::removeFilesWithExt(cacheDir, "blob");
    ov::test::utils::removeFilesWithExt(cacheDir, "cl_cache");
    ov::test::utils::removeDir(cacheDir);
}
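
The effect of this cleanup on a concurrently running worker can be imitated without OpenVINO or gtest. The following is a rough, single-threaded sketch that assumes C++17 `std::filesystem`. The `Worker` type, the `model.blob` file name, and `blob_survives` are hypothetical names invented for illustration. It replays the interleaving sequentially (worker A writes, then worker B runs its SetUp-style cleanup) rather than reproducing the real timing or the overlay2 behavior.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical stand-in for one test process running the fixture.
struct Worker {
    fs::path cache_dir;
    explicit Worker(fs::path dir) : cache_dir(std::move(dir)) {}

    // Like SetUp(): wipe any previous cache, then start fresh.
    void set_up() const {
        fs::remove_all(cache_dir);
        fs::create_directories(cache_dir);
    }
    // Stand-in for the plugin writing a cache entry.
    void write_blob() const {
        std::ofstream out(cache_dir / "model.blob");
        out << "data";
    }
    bool blob_exists() const { return fs::exists(cache_dir / "model.blob"); }
    // Like TearDown(): remove the cache directory.
    void tear_down() const { fs::remove_all(cache_dir); }
};

// Returns true if worker A's blob is still present after worker B's set_up().
bool blob_survives(const fs::path& dir_a, const fs::path& dir_b) {
    Worker a(dir_a), b(dir_b);
    a.set_up();
    a.write_blob();
    b.set_up();  // B starts while A still needs its blob
    const bool survived = a.blob_exists();
    a.tear_down();
    b.tear_down();
    return survived;
}
```

When both workers are given the same directory, B's cleanup removes A's blob. When each worker has its own directory, the blob is untouched, which is the property the fix relies on.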

Reproduction step and snapshot (if applicable)

# Single process — always passes:
ov_gpu_func_tests --gtest_filter="CacheDirDotVariants/GpuCacheDirWithDotsParamTest.smoke_PopulateAndReuseCache/1"

# 30 parallel processes — ~30% failure rate:
for i in $(seq 1 30); do
  (ov_gpu_func_tests --gtest_filter="...smoke_PopulateAndReuseCache/1" --gtest_repeat=3 > run_$i.log 2>&1) &
done; wait
grep -l FAILED run_*.log | wc -l  # → 9/30 failed

Checklist

  • [v] Is it a proper fix? (not a workaround)
  • Did you include test case for this fix, if necessary?
    • No new test needed — the existing test now correctly uses a unique path and will no longer race.
  • [v] Did you review existing test that can be extended to cover this scenario? The fix is within the existing test itself.

Tickets:

 ### Details:
  - Replace deterministic cacheDir path with generateTestFilePrefix() to ensure each parallel process uses a unique cache directory.
  - The original hash-only prefix was identical across processes, causing removeDir() to race with concurrent write_cache_entry().

 ### Tickets:
  - *CVS-182616*

Signed-off-by: zhanmyz <yazhan.ma@intel.com>
@zhanmyz zhanmyz requested review from a team as code owners May 7, 2026 06:35
@zhanmyz zhanmyz requested a review from Copilot May 7, 2026 06:35
@github-actions github-actions Bot added the category: GPU OpenVINO GPU plugin label May 7, 2026

Copilot AI left a comment


Pull request overview

Fixes an intermittent CI failure in the Intel GPU functional test suite by eliminating a cross-process cache directory collision in GpuCacheDirWithDotsParamTest during parallel execution.

Changes:

  • Replace a deterministic cache directory name (hash of test name) with ov::test::utils::generateTestFilePrefix() to ensure per-invocation uniqueness.
  • Prevent concurrent gtest-parallel workers from deleting each other’s cache directories/files during SetUp()/TearDown() cleanup.

