fix(gpu-libs): bundle hipBLASLt TensileLibrary data so ROCm backends stop falling back (#10660) by localai-bot · Pull Request #10672 · mudler/LocalAI

localai-bot · 2026-07-03T20:32:43Z

Description

The ROCm packager bundled rocBLAS kernel data (rocblas/library/*.dat) into each backend's lib/ dir and run.sh pointed ROCBLAS_TENSILE_LIBPATH at it, but the parallel hipBLASLt data dir (hipblaslt/library/TensileLibrary_lazy_gfx*.dat) was never packaged and no HIPBLASLT_TENSILE_LIBPATH was set.

Because backends ship their own libhipblaslt.so under lib/ (prioritized via LD_LIBRARY_PATH), that lib resolved its per-arch kernel data relative to itself, found nothing, and silently fell back to slow generic kernels, logging:

rocblaslt error: Cannot read "TensileLibrary_lazy_gfx1201.dat": No such file or directory
rocblaslt error: Could not load "TensileLibrary_lazy_gfx1201.dat"

The reporter's attached log confirms the asymmetry: ROCBLAS_TENSILE_LIBPATH=/backends/rocm-llama-cpp/lib/rocblas/library is set and working, while hipBLASLt has no bundled data and no env var. This affected every gfx arch, not just gfx1201.

Fix

Mirrors the existing rocBLAS handling:

- scripts/build/package-gpu-libs.sh: extract the rocblas data-dir copy into a reusable copy_rocm_data_dir helper and call it for both rocblas and hipblaslt. This also keeps the deliberate single-line local x=$(shopt -p nullglob) idiom to avoid tripping set -e when nullglob is unset.
- backend/cpp/llama-cpp/run.sh and backend/cpp/turboquant/run.sh: export HIPBLASLT_TENSILE_LIBPATH when the bundled hipblaslt/library dir exists. These are the only two backends with the rocBLAS pattern.
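For readers who want a picture of the two halves of the fix, here is a minimal sketch, not the code from this PR. The helper's signature, the `<base>/lib/<component>/library` source layout, and the `export_tensile_libpaths` name are assumptions made for illustration; only the names copy_rocm_data_dir, ROCBLAS_TENSILE_LIBPATH and HIPBLASLT_TENSILE_LIBPATH come from the description above.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical signature: copy_rocm_data_dir <component> <target_lib_dir> <base>...
# Copies <base>/lib/<component>/library into <target_lib_dir>/<component>/library,
# using the first base dir that actually has the data.
copy_rocm_data_dir() {
    local component="$1" target_lib_dir="$2" base src
    shift 2
    for base in "$@"; do
        src="$base/lib/$component/library"
        if [ -d "$src" ]; then
            mkdir -p "$target_lib_dir/$component"
            cp -a "$src" "$target_lib_dir/$component/"
            return 0
        fi
    done
    # In this sketch, missing data is not an error: nothing is copied.
    return 0
}

# Hypothetical run.sh fragment: point each library at its bundled data,
# but only when the bundled directory is really there.
export_tensile_libpaths() {
    local lib_dir="$1"
    if [ -d "$lib_dir/rocblas/library" ]; then
        export ROCBLAS_TENSILE_LIBPATH="$lib_dir/rocblas/library"
    fi
    if [ -d "$lib_dir/hipblaslt/library" ]; then
        export HIPBLASLT_TENSILE_LIBPATH="$lib_dir/hipblaslt/library"
    fi
}
```

Guarding each export on the directory's existence means a package built without hipBLASLt data leaves the variable unset, so the library falls back to its own default lookup rather than a path that does not exist.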

Tests

New regression test scripts/build/package-gpu-libs-rocm-data_test.sh runs package_rocm_libs against a fabricated ROCm tree (via a new ROCM_BASE_DIRS override) and asserts both rocblas/ and hipblaslt/ data dirs are bundled. Developed TDD: it fails on the pre-fix code (rocblas bundled, hipblaslt not) and passes after. The existing package-gpu-libs_test.sh (#10537) still passes.
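The shape of such a regression test can be sketched as follows. This is an illustration under stated assumptions, not the contents of package-gpu-libs-rocm-data_test.sh: the `package_rocm_libs` below is a stand-in defined only so the sketch runs by itself, ROCM_BASE_DIRS is treated as a space-separated list, and `TARGET_LIB_DIR`, `make_fake_rocm` and `assert_data_bundled` are hypothetical names.

```shell
#!/usr/bin/env bash
set -eu

# Stand-in for the real packager (assumption: the real function and its
# inputs may look different). Copies each component's data dir if present.
package_rocm_libs() {
    local base component
    for base in ${ROCM_BASE_DIRS:-/opt/rocm}; do
        for component in rocblas hipblaslt; do
            if [ -d "$base/lib/$component/library" ]; then
                mkdir -p "$TARGET_LIB_DIR/$component"
                cp -a "$base/lib/$component/library" "$TARGET_LIB_DIR/$component/"
            fi
        done
    done
}

# Fabricate a tiny ROCm tree holding one empty .dat file per component.
make_fake_rocm() {
    local root="$1" component
    for component in rocblas hipblaslt; do
        mkdir -p "$root/lib/$component/library"
        : > "$root/lib/$component/library/TensileLibrary_lazy_gfx1201.dat"
    done
}

# Succeed only if at least one .dat file was bundled for the component.
assert_data_bundled() {
    local lib_dir="$1" component="$2" f
    for f in "$lib_dir/$component/library"/*.dat; do
        if [ -f "$f" ]; then
            echo "ok: $component data bundled"
            return 0
        fi
    done
    echo "FAIL: $component data missing under $lib_dir" >&2
    return 1
}

run_regression_test() {
    local work
    work="$(mktemp -d)"
    make_fake_rocm "$work/rocm"
    # Subshell keeps the overrides from leaking into the caller.
    ( ROCM_BASE_DIRS="$work/rocm"; TARGET_LIB_DIR="$work/out/lib"; package_rocm_libs )
    assert_data_bundled "$work/out/lib" rocblas
    assert_data_bundled "$work/out/lib" hipblaslt
    rm -rf "$work"
}

run_regression_test
```

Dropping hipblaslt from the component loop in the stand-in makes the second assertion fail, which reproduces the pre-fix state the description mentions (rocblas bundled, hipblaslt not).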

Caveat

This bundles whatever gfx*.dat the build image's ROCm actually provides. gfx1201/RDNA4 tensile data landed in ROCm 6.4 — if the shipped ROCm predates it, that specific arch would still need a ROCm bump. The packaging gap itself is fixed for every supported arch. Verified at the unit level; not driven end-to-end on gfx1201 hardware.
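To see whether a particular arch is covered by a given bundle, one could inspect the data directory directly. The sketch below assumes the TensileLibrary_lazy_gfx*.dat naming shown in the error log; `list_bundled_archs` and `has_arch_data` are hypothetical helper names, not part of the PR.

```shell
#!/usr/bin/env bash
set -eu

# Print each gfx arch that has a TensileLibrary_lazy_<arch>.dat file in <dir>.
list_bundled_archs() {
    local dir="$1" f name
    for f in "$dir"/TensileLibrary_lazy_gfx*.dat; do
        [ -e "$f" ] || continue              # glob matched nothing
        name="${f##*/}"                      # strip the directory part
        name="${name#TensileLibrary_lazy_}"  # strip the file name prefix
        printf '%s\n' "${name%.dat}"         # strip the .dat suffix
    done | sort -u
}

# Succeed if <dir> holds lazy-loading tensile data for <arch>.
has_arch_data() {
    local dir="$1" arch="$2"
    [ -f "$dir/TensileLibrary_lazy_${arch}.dat" ]
}

# Example call (the directory is a placeholder for a backend's bundled data dir):
#   has_arch_data "$backend_lib_dir/hipblaslt/library" gfx1201 || echo "gfx1201 data not bundled"
```

If the arch is absent from the list, the packaging is not at fault and the build image's ROCm would need to provide that data, as the caveat above says.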

Notes

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

…stop falling back (#10660)

The ROCm packager copied rocBLAS kernel data (rocblas/library/*.dat) into the bundled lib/ dir and run.sh pointed ROCBLAS_TENSILE_LIBPATH at it, but the parallel hipBLASLt data dir (hipblaslt/library/TensileLibrary_lazy_gfx*.dat) was never packaged and no HIPBLASLT_TENSILE_LIBPATH was set. The bundled libhipblaslt.so therefore resolved its per-arch kernel data relative to itself, found nothing, and silently fell back to slow generic kernels, logging:

rocblaslt error: Cannot read "TensileLibrary_lazy_gfx1201.dat": No such file or directory
rocblaslt error: Could not load "TensileLibrary_lazy_gfx1201.dat"

Fix, mirroring the existing rocBLAS handling:

- package-gpu-libs.sh: extract the rocblas data-dir copy into a reusable copy_rocm_data_dir helper and call it for both rocblas and hipblaslt.
- llama-cpp/turboquant run.sh: export HIPBLASLT_TENSILE_LIBPATH when the bundled hipblaslt/library dir exists.

The helper takes an optional ROCM_BASE_DIRS override so the copy is unit testable without a real ROCm install; add a regression test that runs package_rocm_libs against a fabricated ROCm tree and asserts both data dirs are bundled.

Note: this bundles whatever gfx*.dat the build image's ROCm provides. If a given arch's tensile data is absent from the shipped ROCm, that arch still needs a ROCm bump; the packaging gap itself is fixed for every supported arch.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(gpu-libs): bundle hipBLASLt TensileLibrary data so ROCm backends stop falling back (mudler#10660) (mudler#10672)

  The ROCm packager copied rocBLAS kernel data (rocblas/library/*.dat) into the bundled lib/ dir and run.sh pointed ROCBLAS_TENSILE_LIBPATH at it, but the parallel hipBLASLt data dir (hipblaslt/library/TensileLibrary_lazy_gfx*.dat) was never packaged and no HIPBLASLT_TENSILE_LIBPATH was set. The bundled libhipblaslt.so therefore resolved its per-arch kernel data relative to itself, found nothing, and silently fell back to slow generic kernels, logging:

  rocblaslt error: Cannot read "TensileLibrary_lazy_gfx1201.dat": No such file or directory
  rocblaslt error: Could not load "TensileLibrary_lazy_gfx1201.dat"

  Fix, mirroring the existing rocBLAS handling:

  - package-gpu-libs.sh: extract the rocblas data-dir copy into a reusable copy_rocm_data_dir helper and call it for both rocblas and hipblaslt.
  - llama-cpp/turboquant run.sh: export HIPBLASLT_TENSILE_LIBPATH when the bundled hipblaslt/library dir exists.

  The helper takes an optional ROCM_BASE_DIRS override so the copy is unit testable without a real ROCm install; add a regression test that runs package_rocm_libs against a fabricated ROCm tree and asserts both data dirs are bundled.

  Note: this bundles whatever gfx*.dat the build image's ROCm provides. If a given arch's tensile data is absent from the shipped ROCm, that arch still needs a ROCm bump; the packaging gap itself is fixed for every supported arch.

  Assisted-by: Claude:claude-opus-4-8 [Claude Code]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
  Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

* chore: ⬆️ Update ggml-org/llama.cpp to `d4cff114c0084f1fbc9b4c62717eca8fb2ae494a` (mudler#10671)

  :arrow_up: Update ggml-org/llama.cpp

  Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
  Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* chore: :arrow_up: Update CrispStrobe/CrispASR to `f35185b876fc482fcb2053a81a2697936ed5fcc0` (mudler#10670)

  :arrow_up: Update CrispStrobe/CrispASR

  Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
  Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* fix(backends): enable ROCm/HIP GPU offload for ggml audio backends (mudler#10666) (mudler#10667)

  qwen3-tts-cpp, omnivoice-cpp, acestep-cpp and vibevoice-cpp shipped rocm-* variants that silently ran on CPU ([Load] backend: CPU). Two coupled defects:

  - The Makefiles passed -DGGML_HIPBLAS=ON, but the vendored ggml only understands -DGGML_HIP=ON (GGML_HIPBLAS was removed upstream), so the ggml-hip backend target was never created and no GPU code was built.
  - The CMake foreach that links the ggml GPU backends into the module listed blas/cuda/metal/vulkan but not hip, so even a built ggml-hip would not have been linked and its static backend registration would never run.

  CUDA users were unaffected because cublas passes the correct GGML_CUDA=ON and the foreach already links cuda.

  Mirror the proven llama-cpp hipblas block (ROCm clang CC/CXX + AMDGPU_TARGETS) and add hip to each foreach. Upstream picks the best device via ggml_backend_init_best(), so no runtime flag is needed once HIP is compiled and linked.

  Assisted-by: Claude:claude-opus-4-8[1m] [Claude Code]
  Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
  Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: LocalAI [bot] <139863280+localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

localai-bot mentioned this pull request Jul 3, 2026

ROCM Llama CPP backend missing file #10660

Closed

mudler approved these changes Jul 3, 2026


mudler added the bug Something isn't working label Jul 3, 2026

mudler merged commit 348f3c8 into master Jul 4, 2026
80 checks passed

mudler deleted the worktree-fix+hipblaslt-tensile-data-10660 branch July 4, 2026 06:14

mudler merged 1 commit into master from worktree-fix+hipblaslt-tensile-data-10660
