chore: :arrow_up: Update ggml-org/llama.cpp to `d4cff114c0084f1fbc9b4c62717eca8fb2ae494a` by localai-bot · Pull Request #10671 · mudler/LocalAI

localai-bot · 2026-07-03T20:29:51Z

Changes: https://github.com/ggml-org/llama.cpp/compare/fdb1db877c526ec90f668eca1b858da5dba85560..d4cff114c0084f1fbc9b4c62717eca8fb2ae494a

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(gpu-libs): bundle hipBLASLt TensileLibrary data so ROCm backends stop falling back (mudler#10660) (mudler#10672)

The ROCm packager copied the rocBLAS kernel data (rocblas/library/*.dat) into the bundled lib/ dir, and run.sh pointed ROCBLAS_TENSILE_LIBPATH at it, but the parallel hipBLASLt data dir (hipblaslt/library/TensileLibrary_lazy_gfx*.dat) was never packaged and no HIPBLASLT_TENSILE_LIBPATH was set. The bundled libhipblaslt.so therefore resolved its per-arch kernel data relative to itself, found nothing, and silently fell back to slow generic kernels, logging:

    rocblaslt error: Cannot read "TensileLibrary_lazy_gfx1201.dat": No such file or directory
    rocblaslt error: Could not load "TensileLibrary_lazy_gfx1201.dat"

Fix, mirroring the existing rocBLAS handling:

- package-gpu-libs.sh: extract the rocblas data-dir copy into a reusable copy_rocm_data_dir helper and call it for both rocblas and hipblaslt.
- llama-cpp/turboquant run.sh: export HIPBLASLT_TENSILE_LIBPATH when the bundled hipblaslt/library dir exists.

The helper takes an optional ROCM_BASE_DIRS override so the copy is unit-testable without a real ROCm install; add a regression test that runs package_rocm_libs against a fabricated ROCm tree and asserts both data dirs are bundled.

Note: this bundles whatever gfx*.dat files the build image's ROCm provides. If a given arch's Tensile data is absent from the shipped ROCm, that arch still needs a ROCm bump; the packaging gap itself is fixed for every supported arch.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

* chore: ⬆️ Update ggml-org/llama.cpp to `d4cff114c0084f1fbc9b4c62717eca8fb2ae494a` (mudler#10671)

:arrow_up: Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* chore: :arrow_up: Update CrispStrobe/CrispASR to `f35185b876fc482fcb2053a81a2697936ed5fcc0` (mudler#10670)

:arrow_up: Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* fix(backends): enable ROCm/HIP GPU offload for ggml audio backends (mudler#10666) (mudler#10667)

qwen3-tts-cpp, omnivoice-cpp, acestep-cpp and vibevoice-cpp shipped rocm-* variants that silently ran on CPU ([Load] backend: CPU). Two coupled defects:

- The Makefiles passed -DGGML_HIPBLAS=ON, but the vendored ggml only understands -DGGML_HIP=ON (GGML_HIPBLAS was removed upstream), so the ggml-hip backend target was never created and no GPU code was built.
- The CMake foreach that links the ggml GPU backends into the module listed blas/cuda/metal/vulkan but not hip, so even a built ggml-hip would not have been linked and its static backend registration would never run.

CUDA users were unaffected because cublas passes the correct GGML_CUDA=ON and the foreach already links cuda.

Mirror the proven llama-cpp hipblas block (ROCm clang CC/CXX + AMDGPU_TARGETS) and add hip to each foreach. Upstream picks the best device via ggml_backend_init_best(), so no runtime flag is needed once HIP is compiled and linked.
Assisted-by: Claude:claude-opus-4-8[1m] [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: LocalAI [bot] <139863280+localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
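The hipBLASLt packaging fix (mudler#10672) pairs a build-time copy with a run-time environment export. The shell sketch below illustrates that pairing under stated assumptions; it is not the actual package-gpu-libs.sh or run.sh code. Only the names copy_rocm_data_dir, ROCM_BASE_DIRS, ROCBLAS_TENSILE_LIBPATH and HIPBLASLT_TENSILE_LIBPATH come from the commit message. The helper's arguments and body, the hypothetical export_tensile_paths function, the default search roots (/opt/rocm /usr) and the <root>/lib/<name>/library layout are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the real package-gpu-libs.sh / run.sh.
set -euo pipefail

# Space-separated ROCm roots to search (assumed defaults); overridable, e.g. in tests.
ROCM_BASE_DIRS="${ROCM_BASE_DIRS:-/opt/rocm /usr}"

# copy_rocm_data_dir <lib-name> <dest-lib-dir>
# Copies <root>/lib/<lib-name>/library from the first root that has it into
# <dest-lib-dir>/<lib-name>/library. Returns 1 if no root provides it.
copy_rocm_data_dir() {
  local name="$1" dest="$2" base
  for base in $ROCM_BASE_DIRS; do
    if [ -d "$base/lib/$name/library" ]; then
      mkdir -p "$dest/$name/library"
      # Copy the directory contents (including any dotfiles) idempotently.
      cp -R "$base/lib/$name/library/." "$dest/$name/library/"
      return 0
    fi
  done
  echo "warning: no $name/library data dir under: $ROCM_BASE_DIRS" >&2
  return 1
}

# run.sh side: point each library at its bundled Tensile data, if bundled.
export_tensile_paths() {
  local libdir="$1"
  if [ -d "$libdir/rocblas/library" ]; then
    export ROCBLAS_TENSILE_LIBPATH="$libdir/rocblas/library"
  fi
  if [ -d "$libdir/hipblaslt/library" ]; then
    export HIPBLASLT_TENSILE_LIBPATH="$libdir/hipblaslt/library"
  fi
}
```

Calling the same helper for both rocblas and hipblaslt is what keeps the two data dirs from drifting apart again. Returning non-zero when a root lacks the data lets the caller decide whether a missing dir is fatal, which matches the note that an arch whose Tensile data is absent from the shipped ROCm still needs a ROCm bump.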

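The first defect behind the silent CPU fallback in the audio backends (mudler#10667) is a build-flag mismatch: the vendored ggml enables its HIP backend with -DGGML_HIP=ON, not the removed -DGGML_HIPBLAS=ON. A minimal sketch of mapping a build type to the option ggml actually reads could look like the following. The function name ggml_backend_cmake_flag and the build-type strings (cublas, hipblas, vulkan, metal) are assumptions for illustration; this is not the project's Makefile logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a build type to the ggml CMake option that
# actually enables its GPU backend. Prints nothing for CPU-only builds.
ggml_backend_cmake_flag() {
  case "$1" in
    cublas)  printf '%s\n' "-DGGML_CUDA=ON" ;;
    hipblas) printf '%s\n' "-DGGML_HIP=ON" ;;  # not -DGGML_HIPBLAS=ON, which the vendored ggml no longer reads
    vulkan)  printf '%s\n' "-DGGML_VULKAN=ON" ;;
    metal)   printf '%s\n' "-DGGML_METAL=ON" ;;
    *)       ;;                                 # CPU-only: no GPU backend option
  esac
}
```

A build step might then pass the result of ggml_backend_cmake_flag "$BUILD_TYPE" on the cmake command line. This only addresses the first defect: per the commit message, hip must also appear in the CMake foreach that links the ggml GPU backends into the module, or the compiled ggml-hip backend is never linked and its static registration never runs.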
⬆️ Update ggml-org/llama.cpp

10d2ac1

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

localai-bot force-pushed the update/LLAMA_VERSION branch from 83b2c7f to 10d2ac1 July 3, 2026 20:29

mudler merged commit 8396ce1 into mudler:master Jul 4, 2026
68 checks passed

localai-bot added the dependencies label Jul 4, 2026

mudler merged 1 commit into mudler:master from ci-forks:update/LLAMA_VERSION

localai-bot commented Jul 3, 2026

2 participants