Skip to content

chore: ⬆️ Update ggml-org/llama.cpp to d4cff114c0084f1fbc9b4c62717eca8fb2ae494a#10671

Merged
mudler merged 1 commit into
mudler:masterfrom
ci-forks:update/LLAMA_VERSION
Jul 4, 2026
Merged

chore: ⬆️ Update ggml-org/llama.cpp to d4cff114c0084f1fbc9b4c62717eca8fb2ae494a#10671
mudler merged 1 commit into
mudler:masterfrom
ci-forks:update/LLAMA_VERSION

Conversation

@localai-bot

Copy link
Copy Markdown
Collaborator

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@localai-bot localai-bot force-pushed the update/LLAMA_VERSION branch from 83b2c7f to 10d2ac1 Compare July 3, 2026 20:29
@mudler mudler merged commit 8396ce1 into mudler:master Jul 4, 2026
68 checks passed
l0caldadmin added a commit to l0caldadmin/LocalAI that referenced this pull request Jul 4, 2026
* fix(gpu-libs): bundle hipBLASLt TensileLibrary data so ROCm backends stop falling back (mudler#10660) (mudler#10672) the 

The ROCm packager copied rocBLAS kernel data (rocblas/library/*.dat) into the
bundled lib/ dir and run.sh pointed ROCBLAS_TENSILE_LIBPATH at it, but the
parallel hipBLASLt data dir (hipblaslt/library/TensileLibrary_lazy_gfx*.dat)
was never packaged and no HIPBLASLT_TENSILE_LIBPATH was set. The bundled
libhipblaslt.so therefore resolved its per-arch kernel data relative to itself,
found nothing, and silently fell back to slow generic kernels, logging:

    rocblaslt error: Cannot read "TensileLibrary_lazy_gfx1201.dat": No such file or directory
    rocblaslt error: Could not load "TensileLibrary_lazy_gfx1201.dat"

Fix, mirroring the existing rocBLAS handling:
- package-gpu-libs.sh: extract the rocblas data-dir copy into a reusable
  copy_rocm_data_dir helper and call it for both rocblas and hipblaslt.
- llama-cpp/turboquant run.sh: export HIPBLASLT_TENSILE_LIBPATH when the
  bundled hipblaslt/library dir exists.

The helper takes an optional ROCM_BASE_DIRS override so the copy is unit
testable without a real ROCm install; add a regression test that runs
package_rocm_libs against a fabricated ROCm tree and asserts both data dirs
are bundled.

Note: this bundles whatever gfx*.dat the build image's ROCm provides. If a
given arch's tensile data is absent from the shipped ROCm, that arch still
needs a ROCm bump; the packaging gap itself is fixed for every supported arch.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

* chore: ⬆️ Update ggml-org/llama.cpp to `d4cff114c0084f1fbc9b4c62717eca8fb2ae494a` (mudler#10671)

:arrow_up: Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* chore: :arrow_up: Update CrispStrobe/CrispASR to `f35185b876fc482fcb2053a81a2697936ed5fcc0` (mudler#10670)

:arrow_up: Update CrispStrobe/CrispASR

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>

* fix(backends): enable ROCm/HIP GPU offload for ggml audio backends (mudler#10666) (mudler#10667)

qwen3-tts-cpp, omnivoice-cpp, acestep-cpp and vibevoice-cpp shipped
rocm-* variants that silently ran on CPU ([Load] backend: CPU). Two
coupled defects:

- The Makefiles passed -DGGML_HIPBLAS=ON, but the vendored ggml only
  understands -DGGML_HIP=ON (GGML_HIPBLAS was removed upstream), so the
  ggml-hip backend target was never created and no GPU code was built.
- The CMake foreach that links the ggml GPU backends into the module
  listed blas/cuda/metal/vulkan but not hip, so even a built ggml-hip
  would not have been linked and its static backend registration would
  never run.

CUDA users were unaffected because cublas passes the correct GGML_CUDA=ON
and the foreach already links cuda. Mirror the proven llama-cpp hipblas
block (ROCm clang CC/CXX + AMDGPU_TARGETS) and add hip to each foreach.
Upstream picks the best device via ggml_backend_init_best(), so no
runtime flag is needed once HIP is compiled and linked.


Assisted-by: Claude:claude-opus-4-8[1m] [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: LocalAI [bot] <139863280+localai-bot@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants