Pull requests: ggml-org/llama.cpp
vulkan: enable fp16 for gcn 3 and 4 chips
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#13396
opened May 9, 2025 by
netrunnereve
•
Draft
llama : do not crash if there is no CPU backend
examples
#13395
opened May 8, 2025 by
slaren
grammar: handle misplaced special regex chars [*+?]
#13391
opened May 8, 2025 by
rick-github
Add --parse-special for enabling parsing of special tokens in imatrix calculation
examples
#13389
opened May 8, 2025 by
bartowski1182
metal : optimize MoE for large batches
Apple Metal
https://en.wikipedia.org/wiki/Metal_(API)
ggml
changes relating to the ggml tensor library for machine learning
#13388
opened May 8, 2025 by
ggerganov
CUDA: fix crash on large batch size for MoE models
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#13384
opened May 8, 2025 by
JohannesGaessler
sycl: simplify bin_bcast_kernel
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
#13383
opened May 8, 2025 by
AD2605
musa: enable MUSA graphs
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#13382
opened May 8, 2025 by
yeahdongcn
•
Draft
gguf-py: Optimize GGUFReader read-only mode performance
python
python script changes
#13378
opened May 8, 2025 by
Isotr0py
llama-run: add support for downloading models from ModelScope
examples
#13370
opened May 8, 2025 by
yeahdongcn
CUDA: update build CTK version to 12.8
devops
improvements to build systems and github actions
ggml
changes relating to the ggml tensor library for machine learning
Nvidia GPU
Issues specific to Nvidia GPUs
#13360
opened May 7, 2025 by
thevishalagarwal
SYCL: Fix test-backend-ops crashes with SYCL-Graph
ggml
changes relating to the ggml tensor library for machine learning
SYCL
https://en.wikipedia.org/wiki/SYCL - GPU programming language
rpc : add rpc_msg_set_tensor_hash_req
ggml
changes relating to the ggml tensor library for machine learning
#13353
opened May 7, 2025 by
rgerganov
python : bump transformers version
python
python script changes
#13351
opened May 7, 2025 by
ngxson
Add mistral-chat-7b preset for llama-server
examples
#13348
opened May 7, 2025 by
vahedshaik
llama: move page cache via mbind to prevent cross-NUMA access
#13335
opened May 6, 2025 by
vishalc-ibm
add AMD Genoa
ggml
changes relating to the ggml tensor library for machine learning
#13334
opened May 6, 2025 by
QuPengfei
vulkan: Allow up to 4096 elements for mul_mat_id row_ids
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#13326
opened May 6, 2025 by
jeffbolznv
vulkan: scalar flash attention implementation
devops
improvements to build systems and github actions
ggml
changes relating to the ggml tensor library for machine learning
Vulkan
Issues specific to the Vulkan backend
#13324
opened May 6, 2025 by
jeffbolznv
[Perf] [CPU] eliminate redundant memory access in group query attention
ggml
changes relating to the ggml tensor library for machine learning
#13319
opened May 5, 2025 by
ZelinMa557