Releases · ggml-org/llama.cpp
b5228
b5226
docker : do not build tests (#13204)
* docker : do not build tests
* include "ggml-cpu.h"
b5225
rpc : fix cache directory initialization (#13188)
Signed-off-by: xiaofei <hbuxiaofei@gmail.com>
b5223
server : Prefilling assistant message in openai compatible API (#13174)
* Prefilling assistant message in openai compatible API
* fixed indentation
* fixed code convention
* simplify method usage
* no more than one assistant message at end of messages
* merge checks into prefill code
* Update examples/server/utils.hpp
Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
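With this change, a chat request whose last message has role "assistant" is treated as a prefill that the model continues rather than a turn to answer. The sketch below is an illustrative client, not code from the repository: it assumes llama-server is running with its OpenAI-compatible endpoint at the default http://localhost:8080/v1/chat/completions and that libcurl is available.

```cpp
// Hypothetical client sketch for the assistant-prefill behavior.
// The trailing "assistant" message is continued by the model.
#include <curl/curl.h>
#include <cstdio>

int main() {
    // last message is a partial assistant reply; the server should extend it
    const char * body = R"({
        "messages": [
            {"role": "user",      "content": "List three prime numbers."},
            {"role": "assistant", "content": "Sure, here they are: 2,"}
        ]
    })";

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL * curl = curl_easy_init();
    if (!curl) {
        return 1;
    }

    struct curl_slist * headers = curl_slist_append(nullptr, "Content-Type: application/json");

    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

    // the completion printed to stdout should pick up where "2," left off
    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK) {
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));
    }

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```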
b5222
sampling : when top-k <= 0 -> noop (#13173)
ggml-ci
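The behavioral change is that a top-k value of zero or less now leaves the candidate list untouched instead of being expanded to the full vocabulary (and paying for a sort). A minimal standalone sketch of that semantic, not the llama.cpp implementation itself:

```cpp
// Illustrative top-k filtering over candidate logits; k <= 0 is a no-op.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct candidate {
    int32_t id;
    float   logit;
};

void apply_top_k(std::vector<candidate> & cands, int32_t k) {
    if (k <= 0) {
        return; // noop: leave all candidates in place, no sorting
    }
    k = std::min<int32_t>(k, (int32_t) cands.size());
    // keep only the k highest-logit candidates
    std::partial_sort(cands.begin(), cands.begin() + k, cands.end(),
        [](const candidate & a, const candidate & b) { return a.logit > b.logit; });
    cands.resize(k);
}

int main() {
    std::vector<candidate> cands = {{0, 1.5f}, {1, 3.0f}, {2, 0.5f}, {3, 2.0f}};
    apply_top_k(cands, 2); // keeps ids 1 and 3
    apply_top_k(cands, 0); // k <= 0: list stays as it is
    for (const auto & c : cands) {
        std::printf("id=%d logit=%.1f\n", c.id, c.logit);
    }
    return 0;
}
```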
b5221
llama-bench: fixed size of fields to correctly map to values (#13183)
b5220
CUDA: fix non-cont. inputs for batched mat mul (#13155)
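"Non-contiguous" here refers to tensor views whose byte strides do not match a densely packed layout, so a batched mat mul kernel cannot walk the data linearly. The following standalone sketch only illustrates that notion; the field names mirror ggml's ne/nb conventions but it is not taken from the sources.

```cpp
// Rough illustration of contiguity for a ggml-style tensor view:
// ne[i] = elements per dimension, nb[i] = byte stride per dimension.
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct tensor_view {
    int64_t ne[4];     // elements per dimension
    size_t  nb[4];     // byte stride per dimension
    size_t  elem_size; // bytes per element
};

bool is_contiguous(const tensor_view & t) {
    if (t.nb[0] != t.elem_size) {
        return false;
    }
    for (int i = 1; i < 4; ++i) {
        // packed layout: each stride is the previous stride times that extent
        if (t.nb[i] != t.nb[i - 1] * (size_t) t.ne[i - 1]) {
            return false;
        }
    }
    return true;
}

int main() {
    // a 4x3 fp32 matrix viewed as every other row of an 8x3 parent:
    // the row stride is doubled, so the view is not contiguous
    tensor_view strided = {{3, 4, 1, 1}, {4, 24, 96, 96}, 4};
    std::printf("contiguous: %d\n", is_contiguous(strided) ? 1 : 0);
    return 0;
}
```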
b5219
llama : llm_type order by size (#13177)
b5218
mtmd : add qwen2vl and qwen2.5vl (#13141)
* llava : add clip_n_output_tokens, deprecate clip_n_patches
* mtmd : add qwen2vl and qwen2.5vl
* decode_embd_batch::set_position_...
* working version
* deprecate llama-qwen2vl-cli
* correct order W, H of clip_embd_nbytes_by_img
* edit existing line in hot topics
b5217
llama : set qwen3 model type sizes (#13175)