
Releases: ggml-org/llama.cpp

b5228

30 Apr 10:31
44cd8d9
feat(ggml-cpu): enable z17 compile (#13182)

z17 compilation requires GCC 15.1.0 or later

Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>

b5226

30 Apr 09:45
da84c04
docker : do not build tests (#13204)

* docker : do not build tests

* include "ggml-cpu.h"

b5225

30 Apr 07:12
a0f7016
rpc : fix cache directory initialization (#13188)

Signed-off-by: xiaofei <hbuxiaofei@gmail.com>

b5223

29 Apr 19:23
e2e1ddb
server : Prefilling assistant message in openai compatible API (#13174)

* Prefilling assistant message in openai compatible API

* fixed indentation

* fixed code convention

* simplify method usage

* no more than one assistant message at end of messages

* merge checks into prefill code

* Update examples/server/utils.hpp

---------

Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
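
With the prefill change above, an OpenAI-compatible chat request may end with an assistant-role message, and the server continues that partial reply rather than starting a fresh one. A minimal sketch of such a request body (the model name and prompt text here are placeholders, not from the release notes):

```python
import json

# Sketch: a chat-completions payload whose final message has role
# "assistant". Under b5223, the server treats it as a prefill and
# generates a continuation of that partial assistant reply.
payload = {
    "model": "llama",  # placeholder model name
    "messages": [
        {"role": "user", "content": "List three prime numbers."},
        # Trailing assistant message: generation continues from here.
        {"role": "assistant", "content": "Sure: 2,"},
    ],
}

body = json.dumps(payload)
```

Note the commit's constraint that there is "no more than one assistant message at end of messages": only a single trailing assistant message is treated as a prefill.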

b5222

29 Apr 19:02
d9d398f
sampling : when top-k <= 0 -> noop (#13173)

ggml-ci
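
The behavior this change describes can be sketched in a few lines (a simplified Python illustration, not llama.cpp's actual C++ sampler): a non-positive k disables top-k filtering entirely instead of being clamped or erroring.

```python
import heapq

def top_k(logits, k):
    """Keep the k largest candidate logits.

    Per b5222, k <= 0 is a no-op: every candidate is kept unchanged.
    k >= len(logits) is likewise a no-op, since nothing would be cut.
    """
    if k <= 0 or k >= len(logits):
        return list(logits)          # no-op: pass candidates through
    return heapq.nlargest(k, logits)  # k best, in descending order
```

The no-op branch returns candidates in their original order, while the filtering branch returns them sorted; a real sampler would track token ids alongside the logits.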

b5221

29 Apr 18:46
5a63980
llama-bench: fixed size of fields to correctly map to values (#13183)

b5220

29 Apr 17:31
cdf7658
CUDA: fix non-cont. inputs for batched mat mul (#13155)

b5219

29 Apr 12:13
7d3af70
llama : llm_type order by size (#13177)

b5218

29 Apr 10:47
00e3e5a
mtmd : add qwen2vl and qwen2.5vl (#13141)

* llava : add clip_n_output_tokens, deprecate clip_n_patches

* mtmd : add qwen2vl and qwen2.5vl

* decode_embd_batch::set_position_...

* working version

* deprecate llama-qwen2vl-cli

* correct order W, H of clip_embd_nbytes_by_img

* edit existing line in hot topics

b5217

29 Apr 10:11
e98b369
llama : set qwen3 model type sizes (#13175)