Releases · ggml-org/llama.cpp
b5228
b5226
docker : do not build tests (#13204)
* docker : do not build tests
* include "ggml-cpu.h"
b5225
rpc : fix cache directory initialization (#13188)
Signed-off-by: xiaofei <hbuxiaofei@gmail.com>
b5223
server : Prefilling assistant message in openai compatible API (#13174)
* Prefilling assistant message in openai compatible API
* fixed indentation
* fixed code convention
* simplify method usage
* no more than one assistant message at end of messages
* merge checks into prefill code
* Update examples/server/utils.hpp
Co-authored-by: matteo <matteo@naspc.lan>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
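With this change, a chat request whose last message has role "assistant" is treated as a prefill that the model continues rather than a turn to answer. The sketch below is an illustrative client, not code from the repository: it assumes llama-server is running with its OpenAI-compatible endpoint at the default http://localhost:8080/v1/chat/completions and that libcurl is available.

```cpp
// Hypothetical client sketch for the assistant-prefill behavior.
// The trailing "assistant" message is continued by the model.
#include <curl/curl.h>
#include <cstdio>

int main() {
    // last message is a partial assistant reply; the server should extend it
    const char * body = R"({
        "messages": [
            {"role": "user",      "content": "List three prime numbers."},
            {"role": "assistant", "content": "Sure, here they are: 2,"}
        ]
    })";

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL * curl = curl_easy_init();
    if (!curl) {
        return 1;
    }

    struct curl_slist * headers = curl_slist_append(nullptr, "Content-Type: application/json");

    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:8080/v1/chat/completions");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);

    // the completion printed to stdout should pick up where "2," left off
    CURLcode res = curl_easy_perform(curl);
    if (res != CURLE_OK) {
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));
    }

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}
```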
b5222
sampling : when top-k <= 0 -> noop (#13173)
ggml-ci
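The behavioral change is that a top-k value of zero or less now leaves the candidate list untouched instead of being expanded to the full vocabulary (and paying for a sort). A minimal standalone sketch of that semantic, not the llama.cpp implementation itself:

```cpp
// Illustrative top-k filtering over candidate logits; k <= 0 is a no-op.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct candidate {
    int32_t id;
    float   logit;
};

void apply_top_k(std::vector<candidate> & cands, int32_t k) {
    if (k <= 0) {
        return; // noop: leave all candidates in place, no sorting
    }
    k = std::min<int32_t>(k, (int32_t) cands.size());
    // keep only the k highest-logit candidates
    std::partial_sort(cands.begin(), cands.begin() + k, cands.end(),
        [](const candidate & a, const candidate & b) { return a.logit > b.logit; });
    cands.resize(k);
}

int main() {
    std::vector<candidate> cands = {{0, 1.5f}, {1, 3.0f}, {2, 0.5f}, {3, 2.0f}};
    apply_top_k(cands, 2); // keeps ids 1 and 3
    apply_top_k(cands, 0); // k <= 0: list stays as it is
    for (const auto & c : cands) {
        std::printf("id=%d logit=%.1f\n", c.id, c.logit);
    }
    return 0;
}
```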
b5221
llama-bench: fixed size of fields to correctly map to values (#13183)
b5220
CUDA: fix non-cont. inputs for batched mat mul (#13155)
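"Non-contiguous" here refers to tensor views whose byte strides do not match a densely packed layout, so a batched mat mul kernel cannot walk the data linearly. The following standalone sketch only illustrates that notion; the field names mirror ggml's ne/nb conventions but it is not taken from the sources.

```cpp
// Rough illustration of contiguity for a ggml-style tensor view:
// ne[i] = elements per dimension, nb[i] = byte stride per dimension.
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct tensor_view {
    int64_t ne[4];     // elements per dimension
    size_t  nb[4];     // byte stride per dimension
    size_t  elem_size; // bytes per element
};

bool is_contiguous(const tensor_view & t) {
    if (t.nb[0] != t.elem_size) {
        return false;
    }
    for (int i = 1; i < 4; ++i) {
        // packed layout: each stride is the previous stride times that extent
        if (t.nb[i] != t.nb[i - 1] * (size_t) t.ne[i - 1]) {
            return false;
        }
    }
    return true;
}

int main() {
    // a 4x3 fp32 matrix viewed as every other row of an 8x3 parent:
    // the row stride is doubled, so the view is not contiguous
    tensor_view strided = {{3, 4, 1, 1}, {4, 24, 96, 96}, 4};
    std::printf("contiguous: %d\n", is_contiguous(strided) ? 1 : 0);
    return 0;
}
```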
b5219
llama : llm_type order by size (#13177)
b5218
mtmd : add qwen2vl and qwen2.5vl (#13141)
* llava : add clip_n_output_tokens, deprecate clip_n_patches
* mtmd : add qwen2vl and qwen2.5vl
* decode_embd_batch::set_position_...
* working version
* deprecate llama-qwen2vl-cli
* correct order W, H of clip_embd_nbytes_by_img
* edit existing line in hot topics
b5217
llama : set qwen3 model type sizes (#13175)