Description
Hi, I'm a beginner and I've run into an issue when trying to save a model to GGUF.
Initially, I received the error message: "Unsloth: The file ('llama.cpp/llama-quantize' or 'llama.cpp/llama-quantize.exe' if you are on Windows WSL) or 'llama.cpp/quantize' does not exist."
Following the instructions at https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cpu-build, I successfully built llama.cpp and moved "llama-quantize.exe" into ./llama.cpp (#748); the commands I used are sketched below.
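For reference, this is roughly what I ran, per the linked CPU-build docs; the build-output path is the standard MSVC Release layout on my machine, so treat the exact paths as approximate:

```
cd llama.cpp
cmake -B build
cmake --build build --config Release
:: put the quantizer where Unsloth looks for it (llama.cpp/llama-quantize.exe)
copy build\bin\Release\llama-quantize.exe .
```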
I then ran the save again (this is line 36 in my test_lora.py):

```python
model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
```
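For completeness, the model and tokenizer come from a load roughly like this (the model name is a placeholder for my local LoRA checkpoint, and max_seq_length is just what I trained with):

```python
from unsloth import FastLanguageModel

# Placeholder path: in my script this points at my saved LoRA checkpoint.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "lora_model",
    max_seq_length = 2048,
    load_in_4bit = True,
)
# ...line 36 then calls model.save_pretrained_gguf as shown above.
```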
However, while the BF16 export succeeded, the q4_k_m conversion step failed with a new error:
```
Unsloth: Conversion completed! Output location: C:\Code\unsloth\model\unsloth.BF16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This might take 20 minutes...
'.' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "C:\Code\unsloth\test_lora.py", line 36, in <module>
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
  File "C:\Users\User\anaconda3\envs\unsloth_env\Lib\site-packages\unsloth\save.py", line 1748, in unsloth_save_pretrained_gguf
    all_file_locations, want_full_precision = save_to_gguf(
                                              ^^^^^^^^^^^^^
  File "C:\Users\User\anaconda3\envs\unsloth_env\Lib\site-packages\unsloth\save.py", line 1251, in save_to_gguf
    raise RuntimeError(
RuntimeError: Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
```
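The "'.' is not recognized" line makes me suspect the quantize step is being launched as a POSIX-style "./llama-quantize" command, which cmd.exe can't parse. For what it's worth, I believe the manual equivalent of that step would look roughly like this (untested sketch, run from C:\Code\unsloth):

```
:: output filename is my guess at what Unsloth would write
llama.cpp\llama-quantize.exe model\unsloth.BF16.gguf model\unsloth.Q4_K_M.gguf q4_k_m
```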
My environment:

```
Unsloth 2025.2.15: Fast Llama patching. Transformers: 4.49.0.
GPU: NVIDIA GeForce RTX 4080. Max memory: 15.992 GB. Platform: Windows.
Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.1.0.
```
What should I do?
Thanks