
RuntimeError: Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again. #1781


Description

@Cyanogen8872

Hi, I am a beginner and I've run into an issue when trying to save a model to GGUF.
Initially, I received the error message: "Unsloth: The file ('llama.cpp/llama-quantize' or 'llama.cpp/llama-quantize.exe' if you are on Windows WSL) or 'llama.cpp/quantize' does not exist."

Following the instructions at https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#cpu-build, I successfully built llama.cpp and moved llama-quantize.exe into ./llama.cpp (see #748).
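For reference, these are roughly the build steps I followed from the linked docs (a sketch assuming CMake and a Visual Studio toolchain are installed; with the multi-config generator on Windows the binaries land under build\bin\Release):

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
copy build\bin\Release\llama-quantize.exe .

The last copy step is my own assumption about where Unsloth expects to find the binary, based on the first error message above.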

model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")

However, I got a new error when converting to q4_k_m:

Unsloth: Conversion completed! Output location: C:\Code\unsloth\model\unsloth.BF16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This might take 20 minutes...
'.' is not recognized as an internal or external command,
operable program or batch file.
Traceback (most recent call last):
  File "C:\Code\unsloth\test_lora.py", line 36, in <module>
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
  File "C:\Users\User\anaconda3\envs\unsloth_env\Lib\site-packages\unsloth\save.py", line 1748, in unsloth_save_pretrained_gguf
    all_file_locations, want_full_precision = save_to_gguf(
                                              ^^^^^^^^^^^^^
  File "C:\Users\User\anaconda3\envs\unsloth_env\Lib\site-packages\unsloth\save.py", line 1251, in save_to_gguf
    raise RuntimeError(
RuntimeError: Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.

Unsloth 2025.2.15: Fast Llama patching. Transformers: 4.49.0.
GPU: NVIDIA GeForce RTX 4080. Max memory: 15.992 GB. Platform: Windows.
Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.1.0.
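Since the BF16 GGUF was written successfully, I assume I could run llama-quantize on it by hand in a new terminal, something like the following (the argument order is input file, output file, then the quantization type, per llama-quantize --help; the output filename is my own choice):

llama.cpp\llama-quantize.exe model\unsloth.BF16.gguf model\unsloth.Q4_K_M.gguf Q4_K_M

But I'd still like to understand why the save_pretrained_gguf call itself fails on Windows.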

What should I do?

Thanks
