Description
Hello, I’m trying to fine-tune Llama 3.2 11B Vision Instruct to take an image as input and output text and a number.
I have been following the process documented by the Unsloth notebook:
https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb
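For context, here is a trimmed-down sketch of what my training.py does. It mirrors the notebook's flow; the hyperparameter values below are the notebook's defaults rather than a verbatim copy of my script:

```python
# Rough sketch of training.py, following the Unsloth vision notebook;
# values are the notebook defaults, not necessarily my exact script.
from unsloth import FastVisionModel
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
)

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=16,
    lora_alpha=16,
)

FastVisionModel.for_training(model)  # switch into training mode

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    data_collator=UnslothVisionDataCollator(model, tokenizer),
    train_dataset=converted_dataset,  # my 10 image -> (text + number) examples
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        max_steps=30,
        learning_rate=2e-4,
        optim="adamw_8bit",
        # the notebook sets these for vision fine-tuning:
        remove_unused_columns=False,
        dataset_text_field="",
        dataset_kwargs={"skip_prepare_dataset": True},
        max_seq_length=2048,
        output_dir="outputs",
    ),
)

trainer_stats = trainer.train()  # <-- this is the line that fails
```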
The error is raised by the following line of the attached training.py file:
trainer_stats = trainer.train()
The full output is shown below. The error is cryptic, and searching for it did not turn up much help.
Going To Create the Trainer
Created the trainer
GPU = NVIDIA GeForce RTX 4080 SUPER. Max memory = 15.992 GB.
8.525 GB of memory reserved.
Shown current memory stats
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 10 | Num Epochs = 30 | Total steps = 30
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 67,174,400/11,000,000,000 (0.61% trained)
0%| | 0/30 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Unsloth: Will smartly offload gradients to save VRAM!
Traceback (most recent call last):
File "/home/ananya/AnanyaIR/training.py", line 185, in
File "/home/ananya/.local/lib/python3.12/site-packages/transformers/trainer.py", line 2245, in train
File "", line 315, in _fast_inner_training_loop
File "", line 77, in _unsloth_training_step
File "/home/ananya/.local/lib/python3.12/site-packages/accelerate/accelerator.py", line 2454, in backward
File "/home/ananya/.local/lib/python3.12/site-packages/torch/_tensor.py", line 626, in backward
File "/home/ananya/.local/lib/python3.12/site-packages/torch/autograd/init.py", line 347, in backward
File "/home/ananya/.local/lib/python3.12/site-packages/torch/autograd/graph.py", line 823, in _engine_run_backward
File "/home/ananya/.local/lib/python3.12/site-packages/torch/autograd/function.py", line 307, in apply
File "/home/ananya/.local/lib/python3.12/site-packages/unsloth_zoo/gradient_checkpointing.py", line 554, in backward
File "/home/ananya/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
File "/home/ananya/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
File "/home/ananya/.local/lib/python3.12/site-packages/transformers/models/mllama/modeling_mllama.py", line 960, in forward
File "/home/ananya/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
File "/home/ananya/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
File "/tmp/unsloth_compiled_cache/unsloth_compiled_module_mllama.py", line 568, in forward
File "/tmp/unsloth_compiled_cache/unsloth_compiled_module_mllama.py", line 535, in MllamaTextCrossSdpaAttention_forward
RuntimeError: CUDA driver error: unknown error
0%| | 0/30 [00:15<?, ?it/s]
If anyone knows what the cause of this error might be, I’d really appreciate the help. Thank you.
PS:
Some sources describing similar unknown CUDA errors pointed to a possible out-of-memory issue, so I tried setting
gpu_memory_utilization = 0.6
in the FastVisionModel.from_pretrained call. That only produced a different error:
TypeError: MllamaForConditionalGeneration.__init__() got an unexpected keyword argument 'gpu_memory_utilization'
So it looks like that parameter cannot be set here.
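For reference, this is roughly the call that raised the TypeError; gpu_memory_utilization was the only change from the notebook's loading code:

```python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit=True,
    use_gradient_checkpointing="unsloth",
    gpu_memory_utilization=0.6,  # rejected: unexpected keyword argument
)
```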