Very interesting library, @r2d4!
I am trying to use the example in the README but with the model being on the GPU (as is required for many of the recent larger LLMs):
import regex
from transformers import AutoModelForCausalLM, AutoTokenizer
from rellm import complete_re
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = "ReLLM, the best way to get structured data out of LLMs, is an acronym for "
pattern = regex.compile(r'Re[a-z]+ L[a-z]+ L[a-z]+ M[a-z]+')
# THIS IS WHAT I'D LIKE TO DO
device = "cuda:0"
model.to(device)
output = complete_re(tokenizer=tokenizer,
                     model=model,
                     prompt=prompt,
                     pattern=pattern,
                     do_sample=True,
                     max_new_tokens=80)
print(output)
fails with
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
Is it possible to use ReLLM with the model living on the GPU?
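For context, this RuntimeError usually means the model's weights live on `cuda:0` while the tokenized prompt tensors are still on the CPU, so the embedding lookup (`index_select`) mixes devices. A minimal sketch of the device-matching idea, using a bare `torch.nn.Embedding` as a stand-in for the model's token embedding (this illustrates the general PyTorch fix, not ReLLM's internals; whether ReLLM moves inputs itself would need to be checked in its source):

```python
import torch

# Stand-in for the model's token embedding (e.g. gpt2's wte).
embedding = torch.nn.Embedding(10, 4)

# Query where the weights actually live, rather than hard-coding a device.
device = next(embedding.parameters()).device

# The fix: move the input ids to the same device as the weights before
# the forward pass. With mismatched devices, embedding(ids) raises the
# "Expected all tensors to be on the same device" RuntimeError.
ids = torch.tensor([1, 2, 3]).to(device)
out = embedding(ids)
print(out.device)  # matches the weights' device
```

Applied to the snippet above, the equivalent would be tokenizing the prompt and calling `inputs.to(next(model.parameters()).device)` before generation, assuming the library exposes (or can be patched at) the point where the prompt is encoded.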