Very interesting library, @r2d4!
I am trying to use the example in the README but with the model being on the GPU (as is required for many of the recent larger LLMs):
import regex
from transformers import AutoModelForCausalLM, AutoTokenizer
from rellm import complete_re
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
prompt = "ReLLM, the best way to get structured data out of LLMs, is an acronym for "
pattern = regex.compile(r'Re[a-z]+ L[a-z]+ L[a-z]+ M[a-z]+')
# THIS IS WHAT I'D LIKE TO DO
device = "cuda:0"
model.to(device)
output = complete_re(tokenizer=tokenizer,
                     model=model,
                     prompt=prompt,
                     pattern=pattern,
                     do_sample=True,
                     max_new_tokens=80)
print(output)
fails with
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
Is it possible to use ReLLM with the model living on the GPU?
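For context, this RuntimeError usually means the model's weights live on `cuda:0` while the tokenized prompt tensors are still on the CPU, so the embedding lookup (`index_select`) mixes devices. A minimal sketch of the device-matching idea, using a bare `torch.nn.Embedding` as a stand-in for the model's token embedding (this illustrates the general PyTorch fix, not ReLLM's internals; whether ReLLM moves inputs itself would need to be checked in its source):

```python
import torch

# Stand-in for the model's token embedding (e.g. gpt2's wte).
embedding = torch.nn.Embedding(10, 4)

# Query where the weights actually live, rather than hard-coding a device.
device = next(embedding.parameters()).device

# The fix: move the input ids to the same device as the weights before
# the forward pass. With mismatched devices, embedding(ids) raises the
# "Expected all tensors to be on the same device" RuntimeError.
ids = torch.tensor([1, 2, 3]).to(device)
out = embedding(ids)
print(out.device)  # matches the weights' device
```

Applied to the snippet above, the equivalent would be tokenizing the prompt and calling `inputs.to(next(model.parameters()).device)` before generation, assuming the library exposes (or can be patched at) the point where the prompt is encoded.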