
AutoGraph could not transform when fine-tuning HuggingFace Pretrained Model #55038

Closed
@adam1brownell

Description

System information
MacBook, macOS 11.6

numpy==1.19.5
pathlib2==2.3.5
python==3.8.8
tensorflow==2.7.0
transformers==4.2.0

Describe the current behavior
During .fit(), I receive the following warnings:

WARNING:tensorflow:AutoGraph could not transform <bound method Socket.send of <zmq.Socket(zmq.PUSH) at XXX >> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert

and

The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
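
For reference, the config route this second warning describes would look roughly like the sketch below. Extra kwargs to from_pretrained are forwarded to the model config, so the flags are fixed at load time instead of per call; the False values here are illustrative assumptions, not settings this model requires.

from transformers import TFAutoModelForSequenceClassification

# Set these on the config when loading, rather than per call
# (the False values are assumptions for illustration):
model = TFAutoModelForSequenceClassification.from_pretrained(
    "cardiffnlp/twitter-xlm-roberta-base-sentiment",
    output_attentions=False,
    output_hidden_states=False,
)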

I would not have submitted this except for the "please report" phrase in the first warning. Happy to close quickly if this turns out to be resolved or trivial.
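
For anyone else who lands here: the object AutoGraph trips on is zmq's Cython Socket.send, not code I own, so the suggested decorator cannot be attached to it directly. A minimal sketch of the two knobs TensorFlow does expose is below; note that silencing only hides the message, it does not fix the failed conversion (my_metric_fn is a hypothetical example, not part of the repro).

import tensorflow as tf

# 0 = no AutoGraph logging at all; the zmq function still runs
# unconverted ("as-is"), this merely suppresses the warning:
tf.autograph.set_verbosity(0)

# For functions in your own code, the warning's suggestion applies directly:
@tf.autograph.experimental.do_not_convert
def my_metric_fn(y_true, y_pred):  # hypothetical example function
    return tf.reduce_mean(tf.abs(y_true - y_pred))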

Describe the expected behavior

Neither of these warnings should appear prior to training.

Standalone code to reproduce the issue

# train_texts is a list of strings
# train_labels is a list of integers (0, 1)
# test_texts / test_labels are an analogous held-out split (elided here)

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from sklearn.model_selection import train_test_split

train_texts, val_texts, train_labels, val_labels = train_test_split(train_texts, train_labels, test_size=0.2)
checkpoint = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Tokenize each split; only the input_ids tensor is kept
train_encodings = tokenizer(train_texts, truncation=True, padding=True, max_length=512, return_tensors='tf')['input_ids']
val_encodings = tokenizer(val_texts, truncation=True, padding=True, max_length=512, return_tensors='tf')['input_ids']
test_encodings = tokenizer(test_texts, truncation=True, padding=True, max_length=512, return_tensors='tf')['input_ids']

train_dataset = tf.data.Dataset.from_tensor_slices((
    train_encodings,
    train_labels
))
val_dataset = tf.data.Dataset.from_tensor_slices((
    val_encodings,
    val_labels
))
test_dataset = tf.data.Dataset.from_tensor_slices((
    test_encodings,
    test_labels
))

model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss)
## Warnings occur here (the dataset is already batched, so batch_size is not passed to fit):
model.fit(train_dataset.shuffle(1000).batch(16), epochs=3)
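
Side note, unrelated to the warnings: keeping only ['input_ids'] drops the attention_mask, so padding tokens are attended to. A variant that feeds the full encoding dict, sketched for the train split only:

train_encodings = tokenizer(train_texts, truncation=True, padding=True,
                            max_length=512, return_tensors='tf')
# dict() unwraps the BatchEncoding so from_tensor_slices sees plain tensors;
# the model then receives input_ids and attention_mask together as its input.
train_dataset = tf.data.Dataset.from_tensor_slices((dict(train_encodings), train_labels))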

Other info / logs
AUTOGRAPH_VERBOSITY=10 logs leading up to the warning:
tf_verbose_log.txt

Metadata

Labels

TF 2.7 (Issues related to TF 2.7.0)
comp:autograph (Autograph related issues)
stale (This label marks the issue/PR stale - to be closed automatically if no activity)
stat:awaiting response (Status - Awaiting response from author)
type:support (Support issues)
