Description
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Monterey 12.3 (Metal device set to: Apple M1 Pro)
- TensorFlow installed from (source or binary): binary
- TensorFlow version: tensorflow-deps 2.7.0, tensorflow-macos 2.8.0, tensorflow-metal 0.4.0
- Python version: 3.9.12
- GPU model and memory: Apple M1 Pro
Describe the current behavior
I ran the following simple code, which calls a GRU layer from a function decorated with tf.function:
```python
import tensorflow as tf
from time import time

a = tf.random.truncated_normal([4, 4, 4])  # (batch, timesteps, features)
layer = tf.keras.layers.GRU(4)

@tf.function
def f(a):
    return layer(a)

start = time()
for _ in range(1000):
    # The tape is opened but never used for gradients; without it the slowdown disappears.
    with tf.GradientTape() as tape:
        b = f(a)
print(str(time() - start), "seconds")
```
It is much slower (roughly 5-10x) than running the same loop in eager mode. However, the slowdown only shows up for recurrent layers: with a Dense layer, the tf.function version is faster than eager mode, as expected. The issue also disappears when the call is made outside tf.GradientTape().
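The loop above starts the clock before the first call to f, so its total also includes the one-off cost of tracing the tf.function into a graph. That is unlikely to explain a 5-10x gap over 1000 iterations on its own, but excluding it makes the eager/graph comparison fairer. A minimal timing sketch under that assumption follows; `benchmark`, `slowdown` and `with_tape` are hypothetical helper names, not TensorFlow APIs, and the TensorFlow usage is left as comments because it assumes the objects from the reproduction above.

```python
import time
from typing import Callable


def benchmark(fn: Callable[[], object], iters: int = 1000, warmup: int = 1) -> float:
    """Return the mean wall-clock seconds per call of `fn`.

    The first `warmup` calls are executed but not timed. For a tf.function
    this skips the initial tracing; for an eager Keras layer it skips the
    first call that builds the layer's weights.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


def slowdown(eager_fn: Callable[[], object], graph_fn: Callable[[], object], iters: int = 1000) -> float:
    """Ratio of graph-mode time to eager-mode time (> 1 means tf.function is slower)."""
    return benchmark(graph_fn, iters) / benchmark(eager_fn, iters)


# Hypothetical usage with `layer`, `f` and `a` from the reproduction above
# (requires TensorFlow; `with_tape` is an illustrative helper):
#
#   def with_tape(fn):
#       def run():
#           with tf.GradientTape():
#               fn(a)
#       return run
#
#   print("GRU, inside tape: ", slowdown(with_tape(layer), with_tape(f)))
#   print("GRU, outside tape:", slowdown(lambda: layer(a), lambda: f(a)))
```

Running the same two lines with a Dense layer in place of the GRU would give the corresponding ratios for the non-recurrent case.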
I have only encountered this problem on my Apple MacBook Pro with the M1 chip; on a Linux machine the same code runs fine.
Describe the expected behavior
tf.function should be faster than eager mode, or at least not several times slower.
Standalone code to reproduce the issue
The issue cannot be reproduced on a Linux machine, so no Colab notebook is available.
Other info / logs
FYI, running the code above prints the following log messages:
```
2022-03-31 14:40:34.462151: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-03-31 14:40:34.463604: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-03-31 14:40:34.480917: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] function_optimizer failed: INVALID_ARGUMENT: Input 0 of node gru_partitionedcall_10_RetVal was passed float from gru/PartitionedCall:12 incompatible with expected variant.
2022-03-31 14:40:34.487243: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] tfg_optimizer{} failed: INVALID_ARGUMENT: Input 0 of node gru_partitionedcall_10_RetVal was passed float from gru/PartitionedCall:12 incompatible with expected variant. when importing GraphDef to MLIR module in GrapplerHook
2022-03-31 14:40:34.488903: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] function_optimizer failed: INVALID_ARGUMENT: Input 0 of node gru_partitionedcall_10_RetVal was passed float from gru/PartitionedCall:12 incompatible with expected variant.
2022-03-31 14:40:34.494395: W tensorflow/core/common_runtime/process_function_library_runtime.cc:932] Ignoring multi-device function optimization failure: INVALID_ARGUMENT: Input 0 of node gru_partitionedcall_10_RetVal was passed float from gru/PartitionedCall:12 incompatible with expected variant.
2022-03-31 14:40:34.508855: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
```