GRU performance severely degraded inside tf.function with Apple m1 chip #55455

Closed
@David-Mao

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    macOS Monterey 12.3, Metal device set to: Apple M1 Pro
  • TensorFlow installed from (source or binary):
    binary
  • TensorFlow version:
    tensorflow-deps 2.7.0
    tensorflow-macos 2.8.0
    tensorflow-metal 0.4.0
  • Python version:
    3.9.12
  • GPU model and memory:
    Apple M1 Pro

Describe the current behavior

I ran this simple code, which calls a GRU layer inside a function decorated with tf.function:

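import tensorflow as tf
from time import time

# Small random input of shape [batch=4, timesteps=4, features=4]
a = tf.random.truncated_normal([4, 4, 4])
layer = tf.keras.layers.GRU(4)

# Wrapping the layer call in tf.function is what triggers the slowdown
@tf.function
def f(a):
    return layer(a)

# Time 1000 forward passes, each recorded by a GradientTape
start = time()
for _ in range(1000):
    with tf.GradientTape() as tape:
        b = f(a)
print(str(time() - start), "seconds")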

It is much slower (roughly 5-10x) than running in eager mode. However, the slowdown only shows up for recurrent layers: with Dense, tf.function mode is faster than eager mode, as expected. The issue also disappears when the call is made outside tf.GradientTape().
I have only encountered this problem on my Apple MacBook Pro with the M1 chip. I tried the same code on a Linux machine and it runs fine.
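
To make the comparison above easier to repeat, here is a minimal timing sketch covering the four combinations described (eager vs. tf.function, inside vs. outside tf.GradientTape) for both GRU and Dense. The helper names `time_calls` and `compare_modes` are hypothetical, and the warmup count is an arbitrary choice meant to keep tf.function tracing out of the measurement; this is not the exact script used for the numbers quoted above.

```python
from time import perf_counter


def time_calls(fn, n_iters=1000, warmup=5):
    """Return wall-clock seconds for n_iters calls of fn, after warmup calls."""
    for _ in range(warmup):  # warmup absorbs tracing and other one-off setup
        fn()
    start = perf_counter()
    for _ in range(n_iters):
        fn()
    return perf_counter() - start


def compare_modes(n_iters=1000):
    """Time GRU and Dense in eager and tf.function mode, with and without a tape."""
    import tensorflow as tf  # imported lazily so time_calls works without TF

    a = tf.random.truncated_normal([4, 4, 4])

    def bench_layer(layer):
        layer(a)  # build the layer's variables eagerly, before any tracing
        graph_call = tf.function(lambda x: layer(x))

        def taped(fn):
            # Wrap fn so every call is recorded by a fresh GradientTape
            def thunk():
                with tf.GradientTape():
                    fn(a)
            return thunk

        return {
            "eager, tape": time_calls(taped(layer), n_iters),
            "tf.function, tape": time_calls(taped(graph_call), n_iters),
            "eager, no tape": time_calls(lambda: layer(a), n_iters),
            "tf.function, no tape": time_calls(lambda: graph_call(a), n_iters),
        }

    return {
        "GRU": bench_layer(tf.keras.layers.GRU(4)),
        "Dense": bench_layer(tf.keras.layers.Dense(4)),
    }
```

Calling `compare_modes()` returns a nested dict of timings in seconds. If the behavior matches the description, only the "GRU" / "tf.function, tape" entry should stand out as slow on the M1 machine.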

Describe the expected behavior
tf.function should be faster than eager mode, or at least not several times slower.

Standalone code to reproduce the issue
It cannot be reproduced on a Linux machine, so no Colab notebook is available.

Other info / logs

FYI, the code above emits the following warning and error messages:

2022-03-31 14:40:34.462151: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-03-31 14:40:34.463604: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-03-31 14:40:34.480917: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] function_optimizer failed: INVALID_ARGUMENT: Input 0 of node gru_partitionedcall_10_RetVal was passed float from gru/PartitionedCall:12 incompatible with expected variant.
2022-03-31 14:40:34.487243: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] tfg_optimizer{} failed: INVALID_ARGUMENT: Input 0 of node gru_partitionedcall_10_RetVal was passed float from gru/PartitionedCall:12 incompatible with expected variant.
when importing GraphDef to MLIR module in GrapplerHook
2022-03-31 14:40:34.488903: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:828] function_optimizer failed: INVALID_ARGUMENT: Input 0 of node gru_partitionedcall_10_RetVal was passed float from gru/PartitionedCall:12 incompatible with expected variant.
2022-03-31 14:40:34.494395: W tensorflow/core/common_runtime/process_function_library_runtime.cc:932] Ignoring multi-device function optimization failure: INVALID_ARGUMENT: Input 0 of node gru_partitionedcall_10_RetVal was passed float from gru/PartitionedCall:12 incompatible with expected variant.
2022-03-31 14:40:34.508855: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
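
Since the log mentions the plugin optimizer for device_type GPU, one possible triage step is to time the same taped tf.function call with the ops placed on the CPU and then on the GPU. The sketch below is untested on the affected machine. The helper name `time_gru_on` is hypothetical, and it assumes that the `tf.device` scope active during tracing governs where the traced GRU ops run.

```python
from time import perf_counter


def time_gru_on(device, n_iters=1000):
    """Time the taped tf.function GRU call with ops placed on `device`."""
    import tensorflow as tf  # imported lazily; requires TensorFlow at call time

    with tf.device(device):
        a = tf.random.truncated_normal([4, 4, 4])
        layer = tf.keras.layers.GRU(4)
        layer(a)  # build the layer's variables before tracing

        @tf.function
        def f(x):
            return layer(x)

        with tf.GradientTape():
            f(a)  # trace once so tracing cost is excluded from the timing

        start = perf_counter()
        for _ in range(n_iters):
            with tf.GradientTape():
                f(a)
        return perf_counter() - start
```

Comparing `time_gru_on("/CPU:0")` with `time_gru_on("/GPU:0")` may indicate whether the slowdown is tied to the Metal device, though it would not by itself explain the function_optimizer errors.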

Labels

  • TF 2.8
  • comp:tf.function (tf.function related issues)
  • stale (this label marks the issue/PR stale, to be closed automatically if no activity)
  • stat:awaiting response (awaiting response from author)
  • type:bug
  • type:performance
