Disable autocast cache for tensor views as fix for #48049 (#48696) #48936

Merged
Merged 1 commit into `release/1.7` from `malfet/cp-48696` on Dec 7, 2020

Conversation

@malfet (Contributor) commented on Dec 7, 2020


This is a cherry-pick of #48696 into release/1.7 branch

Summary:
Fixes #48049

The root cause of the issue is explained [here](#48049 (comment)).

This PR implements albanD's suggestion to add the `!t.is_view()` check and disable autocast caching for views of tensors.

The added test checks for an increase in memory usage by comparing the initially allocated memory with the memory after three iterations of a single `nn.Linear` layer run inside a `no_grad` and `autocast` context.
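A rough sketch of such a check might look as follows (assuming PyTorch with a CUDA device is available; the function name and sizes are illustrative, not the actual test added by this PR):

```python
def check_no_memory_growth(iters=3, features=1024, batch=64):
    """Return the allocated-memory delta in bytes after `iters` forward
    passes, or None when PyTorch/CUDA is unavailable. Illustrative only."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    device = torch.device("cuda")
    layer = torch.nn.Linear(features, features).to(device)
    x = torch.randn(batch, features, device=device)
    start = torch.cuda.memory_allocated(device)
    with torch.no_grad(), torch.cuda.amp.autocast():
        for _ in range(iters):
            layer(x)  # output is not retained between iterations
    return torch.cuda.memory_allocated(device) - start
```

Before this fix, repeated iterations showed growing allocated memory because casted copies of tensor views stayed in the autocast cache; with the fix, the delta stays flat.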

After this PR the memory usage in the original issue doesn't grow anymore and yields:
```text
autocast: True
0: 0MB (peak 1165MB)
1: 0MB (peak 1264MB)
2: 0MB (peak 1265MB)
3: 0MB (peak 1265MB)
4: 0MB (peak 1265MB)
5: 0MB (peak 1265MB)
6: 0MB (peak 1265MB)
7: 0MB (peak 1265MB)
8: 0MB (peak 1265MB)
9: 0MB (peak 1265MB)
```

CC ngimel mcarilli

Pull Request resolved: #48696

Reviewed By: bdhirsh

Differential Revision: D25276231

Pulled By: ngimel

fbshipit-source-id: e2571e9f166c0a6f6f569b0c28e8b9ca34132743
@dr-ci (bot) commented on Dec 7, 2020

💊 CI failures summary and remediations

As of commit a1dcb78 (more details on the Dr. CI page):


- 2/2 failures possibly* introduced in this PR
  - 1/2 non-CircleCI failure(s)

XLA failure

Job pytorch_xla_linux_bionic_py3_6_clang9_test is failing. Please create an issue with a title prefixed by [PT_BREAK] in pytorch/xla and link to this PR. If you have questions, please reach out to @ailzhang / @dlibenzi / @JackCaoG.


Extra GitHub checks: 1 failed


This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.


@seemethere seemethere added this to the 1.7.1 milestone Dec 7, 2020
@seemethere seemethere merged commit 57bffc3 into release/1.7 Dec 7, 2020
76 of 87 checks passed
@seemethere seemethere deleted the malfet/cp-48696 branch Dec 7, 2020