Disable autocast cache for tensor views as fix for #48049 (#48696) #48936

Merged
Merged 1 commit into `release/1.7` from `malfet/cp-48696` on Dec 7, 2020

Conversation

@malfet (Contributor) commented on Dec 7, 2020


This is a cherry-pick of #48696 into release/1.7 branch

Summary:
Fixes #48049

The root cause of the issue is explained [here](#48049 (comment)).

This PR implements albanD's suggestion to add the `!t.is_view()` check and disable autocast caching for views of tensors.

The added test checks for an increase in memory usage by comparing the initially allocated memory with the memory after three iterations of a single `nn.Linear` layer run inside a `no_grad` and `autocast` context.
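A rough sketch of such a check might look as follows (assuming PyTorch with a CUDA device is available; the function name and sizes are illustrative, not the actual test added by this PR):

```python
def check_no_memory_growth(iters=3, features=1024, batch=64):
    """Return the allocated-memory delta in bytes after `iters` forward
    passes, or None when PyTorch/CUDA is unavailable. Illustrative only."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    device = torch.device("cuda")
    layer = torch.nn.Linear(features, features).to(device)
    x = torch.randn(batch, features, device=device)
    start = torch.cuda.memory_allocated(device)
    with torch.no_grad(), torch.cuda.amp.autocast():
        for _ in range(iters):
            layer(x)  # output is not retained between iterations
    return torch.cuda.memory_allocated(device) - start
```

Before this fix, repeated iterations showed growing allocated memory because casted copies of tensor views stayed in the autocast cache; with the fix, the delta stays flat.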

After this PR the memory usage in the original issue doesn't grow anymore and yields:
```text
autocast: True
0: 0MB (peak 1165MB)
1: 0MB (peak 1264MB)
2: 0MB (peak 1265MB)
3: 0MB (peak 1265MB)
4: 0MB (peak 1265MB)
5: 0MB (peak 1265MB)
6: 0MB (peak 1265MB)
7: 0MB (peak 1265MB)
8: 0MB (peak 1265MB)
9: 0MB (peak 1265MB)
```

CC ngimel mcarilli

Pull Request resolved: #48696

Reviewed By: bdhirsh

Differential Revision: D25276231

Pulled By: ngimel

fbshipit-source-id: e2571e9f166c0a6f6f569b0c28e8b9ca34132743
@dr-ci (bot) commented on Dec 7, 2020

💊 CI failures summary and remediations

As of commit a1dcb78 (more details on the Dr. CI page):


- 2/2 failures possibly* introduced in this PR
  - 1/2 non-CircleCI failure(s)

XLA failure

Job pytorch_xla_linux_bionic_py3_6_clang9_test is failing. Please create an issue with a title prefixed by [PT_BREAK] in pytorch/xla and link to this PR. If you have questions, please reach out to @ailzhang / @dlibenzi / @JackCaoG.


Extra GitHub checks: 1 failed


This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.


@seemethere seemethere added this to the 1.7.1 milestone Dec 7, 2020
@seemethere seemethere merged commit 57bffc3 into release/1.7 Dec 7, 2020
76 of 87 checks passed
@seemethere seemethere deleted the malfet/cp-48696 branch Dec 7, 2020