
Differences between .data and .detach #6990

Closed
jay960702 opened this issue Apr 26, 2018 · 7 comments

@jay960702 commented Apr 26, 2018

Issue description

Hi all,

I am not clear about the difference between .data and .detach() in the latest PyTorch 0.4.
For example:

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.data
c = a.detach()

So is b not the same as c?

Here is a part of the 'PyTorch 0.4.0 Migration Guide':

"However, .data can be unsafe in some cases. Any changes on x.data wouldn’t be tracked by autograd, and the computed gradients would be incorrect if x is needed in a backward pass. A safer alternative is to use x.detach(), which also returns a Tensor that shares data with requires_grad=False, but will have its in-place changes reported by autograd if x is needed in backward."

Can anyone explain this sentence in more detail: "but will have its in-place changes reported by autograd if x is needed in backward"? Thanks!
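
For reference, both calls return a tensor that shares storage with a and has requires_grad=False, which is why they look interchangeable at first glance. A minimal sketch, assuming PyTorch >= 0.4:

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
b = a.data       # shares storage with a, requires_grad=False
c = a.detach()   # also shares storage with a, requires_grad=False

print(b.requires_grad, c.requires_grad)   # False False
print(b.data_ptr() == a.data_ptr())       # True: same underlying storage
print(c.data_ptr() == a.data_ptr())       # True: same underlying storage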

@zou3519 (Contributor) commented Apr 26, 2018

Here's an example. If you use .detach() instead of .data, gradient computation is guaranteed to be correct:

>>> a = torch.tensor([1,2,3.], requires_grad = True)
>>> out = a.sigmoid()
>>> c = out.detach()
>>> c.zero_()  
tensor([ 0.,  0.,  0.])

>>> out  # modified by c.zero_() !!
tensor([ 0.,  0.,  0.])

>>> out.sum().backward()  # Requires the original value of out, but that was overwritten by c.zero_()
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

As opposed to using .data:

>>> a = torch.tensor([1,2,3.], requires_grad = True)
>>> out = a.sigmoid()
>>> c = out.data
>>> c.zero_()
tensor([ 0.,  0.,  0.])

>>> out  # out  was modified by c.zero_()
tensor([ 0.,  0.,  0.])

>>> out.sum().backward()
>>> a.grad  # The result is very, very wrong because `out` changed!
tensor([ 0.,  0.,  0.])

I'll leave this issue open: we should add an example to the migration guide and clarify that section.

@jay960702 (Author) commented Apr 26, 2018

Hi Richard @zou3519

Thanks for your reply!
Your example is clear. Any in-place change on x.detach() will cause an error when x is needed in backward, so .detach() is the safer way to exclude subgraphs from gradient computation.
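
A minimal sketch of that stop-gradient pattern (the names pred and target are illustrative, not from this thread):

import torch

# Hypothetical stop-gradient pattern: the target branch is excluded from
# gradient computation via .detach(), so only `pred` receives gradients.
x = torch.randn(4, requires_grad=True)

pred = x * 2.0                 # branch we want to train
target = (x + 1.0).detach()    # branch treated as a constant by autograd

loss = ((pred - target) ** 2).sum()
loss.backward()

print(x.grad)                  # gradient flows only through `pred`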

@yl-jiang commented Mar 24, 2019

Why does the following code work?

>>> a = torch.tensor([1,2,3.], requires_grad = True)
>>> out = a.sigmoid().sum()
>>> c = out.data
>>> c.zero_()
tensor(0.)

>>> out
tensor(0.)

>>> out.backward()
>>> a.grad
tensor([0.1966, 0.1050, 0.0452])

@asanakoy (Contributor) commented Jul 25, 2019

Are there any use cases where .data is preferred over .detach()?
Is .data deprecated in PyTorch 1.x?

@peterzsj6 commented Sep 15, 2020

> [quoting @zou3519's example above]

Thank you for your example, but I saw someone else's video saying that autograd will check the version of the tensor to prevent this from happening. I am new to this, so I am a little confused.
https://youtu.be/MswxJw-8PvE?t=323
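
One way to look at the version counter that video refers to is the internal _version attribute (not a public API, so this is only an illustrative sketch and may behave differently across releases):

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
out = a.sigmoid()

print(out._version)   # 0: no in-place modification yet

c = out.detach()
c.zero_()             # in-place op on a tensor sharing out's storage

print(out._version)   # bumped to 1: backward() compares this against the
                      # version recorded when the tensor was saved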

@wtiman520209 commented Jul 31, 2021

> [quoting @yl-jiang's question above]

Because the value of out is not used when computing the gradient (sum's backward does not need its output's value, and sigmoid's backward uses the intermediate sigmoid output, which is a different tensor from out), the computed gradient w.r.t. a is still correct even though out's value was changed. With tensor.detach(), autograd can detect whether tensors involved in computing the gradient have been changed in place, but tensor.data has no such check.
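
To make that concrete, here is a sketch contrasting the two situations under the semantics described in this thread (newer PyTorch releases may behave differently): modifying the final sum through .data is harmless because its value is never needed in backward, while modifying the intermediate sigmoid output the same way silently corrupts the gradient.

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
s = a.sigmoid()        # sigmoid's backward needs this output value
out = s.sum()          # sum's backward does not need out's value

out.data.zero_()       # harmless: out is not saved for the backward pass
s.data.zero_()         # harmful: s IS needed, and .data hides the change

out.backward()
print(a.grad)          # silently wrong: tensor([0., 0., 0.])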
