
nn.Parameter{List,Dict} not copied to gpus in forward pass when nn.DataParallel is used #36035

Open
grtzsohalf opened this issue Apr 5, 2020 · 39 comments · May be fixed by #40589

Comments

grtzsohalf commented Apr 5, 2020

🐛 Bug

When I wrap an nn.Module X with nn.DataParallel, the nn.Parameter entries stored in an nn.ParameterList inside X are not copied to the GPUs in the forward pass. They are part of the module's parameters, so I would expect them to be treated like any other parameter of X. Is this an intentional design?

To Reproduce

test.py:

import sys
import torch
import torch.nn as nn
import torch.nn.functional as F


gpus = list(map(int, sys.argv[1].split(',')))


class Net(nn.Module):
    def __init__(self):
        super().__init__()

        self.alpha = nn.ParameterList()

        for i in range(4):
            self.alpha.append(nn.Parameter(1e-3*torch.randn(i+2, 5)))

        self.cnn = nn.Conv2d(1, 1, 1, 1, 1)


    def forward(self, x):
        print(self.alpha)
        print(self.cnn)
        return x


if __name__ == '__main__':
    net = Net().cuda()
    if len(gpus) > 1:
        net = nn.DataParallel(net, device_ids=gpus)

    net(torch.rand(4, 5))

When I run python3 test.py 0 (i.e. device_ids = [0]), the output is:

ParameterList(
    (0): Parameter containing: [torch.cuda.FloatTensor of size 2x5 (GPU 0)]
    (1): Parameter containing: [torch.cuda.FloatTensor of size 3x5 (GPU 0)]
    (2): Parameter containing: [torch.cuda.FloatTensor of size 4x5 (GPU 0)]
    (3): Parameter containing: [torch.cuda.FloatTensor of size 5x5 (GPU 0)]
)
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))

However, when I run python3 test.py 0,1 (i.e. device_ids = [0, 1]), the output is:

ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))
ParameterList()
Conv2d(1, 1, kernel_size=(1, 1), stride=(1, 1), padding=(1, 1))

Only the nn.Module submodules are copied to the GPUs in the forward pass.
How can I use and train nn.Parameter entries just like nn.Module submodules with nn.DataParallel?

Expected behavior

When the nn.Module X is wrapped with nn.DataParallel, both the submodules and the nn.Parameter entries of X should be copied to the GPUs.

Environment

PyTorch version: 1.6.0.dev20200401+cu101
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Arch Linux
GCC version: (Arch Linux 9.3.0-1) 9.3.0
CMake version: version 3.17.0

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: 10.2.89
GPU models and configuration:
GPU 0: GeForce GTX TITAN X
GPU 1: GeForce GTX 1060 6GB

Nvidia driver version: 440.64
cuDNN version: /usr/lib/libcudnn.so.7.6.5

Versions of relevant libraries:
[pip3] numpy==1.18.2
[pip3] torch==1.6.0.dev20200401+cu101
[pip3] torchexp==0.1.0
[pip3] torchvision==0.6.0.dev20200401+cu101
[conda] Could not collect

cc @ezyang @gchanan @zou3519 @albanD @mruberry

zou3519 (Contributor) commented Apr 6, 2020

Seems like a correctness issue so I am marking it as high-pri.

albanD (Contributor) commented Apr 6, 2020

Looks like a bad interaction between ParameterList and the replicate function on nn.Module. This needs more investigation.

albanD (Contributor) commented Apr 6, 2020

Reducing the priority as many other things do not interact nicely with DataParallel. We can increase it again if we see many users hit this issue.

grtzsohalf (Author) commented Apr 7, 2020

Thanks for the update, @zou3519 @albanD.
For now, is there a recommended workaround for using nn.Parameter with nn.DataParallel?

zou3519 (Contributor) commented Apr 7, 2020

Instead of using nn.ParameterList, does it work if you directly assign the parameters to the model?

grtzsohalf (Author) commented Apr 7, 2020

For now I pass the nn.Parameter objects directly as part of the inputs to the model.
test.py:

import sys
import torch
import torch.nn as nn
import torch.nn.functional as F


gpus = list(map(int, sys.argv[1].split(',')))


class Net(nn.Module):
    def __init__(self):
        super().__init__()

        self.alpha = nn.ParameterList()
        for i in range(2):
            self.alpha.append(nn.Parameter(1e-3*torch.randn(i+2, 5)))
        print('Init: ', [a for a in self.alpha])


    def forward(self, x, alphas):
        # print('Inputs: ', x.shape)
        if alphas is not None:
            # DataParallel scatters inputs along dim 0, so each replica receives a
            # slice of size 1 per parameter; squeeze that extra dimension back out.
            alphas = [a.squeeze(0) for a in alphas]
            print('In forward pass: ', [a for a in alphas])
            print([a.shape for a in alphas])
        else:
            print('In forward pass: ', [a.shape for a in self.alpha])
        return x


if __name__ == '__main__':
    net = Net().cuda()
    if len(gpus) > 1:
        net = nn.DataParallel(net, device_ids=gpus)
        # Give each parameter a leading dimension of size len(gpus) so that
        # scatter hands one full copy to every replica.
        alphas = [a.unsqueeze(0).repeat(len(gpus), 1, 1) for a in net.module.alpha]
        print([a.shape for a in alphas])
    else:
        alphas = None

    net(torch.rand(4, 5), alphas)
    # print('Not in forward pass: ', [n for n, p in net.named_parameters()])

If I run python3 test.py 0,1, then I get:

Init:  [Parameter containing:
tensor([[-0.0011,  0.0020,  0.0002, -0.0011,  0.0001],
        [ 0.0008,  0.0002,  0.0022,  0.0007, -0.0011]], requires_grad=True), Parameter containing:
tensor([[ 0.0010, -0.0008,  0.0010,  0.0003, -0.0013],
        [-0.0005, -0.0005, -0.0008,  0.0005,  0.0003],
        [ 0.0003,  0.0016,  0.0003, -0.0002,  0.0009]], requires_grad=True)]

[torch.Size([2, 2, 5]), torch.Size([2, 3, 5])]

In forward pass:  In forward pass:  [tensor([[-0.0011,  0.0020,  0.0002, -0.0011,  0.0001],
        [ 0.0008,  0.0002,  0.0022,  0.0007, -0.0011]], device='cuda:0',
       grad_fn=<SqueezeBackward1>), tensor([[ 0.0010, -0.0008,  0.0010,  0.0003, -0.0013],
        [-0.0005, -0.0005, -0.0008,  0.0005,  0.0003],
        [ 0.0003,  0.0016,  0.0003, -0.0002,  0.0009]], device='cuda:0',
       grad_fn=<SqueezeBackward1>)]
[torch.Size([2, 5]), torch.Size([3, 5])]
[tensor([[-0.0011,  0.0020,  0.0002, -0.0011,  0.0001],
        [ 0.0008,  0.0002,  0.0022,  0.0007, -0.0011]], device='cuda:1',
       grad_fn=<SqueezeBackward1>), tensor([[ 0.0010, -0.0008,  0.0010,  0.0003, -0.0013],
        [-0.0005, -0.0005, -0.0008,  0.0005,  0.0003],
        [ 0.0003,  0.0016,  0.0003, -0.0002,  0.0009]], device='cuda:1',
       grad_fn=<SqueezeBackward1>)]
[torch.Size([2, 5]), torch.Size([3, 5])]

This approach can work.

grtzsohalf (Author) commented Apr 7, 2020

Instead of using nn.ParameterList, does it work if you directly assign the parameters to the model?

It doesn't work that way.

lidazhang commented Apr 14, 2020

Same problem here using nn.Parameter() with DataParallel.

cyranawm commented May 29, 2020

Same problem here: during the forward pass of a module wrapped with DataParallel, the nn.ParameterList is empty.

ruoshiliu commented Jun 19, 2020

I ran into the same problem. It also applies to ParameterDict(). Is there any workaround for this issue?

albanD (Contributor) commented Jun 19, 2020

Hi,

The workaround right now is just to save the Parameters directly on the module and not in the list/dict.

Bumping the priority as this seems quite common.
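
To make that workaround concrete, here is a minimal sketch (module and attribute names are illustrative, not from this thread): each parameter is registered directly on the module with setattr and collected by attribute name in forward.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, num_alphas=4):
        super().__init__()
        self.num_alphas = num_alphas
        # Register each parameter directly on the module instead of in an nn.ParameterList.
        for i in range(num_alphas):
            setattr(self, 'alpha_{}'.format(i), nn.Parameter(1e-3 * torch.randn(i + 2, 5)))

    def alphas(self):
        # Collect by attribute name; on a DataParallel replica this should pick up
        # the copied tensors, since they are attached under the same attribute names.
        return [getattr(self, 'alpha_{}'.format(i)) for i in range(self.num_alphas)]

    def forward(self, x):
        print([a.shape for a in self.alphas()])
        return x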

@ezyang ezyang changed the title nn.Parameter not copied to gpus in forward pass when nn.DataParallel is used nn.Parameter{List,Dict} not copied to gpus in forward pass when nn.DataParallel is used Jun 22, 2020
@ezyang ezyang added small and removed triage review labels Jun 22, 2020
ezyang (Contributor) commented Jun 22, 2020

NB: if at all possible, you should use DistributedDataParallel instead!
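
For completeness, a hedged sketch of the standard one-process-per-GPU DistributedDataParallel setup (the launch command and script name are assumptions); in this mode the module is not re-replicated on each forward, so a ParameterList is expected to behave as it does on a single GPU:

# Assumed launch: python -m torch.distributed.launch --use_env --nproc_per_node=2 ddp_plist.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.ParameterList(
            [nn.Parameter(1e-3 * torch.randn(i + 2, 5)) for i in range(4)])

    def forward(self, x):
        # One process per GPU: no per-forward replication, so the list stays populated.
        print([a.shape for a in self.alpha])
        return x

if __name__ == '__main__':
    dist.init_process_group('nccl', init_method='env://')
    local_rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(local_rank)
    net = Net().cuda()
    ddp_net = DDP(net, device_ids=[local_rank])
    ddp_net(torch.rand(4, 5).cuda())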

vfdev-5 (Collaborator) commented Jun 25, 2020

@ezyang I'll work on this issue.

As far as I understand from the code of the replicate method in nn/parallel/replicate.py, the problem is with this line:

setattr(replica, key, param)

as it seems we cannot add parameters with setattr on a ParameterList or ParameterDict.
Replacing it with

    if isinstance(replica, (ParameterList, ParameterDict)):
        replica._parameters[str(key)] = Parameter(param)
    else:
        setattr(replica, key, param)

propagates the parameters from the list/dict to the replicas. What do you think of this solution? Otherwise, could you suggest another way to fix the problem? Thanks!

albanD (Contributor) commented Jun 25, 2020

Should they actually be Parameters? The comment seems to hint that they should not.

vfdev-5 (Collaborator) commented Jun 25, 2020

@albanD it seems we cannot add a non-Parameter object via replica._parameters[str(key)] = ...

In the replica, the _parameters attribute is set to an empty ordered dict (replica._parameters = OrderedDict()) inside Module._replicate_for_data_parallel.
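
A small standalone illustration of that limitation (a sketch, not taken from the replication code): assigning a plain tensor to a ParameterList only attaches it as a regular attribute, so the list interface never sees it.

import torch
import torch.nn as nn

plist = nn.ParameterList()
# Mimic what replicate() effectively does on a replica: assign a plain tensor, not a Parameter.
# On the affected versions this warns "Setting attributes on ParameterList is not supported."
setattr(plist, '0', torch.randn(2, 2))
print(len(plist))                  # 0 -- __len__ only counts entries in plist._parameters
print(getattr(plist, '0').shape)   # torch.Size([2, 2]) -- the tensor exists only as a plain attribute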

vfdev-5 pushed a commit to Quansight/pytorch that referenced this issue Jun 26, 2020
- added test replicate on parameter, list, dict
albanD (Contributor) commented Jun 26, 2020

I agree that this would solve the issue, but I wonder if it's not going to cause more issues when other parts of torch.nn interact with it. I don't think we want to "pretend" these are Parameters when they are actually Tensors with history. Won't .parameters() pick up these new Tensors as it traverses all the _parameters dicts?

One thing I can think of would be to allow ParameterList to contain elements that are not Parameters (like any other nn.Module). These would be added via a special API and would not be registered in the _parameters dict.

vfdev-5 (Collaborator) commented Jun 26, 2020

Won't .parameters() pick up these new Tensors as it traverses all the _parameters dicts?

Thanks for pointing this out! Yes, we do pick these tensors up.

One thing I can think of would be to allow ParameterList to contain elements that are not Parameters (like any other nn.Module). These would be added via a special API and would not be registered in the _parameters dict.

I think the problem will then appear on the user side, when users try to access the elements of the ParameterList...

albanD (Contributor) commented Jun 26, 2020

I think the problem will then appear on the user side, when users try to access the elements of the ParameterList...

Ah yes, sorry, this would imply modifying __getitem__ to return elements from either _parameters or regular attributes (saved in another dict, I guess).

vfdev-5 (Collaborator) commented Jun 29, 2020

Just checked the DDP single-process, multiple-devices use case (#36503); the same issue persists for ParameterList/Dict.

Code and output

> python -m torch.distributed.launch --use_env --nproc_per_node=1 ddp_parameters.py

import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


class Net(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.beta = nn.Parameter(torch.tensor(0.0))
        self.alpha = nn.ParameterList([nn.Parameter(torch.tensor(1.0)), nn.Parameter(torch.tensor(2.0))])
        self.gamma = nn.ParameterDict({"a": nn.Parameter(torch.tensor(3.0)), "b": nn.Parameter(torch.tensor(4.0))})


class NestedNet(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.base = Net()

        self.main_beta = nn.Parameter(torch.tensor(10.0))
        self.main_alpha = nn.ParameterList([nn.Parameter(torch.tensor(11.0)), nn.Parameter(torch.tensor(12.0))])
        self.main_gamma = nn.ParameterDict({"A": nn.Parameter(torch.tensor(13.0)), "B": nn.Parameter(torch.tensor(14.0))})

    def forward(self, x):
        print(dist.get_rank(), "- forward", self.main_beta)
        print(dist.get_rank(), "-- forward", self.main_alpha)
        print(dist.get_rank(), "--- forward", self.base.alpha)


if __name__ == '__main__':
    import os    
    dist.init_process_group("gloo", init_method="env://")
    
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    net = NestedNet().cuda()
    ddp_net = DDP(net, device_ids=[0, 1])

    if dist.get_rank() == 0:
        print("ddp_net parameters:", list(ddp_net.parameters()))
    
    ddp_net(torch.rand(2, 4))

Output

ddp_net parameters: [Parameter containing:
tensor(10., device='cuda:0', requires_grad=True), Parameter containing:
tensor(0., device='cuda:0', requires_grad=True), Parameter containing:
tensor(1., device='cuda:0', requires_grad=True), Parameter containing:
tensor(2., device='cuda:0', requires_grad=True), Parameter containing:
tensor(3., device='cuda:0', requires_grad=True), Parameter containing:
tensor(4., device='cuda:0', requires_grad=True), Parameter containing:
tensor(11., device='cuda:0', requires_grad=True), Parameter containing:
tensor(12., device='cuda:0', requires_grad=True), Parameter containing:
tensor(13., device='cuda:0', requires_grad=True), Parameter containing:
tensor(14., device='cuda:0', requires_grad=True)]
0 - forward 0 - forward Parameter containing:
tensor(10., device='cuda:0', requires_grad=True)
0 -- forward ParameterList(
    (0): Parameter containing: [torch.cuda.FloatTensor of size  (GPU 0)]
    (1): Parameter containing: [torch.cuda.FloatTensor of size  (GPU 0)]
)
0 --- forward ParameterList(
    (0): Parameter containing: [torch.cuda.FloatTensor of size  (GPU 0)]
    (1): Parameter containing: [torch.cuda.FloatTensor of size  (GPU 0)]
)
tensor(10., device='cuda:1', requires_grad=True)
0 -- forward ParameterList()
0 --- forward ParameterList()

The user can correctly fetch all parameters with .parameters(), but the ParameterList/Dict contents are only present on replica 0.

concretevitamin commented Aug 2, 2020

This is a serious issue preventing us from upgrading to 1.6. Working around ParameterList (e.g., assigning parameters directly as attributes) is non-ideal, as it breaks loading prior checkpoints. Adding glue code to load prior checkpoints would work, but it's not something I expect to need for a stable release.

Also related to issue #42327.
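
If anyone does switch from a ParameterList to direct attributes, a hedged sketch of the kind of glue code mentioned above (the key prefixes are hypothetical and need to match your own checkpoint):

import torch

def remap_parameterlist_keys(state_dict, old_prefix='alpha.', new_prefix='alpha_'):
    # Rename ParameterList-style keys such as 'alpha.0' to direct-attribute
    # names such as 'alpha_0'; everything else is passed through unchanged.
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix) and key[len(old_prefix):].isdigit():
            remapped[new_prefix + key[len(old_prefix):]] = value
        else:
            remapped[key] = value
    return remapped

# Usage (paths and model are placeholders):
# state_dict = torch.load('old_checkpoint.pt', map_location='cpu')
# model.load_state_dict(remap_parameterlist_keys(state_dict))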

mrshenli (Contributor) commented Aug 3, 2020

This seems to be another error instance caused by the changes we made in #33907.

If the goal is to access parameters in replicated models, that no longer works after #33907. A temporary workaround is to access them from _former_parameters, as we do in DDP.

@ngimel do we have a long-term solution for this? This error seems to come up quite frequently.

vfdev-5 (Collaborator) commented Aug 3, 2020

@mrshenli in PR #40589 I proposed a fix for this issue with DP and ParameterList/Dict. Could you please take a look and give feedback? Thanks!

ngimel (Contributor) commented Aug 3, 2020

@mrshenli not really. In the case of ParameterList/ParameterDict, #33907 exposed a deficiency in the ParameterList/ParameterDict design that already made them incompatible with reparameterization; see e.g. the following example, where ParameterList fails without any data_parallel replication:

import torch
import torch.nn as nn

class MyMod(torch.nn.Module):
    def __init__(self):
        super(MyMod, self).__init__()
        self.params = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(10)])

    def forward(self, x):
        print( self.params[0].size())
        # ParameterList can act as an iterable, or be indexed using ints
        for i, p in enumerate(self.params):
            x = self.params[i // 2].mm(x) + p.mm(x)
        return x

m = MyMod()
torch.nn.utils.weight_norm(m.params, name="0")
print("applied weight norm")
x = torch.randn(10, 10)
out=m(x)

Note that while one should not access parameters() in the forward pass (and that is the only thing #33907 breaks), and accessing parameters by attribute name still works after #33907, accessing elements of a ParameterList by index is the sole purpose of its existence and the only way to reach those attributes, and yet it does not work after a regular reparameterization. ParameterList/ParameterDict happened to work with data-parallel replication before #33907 because replication hid the fact that it was, in fact, a reparameterization, but the proper fix is to make ParameterList/ParameterDict behave correctly in reparameterization scenarios.

mrshenli (Contributor) commented Aug 3, 2020

Hey @vfdev-5, based on @ngimel's comments above, the fix for ParameterList should be a general one that works both with and without DataParallel?

vfdev-5 (Collaborator) commented Aug 3, 2020

@mrshenli I see. As far as I understand, the weight_norm issue with ParameterList is that we cannot simply add and fetch tensors from those structures. For example, after weight_norm is applied to one of its parameters, a regular module contains one tensor and two parameters:

import torch
import torch.nn as nn

m = nn.Linear(10, 10)
m2 = torch.nn.utils.weight_norm(m, name="weight")
print(type(m2.weight), type(m2.weight_g), type(m2.weight_v))
> <class 'torch.Tensor'> <class 'torch.nn.parameter.Parameter'> <class 'torch.nn.parameter.Parameter'>

but for a ParameterList we lose the original parameter and end up with two additional parameters:

import torch
import torch.nn as nn

plist = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(2)])
plist2 = torch.nn.utils.weight_norm(plist, name="0")
print(["{}:{}".format(n, p.shape) for n, p in plist2.named_parameters()])
type(plist2[0])
> ['1:torch.Size([10, 10])', '0_g:torch.Size([10, 1])', '0_v:torch.Size([10, 10])']
> KeyError: '0'

So, naively, the fix for ParameterList would be to allow setattr(parameter_list_module, name, tensor) and to redirect parameter_list_module[name] to getattr(parameter_list_module, name), fetching the tensor from the module itself instead of from parameter_list_module._parameters?
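
A rough sketch of that idea, purely illustrative and not the implementation in #40589: a ParameterList variant whose indexing falls back to plain attribute lookup when the entry is missing from _parameters (negative indices, slices, and __len__/iteration on a replica are ignored here for brevity):

import torch
import torch.nn as nn

class FallbackParameterList(nn.ParameterList):
    def __getitem__(self, idx):
        key = str(idx)
        if key in self._parameters:
            return self._parameters[key]
        # On a DataParallel replica _parameters is empty, but replicate() may have
        # attached the copied tensor as a plain attribute under the same name.
        return getattr(self, key)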

jacobaustin123 commented Sep 8, 2020

Running into the same problem here. Seems like a fairly annoying issue. And the error messages are unhelpful. The data in the ParameterList/Dict just disappears at some point during the forward pass.

XinYuan-believe commented Oct 8, 2020

Yes, this is a serious issue, especially for meta-learning, which updates the network parameters online through the parameter list.

vaxenburg commented Oct 10, 2020

I'm having the same problem. ParameterList appears empty in the forward pass. Perhaps the workaround could help, but it's far from ideal.

ymthink commented Oct 14, 2020

Encountered the same problem, which slows down the training time a lot.

Devchonka commented Dec 14, 2020

Could the fix for this please be prioritized? It's been half a year.

Devchonka commented Jan 29, 2021

@albanD sorry to bother you, but do you have a timeline for when a fix could be integrated? I'm relatively new to ML and torch, and this is a big blocker for me (the workaround proposed above isn't an option in my case).

JeanKossaifi commented Jan 30, 2021

I also encountered the issue.

Snippet to reproduce:

class Dummy(nn.Module):
    def __init__(self):
        super().__init__()
        self.params = nn.ParameterList()
        self.params.append(nn.Parameter(torch.Tensor(3, 4)))
        self.params.append(nn.Parameter(torch.Tensor(4, 5)))
        
    def forward(self, x):
        print(len(self.params), self.params)
        return x

Everything works as expected on a single GPU:

>>> device = 'cuda:0'
>>> d = Dummy().to(device)
>>> x = torch.randn(32, 4, 4).to(device)
>>> r = d(x)
2 ParameterList(
    (0): Parameter containing: [torch.cuda.FloatTensor of size 3x4 (GPU 0)]
    (1): Parameter containing: [torch.cuda.FloatTensor of size 4x5 (GPU 0)]
)

However, when using DataParallel and more than one device, the ParameterList is empty:

>>> device = 'cuda'
>>> d = nn.DataParallel(d).to(device)
>>> r = d(x)
00 ParameterList()
 ParameterList()

In addition, I get the warning:

.../torch/nn/modules/container.py:435: UserWarning: Setting attributes on ParameterList is not supported.
  warnings.warn("Setting attributes on ParameterList is not supported.")

albanD (Contributor) commented Feb 1, 2021

@Devchonka I'm afraid no one is working on a fix for this at the moment. The main reason is that the problem comes from the original design of ParameterList/Dict, and it cannot be easily fixed without many side effects on other uses of these wrappers.
The "simplest" workaround for now is, unfortunately, not to use them and to save all your Parameters directly on the Module.
Alternatively, if you want to keep your code cleaner, you can write a custom nn.Module that holds the parameters for you, but you won't be able to access them as if it were a list.
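
A hedged sketch of that second suggestion (class and method names are mine, not an official API): a small holder module that registers each parameter as a direct attribute and exposes them through a helper method rather than list indexing.

import torch
import torch.nn as nn

class ParamHolder(nn.Module):
    # Holds a fixed number of parameters as direct attributes p0, p1, ...
    def __init__(self, tensors):
        super().__init__()
        self._size = len(tensors)
        for i, t in enumerate(tensors):
            setattr(self, 'p{}'.format(i), nn.Parameter(t))

    def get(self, i):
        # Attribute access rather than a _parameters lookup, so this should also
        # work on DataParallel replicas, where the copies are plain attributes.
        return getattr(self, 'p{}'.format(i))

    def __len__(self):
        return self._size


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = ParamHolder([1e-3 * torch.randn(i + 2, 5) for i in range(4)])

    def forward(self, x):
        print([self.alpha.get(i).shape for i in range(len(self.alpha))])
        return x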

JeanKossaifi commented Feb 8, 2021

@albanD what about #40589?

We use ParameterList in our project, and this bug breaks the library for several users who rely on DataParallel. Do you recommend implementing our own ParameterList as an nn.Module until this is fixed?

xptree added a commit to laekov/fastmoe that referenced this issue Feb 24, 2021
JeanKossaifi referenced this issue in tensorly/torch Mar 5, 2021
zanshuxun commented Mar 6, 2021

Same error here.
