
Pull in fairscale.nn.Pipe into PyTorch. #44090

Closed

Conversation

@pritamdamania87 commented Sep 3, 2020

Stack from ghstack:

  • #44090 Pull in fairscale.nn.Pipe into PyTorch.

This is an initial commit pulling in the torchgpipe fork at
https://github.com/facebookresearch/fairscale.

The purpose of this commit is simply to pull in the code and ensure all tests
and builds pass. We will gradually modify it to match the intended API
described in #44827; follow-up PRs will address further changes on top of this
initial commit.

We're pulling the code into the `torch.distributed._pipeline.sync` package. The
package is intentionally private since a lot of work (e.g., docs, API changes)
still needs to land before we can officially support this.
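For context, a minimal sketch of how the imported wrapper might be exercised, assuming the fork keeps upstream torchgpipe's `balance`/`devices`/`chunks` constructor; the import path below is the private package named above, and both the path and the signature are expected to change as we converge on the API in #44827:

```python
import torch
import torch.nn as nn
# Private, unstable import path created by this PR; the path and the
# torchgpipe-style constructor below are assumptions subject to change.
from torch.distributed._pipeline.sync import Pipe

# A toy two-stage model: balance=[1, 1] places one layer on each of two
# CUDA devices, and chunks=4 splits each input batch into 4 micro-batches
# that flow through the pipeline stages concurrently.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU())
pipe = Pipe(model, balance=[1, 1], devices=[0, 1], chunks=4)

x = torch.randn(8, 16, device='cuda:0')  # input lives on the first stage's device
out = pipe(x)                            # output emerges on the last stage's device
out.sum().backward()                     # gradients flow back through both stages
```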

Differential Revision: D23493316

NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!

@dr-ci bot commented Sep 3, 2020

💊 CI failures summary and remediations

As of commit 42f6723 (more details on the Dr. CI page):



🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_backward_compatibility_check_test (1/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Oct 21 22:42:22 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.
Oct 21 22:42:22 processing existing schema:  __setstate__(__torch__.torch.classes.quantized.EmbeddingPackedParamsBase _0, (int, Tensor[], float[], int[]) _1) -> (None _0) 
Oct 21 22:42:22 processing existing schema:  bit_rate(__torch__.torch.classes.quantized.EmbeddingPackedParamsBase _0) -> (int _0) 
Oct 21 22:42:22 processing existing schema:  version(__torch__.torch.classes.quantized.EmbeddingPackedParamsBase _0) -> (int _0) 
Oct 21 22:42:22 processing existing schema:  __getstate__(__torch__.torch.classes.xnnpack.LinearOpContext _0) -> ((Tensor, Tensor?, Scalar?, Scalar?) _0) 
Oct 21 22:42:22 processing existing schema:  __setstate__(__torch__.torch.classes.xnnpack.LinearOpContext _0, (Tensor, Tensor?, Scalar?, Scalar?) _1) -> (None _0) 
Oct 21 22:42:22 processing existing schema:  __getstate__(__torch__.torch.classes.xnnpack.Conv2dOpContext _0) -> ((Tensor, Tensor?, int[], int[], int[], int, Scalar?, Scalar?) _0) 
Oct 21 22:42:22 processing existing schema:  __setstate__(__torch__.torch.classes.xnnpack.Conv2dOpContext _0, (Tensor, Tensor?, int[], int[], int[], int, Scalar?, Scalar?) _1) -> (None _0) 
Oct 21 22:42:22 processing existing schema:  __getstate__(__torch__.torch.classes.xnnpack.TransposeConv2dOpContext _0) -> ((Tensor, Tensor?, int[], int[], int[], int[], int, Scalar?, Scalar?) _0) 
Oct 21 22:42:22 processing existing schema:  __setstate__(__torch__.torch.classes.xnnpack.TransposeConv2dOpContext _0, (Tensor, Tensor?, int[], int[], int[], int[], int, Scalar?, Scalar?) _1) -> (None _0) 
Oct 21 22:42:22 processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (None _0) 
Oct 21 22:42:22 The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not.  
Oct 21 22:42:22  
Oct 21 22:42:22 Broken ops: [ 
Oct 21 22:42:22 	prim::is_vulkan(Tensor a) -> (bool) 
Oct 21 22:42:22 ] 
Oct 21 22:42:22 + cleanup 
Oct 21 22:42:22 + retcode=1 
Oct 21 22:42:22 + set +x 
Oct 21 22:42:22 =================== sccache compilation log =================== 
Oct 21 22:42:22 =========== If your build fails, please take a look at the log above for possible reasons =========== 
Oct 21 22:42:22 Compile requests                 0 

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (2/2)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Oct 22 00:13:09 [E request_callback_no_python.cpp:581] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future
Oct 22 00:13:09 At: 
Oct 22 00:13:09   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Oct 22 00:13:09   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Oct 22 00:13:09  
Oct 22 00:13:09 [E request_callback_no_python.cpp:581] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Oct 22 00:13:09  
Oct 22 00:13:09 At: 
Oct 22 00:13:09   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Oct 22 00:13:09   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Oct 22 00:13:09  
Oct 22 00:13:09 [E request_callback_no_python.cpp:581] Received error while processing request type 2: RuntimeError: Can not pickle torch.futures.Future 
Oct 22 00:13:09  
Oct 22 00:13:09 At: 
Oct 22 00:13:09   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(94): serialize 
Oct 22 00:13:09   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(146): serialize 
Oct 22 00:13:09  
Oct 22 00:13:09 ok (1.741s) 
Oct 22 00:13:11   test_return_future_remote (__main__.ProcessGroupRpcTestWithSpawn) ... RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. 
Oct 22 00:13:11 RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. 
Oct 22 00:13:11 RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. 
Oct 22 00:13:11 RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend. 

🚧 2 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch:

If your commit is newer than viable/strict, you can try basing on an older, stable commit:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase --onto FETCH_HEAD $(git merge-base origin/master HEAD)

If your commit is older than viable/strict:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.



pritamdamania87 pushed a commit that referenced this issue Sep 3, 2020
Pull Request resolved: #44090

ghstack-source-id: 111317338
@msbaines left a comment

lgtm

pritamdamania87 pushed a commit that referenced this issue Oct 7, 2020
Pull Request resolved: #44090

ghstack-source-id: 113726320
@rohan-varma (Member) commented Oct 8, 2020

CI issue for Mac/Windows looks legit:

Oct 07 05:00:59 Traceback (most recent call last):
Oct 07 05:00:59   File "test/run_test.py", line 830, in <module>
Oct 07 05:00:59     main()
Oct 07 05:00:59   File "test/run_test.py", line 813, in main
Oct 07 05:00:59     raise RuntimeError(err_message)
Oct 07 05:00:59 RuntimeError: distributed/_pipeline/sync/skip/test_api failed!
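For anyone reproducing this locally, a hedged sketch that launches the failing module through the same test driver CI uses; the `-i`/`--include` and `--verbose` flags are assumptions about run_test.py's CLI at the time:

```python
# Hypothetical local repro of the failure above. Run from the repo root.
import subprocess
import sys

subprocess.check_call([
    sys.executable, 'test/run_test.py',
    '-i', 'distributed/_pipeline/sync/skip/test_api',  # the failing module
    '--verbose',
])
```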

pritamdamania87 pushed a commit that referenced this issue Oct 16, 2020
Pull Request resolved: #44090

ghstack-source-id: 114511626
pritamdamania87 pushed a commit that referenced this issue Oct 19, 2020
Pull Request resolved: #44090

ghstack-source-id: 114636036
@mrshenli (Contributor) left a comment

Folder structure looks OK to me.

Since we imported the code as-is from torchgpipe, the current copyright makes sense. In the future, if we modify, say, 100% of a file, do we need to update the copyright in that file and the corresponding folder accordingly?

@@ -22,6 +22,9 @@ All contributions by Yangqing Jia:
Copyright (c) 2015 Yangqing Jia
All rights reserved.

All contributions by Kakao Brain:
Copyright 2019-2020 Kakao Brain
@mrshenli (Contributor) commented Oct 21, 2020

I assume the legal team has reviewed these license updates and is OK with them?

@pritamdamania87 (Author) commented Oct 21, 2020

Yes, this was added per their guidance. I believe they also mentioned they have a job that verifies licensing changes once things land, so they might reach out later if needed.

'distributed/_pipeline/sync/skip/test_api',
'distributed/_pipeline/sync/skip/test_gpipe',
'distributed/_pipeline/sync/skip/test_inspect_skip_layout',
'distributed/_pipeline/sync/skip/test_leak',
'distributed/_pipeline/sync/skip/test_portal',
'distributed/_pipeline/sync/skip/test_stash_pop',
'distributed/_pipeline/sync/skip/test_tracker',
'distributed/_pipeline/sync/skip/test_verify_skippables',
'distributed/_pipeline/sync/test_balance',
'distributed/_pipeline/sync/test_bugs',
'distributed/_pipeline/sync/test_checkpoint',
'distributed/_pipeline/sync/test_copy',
'distributed/_pipeline/sync/test_deferred_batch_norm',
'distributed/_pipeline/sync/test_dependency',
'distributed/_pipeline/sync/test_inplace',
'distributed/_pipeline/sync/test_microbatch',
'distributed/_pipeline/sync/test_phony',
'distributed/_pipeline/sync/test_pipe',
'distributed/_pipeline/sync/test_pipeline',
'distributed/_pipeline/sync/test_stream',
'distributed/_pipeline/sync/test_transparency',
'distributed/_pipeline/sync/test_worker',
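The skip/* tests above exercise torchgpipe's skip-connection machinery. A minimal sketch of that API, following upstream torchgpipe's documented `@skippable`/`stash`/`pop` design; the import path is an assumption about where the decorators land inside the new private package:

```python
import torch.nn as nn
# Assumed location inside the newly imported private package.
from torch.distributed._pipeline.sync.skip import skippable, stash, pop

@skippable(stash=['identity'])
class StashIdentity(nn.Module):
    def forward(self, x):
        yield stash('identity', x)  # hand the tensor off to a later stage
        return x

@skippable(pop=['identity'])
class PopIdentity(nn.Module):
    def forward(self, x):
        identity = yield pop('identity')  # receive it, possibly across devices
        return x + identity               # a residual connection that skips stages
```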
@mrshenli (Contributor) commented Oct 21, 2020

How much time do these tests add to the general CI run? If it's a lot, should we consider adding them to the slow tests for now, to minimize the impact on other PRs?

@pritamdamania87 (Author) commented Oct 21, 2020

Most of the tests didn't take too long, but I still added all of them to SLOW_TESTS to be safe.
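Concretely, a sketch of what that run_test.py change looks like; `SLOW_TESTS` is the list named in the comment above, and the surrounding entries are elided:

```python
# In test/run_test.py: the new pipeline test modules are appended to the
# existing SLOW_TESTS list so they are scheduled with the slow tests rather
# than in the default shard.
SLOW_TESTS = [
    # ... existing slow-test entries ...
    'distributed/_pipeline/sync/skip/test_api',
    'distributed/_pipeline/sync/skip/test_gpipe',
    'distributed/_pipeline/sync/test_pipe',
    # ... and the rest of the distributed/_pipeline/sync modules listed above ...
]
```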

pritamdamania87 pushed a commit that referenced this issue Oct 21, 2020
Pull Request resolved: #44090

ghstack-source-id: 114864254
@facebook-github-bot (Contributor) commented Oct 22, 2020

This pull request has been merged in 06d50b5.
