master
Commits on Feb 19, 2021
-
Update FindvecLib.cmake for macOS 10.14, 10.15 and Big Sur (#51288)
Summary: When compiling libtorch on macOS there is the option to use the `vecLib` BLAS library from Apple's [Accelerate](https://developer.apple.com/documentation/accelerate) framework. Recent versions of macOS have changed the location of `vecLib.h`; this change adds the new locations to `FindvecLib.cmake`. To test, run the following command: ``` BLAS=vecLib python setup.py install --cmake --cmake-only ``` The choice of BLAS library is confirmed in the output: ``` -- Trying to find preferred BLAS backend of choice: vecLib -- Found vecLib: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/Versions/Current/Headers ``` Pull Request resolved: #51288 Reviewed By: jbschlosser Differential Revision: D26531136 Pulled By: malfet fbshipit-source-id: ce86807ccbf66973f33b3acb99b7f40cfd182b9b
-
[BE] Cleanup UnaryOpsKernel.cpp (#52444)
Summary: Delete the unused `dispatchtypes` argument of `IMPLEMENT_FLOAT_KERNEL` and `IMPLEMENT_COMPLEX_KERNEL`. Move the common part of the above-mentioned macros into the `IMPLEMENT_ITERATOR_LAMBDA` macro. Pull Request resolved: #52444 Reviewed By: walterddr Differential Revision: D26517032 Pulled By: malfet fbshipit-source-id: f03f89602f14fb513c66f3f2a96596e4c1e4cd16
-
Revert D26515596: [pytorch][PR] Add support for pow
Test Plan: revert-hammer Differential Revision: D26515596 (83feaeb) Original commit changeset: 0c25a8eba8ed fbshipit-source-id: 1a206f0b2923d922911fdaa5448a4e3a844ac5c4
-
Fixed _out variants of linear algebra functions (#51560)
Summary: This PR modifies the behavior of `_out` variants to match the description here: https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch With this PR, result and input tensors must be on the same device and have the same "type kind". I skipped `qr` and `eig` in this process as they require a bit more work. Functions that can use the provided storage directly do so. If `result` is not empty but is not in the batched column-major format, or does not have the same type as the input, then we have to allocate a temporary tensor and copy it. TODO: - [x] Add more tests for same device and valid safe dtype - [x] Move inv and solve changes to separate PRs #51968, #51977 Ref. #42666 Pull Request resolved: #51560 Reviewed By: albanD Differential Revision: D26400734 Pulled By: heitorschueroff fbshipit-source-id: a6201ed7e919c1670c6ff3ef60217d1dbfb72e67
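The `out=` rule described above (same device, same "type kind", and a temporary-plus-copy when the provided storage is unusable) can be sketched in plain Python. `apply_out` and its dict-based "tensors" are illustrative stand-ins, not the ATen implementation:

```python
# Hedged sketch of the `_out` contract from the PR description, using plain
# Python dicts as stand-in "tensors". Not the actual ATen code.

def apply_out(op, inp, out):
    # Same-device / compatible-dtype checks described in the summary.
    if out["device"] != inp["device"]:
        raise RuntimeError("result and input must be on the same device")
    if out["dtype"] != inp["dtype"]:
        raise RuntimeError("result dtype must be compatible with input dtype")
    computed = [op(x) for x in inp["data"]]
    if out["data"] and len(out["data"]) != len(computed):
        # Provided storage is unusable: compute into a temporary,
        # then replace it (analogous to allocate-and-copy).
        out["data"] = list(computed)
    else:
        # Storage is usable (or empty): write into it directly.
        out["data"][:] = computed
    return out
```

The point of the layout branch is that callers who pass a well-shaped `out` buffer avoid the extra allocation, while everyone else still gets a correct result.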
-
[RPC] delete torch/csrc/utils/future.h (#51698)
Summary: Pull Request resolved: #51698 Completely eliminates torch::utils::Future as we are now fully relying on JitFuture. ghstack-source-id: 122037612 Test Plan: CI Reviewed By: kiukchung Differential Revision: D26243735 fbshipit-source-id: 95010a730f9d35e618f74c5f9de482738cd57c15
-
[RPC] Refactor rref_context to not use utils::Future (#51697)
Summary: Pull Request resolved: #51697 Refactors the rest of rref_context, specifically the pendingOwners map and `getOwnerRRef`, to use JitFuture. ghstack-source-id: 122037611 Test Plan: CI Reviewed By: wanchaol Differential Revision: D26243268 fbshipit-source-id: ab8874c8253274e8fe50dcd7291e0655a8f3f1df
-
log newly added construction and runtime stats at randomly selected iterations (#51394)
Summary: Pull Request resolved: #51394 Log newly added construction and runtime stats at randomly selected iterations. ghstack-source-id: 121934040 Test Plan: unit tests Reviewed By: SciPioneer Differential Revision: D26161885 fbshipit-source-id: add6e02c1a03e6f74f08b9a9aecf90fa81631d60
-
add stats that can only be collected at runtime (#51386)
Summary: Pull Request resolved: #51386 Add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data. 1. GPU time stats are not collected for single-process multiple-devices mode in this diff, as that requires events to be created and recorded on multiple devices. 2. Use the at::cuda::event API for safer calls. 3. Events may not be created in the autograd hook if the hook is not triggered by the user's code, e.g., when users run in non-sync mode in some iterations. So we check whether events were created before synchronizing, and skip invalid results. 4. Users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls. ghstack-source-id: 121933566 Test Plan: unit tests Reviewed By: SciPioneer Differential Revision: D26158645 fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
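Point 3 above (only synchronize events that were actually recorded) can be illustrated with a small stand-in; `Event` here is a hypothetical placeholder class, not `at::cuda::CUDAEvent` or `torch.cuda.Event`:

```python
# Hedged sketch of the "check events before synchronizing" guard from the
# commit message. `Event` is a toy stand-in, not a real CUDA event.

class Event:
    def __init__(self):
        self.recorded = False

    def record(self):
        self.recorded = True

    def elapsed_time_ms(self, end):
        return 1.5  # placeholder; a real event would measure GPU time

def gpu_time_or_none(start, end):
    # Skip the stat when the autograd hook never fired in this iteration
    # (e.g. a non-sync iteration), so the events were never created.
    if start is None or end is None:
        return None
    # Also skip when events exist but were never recorded.
    if not (start.recorded and end.recorded):
        return None
    return start.elapsed_time_ms(end)
```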
-
[DDP] Enhance warning for find_unused_params (#52385)
Summary: Pull Request resolved: #52385 This warning should specify that we did not find unused params in the _forward_ pass, which is when we log this warning. This is to avoid confusion when we get an error because not all outputs were used to compute loss, which also raises an error about unused parameters (to be fixed in the next diff) ghstack-source-id: 122001929 Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D26494136 fbshipit-source-id: d9b41732ea7e5e31b899d590d311080e3dc56682
-
[DDP] unittest for when params aren't used in backward pass (#52384)
Summary: Pull Request resolved: #52384 Adds a simple UT with unittest that we can modify when we enable DDP backward without needing all parameters to get gradient. ghstack-source-id: 122001930 Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D26482479 fbshipit-source-id: c80bdeea7cf9db35390e385084ef28d64ed239eb
-
[DataLoader] Change signature of Functional DataPipe (#52458)
Summary: Pull Request resolved: #52458 Test Plan: Imported from OSS Reviewed By: glaringlee Differential Revision: D26523282 Pulled By: ejguan fbshipit-source-id: c7358fc351f859617754a27b8a701d11ada5d61a
-
Enable min & max for Float16 & BFloat16 (#51244)
Summary: Fixes #50790. Added `min()` & `max()` support for `Float16` & `BFloat16`. CUDA already supported these ops on `Float16`, so the other three combinations had to be enabled. `OpInfo`s for `min` & `max` were also added, and their sample inputs were removed from `method_tests()`. ### MORE INFO The (slightly) long-term goal is to add dispatch for `min()` & `max()` related operations on CPU & CUDA for `Float16` & `BFloat16`, wherever they aren't present already: 1. `amin()` 2. `argmax()` 3. `amax()` 4. `argmin()` 5. `torch._aminmax()` 6. `torch.clamp()` on CPU. Was already supported on CUDA 7. `min()` (in this PR) 8. `max()` (in this PR) 9. `minimum()` 10. `maximum()` I'll submit separate PRs for the other ops. Pull Request resolved: #51244 Reviewed By: jbschlosser Differential Revision: D26503455 Pulled By: anjali411 fbshipit-source-id: c32247f214e9272ca2e4322a23337874e737b140
-
[quant][graphmode][fx] Fix fp16 dynamic quant for functional linear (#…
-
Add NNC support for aten::hardtanh (a hot operation in mobilenet v2/v…
-
Add onnxifi interface for set/get options (#52388)
Summary: Pull Request resolved: #52388 Pull Request resolved: pytorch/glow#5364 This allows us to change global variables through onnxifi calls, and adds Python bindings along with it. Note that we supply a dummy backend_id as it's not needed by Glow, due to the setting being global. #codemod Test Plan: ``` buck test mode/dev //glow/fb/test:test_onnxifi_optionnnpi ``` Reviewed By: jfix71, khabinov Differential Revision: D26481652 fbshipit-source-id: 19b8201c77f653cf7d93ad68760aa7fb5ec45ff4
-
[ROCm] missing template declarations for complex blas (#52472)
Summary: Pull Request resolved: #52472 Reviewed By: jbschlosser Differential Revision: D26533896 Pulled By: anjali411 fbshipit-source-id: 55503028d5e087fc91992b417836cc87eb60ad55
-
Improvements for FX tracer (#52232)
Summary: Pull Request resolved: #52232 Pull Request resolved: pytorch/glow#5327 Reviewed By: gcatron Differential Revision: D26355583 fbshipit-source-id: f062e0b3a9cadf1584738bed85e9964b9a63efaf
-
[glow] Extending AOT config with two more fields (#5359)
Summary: Pull Request resolved: pytorch/glow#5359 Reviewed By: ChunliF Differential Revision: D26468908 fbshipit-source-id: 16c4f4215f302c023d75c204b999f23ed6254aa1
Commits on Feb 18, 2021
-
[ROCm] Enable test_ddp_hooks.py test cases (#52403)
Summary: Re-enabling these test cases for ROCm because they are passing. jeffdaily Pull Request resolved: #52403 Reviewed By: jbschlosser, SciPioneer Differential Revision: D26516727 Pulled By: malfet fbshipit-source-id: 6c70805eda39b0aadfbeb30a527af3906d2da867
-
Add arm64 binary build (#52443)
Summary: This is getting tested by #52441. Adds new config for macos arm64 to our binary builds. Now stores artifacts for mac builds. Pull Request resolved: #52443 Reviewed By: walterddr Differential Revision: D26517330 Pulled By: janeyx99 fbshipit-source-id: 02774937a827bdd4c08486dc9f8fe63446917f1e
-
Revert D26299594: [PyTorch Mobile] 15KiB size reduction by reducing MaxTypeIndex from 256 to 32
Test Plan: revert-hammer Differential Revision: D26299594 (9e54532) Original commit changeset: 9a78c03da621 fbshipit-source-id: 2be1149539892447872eb3289f3fdef0ac92c090
-
Revert nightly docker build cuda version to 11.1.1. (#52234)
Summary: CUDA 11.2 has a performance regression, so revert to CUDA 11.1.1. Pull Request resolved: #52234 Test Plan: [CI](https://github.com/pytorch/pytorch/actions?query=workflow%3A%22Build+PyTorch+nightly+Docker+image+and+push+to+GitHub+Container+Registry%22) Reviewed By: glaringlee Differential Revision: D26519105 Pulled By: xuzhao9 fbshipit-source-id: d1e1ecb7904c196292d83767b71000b465de73ce
-
[iOS GPU] Fix max_pool_2d (#52431)
Summary: Pull Request resolved: #52431 The previous implementation was missing the padding information and was thus incorrect. ghstack-source-id: 121950755 Test Plan: - `buck test pp-macos` - CircleCI Reviewed By: SS-JIA Differential Revision: D26508482 fbshipit-source-id: b28b99c399c4f1390a5cc4f023e470eed0f8c073
-
Make LLVM the default backend for TE (#52314)
Summary: Fixes #52264 When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error will be reported if CPU fusion is enabled without LLVM support, to avoid this performance regression. This PR also updates the tests to not use LLVM, so that the old flow continues to be exercised. This is necessary because tests run in CI do not have LLVM. Pull Request resolved: #52314 Reviewed By: ejguan Differential Revision: D26491294 Pulled By: navahgar fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
-
enable mkldnn conv2d backward to support mkldnn tensor input (#48994)
Summary: Pull Request resolved: #48994 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D25537189 Pulled By: VitalyFedyunin fbshipit-source-id: d81d247798fad3815b735468d66ef9d62c07ef77
-
[PyTorch Mobile] 15KiB size reduction by reducing MaxTypeIndex from 256 to 32 (#51881)
Summary: Pull Request resolved: #51881 `MaxTypeIndex` controls the size of the array ``` detail::TypeMetaData* TypeMeta::typeMetaDatas() { static detail::TypeMetaData instances[MaxTypeIndex + 1] ``` in `typeid.cpp`. In practice, I have seen that this array doesn't hold more than 18 elements once the PyTorch library has been initialized (in mobile unit tests). I couldn't find situations where elements may be added to this array post library initialization. There is a runtime check to prevent array overflow, so reducing the size of the storage shouldn't come at any additional risk from the perspective of loss in visibility of errors. The fact that this array is statically allocated ends up using a bunch of space in the binary (potentially to initialize the trailing elements?). I'm somewhat surprised by this. However, this change registered a 15KiB size win on both fbios as well as igios. Found this when I was looking at a bloaty run that I shared with smessmer on Friday: https://www.internalfb.com/intern/everpaste/?handle=GLXImQisHOfT74EBAKw47V3ktuAzbsIXAAAB I initially thought that the methods being passed in to the constructor of `detail::TypeMetaData` were causing the size increase, but only later realized the issue after reading the following helpful comment: ``` // The remainder of the array is padded with TypeMetaData blanks. // The first of these is the entry for ScalarType::Undefined. // The rest are consumed by CAFFE_KNOWN_TYPE entries. ``` ghstack-source-id: 121875657 Test Plan: Sandcastle runs + the following BSB runs.
### igios ``` D26299594-V1 (https://www.internalfb.com/intern/diff/D26299594/?dest_number=121221891) igios: Succeeded Change in Download Size for arm64 + 3x assets variation: +596 B Change in Uncompressed Size for arm64 + 3x assets variation: -15.8 KiB Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:443632243487886@base/bsb:443632243487886@diff/ ``` ### fbios ``` D26299594-V1 (https://www.internalfb.com/intern/diff/D26299594/?dest_number=121221891) fbios: Succeeded Change in Download Size for arm64 + 3x assets variation: +104 B Change in Uncompressed Size for arm64 + 3x assets variation: -15.7 KiB Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:169233698063125@base/bsb:169233698063125@diff/ ``` Reviewed By: raziel, iseeyuan Differential Revision: D26299594 fbshipit-source-id: 9a78c03da621fbc25a1d8087376628bccc8dbfda
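The size trade-off above hinges on a statically sized registry guarded by a runtime overflow check; a minimal Python analogue (an assumed shape for illustration, not the actual `typeid.cpp` code) looks like this:

```python
# Hedged sketch of a fixed-capacity, pre-allocated type registry with a
# runtime overflow guard, mirroring the pattern described in the commit.
# Names and structure are illustrative, not PyTorch internals.

MAX_TYPE_INDEX = 32  # the commit reduces this from 256

class TypeRegistry:
    def __init__(self, capacity=MAX_TYPE_INDEX + 1):
        # Pre-allocated slots padded with blanks, analogous to
        # `static detail::TypeMetaData instances[MaxTypeIndex + 1]`.
        self._slots = [None] * capacity
        self._next = 0

    def register(self, name):
        # The runtime check that makes shrinking the capacity safe:
        # overflow fails loudly instead of writing out of bounds.
        if self._next >= len(self._slots):
            raise RuntimeError("type registry overflow")
        idx = self._next
        self._slots[idx] = name
        self._next += 1
        return idx
```

Because the capacity is a compile-time constant in the real code, the unused trailing slots are paid for in binary size, which is why shrinking the constant saved ~15KiB.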
-
[BE] _get_torch_cuda_version should return tuple (#52409)
Summary: Pull Request resolved: #52409 Reviewed By: jbschlosser, glaringlee Differential Revision: D26513924 Pulled By: walterddr fbshipit-source-id: ee18ef357c326c5ad344d80c59821cc2b8814734
-
Enhance Tensor.unflatten to support -1 as the inferred size (#51955)
Summary: Fixes #51719, #28142 **Change** - Update `torch.Tensor.unflatten` to support passing `-1` as the inferred size, for both tensors and named tensors. - Examples of using `-1` in the `unflatten` function are added to the docs. - Fix a rendering issue in the original `unflatten` docs by removing a blank line in its example section. Pull Request resolved: #51955 Reviewed By: agolynski Differential Revision: D26467198 Pulled By: zou3519 fbshipit-source-id: 6a3ede25561223187273796427ad0cb63f125364
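The `-1` inference rule can be sketched in pure Python (an illustration of the arithmetic only, not PyTorch's implementation): the product of the known sizes must evenly divide the flattened dimension, and the single `-1` becomes the quotient.

```python
# Hedged sketch of the size-inference rule behind `unflatten(dim, sizes)`
# when one entry of `sizes` is -1. Helper name is illustrative.

def infer_unflatten_sizes(dim_size, sizes):
    if sizes.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = 1
    for s in sizes:
        if s != -1:
            known *= s
    if -1 in sizes:
        if dim_size % known != 0:
            raise ValueError(f"{dim_size} is not divisible by {known}")
        # Replace the single -1 with the inferred quotient.
        return [dim_size // known if s == -1 else s for s in sizes]
    if known != dim_size:
        raise ValueError("sizes do not multiply to the dimension size")
    return list(sizes)
```

So unflattening a dimension of size 6 with sizes `[2, -1]` would infer `[2, 3]`.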
-
ns for fx: make unshadowed activation comparison work for N models (#52357)
Summary: Pull Request resolved: #52357 Refactor the NS for FX compare unshadowed activations API to be able to work on N models and do arbitrary matching strategies. We factor out a util which takes a model and a list of nodes to extract weights for, with names to give the extracted weights. The user can then call this util with a set of nodes and names created in any way they want. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs ``` Imported from OSS Reviewed By: raghuramank100 Differential Revision: D26487270 fbshipit-source-id: 1372ef07b5f3ddc7cebdfb2dee0221a2facd0527
-
ns for fx: make weights comparison work on N models (#52356)
Summary: Pull Request resolved: #52356 Refactor the NS for FX compare weights API to be able to work on N models and do arbitrary matching strategies. We factor out a util which takes a model and a list of nodes to extract weights for, with names to give the extracted weights. The user can then call this util with a set of nodes and names created in any way they want. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs ``` Imported from OSS Reviewed By: raghuramank100 Differential Revision: D26487271 fbshipit-source-id: 0c2172a1b33d47565004a307aff14d205671add7
-
[wip] ns for fx: add support for subgraph matching (#52130)
Summary: Pull Request resolved: #52130 We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Imported from OSS Reviewed By: raghuramank100 Differential Revision: D26403092 fbshipit-source-id: e49aaad4b02b8d60589435848bee422b8f41937a
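The (start_node, end_node) subgraph idea above can be illustrated on a toy list-of-ops "graph" (not the FX graph machinery): multi-node fusion patterns such as `("linear", "relu")` are tried before single nodes, and each match is recorded as a (start, end) index pair.

```python
# Hedged sketch of greedy subgraph matching over a flat op sequence.
# PATTERNS and the list representation are illustrative assumptions.

PATTERNS = [("linear", "relu"), ("linear",), ("conv2d",)]  # longest first

def match_subgraphs(ops):
    matches, i = [], 0
    while i < len(ops):
        for pat in PATTERNS:
            if tuple(ops[i:i + len(pat)]) == pat:
                # Record the subgraph as (start_node, end_node) indices.
                matches.append((i, i + len(pat) - 1))
                i += len(pat)
                break
        else:
            i += 1  # no pattern starts here; move on
    return matches
```

The key design point mirrored here is ordering: trying `("linear", "relu")` before `("linear",)` is what lets a fused `toq.linear_relu` line up against the two-node float subgraph.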
-
NS for FX: add test for a simple sparsenn model (#52092)
Summary: Pull Request resolved: #52092 Adds a very simple toy sparsenn model, and enables its inspection with the new NS APIs. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_compare_activations python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_shadow ``` Imported from OSS Reviewed By: raghuramank100 Differential Revision: D26403095 fbshipit-source-id: 3c3650aca47186deb32f2b3f1d87a0716d1ad9d1