master
Commits on Feb 19, 2021
-
Update FindvecLib.cmake for macOS 10.14, 10.15 and Big Sur (#51288)
Summary: When compiling libtorch on macOS there is the option to use the `vecLib` BLAS library from Apple's [Accelerate](https://developer.apple.com/documentation/accelerate) framework. Recent versions of macOS have changed the location of `vecLib.h`; this change adds the new locations to `FindvecLib.cmake`. To test, run the following command: ``` BLAS=vecLib python setup.py install --cmake --cmake-only ``` The choice of BLAS library is confirmed in the output: ``` -- Trying to find preferred BLAS backend of choice: vecLib -- Found vecLib: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/Versions/Current/Headers ``` Pull Request resolved: #51288 Reviewed By: jbschlosser Differential Revision: D26531136 Pulled By: malfet fbshipit-source-id: ce86807ccbf66973f33b3acb99b7f40cfd182b9b
-
[BE] Cleanup UnaryOpsKernel.cpp (#52444)
Summary: Delete the unused `dispatchtypes` argument of `IMPLEMENT_FLOAT_KERNEL` and `IMPLEMENT_COMPLEX_KERNEL`. Move the common part of the above-mentioned macros into the `IMPLEMENT_ITERATOR_LAMBDA` macro. Pull Request resolved: #52444 Reviewed By: walterddr Differential Revision: D26517032 Pulled By: malfet fbshipit-source-id: f03f89602f14fb513c66f3f2a96596e4c1e4cd16
-
Revert D26515596: [pytorch][PR] Add support for pow
Test Plan: revert-hammer Differential Revision: D26515596 (83feaeb) Original commit changeset: 0c25a8eba8ed fbshipit-source-id: 1a206f0b2923d922911fdaa5448a4e3a844ac5c4
-
Fixed _out variants of linear algebra functions (#51560)
Summary: This PR modifies the behavior of `_out` variants to match the description here: https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch With this PR, result and input tensors must be on the same device and have the same "type kind". I skipped `qr` and `eig` in this process as they require a bit more work. Functions that can use the provided storage directly do so. If `result` is not empty but is not in the batched column-major format, or does not have the same type as the input, then we have to allocate a temporary tensor and copy it. TODO: - [x] Add more tests for same device and valid safe dtype - [x] Move inv and solve changes to separate PRs #51968, #51977 Ref. #42666 Pull Request resolved: #51560 Reviewed By: albanD Differential Revision: D26400734 Pulled By: heitorschueroff fbshipit-source-id: a6201ed7e919c1670c6ff3ef60217d1dbfb72e67
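The `out=` rule described above (same device, same "type kind", and a temporary-plus-copy when the provided storage is unusable) can be sketched in plain Python. `apply_out` and its dict-based "tensors" are illustrative stand-ins, not the ATen implementation:

```python
# Hedged sketch of the `_out` contract from the PR description, using plain
# Python dicts as stand-in "tensors". Not the actual ATen code.

def apply_out(op, inp, out):
    # Same-device / compatible-dtype checks described in the summary.
    if out["device"] != inp["device"]:
        raise RuntimeError("result and input must be on the same device")
    if out["dtype"] != inp["dtype"]:
        raise RuntimeError("result dtype must be compatible with input dtype")
    computed = [op(x) for x in inp["data"]]
    if out["data"] and len(out["data"]) != len(computed):
        # Provided storage is unusable: compute into a temporary,
        # then replace it (analogous to allocate-and-copy).
        out["data"] = list(computed)
    else:
        # Storage is usable (or empty): write into it directly.
        out["data"][:] = computed
    return out
```

The point of the layout branch is that callers who pass a well-shaped `out` buffer avoid the extra allocation, while everyone else still gets a correct result.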
-
[RPC] delete torch/csrc/utils/future.h (#51698)
Summary: Pull Request resolved: #51698 Completely eliminates torch::utils::Future as we are now fully relying on JitFuture. ghstack-source-id: 122037612 Test Plan: CI Reviewed By: kiukchung Differential Revision: D26243735 fbshipit-source-id: 95010a730f9d35e618f74c5f9de482738cd57c15
-
[RPC] Refactor rref_context to not use utils::Future (#51697)
Summary: Pull Request resolved: #51697 Refactors the rest of rref_context, specifically the pendingOwners map and `getOwnerRRef`, to use JitFuture. ghstack-source-id: 122037611 Test Plan: CI Reviewed By: wanchaol Differential Revision: D26243268 fbshipit-source-id: ab8874c8253274e8fe50dcd7291e0655a8f3f1df
-
log newly added construction and runtime stats at randomly selected iterations (#51394)
Summary: Pull Request resolved: #51394 Log newly added construction and runtime stats at randomly selected iterations. ghstack-source-id: 121934040 Test Plan: unit tests Reviewed By: SciPioneer Differential Revision: D26161885 fbshipit-source-id: add6e02c1a03e6f74f08b9a9aecf90fa81631d60
-
add stats that can only be collected at runtime (#51386)
Summary: Pull Request resolved: #51386 Add stats such as rebuilt bucket stats, unused parameter stats, and performance stats to the DDP logging data. 1. GPU time stats are not collected for single-process multiple-devices mode in this diff, as that requires events to be created and recorded on multiple devices. 2. Use the at::cuda::event API for safer calls. 3. Events may not be created in the autograd hook if the hook is not triggered by the user's code, e.g., when users run in non-sync mode in some iterations. So we check whether events were created before synchronizing, and skip invalid results. 4. Users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls. ghstack-source-id: 121933566 Test Plan: unit tests Reviewed By: SciPioneer Differential Revision: D26158645 fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
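Point 3 above (only synchronize events that were actually recorded) can be illustrated with a small stand-in; `Event` here is a hypothetical placeholder class, not `at::cuda::CUDAEvent` or `torch.cuda.Event`:

```python
# Hedged sketch of the "check events before synchronizing" guard from the
# commit message. `Event` is a toy stand-in, not a real CUDA event.

class Event:
    def __init__(self):
        self.recorded = False

    def record(self):
        self.recorded = True

    def elapsed_time_ms(self, end):
        return 1.5  # placeholder; a real event would measure GPU time

def gpu_time_or_none(start, end):
    # Skip the stat when the autograd hook never fired in this iteration
    # (e.g. a non-sync iteration), so the events were never created.
    if start is None or end is None:
        return None
    # Also skip when events exist but were never recorded.
    if not (start.recorded and end.recorded):
        return None
    return start.elapsed_time_ms(end)
```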
-
[DDP] Enhance warning for find_unused_params (#52385)
Summary: Pull Request resolved: #52385 This warning should specify that we did not find unused params in the _forward_ pass, which is when we log this warning. This is to avoid confusion when we get an error because not all outputs were used to compute loss, which also raises an error about unused parameters (to be fixed in the next diff) ghstack-source-id: 122001929 Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D26494136 fbshipit-source-id: d9b41732ea7e5e31b899d590d311080e3dc56682
-
[DDP] unittest for when params aren't used in backward pass (#52384)
Summary: Pull Request resolved: #52384 Adds a simple UT with unittest that we can modify when we enable DDP backward without needing all parameters to get gradient. ghstack-source-id: 122001930 Test Plan: CI Reviewed By: zhaojuanmao Differential Revision: D26482479 fbshipit-source-id: c80bdeea7cf9db35390e385084ef28d64ed239eb
-
[DataLoader] Change signature of Functional DataPipe (#52458)
Summary: Pull Request resolved: #52458 Test Plan: Imported from OSS Reviewed By: glaringlee Differential Revision: D26523282 Pulled By: ejguan fbshipit-source-id: c7358fc351f859617754a27b8a701d11ada5d61a
-
Enable min & max for Float16 & BFloat16 (#51244)
Summary: Fixes #50790. Added `min()` & `max()` support for `Float16` & `BFloat16`. CUDA already supported these ops on `Float16`, so the other three combinations had to be enabled. `OpInfo`s for `min` & `max` were also added, and their sample inputs were removed from `method_tests()`. ### MORE INFO The (slightly) long-term goal is to add dispatch for `min()` & `max()` related operations on CPU & CUDA for `Float16` & `BFloat16`, wherever they aren't present already: 1. `amin()` 2. `argmax()` 3. `amax()` 4. `argmin()` 5. `torch._aminmax()` 6. `torch.clamp()` on CPU. Was already supported on CUDA 7. `min()` (in this PR) 8. `max()` (in this PR) 9. `minimum()` 10. `maximum()` I'll submit separate PRs for the other ops. Pull Request resolved: #51244 Reviewed By: jbschlosser Differential Revision: D26503455 Pulled By: anjali411 fbshipit-source-id: c32247f214e9272ca2e4322a23337874e737b140
-
[quant][graphmode][fx] Fix fp16 dynamic quant for functional linear (#…
-
Add NNC support for aten::hardtanh (a hot operation in mobilenet v2/v…
-
Add onnxifi interface for set/get options (#52388)
Summary: Pull Request resolved: #52388 Pull Request resolved: pytorch/glow#5364 This allows us to change global variables through onnxifi calls, and adds Python bindings along with it. Note that we supply a dummy backend_id as it's not needed by Glow, due to the setting being global. #codemod Test Plan: ``` buck test mode/dev //glow/fb/test:test_onnxifi_optionnnpi ``` Reviewed By: jfix71, khabinov Differential Revision: D26481652 fbshipit-source-id: 19b8201c77f653cf7d93ad68760aa7fb5ec45ff4
-
[ROCm] missing template declarations for complex blas (#52472)
Summary: Pull Request resolved: #52472 Reviewed By: jbschlosser Differential Revision: D26533896 Pulled By: anjali411 fbshipit-source-id: 55503028d5e087fc91992b417836cc87eb60ad55
-
Improvements for FX tracer (#52232)
Summary: Pull Request resolved: #52232 Pull Request resolved: pytorch/glow#5327 Reviewed By: gcatron Differential Revision: D26355583 fbshipit-source-id: f062e0b3a9cadf1584738bed85e9964b9a63efaf
-
[glow] Extending AOT config with two more fields (#5359)
Summary: Pull Request resolved: pytorch/glow#5359 Reviewed By: ChunliF Differential Revision: D26468908 fbshipit-source-id: 16c4f4215f302c023d75c204b999f23ed6254aa1
Commits on Feb 18, 2021
-
[ROCm] Enable test_ddp_hooks.py test cases (#52403)
Summary: Re-enabling these test cases for ROCm because they are passing. jeffdaily Pull Request resolved: #52403 Reviewed By: jbschlosser, SciPioneer Differential Revision: D26516727 Pulled By: malfet fbshipit-source-id: 6c70805eda39b0aadfbeb30a527af3906d2da867
-
Add arm64 binary build (#52443)
Summary: This is getting tested by #52441. Adds new config for macos arm64 to our binary builds. Now stores artifacts for mac builds. Pull Request resolved: #52443 Reviewed By: walterddr Differential Revision: D26517330 Pulled By: janeyx99 fbshipit-source-id: 02774937a827bdd4c08486dc9f8fe63446917f1e
-
Revert D26299594: [PyTorch Mobile] 15KiB size reduction by reducing MaxTypeIndex from 256 to 32
Test Plan: revert-hammer Differential Revision: D26299594 (9e54532) Original commit changeset: 9a78c03da621 fbshipit-source-id: 2be1149539892447872eb3289f3fdef0ac92c090
-
Revert nightly docker build cuda version to 11.1.1. (#52234)
Summary: CUDA 11.2 has a performance regression, so revert to CUDA 11.1.1. Pull Request resolved: #52234 Test Plan: [CI](https://github.com/pytorch/pytorch/actions?query=workflow%3A%22Build+PyTorch+nightly+Docker+image+and+push+to+GitHub+Container+Registry%22) Reviewed By: glaringlee Differential Revision: D26519105 Pulled By: xuzhao9 fbshipit-source-id: d1e1ecb7904c196292d83767b71000b465de73ce
-
[iOS GPU] Fix max_pool_2d (#52431)
Summary: Pull Request resolved: #52431 The previous implementation was missing the padding information and was thus incorrect. ghstack-source-id: 121950755 Test Plan: - `buck test pp-macos` - CircleCI Reviewed By: SS-JIA Differential Revision: D26508482 fbshipit-source-id: b28b99c399c4f1390a5cc4f023e470eed0f8c073
-
Make LLVM the default backend for TE (#52314)
Summary: Fixes #52264 When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error will be reported if CPU fusion is enabled without LLVM support, to avoid this performance regression. This PR also updates the tests to not use LLVM, so that the old flow continues to be exercised. This is necessary because tests run in CI do not have LLVM. Pull Request resolved: #52314 Reviewed By: ejguan Differential Revision: D26491294 Pulled By: navahgar fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
-
enable mkldnn conv2d backward to support mkldnn tensor input (#48994)
Summary: Pull Request resolved: #48994 Test Plan: Imported from OSS Reviewed By: ejguan Differential Revision: D25537189 Pulled By: VitalyFedyunin fbshipit-source-id: d81d247798fad3815b735468d66ef9d62c07ef77
-
[PyTorch Mobile] 15KiB size reduction by reducing MaxTypeIndex from 256 to 32 (#51881)
Summary: Pull Request resolved: #51881 `MaxTypeIndex` controls the size of the array ``` detail::TypeMetaData* TypeMeta::typeMetaDatas() { static detail::TypeMetaData instances[MaxTypeIndex + 1] ``` in `typeid.cpp`. In practice, I have seen that this array doesn't hold more than 18 elements once the PyTorch library has been initialized (in mobile unit tests). I couldn't find situations where elements may be added to this array post library initialization. There is a runtime check to prevent array overflow, so reducing the size of the storage shouldn't come at any additional risk from the perspective of loss in visibility of errors. The fact that this array is statically allocated ends up using a bunch of space in the binary (potentially to initialize the trailing elements?). I'm somewhat surprised by this. However, this change registered a 15KiB size win on both fbios as well as igios. Found this when I was looking at a bloaty run that I shared with smessmer on Friday: https://www.internalfb.com/intern/everpaste/?handle=GLXImQisHOfT74EBAKw47V3ktuAzbsIXAAAB I initially thought that the methods being passed in to the constructor of `detail::TypeMetaData` were causing the size increase, but only later realized the issue after reading the following helpful comment: ``` // The remainder of the array is padded with TypeMetaData blanks. // The first of these is the entry for ScalarType::Undefined. // The rest are consumed by CAFFE_KNOWN_TYPE entries. ``` ghstack-source-id: 121875657 Test Plan: Sandcastle runs + the following BSB runs.
### igios ``` D26299594-V1 (https://www.internalfb.com/intern/diff/D26299594/?dest_number=121221891) igios: Succeeded Change in Download Size for arm64 + 3x assets variation: +596 B Change in Uncompressed Size for arm64 + 3x assets variation: -15.8 KiB Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:443632243487886@base/bsb:443632243487886@diff/ ``` ### fbios ``` D26299594-V1 (https://www.internalfb.com/intern/diff/D26299594/?dest_number=121221891) fbios: Succeeded Change in Download Size for arm64 + 3x assets variation: +104 B Change in Uncompressed Size for arm64 + 3x assets variation: -15.7 KiB Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:169233698063125@base/bsb:169233698063125@diff/ ``` Reviewed By: raziel, iseeyuan Differential Revision: D26299594 fbshipit-source-id: 9a78c03da621fbc25a1d8087376628bccc8dbfda
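The size trade-off above hinges on a statically sized registry guarded by a runtime overflow check; a minimal Python analogue (an assumed shape for illustration, not the actual `typeid.cpp` code) looks like this:

```python
# Hedged sketch of a fixed-capacity, pre-allocated type registry with a
# runtime overflow guard, mirroring the pattern described in the commit.
# Names and structure are illustrative, not PyTorch internals.

MAX_TYPE_INDEX = 32  # the commit reduces this from 256

class TypeRegistry:
    def __init__(self, capacity=MAX_TYPE_INDEX + 1):
        # Pre-allocated slots padded with blanks, analogous to
        # `static detail::TypeMetaData instances[MaxTypeIndex + 1]`.
        self._slots = [None] * capacity
        self._next = 0

    def register(self, name):
        # The runtime check that makes shrinking the capacity safe:
        # overflow fails loudly instead of writing out of bounds.
        if self._next >= len(self._slots):
            raise RuntimeError("type registry overflow")
        idx = self._next
        self._slots[idx] = name
        self._next += 1
        return idx
```

Because the capacity is a compile-time constant in the real code, the unused trailing slots are paid for in binary size, which is why shrinking the constant saved ~15KiB.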
-
[BE] _get_torch_cuda_version should return tuple (#52409)
Summary: Pull Request resolved: #52409 Reviewed By: jbschlosser, glaringlee Differential Revision: D26513924 Pulled By: walterddr fbshipit-source-id: ee18ef357c326c5ad344d80c59821cc2b8814734
-
Enhance Tensor.unflatten to support -1 as the inferred size (#51955)
Summary: Fixes #51719, #28142 **Change** - Update `torch.Tensor.unflatten` to support passing `-1` as the inferred size, for both tensors and named tensors. - Examples of using `-1` in the `unflatten` function are added to the docs. - Fix a rendering issue in the original `unflatten` docs by removing a blank line in its example section. Pull Request resolved: #51955 Reviewed By: agolynski Differential Revision: D26467198 Pulled By: zou3519 fbshipit-source-id: 6a3ede25561223187273796427ad0cb63f125364
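The `-1` inference rule can be sketched in pure Python (an illustration of the arithmetic only, not PyTorch's implementation): the product of the known sizes must evenly divide the flattened dimension, and the single `-1` becomes the quotient.

```python
# Hedged sketch of the size-inference rule behind `unflatten(dim, sizes)`
# when one entry of `sizes` is -1. Helper name is illustrative.

def infer_unflatten_sizes(dim_size, sizes):
    if sizes.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = 1
    for s in sizes:
        if s != -1:
            known *= s
    if -1 in sizes:
        if dim_size % known != 0:
            raise ValueError(f"{dim_size} is not divisible by {known}")
        # Replace the single -1 with the inferred quotient.
        return [dim_size // known if s == -1 else s for s in sizes]
    if known != dim_size:
        raise ValueError("sizes do not multiply to the dimension size")
    return list(sizes)
```

So unflattening a dimension of size 6 with sizes `[2, -1]` would infer `[2, 3]`.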
-
ns for fx: make unshadowed activation comparison work for N models (#52357)
Summary: Pull Request resolved: #52357 Refactor the NS for FX compare unshadowed activations API to be able to work on N models and do arbitrary matching strategies. We factor out a util which takes a model and a list of nodes to extract weights for, with names to give the extracted weights. The user can then call this util with a set of nodes and names created in any way they want. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs ``` Imported from OSS Reviewed By: raghuramank100 Differential Revision: D26487270 fbshipit-source-id: 1372ef07b5f3ddc7cebdfb2dee0221a2facd0527
-
ns for fx: make weights comparison work on N models (#52356)
Summary: Pull Request resolved: #52356 Refactor the NS for FX compare weights API to be able to work on N models and do arbitrary matching strategies. We factor out a util which takes a model and a list of nodes to extract weights for, with names to give the extracted weights. The user can then call this util with a set of nodes and names created in any way they want. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs ``` Imported from OSS Reviewed By: raghuramank100 Differential Revision: D26487271 fbshipit-source-id: 0c2172a1b33d47565004a307aff14d205671add7
-
[wip] ns for fx: add support for subgraph matching (#52130)
Summary: Pull Request resolved: #52130 We have patterns like (F.linear, F.relu) which need to match to (toq.linear_relu). So, we need to match subgraphs. This PR does the following: * defines a "subgraph" as (start_node, end_node). The current assumption is that subgraphs are simple, there is always a path from start_node to end_node, and we can ignore any non-input args/kwargs of these nodes for the purposes of matching and copying things. An example one node subgraph is (F.linear, F.linear). An example two node subgraph is (F.linear, F.relu). * changes the matching logic to iterate over subgraphs instead of nodes * changes the NS core APIs to use subgraph pairs instead of node pairs: 1. for weights, we match on the start node 2. for unshadowed activations, we observe the end nodes 3. for shadowed activations, we copy the subgraph of a to graph c TODO(before review) write up better, not ready for review yet Test Plan: TODO before land: better test plan Imported from OSS Reviewed By: raghuramank100 Differential Revision: D26403092 fbshipit-source-id: e49aaad4b02b8d60589435848bee422b8f41937a
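The (start_node, end_node) subgraph idea above can be illustrated on a toy list-of-ops "graph" (not the FX graph machinery): multi-node fusion patterns such as `("linear", "relu")` are tried before single nodes, and each match is recorded as a (start, end) index pair.

```python
# Hedged sketch of greedy subgraph matching over a flat op sequence.
# PATTERNS and the list representation are illustrative assumptions.

PATTERNS = [("linear", "relu"), ("linear",), ("conv2d",)]  # longest first

def match_subgraphs(ops):
    matches, i = [], 0
    while i < len(ops):
        for pat in PATTERNS:
            if tuple(ops[i:i + len(pat)]) == pat:
                # Record the subgraph as (start_node, end_node) indices.
                matches.append((i, i + len(pat) - 1))
                i += len(pat)
                break
        else:
            i += 1  # no pattern starts here; move on
    return matches
```

The key design point mirrored here is ordering: trying `("linear", "relu")` before `("linear",)` is what lets a fused `toq.linear_relu` line up against the two-node float subgraph.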
-
NS for FX: add test for a simple sparsenn model (#52092)
Summary: Pull Request resolved: #52092 Adds a very simple toy sparsenn model, and enables its inspection with the new NS APIs. Test Plan: ``` python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_compare_activations python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_shadow ``` Imported from OSS Reviewed By: raghuramank100 Differential Revision: D26403095 fbshipit-source-id: 3c3650aca47186deb32f2b3f1d87a0716d1ad9d1