master

Commits on Feb 19, 2021

  1. Update FindvecLib.cmake for macOS 10.14, 10.15 and Big Sur (#51288)

    Summary:
    When compiling libtorch on macOS there is the option to use the `vecLib` BLAS library from Apple's [Accelerate](https://developer.apple.com/documentation/accelerate) framework. Recent versions of macOS have changed the location of `vecLib.h`, so this change adds the new locations to `FindvecLib.cmake`.
    
    To test run the following command:
    ```
    BLAS=vecLib python setup.py install --cmake --cmake-only
    ```
    
    The choice of BLAS library is confirmed in the output:
    ```
    -- Trying to find preferred BLAS backend of choice: vecLib
    -- Found vecLib: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/Accelerate.framework/Versions/Current/Frameworks/vecLib.framework/Versions/Current/Headers
    ```
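
    As a rough illustration, the probe that `FindvecLib.cmake` performs can be sketched in Python (paths illustrative only; the real logic lives in CMake):

    ```python
    # Hedged sketch (in Python, not CMake) of what the updated FindvecLib.cmake
    # does: probe candidate header locations, including the newer SDK paths used
    # on macOS 10.14/10.15/Big Sur, and report the first that exists.
    import os

    CANDIDATE_HEADER_DIRS = [
        # Classic location inside the Accelerate framework:
        "/System/Library/Frameworks/Accelerate.framework/Versions/Current"
        "/Frameworks/vecLib.framework/Versions/Current/Headers",
        # Newer Command Line Tools SDK location (as in the output above):
        "/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/System/Library"
        "/Frameworks/Accelerate.framework/Versions/Current"
        "/Frameworks/vecLib.framework/Versions/Current/Headers",
    ]

    def find_veclib(candidates=CANDIDATE_HEADER_DIRS):
        """Return the first existing vecLib header directory, or None."""
        for path in candidates:
            if os.path.isdir(path):
                return path
        return None
    ```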
    
    Pull Request resolved: #51288
    
    Reviewed By: jbschlosser
    
    Differential Revision: D26531136
    
    Pulled By: malfet
    
    fbshipit-source-id: ce86807ccbf66973f33b3acb99b7f40cfd182b9b
    davidkyle authored and facebook-github-bot committed Feb 19, 2021
  2. [BE] Cleanup UnaryOpsKernel.cpp (#52444)

    Summary:
    Delete the unused `dispatchtypes` argument of `IMPLEMENT_FLOAT_KERNEL` and `IMPLEMENT_COMPLEX_KERNEL`,
    and move the common part of the above-mentioned macros into an `IMPLEMENT_ITERATOR_LAMBDA` macro.
    
    Pull Request resolved: #52444
    
    Reviewed By: walterddr
    
    Differential Revision: D26517032
    
    Pulled By: malfet
    
    fbshipit-source-id: f03f89602f14fb513c66f3f2a96596e4c1e4cd16
    malfet authored and facebook-github-bot committed Feb 19, 2021
  3. Revert D26515596: [pytorch][PR] Add support for pow

    Test Plan: revert-hammer
    
    Differential Revision:
    D26515596 (83feaeb)
    
    Original commit changeset: 0c25a8eba8ed
    
    fbshipit-source-id: 1a206f0b2923d922911fdaa5448a4e3a844ac5c4
    zou3519 authored and facebook-github-bot committed Feb 19, 2021
  4. Fixed _out variants of linear algebra functions (#51560)

    Summary:
    This PR modifies the behavior of `_out` variants to match the description here: https://github.com/pytorch/pytorch/wiki/Developer-FAQ#how-does-out-work-in-pytorch
    With this PR, result and input tensors must be on the same device and have the same "type kind".
    
    I skipped `qr` and `eig` in this process as they require a bit more work.
    
    Functions that can use the provided storage directly do so. If `result` is not empty and is not in the batched column-major format, or does not have the same type as the input, then we have to allocate a temporary tensor and copy it.
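
    The contract above can be sketched in plain Python (hypothetical names; the real checks live in ATen's output-resizing and dtype-promotion machinery):

    ```python
    # Hedged sketch of the out= contract: result and input must be on the same
    # device, and must share a "type kind". The kind table here is illustrative,
    # not ATen's actual promotion rules.
    TYPE_KIND = {
        "float16": "float", "float32": "float", "float64": "float",
        "int32": "int", "int64": "int",
        "complex64": "complex", "complex128": "complex",
    }

    def check_out_tensor(result_device, result_dtype, input_device, input_dtype):
        if result_device != input_device:
            raise RuntimeError(
                f"Expected result and input on the same device, got "
                f"{result_device} and {input_device}")
        if TYPE_KIND[result_dtype] != TYPE_KIND[input_dtype]:
            raise RuntimeError(
                f"Expected result dtype kind {TYPE_KIND[input_dtype]}, got "
                f"{TYPE_KIND[result_dtype]}")
    ```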
    
    TODO:
    
    - [x] Add more tests for same device and valid safe dtype
    - [x] Move inv and solve changes to separate PRs #51968, #51977
    
    Ref. #42666
    
    Pull Request resolved: #51560
    
    Reviewed By: albanD
    
    Differential Revision: D26400734
    
    Pulled By: heitorschueroff
    
    fbshipit-source-id: a6201ed7e919c1670c6ff3ef60217d1dbfb72e67
    IvanYashchuk authored and facebook-github-bot committed Feb 19, 2021
  5. [RPC] delete torch/csrc/utils/future.h (#51698)

    Summary:
    Pull Request resolved: #51698
    
    Completely eliminates torch::utils::Future, as we are now fully relying on JitFuture.
    ghstack-source-id: 122037612
    
    Test Plan: CI
    
    Reviewed By: kiukchung
    
    Differential Revision: D26243735
    
    fbshipit-source-id: 95010a730f9d35e618f74c5f9de482738cd57c15
    rohan-varma authored and facebook-github-bot committed Feb 19, 2021
  6. [RPC] Refactor rref_context to not use utils::Future (#51697)

    Summary:
    Pull Request resolved: #51697
    
    Refactors the rest of rref_context, specifically the pendingOwners map and `getOwnerRRef`, to use JitFuture.
    ghstack-source-id: 122037611
    
    Test Plan: CI
    
    Reviewed By: wanchaol
    
    Differential Revision: D26243268
    
    fbshipit-source-id: ab8874c8253274e8fe50dcd7291e0655a8f3f1df
    rohan-varma authored and facebook-github-bot committed Feb 19, 2021
  7. log newly added construction and runtime stats at randomly selected iterations (#51394)
    
    Summary:
    Pull Request resolved: #51394
    
    log newly added construction and runtime stats at randomly selected iterations
    ghstack-source-id: 121934040
    
    Test Plan: unit tests
    
    Reviewed By: SciPioneer
    
    Differential Revision: D26161885
    
    fbshipit-source-id: add6e02c1a03e6f74f08b9a9aecf90fa81631d60
    zhaojuanmao authored and facebook-github-bot committed Feb 19, 2021
  8. add stats that can only be collected at runtime (#51386)

    Summary:
    Pull Request resolved: #51386
    
    add stats such as rebuilt bucket stats, unused parameter stats and performance stats to ddp logging data
    
    1. GPU time stats are not collected for single-process multiple-device mode in this diff, as that requires events to be created and recorded on multiple devices
    2. use the at::cuda::event API for safer calls
    3. events may not be created in the autograd hook if the hook is not triggered by the user's code, e.g., when the user runs in no-sync mode for some iterations. So we check whether events were created before synchronizing, and skip invalid results.
    4. users may not set the device upfront, so we explicitly set the proper device before creating events in our prepare_forward() and prepare_backward() calls
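
    Points 3 and 4 above can be sketched in plain Python (hypothetical names, not the actual DDP reducer code):

    ```python
    # Hedged sketch: only compute timing from events that were actually recorded
    # (the autograd hook may never fire, e.g. in no-sync iterations), and skip
    # the measurement instead of synchronizing on an invalid event.
    class Event:
        def __init__(self):
            self.recorded = False
            self.timestamp = None

        def record(self, timestamp):
            self.recorded = True
            self.timestamp = timestamp

    def elapsed_or_none(start, end):
        """Return end - start, or None if either event was never recorded."""
        if not (start.recorded and end.recorded):
            return None  # skip the invalid result rather than synchronize
        return end.timestamp - start.timestamp
    ```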
    
    ghstack-source-id: 121933566
    
    Test Plan: unit tests
    
    Reviewed By: SciPioneer
    
    Differential Revision: D26158645
    
    fbshipit-source-id: ce5f15187802eba76accb980449be68902c10178
    zhaojuanmao authored and facebook-github-bot committed Feb 19, 2021
  9. [DDP] Enhance warning for find_unused_params (#52385)

    Summary:
    Pull Request resolved: #52385
    
    This warning should specify that we did not find unused params in the
    _forward_ pass, which is when we log this warning. This avoids confusion
    when we get an error because not all outputs were used to compute the loss,
    which also raises an error about unused parameters (to be fixed in the next diff)
    ghstack-source-id: 122001929
    
    Test Plan: CI
    
    Reviewed By: zhaojuanmao
    
    Differential Revision: D26494136
    
    fbshipit-source-id: d9b41732ea7e5e31b899d590d311080e3dc56682
    rohan-varma authored and facebook-github-bot committed Feb 19, 2021
  10. [DDP] unittest for when params arent used in backward pass (#52384)

    Summary:
    Pull Request resolved: #52384
    
    Adds a simple unittest-based UT that we can modify when we enable DDP backward without needing all parameters to receive gradients.
    ghstack-source-id: 122001930
    
    Test Plan: CI
    
    Reviewed By: zhaojuanmao
    
    Differential Revision: D26482479
    
    fbshipit-source-id: c80bdeea7cf9db35390e385084ef28d64ed239eb
    rohan-varma authored and facebook-github-bot committed Feb 19, 2021
  11. [DataLoader] Change signature of Functional DataPipe (#52458)

    Summary: Pull Request resolved: #52458
    
    Test Plan: Imported from OSS
    
    Reviewed By: glaringlee
    
    Differential Revision: D26523282
    
    Pulled By: ejguan
    
    fbshipit-source-id: c7358fc351f859617754a27b8a701d11ada5d61a
    ejguan authored and facebook-github-bot committed Feb 19, 2021
  12. Enable min & max for Float16 & BFloat16 (#51244)

    Summary:
    Fixes #50790.
    
    Added `min()` & `max()` support for `Float16` & `BFloat16`.
    CUDA already supported these ops on `Float16`, so the other three combinations had to be enabled.
    `OpInfo`s for `min` & `max` were also added, and their sample inputs were removed from `method_tests()`.
    
    ### MORE INFO
    The (slightly) long-term goal is to add dispatch for `min()` & `max()` related operations on CPU & CUDA for `Float16` & `BFloat16`,
    wherever they aren't present already:
    1. `amin()`
    2. `argmax()`
    3. `amax()`
    4. `argmin()`
    5. `torch._aminmax()`
    6. `torch.clamp()` on CPU. Was already supported on CUDA
    7. `min()` (in this PR)
    8. `max()` (in this PR)
    9. `minimum()`
    10. `maximum()`
    
    I'll submit separate PRs for the other ops.
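
    For intuition on the reduced precision involved, bfloat16 keeps only the top 16 bits of a float32; a pure-Python emulation (not the ATen kernels, which dispatch to vectorized implementations) of a bfloat16 `min` reduction:

    ```python
    # Hedged illustration: truncate a float32 to its top 16 bits (a bfloat16
    # round-toward-zero view) and reduce with min(). This only shows the value
    # semantics, not the actual kernel dispatch this PR enables.
    import struct

    def to_bfloat16(x):
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

    def bf16_min(values):
        return min(to_bfloat16(v) for v in values)
    ```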
    
    Pull Request resolved: #51244
    
    Reviewed By: jbschlosser
    
    Differential Revision: D26503455
    
    Pulled By: anjali411
    
    fbshipit-source-id: c32247f214e9272ca2e4322a23337874e737b140
    imaginary-person authored and facebook-github-bot committed Feb 19, 2021
  13. [quant][graphmode][fx] Fix fp16 dynamic quant for functional linear (#52369)
    
    Summary: Pull Request resolved: #52369
    
    Test Plan: Imported from OSS
    
    Reviewed By: supriyar
    
    Differential Revision: D26491425
    
    fbshipit-source-id: d2c2a70bf1bc43ac2b63ac4cf9ae9c07887f12e9
    jerryzh168 authored and facebook-github-bot committed Feb 19, 2021
  14. Add support for pow (#52374)

    Summary:
    Fixes #18627
    Adds pow support for JIT
    
    Pull Request resolved: #52374
    
    Test Plan: python test/test_jit.py -k test_torch_pow
    
    Reviewed By: Lilyjjo
    
    Differential Revision: D26515596
    
    Pulled By: nikithamalgifb
    
    fbshipit-source-id: 0c25a8eba8ed93291c5e447e863edac2a35b61fb
    nikithamalgifb authored and facebook-github-bot committed Feb 19, 2021
  15. Add NNC support for aten::hardtanh (a hot operation in mobilenet v2/v3) (#52394)
    
    Summary: Pull Request resolved: #52394
    
    Test Plan:
    Imported from OSS
    
    test/test_tensorexpr.py
    test/test_jit_fuser_te.py
    
    Reviewed By: bertmaher
    
    Differential Revision: D26497856
    
    Pulled By: huiguoo
    
    fbshipit-source-id: 8558f89826cad250da6f970bfc49384f2b9d7ee0
    huiguoo authored and facebook-github-bot committed Feb 19, 2021
  16. Add onnxifi interface for set/get options (#52388)

    Summary:
    Pull Request resolved: #52388
    
    Pull Request resolved: pytorch/glow#5364
    
    This allows us to change global variables through onnxifi calls, and adds Python bindings along with it. Note that we supply a dummy backend_id, as it's not needed by Glow since the setting is global.
    
    #codemod
    
    Test Plan:
    ```
    buck test mode/dev //glow/fb/test:test_onnxifi_optionnnpi
    ```
    
    Reviewed By: jfix71, khabinov
    
    Differential Revision: D26481652
    
    fbshipit-source-id: 19b8201c77f653cf7d93ad68760aa7fb5ec45ff4
    yinghai authored and facebook-github-bot committed Feb 19, 2021
  17. [ROCm] missing template declarations for complex blas (#52472)

    Summary: Pull Request resolved: #52472
    
    Reviewed By: jbschlosser
    
    Differential Revision: D26533896
    
    Pulled By: anjali411
    
    fbshipit-source-id: 55503028d5e087fc91992b417836cc87eb60ad55
    jeffdaily authored and facebook-github-bot committed Feb 19, 2021
  18. Improvements for FX tracer (#52232)

    Summary:
    Pull Request resolved: #52232
    
    Pull Request resolved: pytorch/glow#5327
    
    Reviewed By: gcatron
    
    Differential Revision: D26355583
    
    fbshipit-source-id: f062e0b3a9cadf1584738bed85e9964b9a63efaf
    jfix71 authored and facebook-github-bot committed Feb 19, 2021
  19. [glow] Extending AOT config with two more fields (#5359)

    Summary: Pull Request resolved: pytorch/glow#5359
    
    Reviewed By: ChunliF
    
    Differential Revision: D26468908
    
    fbshipit-source-id: 16c4f4215f302c023d75c204b999f23ed6254aa1
    khabinov authored and facebook-github-bot committed Feb 19, 2021

Commits on Feb 18, 2021

  1. [ROCm] Enable test_ddp_hooks.py test cases (#52403)

    Summary:
    Re-enabling these test cases for ROCm because they are passing.
    
    jeffdaily
    
    Pull Request resolved: #52403
    
    Reviewed By: jbschlosser, SciPioneer
    
    Differential Revision: D26516727
    
    Pulled By: malfet
    
    fbshipit-source-id: 6c70805eda39b0aadfbeb30a527af3906d2da867
    KyleCZH authored and facebook-github-bot committed Feb 18, 2021
  2. Add arm64 binary build (#52443)

    Summary:
    This is getting tested by #52441.
    
    Adds a new config for macOS arm64 to our binary builds.
    Now stores artifacts for mac builds.
    
    Pull Request resolved: #52443
    
    Reviewed By: walterddr
    
    Differential Revision: D26517330
    
    Pulled By: janeyx99
    
    fbshipit-source-id: 02774937a827bdd4c08486dc9f8fe63446917f1e
    janeyx99 authored and facebook-github-bot committed Feb 18, 2021
  3. Revert D26299594: [PyTorch Mobile] 15KiB size reduction by reducing MaxTypeIndex from 256 to 32
    
    Test Plan: revert-hammer
    
    Differential Revision:
    D26299594 (9e54532)
    
    Original commit changeset: 9a78c03da621
    
    fbshipit-source-id: 2be1149539892447872eb3289f3fdef0ac92c090
    malfet authored and facebook-github-bot committed Feb 18, 2021
  4. Revert nightly docker build cuda version to 11.1.1. (#52234)

    Summary:
    CUDA 11.2 has a performance regression, so revert to CUDA 11.1.1.
    
    Pull Request resolved: #52234
    
    Test Plan: [CI](https://github.com/pytorch/pytorch/actions?query=workflow%3A%22Build+PyTorch+nightly+Docker+image+and+push+to+GitHub+Container+Registry%22)
    
    Reviewed By: glaringlee
    
    Differential Revision: D26519105
    
    Pulled By: xuzhao9
    
    fbshipit-source-id: d1e1ecb7904c196292d83767b71000b465de73ce
    xuzhao9 authored and facebook-github-bot committed Feb 18, 2021
  5. [iOS GPU] Fix max_pool_2d (#52431)

    Summary:
    Pull Request resolved: #52431
    
    The previous implementation was missing the padding information and thus was not correct.
    ghstack-source-id: 121950755
    
    Test Plan:
    - `buck test pp-macos`
    - CircleCI
    
    Reviewed By: SS-JIA
    
    Differential Revision: D26508482
    
    fbshipit-source-id: b28b99c399c4f1390a5cc4f023e470eed0f8c073
    xta0 authored and facebook-github-bot committed Feb 18, 2021
  6. Make LLVM the default backend for TE (#52314)

    Summary:
    Fixes #52264
    
    When CPU fusion is enabled without LLVM support in PyTorch, it causes a huge slowdown (> 50x). This PR makes the LLVM backend the default backend for TE. Now, an error is reported if CPU fusion is enabled without LLVM support, to avoid this performance regression.
    
    This PR also updates the tests to not use LLVM, so that the old flow is continued. This is necessary because tests run in CI do not have LLVM.
    
    Pull Request resolved: #52314
    
    Reviewed By: ejguan
    
    Differential Revision: D26491294
    
    Pulled By: navahgar
    
    fbshipit-source-id: 74561db1207da805d6d28039450db046ba2988fb
    navahgar authored and facebook-github-bot committed Feb 18, 2021
  7. enable mkldnn conv2d backward to support mkldnn tensor input (#48994)

    Summary: Pull Request resolved: #48994
    
    Test Plan: Imported from OSS
    
    Reviewed By: ejguan
    
    Differential Revision: D25537189
    
    Pulled By: VitalyFedyunin
    
    fbshipit-source-id: d81d247798fad3815b735468d66ef9d62c07ef77
    XiaobingSuper authored and facebook-github-bot committed Feb 18, 2021
  8. [PyTorch Mobile] 15KiB size reduction by reducing MaxTypeIndex from 256 to 32 (#51881)
    
    Summary:
    Pull Request resolved: #51881
    
    `MaxTypeIndex` controls the size of the array
    
    ```
    detail::TypeMetaData* TypeMeta::typeMetaDatas() {
      static detail::TypeMetaData instances[MaxTypeIndex + 1]
    ```
    
    in `typeid.cpp`.
    
    In practice, I have seen that this array doesn't hold more than 18 elements once the PyTorch library has been initialized (in mobile unit tests). I couldn't find situations where elements might be added to this array after library initialization.
    
    There is a runtime check to prevent array overflow, so reducing the size of the storage shouldn't add any risk of losing visibility into errors.
    
    The fact that this array is statically allocated ends up using a bunch of space in the binary (potentially to initialize the trailing elements?). I'm somewhat surprised by this. However, this change registered a 15KiB size win on both fbios and igios.
    
    Found this when I was looking at a bloaty run that I shared with smessmer on friday: https://www.internalfb.com/intern/everpaste/?handle=GLXImQisHOfT74EBAKw47V3ktuAzbsIXAAAB
    
    I initially thought that the methods being passed in to the constructor of `detail::TypeMetaData` were causing the size increase, but only later realized the issue after reading the following helpful comment:
    
    ```
        // The remainder of the array is padded with TypeMetaData blanks.
        // The first of these is the entry for ScalarType::Undefined.
        // The rest are consumed by CAFFE_KNOWN_TYPE entries.
    ```
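
    The runtime overflow check mentioned above behaves roughly like this Python sketch (hypothetical names; the real check is in `typeid.cpp`):

    ```python
    # Hedged sketch: a fixed-capacity type-metadata table where registration
    # fails loudly instead of writing past the end, so shrinking MaxTypeIndex
    # trades only capacity, not error visibility.
    MAX_TYPE_INDEX = 32

    _type_metadata = []

    def register_type(meta):
        # The real array is sized MaxTypeIndex + 1.
        if len(_type_metadata) >= MAX_TYPE_INDEX + 1:
            raise RuntimeError("too many types registered; increase MaxTypeIndex")
        _type_metadata.append(meta)
        return len(_type_metadata) - 1
    ```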
    ghstack-source-id: 121875657
    
    Test Plan:
    Sandcastle runs + the following BSB runs.
    
    ### igios
    
    ```
    D26299594-V1 (https://www.internalfb.com/intern/diff/D26299594/?dest_number=121221891)
    
    igios: Succeeded
    Change in Download Size for arm64 + 3x assets variation: +596 B
    Change in Uncompressed Size for arm64 + 3x assets variation: -15.8 KiB
    
    Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:443632243487886@base/bsb:443632243487886@diff/
    ```
    
    ### fbios
    
    ```
    D26299594-V1 (https://www.internalfb.com/intern/diff/D26299594/?dest_number=121221891)
    
    fbios: Succeeded
    Change in Download Size for arm64 + 3x assets variation: +104 B
    Change in Uncompressed Size for arm64 + 3x assets variation: -15.7 KiB
    
    Mbex Comparison: https://our.intern.facebook.com/intern/mbex/bsb:169233698063125@base/bsb:169233698063125@diff/
    ```
    
    Reviewed By: raziel, iseeyuan
    
    Differential Revision: D26299594
    
    fbshipit-source-id: 9a78c03da621fbc25a1d8087376628bccc8dbfda
    dhruvbird authored and facebook-github-bot committed Feb 18, 2021
  9. Allow broadcasting against lerp weights. (#52319)

    Summary:
    Pull Request resolved: #52319
    
    Fixes: #52254
    
    Test Plan: Imported from OSS
    
    Reviewed By: albanD
    
    Differential Revision: D26488411
    
    Pulled By: gchanan
    
    fbshipit-source-id: 60eb471609986584c4235ba7f263581e988e7642
    gchanan authored and facebook-github-bot committed Feb 18, 2021
  10. [BE] _get_torch_cuda_version should return tuple (#52409)

    Summary: Pull Request resolved: #52409
    
    Reviewed By: jbschlosser, glaringlee
    
    Differential Revision: D26513924
    
    Pulled By: walterddr
    
    fbshipit-source-id: ee18ef357c326c5ad344d80c59821cc2b8814734
    walterddr authored and facebook-github-bot committed Feb 18, 2021
  11. Fix upsample bicubic2d batching handling on CPU. (#52389)

    Summary:
    Pull Request resolved: #52389
    
    Fixes: #49159
    
    Test Plan: Imported from OSS
    
    Reviewed By: albanD
    
    Differential Revision: D26496319
    
    Pulled By: gchanan
    
    fbshipit-source-id: d385cd683ef09e0596a9875ce84d03e6e77acc93
    gchanan authored and facebook-github-bot committed Feb 18, 2021
  12. Enhance Tensor.unflatten to support -1 as the inferred size (#51955)

    Summary:
    Fixes #51719, #28142
    
    **Change**
    - Update `torch.Tensor.unflatten` to support passing `-1` as the inferred size, for both tensors and named tensors.
    - Examples of using `-1` in the `unflatten` function are added to the docs.
    - Fix the rendering issue in the original `unflatten` docs by removing a stray blank line in its example section.
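
    The `-1` inference can be sketched in pure Python (not PyTorch's implementation):

    ```python
    # Hedged sketch of inferring the -1 entry in unflatten's target sizes:
    # at most one -1 is allowed, and it is deduced from the dimension's length.
    def infer_unflatten_sizes(dim_length, sizes):
        if list(sizes).count(-1) > 1:
            raise ValueError("only one dimension can be inferred")
        known = 1
        for s in sizes:
            if s != -1:
                known *= s
        if -1 in sizes:
            if known == 0 or dim_length % known != 0:
                raise ValueError(f"cannot infer size for shape {sizes} "
                                 f"from dimension of length {dim_length}")
            inferred = dim_length // known
            return [inferred if s == -1 else s for s in sizes]
        if known != dim_length:
            raise ValueError("sizes do not multiply to the dimension length")
        return list(sizes)
    ```

    For example, unflattening a length-12 dimension with sizes (3, -1) yields (3, 4).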
    
    Pull Request resolved: #51955
    
    Reviewed By: agolynski
    
    Differential Revision: D26467198
    
    Pulled By: zou3519
    
    fbshipit-source-id: 6a3ede25561223187273796427ad0cb63f125364
    RockingJavaBean authored and facebook-github-bot committed Feb 18, 2021
  13. ns for fx: make unshadowed activation comparison work for N models (#52357)
    
    Summary:
    Pull Request resolved: #52357
    
    Refactor the NS for FX compare unshadowed activations API to be able
    to work on N models and do arbitrary matching strategies.
    
    We factor out a util which takes a model and a list of
    nodes to extract weights for, with names to give the extracted
    weights. The user can then call this util with a set
    of nodes and names created in any way they want.
    
    Test Plan:
    ```
    python test/test_quantization.py TestFXNumericSuiteCoreAPIs
    ```
    
    Imported from OSS
    
    Reviewed By: raghuramank100
    
    Differential Revision: D26487270
    
    fbshipit-source-id: 1372ef07b5f3ddc7cebdfb2dee0221a2facd0527
    vkuzo authored and facebook-github-bot committed Feb 18, 2021
  14. ns for fx: make weights comparison work on N models (#52356)

    Summary:
    Pull Request resolved: #52356
    
    Refactor the NS for FX compare weights API to be able to
    work on N models and do arbitrary matching strategies.
    
    We factor out a util which takes a model and a list of
    nodes to extract weights for, with names to give the extracted
    weights.  The user can then call this util with a set
    of nodes and names created in any way they want.
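
    The factored-out util described above might look like this sketch (hypothetical names; models stand in as plain dicts):

    ```python
    # Hedged sketch: given a model and (node_name, result_name) pairs chosen by
    # the caller, collect the corresponding weights keyed by the chosen names.
    def extract_weights(model, nodes_and_names):
        results = {}
        for node_name, result_name in nodes_and_names:
            results[result_name] = model[node_name]
        return results

    def compare_n_models(models, nodes_and_names):
        """Apply the same extraction to N models, enabling arbitrary matching."""
        return [extract_weights(m, nodes_and_names) for m in models]
    ```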
    
    Test Plan:
    ```
    python test/test_quantization.py TestFXNumericSuiteCoreAPIs
    ```
    
    Imported from OSS
    
    Reviewed By: raghuramank100
    
    Differential Revision: D26487271
    
    fbshipit-source-id: 0c2172a1b33d47565004a307aff14d205671add7
    vkuzo authored and facebook-github-bot committed Feb 18, 2021
  15. [wip] ns for fx: add support for subgraph matching (#52130)

    Summary:
    Pull Request resolved: #52130
    
    We have patterns like (F.linear, F.relu) which need to match
    (toq.linear_relu), so we need to match subgraphs.
    
    This PR does the following:
    * defines a "subgraph" as (start_node, end_node). The current assumption
    is that subgraphs are simple: there is always a path from start_node to
    end_node, and we can ignore any non-input args/kwargs of these nodes
    for the purposes of matching and copying things. An example one-node
    subgraph is (F.linear, F.linear); an example two-node subgraph
    is (F.linear, F.relu).
    * changes the matching logic to iterate over subgraphs instead of nodes
    * changes the NS core APIs to use subgraph pairs instead of node pairs:
    1. for weights, we match on the start node
    2. for unshadowed activations, we observe the end nodes
    3. for shadowed activations, we copy the subgraph of a to graph c
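
    The matching idea above can be sketched in plain Python (hypothetical names and patterns, not the NS-for-FX code):

    ```python
    # Hedged sketch: a subgraph is (start_index, end_index), and fused patterns
    # like ("linear", "relu") are tried longest-first against runs of nodes.
    PATTERNS = [("linear", "relu"), ("linear",)]

    def match_subgraphs(nodes):
        matches, i = [], 0
        while i < len(nodes):
            for pat in PATTERNS:
                if tuple(nodes[i:i + len(pat)]) == pat:
                    matches.append((i, i + len(pat) - 1))  # (start, end)
                    i += len(pat)
                    break
            else:
                i += 1  # node matches no pattern; move on
        return matches
    ```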
    
    TODO(before review) write up better, not ready for review yet
    
    Test Plan:
    TODO before land: better test plan
    
    Imported from OSS
    
    Reviewed By: raghuramank100
    
    Differential Revision: D26403092
    
    fbshipit-source-id: e49aaad4b02b8d60589435848bee422b8f41937a
    vkuzo authored and facebook-github-bot committed Feb 18, 2021
  16. NS for FX: add test for a simple sparsenn model (#52092)

    Summary:
    Pull Request resolved: #52092
    
    Adds a very simple toy sparsenn model, and enables
    its inspection with the new NS APIs.
    
    Test Plan:
    ```
    python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_compare_activations
    python test/test_quantization.py TestFXNumericSuiteCoreAPIs.test_sparsenn_shadow
    ```
    
    Imported from OSS
    
    Reviewed By: raghuramank100
    
    Differential Revision: D26403095
    
    fbshipit-source-id: 3c3650aca47186deb32f2b3f1d87a0716d1ad9d1
    vkuzo authored and facebook-github-bot committed Feb 18, 2021