Pulse · pytorch/pytorch · GitHub

September 24, 2024 – September 27, 2024

Overview

148 Active pull requests

290 Active issues

17 Pull requests merged by 8 people

SDPA regression fix to work around high-precision by default
#136536 merged Sep 27, 2024
[Docs] fix inconsistent docs in conv1d, conv2d, and conv3d
#136813 merged Sep 27, 2024
[Update] Update note for Getting Started with PyTorch on Intel GPUs
#136731 merged Sep 27, 2024
Fix ROCm skip decorator for test_ddp_tp and multiprocess UTs (#136161)
#136801 merged Sep 26, 2024
Update current maintainers
#136769 merged Sep 26, 2024
Constrain setuptools to 72.1.0 or older in requirements.txt
#136729 merged Sep 26, 2024
Revert "Trace fwd graph under no_grad mode #134872"
#136734 merged Sep 26, 2024
Make test_skip_data_serialization regex more flexible
#136710 merged Sep 26, 2024
Disable iOS workflow
#136706 merged Sep 26, 2024
[RELEASE-ONLY CHANGES] Don't push to https://ghcr.io/
#136703 merged Sep 26, 2024
Fix hardcoded ROCm paths in Caffe2Targets.cmake
#136700 merged Sep 26, 2024
[ROCm] upgrade ROCm CI builds to py3.10 (#134108)
#136696 merged Sep 26, 2024
[ROCm][CI] upgrade CI to ROCm 6.2 (#132555)
#136467 merged Sep 25, 2024
[ROCm] Cherry-pick unit test fixes to release/2.5
#136557 merged Sep 25, 2024
fix stride compare failed when size value equal to one in ForeachUtils.h
#136426 merged Sep 25, 2024
[ROCm] [BUGFIX] Re-enable rocm-specific tuning parameters v2 (#133852)
#136139 merged Sep 25, 2024
Fix test_skip_data_serialization pickle exception match
#136617 merged Sep 25, 2024

131 Pull requests opened by 80 people

[DeviceMesh][EZ] Add group description to new group
#136558 opened Sep 24, 2024
Sac ilp
#136562 opened Sep 24, 2024
Not for commit -- debugging CI
#136564 opened Sep 24, 2024
Rel 2.5 dummy change
#136569 opened Sep 24, 2024
Enable regression test for add loop benchmarks
#136573 opened Sep 24, 2024
[SymmetricMemory] improve multicast initialization/fallback logic
#136577 opened Sep 24, 2024
[DTensor] Add shard method
#136589 opened Sep 25, 2024
[export] simplify automatic dynamic shapes processing
#136591 opened Sep 25, 2024
Bump webrick from 1.7.0 to 1.8.2 in /ios/TestApp
#136593 opened Sep 25, 2024
[Partitioner] Enumerate partitions by iterating partition ids
#136598 opened Sep 25, 2024
[DeviceMesh] Remove set_device
#136604 opened Sep 25, 2024
[FSDP2] Added `shard_placement_fn` only for `_FlatShard`
#136606 opened Sep 25, 2024
[Partitioner] Remove unnecessary upstream nodes in dependency viewer
#136608 opened Sep 25, 2024
[inductor] Reduce block sizes when using Triton CPU backend
#136612 opened Sep 25, 2024
[Partitioner] Reduce time consuming of partitions merger
#136614 opened Sep 25, 2024
[DeviceMesh] Respect user's device type in complimentary PG init
#136615 opened Sep 25, 2024
[Partitioner] Speed up the update of partition map
#136616 opened Sep 25, 2024
Enable ruff's unused variable checking for all pytorch
#136625 opened Sep 25, 2024
Add out_dtype kw argument to optimize_bsr_dense_addmm
#136626 opened Sep 25, 2024
Set RUNPATH so installed tests can find the required shared libraries
#136627 opened Sep 25, 2024
[aotd] Test rrelu noise mutation in compile
#136629 opened Sep 25, 2024
Fix 136201-Compile with USE_CPP_CODE_COVERAGE=ON throw errors: use lld…
#136632 opened Sep 25, 2024
Add output dtype support to count_nonzeros
#136635 opened Sep 25, 2024
Remove potentially unnecessary decomps
#136641 opened Sep 25, 2024
[DO NOT MERGE] Test .github/workflows/inductor-perf-compare.yml on AWS A100 infra
#136646 opened Sep 25, 2024
[ts_converter] Fix prim::If buffer names
#136648 opened Sep 25, 2024
[ts_converter] Support as_tensor
#136649 opened Sep 25, 2024
Fix six broken tests in test_ops.py
#136653 opened Sep 25, 2024
add types to _dynamo/code_context.py
#136665 opened Sep 25, 2024
Migrate ARM64 Linux binary jobs to runner determinator
#136666 opened Sep 25, 2024
Don't generate implicit value ranges for missing symbols.
#136667 opened Sep 25, 2024
Revert a bunch of stuff
#136668 opened Sep 25, 2024
inductor: use previous guards to know if a size is 1 for broadcasting
#136670 opened Sep 25, 2024
Tensorify compute on Python scalars
#136674 opened Sep 25, 2024
[aotd] No AOT compilation for backward
#136675 opened Sep 25, 2024
enable auto functionalize v2 by default
#136685 opened Sep 25, 2024
[user triton] Make tl.constexpr specialization work for triton_op & capture_triton
#136686 opened Sep 25, 2024
[Inductor][CPP] Cache weight tiles in L1D for AMX int8 WoQ GEMM
#136688 opened Sep 25, 2024
Rewrite fake mode detector
#136690 opened Sep 25, 2024
Scoped extension building for C++ backed custom ops tests
#136695 opened Sep 26, 2024
Fix `devices` Parameter Type in `benchmark_utilization` Function
#136698 opened Sep 26, 2024
[inductor] Test scheme to minimize mem overhead of autotuning
#136701 opened Sep 26, 2024
[WIP][Inductor] auto-chunker
#136702 opened Sep 26, 2024
TEMP
#136707 opened Sep 26, 2024
Delete duplicate bindings in torch/csrc/autograd/python_torch_functions_manual.cpp
#136711 opened Sep 26, 2024
[PT2][Inductor] Add runtime numeric check for the post grad pass
#136724 opened Sep 26, 2024
Add diagonal_copy to torch/_decomp/__init__.py
#136730 opened Sep 26, 2024
[Inductor] change user_visible_outputs to user_visible_output_idxs
#136732 opened Sep 26, 2024
[compiled autograd] initialize cudagraph tls from context manager
#136735 opened Sep 26, 2024
Add type check for `f` in `torch.package.PackageExporter`
#136738 opened Sep 26, 2024
[Quant] Check stride > 0 for QConv and QConvTranspose
#136739 opened Sep 26, 2024
[no land] test fail due to win
#136740 opened Sep 26, 2024
[compiled autograd] undo view_to_reshape inductor fx pass in node name matching
#136741 opened Sep 26, 2024
[AOTI] Support generate c shim layer for Intel GPU.
#136742 opened Sep 26, 2024
Wrap torch_python with torch_compile_options
#136743 opened Sep 26, 2024
Fix overflow error when `torch.bincount()` handles a large tensor
#136745 opened Sep 26, 2024
change GPT2ForSequenceClassification inference accuracy tolerance
#136749 opened Sep 26, 2024
Implement `AcceleratorHooksInterface`'s virtual functions `deviceCount()` and `getCurrentDevice()` for CUDA and XPU
#136752 opened Sep 26, 2024
[Intel GPU] qlinear.pointwise with mixed dtype support
#136753 opened Sep 26, 2024
compile time benchmarks for AOTDispatcher (inference/training/subclasses)
#136759 opened Sep 26, 2024
compile time benchmarks for AOTDispatcher (partitioner)
#136760 opened Sep 26, 2024
[Inductor UT] Generalize device-bias code introduced from #136472
#136761 opened Sep 26, 2024
TEST
#136763 opened Sep 26, 2024
Fix for MSVC problem on Windows Arm64
#136765 opened Sep 26, 2024
[aoti][inplace] Support skipping model buffers
#136770 opened Sep 26, 2024
Preserve custom ops via run_decomps
#136773 opened Sep 26, 2024
Remove dtype check on meta device
#136774 opened Sep 26, 2024
[WIP] Add dtype attribute to TritonCSEVariable
#136778 opened Sep 26, 2024
[Inductor] Ensure that the strides of user-visible outputs remain unchanged after post_grad passes
#136779 opened Sep 26, 2024
Add generator parameter to rand*_like functions
#136780 opened Sep 26, 2024
[inductor] add a threshold for membw saving during fusion
#136782 opened Sep 26, 2024
Enable experiments for protected branches
#136785 opened Sep 26, 2024
Download pre-compiled AOTriton from GitHub unless AOTRITON_INSTALL_FROM_SOURCE=1 is set
#136786 opened Sep 26, 2024
add ToFloat, TruncToInt to PythonReferenceAnalysis
#136787 opened Sep 26, 2024
override bool(), is_nonzero for real tensor tracing
#136788 opened Sep 26, 2024
[NCCL] Implement ncclCommInitRankScalable
#136789 opened Sep 26, 2024
[c10d] Fix the device query story of ProcessGroup
#136790 opened Sep 26, 2024
FlexAttention support for NJT
#136792 opened Sep 26, 2024
Init threadpool with user defined num_threads before default
#136793 opened Sep 26, 2024
[BE] Add script to keep the runner-determinator scripts in sync
#136794 opened Sep 26, 2024
Skip the torch.compile in torch::deploy
#136795 opened Sep 26, 2024
[TorchRec][PT2 compile] enable dynamo in _get_user_embeddings
#136798 opened Sep 26, 2024
[CI] upload_metrics function to upload to s3 instead of dynamo
#136799 opened Sep 26, 2024
update the torch.linalg.solve tests for NumPy 2
#136800 opened Sep 26, 2024
[export] add translations for SymInt/Bool deserialization; FloorDiv
#136802 opened Sep 26, 2024
[hack/POC] get DTensor to work with compiled autograd
#136803 opened Sep 26, 2024
[pipelining] Clean up dead code
#136804 opened Sep 26, 2024
[RELEASE-ONLY CHANGES] Delete slow workflows
#136805 opened Sep 26, 2024
Enable tracing through auto_functionalized_v2 in compiled autograd
#136806 opened Sep 26, 2024
[Pytorch][AO] Update choose_qparams_per_token op to output correct shape for scales and zp
#136807 opened Sep 27, 2024
[inductor] Improve operatorbench.py
#136808 opened Sep 27, 2024
[inductor] Benchmark Halide in operatorbench.py
#136809 opened Sep 27, 2024
[halide-backend] Fix ops.fma codegen
#136810 opened Sep 27, 2024
Companion PR to https://github.com/pytorch/pytorch/pull/134022
#136818 opened Sep 27, 2024
Avoid sqrt calculations with values less than zero
#136824 opened Sep 27, 2024
[DONOTMERGE] Update xpu.txt
#136825 opened Sep 27, 2024
Added some tests to prevent regressions in partitioning and flexattention
#136826 opened Sep 27, 2024
[cpu] Modify inductor opt flag --- ftree-loop-vectorize
#136827 opened Sep 27, 2024
[Inductor] Handle device property `warp_size` is None but used on XPU.
#136834 opened Sep 27, 2024
Add back DistributedDataParallel types that were lost when pyi was removed
#136835 opened Sep 27, 2024
[Distributed][Test] Fix todo in distributed test files
#136836 opened Sep 27, 2024
Add option to disable operator profiling
#136838 opened Sep 27, 2024
Update maintainers for inductor and x86 CPU
#136839 opened Sep 27, 2024
[SymmetricMemory] expose the multicast_ptr
#136840 opened Sep 27, 2024
[TEST ONLY][hack/POC] get DTensor to work with compiled autograd
#136841 opened Sep 27, 2024
Traceable FSDP2 + TP
#136842 opened Sep 27, 2024
[fsdp2] based on device, use stream and Event
#136843 opened Sep 27, 2024
Use static variables
#136847 opened Sep 27, 2024
Fix clang-tidy warnings
#136848 opened Sep 27, 2024
Enable XNNPACK for quantized add
#136850 opened Sep 27, 2024
Enable clang-tidy on torch/csrc/lazy
#136851 opened Sep 27, 2024
Run Aarch64 Dashboard with TORCHINDUCTOR_FREEZING and TORCHINDUCTOR_CPP_WRAPPER
#136853 opened Sep 27, 2024
[DCP] use global coordinator rank for distributed ops in _DistWrapper
#136854 opened Sep 27, 2024
Change aarch64 dashboard config to use float32 inference
#136855 opened Sep 27, 2024
[WIP][Inductor UT] Generalize newly introduced inductor UTs for intel GPU (Part 2)
#136856 opened Sep 27, 2024
Get rid of quadratic tests to has_same_metadata
#136857 opened Sep 27, 2024
Avoid reorder in mkldnn_to_dense when output is already in a public format
#136859 opened Sep 27, 2024
[MPS] Error checking/bf16 support for `torch.normal`
#136863 opened Sep 27, 2024
[reland][Elastic] Skip store barrier and store get in host assign
#136865 opened Sep 27, 2024
[inductor] Enable coordinate descent tuning with max-autotune
#136867 opened Sep 27, 2024
testing
#136868 opened Sep 27, 2024
[export] Draft of draft export
#136869 opened Sep 27, 2024
drafting
#136870 opened Sep 27, 2024
Improve is_fbcode functionality
#136871 opened Sep 27, 2024
Fix prefix store seg fault
#136872 opened Sep 27, 2024
[AOTI] Add TORCH_CHECK_STD_ERROR
#136873 opened Sep 27, 2024
Bump triton pin to latest 3.1.x release branch
#136874 opened Sep 27, 2024
Fix autograd.Function + NJT when an output grad is None
#136875 opened Sep 27, 2024
[testing] reenable kernel_benchmark.py tests
#136876 opened Sep 27, 2024
upload test stats: remove nan/inf when uploading
#136877 opened Sep 27, 2024
dont let partitioner think it can fuse pointwise ops into user triton kernels
#136878 opened Sep 27, 2024

196 Issues closed by 23 people

[Doc issue] RMSNorm formula
#136597 closed Sep 27, 2024
Severe SDPA Performance Regression 2.5.0-RC1
#135778 closed Sep 27, 2024
Could not find a configuration file for package "HIP" - requested 1.0, found 6.0.0
#128313 closed Sep 27, 2024
PyTorch for ROCm on a Supported Device Throws "hipErrorNoBinaryForGpu"
#73534 closed Sep 27, 2024
test_eig_with_eigvec_cuda_float64 is flaky on ROCm
#57128 closed Sep 27, 2024
Error on installation
#83795 closed Sep 27, 2024
DISABLED test_unary_ops (__main__.TestTensorExprFuser)
#105119 closed Sep 27, 2024
`torch.igamma` error and gives wrong results on float64 ROCm
#46531 closed Sep 27, 2024
[ROCm] test failures during 4.1 upgrade
#54535 closed Sep 27, 2024
from torch._C import default_generator ImportError: cannot import name 'default_generator'
#40295 closed Sep 27, 2024
ROCm 2.1: test_gamma_gpu_sample test fails
#16661 closed Sep 27, 2024
SDPA batching rules need randomness handling
#135020 closed Sep 27, 2024
`torch._export.aot_compile` CUDA version not compatible with C++ ABI
#134777 closed Sep 27, 2024
Time cost bug in "torch.linalg.cholesky()"
#136823 closed Sep 27, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_3_cpu_bool (__main__.TestInductorOpInfoCPU)
#135986 closed Sep 27, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_0_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135985 closed Sep 27, 2024
DISABLED test_comprehensive_nn_functional_cosine_embedding_loss_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135981 closed Sep 27, 2024
DISABLED test_comprehensive_short_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135987 closed Sep 27, 2024
DISABLED test_comprehensive_repeat_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135984 closed Sep 27, 2024
DISABLED test_comprehensive_nn_functional_interpolate_nearest-exact_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135988 closed Sep 27, 2024
DISABLED test_comprehensive_nn_functional_interpolate_nearest-exact_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135983 closed Sep 27, 2024
DISABLED test_comprehensive_nn_functional_pad_replicate_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135975 closed Sep 27, 2024
DISABLED test_comprehensive_softmax_with_dtype_cpu_bool (__main__.TestInductorOpInfoCPU)
#135976 closed Sep 27, 2024
DISABLED test_comprehensive_special_spherical_bessel_j0_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135982 closed Sep 27, 2024
DISABLED test_comprehensive_round_decimals_neg_3_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135980 closed Sep 27, 2024
DISABLED test_comprehensive_nn_functional_pad_replicate_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135977 closed Sep 27, 2024
DISABLED test_comprehensive_scatter_reduce_amax_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135978 closed Sep 27, 2024
DISABLED test_comprehensive_signal_windows_nuttall_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135989 closed Sep 27, 2024
DISABLED test_comprehensive_nn_functional_pad_constant_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135974 closed Sep 27, 2024
Torchvision.transforms.v2 does nothing / fails silently with numpy arrays
#136844 closed Sep 27, 2024
`torch._dynamo.exc.Unsupported: torch.* op returned non-Tensor bool call_method is_inference`
#135439 closed Sep 27, 2024
DISABLED test_comprehensive_sinc_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135940 closed Sep 27, 2024
DISABLED test_comprehensive_put_cpu_bool (__main__.TestInductorOpInfoCPU)
#135952 closed Sep 27, 2024
DISABLED test_comprehensive_scatter_reduce_sum_cpu_bool (__main__.TestInductorOpInfoCPU)
#135947 closed Sep 27, 2024
DISABLED test_comprehensive_nn_functional_feature_alpha_dropout_with_train_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135949 closed Sep 27, 2024
DISABLED test_comprehensive_round_decimals_3_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135951 closed Sep 27, 2024
DISABLED test_comprehensive_prod_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135942 closed Sep 27, 2024
DISABLED test_comprehensive_rsub_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135946 closed Sep 27, 2024
DISABLED test_comprehensive_scatter_reduce_prod_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135944 closed Sep 27, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_2_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135950 closed Sep 27, 2024
DISABLED test_comprehensive_special_scaled_modified_bessel_k1_cpu_bool (__main__.TestInductorOpInfoCPU)
#135941 closed Sep 27, 2024
DISABLED test_comprehensive_put_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135938 closed Sep 27, 2024
DISABLED test_comprehensive_scatter_reduce_amin_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135939 closed Sep 27, 2024
DISABLED test_comprehensive_special_ndtri_cpu_bool (__main__.TestInductorOpInfoCPU)
#135948 closed Sep 27, 2024
DISABLED test_comprehensive_square_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135943 closed Sep 27, 2024
DISABLED test_comprehensive_ones_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135945 closed Sep 27, 2024
DISABLED test_comprehensive_remainder_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135937 closed Sep 27, 2024
models `.forward` and exported onnx are not the same
#130826 closed Sep 27, 2024
torch.onnx.errors.UnsupportedOperatorError
#131635 closed Sep 27, 2024
torch.onnx.export Doesn't allow dynamic shapes when updating a tensor
#135233 closed Sep 27, 2024
ONNX dynamic sized model export with torch.onnx.dynamo_export fails when .copy_() / roll / fftn is used
#128324 closed Sep 27, 2024
onnx.export() fails on aten::embedding_bag with padding_idx
#128930 closed Sep 27, 2024
[ONNX] How to export the FlashAttention kernel
#135645 closed Sep 27, 2024
`torch.nn.functional._in_projection_packed` Failed to export to ONNX
#135764 closed Sep 27, 2024
[ONNX] Support `operator.mod`
#136524 closed Sep 27, 2024
The order of the parameters of `nn.Conv1d()`, `nn.Conv2d()` and `nn.Conv3d()` should be explained in the actual order of the parameters.
#135880 closed Sep 27, 2024
DISABLED test_comprehensive_scatter_add_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135899 closed Sep 27, 2024
DISABLED test_comprehensive_norm_inf_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135908 closed Sep 27, 2024
DISABLED test_comprehensive_ones_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135902 closed Sep 27, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_4_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135900 closed Sep 27, 2024
DISABLED test_comprehensive_ones_like_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135905 closed Sep 27, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_2_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135904 closed Sep 27, 2024
DISABLED test_comprehensive_special_scaled_modified_bessel_k0_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135901 closed Sep 27, 2024
DISABLED test_comprehensive_roll_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135898 closed Sep 27, 2024
DISABLED test_comprehensive_special_xlog1py_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135895 closed Sep 27, 2024
DISABLED test_comprehensive_special_modified_bessel_k1_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135903 closed Sep 27, 2024
DISABLED test_comprehensive_rand_like_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135897 closed Sep 27, 2024
DISABLED test_comprehensive_remainder_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135907 closed Sep 27, 2024
DISABLED test_comprehensive_nn_functional_conv3d_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135896 closed Sep 27, 2024
DISABLED test_comprehensive_signbit_cpu_bool (__main__.TestInductorOpInfoCPU)
#135906 closed Sep 27, 2024
DISABLED test_comprehensive_reciprocal_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135909 closed Sep 27, 2024
DISABLED test_comprehensive_repeat_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135910 closed Sep 27, 2024
vector norm is drastically different for different data types
#123645 closed Sep 27, 2024
INTERNAL ASSERT FAILED at "../torch/csrc/autograd/python_torch_functions_manual.cpp":661 when returning a constant tensor in the forward method
#132134 closed Sep 26, 2024
Long queue for Linux runners
#136762 closed Sep 26, 2024
[torch.profiler] double counting CUDA wrapper self-cuda-time
#60783 closed Sep 26, 2024
matrix_norm performance vastly underwhelming vs deprecated torch.norm
#136360 closed Sep 26, 2024
Can't load AOT Inductor binary on cuda:1 device
#136369 closed Sep 26, 2024
[NJT] Gradients for bias do not get populated for nn.Linear
#136652 closed Sep 26, 2024
[RFC] Cuda support matrix for Release 2.5
#134015 closed Sep 26, 2024
DISABLED test_torch_function_mode_guards_ignored_types_py (__main__.TorchFunctionModeTests)
#135102 closed Sep 26, 2024
RuntimeError: "arange_mps" not implemented for 'BFloat16'
#136624 closed Sep 26, 2024
Wrapper subclasses utilizing reentrant dispatch break when a TorchDispatchMode is enabled
#136565 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_0_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135856 closed Sep 26, 2024
DISABLED test_comprehensive_special_modified_bessel_k1_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135846 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_conv2d_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135839 closed Sep 26, 2024
DISABLED test_comprehensive_randn_like_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135840 closed Sep 26, 2024
DISABLED test_comprehensive_special_scaled_modified_bessel_k1_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135850 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_threshold_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135844 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135843 closed Sep 26, 2024
DISABLED test_comprehensive_remainder_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135855 closed Sep 26, 2024
DISABLED test_comprehensive_where_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135842 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_triplet_margin_loss_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135854 closed Sep 26, 2024
DISABLED test_comprehensive_rsub_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135848 closed Sep 26, 2024
DISABLED test_comprehensive_signbit_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135853 closed Sep 26, 2024
DISABLED test_comprehensive_rad2deg_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135852 closed Sep 26, 2024
DISABLED test_comprehensive_sigmoid_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135849 closed Sep 26, 2024
DISABLED test_comprehensive_randint_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135838 closed Sep 26, 2024
DISABLED test_comprehensive_zeros_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135845 closed Sep 26, 2024
DISABLED test_comprehensive_special_hermite_polynomial_he_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135847 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_2_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135851 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_3_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135841 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_triplet_margin_loss_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135798 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_3_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135738 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_4_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135744 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_smooth_l1_loss_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135746 closed Sep 26, 2024
DISABLED test_comprehensive_prod_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135810 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_0_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135753 closed Sep 26, 2024
DISABLED test_scaled_dot_product_fused_attention_overrideable (__main__.TestSDPAPrivateUse1Only)
#134601 closed Sep 26, 2024
`input` parameter of `index_select()` with a 0D tensor works
#136636 closed Sep 26, 2024
The doc of `linalg.matrix_norm()` should say that there is `input` parameter instead of `A` parameter
#136619 closed Sep 26, 2024
DISABLED test_comprehensive_ones_cpu_bool (__main__.TestInductorOpInfoCPU)
#135740 closed Sep 26, 2024
DISABLED test_comprehensive_prod_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135807 closed Sep 26, 2024
DISABLED test_comprehensive_randint_like_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135799 closed Sep 26, 2024
DISABLED test_multi_output_unbacked_custom_op_cuda (__main__.TestInductorDynamicCUDA)
#135755 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_4_cpu_bool (__main__.TestInductorOpInfoCPU)
#135751 closed Sep 26, 2024
DISABLED test_closure_out_of_scope_cell_with_mutation (__main__.MiscTests)
#135556 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_rms_norm_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135739 closed Sep 26, 2024
DISABLED test_comprehensive_prod_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135800 closed Sep 26, 2024
DISABLED test_comprehensive_scatter_reduce_prod_cpu_bool (__main__.TestInductorOpInfoCPU)
#135752 closed Sep 26, 2024
DISABLED test_comprehensive_vdot_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135748 closed Sep 26, 2024
DISABLED test_comprehensive_special_modified_bessel_i0_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135743 closed Sep 26, 2024
DISABLED test_comprehensive_where_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135784 closed Sep 26, 2024
DISABLED test_comprehensive_special_scaled_modified_bessel_k0_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135801 closed Sep 26, 2024
DISABLED test_comprehensive_triu_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135737 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_1_cpu_bool (__main__.TestInductorOpInfoCPU)
#135812 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_softmin_with_dtype_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135813 closed Sep 26, 2024
DISABLED test_comprehensive_special_modified_bessel_i1_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135814 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_tanhshrink_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135804 closed Sep 26, 2024
DISABLED test_comprehensive_special_bessel_y1_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135805 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_softshrink_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135742 closed Sep 26, 2024
DISABLED test_comprehensive_transpose_copy_cpu_bool (__main__.TestInductorOpInfoCPU)
#135782 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_upsample_nearest_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135802 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_4_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135749 closed Sep 26, 2024
DISABLED test_comprehensive_special_airy_ai_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135750 closed Sep 26, 2024
DISABLED test_comprehensive__unsafe_masked_index_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#131118 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_binary_cross_entropy_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135806 closed Sep 26, 2024
DISABLED test_comprehensive_special_ndtr_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135754 closed Sep 26, 2024
DISABLED test_comprehensive_pow_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135815 closed Sep 26, 2024
DISABLED test_python_ref_executor__refs_stft_executor_aten_cuda_complex128 (__main__.TestCommonCUDA)
#135756 closed Sep 26, 2024
DISABLED test_comprehensive_scatter_reduce_prod_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135797 closed Sep 26, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_3_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135745 closed Sep 26, 2024
DISABLED test_comprehensive_unravel_index_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135808 closed Sep 26, 2024
DISABLED test_comprehensive_pca_lowrank_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135811 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_avg_pool3d_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135747 closed Sep 26, 2024
DISABLED test_comprehensive_short_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135803 closed Sep 26, 2024
DISABLED test_comprehensive_nn_functional_triplet_margin_with_distance_loss_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135741 closed Sep 26, 2024
[Break XPU] Device bias code introduced from #134874: create CUDA tensor on XPU device.
#136595 closed Sep 26, 2024
The doc of `Sigmoid()` says there are `*args` and `**kwargs` but they don't work
#133688 closed Sep 26, 2024
The doc of `Softsign()` says there are `*args` and `**kwargs` but they don't work
#133684 closed Sep 26, 2024
The doc of `Tanh()` says there are `*args` and `**kwargs` but they don't work
#133683 closed Sep 26, 2024
Disable Python torch.library calls under torch::deploy
#136177 closed Sep 26, 2024
`/opt/rocm/lib/libamdhip64.so` is hardcoded in `Caffe2Targets.cmake` in ROCm wheels
#131701 closed Sep 26, 2024
[Flex attention] Error in create_block_mask with _compile=True on Torch 2.6
#136306 closed Sep 26, 2024
ValueError: Pointer argument (at 3) cannot be accessed from Triton
#136078 closed Sep 25, 2024
AttributeError: module 'distutils' has no attribute '_msvccompiler'
#136541 closed Sep 25, 2024
Discrepancy between scaled_dot_product_attention and flex_attention outputs
#136651 closed Sep 25, 2024
quantize_fx module not working on x86 machine for any torch vision model
#136511 closed Sep 25, 2024
tl.constexpr inputs to user-defined triton kernels should not be dynamic
#136504 closed Sep 25, 2024
DISABLED test_b2b_gemm_left_assoc_good_shape (__main__.B2BGEMMTest)
#133233 closed Sep 25, 2024
DISABLED test_b2b_gemm_trivial_right_assoc_good_shape (__main__.B2BGEMMTest)
#134143 closed Sep 25, 2024
DISABLED test_b2b_gemm_trivial_left_assoc_good_shape (__main__.B2BGEMMTest)
#133403 closed Sep 25, 2024
DISABLED test_b2b_gemm_right_assoc_good_shape (__main__.B2BGEMMTest)
#133311 closed Sep 25, 2024
DISABLED test_save_with_without_initializer_dont_include_initializer_no_fake_mode_no_exported_program (__main__.TestFxToOnnx)
#125020 closed Sep 25, 2024
[inductor][cpu] inductor_max_autotune models accuracy failure in 2024-08-10 nightly release
#133465 closed Sep 25, 2024
DISABLED test_comprehensive_nn_functional_prelu_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135667 closed Sep 25, 2024
DISABLED test_comprehensive_trapezoid_cpu_int64 (__main__.TestInductorOpInfoCPU)
#135670 closed Sep 25, 2024
DISABLED test_comprehensive_rot90_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135666 closed Sep 25, 2024
DISABLED test_comprehensive_xlogy_cpu_bool (__main__.TestInductorOpInfoCPU)
#135674 closed Sep 25, 2024
DISABLED test_comprehensive_softmax_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135682 closed Sep 25, 2024
DISABLED test_comprehensive_special_hermite_polynomial_h_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135677 closed Sep 25, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_3_cpu_int32 (__main__.TestInductorOpInfoCPU)
#135669 closed Sep 25, 2024
DISABLED test_comprehensive_nn_functional_soft_margin_loss_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135673 closed Sep 25, 2024
DISABLED test_autograd_cpp_node_saved_dynamic (__main__.TestCompiledAutograd)
#135685 closed Sep 25, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_2_cpu_bool (__main__.TestInductorOpInfoCPU)
#135684 closed Sep 25, 2024
DISABLED test_comprehensive_nn_functional_prelu_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135672 closed Sep 25, 2024
DISABLED test_comprehensive_polygamma_polygamma_n_1_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135679 closed Sep 25, 2024
DISABLED test_comprehensive_nn_functional_adaptive_max_pool2d_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135671 closed Sep 25, 2024
DISABLED test_comprehensive_norm_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135681 closed Sep 25, 2024
DISABLED test_comprehensive_nn_functional_unfold_cpu_bool (__main__.TestInductorOpInfoCPU)
#135683 closed Sep 25, 2024
DISABLED test_comprehensive_t_copy_cpu_bool (__main__.TestInductorOpInfoCPU)
#135665 closed Sep 25, 2024
DISABLED test_comprehensive_nn_functional_smooth_l1_loss_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135678 closed Sep 25, 2024
DISABLED test_comprehensive_nn_functional_upsample_nearest_cpu_uint8 (__main__.TestInductorOpInfoCPU)
#135676 closed Sep 25, 2024
DISABLED test_comprehensive_signal_windows_nuttall_cpu_float32 (__main__.TestInductorOpInfoCPU)
#135675 closed Sep 25, 2024
DISABLED test_comprehensive_nn_functional_kl_div_cpu_float64 (__main__.TestInductorOpInfoCPU)
#135668 closed Sep 25, 2024
`.eval()` and `.train()` don't set value of `.training` properly on `torch.compile()` module
#132986 closed Sep 25, 2024
DISABLED test_comprehensive_zeros_cpu_float16 (__main__.TestInductorOpInfoCPU)
#135642 closed Sep 25, 2024
AdEMAMix: Adaptive Exponential Moving Average Mix Optimizer
#135609 closed Sep 25, 2024
Libtorch build for ROCM error: “aten/src/THH” not exist
#126640 closed Sep 25, 2024
[ONNX] Replace ONNXProgram class
#136274 closed Sep 24, 2024
torch.linalg.lstsq generating different solutions on CPU and GPU
#136443 closed Sep 24, 2024
Pytorch unable to compile with gcc version later than 12
#136556 closed Sep 24, 2024
[Feature Request] Calculating FLOPs for computational graph operations
#5013 closed Sep 24, 2024
[torch.export] Automate export constrains like in onnx dynamo
#136210 closed Sep 24, 2024
[torch.export] Can't load UNet after compiling ExportedProgram with torch_tensorrt.dynamo.compile and saving
#136317 closed Sep 24, 2024
Adding a link to libtorch in the ompl project will cause the definition of Boost to not be found during the link process of the original code of the project
#136517 closed Sep 24, 2024

94 Issues opened by 60 people

fused_scaled_matmul_reduce_scatter report error with channel-wise scaling
#136866 opened Sep 27, 2024
`enforce_cond_guards_match` (completely unused)
#136864 opened Sep 27, 2024
Cleanup stale Dynamo feature flags
#136862 opened Sep 27, 2024
Error when calling multiple backward passes on FSDP model
#136861 opened Sep 27, 2024
ONNX export: torch.onnx.errors.SymbolicValueError: Unsupported prim::Constant kind: 'ival'
#136860 opened Sep 27, 2024
AOTAutograd has_same_metadata call in collect_metadata_analysis.py is quadratic
#136852 opened Sep 27, 2024
INTERNAL ASSERT FAILED in `torch.cuda.current_stream/default_stream/ExternalStream/set_per_process_memory_fraction`
#136849 opened Sep 27, 2024
` RuntimeError: cannot mutate tensors with frozen storage` when attempting to export with `nn.ReLU(True)` in one model but not another?
#136846 opened Sep 27, 2024
Pytorch picks wrong cuda version for building extensions
#136845 opened Sep 27, 2024
Compiling a module leads to `AssertionError: expected size 64==64, stride 1==49 at dim=1`
#136837 opened Sep 27, 2024
Thread safety issue with torch.compile()
#136833 opened Sep 27, 2024
false INTERNAL ASSERT FAILED was triggered when torch.device is mkldnn
#136831 opened Sep 27, 2024
false INTERNAL ASSERT FAILED in `torch.jit.set_fusion_strategy`
#136829 opened Sep 27, 2024
false INTERNAL ASSERT FAILED in `torch.empty`/`torch.ones`
#136828 opened Sep 27, 2024
[Break XPU] device_props.warp_size is None on XPU.
#136820 opened Sep 27, 2024
Aborted (core dumped) in `torch.hsmm`/`torch.hspmm`/`torch.hsmm`/`torch.sspaddmm`
#136819 opened Sep 27, 2024
Segmentation fault (core dumped) in `torch.profiler.profile`
#136817 opened Sep 27, 2024
In the forward(query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None, average_attn_weights=True, is_causal=False) method of MultiheadAttention, there is an issue with the key_padding_mask parameter.
#136816 opened Sep 27, 2024
Aborted (core dumped) in `torch.cuda.caching_allocator_delete`
#136815 opened Sep 27, 2024
Dynamo inlining errors with some calls to nested functions that use captured variables
#136814 opened Sep 27, 2024
`CUDNN_BACKEND_OPERATION: cudnnFinalize Failed cudnn_status: CUDNN_STATUS_BAD_PARAM` when using `float16`, works for `bfloat16` / `float32`
#136812 opened Sep 27, 2024
[rfc] [pipelining] shape inference + cached buffer allocation
#136811 opened Sep 27, 2024
ValueRange division breaks with pow_by_natural
#136797 opened Sep 26, 2024
compilation of rrelu_with_noise with bfloat16 input does not capture noise mutation
#136784 opened Sep 26, 2024
torch.is_grad_enabled() is False when using custom_op decorator
#136771 opened Sep 26, 2024
Pipelining zero bubble and activation checkpointing bug
#136766 opened Sep 26, 2024
PyTorch_ROCm use CPU rather than GPU
#136758 opened Sep 26, 2024
torch.export.export fails to trace through a binary operator
#136757 opened Sep 26, 2024
Onnx scaled_dot_product_attention does not allow to export model
#136756 opened Sep 26, 2024
Some virtual functions in `AcceleratorHooksInterface` are not overrided
#136751 opened Sep 26, 2024
Poor-quality random numbers generated by torch.poisson on gpus
#136750 opened Sep 26, 2024
Inconsistent behavior of cdist with half-precision inputs
#136748 opened Sep 26, 2024
Provide `gather_mm` functionality and/or expand nested tensor support
#136747 opened Sep 26, 2024
torch._int_mm accuracy issue on AMD CPU
#136746 opened Sep 26, 2024
Onnx exporting bug
#136737 opened Sep 26, 2024
torch.unravel_index does not check out-of-bounds
#136736 opened Sep 26, 2024
Be smart about autograd formulas saving either the input or output, depending on context
#136733 opened Sep 26, 2024
Aborted (core dumped) in `torch.package.package_exporter.PackageExporter`/`torch.package.PackageExporter`
#136728 opened Sep 26, 2024
Aborted (core dumped) in `torch.distributed.rpc`
#136726 opened Sep 26, 2024
Segmentation fault (core dumped) in `torch.distributed.dist.TCPStore`
#136725 opened Sep 26, 2024
Segmentation fault (core dumped) in `torch.distributed.PrefixStore`
#136723 opened Sep 26, 2024
Floating point exception (core dumped) in `torch.ao.nn.quantized.ConvTranspose1d\ConvTranspose2d\ConvTranspose3d` when stride=0
#136722 opened Sep 26, 2024
Segmentation fault (core dumped) in `torch.bincount`
#136720 opened Sep 26, 2024
Segmentation fault (core dumped) in `torch.nn.functional.max_pool1d`
#136719 opened Sep 26, 2024
Floating point exception (core dumped) in `torch.ao.nn.quantized.Conv1d/Conv2d/Conv3d` when stride=0
#136718 opened Sep 26, 2024
Floating point exception (core dumped) in `torch.ao.nn.intrinsic.quantized.ConvReLU1d/ConvReLU2d/ConvReLU3d` when stride=0
#136717 opened Sep 26, 2024
Segmentation fault (core dumped) in `torch.nn.functional.max_pool3d`/`torch.quantized_max_pool3d` when dilation is negative
#136716 opened Sep 26, 2024
Aborted (core dumped) in `torch.linalg.ldl_solve` with double free or corruption (out)
#136714 opened Sep 26, 2024
Segmentation fault (core dumped) in `torch.ao.nn.quantized.dynamic.LSTMCell/GRUCell`
#136712 opened Sep 26, 2024
Segmentation fault (core dumped) in `torch._weight_norm`/`torch._weight_int8pack_mm`
#136709 opened Sep 26, 2024
Segmentation fault (core dumped) in `torch._fft_r2c`/`torch._fft_c2`/`torch._fft_c2r`
#136704 opened Sep 26, 2024
false INTERNAL ASSERT FAILED in `torch._add_batch_dim`
#136699 opened Sep 26, 2024
Incorrect Type for `devices` Parameter in `benchmark_utilization` Function
#136697 opened Sep 26, 2024
`slow` workflow has been broken for 4+ weeks
#136694 opened Sep 25, 2024
`windows.g4dn.xlarge` are periodically unavailable
#136693 opened Sep 25, 2024
NotImplementedError: The operator 'aten::linalg_matrix_exp' is not currently implemented for the MPS device.
#136692 opened Sep 25, 2024
Segfaulting/aborting unit tests do not show up in "Show Additional Test Info" section
#136691 opened Sep 25, 2024
DISABLED test_pinned_memory_empty_cache (__main__.TestCuda)
#136687 opened Sep 25, 2024
[triton_op] Automatically `tl.constexpr` user-written kernel params when they are static integers
#136681 opened Sep 25, 2024
torch.nn.InstanceNorm3d producing inconsistent output for float16 tensors on CPU and GPU
#136680 opened Sep 25, 2024
[BUG] torch/extension.h: undefined symbol
#136664 opened Sep 25, 2024
Composition of torch.compile and torch.func.grad silently produces a wrong result.
#136662 opened Sep 25, 2024
[NJT] Dropout(0.0) with NJT increments cuda rng_state (only for no-compile)
#136656 opened Sep 25, 2024
[TorchScript] typing_extensions.deprecated doesn't work
#136654 opened Sep 25, 2024
torch.compile HUD dashboard should have repro commands
#136647 opened Sep 25, 2024
Add support for immutable tensors in torch.export
#136642 opened Sep 25, 2024
inductor can't broadcast tensors when they have dynamic shapes:
#136640 opened Sep 25, 2024
[inductor][cpu]jx_nest_base fp32 inductor_max_autotune accuracy failure in 2024_09_23 nightly release
#136639 opened Sep 25, 2024
DISABLED test_graph_optims_RMSprop_cuda_float32 (__main__.TestCudaOptimsCUDA)
#136638 opened Sep 25, 2024
`all_gather_object` fails
#136637 opened Sep 25, 2024
torch.unique does not keep order of occurences even with "sorted=False"
#136633 opened Sep 25, 2024
[torch.export.load] failed while executing `pow_by_natural`
#136628 opened Sep 25, 2024
The MPS Backend sometimes samples outside of distribution domain with `multinomial`
#136623 opened Sep 25, 2024
torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::linalg_inv' to ONNX opset version 18 is not supported
#136622 opened Sep 25, 2024
A `complex` tensor with `linalg.matrix_norm()` returns a `float` tensor is returned even if I set `dtype=torch.complex64`
#136621 opened Sep 25, 2024
DISABLED test_graph_optims_RAdam_cuda_float32 (__main__.TestCudaOptimsCUDA)
#136620 opened Sep 25, 2024
[inductor][cpu]pyhpc_isoneutral_mixing singel thread AMP cpp wrapper performance regression in 2024-09-22 nightly release
#136618 opened Sep 25, 2024
tensor.triu_(1) not working properly with large matrix
#136611 opened Sep 25, 2024
automatic_dynamic_shapes for mark_unbacked
#136605 opened Sep 25, 2024
DISABLED test_graph_optims_NAdam_cuda_float32 (__main__.TestCudaOptimsCUDA)
#136602 opened Sep 25, 2024
[ONNX] Single model export for HF models prompt and token phase
#136592 opened Sep 25, 2024
DistributedSampler shuffle option doesn't work as expected
#136588 opened Sep 24, 2024
"Python Replay Stack is Empty"
#136587 opened Sep 24, 2024
support FakeTensor input for torch.compile
#136586 opened Sep 24, 2024
torch.export support for the latest transformers `DynamicCache` as input
#136582 opened Sep 24, 2024
Bool convolutions (and other integral types)
#136578 opened Sep 24, 2024
DISABLED test_graph_optims_Adamax_cuda_float32 (__main__.TestCudaOptimsCUDA)
#136576 opened Sep 24, 2024
[ONNX] Dynamic shapes: support `torch.sym_not`
#136572 opened Sep 24, 2024
Setting a `complex` tensor to `linalg.vector_norm()` returns a `float` tensor
#136568 opened Sep 24, 2024
[RFC] Offload collectives to NVSwitch when possible
#136567 opened Sep 24, 2024
The doc of `linalg.vector_norm()` should not say `ord` parameter accepts the `str` value `fro` or `nuc`
#136563 opened Sep 24, 2024
The doc of `linalg.vector_norm()` should say there is `x` or `input` parameter
#136560 opened Sep 24, 2024
torch.compile errors when tracing numpy.random.uniform with numpy2
#136559 opened Sep 24, 2024
The doc of `linalg.norm()` should say there is `input` parameter instead of `A` parameter for `linalg.norm()`
#136555 opened Sep 24, 2024

310 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Add `truediv` support in export serializer
#136364 commented on Sep 27, 2024 • 13 new comments
Implements user buffer registration using MemPool
#133603 commented on Sep 27, 2024 • 10 new comments
[scan] support jit inductor
#135603 commented on Sep 27, 2024 • 10 new comments
Fix AOTI CPP GEMM Template issue without freezing
#136421 commented on Sep 27, 2024 • 10 new comments
Enable Windows Arm64
#133088 commented on Sep 27, 2024 • 9 new comments
[inductor] Add a OperatorBench to benchmark custom operations
#136169 commented on Sep 27, 2024 • 8 new comments
[inductor] Support freezing with FX graph caching
#136505 commented on Sep 27, 2024 • 8 new comments
Add SVE implementation of embedding_lookup_idx
#133995 commented on Sep 27, 2024 • 7 new comments
[dynamo] add torch._dynamo.enable and fix compile/enable/disable interaction
#132926 commented on Sep 27, 2024 • 7 new comments
Create copies for all replica `dict` attributes (namely hook-tracking dicts) in `_replicate_for_data_parallel`
#128272 commented on Sep 26, 2024 • 7 new comments
raw_alloc ignores PYTORCH_NO_CUDA_MEMORY_CACHING
#131114 commented on Sep 27, 2024 • 6 new comments
[sparse][semi-structured] Add float8 dtype support to 24 sparsity
#136397 commented on Sep 27, 2024 • 6 new comments
Make IPC features extendable on third-party devices
#133222 commented on Sep 27, 2024 • 6 new comments
Enable failing diffs on regression
#136551 commented on Sep 27, 2024 • 6 new comments
Add lowering for aten.searchsorted
#135701 commented on Sep 27, 2024 • 6 new comments
[Inductor] Enable Cpp wraper for Intel GPU.
#135318 commented on Sep 27, 2024 • 6 new comments
Ensure noncontiguous tensor creation tests offsetting
#136396 commented on Sep 27, 2024 • 5 new comments
Fix AOT Graph capture not propagating non_blocking copy parameter to …
#136513 commented on Sep 27, 2024 • 5 new comments
Fix tensor subclass + dynamic shapes in torch.compile + aot autograd
#125941 commented on Sep 27, 2024 • 5 new comments
Fix PT2 Source Code Annotations
#136460 commented on Sep 27, 2024 • 5 new comments
[aotd] Subclasses profile logging
#136478 commented on Sep 27, 2024 • 5 new comments
Introduce torch.sym_sum
#136429 commented on Sep 26, 2024 • 4 new comments
Allow async ops for all gather with gather dim != 0
#136428 commented on Sep 27, 2024 • 4 new comments
[ROCm] fastSpecializedAtomicAdd for MI300
#135770 commented on Sep 27, 2024 • 4 new comments
Simplify find_localzeros
#133325 commented on Sep 27, 2024 • 4 new comments
Lowerings: remove restriction on TensorBox keyword arguments
#136055 commented on Sep 27, 2024 • 4 new comments
Add support for `@contextmanager` in Dynamo
#136033 commented on Sep 27, 2024 • 3 new comments
Add UTs for accelerator device-agnostic runtime APIs
#133572 commented on Sep 25, 2024 • 3 new comments
Introduce a device-agnostic runtime API design
#132204 commented on Sep 25, 2024 • 3 new comments
Enable XPUEvent elapsed_time function
#134666 commented on Sep 27, 2024 • 3 new comments
Improve decomposition for constant_pad_nd
#123661 commented on Sep 25, 2024 • 3 new comments
[inductor] refine loop split logic
#128812 commented on Sep 26, 2024 • 3 new comments
Add CI for Triton CPU backend
#135342 commented on Sep 27, 2024 • 2 new comments
[ROCm] Tunableop record untuned
#128813 commented on Sep 27, 2024 • 2 new comments
[Inductor] Pick ISA for inductor based on ATEN_CPU_CAPABILITY
#123514 commented on Sep 27, 2024 • 2 new comments
Remove unused Python variables outside torch/ and test/
#136359 commented on Sep 25, 2024 • 2 new comments
[ARM][feat]: Add KleidiAI Backend & enable 4 bit matmul operators
#134124 commented on Sep 27, 2024 • 2 new comments
[RELEASE ONLY CHANGES] Revert XNNPACK Update
#136522 commented on Sep 27, 2024 • 2 new comments
Add deterministic path for CUDA `cumsum`
#136224 commented on Sep 27, 2024 • 2 new comments
Enable additional tests for MPS CI runs
#134356 commented on Sep 26, 2024 • 2 new comments
[1/N] Fix clang-tidy warnings in torch/csrc/api/
#134545 commented on Sep 27, 2024 • 2 new comments
multiprocessing.spawn: allow a grace period when shutdown
#131278 commented on Sep 27, 2024 • 1 new comment
[wip][compiled autograd] Lifted C++ lambdas
#135402 commented on Sep 27, 2024 • 1 new comment
Error message for allow_in_graph decorator and arbitrary function combo
#135972 commented on Sep 27, 2024 • 1 new comment
fix sampler - force cpu device for .tolist tensors
#135990 commented on Sep 25, 2024 • 1 new comment
Extend vectorization with SVE(ARM) with Torch Compile (Inductor)
#134672 commented on Sep 25, 2024 • 1 new comment
Enable -Werror on s390x
#136527 commented on Sep 27, 2024 • 1 new comment
Fix adaptive_max_pool2d fallback
#136367 commented on Sep 24, 2024 • 1 new comment
Add doc for device-agnostic runtime APIs
#133323 commented on Sep 25, 2024 • 1 new comment
[WIP] add support for bias grads in flexattention inductor
#136077 commented on Sep 27, 2024 • 1 new comment
Add Support for Tracking Parameter Names (named_parameters) in Optimizer State Dict
#134107 commented on Sep 25, 2024 • 1 new comment
fix sequence number for group
#134578 commented on Sep 25, 2024 • 1 new comment
[ONNX] Remove deprecated OperatorExportTypes and ExportTypes
#136277 commented on Sep 27, 2024 • 1 new comment
[scan] support closure
#135602 commented on Sep 27, 2024 • 1 new comment
Properly uses ref-counting for torch.cuda.use_mem_pool
#133600 commented on Sep 27, 2024 • 0 new comments
Adds snapshot API for MemPools to get pool memory segments
#133601 commented on Sep 27, 2024 • 0 new comments
Refactors empty_cache to return only MemPool memory to the system
#133602 commented on Sep 27, 2024 • 0 new comments
Add decomposition for squeeze_copy
#130941 commented on Sep 26, 2024 • 0 new comments
Reuse UT for Intel GPU backend [Part1]
#127602 commented on Sep 27, 2024 • 0 new comments
Enable Bert with Semi Structure Sparsity on ROCm
#133934 commented on Sep 27, 2024 • 0 new comments
test_execution_trace.py: Use instantiate_device_type_tests to run GPU tests on HPU as well
#133975 commented on Sep 27, 2024 • 0 new comments
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on Sep 25, 2024 • 0 new comments
[inductor] enable bf32 test for mkldnn conv
#127293 commented on Sep 25, 2024 • 0 new comments
Fix lru_cache where config is used
#134235 commented on Sep 25, 2024 • 0 new comments
INT8 SDPA API
#134317 commented on Sep 26, 2024 • 0 new comments
Fix unbind_copy and add its decomposition
#134319 commented on Sep 26, 2024 • 0 new comments
WIP - Prologue Fusion
#134532 commented on Sep 27, 2024 • 0 new comments
Disable AMP when propagating fake tensors
#134583 commented on Sep 27, 2024 • 0 new comments
Remove unused Python variables in test/
#134665 commented on Sep 25, 2024 • 0 new comments
DISABLED test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_Adam_cuda_float32 (__main__.TestCudaOptimsCUDA)
#135721 commented on Sep 26, 2024 • 0 new comments
Fix constant propagation in builtins and UserClasses
#131354 commented on Sep 26, 2024 • 0 new comments
[triton ci] Allow building with triton hash
#131371 commented on Sep 27, 2024 • 0 new comments
[BE] typing for decorators - _jit_internal
#131573 commented on Sep 26, 2024 • 0 new comments
torch.fx.Tracer.record_stack_traces fix
#131741 commented on Sep 27, 2024 • 0 new comments
Generalization of distributed UT content to enable non cuda device execution
#131758 commented on Sep 27, 2024 • 0 new comments
Danielmic decouple some api
#131882 commented on Sep 27, 2024 • 0 new comments
Add support for more dtypes for serialization
#131939 commented on Sep 27, 2024 • 0 new comments
Tentative fix for fake tensor SymInt
#131943 commented on Sep 25, 2024 • 0 new comments
Fix bmm_sparse_cuda illegal memory access
#131977 commented on Sep 27, 2024 • 0 new comments
Delete cmake/Modules_CUDA_fix directory
#132035 commented on Sep 27, 2024 • 0 new comments
Add Weighted Loss Functions to PyTorch : WMSE, WMAE, and Weighted Huber Loss
#132049 commented on Sep 26, 2024 • 0 new comments
Refactor serialization IO infrastructure and support HDFS/HTTP
#130913 commented on Sep 27, 2024 • 0 new comments
use spawn as default start method to create dataloader subprocess
#132210 commented on Sep 25, 2024 • 0 new comments
xpu: support sycl with torch.utils.cpp_extension APIs
#132945 commented on Sep 24, 2024 • 0 new comments
S390x update builder image
#132983 commented on Sep 27, 2024 • 0 new comments
[Intel GPU] qconv at XPU backend
#133080 commented on Sep 26, 2024 • 0 new comments
support zb1p and zb2p algorithms
#130752 commented on Sep 27, 2024 • 0 new comments
Inductor annotations
#130429 commented on Sep 26, 2024 • 0 new comments
[inductor][cpp] Add BMM kernel template for autotuning
#129772 commented on Sep 27, 2024 • 0 new comments
[Intel GPU] qlinear at XPU backend
#133307 commented on Sep 26, 2024 • 0 new comments
Add Triton CPU as an Inductor backend
#133408 commented on Sep 27, 2024 • 0 new comments
[c10d]add coalesce support for device types other than cuda
#133429 commented on Sep 25, 2024 • 0 new comments
Remove unused variables in torch/
#133492 commented on Sep 26, 2024 • 0 new comments
Removed std namespace from log function calls
#133565 commented on Sep 25, 2024 • 0 new comments
Make dot and vdot structured ops (#64)
#134671 commented on Sep 26, 2024 • 0 new comments
Replace vmap custom ctx manager by one annotated with `@contextmanager`
#136053 commented on Sep 27, 2024 • 0 new comments
Unify cpp_extension build directory removal
#136059 commented on Sep 27, 2024 • 0 new comments
[WIP][Inductor UT] Generalize newly introduced inductor UTs for intel GPU (Part 1)
#136069 commented on Sep 27, 2024 • 0 new comments
[prototype] Invoke subgraph higher order op
#136171 commented on Sep 27, 2024 • 0 new comments
[Dynamo][autograd.Function] Use fake tensor prop to infer fwd output
#136184 commented on Sep 27, 2024 • 0 new comments
[inductor] Pass `device_type` argument to `do_bench`
#136189 commented on Sep 25, 2024 • 0 new comments
Add determinmistic kernel for reflection2d
#136241 commented on Sep 24, 2024 • 0 new comments
Remove _preserve_ops from export
#136247 commented on Sep 27, 2024 • 0 new comments
Set output num_float_feature to have dynamic dimension
#136268 commented on Sep 27, 2024 • 0 new comments
[QAT] Make Fused modules torchscriptable
#136285 commented on Sep 27, 2024 • 0 new comments
Introduce _ArglessActivation base class for parameterless activation functions
#136296 commented on Sep 26, 2024 • 0 new comments
Fix parameter names in docstrings
#136297 commented on Sep 24, 2024 • 0 new comments
Add int1 to int7 dtypes
#136301 commented on Sep 27, 2024 • 0 new comments
Pass rounding_mode for div reference inputs through kwargs
#136308 commented on Sep 25, 2024 • 0 new comments
[dynamo] Replace __str__ with __repr__ in some places
#136316 commented on Sep 24, 2024 • 0 new comments
SuperResolution Adaround experiment
#136328 commented on Sep 26, 2024 • 0 new comments
[PyTorch] Port ExecuTorch bfdot improvement back to ATen BlasKernel
#136331 commented on Sep 25, 2024 • 0 new comments
Add a new distributed backend (XCCL) for Intel GPUs
#136343 commented on Sep 26, 2024 • 0 new comments
[test] add types to composite_compliance.py
#136385 commented on Sep 26, 2024 • 0 new comments
Backward pass ac
#136431 commented on Sep 24, 2024 • 0 new comments
Increase update_hint_regression problem size to 1000
#136434 commented on Sep 24, 2024 • 0 new comments
Fix to() method on sparse tensors.
#136435 commented on Sep 26, 2024 • 0 new comments
init
#136475 commented on Sep 27, 2024 • 0 new comments
[WIP] Add py3.13t wheel
#136490 commented on Sep 27, 2024 • 0 new comments
Limit the option value of TORCH_SHOW_DISPATCH_TRACE
#136510 commented on Sep 27, 2024 • 0 new comments
Make Context to be Device-agnostic Step by Step (1/N)
#136519 commented on Sep 27, 2024 • 0 new comments
Make Context to be Device-agnostic Step by Step (2/N)
#136526 commented on Sep 27, 2024 • 0 new comments
[AOTI] Refactor call chain of generate_kernel_call
#136531 commented on Sep 24, 2024 • 0 new comments
[AOTI] Turn on the ABI-compatible mode as default
#136534 commented on Sep 27, 2024 • 0 new comments
[mha] Disable native_mha(fast_path) in dynamo compilation
#136542 commented on Sep 24, 2024 • 0 new comments
Add missing mappings to support torch.uint16 in quantization and export
#136547 commented on Sep 27, 2024 • 0 new comments
[not for commit] Benchmark Triton CPU backend
#134725 commented on Sep 26, 2024 • 0 new comments
xpu: support SyclExtension class APIs
#134735 commented on Sep 24, 2024 • 0 new comments
Make device-specific event inherits from torch.Event
#134845 commented on Sep 27, 2024 • 0 new comments
Use torch.Stream&torch.Event for Dynamo capature
#134850 commented on Sep 27, 2024 • 0 new comments
Improvements for associative_scan - lifted_args for combine_mode='generic'
#134921 commented on Sep 26, 2024 • 0 new comments
update CMAKE_PREFIX_PATH setting command
#134934 commented on Sep 26, 2024 • 0 new comments
[c10d] fix sequence numbers for coalesced operations
#135132 commented on Sep 25, 2024 • 0 new comments
[Inductor][Precompile cache] Lookup cache before calling precompile inside the precompiling future
#135166 commented on Sep 25, 2024 • 0 new comments
[Intel GPU] qconv_pointwise.binary XPU support
#135189 commented on Sep 26, 2024 • 0 new comments
[merge rules] Add ONNX team to docs/source/conf.py
#135228 commented on Sep 27, 2024 • 0 new comments
Tests Generelization for multiple accelerator devices
#135242 commented on Sep 27, 2024 • 0 new comments
[executorch hash update] update the pinned executorch hash
#135287 commented on Sep 27, 2024 • 0 new comments
[Inductor] Rename test_cuda_cpp_wrapper.py to test_gpu_cpp_wrapper.py,
#135320 commented on Sep 27, 2024 • 0 new comments
[Intel GPU] qlinear_pointwise.binary[_tensor] XPU support
#135337 commented on Sep 26, 2024 • 0 new comments
add supports_coalescing property in c10d::Backend to determine whether backend supports coalescing
#135338 commented on Sep 26, 2024 • 0 new comments
Torchbench nightly MPS runs
#135386 commented on Sep 27, 2024 • 0 new comments
Don't uselessly recompute axiom dict every static eval call
#135429 commented on Sep 27, 2024 • 0 new comments
[Intel GPU] qconv.pointwise with mixed dtype XPU support
#135465 commented on Sep 26, 2024 • 0 new comments
Add BFloat16 support for BRGEMM flash attention forward kernel
#135473 commented on Sep 27, 2024 • 0 new comments
Download pre-compiled AOTriton from GitHub unless AOTRITON_INSTALL_FROM_SOURCE=1 is set
#135560 commented on Sep 24, 2024 • 0 new comments
Fix tensor.data_ptr() representation overflow
#135567 commented on Sep 25, 2024 • 0 new comments
[scan] fix typo in signature and remove wrapper
#135600 commented on Sep 27, 2024 • 0 new comments
[scan] flatten subgraph output and make subgraph inputs to be a slice
#135601 commented on Sep 27, 2024 • 0 new comments
[ROCm][AOTI] add CK backend
#135641 commented on Sep 25, 2024 • 0 new comments
Use a custom Symbol class for performance
#135651 commented on Sep 25, 2024 • 0 new comments
[compiled autograd] log placeholder origin in verbose
#135663 commented on Sep 26, 2024 • 0 new comments
[SDPA] Bump `grad_query` fudge factor for Flash Attention
#135711 commented on Sep 27, 2024 • 0 new comments
Migrate to training ir in quantization_pt2e_qat unittests
#135769 commented on Sep 27, 2024 • 0 new comments
[ROCm] Update to AOTriton 0.7b (Cherry-picked)
#135869 commented on Sep 27, 2024 • 0 new comments
[FlexAttention] Remove restriction on QK headdim > V headdim
#135884 commented on Sep 26, 2024 • 0 new comments
[aoti] Add warning to ask users to switch to new API
#135893 commented on Sep 27, 2024 • 0 new comments
Failure of iOS Build Test: Build (default, 1, 1, macos-14-xlarge, SIMULATOR, arm64)
#136284 commented on Sep 25, 2024 • 0 new comments
DISABLED test_closure_recompiles (__main__.MiscTests)
#135687 commented on Sep 25, 2024 • 0 new comments
Get the error: AttributeError: Can't pickle local object 'convert_frame.<locals>._convert_frame'
#93470 commented on Sep 25, 2024 • 0 new comments
dataclasses.replace not supported by dynamo
#136481 commented on Sep 25, 2024 • 0 new comments
Inductor handling of large (13K+) nodes graph resulted in nccl timeout (10mins)
#136447 commented on Sep 25, 2024 • 0 new comments
Allow Inductor to Compose with FakeTensorMode to Estimate Memory Usage
#136446 commented on Sep 25, 2024 • 0 new comments
Make it possible to run pr_time_benchmarks without explicitly specifying PYTHONPATH
#136430 commented on Sep 25, 2024 • 0 new comments
[Flex attention] RuntimeError with vmap when using torch.compile in create_mask
#136427 commented on Sep 25, 2024 • 0 new comments
[Tracker] Move nested tensors to beta
#112398 commented on Sep 25, 2024 • 0 new comments
torch.compiled custom Triton kernels can output incorrect results
#136550 commented on Sep 25, 2024 • 0 new comments
nn.CosineSimilarity returns value larger than 1
#78064 commented on Sep 25, 2024 • 0 new comments
Bug Report: Distributed Process Group Hangs with NCCL and GLOO Backends
#132003 commented on Sep 25, 2024 • 0 new comments
Don't create caffe2::pthreadpool() with getDefaultNumThreads()-many threads in set_num_threads(1)
#134714 commented on Sep 25, 2024 • 0 new comments
Issues compiling `torch` with `mkl`
#133823 commented on Sep 25, 2024 • 0 new comments
Noisy warning - torch.fx.experimental.symbolic_shapes: [WARNING] Ignored guard (...), this could result in accuracy problems
#101265 commented on Sep 25, 2024 • 0 new comments
Any plans for a "torch.minmax" (min-max normalization) function?
#128785 commented on Sep 25, 2024 • 0 new comments
[torch.export] `torch._export.serde.serialize.SerializeError: Serializing <built-in function truediv> is not supported`
#136113 commented on Sep 25, 2024 • 0 new comments
RuntimeError: NVML_SUCCESS == DriverAPI::get()->nvmlInit_v2_() INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":813, please report a bug to PyTorch.
#130486 commented on Sep 25, 2024 • 0 new comments
dynamo (re)compilation issues: shape (1,1), nn.Parameter, mark_dynamic
#135011 commented on Sep 25, 2024 • 0 new comments
Microsoft Visual C++ Redistributable is not installed, this may lead to the DLL load failure.
#126507 commented on Sep 26, 2024 • 0 new comments
RuntimeError: Unrecognized CachingAllocator option: expandable_segments
#123505 commented on Sep 26, 2024 • 0 new comments
Confusing error message for DataLoader with num_workers=0 and non-zero timeout
#106634 commented on Sep 26, 2024 • 0 new comments
`to()` Method Does Not Move Internal Components of Sparse Tensors
#136258 commented on Sep 26, 2024 • 0 new comments
DISABLED test_autograd_function_backed_op (__main__.TestCustomOp)
#132115 commented on Sep 26, 2024 • 0 new comments
[dynamo] Format string with __class__
#118675 commented on Sep 26, 2024 • 0 new comments
[MPS] MPSNDArray error: product of dimension sizes > 2**32
#134177 commented on Sep 26, 2024 • 0 new comments
[compiled autograd][cudagraphs] TLS is gc'd between unit test cudagraph runs
#126934 commented on Sep 26, 2024 • 0 new comments
☂️ 150+ MacOS tests were marked flaky recently
#135885 commented on Sep 26, 2024 • 0 new comments
[MPS] Inconsistent performance issues
#136003 commented on Sep 26, 2024 • 0 new comments
Real tensor prop for bool cast on x.eq().any() call fails export
#135630 commented on Sep 26, 2024 • 0 new comments
[dynamo] enable TorchDispatchMode for eager part when graph breaks
#136495 commented on Sep 26, 2024 • 0 new comments
DISABLED test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_AdamW_cuda_float32 (__main__.TestCudaOptimsCUDA)
#135692 commented on Sep 26, 2024 • 0 new comments
Significant Accuracy Difference between Compiled and Eager Flex Attention
#135161 commented on Sep 25, 2024 • 0 new comments
Improve inductor codegen for writing out tensor and tensor.t() in the same kernel
#133242 commented on Sep 25, 2024 • 0 new comments
linux-aarch64 CI tests are being timed out resulting in test failures
#136192 commented on Sep 25, 2024 • 0 new comments
Improve the Inductor generated kernel for the pattern of `output1 = pointwise(intput); output2 = transpose(output1)`
#130015 commented on Sep 25, 2024 • 0 new comments
Can't run Flex-Attention on CPU - NoValidChoicesError during autotuneSelectAlgorithm
#136525 commented on Sep 25, 2024 • 0 new comments
AOTDispatcher debug mode
#136272 commented on Sep 25, 2024 • 0 new comments
DISABLED test_pattern_matcher_multi_user_cpu (__main__.CpuTests)
#135296 commented on Sep 24, 2024 • 0 new comments
Wrong results when sampling from the beta distribution for small alpha=beta
#136532 commented on Sep 24, 2024 • 0 new comments
dynamo creates unnecessary buffers
#124653 commented on Sep 24, 2024 • 0 new comments
[triton x pt2] Dynamo should trace data_ptr accesses
#136271 commented on Sep 24, 2024 • 0 new comments
DISABLED test_angle_cpu (__main__.CpuTritonTests)
#136124 commented on Sep 24, 2024 • 0 new comments
[export] Export CIA preservation doesn't work well with prim decomposition as custom decomp.
#136050 commented on Sep 24, 2024 • 0 new comments
Use Incremental Fake Tensor Updater more uniformly across torch.compile compilation
#120116 commented on Sep 24, 2024 • 0 new comments
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
#104259 commented on Sep 24, 2024 • 0 new comments
The difference between input grad computed by channels last backward and the input grad computed by channels first backward of Hardswish on MPS is too large
#107214 commented on Sep 24, 2024 • 0 new comments
DISABLED test_roi_align_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#103156 commented on Sep 24, 2024 • 0 new comments
profiler for PT2 can give wrong compilation frame ID
#136235 commented on Sep 24, 2024 • 0 new comments
Allow partial fallback when frame recompiles failed or exceed the cache size limit
#135458 commented on Sep 24, 2024 • 0 new comments
torch.compile 100x slower than eager mode for torch.cumprod backward pass
#136263 commented on Sep 24, 2024 • 0 new comments
[ONNX] Handle autocast HOP
#136545 commented on Sep 24, 2024 • 0 new comments
Dynamo inlining should compile partial subgraphs / improve graph break with recursive calls
#111003 commented on Sep 24, 2024 • 0 new comments
torch.library.custom_op with mutated inputs can have silent incorrectness under torch.compile
#130487 commented on Sep 24, 2024 • 0 new comments
capture_triton gets skipped by dynamo
#136056 commented on Sep 24, 2024 • 0 new comments
Symmetric memory's rendezvous throws an error
#136494 commented on Sep 24, 2024 • 0 new comments
[Inductor][FP8] AOTInductor appears to ignore a transpose in `test_fp8_view_of_param_non_abi_compatible_cuda`
#136209 commented on Sep 24, 2024 • 0 new comments
Investigate torch.compile Windows support.
#122094 commented on Sep 25, 2024 • 0 new comments
Errors with torch.compile after upgrading to 2.4.0
#133571 commented on Sep 25, 2024 • 0 new comments
distributed.scatter memory leak in source rank
#104174 commented on Sep 25, 2024 • 0 new comments
DISABLED test_gelu_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135222 commented on Sep 25, 2024 • 0 new comments
forward AD implementation: _scaled_dot_product_efficient_attention
#98164 commented on Sep 25, 2024 • 0 new comments
Stochastic rounding in bfloat16
#120376 commented on Sep 25, 2024 • 0 new comments
DISABLED test_metadata_consistency_check (__main__.DTensorMeshTest)
#131598 commented on Sep 25, 2024 • 0 new comments
[PTD BE DAY]Burn Down Distributed Disabled Tests!!
#132845 commented on Sep 25, 2024 • 0 new comments
DISABLED test_vdd_clamp_cpu (__main__.CpuTests)
#135328 commented on Sep 25, 2024 • 0 new comments
torch._dynamo.exc.Unsupported: Unexpected type in sourceless builder torch.Tensor when running Mamba models in vLLM
#136497 commented on Sep 25, 2024 • 0 new comments
torch._dynamo.exc.Unsupported: 'immutable_list' object does not support mutation when running MiniCPM-Llama model in vLLM
#136499 commented on Sep 25, 2024 • 0 new comments
torch._dynamo.exc.Unsupported: ObservedKeyError exception running Gguf llama model in vLLM
#136502 commented on Sep 25, 2024 • 0 new comments
torch.compile error
#113537 commented on Sep 25, 2024 • 0 new comments
Add Test for Inductor and Dynamo Config BC Breakages
#133040 commented on Sep 25, 2024 • 0 new comments
Make TRITON_INTERPRET=1 work with inductor generated kernels
#123956 commented on Sep 24, 2024 • 0 new comments
input buffer sometimes incorrectly marked as inplace when it's not in Inductor
#120217 commented on Sep 24, 2024 • 0 new comments
Dynamo should prune non-live captured variables
#127350 commented on Sep 24, 2024 • 0 new comments
[feature request] Varlen indexing function for lookup and concat of varlen BPE tokens from a tensor vocab (i.e. `detokenize(...)` and arrays of strings)
#135704 commented on Sep 24, 2024 • 0 new comments
DISABLED test_graph_optims_Adam_cuda_float32 (__main__.TestCudaOptimsCUDA)
#136537 commented on Sep 24, 2024 • 0 new comments
[inductor] online softmax
#127011 commented on Sep 27, 2024 • 0 new comments
CUDA deps cannot be preloaded under Bazel
#117350 commented on Sep 27, 2024 • 0 new comments
Dynamo is not thread safe
#118260 commented on Sep 27, 2024 • 0 new comments
Periodic ROCM distributed jobs are broken
#91630 commented on Sep 27, 2024 • 0 new comments
Potential memory leak in Adam optimizer in AMD chips (CPU)
#75949 commented on Sep 27, 2024 • 0 new comments
[channels_last] Segmentation fault with aten.convolution
#136348 commented on Sep 27, 2024 • 0 new comments
[CTA] Let's Stamp Out Flaky Tests!
#74590 commented on Sep 27, 2024 • 0 new comments
DISABLED test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_SGD_cuda_float32 (__main__.TestCudaOptimsCUDA)
#135925 commented on Sep 27, 2024 • 0 new comments
GPU Vendor-Agnosticism via Vulkan
#47996 commented on Sep 27, 2024 • 0 new comments
Pytorch + ROCm+ Windows
#106529 commented on Sep 27, 2024 • 0 new comments
[FSDP2 Related]`torch.split_with_sizes_copy` of the GPU does not update the version counter of `out` correctly.
#132014 commented on Sep 27, 2024 • 0 new comments
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on Sep 27, 2024 • 0 new comments
Performance regression in torch.compile
#136254 commented on Sep 27, 2024 • 0 new comments
DISABLED test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_Adam_cuda_float32 (__main__.TestCudaOptimsCUDA)
#135881 commented on Sep 27, 2024 • 0 new comments
DISABLED test_scatter_fallback_abi_compatible_cuda (__main__.AOTInductorTestABICompatibleCuda)
#131627 commented on Sep 27, 2024 • 0 new comments
Build failure with Xcode 15 linker
#111086 commented on Sep 27, 2024 • 0 new comments
Function Request: np.interp
#50334 commented on Sep 27, 2024 • 0 new comments
[mark_dynamic] Assertion errors when marking tensor with outer and inner functions
#135568 commented on Sep 27, 2024 • 0 new comments
DISABLED test_grad_scaler_with_preset_grad_scale_in_place_unscale_True_AdamW_cuda_float32 (__main__.TestCudaOptimsCUDA)
#135829 commented on Sep 27, 2024 • 0 new comments
DISABLED test_device_mode_ops_sparse_mm_reduce_cpu_bfloat16 (__main__.TestDeviceUtilsCPU)
#132494 commented on Sep 27, 2024 • 0 new comments
Fix max_width computation in _tensor_str._Formatter
#126859 commented on Sep 26, 2024 • 0 new comments
[Storage_ipc] Option II: Provides IPC extensions for 3rd devices.
#126373 commented on Sep 27, 2024 • 0 new comments
[WIP] Warn on future divergent behavior for conditional views
#126129 commented on Sep 25, 2024 • 0 new comments
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on Sep 25, 2024 • 0 new comments
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on Sep 25, 2024 • 0 new comments
refine fp32 precision api
#125888 commented on Sep 25, 2024 • 0 new comments
[vision hash update] update the pinned vision hash
#125806 commented on Sep 27, 2024 • 0 new comments
[Storage_ipc] Provides IPC extensions for 3rd devices.
#125122 commented on Sep 27, 2024 • 0 new comments
[ROCm] hipSPARSELt Integration
#124320 commented on Sep 27, 2024 • 0 new comments
Switch batch norm stack to consolidated ops
#119496 commented on Sep 24, 2024 • 0 new comments
Automated submodule update: FBGEMM
#115316 commented on Sep 27, 2024 • 0 new comments
Automated submodule update: kineto
#106149 commented on Sep 27, 2024 • 0 new comments
[MPS] Possible persistent infinite loop in `nn.ReplicationPad1d`
#135442 commented on Sep 27, 2024 • 0 new comments
[ONNX] Cannot view a tensor with shape torch.Size([1, 512, 32, 128]) and strides (2097152, 128, 65536, 1) as a tensor with shape (1, 512, 4096)
#136543 commented on Sep 27, 2024 • 0 new comments
[MPS] BatchNorm2D produces incorrect results for column first tensors
#134580 commented on Sep 27, 2024 • 0 new comments
[MPS] Incorrect result from batch norm with sliced inputs
#133520 commented on Sep 27, 2024 • 0 new comments
aot_export is not currently supported with traceable tensor subclass- error comes when distributed tensor is an input to aot_export_joint_simple
#136289 commented on Sep 27, 2024 • 0 new comments
Python 3.13 support for PyTorch
#130249 commented on Sep 27, 2024 • 0 new comments
[RFC] Default torch.compile backend customization
#136118 commented on Sep 27, 2024 • 0 new comments
[ONNX] Exporter improvement tasks
#129274 commented on Sep 27, 2024 • 0 new comments
[ONNX] rfftn/irfftn produces incorrect shapes
#125903 commented on Sep 27, 2024 • 0 new comments
Stack trace is symbolized when no exception is thrown
#133979 commented on Sep 27, 2024 • 0 new comments
DISABLED test_input_mutation2_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135295 commented on Sep 26, 2024 • 0 new comments
`torch.compile` cannot be used in official Docker runtime images
#116696 commented on Sep 26, 2024 • 0 new comments
[RFC] Integrate NCCL scalable init API
#136539 commented on Sep 26, 2024 • 0 new comments
Cannot Convert Pytorch model with fft_rfftn layers to ONNX using latest torch.onnx.dynamo_export
#133785 commented on Sep 26, 2024 • 0 new comments
torch._dynamo.exc.InternalTorchDynamoError when tracing through torch.ops.prim.NumToTensor
#136448 commented on Sep 26, 2024 • 0 new comments
Inductor pattern doesn't match on dynamic tensor marked with torch.dynamo.mark_dynamic
#136329 commented on Sep 26, 2024 • 0 new comments
Iterating dataloader fails on sliced dataset
#131883 commented on Sep 26, 2024 • 0 new comments
On AMD GPUs (ROCm 5.7-6.2), cannot backpropagate loss tensor containing more than `2e8` elements
#136291 commented on Sep 26, 2024 • 0 new comments
Remove cusparselt deprecated API usage
#136553 commented on Sep 26, 2024 • 0 new comments
Wonder why _MultiProcessingDataLoaderIter.__next__ too slow?
#132492 commented on Sep 26, 2024 • 0 new comments
[torch.export] Detect internal constraints
#136216 commented on Sep 26, 2024 • 0 new comments
DISABLED test_aot_export_cond_simple_cuda_float32 (__main__.TestHOPCUDA)
#123096 commented on Sep 26, 2024 • 0 new comments
DISABLED test_grad_scaler_with_preset_grad_scale_in_place_unscale_False_SGD_cuda_float32 (__main__.TestCudaOptimsCUDA)
#135783 commented on Sep 26, 2024 • 0 new comments
torch.compile doesn't generate any graphs when modules are patched with new parameters class
#136257 commented on Sep 26, 2024 • 0 new comments
missing thrust/complex.h PyTorch_rocm on gfx1032
#136442 commented on Sep 26, 2024 • 0 new comments
[inductor] Graph breaks in CohereForAI/aya-23-8b
#128095 commented on Sep 26, 2024 • 0 new comments
ReduceLROnPlateau will throw IndexError: list index out of range with modified optimizer's param_groups.
#104361 commented on Sep 27, 2024 • 0 new comments
`import torch` fails on stock ubuntu if one uses XPU nightly binaries
#135867 commented on Sep 27, 2024 • 0 new comments
DISABLED test_scaled_dot_product_fused_attention_overrideable_backward (__main__.TestSDPAPrivateUse1Only)
#134602 commented on Sep 27, 2024 • 0 new comments
DISABLED test_fused_sdp_choice_privateuseone (__main__.TestSDPAPrivateUse1Only)
#134600 commented on Sep 27, 2024 • 0 new comments
[inductor][cpu]GPT2ForSequenceClassification AMP static/dynamic shape default/cpp wrapper single thread accuracy crash
#123503 commented on Sep 27, 2024 • 0 new comments
Support `divmod` for tensors
#90820 commented on Sep 27, 2024 • 0 new comments
DISABLED test_autograd_cpp_node_data_dependent (__main__.TestCompiledAutograd)
#125579 commented on Sep 27, 2024 • 0 new comments
DISABLED test_pointwise_bessel_y1_cuda (__main__.GPUTests)
#127756 commented on Sep 27, 2024 • 0 new comments
xpu: huggingface levit test_retain_grad_hidden_states_attentions test hangs on exit on PVC
#136007 commented on Sep 27, 2024 • 0 new comments
torch.nn.InstanceNorm2d and torch.nn.InstanceNorm3d returns nan with tensors of float16 dtype on cpu
#135542 commented on Sep 27, 2024 • 0 new comments
torch.onnx.export with dynamic axes fails for torch.nn.InstanceNorm1d with track_running_stats=True
#128501 commented on Sep 27, 2024 • 0 new comments
InternalTorchDynamoError on converting llama-2 to onnx using torch.onnx.dynamo_export
#128480 commented on Sep 27, 2024 • 0 new comments
[ONNX] view(dtype=dtype) is not supported by both onnx.export and onnx.dynamo_export
#126921 commented on Sep 27, 2024 • 0 new comments
[v.2.5.0] Release Tracker
#135522 commented on Sep 27, 2024 • 0 new comments
ONNX Export Fails with Dynamic Slicing on Data-Dependent Value
#136083 commented on Sep 27, 2024 • 0 new comments
autograd.Function x Dynamo tracing incorrectly returns Tensors that don't require grad
#129963 commented on Sep 27, 2024 • 0 new comments
PyTorch 2.5.0 exposes statically linked `libstdc++` CXX11 ABI symbols.
#133437 commented on Sep 27, 2024 • 0 new comments
The call to ncclCommSplit does not have the correct config parameters set
#129862 commented on Sep 27, 2024 • 0 new comments
MPS code contains references to undocumented APIs
#135637 commented on Sep 27, 2024 • 0 new comments