Pulse · pytorch/pytorch · GitHub

February 27, 2025 – March 6, 2025

Overview

202 Active pull requests

319 Active issues

1 Pull request merged by 1 person

Add a docstring to build.sh
#144566 merged Mar 5, 2025

201 Pull requests opened by 114 people

Implement needs_exact_strides for mutable custom operators
#148091 opened Feb 27, 2025
Rename node.meta["arg_kwarg_vals"] to node.meta["eager_input_vals"]
#148092 opened Feb 27, 2025
[ROCm][Windows] Fix OpenMP Flags for clang-cl
#148097 opened Feb 27, 2025
[Experiment] measure the effect of combining cpp-wrapper and cudagraphs
#148100 opened Feb 27, 2025
Inductor respects exact strides on custom ops by default
#148104 opened Feb 27, 2025
Use myst_nb in docs
#148105 opened Feb 27, 2025
stable torch library draft
#148124 opened Feb 27, 2025
Use correct boxed_forward_device_index when running `CompiledFxGraph.post_compile`
#148130 opened Feb 27, 2025
[WIP] [Inductor] Use real input to autotune user defined triton kernels
#148131 opened Feb 27, 2025
WIP enable aten convolution out in lowerings
#148132 opened Feb 27, 2025
Checks kv pair indexing in OrderedPreservingDictTest.test_range_insert
#148136 opened Feb 27, 2025
Disable cudnn to avoid creating guards that denies exporting
#148140 opened Feb 28, 2025
draft
#148160 opened Feb 28, 2025
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(Z7)
#148163 opened Feb 28, 2025
Improvement with comprehensive docstrings and implementation of class method for the code.
#148170 opened Feb 28, 2025
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(ZI)
#148173 opened Feb 28, 2025
Fix torch.matmul related out dtype check
#148174 opened Feb 28, 2025
Fix addbmm & addmv & baddbmm out dtype check
#148176 opened Feb 28, 2025
[Dynamo] Replace `unimplemented` with `unimplemented_v2` in `torch/_dynamo/variables/base.py`
#148177 opened Feb 28, 2025
[pytree] add another simplified pytree module `torch.pytree`
#148180 opened Feb 28, 2025
[BE][PYFMT] migrate PYFMT for `torch/ao/` to `ruff format`
#148185 opened Feb 28, 2025
[BE][PYFMT] migrate PYFMT for `test/inductor/` to `ruff format`
#148186 opened Feb 28, 2025
Support huggingface reading and writing for multi rank case
#148189 opened Feb 28, 2025
[BE][Ez]: Use itertools.chain.from_iterable when possible
#148190 opened Feb 28, 2025
Enable oneDNN dispatch for gemm bf16bf16->bf16
#148197 opened Feb 28, 2025
Move estimate runtime and pick loop order heuristics into choices.py
#148202 opened Feb 28, 2025
[debug] 'No available kernel' error for cudnn on A100
#148204 opened Feb 28, 2025
[cond] support output the same unbacked symbol from two branches
#148206 opened Feb 28, 2025
[inductor] support dilation in max_pool2d lowering
#148209 opened Feb 28, 2025
[inductor] Lowerings for max_pool3d
#148210 opened Feb 28, 2025
[not for merge] [AOTI] selectively build code at O1
#148212 opened Feb 28, 2025
Expose functions used in custom backend in torch_python dll
#148213 opened Feb 28, 2025
ROCm: Disable torch check for Multiplication of two Float8_e5m2 matrices
#148228 opened Feb 28, 2025
[cutlass backend] Expand addmm test to all 4 broadcastable shape bias
#148234 opened Mar 1, 2025
Make require_contiguous require exact strides instead of stride order
#148235 opened Mar 1, 2025
[cutlass backend] try reenable subproc add mm test
#148236 opened Mar 1, 2025
[ROCm][TunableOp] Add support for rowwise scaling on scaled GEMM.
#148238 opened Mar 1, 2025
handle jk for emulation runs
#148240 opened Mar 1, 2025
[fx] Move map_aggregate to C++
#148243 opened Mar 1, 2025
Set requires grad in TensorMaker::make_tensor()
#148255 opened Mar 1, 2025
[BE]: No include left behind - recursive glob setuptools support
#148258 opened Mar 1, 2025
[fx] Move Node._update_args_kwargs to C++
#148260 opened Mar 1, 2025
[fx] Move Node._prepend/Node._remove_from_list to C++
#148261 opened Mar 1, 2025
Typo Errors fixed in multiple files
#148262 opened Mar 1, 2025
Fix bug when Inductor include path contains spaces
#148271 opened Mar 1, 2025
[torch] Fix unsafe concurrent access to autocast_enabled
#148281 opened Mar 2, 2025
[fx] Optimizations for node name generation
#148288 opened Mar 2, 2025
[fx] Optimize TracerBase.create_arg and Graph._gen_python_code
#148292 opened Mar 3, 2025
Treat CUDA warnings as errors
#148294 opened Mar 3, 2025
`torch.as_strided` negative stride SIGSEGV fix when using `torch.compile`
#148301 opened Mar 3, 2025
Better log message to update pr_time_benchmarks/expected_results.csv
#148303 opened Mar 3, 2025
Reland: [inductor] Simplify grid handling
#148305 opened Mar 3, 2025
do not run `test_ck_blas_library` on cpu
#148316 opened Mar 3, 2025
[ATen][CUDA] Optimize 128 bit vectorization
#148320 opened Mar 3, 2025
Checking for cuda version to see if bf16 is natively supported or emulated
#148322 opened Mar 3, 2025
[Windows][Inductor][XPU] Unload triton pyd files to be able to remove them on Windows.
#148323 opened Mar 3, 2025
Flex unskip
#148327 opened Mar 3, 2025
[pytree] simplify public API exposition with `__module__`
#148328 opened Mar 3, 2025
Update CURL url for manywheel images
#148343 opened Mar 3, 2025
[test] cutlass
#148351 opened Mar 3, 2025
[DO NOT MERGE] Test new ROCm CI Navi31 nodes
#148355 opened Mar 3, 2025
test index_put
#148357 opened Mar 3, 2025
[inductor] Improve type annotations in _inductor/ir.py
#148358 opened Mar 3, 2025
[AOTI][dashboard] Skip torchbench models not supported by export
#148359 opened Mar 3, 2025
Enabling xpu in OffsetBasedRNGTracker .
#148360 opened Mar 3, 2025
[@no-merge] Enable process based async cp + caching
#148373 opened Mar 3, 2025
Documents torch.cuda.MemPool API
#148374 opened Mar 3, 2025
[Utilization] Add utilization monitor for linux build
#148375 opened Mar 3, 2025
[reland][ca] side-effect free initial trace: compiled_args
#148376 opened Mar 3, 2025
Throws error when using torch.cuda.MemPool with expandable segments
#148378 opened Mar 3, 2025
[Optimus][Auto-AC] Support activation quantization
#148380 opened Mar 3, 2025
[ca] remove compiled_autograd_tracing
#148381 opened Mar 3, 2025
[Docs][TunableOp] TunableOp documentation update
#148384 opened Mar 4, 2025
[dynamo] Remove dead code path around `functools.partial` objects
#148386 opened Mar 4, 2025
Add new GHA workflow to cache ROCm CI docker images on MI300 CI runners periodically
#148394 opened Mar 4, 2025
[Docs] update bucketize documentation
#148400 opened Mar 4, 2025
[dynamo] show stack above dynamo in graph break user tracebacks
#148401 opened Mar 4, 2025
Use oneDNN v3.7.1 for Intel GPU
#148403 opened Mar 4, 2025
Add api info for torch._C._nn.pyi
#148405 opened Mar 4, 2025
ci: Move s390x builds with the rest
#148406 opened Mar 4, 2025
Enable ASAN on inductor CUDA tests
#148407 opened Mar 4, 2025
[WIP] Add `device` arg to `_lazy_clone`
#148408 opened Mar 4, 2025
Add api info for torch._C._nn.pyi [1/N]
#148410 opened Mar 4, 2025
Disable flake8 advice C416
#148412 opened Mar 4, 2025
Automated perf_linter changes: generators
#148413 opened Mar 4, 2025
Automated perf_linter changes: list constructors
#148414 opened Mar 4, 2025
Automated perf_linter changes: x in (...)
#148415 opened Mar 4, 2025
Add perf_linter to auto-fix some anti-patterns
#148416 opened Mar 4, 2025
Add 'x in {...}' patterns to perf_linter
#148417 opened Mar 4, 2025
[2/N] Use Python 3.9 typing
#148418 opened Mar 4, 2025
ci: Add sccache to manylinux images
#148419 opened Mar 4, 2025
[set_linter] allow x in {...}
#148422 opened Mar 4, 2025
[Intel GPU][pt2e]: Collapse 3D input to 2D for matmul in qlinear_pointwise_binary fusion
#148423 opened Mar 4, 2025
Temp test
#148424 opened Mar 4, 2025
Optimize `torch.distributions` Score function
#148429 opened Mar 4, 2025
Introduce guard_or_true, guard_or_false and avoid guard_size_oblivious in decompositions.py
#148430 opened Mar 4, 2025
set non_blocking to true in torch._foreach_copy_ to improve performance
#148431 opened Mar 4, 2025
[ROCm] Add TF32 option for Flex Attention for gfx90a
#148432 opened Mar 4, 2025
Do not crash when compiling quantized LORA models
#148435 opened Mar 4, 2025
Expand docs for `nn.functional`, and make the wording consistent
#148436 opened Mar 4, 2025
[ROCm] Incorporate ROCm triton specific tuning parameters
#148437 opened Mar 4, 2025
Let `CUDAExtension` to find stub libs
#148441 opened Mar 4, 2025
Update s390x docker image
#148444 opened Mar 4, 2025
Fix test failures on non-x86 Linux
#148445 opened Mar 4, 2025
[inductor][triton] Block ptr analysis fix assert on matched index expression
#148446 opened Mar 4, 2025
Enable more nightly tests on s390x
#148452 opened Mar 4, 2025
Update docstring to match code.
#148455 opened Mar 4, 2025
[PP] RFC for fixing microbatch splitting for dim != 0
#148458 opened Mar 4, 2025
Remove `torch.testing` from `MOD_SKIPLIST`
#148459 opened Mar 4, 2025
meta registration for torch._scaled_mm with mxfp8
#148461 opened Mar 4, 2025
[aarch64] add libcufile for cu126 and cu128
#148465 opened Mar 4, 2025
[MPS] Introduce strides unary op
#148468 opened Mar 4, 2025
[dynamo] Properly account for non-list instances in list comparison
#148470 opened Mar 4, 2025
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 opened Mar 4, 2025
Remove warnings on non-buffer tensor constants
#148483 opened Mar 4, 2025
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 opened Mar 4, 2025
Demote logger of runtime_asserts_frozen to be fired only on debug mode
#148485 opened Mar 4, 2025
Suppress more warnings
#148488 opened Mar 4, 2025
[Ads] Slice convertIdScoreListFeaturesToIValue in half
#148489 opened Mar 4, 2025
[aot cache][ca] remove restriction on caching ca's aot inference graph
#148491 opened Mar 4, 2025
[triton hash update] update the pinned triton hash
#148492 opened Mar 4, 2025
Implement fast access to individual elements of jagged nested tensors
#148497 opened Mar 4, 2025
[codemod] Remove unused-variable in caffe2/caffe2/core/context_gpu.cu +2
#148501 opened Mar 4, 2025
[Inductor-CPU] Fix perf regression for templated int8 WoQ GEMM for small M dimension
#148502 opened Mar 4, 2025
[triton] Warp specialization support in torchinductor
#148503 opened Mar 4, 2025
Change constexpr annotation to specific initialization (test: triton_kernel_constants)
#148505 opened Mar 4, 2025
Support basic TorchBind in aot_compile and aoti_compile_and_package
#148506 opened Mar 4, 2025
WIP: record how many parameters we're parsing
#148508 opened Mar 4, 2025
Enable autocast on MAIA device
#148511 opened Mar 5, 2025
Add sparsity
#148513 opened Mar 5, 2025
[ca][aot] maybe mark activations as dynamic
#148516 opened Mar 5, 2025
[PT2] Port use_triton_dot_compress to PT2 pre_grad passes
#148517 opened Mar 5, 2025
[cutlass backend] Forward fix for less aligned gemm shapes
#148521 opened Mar 5, 2025
[Intel GPU][pt2e] Enable quantized grouped convolution at XPU
#148522 opened Mar 5, 2025
Implement gradient for the `residuals` of `torch.linalg.lstsq`
#148526 opened Mar 5, 2025
Fix clang-tidy bugprone* warnings
#148529 opened Mar 5, 2025
[WIP] Initial implementation of Grouped Gemm API
#148531 opened Mar 5, 2025
Skip buffer in dense update
#148533 opened Mar 5, 2025
[dtensor] add CuDNN SDPA op support to DTensor
#148537 opened Mar 5, 2025
[XPU] Add test/kernel.errors.txt to .gitignore.
#148538 opened Mar 5, 2025
Enable Direct Use of Arm Compute Library (ACL) in ATen
#148542 opened Mar 5, 2025
remove TORCH_NCCL_AVOID_RECORD_STREAMS,use stashed_for_allocator_safety_ to save the input ref
#148553 opened Mar 5, 2025
[BE] format `test/inductor/s429861_repro.py`
#148554 opened Mar 5, 2025
[DEBUG] Custom ops perf
#148555 opened Mar 5, 2025
[BE] Remove `onlyCPU` decorator from test_local_scalar_dense
#148559 opened Mar 5, 2025
[ROCm][Windows] Fix ROCm/HIP version header
#148560 opened Mar 5, 2025
[WIP] First version of StaticCudaLauncher
#148561 opened Mar 5, 2025
[ROCm][Windows] Enable hipblaslt for Windows
#148563 opened Mar 5, 2025
[BE][pytree] cleanup parameterized pytree tests
#148569 opened Mar 5, 2025
[DCP] Save Plan Caching: Fix the missing all_plans update in the cache.
#148577 opened Mar 5, 2025
[CI][CUDA][Distributed]Update test_composability.py
#148578 opened Mar 5, 2025
[wip][inductor]lowering scan to while_loop
#148580 opened Mar 5, 2025
Enable Direct Use of Arm Compute Library (ACL) in ATen
#148581 opened Mar 5, 2025
Enable Direct Use of Arm Compute Library (ACL) in ATen
#148584 opened Mar 5, 2025
Enable fast qlinear static/dynamic path for AArch64 through ACL directly
#148585 opened Mar 5, 2025
[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely
#148590 opened Mar 5, 2025
[AOTI] Switch to local cpp compile for fbcode
#148592 opened Mar 5, 2025
Add XPU device to nested_layer_norm
#148593 opened Mar 5, 2025
[CUDA Graphs][NCCL] Set event queries to happen under thread-local mode in `ProcessGroupNCCL.cpp`
#148594 opened Mar 5, 2025
[c10d] Make getDefaultBackend more fault tolerant
#148596 opened Mar 5, 2025
[pytorch] Update flexattention bwd config generation
#148600 opened Mar 5, 2025
Fix for AOTI + CUDAGraphs when calling from Python
#148601 opened Mar 5, 2025
[CI][CUDA] Move away from cuda12.4, Add cuda12.6 eager CI tests
#148602 opened Mar 5, 2025
[ONNX] Expose verification utilities
#148603 opened Mar 5, 2025
[cuda] Add new faster gammabeta backward kernel
#148605 opened Mar 5, 2025
Remove warnings on non-buffer tensor constants (#148483)
#148611 opened Mar 5, 2025
[CI] [inductor] Add cu126 inductor jobs and move away cu124
#148612 opened Mar 5, 2025
[For Discussion][Dynamo] Avoiding skipping module.py inner() frame, to keep forward hooks and forward in the same graph
#148613 opened Mar 5, 2025
Bump to AOTriton 0.9.2 to fix version strings
#148615 opened Mar 5, 2025
Optimize AOTInductor: Caching, Reduced Decompositions, and Improved JSON Handling
#148616 opened Mar 5, 2025
[dynamo] Don't affect stack traces under TORCHDYNAMO_DISABLE
#148618 opened Mar 5, 2025
fix 142457 , fixes double free corruption by adding TORCH_CHECK to ensure weights have the proper size
#148620 opened Mar 5, 2025
update get_default_device to also respect torch.device ctx manager
#148621 opened Mar 5, 2025
stage 2 of deprecate silent fallback of tuning gemm
#148622 opened Mar 5, 2025
[mm_logs] follow up to add count info based on shape for inductor `aten.mm`s
#148623 opened Mar 6, 2025
Remove Cuda 12.4 from nightly Binaries
#148625 opened Mar 6, 2025
[triton 3.3] test_triton_kernel_constants fix
#148626 opened Mar 6, 2025
[ONNX] Use torch export to get dynamic shapes for JIT convert strategy
#148627 opened Mar 6, 2025
Remove CAFFE2_USE_EIGEN_FOR_BLAS
#148628 opened Mar 6, 2025
[inductor] lowering for fractional_max_pool3d
#148630 opened Mar 6, 2025
[Window][Inductor UT] Fix for tempfile.NamedTemporaryFile(delete=True) not work on Windows.
#148632 opened Mar 6, 2025
update torch.nn.ReplicationPad{1,2,3}d deterministic documentation
#148633 opened Mar 6, 2025
Subprocess compile (attempt 2)
#148635 opened Mar 6, 2025
[cutlass backend] switch host optimizer to O3
#148637 opened Mar 6, 2025
Remove cppcoreguidelines-pro-type-member-init_fix suppression
#148638 opened Mar 6, 2025
[Intel GPU][quant] Refine zero-point memory creation
#148640 opened Mar 6, 2025
Fix torch.utils.checkpoint import error
#148641 opened Mar 6, 2025
[XPU] Add an implicit conversion from XPUStream to sycl::queue*
#148646 opened Mar 6, 2025
[SGD] Add SGD capturable API and tests
#148647 opened Mar 6, 2025
Bump Clang-tidy to 19.1.4
#148648 opened Mar 6, 2025
[Intel GPU] Fix SDPA dummy LSE output to match meta function
#148652 opened Mar 6, 2025
Enable qint8 and quint8 add for AArch64 using ACL directly
#148653 opened Mar 6, 2025
Enable ruff check for `torch/utils/data/typing.ipynb`
#148654 opened Mar 6, 2025
Allow to run flex_attention on HPU
#148656 opened Mar 6, 2025
[associative_scan] Refactoring of input checking and dynamo invocation
#148657 opened Mar 6, 2025
[docs] fix autograd description on convex function case
#148658 opened Mar 6, 2025
[HPU] Add HPU as a supported device for NestedTensor
#148659 opened Mar 6, 2025
Remove deprecated std::aligned_storage_t
#148660 opened Mar 6, 2025
removed tocm triton template cond
#148662 opened Mar 6, 2025
[Profiler][HPU] Fix incorrect availabilities for HPU
#148663 opened Mar 6, 2025
[AOTInductor] Codegen fix
#148664 opened Mar 6, 2025

149 Issues closed by 52 people

[triton 3.3] inductor/test_torchinductor_opinfo.py::TestInductorOpInfoCUDA::test_comprehensive_floor_divide_cuda_float16
#148541 closed Mar 6, 2025
[triton 3.3] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda
#148540 closed Mar 6, 2025
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 closed Mar 6, 2025
torch.conj behaves differently on cpu and mps
#148599 closed Mar 6, 2025
The operator 'aten::_linalg_solve_ex.result' is not currently implemented for the MPS device
#148547 closed Mar 6, 2025
RuntimeError on MPS: [srcBuf length] > 0 INTERNAL ASSERT FAILED – Placeholder tensor is empty using huggingface model ibm-granite/granite-timeseries-ttm-r2
#148589 closed Mar 6, 2025
'CUDA error: an illegal memory access was encountered' when using forced_align on cuda device > 0
#148438 closed Mar 6, 2025
Inductor layout constraints for custom operators changed from 2.5->2.6, breaking BC
#148356 closed Mar 6, 2025
[ROCm] sorting torch.bool tensor viewed from torch.uint8 type produces incorrect results
#139972 closed Mar 6, 2025
test_reference_numerics_normal fails with certain versions of numpy/scipy
#148143 closed Mar 5, 2025
[ONNX] stft export fails with dynamo_export
#113067 closed Mar 5, 2025
[ONNX] Record the capture strategy in onnx program
#147674 closed Mar 5, 2025
[aot autograd][inline inbuilt nn modules] AOT Autograd changes _version different from eager
#128198 closed Mar 5, 2025
[Inductor-CPU] ATen SDPA kernel runtime is not captured in profiling results
#148225 closed Mar 5, 2025
[ONNX] aten_unfold needs to support symint
#148337 closed Mar 5, 2025
Decorators like `torch.compiler.allow_in_graph` doesn't account for id reuse
#147777 closed Mar 5, 2025
The `sympy` dependency spec for pytorch on PyPi wheel is still unchanged.
#145225 closed Mar 5, 2025
The latest PyTorch XPU wheel 2.7.0.dev20250117+xpu does not work on Windows
#145155 closed Mar 5, 2025
AOTI doesn't account for constant tensors
#148370 closed Mar 5, 2025
Redundant Try Block in backward()
#148115 closed Mar 5, 2025
dataclasses.replace not supported by dynamo
#136481 closed Mar 5, 2025
`torch.compile` fails with customized Triton Operator on Triton 2.2
#128039 closed Mar 5, 2025
Compile with non-default mode + triton kernel fails
#126864 closed Mar 5, 2025
torch.compile generates wrong code on CPU and compiled code replaces original function
#126848 closed Mar 5, 2025
SIGSEGV error when passing a 0-sized tensor to `_local_scalar_dense`
#145066 closed Mar 5, 2025
DISABLED test_dropout_deterministic_dynamic_shapes_cuda (__main__.DynamicShapesGPUTests)
#132957 closed Mar 5, 2025
DISABLED test_remote_cache_load_function_device_cuda_bfloat16_dynamic_False_bundle_triton_False (__main__.TestFxGraphCache)
#145182 closed Mar 5, 2025
DISABLED test_tensor_subclass_basic (__main__.TestCompiledAutograd)
#145457 closed Mar 5, 2025
DISABLED test_precompilations (__main__.TestMaxAutotune)
#124265 closed Mar 5, 2025
nn.Transformer out[0:-1] not precisely equal to last_out when predicting in tgt mask
#100052 closed Mar 5, 2025
[Flex Attention] Errors with Dynamic Shapes (Cannot determine truth value of Relational)
#146745 closed Mar 5, 2025
DISABLED test_recompile_on_global_state_change_dynamic_shapes (__main__.DynamicShapesMiscTests)
#144896 closed Mar 5, 2025
[xpu] Compilation of pytorch failed, unable to generate RegisterSparseXPU.cpp
#144718 closed Mar 5, 2025
torch.accelerator.is_available() raise RuntimeError if no available CUDA/XPU devices
#144567 closed Mar 5, 2025
[Profiler] Add profiler activity for HPU devices
#148181 closed Mar 5, 2025
[dynamic-shapes][dynamo] list index out of range for constraint with out=
#130066 closed Mar 5, 2025
PyTorch defaults to using libuv but is built without support for it on Windows
#139990 closed Mar 5, 2025
Switch to using Docker Images from ECR instead of Docker Hub
#147748 closed Mar 4, 2025
Lintrunner results inconsistent
#128588 closed Mar 4, 2025
[ONNX] torch.matmul() breaks dynamic shapes during export
#148192 closed Mar 4, 2025
Release 2.6.0 validations checklist and cherry-picks
#144503 closed Mar 4, 2025
[RFC] Raising minimal glibc support to: glibc2_28 . Deprecation support for Amazon Linux 2 support
#126551 closed Mar 4, 2025
[DTensor] `clip_grad_norm_` follow-ups
#121020 closed Mar 4, 2025
RuntimeError when using Adam(fused=True) with torch.compile
#126585 closed Mar 4, 2025
[DSD] Test could fail in test_fsdp_dsd.py
#134212 closed Mar 4, 2025
Dtensor shard uses more gpu memory than raw tensor
#133549 closed Mar 4, 2025
[dynamo] Issue with construction nn.Parameter
#127697 closed Mar 4, 2025
Autotuning failure: `Triton Error [CUDA]: invalid argument`
#145984 closed Mar 4, 2025
Adding Small Epsilon in linalg_eig_backward to Improve Numerical Stability on GPU
#147544 closed Mar 4, 2025
I don't use FSDP，it can train.
#148409 closed Mar 4, 2025
built from source windows static library with multiple "unresolved external symbol"
#87499 closed Mar 4, 2025
When using torch.compile to compile the function _kernel_make_viewless_tensor, an error occurs：AssertionError: wrong number of dimensions
#143895 closed Mar 4, 2025
[Triton upstream] [Inductor] [ROCm] OpInfo quantile UT accuracy issues
#147736 closed Mar 4, 2025
[Triton upstream] [Inductor] [ROCm] cpp_wrapper segfaults
#147734 closed Mar 4, 2025
DISABLED test_cache_load_function_device_cuda_bfloat16_dynamic_False_bundle_triton_True_grad_True (__main__.TestFxGraphCache)
#145181 closed Mar 4, 2025
There is a problem with the wording here.
#147696 closed Mar 4, 2025
Triton aarch64 and triton sbsa
#147857 closed Mar 4, 2025
torch.compile reorder_for_compute_comm_overlap sink_waits pass does not work
#127324 closed Mar 4, 2025
Invalid ONNX graph if using float16 dtype for `torch.arange`
#148041 closed Mar 3, 2025
export + aot_export_module on models with non-parameter/buffer tensor state show up as getattrs in the graph without meta['val'] fields
#146074 closed Mar 3, 2025
How to get last layer hidden state of transformer model while convert model to onnx format?
#146682 closed Mar 3, 2025
Unable to export to ONNX | The serialized model is larger than the 2GiB limit imposed by the protobuf library.
#147259 closed Mar 3, 2025
when convert to onnx with dynamix_axis, the Reshape op value is always the same as static, dynamic_axis is useless, it cant't inference right shape dynamically
#99701 closed Mar 3, 2025
Exporting onnx model to a buffer causes "TypeError: expected str, bytes or os.PathLike object, not BytesIO"
#147909 closed Mar 3, 2025
DISABLED test_custom_hook_custom_stream (__main__.TestHSDPWithCustomHook)
#147767 closed Mar 3, 2025
[inductor] [silence] `torch.cdist` outputs inconsistent results with eager
#148064 closed Mar 3, 2025
IndexError: tuple index out of range when running vLLM script
#147839 closed Mar 3, 2025
`torch.multinomial` outputs inconsistency on ARM and x86
#148247 closed Mar 3, 2025
Huge numerical precision error when `torch.tensor(3811, dtype=torch.float16)`
#148321 closed Mar 3, 2025
The issue where opt_output in fx_graph_runnable.py is inconsistent with the actual output when testing run_repro(acc=True)
#147850 closed Mar 3, 2025
HIP error: invalid device function on ROCm RX 7600XT
#147626 closed Mar 3, 2025
`torch.Tensor.pinverse` can cause an `INTERNAL ASSERT FAILED`
#148300 closed Mar 3, 2025
`torch.linalg.cond` can cause an `INTERNAL ASSERT FAILED`
#148299 closed Mar 3, 2025
`torch.linalg.pinv` can cause an `INTERNAL ASSERT FAILED`
#148298 closed Mar 3, 2025
`torch.nansum` can cause a `Segmentation fault (core dumped)`
#148297 closed Mar 3, 2025
INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/NamedTensorUtils.cpp":163, please report a bug to PyTorch
#148278 closed Mar 3, 2025
`torch.sparse.sum` can cause a `Segmentation fault (core dumped)`
#148276 closed Mar 3, 2025
`torch.nn.LazyConvTranspose1d` can cause a `Floating point exception (core dumped)`
#148275 closed Mar 3, 2025
`torch.nn.functional.conv1d` can cause a `Floating point exception (core dumped)`
#147458 closed Mar 3, 2025
`torch.svd` can cause an `INTERNAL ASSERT FAILED`
#147457 closed Mar 3, 2025
[inductor] [cpu] `nn.Tanhshrink-atan2` output inconsistent results with eager
#148241 closed Mar 3, 2025
ncclUnhandledCudaError
#147575 closed Mar 3, 2025
[Break XPU] Newly added test case with CUDA hard code failed on XPU.
#143479 closed Mar 3, 2025
xpu: support triton against clang with nightly wheels
#137518 closed Mar 3, 2025
Error loading ".venv\Lib\site-packages\torch\lib\c10_xpu.dll" or one of its dependencies
#138986 closed Mar 3, 2025
xpu: clarify which Intel GPUs are supported by PyTorch 2.5
#138347 closed Mar 3, 2025
xpu: intel conda channel is not available
#131802 closed Mar 3, 2025
xpu: add check for supported devices to xpu initialization and torch.xpu.is_available()
#131799 closed Mar 3, 2025
xpu: can't build XPU backend without sourcing oneAPI environment variables (/opt/intel/oneapi/setvars.sh)
#127008 closed Mar 3, 2025
xpu: python hangs on exit after check for xpu on multi-dev system
#126259 closed Mar 3, 2025
DISABLED test_cond_autograd_zeros_unused_branch_complex_compile_mode_compile (__main__.TestControlFlow)
#148309 closed Mar 3, 2025
DISABLED test_cond_autograd_zeros_unused_branch_complex_compile_mode_compile (__main__.TestControlFlow)
#148308 closed Mar 3, 2025
[RFC] Add CPP INT8 SDPA Template for Inductor CPU
#144941 closed Mar 3, 2025
DISABLED test_add_tuple_non_optional (__main__.TestScript)
#146136 closed Mar 3, 2025
DISABLED test_non_final_return (__main__.TestScript)
#145975 closed Mar 3, 2025
DISABLED test_scriptable_fn_as_attr (__main__.TestScript)
#145972 closed Mar 3, 2025
DISABLED test_pt2_traceable_aot_eager_cpu_float8_e5m2 (__main__.TestFloat8DtypeCPUOnlyCPU)
#144934 closed Mar 3, 2025
DISABLED test_reassign_module_lhs (__main__.TestScript)
#145973 closed Mar 3, 2025
DISABLED test_memory_stats_multigpu (__main__.TestCudaMultiGPU)
#129860 closed Mar 3, 2025
DISABLED test_ternary_right_associative (__main__.TestScript)
#146137 closed Mar 3, 2025
DISABLED test_tensor_subclasses (__main__.TestScript)
#119949 closed Mar 3, 2025
DISABLED test_pybind_type_comparisons (__main__.TestScript)
#145971 closed Mar 3, 2025
DISABLED test_script_outputs (__main__.TestScript)
#145976 closed Mar 3, 2025
DISABLED test_script_annotation (__main__.TestScript)
#145974 closed Mar 3, 2025
DISABLED test_mm_batching (__main__.TestScript)
#119747 closed Mar 3, 2025
[distributed] Register sharding strategy for aten.amax.default to support float8 rowwise scaling
#147578 closed Mar 2, 2025
Excessive memory usage during compilation start up for (at least some) in place ops
#148165 closed Mar 2, 2025
[Inductor][CPU] SIGSEGV in `torch.slice_copy` with large step value
#147071 closed Mar 2, 2025
Cudnn header files should be copied into build package as well
#47743 closed Mar 2, 2025
The unevenness of torch.randint() during large range(3e9) sampling.
#148175 closed Mar 2, 2025
nn.Matmul return different ret within Parameter and Tensor
#148280 closed Mar 2, 2025
Wrong macro used when building c10/util/bit_cast.h with std::bit_cast
#148263 closed Mar 1, 2025
[dtensor] write aten.split_tensor using op strategy
#130758 closed Mar 1, 2025
PyTorch nightly MPS SDPA op is unusable
#148194 closed Mar 1, 2025
[Inductor UT] RuntimeError: Tried to register an operator with the same name and overload name multiple times.
#148148 closed Mar 1, 2025
How to install Torch version that supports RTX 5090 on Windows? - CUDA kernel errors might be asynchronously reported at some other API call
#146977 closed Mar 1, 2025
Placeholder tensor is empty!
#123171 closed Mar 1, 2025
CPU-specific Inductor Error with `view` on `torch.nn.Embedding` output
#146390 closed Mar 1, 2025
[Triton upstream] [Inductor] Widespread failures in UTs: AttributeError: 'dict' object has no attribute 'equal_to_1'
#147375 closed Mar 1, 2025
LoweringException: AttributeError: 'Constant' object has no attribute 'get_name'
#141197 closed Feb 28, 2025
Could not guard on data-dependent expression u0 - 7 < 0 (unhinted: u0 - 7 < 0). (Size-like symbols: u0)
#128644 closed Feb 28, 2025
[inductor][cpu]transformers models static/dynamic quant performance/accuracy crash in 2024-06-17 nightly release
#128933 closed Feb 28, 2025
randperm + torch.compile + SAC + CUDA graphs doesn't work
#130123 closed Feb 28, 2025
Extending more info on fake tensor when compiling
#130234 closed Feb 28, 2025
DISABLED test_triton_kernel_multiple_out (__main__.AutogradFunctionTests)
#147214 closed Feb 28, 2025
Rule-based reconfig-and-recompile
#127999 closed Feb 28, 2025
"grapharg" key is missing in FXGraph placeholder nodes when calling torch APIs with "out" syntax in torch.cond and torch.while_loop
#130176 closed Feb 28, 2025
TORCH_COMPILE_CPROFILE=1 broken (strobelight might always be on internally?)
#131953 closed Feb 28, 2025
torch._dynamo.exc.Unsupported: call_function args: TensorVariable() UserDefinedObjectVariable(_tuplegetter)
#131411 closed Feb 28, 2025
[triton 3.3][cpp_wrapper] TypeError: 'NoneType' object is not subscriptable
#148111 closed Feb 28, 2025
F.interpolate returns NAN on MPS if align_corner is True.
#144245 closed Feb 28, 2025
`AssertionError` in `torch.compile`
#147840 closed Feb 28, 2025
Issue with FBGEMM Operators in Exported PyTorch AOT Model When Running in C++: Could not find schema for fbgemm:xxx
#147065 closed Feb 28, 2025
DISABLED test_pt2_traceable_aot_eager_cpu_float8_e4m3fn (__main__.TestFloat8DtypeCPUOnlyCPU)
#144903 closed Feb 28, 2025
[TensorDict - compile] Dynamo doesn't like a simple class decorator - but a function is fine
#130533 closed Feb 28, 2025
[Inductor-CPU] LLaMA doesn't use templated GEMMs for da8w8 quantization for next-token generation
#147954 closed Feb 28, 2025
DISABLED test_remote_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_False (__main__.TestFxGraphCache)
#145212 closed Feb 28, 2025
DISABLED test_profile_all_threads (__main__.TestProfiler)
#145951 closed Feb 28, 2025
Very large memory increase when combining bfloat16 autocast with torch.compile
#133637 closed Feb 28, 2025
Importing torch_tensorrt causes warning for implicitly cleaned up file
#147744 closed Feb 28, 2025
DISABLED test_arange_dynamic_cuda (__main__.TestInductorDynamicCUDA)
#127067 closed Feb 28, 2025
Long queue for macOS runners
#148127 closed Feb 28, 2025
Torch 2.7.0 nightly cuda 12.6 and cuda 12.8 builds are broken on Amazon linux 2023
#148120 closed Feb 28, 2025
torch.clear_autocast_cache is not traceable
#140759 closed Feb 27, 2025
SubgraphLoweringException in flex_attention when using custom score_mod with torch.dot (MLA)
#148107 closed Feb 27, 2025
DISABLED test_insignificant_strides (__main__.SDPAPatternRewriterCudaDynamicTests)
#146959 closed Feb 27, 2025
FUNC_INLINELIST doesn't exist
#144868 closed Feb 27, 2025
td does not detect required test for mkl-dnn OneDNN update
#148085 closed Feb 27, 2025
DISABLED test_mismatched_global_state (__main__.GraphRegionTrackerTests)
#144895 closed Feb 27, 2025

170 Issues opened by 96 people

[Profiler][HPU] Incorrect availabilities for the HPU device
#148661 opened Mar 6, 2025
Extra onnx::Neg_2 input after torch.onnx.export
#148655 opened Mar 6, 2025
Avoid fork for TORCHINDUCTOR_COMPILE_THREADS > 1
#148651 opened Mar 6, 2025
[Inductor-CPU] With cpp-wrapper, some ATen ops don't get profiled with PyTorch profiler
#148650 opened Mar 6, 2025
[inductor][torchbench][CI] timm models got obvious performance drop with --ci flag
#148645 opened Mar 6, 2025
DISABLED test_set_stance_aot_eager_then_compile (__main__.DecoratorTests)
#148644 opened Mar 6, 2025
DISABLED test_symint_in_slice_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148643 opened Mar 6, 2025
Feature request: throw `torch.cuda.OutOfMemoryError` for TorchScript OOM
#148642 opened Mar 6, 2025
[inductor][cpu] poolformer_m36 AMP static shape multiple thread performance regression in 2025-03-03 nightly release
#148639 opened Mar 6, 2025
[inductor][cpu] basic_gnn_gin and basic_gnn_sage AMP single thread performance regression in 2025-03-03 nightly release
#148636 opened Mar 6, 2025
README doesn't explain how to run tests in the "Test PyTorch" section
#148634 opened Mar 6, 2025
DISABLED test_sdpa_rewriter_11_cuda (__main__.SDPAPatternRewriterCudaDynamicTests)
#148631 opened Mar 6, 2025
Dynamo export: dynamic dims are not exported with the specified names
#148629 opened Mar 6, 2025
DISABLED test_return_captured_var_used_multiple_times_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148624 opened Mar 6, 2025
DISABLED test_host_memory_stats (__main__.TestCuda)
#148607 opened Mar 5, 2025
DISABLED test_nested_tuple_output_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148606 opened Mar 5, 2025
AOTI takes very long time to compile (1:40 hours)
#148572 opened Mar 5, 2025
Suggested fixes sometimes not enough in export
#148568 opened Mar 5, 2025
[Feature Request] Dynamic shapes API requires spec for all arguments.
#148564 opened Mar 5, 2025
DISABLED test_internal_nonlocal_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148558 opened Mar 5, 2025
AOTI is OOM-ing when eager doesn't
#148557 opened Mar 5, 2025
Return type annotation of `Tensor.long()` etc is not narrowed down to dtype-specific names `LongTensor` etc
#148552 opened Mar 5, 2025
Issue with torch.compile
#148551 opened Mar 5, 2025
Name 'equal_valued' cannot be imported in pytorch 2.5.0
#148550 opened Mar 5, 2025
Name 'equal_valued' cannot be imported in pytorch 2.5.0
#148549 opened Mar 5, 2025
Issue with Sparse Tensor Matrix Multiplication and Broadcasting
#148548 opened Mar 5, 2025
Pytorch nightly broken Flash Attention 3 compile with sycl commit - TypeError: _write_ninja_file() got an unexpected keyword argument 'sycl_cflags'
#148546 opened Mar 5, 2025
F.scaled_dot_product_attention calculation output is nan when in dynamic dim under torch.compile mode
#148545 opened Mar 5, 2025
DISABLED test_inlined_functions_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148544 opened Mar 5, 2025
When using torch::jit::Module in UE env, there are some structured exceptions caused by calling module.forward()
#148543 opened Mar 5, 2025
exported modules using custom autograd functions will ignore custom backward function
#148539 opened Mar 5, 2025
Computing only the n first rows of a distance matrix with pdist
#148536 opened Mar 5, 2025
[inductor][cpu]speech_transformer failure in 2025-03-02 nightly release
#148535 opened Mar 5, 2025
[FSDP2] CPU Offload Does Not Work with `torch.nn.utils.clip_grad_norm`
#148532 opened Mar 5, 2025
macos15 M4 can not install torch-2.6.0-cp310-none-macosx_11_0_arm64.whl
#148528 opened Mar 5, 2025
[FlexAttention] Error using create_block_mask with mask head number greater than 1
#148527 opened Mar 5, 2025
DISABLED test_sdpa_rewriter_11_cuda (__main__.SDPAPatternRewriterCudaTests)
#148525 opened Mar 5, 2025
DISABLED test_graph_break_before___enter__ (__main__.ContextlibContextManagerTests)
#148524 opened Mar 5, 2025
DISABLED test_globals_change_in_other_file (__main__.ContextlibContextManagerTests)
#148523 opened Mar 5, 2025
reshape is decomposed to view setting allow_copy=False making it fail in some case!
#148519 opened Mar 5, 2025
Preview (Nightly) version cuda12.8 cannot find torchaudio file
#148518 opened Mar 5, 2025
DISABLED test_set_stance_eager_then_compile (__main__.DecoratorTests)
#148515 opened Mar 5, 2025
DISABLED test_freevars_as_inputs_to_wrap_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148514 opened Mar 5, 2025
[MAIA][Autocast] torch.autocast doesn't work on MAIA device
#148510 opened Mar 5, 2025
[FSDP2] improve error msg for duplicate wraps
#148504 opened Mar 4, 2025
`distributed.checkpoint.async_save` leading to `TypedStorage is deprecated.`
#148498 opened Mar 4, 2025
UNSTABLE trunk / libtorch-linux-focal-cuda12.4-py3.10-gcc9-debug / build
#148495 opened Mar 4, 2025
[Inductor-CPU] Templated int8 WoQ GEMMs (with BF16 activation) may cause regressions for next-token generation of LLMs
#148494 opened Mar 4, 2025
export lift_constants_pass creates ugly warning
#148487 opened Mar 4, 2025
Export shouldn't warn when registering constant tensor attribute on graph module.
#148482 opened Mar 4, 2025
export is emitting too many not actionable warnings.
#148479 opened Mar 4, 2025
export dynamic shapes API throws weird error on upper bound.
#148478 opened Mar 4, 2025
XPU not available until I sign into server locally
#148477 opened Mar 4, 2025
Illegal memory access in scaled_dot_product_attention if only attn_mask requires grad
#148476 opened Mar 4, 2025
Dynamo replaces exception by hard error in `run_node`
#148475 opened Mar 4, 2025
DISABLED test_capture_untracked_nonlocal_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148464 opened Mar 4, 2025
DISABLED test_set_stance_eager_then_compile_with_graph_break (__main__.DecoratorTests)
#148463 opened Mar 4, 2025
[dynamo] Memory leak
#148460 opened Mar 4, 2025
backport torch.library.custom_op (and improvements) to older versions of PyTorch
#148457 opened Mar 4, 2025
BC-linter should ignore testing/linter/adapters/
#148451 opened Mar 4, 2025
Union type raise error when running python with argument "-O" for torch script.
#148447 opened Mar 4, 2025
DISABLED test_user_defined_binop (__main__.MiscTests)
#148443 opened Mar 4, 2025
DISABLED test_capture_untracked_global_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148442 opened Mar 4, 2025
torch.distributed hangs between 2 Mac Devices
#148440 opened Mar 4, 2025
[cudagraph_trees]RuntimeError: Error: accessing tensor output of CUDAGraphs that has been overwritten by a subsequent run
#148439 opened Mar 4, 2025
ERROR: I got an error about FSDP, when I trained flux model of sparsity with NVIDIA TensorRT Model Optimizer
#148434 opened Mar 4, 2025
DISABLED test_sys_modules (__main__.MiscTests)
#148428 opened Mar 4, 2025
DISABLED test_capture_tracked_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148427 opened Mar 4, 2025
DISABLED test_empty_graph_nested_calls_fullgraph_False_dynamic_shapes (__main__.DynamicShapesReproTests)
#148426 opened Mar 4, 2025
[ROCM] `linalg.eigh` crash with `float64` dtype and shape `[8192,8192]`
#148425 opened Mar 4, 2025
Float8_e4m3fn
#148420 opened Mar 4, 2025
Torch 2.6 doesn't have TCPStore::TCPStore symbol in cu126 binary, but it's available in headers
#148411 opened Mar 4, 2025
The apis in torch._C._nn.pyi is nonexhaustive
#148404 opened Mar 4, 2025
Generate two reduction loops for vectorization
#148402 opened Mar 4, 2025
[inductor][fuzzer] `IndexError` error at `torch.dstack`
#148397 opened Mar 4, 2025
DISABLED test_shape_int_inplace_binops (__main__.MiscTests)
#148392 opened Mar 4, 2025
DISABLED test_untracked_inputs_in_constraints_dynamic_shapes (__main__.DynamicShapesExportTests)
#148390 opened Mar 4, 2025
DISABLED test_sdpa_rewriter_14_cuda (__main__.SDPAPatternRewriterCudaTests)
#148391 opened Mar 4, 2025
[export] Unable to trace ops like min/pow
#148389 opened Mar 4, 2025
[Inductor-CPU] Debug util request: fine-grained mechanism to disable out-of-template epilogues
#148382 opened Mar 3, 2025
Improve nested jagged tensor select performance on batch dim
#148379 opened Mar 3, 2025
DISABLED test_param_shape_binops (__main__.MiscTests)
#148369 opened Mar 3, 2025
DISABLED test_export_with_cond_dynamic_shape_pred_dynamic_shapes (__main__.DynamicShapesExportTests)
#148368 opened Mar 3, 2025
`torch.nn.functional` inconsistent documentation
#148353 opened Mar 3, 2025
torch._check(x > 0) should do something sane when x is a Tensor
#148349 opened Mar 3, 2025
Installation of `pytorch==2.6.0+cu124` doesn't install `triton` and `nvidia` libraries
#148345 opened Mar 3, 2025
[RFC][PGNCCL] Add Float8 support
#148344 opened Mar 3, 2025
[CI] [anaconda] CI Perf Tests
#148342 opened Mar 3, 2025
[CI] [anaconda] Review Devcontainer anaconda usage
#148341 opened Mar 3, 2025
[CI] [anaconda] CI Build and Test scripts MacOS
#148340 opened Mar 3, 2025
[Docs] [anaconda] Review and update
#148339 opened Mar 3, 2025
[CI] [anaconda] CI Build and Test scripts Windows
#148338 opened Mar 3, 2025
[CI] [anaconda] CI Build and Test scripts Linux
#148336 opened Mar 3, 2025
[CI] [anaconda] Docker files have conda environment installed
#148335 opened Mar 3, 2025
[FSDP2] Issues with model not running on all ranks - Grads not matching fairscale implementation
#148334 opened Mar 3, 2025
[export][torchbench] moco fails
#148333 opened Mar 3, 2025
DISABLED test_mark_unbacked_strict (__main__.MiscTests)
#148332 opened Mar 3, 2025
DISABLED test_export_defaults_ok_dynamic_shapes (__main__.DynamicShapesExportTests)
#148331 opened Mar 3, 2025
DISABLED test_sys_modules_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148330 opened Mar 3, 2025
[AOTI][torchbench] microbench_unbacked_tolist_sum fails
#148329 opened Mar 3, 2025
fx graph fails to recognize tensor.T as a 'call_method' node
#148326 opened Mar 3, 2025
`torch.linalg` routines break for inputs of more than 2**32 elements
#148324 opened Mar 3, 2025
compile SageAttention faing error C2872: “std” for latest torch nightly
#148317 opened Mar 3, 2025
[Doc] [Win] libuv installation doc is not correct.
#148315 opened Mar 3, 2025
The recorded step number in profiler is wrong
#148314 opened Mar 3, 2025
DISABLED test_int_shape_inplace_binops (__main__.MiscTests)
#148312 opened Mar 3, 2025
DISABLED test_empty_graph_nested_calls_fullgraph_True_dynamic_shapes (__main__.DynamicShapesReproTests)
#148311 opened Mar 3, 2025
batching rule for `aten::scatter_add_`
#148307 opened Mar 3, 2025
torch.vmap incompatibility with DLPack functions
#148306 opened Mar 3, 2025
broadcast_object_list not release GPU
#148302 opened Mar 3, 2025
DISABLED test_int_shape_binops (__main__.MiscTests)
#148296 opened Mar 3, 2025
DISABLED test_dont_aggressively_write_assert_dynamic_shapes (__main__.DynamicShapesReproTests)
#148295 opened Mar 3, 2025
Triton Kernel Rejects NamedTupleVariable Arguments
#148289 opened Mar 2, 2025
RuntimeError: use_libuv was requested but PyTorch was build without libuv support
#148283 opened Mar 2, 2025
SIGSEGV due to insufficient return value checking for PyFrame_GetLocals
#148273 opened Mar 1, 2025
Should DTensor support `Shard()` placement without dim requirement?
#148269 opened Mar 1, 2025
ONNX Export Produces main_graph Instead of torch_jit and Fails on aten::format in PyTorch 2.x
#148268 opened Mar 1, 2025
Raise a warning when `torch.nn.utils.clip_grad_norm_` receives an exhausted generator
#148259 opened Mar 1, 2025
[FSDP2] HSDP with globally sharded fp32 weights and optimizer states
#148257 opened Mar 1, 2025
Simplify package_data handling in setup.py
#148256 opened Mar 1, 2025
Improve Notation for Score Function in Documentation
#148253 opened Mar 1, 2025
Can't pass `strict=False` when loading a distributed checkpoint. Succeeds without warnings for "unexpected" keys, fails for "missing" keys.
#148252 opened Mar 1, 2025
Errors: train a model of sparsity with tensorrt-model-optimization and FSDP.
#148251 opened Mar 1, 2025
BrokenPipeError: [Errno 32] Broken pipe when lacking Numpy package
#148250 opened Mar 1, 2025
[inductor] `nn.Upsample-torch.linalg.lu_factor` outputs inconsistent results with eager
#148244 opened Mar 1, 2025
[FSDP2] Unclear behavior of `ignored_params` in `fully_shard`
#148242 opened Mar 1, 2025
Add Structured Knowledge Accumulation (SKA) Layer to PyTorch
#148232 opened Mar 1, 2025
make saved_tensor_hooks work better in compile for doing activation compression
#148222 opened Feb 28, 2025
MPS vs Metal vs CPU performance comparison
#148219 opened Feb 28, 2025
DISABLED test_dynamic_sources_dynamic_override (__main__.MiscTests)
#148218 opened Feb 28, 2025
DISABLED test_guard_failure_fn2 (__main__.MiscTests)
#148217 opened Feb 28, 2025
DISABLED test_guard_failure_fn_shape_control_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148216 opened Feb 28, 2025
DISABLED test_mark_unbacked_strict_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148215 opened Feb 28, 2025
DISABLED test_dynamic_sources_dynamic_override_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148214 opened Feb 28, 2025
Regression: Missing Symbols in PyTorch DLL (torch_python)
#148208 opened Feb 28, 2025
Add option to shut down idle async_compile workers after timeout
#148207 opened Feb 28, 2025
Compile breaks flex-attention with jagged tensors
#148201 opened Feb 28, 2025
[Inductor-CPU] qlinear_binary output may have undefined strides with dynamic shape support
#148199 opened Feb 28, 2025
[inductor][triton] Decide how to deprecate "old triton versions"
#148196 opened Feb 28, 2025
Implement batching rule for masked_fill_
#148183 opened Feb 28, 2025
Dynamo failure on handling list comparisons
#148179 opened Feb 28, 2025
[Inductor] Layout created with non-sympy.Expr sizes
#148172 opened Feb 28, 2025
Inference llama after Export PTQ
#148171 opened Feb 28, 2025
PyTorch's nightly version no longer includes the CU118, CU124, and CU121 versions
#148169 opened Feb 28, 2025
Build pytorch for rocm failed
#148167 opened Feb 28, 2025
[Request Help] “torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment.” “torch._dynamo.exc.UncapturedHigherOrderOpError: Cond doesn't work unless it is captured completely with torch.compile.”
#148193 opened Feb 28, 2025
DISABLED test_nonstrict_trace_pre_existing_custom_class (__main__.DecoratorTests)
#148166 opened Feb 28, 2025
Specifying device_id in init_process_group causes tensor parallel + pipeline parallel to fail
#148162 opened Feb 28, 2025
[MPS][Complex] Conjugations are broken
#148156 opened Feb 28, 2025
[inductor] [cuda] [MultiheadAttention] `nn.MultiheadAttention-torch.reciprocal` outputs a big difference with eager
#148153 opened Feb 28, 2025
[inductor] `F.fractional_max_pool2d` throws `LoweringException: ZeroDivisionError: division by zero` while eager passes the check
#148152 opened Feb 28, 2025
[pytorch elastic] [RHEL] multiple processes call to dist.destroy_process_group() cause an RuntimeError: CUDA error: CUDA-capable device(s) is/are busy or unavailable
#148150 opened Feb 28, 2025
Unify OpOverload._get_dispatch and HigherOrderOperator.dispatch
#148146 opened Feb 28, 2025
Torch export does not preserve original edges between nodes
#148144 opened Feb 28, 2025
Applying online softmax patterns on joint_graph cause 1.2x peak memory regression for TB hf_T5_base model
#148141 opened Feb 28, 2025
Conv/pool doc on ceilmode wrong
#148123 opened Feb 27, 2025
[cutlass backend] C++ compile error for CUTLASS config only get resolved in autotuning stage
#148122 opened Feb 27, 2025
Add int32 support to torch.gather
#148119 opened Feb 27, 2025
[inductor][triton] Explicit kernel-arg mismatch checks
#148116 opened Feb 27, 2025
We should never throw vanilla C++ exceptions
#148114 opened Feb 27, 2025
[inductor][triton] introduce better "APIs" in triton that can clean up our triton/inductor integration
#148113 opened Feb 27, 2025
MLA with Learnable RoPE Tensors is Broken with Flex Attention
#148112 opened Feb 27, 2025
[CI] Remove conda usage from lint related jobs
#148110 opened Feb 27, 2025
NotImplementedError: FlexAttentionHigherOrderVariable() has no type
#148106 opened Feb 27, 2025
Initial investigation for removing MOD_SKIPLIST
#148103 opened Feb 27, 2025
onnx dynamo export does not support aten bucketize
#148098 opened Feb 27, 2025
DISABLED test_njt_causal_bfloat16 (__main__.TestFlexAttention)
#148095 opened Feb 27, 2025
DISABLED test_split_dynamic (__main__.AutoFunctionalizeTests)
#148094 opened Feb 27, 2025
DISABLED test_dynamo_timed (__main__.TestDynamoTimed)
#148093 opened Feb 27, 2025
FSDP2 without sharding works slower than DDP
#148086 opened Feb 27, 2025

490 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Add Windows Arm64 Nightly Builds
#139760 commented on Mar 6, 2025 • 29 new comments
[HOP] Mutation and alias rework
#146658 commented on Mar 6, 2025 • 20 new comments
Add `__context/cause/suppress_context/traceback__` to Exception
#146499 commented on Mar 5, 2025 • 17 new comments
[export][dynamic shapes] add Dim._OBLIVIOUS, _mark_oblivious()
#147881 commented on Mar 3, 2025 • 15 new comments
[inductor] Fix issue with set_linter, improve linter framework
#144620 commented on Mar 4, 2025 • 13 new comments
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on Mar 5, 2025 • 12 new comments
fix indirect broadcast
#145992 commented on Mar 6, 2025 • 10 new comments
Fix `torch.nn.functional.hardswish` gradients corner case
#148049 commented on Mar 6, 2025 • 10 new comments
Support subclass constructor capturing in export
#147014 commented on Mar 6, 2025 • 10 new comments
[pytree] add APIs to determine a class is a namedtuple or PyStructSequence
#113257 commented on Mar 5, 2025 • 9 new comments
Enable Accelerator to perform streaming backward
#142097 commented on Mar 6, 2025 • 9 new comments
Enable fast qlinear_dynamic path for AArch64 through ACL directly
#145942 commented on Mar 5, 2025 • 9 new comments
Custom ops support arbitrary input types by migrating to python dispatcher
#147927 commented on Mar 6, 2025 • 8 new comments
[Inductor][CPP] Add float16 support for CppMicroGemmAMX
#147368 commented on Mar 6, 2025 • 7 new comments
[Intel GPU] int4 WOQ gemm XPU Support
#137566 commented on Mar 4, 2025 • 7 new comments
torch.utils.checkpoint preserves torch function mode stack during recompute
#148023 commented on Mar 4, 2025 • 4 new comments
Replace `unimplemented` with `unimplemented_v2` in `codegen.py`
#148069 commented on Mar 5, 2025 • 4 new comments
Parallelize bf16->f32 conversion for gemm(bf16:bf16->bf16)
#147864 commented on Mar 3, 2025 • 4 new comments
Change arg_kwarg_vals propagation strategy
#148046 commented on Mar 6, 2025 • 4 new comments
[export] Add support for invoke_subgraph
#147863 commented on Mar 4, 2025 • 4 new comments
[export] Add export_cache
#147992 commented on Mar 4, 2025 • 3 new comments
[ci][anaconda] Remove conda from linter docker images
#147789 commented on Mar 5, 2025 • 3 new comments
Increase reference count of state tensor in `THPGenerator_reduce` to avoid premature garbage collection in `multiprocessing` spawn method `"forkserver"` and `"spawn"`
#147907 commented on Mar 5, 2025 • 3 new comments
cpu: aarch64: enable gemm-bf16f32
#140159 commented on Mar 6, 2025 • 2 new comments
Enable onednn in pytorch for ppc64le architecture
#143743 commented on Mar 6, 2025 • 2 new comments
Remove unused rand call if not fallback to eager for rand
#147790 commented on Mar 6, 2025 • 2 new comments
Correctly propagate exception to parent tx
#146502 commented on Mar 5, 2025 • 2 new comments
add grad_output shape check for adaptive_avg_pool2d_backward
#145241 commented on Mar 5, 2025 • 2 new comments
[import][inductor] Simplify grid handling
#147583 commented on Mar 4, 2025 • 2 new comments
add `torch.float4_e2m1fn_x2` to PyTorch
#146578 commented on Feb 28, 2025 • 2 new comments
[CD] Enable triton xpu windows build
#147637 commented on Mar 6, 2025 • 2 new comments
Add torch._library.utils.normalize_args_kwargs
#148062 commented on Feb 27, 2025 • 2 new comments
Deprecate DataLoader pin_memory_device param
#146821 commented on Mar 4, 2025 • 2 new comments
[export] check non-negative modulus, avoid unnecessary congruences, in export solver
#144925 commented on Mar 4, 2025 • 2 new comments
cpp_wrapper: reduce memory usage by removing unneeded temporaries
#147403 commented on Mar 6, 2025 • 2 new comments
experimental proposal DCP v2
#146999 commented on Mar 3, 2025 • 2 new comments
Enable a fast path for (static) qlinear for AArch64 through ACL directly.
#147337 commented on Mar 5, 2025 • 2 new comments
[torch/elastic][upstream] Fix the wrong order when start_index is not 0
#147967 commented on Mar 5, 2025 • 2 new comments
[ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 GPUs
#146264 commented on Feb 28, 2025 • 2 new comments
Add ppc64le wheel build support
#147194 commented on Mar 4, 2025 • 1 new comment
[Dtensor] Pass device information in OffsetBasedRNGTracker
#147594 commented on Mar 5, 2025 • 1 new comment
Fix the Problems About Defining Static Variable in Inline Function
#147095 commented on Mar 4, 2025 • 1 new comment
[ONNX] Add draft_export as a strategy
#147529 commented on Mar 5, 2025 • 1 new comment
Sticky cache API for torch.compile
#147528 commented on Feb 28, 2025 • 1 new comment
Small scheduler refactors
#147410 commented on Feb 28, 2025 • 1 new comment
Make codegen dynamic shapes more device agnostic
#146830 commented on Mar 5, 2025 • 1 new comment
Enable pt2e quantization path for arm
#146690 commented on Mar 5, 2025 • 1 new comment
[CD] Annotate linux/arm64 cuda wheels with consistent nvidia dependencies
#145021 commented on Mar 6, 2025 • 1 new comment
Use the device interface for detecting Triton availability
#139171 commented on Feb 28, 2025 • 1 new comment
[WIP][ptd][nccl] use current-stream as nccl-stream under async=False mode
#147820 commented on Mar 3, 2025 • 1 new comment
test 0-dim squeeze in basic.TestSqueeze
#147928 commented on Mar 3, 2025 • 1 new comment
Introduce `UserDefinedExceptionClassVariable`
#146504 commented on Mar 5, 2025 • 1 new comment
Facilitate at::_weight_int4pack_mm_with_scale_and_zeros related registration
#147962 commented on Mar 4, 2025 • 1 new comment
fix simple-spec crash
#147723 commented on Feb 28, 2025 • 1 new comment
Add needs_exact_strides operator tag for Inductor to force exact strides
#148063 commented on Mar 6, 2025 • 1 new comment
[Intel CPU] Fix issue #143483.
#144854 commented on Mar 3, 2025 • 0 new comments
[Intel CPU] Fix issue #143482.
#144760 commented on Mar 3, 2025 • 0 new comments
optimize the decomposition of aten.native_group_norm
#144733 commented on Mar 2, 2025 • 0 new comments
Generalize poison fork logic for each device backend
#144664 commented on Mar 5, 2025 • 0 new comments
[will-not-merge] tuning
#145798 commented on Mar 6, 2025 • 0 new comments
[AsyncMM] re-enable and adapt to cutlass 3.6.0 (#144011)
#145811 commented on Mar 6, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/[a-c]*/` to `ruff format`
#144554 commented on Mar 2, 2025 • 0 new comments
[inductor] Enable docstring_linter on _inductor
#144622 commented on Mar 4, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `test/[a-h]*/` to `ruff format`
#144555 commented on Mar 3, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `test/[i-z]*/` to `ruff format`
#144556 commented on Mar 5, 2025 • 0 new comments
[inductor] Add tests for new docstring_linter features (fix #142496)
#144621 commented on Mar 4, 2025 • 0 new comments
[Test][Linalg][CUDA] Increase niter in test_svd_lowrank_cuda_float64
#145930 commented on Mar 3, 2025 • 0 new comments
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on Feb 28, 2025 • 0 new comments
Remove unneeded CUDA logic from _create_build_env
#145822 commented on Mar 4, 2025 • 0 new comments
[inductor] Add features to docstring_linter (see #142496)
#145834 commented on Mar 4, 2025 • 0 new comments
Guard the CPU cpp wrapper tests on having a cpp wrapper
#145847 commented on Mar 4, 2025 • 0 new comments
[micro_pipeline_tp] add logging for all-gather-matmul fusion
#145594 commented on Mar 6, 2025 • 0 new comments
Open up PT UTs to cover additional devices
#145589 commented on Mar 6, 2025 • 0 new comments
[micro_pipeline_tp] support pattern matching row-wise scaled_mm with sharded scale
#145595 commented on Mar 6, 2025 • 0 new comments
Add device support for chunk_cat, all_gather_copy_in, and split_with_…
#145600 commented on Mar 3, 2025 • 0 new comments
General Changes for multi accelerators
#145521 commented on Mar 6, 2025 • 0 new comments
Simplify functional composition in _aot_autograd/dispatch_and_compile_graph.py
#145636 commented on Mar 6, 2025 • 0 new comments
Remove unnecessary "special linking" for `BLAS_LIBRARIES`
#145487 commented on Mar 5, 2025 • 0 new comments
removed check for ConvTranspose3D on MPS
#145366 commented on Feb 28, 2025 • 0 new comments
[c10d] implement ReduceOp.unbox()
#145652 commented on Mar 6, 2025 • 0 new comments
Fix incorrect citation of authors in documentation
#145209 commented on Mar 6, 2025 • 0 new comments
[Inductor] optimize welford reduction
#145061 commented on Mar 5, 2025 • 0 new comments
[Easy] update pip sources for ROCm in nightly pull tool
#145685 commented on Feb 28, 2025 • 0 new comments
Fix support for nccl < 2.17
#145719 commented on Mar 1, 2025 • 0 new comments
[Async-TP] Port _fused_all_gather_matmul_native to cpp to reduce launching overhead
#145794 commented on Mar 6, 2025 • 0 new comments
Made partitioning more(?) deterministic
#145024 commented on Feb 28, 2025 • 0 new comments
[AsyncMM] preliminary tuning
#145795 commented on Mar 6, 2025 • 0 new comments
[test] fix unit test
#144977 commented on Mar 3, 2025 • 0 new comments
[Async-TP] _pipelined_multi_all_gather_and_consume reduce overhead
#145796 commented on Mar 6, 2025 • 0 new comments
[Async-TP] improve algo selection
#145797 commented on Mar 6, 2025 • 0 new comments
Add the max_autotune tests in the periodic jobs.
#143560 commented on Mar 6, 2025 • 0 new comments
Fix space typo in warning message
#143473 commented on Mar 4, 2025 • 0 new comments
[while_loop][jit inductor] auto-unspecialize int input and output to unbacked symints
#143457 commented on Mar 1, 2025 • 0 new comments
Remove all dead type ignores (round 2)
#143256 commented on Mar 3, 2025 • 0 new comments
Enable AArch64 CI scripts to be used for local dev
#143190 commented on Feb 28, 2025 • 0 new comments
Support unique id for Tensor Storage Object
#143093 commented on Mar 4, 2025 • 0 new comments
Update low prec codegen for div/mod
#142350 commented on Mar 3, 2025 • 0 new comments
Fix more undefined errors in TypeCast.h
#142346 commented on Mar 6, 2025 • 0 new comments
[DO NOT MERGE][WIP] CI: Dispatch PR events to the out-of-tree test infra
#142114 commented on Mar 4, 2025 • 0 new comments
Fix platform detection in MKLDNN CMake file
#142067 commented on Mar 3, 2025 • 0 new comments
Add AOT inductor support for _scaled_mm for CPU
#141961 commented on Mar 6, 2025 • 0 new comments
Banned ever saving unclaimed nodes
#141940 commented on Feb 28, 2025 • 0 new comments
Enable CUDA 12.6 OSS CI
#140793 commented on Mar 5, 2025 • 0 new comments
ILP for auto FSDP wrapping
#140298 commented on Feb 28, 2025 • 0 new comments
[Environment Variable][6/N] Use thread-safe getenv functions
#140200 commented on Mar 5, 2025 • 0 new comments
Add torch._scaled_mm for CPU
#139975 commented on Mar 6, 2025 • 0 new comments
[cuDNN] Add an option to force cuDNN usage (incl. SDPA)
#139699 commented on Mar 6, 2025 • 0 new comments
Refactor CMake to install header by build option
#139469 commented on Mar 3, 2025 • 0 new comments
Fix `USE_STATIC_MKL` lost functionality
#138996 commented on Feb 27, 2025 • 0 new comments
[Docker] Create an independent dependencies layer
#138612 commented on Mar 3, 2025 • 0 new comments
[Inductor] introduce comm buffer planning
#138519 commented on Mar 6, 2025 • 0 new comments
Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512.
#138388 commented on Feb 28, 2025 • 0 new comments
[WIP] Add CachingDeviceAllocatorInterface as the base device allocator
#138222 commented on Mar 4, 2025 • 0 new comments
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on Mar 3, 2025 • 0 new comments
[POC][FX][pytree] cleanup fx pytree implementation
#138202 commented on Mar 3, 2025 • 0 new comments
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on Feb 28, 2025 • 0 new comments
DISABLED test_repeated_calling_cuda (__main__.AOTInductorTestABICompatibleGpu)
#146185 commented on Mar 3, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/[e-n]*/` to `ruff format`
#144553 commented on Mar 2, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on Mar 2, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/_[a-h]*/` to `ruff format`
#144551 commented on Mar 2, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `{torch,test}/{nn,optim}/**` to `ruff format`
#144548 commented on Mar 3, 2025 • 0 new comments
[20/N] Fix extra warnings brought by clang-tidy-17
#144473 commented on Mar 4, 2025 • 0 new comments
Support Swiglu for Module and functional
#144465 commented on Mar 5, 2025 • 0 new comments
improve WOQ first token performance on CPU
#144463 commented on Mar 5, 2025 • 0 new comments
[BE][pytree][Easy] change imports `torch.utils._pytree` -> `torch.utils.pytree.python`
#144405 commented on Mar 3, 2025 • 0 new comments
codecache.py: Utilize precompiled headers for CPP python bindings
#144349 commented on Mar 1, 2025 • 0 new comments
[pytree][1/N] change pytree usages to implementation agnostic: `torch.distributed`
#144332 commented on Mar 3, 2025 • 0 new comments
codecache: Remove cpp_prefix.h duplication per build, then precompile it
#144293 commented on Mar 1, 2025 • 0 new comments
Add CUDA aarch64 triton wheel build
#144049 commented on Mar 4, 2025 • 0 new comments
Fix to torch.hub documentation grammar mistakes.
#144016 commented on Mar 3, 2025 • 0 new comments
Brister/always tiled reduction
#144008 commented on Mar 2, 2025 • 0 new comments
Fix typo: change 'recieve' into 'receive'
#143981 commented on Feb 28, 2025 • 0 new comments
Fixed bug in FindMKL.cmake
#143980 commented on Mar 1, 2025 • 0 new comments
[ci] Add riscv opt-int build
#143979 commented on Mar 6, 2025 • 0 new comments
[Submodule] Bump flatbuffers to v24.12.23
#143964 commented on Mar 5, 2025 • 0 new comments
using more descriptive alt text for accessibility
#143958 commented on Feb 28, 2025 • 0 new comments
Remove aten/src/ATen/core/Array.h
#143950 commented on Mar 1, 2025 • 0 new comments
Add `_benchmark_func` convenience method
#143911 commented on Feb 28, 2025 • 0 new comments
Remove remove_non_owning_ref_types
#143805 commented on Mar 3, 2025 • 0 new comments
[CI] enable operator benchmark on CPU
#143733 commented on Feb 27, 2025 • 0 new comments
Apply clang-format for ATen/core/op_registration headers
#143730 commented on Mar 3, 2025 • 0 new comments
ci: Add scaffolding for building wheels sequentially
#143672 commented on Mar 4, 2025 • 0 new comments
Extend vec backend with BF16 SVE intrinsics
#143666 commented on Mar 5, 2025 • 0 new comments
Attempt to speed up MPS getTensorStringKey
#143630 commented on Mar 4, 2025 • 0 new comments
Fix test_tensorboard when started w/o tensorboard package
#148079 commented on Feb 27, 2025 • 0 new comments
Onednn pri cache
#147693 commented on Mar 3, 2025 • 0 new comments
[DCP] fix dcp gather_object/scatter_object_list
#147675 commented on Mar 4, 2025 • 0 new comments
Attempt a mixed precision fused adam
#147653 commented on Feb 27, 2025 • 0 new comments
[CUDAGraph] Graph Partition
#147648 commented on Mar 6, 2025 • 0 new comments
Implement metal kernel for MPS binary ops using TensorIterator
#147644 commented on Feb 28, 2025 • 0 new comments
[ROCm] Improve backwards indexing when stride is not one
#147630 commented on Feb 27, 2025 • 0 new comments
Fixed abnormal behavior of LazyLinear when using LazyLinear and load_state together
#147599 commented on Feb 27, 2025 • 0 new comments
Define USE_C10D_XCCL and USE_XCCL in pytorch
#147593 commented on Mar 5, 2025 • 0 new comments
Fix log2, PowByNatural printing
#147592 commented on Mar 4, 2025 • 0 new comments
[ONNX][demo] Rotary embedding
#147576 commented on Mar 5, 2025 • 0 new comments
[partitioner] always ban compiler-driven recompute of collectives by default
#147561 commented on Feb 27, 2025 • 0 new comments
ROCm MX-FP8 Gemm
#147553 commented on Mar 6, 2025 • 0 new comments
[ROCm] Input vectorization in elementwise kernels for tensors with heterogeneous types
#147527 commented on Feb 27, 2025 • 0 new comments
Update pybind11 submodule to 3.0.0-dev test
#147524 commented on Mar 1, 2025 • 0 new comments
Enable UBSAN test
#147511 commented on Mar 4, 2025 • 0 new comments
Document poison fork note for accelerator APIs
#147507 commented on Mar 4, 2025 • 0 new comments
modified device check logic and added tests
#147501 commented on Mar 2, 2025 • 0 new comments
Optimize `dynamo` typing
#147499 commented on Mar 4, 2025 • 0 new comments
Upgrade submodule oneDNN to v3.7
#147498 commented on Feb 28, 2025 • 0 new comments
[CUDA] Replace deprecated usages of cub iterators and thread operators
#147493 commented on Feb 28, 2025 • 0 new comments
[ONNX] Migrate onnx ops decomp functions
#147469 commented on Mar 4, 2025 • 0 new comments
[Inductor] Fix `torch.polygamma()` when n == 1
#147453 commented on Feb 28, 2025 • 0 new comments
Reland "Introduce new template heuristic for triton autotune configs"
#147452 commented on Mar 5, 2025 • 0 new comments
[5/N] Remove unnecessary once flag usage
#147445 commented on Feb 28, 2025 • 0 new comments
[executorch hash update] update the pinned executorch hash
#147422 commented on Mar 6, 2025 • 0 new comments
Fix atomic operation compatibility for ARMv8-A (Raspberry Pi 4) by adjusting compilation flags
#148070 commented on Feb 27, 2025 • 0 new comments
use identity op for alpha=inf in torch.celu and quantized_celu
#148066 commented on Feb 28, 2025 • 0 new comments
[ROCm] Skip Navi4 Row-Wise F8 Tests
#148037 commented on Feb 28, 2025 • 0 new comments
[CUDA][complex] skip `test_reference_numerics_large_jiterator_unary_cuda_complex64` on CUDA
#148024 commented on Mar 5, 2025 • 0 new comments
[while_loop] require stride to be the same as input for body_fn
#148002 commented on Mar 6, 2025 • 0 new comments
[ROCm] Use generated CK config.h rather than system
#147993 commented on Mar 5, 2025 • 0 new comments
Support `contextlib.suppress`
#147990 commented on Mar 5, 2025 • 0 new comments
Bump Protobuf to 5.29
#147963 commented on Feb 28, 2025 • 0 new comments
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(Zi)
#147917 commented on Mar 3, 2025 • 0 new comments
[Draft] Enable cpu_offload for _distribute_state_dict
#147916 commented on Mar 3, 2025 • 0 new comments
Change persistent reduction threshold to 32
#147899 commented on Feb 27, 2025 • 0 new comments
[PT2] Allow tensor type in allowed_getattr_types_for_subgm when verifying ep
#147898 commented on Mar 4, 2025 • 0 new comments
[targets2buck] Remove tombstone messages proactively
#147897 commented on Feb 28, 2025 • 0 new comments
[Not4Land] test `optree` with HEAD version
#147870 commented on Mar 6, 2025 • 0 new comments
Set disable_clone=True when running opt_gm
#147845 commented on Mar 3, 2025 • 0 new comments
Use /permissive- for MSVC build of torch libraries
#147825 commented on Mar 3, 2025 • 0 new comments
Use torch_compile_options for c10 libraries
#147821 commented on Mar 5, 2025 • 0 new comments
test
#147800 commented on Feb 27, 2025 • 0 new comments
[ROCm] CK Memory-Efficient Attention (attention bias support)
#147778 commented on Mar 1, 2025 • 0 new comments
[cuda] Added a correctness test for layernorm backwards
#147763 commented on Mar 6, 2025 • 0 new comments
[DCP] Work in progress: Demonstrate rank local checkpointing in DCP
#147758 commented on Mar 3, 2025 • 0 new comments
Update CPU tolerance for f16 triplet margin loss
#147742 commented on Mar 4, 2025 • 0 new comments
[WIP][XPU][Inductor] Update Intel triton for release 2.7.
#147727 commented on Mar 5, 2025 • 0 new comments
Avoid linking multiple OMP runtimes in libtorch_cpu.so if BLAS used is OpenBLAS.
#147725 commented on Mar 5, 2025 • 0 new comments
Skip test_dtypes xpu test on bmm and addbmm
#147721 commented on Mar 4, 2025 • 0 new comments
[ROCm][Windows] Enable torchvision build with ROCm on Windows
#147382 commented on Mar 3, 2025 • 0 new comments
Enable qint8 and quint8 add for AArch64 using ACL directly
#146620 commented on Mar 5, 2025 • 0 new comments
[WIP] BaseSubclass
#146612 commented on Mar 6, 2025 • 0 new comments
[NOT FOR LANDING] experimental NVSHMEM integration
#146593 commented on Mar 6, 2025 • 0 new comments
clang-format CUDASymmetricMemory.cu
#146592 commented on Mar 6, 2025 • 0 new comments
[Partitioner] Reduce time consuming of partitions merger
#146582 commented on Mar 1, 2025 • 0 new comments
[Partitioner] Remove unnecessary upstream nodes in dependency viewer
#146580 commented on Mar 1, 2025 • 0 new comments
add python root bin to windows load path.
#146573 commented on Mar 2, 2025 • 0 new comments
[pt2d] Add reorder_comms_preserving_peak_memory pass
#146562 commented on Mar 6, 2025 • 0 new comments
Improve comms debug visualization
#146561 commented on Mar 6, 2025 • 0 new comments
[not for land] temp changes to enable 'simple_fsdp'
#146558 commented on Mar 3, 2025 • 0 new comments
Support contextlib.ExitStack
#146506 commented on Mar 5, 2025 • 0 new comments
Allow setting attribute to NestedUserFunctionVariable
#146505 commented on Mar 4, 2025 • 0 new comments
Update CPython tests for ctx manager to use unittest
#146501 commented on Mar 5, 2025 • 0 new comments
Allow trace through unittest
#146500 commented on Mar 5, 2025 • 0 new comments
Update code_template.py re.compile() is directly applied to the regex…
#146489 commented on Feb 28, 2025 • 0 new comments
Update quantile doc
#146485 commented on Mar 6, 2025 • 0 new comments
[1/N] Use std::string_view in torchgen
#146403 commented on Mar 3, 2025 • 0 new comments
[WIP][dynamic shapes] mark backed size symbols as size-like
#146335 commented on Mar 6, 2025 • 0 new comments
Use device agnostic APIs for device_count and backend in common_fsdp
#146289 commented on Mar 6, 2025 • 0 new comments
[dcp] Minor improvements to filesystem writer
#146273 commented on Mar 3, 2025 • 0 new comments
Format tests by PYFMT
#146267 commented on Mar 3, 2025 • 0 new comments
[2/N] Fix cppcoreguidelines-init-variables suppression
#146237 commented on Mar 6, 2025 • 0 new comments
Subprocess compile
#146134 commented on Mar 6, 2025 • 0 new comments
Move get accelerator to use build time flags when possible
#146098 commented on Mar 4, 2025 • 0 new comments
[ARM] Fix TestDataLoader.test_segfault unexpected success on Aarch64
#146090 commented on Feb 28, 2025 • 0 new comments
Force build to conform C++ standard on windows by adding `/permissive-` flag
#147367 commented on Mar 4, 2025 • 0 new comments
Add meta function for out variants of ones,zeros,empty
#147350 commented on Mar 6, 2025 • 0 new comments
Refine XPU oneDNN context manager API
#147349 commented on Mar 6, 2025 • 0 new comments
[ROCm][Windows] Disable Composable Kernels and Triton for Windows builds
#147334 commented on Mar 4, 2025 • 0 new comments
[TESTING] [NO MERGE] Testing new triton commit for release/2.7
#147320 commented on Mar 6, 2025 • 0 new comments
Add the memory and dispatch to the logging module.
#147262 commented on Mar 4, 2025 • 0 new comments
add PrivateUse1 backend in fsdp collectives
#147260 commented on Feb 28, 2025 • 0 new comments
logging: close handler after removing it
#147235 commented on Mar 4, 2025 • 0 new comments
cpp_wrapper: Fix even more tests
#147225 commented on Mar 6, 2025 • 0 new comments
Fix rms_norm in fp16/bf16
#147203 commented on Mar 3, 2025 • 0 new comments
dynamo: Count number of opcodes processes
#147149 commented on Mar 6, 2025 • 0 new comments
[fsdp] add an experimental allocator hook for buffers that participate in collective communication
#147146 commented on Mar 6, 2025 • 0 new comments
ROCm F8 Datatype Selector
#147142 commented on Feb 28, 2025 • 0 new comments
fake_tensor: Handle op errors more gracefully
#147049 commented on Mar 5, 2025 • 0 new comments
[ROCm] [TunableOp] Enable logging of BLAS parameters
#147034 commented on Mar 6, 2025 • 0 new comments
[BE]: Try to remove unused type ignores - attempt 1
#146989 commented on Mar 3, 2025 • 0 new comments
[Inductor] Unify the data type propagation between Triton and CPP Backend
#146970 commented on Mar 6, 2025 • 0 new comments
cpp_wrapper: Precompile device-specific header files
#146928 commented on Mar 3, 2025 • 0 new comments
Adjust TestInductorOpInfo to depend on backend, not device
#146911 commented on Mar 5, 2025 • 0 new comments
Optimize transformer encoder/decoder init suggestion
#146882 commented on Mar 3, 2025 • 0 new comments
update types on dynamo configs
#146873 commented on Feb 28, 2025 • 0 new comments
Don't look at TESTING_ONLY in fuzzer
#146870 commented on Feb 28, 2025 • 0 new comments
Enable explicitly vectorized `_weight_int8pack_mm` op for FP16 dtype on x86_64 CPU
#146777 commented on Feb 28, 2025 • 0 new comments
cpp_wrapper: persist autotune example tensors until last use
#146706 commented on Mar 6, 2025 • 0 new comments
Enable Windows tests
#146695 commented on Mar 5, 2025 • 0 new comments
AOTI packaged model fails with generic error when run in for loop but succeeds on individual sample
#146524 commented on Mar 3, 2025 • 0 new comments
Cannot import PyTorch in Alpine Docker Container
#71381 commented on Mar 3, 2025 • 0 new comments
expandable_segments does not work for CUDAPluggableAllocator + MemPool
#147851 commented on Mar 3, 2025 • 0 new comments
[ONNX] Migrate torchlib from onnxscript
#139301 commented on Mar 3, 2025 • 0 new comments
No gradient for `residuals` in the return value of `torch.linalg.lstsq`
#147543 commented on Mar 3, 2025 • 0 new comments
Does CUDACachingAllocator.cpp still require deferred event creation?
#147874 commented on Mar 3, 2025 • 0 new comments
torch._scaled_mm reproducibility
#147972 commented on Mar 3, 2025 • 0 new comments
Fp8 scaled-mm row-wise is substantially slower than tensor-wise
#147971 commented on Mar 3, 2025 • 0 new comments
Adam doesn't work with nonzero-dim Tensor betas
#147921 commented on Mar 3, 2025 • 0 new comments
`FxGraphDrawer` fails on `einsum` nodes
#147884 commented on Mar 3, 2025 • 0 new comments
[CI/CD] Deprecating PyTorch’s official Anaconda channel
#138696 commented on Mar 3, 2025 • 0 new comments
MPS support `torch.linalg.norm` on complex numbers
#146691 commented on Mar 3, 2025 • 0 new comments
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on Mar 3, 2025 • 0 new comments
"unhashable type: non-nested SymInt" error when using DTensor and Compiled Autograd together
#127797 commented on Mar 3, 2025 • 0 new comments
Torch Compile mode giving Graph Break for Higher Order Op for ProcessGroup param
#129942 commented on Mar 3, 2025 • 0 new comments
PyTorch C++ API binary compiled with xmake crashes
#129305 commented on Mar 3, 2025 • 0 new comments
Model parameter and gradient memory formats are inconsistent with compiled autograd
#127922 commented on Mar 3, 2025 • 0 new comments
WSL2 RTX A6000 , CUDA out of memory.
#117197 commented on Mar 3, 2025 • 0 new comments
Error loading "torch\lib\aoti_custom_ops.dll" or one of its dependencies, when importing Torch, when building from Source on Windows 11 with cuDNN.
#144931 commented on Mar 3, 2025 • 0 new comments
CMake Error: When installing PyTorch from source, CUDA not being detected.
#134331 commented on Mar 3, 2025 • 0 new comments
User-defined way to hook into Inductor autotuning
#142388 commented on Mar 3, 2025 • 0 new comments
[JIT] support list of nn.Module in torchscript
#36061 commented on Mar 3, 2025 • 0 new comments
[XPU] torch.nn.functional.pad brings wrong results with torch.compile on Intel GPU
#145372 commented on Mar 3, 2025 • 0 new comments
DISABLED test_input_mutation2_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135295 commented on Mar 3, 2025 • 0 new comments
[DTensor] [distributed]: Operator aten.select.int does not have a sharding strategy registered
#147724 commented on Mar 3, 2025 • 0 new comments
DISABLED test_vdd_clamp_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134445 commented on Mar 3, 2025 • 0 new comments
DISABLED test_embedding_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135250 commented on Mar 3, 2025 • 0 new comments
Regression: Multiple OpenMP runtimes linked to libtorch_cpu.so
#146603 commented on Mar 3, 2025 • 0 new comments
Get `aot_autograd`'ed graph without `torch.compile` and freeze constants without Inductor context
#140205 commented on Mar 3, 2025 • 0 new comments
[export] Fail fast on pytorch with `aoti_load_package`
#145730 commented on Feb 28, 2025 • 0 new comments
torch.device context manager change doesn't show in torch.get_default_device
#131328 commented on Mar 6, 2025 • 0 new comments
[RFC] Test Cases Enabling for Accelerators
#146898 commented on Mar 4, 2025 • 0 new comments
DISABLED test_reentrant_parent_error_on_cpu_cuda (__main__.TestAutogradDeviceTypeCUDA)
#86735 commented on Mar 4, 2025 • 0 new comments
partitioner doesn't appear to respect SAC region
#128730 commented on Mar 3, 2025 • 0 new comments
Add a hash function for tensor data
#2569 commented on Mar 3, 2025 • 0 new comments
SerializeError for ScriptObject in AOTInductor
#147283 commented on Mar 3, 2025 • 0 new comments
[ONNX] slice complex tensor needs implementation
#147896 commented on Mar 3, 2025 • 0 new comments
Inconsistent results from `is_compile_supported ` with equivalent device identifiers
#147826 commented on Mar 3, 2025 • 0 new comments
[BUG][PyTorch 2.0 Export][quant]:get_source_partitions() may return different matches with same input graph
#147170 commented on Mar 3, 2025 • 0 new comments
[dynamo] Activation checkpointing tests erroring at runtime
#127115 commented on Mar 3, 2025 • 0 new comments
Support AC with graph break
#139989 commented on Mar 3, 2025 • 0 new comments
Compiled Autograd + Activation Checkpointing/Offloading
#143176 commented on Mar 3, 2025 • 0 new comments
CUDA error when compiling loss function
#143774 commented on Mar 3, 2025 • 0 new comments
Partitioner's auto-AC misbehaves with mixed dtypes
#144470 commented on Mar 3, 2025 • 0 new comments
CheckpointError with `torch.distributed.algorithms._checkpoint.checkpoint_wrapper` and `torch.compile`
#144637 commented on Mar 3, 2025 • 0 new comments
Activation Checkpointing composability with split backward computation
#145511 commented on Mar 3, 2025 • 0 new comments
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on Mar 3, 2025 • 0 new comments
Missing grad_fn information while torch.compile with customized gradient function
#131974 commented on Mar 3, 2025 • 0 new comments
`RuntimeError` not raised for `out=` argument in `torch.tensordot` with `requires_grad` tensors
#147846 commented on Mar 3, 2025 • 0 new comments
Unable to checkpoint model and optimizer state when using Hybrid Sharding Strategy
#102904 commented on Mar 3, 2025 • 0 new comments
[DCP] Native S3/object storage interface
#116198 commented on Mar 3, 2025 • 0 new comments
[DCP] Review and update examples and docstring to reflect DCP save/load API updates
#119070 commented on Mar 3, 2025 • 0 new comments
Distributed checkpoint state_dict load may report nccl error and hide the real root-cause exception
#122529 commented on Mar 3, 2025 • 0 new comments
`torch_save_to_dcp`'s produced dcp checkpoint doesn't fit the load interface for FSDP
#126047 commented on Mar 3, 2025 • 0 new comments
[DCP] DCP does not support objects which are lazy initialized.
#126881 commented on Mar 3, 2025 • 0 new comments
PyTorch's Distributed Checkpoint Cannot Save a Parameter of Size 1
#132366 commented on Mar 3, 2025 • 0 new comments
Incomplete Parameter Gathering on Rank 0 with FSDP Model Saving
#136950 commented on Mar 3, 2025 • 0 new comments
[torch2.4][Distributed Checkpoint] new flatten logic is error-prone
#137327 commented on Mar 3, 2025 • 0 new comments
FSDP learning hangs when the program tries to save the model
#143536 commented on Mar 3, 2025 • 0 new comments
What is the recommended way to use Distributed Checkpointing Save/Load with HSDP?
#145978 commented on Mar 3, 2025 • 0 new comments
[torch.export] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
#126674 commented on Mar 3, 2025 • 0 new comments
DISABLED test_linear_backward_memory_usage_cuda_float32 (__main__.TestNestedTensorSubclassCUDA)
#141292 commented on Feb 28, 2025 • 0 new comments
Hooks not working in version 2.0.1+cu118
#102374 commented on Feb 28, 2025 • 0 new comments
DISABLED test_rng (__main__.TestCompilerBisector)
#139590 commented on Feb 28, 2025 • 0 new comments
DISABLED test_while_loop_schema_gen (__main__.TestHopSchema)
#141202 commented on Feb 28, 2025 • 0 new comments
CTCLoss gradient is incorrect
#52241 commented on Feb 28, 2025 • 0 new comments
DISABLED test_reorder_peak_memory_dfs (__main__.TestOperatorReorderForPeakMemory)
#145183 commented on Feb 28, 2025 • 0 new comments
AdamW is CPU-bottlenecked with FSDP2, with slow foreach kernels
#134191 commented on Feb 28, 2025 • 0 new comments
DISABLED test_split (__main__.AutoFunctionalizeTests)
#148080 commented on Feb 28, 2025 • 0 new comments
[discussion] Support other index dtypes for scatter, scatter_reduce and other indexing functions in addition to int64: uint8, int16, int32, uint16 etc (without casting copies/reallocations)
#61819 commented on Feb 28, 2025 • 0 new comments
Potential rooms for fewer recompilations by introducing higher-level guards
#143670 commented on Feb 27, 2025 • 0 new comments
DISABLED test_item_to_inputs_kernel_nobreak_cuda (__main__.TestInductorDynamicCUDA)
#119538 commented on Feb 27, 2025 • 0 new comments
compilation error on SequenceParallel'ed Dropout
#147757 commented on Feb 27, 2025 • 0 new comments
More informative variable names in AOTAutograd
#110700 commented on Feb 27, 2025 • 0 new comments
`torch.mul` uses `OpMathType` for computation.
#147134 commented on Feb 27, 2025 • 0 new comments
DISABLED test_int64_upsample3d_cuda_bfloat16 (__main__.TestTorchDeviceTypeCUDA)
#146007 commented on Feb 27, 2025 • 0 new comments
make_graphed_callables don't work with FSDP at all even on a simple network
#127225 commented on Feb 27, 2025 • 0 new comments
Let torch.compiler.allow_in_graph work in more situations
#140437 commented on Feb 27, 2025 • 0 new comments
inspect.Signature with functools.partial partially applying tensors doesn't work
#139374 commented on Feb 27, 2025 • 0 new comments
require_exact_stride better handling of expanded dims
#147156 commented on Feb 27, 2025 • 0 new comments
Failed to Open libnvrtc-builtins.so.11.7
#93134 commented on Feb 27, 2025 • 0 new comments
Partitioner stores fp8 copy of all weights between fwd and bwd, causing OOM
#141881 commented on Feb 27, 2025 • 0 new comments
Checkpoint doesn't work with torch_function if torch_function change tensor metadata
#147995 commented on Feb 27, 2025 • 0 new comments
Run Performance Regression Tests on new Version of Triton
#148045 commented on Feb 27, 2025 • 0 new comments
[RFC] Customization of ElasticAgent for fault-tolerance and node-replacement in big ddp job
#148048 commented on Feb 27, 2025 • 0 new comments
Support alpha=inf consistently for torch.celu
#148065 commented on Feb 27, 2025 • 0 new comments
DISABLED test_recompile (__main__.AutoFunctionalizeTests)
#148013 commented on Feb 27, 2025 • 0 new comments
Feature Request: CUDA-Optimized Queue Buffer for PyTorch
#148077 commented on Feb 27, 2025 • 0 new comments
Track follow ups to #147354
#147867 commented on Feb 27, 2025 • 0 new comments
torch/_inductor/cpp_builder.py : _is_gcc Function Incorrectly Classifies clang++ as g++
#146712 commented on Feb 27, 2025 • 0 new comments
DISABLED test_slice (__main__.AutoFunctionalizeTests)
#148035 commented on Feb 27, 2025 • 0 new comments
[RFC] Cuda support matrix for Release 2.7
#145544 commented on Mar 2, 2025 • 0 new comments
MPS Error on sequoia 15.3: NDArray dimension length > INT_MAX'
#146769 commented on Mar 2, 2025 • 0 new comments
Inductor-CPU might load (and store) fewer elements than the vector-width
#146824 commented on Mar 1, 2025 • 0 new comments
[typing] Add static type hints to `torch.distributions`.
#144196 commented on Mar 1, 2025 • 0 new comments
torch.export does not support torchaudio.transforms.Spectrogram
#112844 commented on Mar 1, 2025 • 0 new comments
FlexAttention uses much more GPU memory than FlashAttention-2
#144537 commented on Mar 1, 2025 • 0 new comments
Unable to print in a branch run by torch.cond
#147115 commented on Mar 1, 2025 • 0 new comments
The value of requires_grad is not set when creating the tensor using TensorMaker
#146419 commented on Mar 1, 2025 • 0 new comments
General MPS op coverage tracking issue
#77764 commented on Mar 1, 2025 • 0 new comments
Error using the "index_put_" function
#124288 commented on Mar 1, 2025 • 0 new comments
PyPy support
#17835 commented on Mar 1, 2025 • 0 new comments
DISABLED test_weight_norm_bwd_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#141484 commented on Mar 1, 2025 • 0 new comments
DISABLED test_embedding_bag_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135215 commented on Mar 1, 2025 • 0 new comments
DISABLED test_embedding_dynamic_shapes_cpu (__main__.DynamicShapesCodegenCpuTests)
#135456 commented on Mar 1, 2025 • 0 new comments
DISABLED test_wait_tensor (__main__.CompileTest)
#148014 commented on Feb 28, 2025 • 0 new comments
DISABLED test_partitioning_unremat_bw (__main__.MinCutPartitioningTests)
#145343 commented on Feb 28, 2025 • 0 new comments
DistributedDataParallel with compile(..., mode="max-autotune") hangs in 2.5+
#140395 commented on Feb 28, 2025 • 0 new comments
Improving expand w/ unbacked symints
#128645 commented on Feb 28, 2025 • 0 new comments
compile_fx modifies the input graph in place
#138980 commented on Feb 28, 2025 • 0 new comments
DTensor support for fused qkv matmul
#140069 commented on Feb 28, 2025 • 0 new comments
torch.nn.AvgPool2d fails with stride >= 2^31 on CUDA
#147613 commented on Feb 28, 2025 • 0 new comments
[feature][cudagraph] API to clear a bad recording
#127147 commented on Feb 28, 2025 • 0 new comments
[compiled autograd][aot autograd] accumulate grad (on param with non empty grad) mutates inputs and prevents cudagraph
#126938 commented on Feb 28, 2025 • 0 new comments
autograd.function with `setup_context` has a number of issues with `torch.compile`
#130051 commented on Feb 28, 2025 • 0 new comments
inductor error when torch.compile on distrifuser
#128073 commented on Feb 28, 2025 • 0 new comments
Support AOT Autograd level Caching
#125958 commented on Feb 28, 2025 • 0 new comments
torch._dynamo.exc.UserError: Could not guard on data-dependent expression Eq(256*u0, 256) (unhinted: Eq(256*u0, 256)).
#147672 commented on Feb 28, 2025 • 0 new comments
AOTAutogradCache implementation
#128234 commented on Feb 28, 2025 • 0 new comments
Triton pin update for PyTorch 2.7 / Triton 3.3: Upgrading PyTorch-Triton to a version that Supports Blackwell
#146518 commented on Feb 28, 2025 • 0 new comments
torch.export.export creates guards that denies exporting.
#147623 commented on Feb 28, 2025 • 0 new comments
[inductor] [cpu] `torch.scatter` throws `AttributeError: 'int' object has no attribute 'find'` on CPP backend
#148058 commented on Feb 28, 2025 • 0 new comments
Sweep through Potentially BC Breaking Commits in Triton
#148044 commented on Mar 5, 2025 • 0 new comments
Add docs for __tensor_flatten__ / __tensor_unflatten__
#113089 commented on Mar 5, 2025 • 0 new comments
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on Mar 5, 2025 • 0 new comments
[compile] DDPOptimizer + activation checkpointing not supported
#104674 commented on Mar 5, 2025 • 0 new comments
Preload CUDA fails if CUDA libs in different PYTHONPATH
#147001 commented on Mar 5, 2025 • 0 new comments
Looking for valid compiling option for extension based on torch-2.1.0+cpu.cxx11.abi
#143780 commented on Mar 5, 2025 • 0 new comments
Torch Onnx Export (with Dynamo) does not recognize `Remainder` function
#147973 commented on Mar 5, 2025 • 0 new comments
Scaled Dot-Product Attention Invalid Configuration Error on Large batch size
#142228 commented on Mar 5, 2025 • 0 new comments
[ONNX] Run report_exportability when report=True
#139904 commented on Mar 5, 2025 • 0 new comments
[torchbench] torch._dynamo.exc.Unsupported: Graph break due to unsupported builtin None.morphologyEx
#145088 commented on Mar 5, 2025 • 0 new comments
Immediate Global State Mutation After Using `_force_original_view_tracking` Decorator
#147849 commented on Mar 5, 2025 • 0 new comments
Torch 2.6.0 cu126 is missing several dependencies in the METADATA-file
#146679 commented on Mar 5, 2025 • 0 new comments
Assertion error in Flex Attention backward pass when indexing a parameter
#146896 commented on Mar 5, 2025 • 0 new comments
Symbolic shape fails symbol_to_source when caching is enabled
#127970 commented on Mar 5, 2025 • 0 new comments
`torch.compile` and complex numbers
#125718 commented on Mar 5, 2025 • 0 new comments
libtorch_cuda_linalg.so: undefined symbol: mkl_lapack_dsbrdbn on a source built PyTorch 2.6.0 with USE_STATIC_MKL=1 on CUDA platform
#146551 commented on Mar 5, 2025 • 0 new comments
On Linux, passing torch.Generator to multiprocessing.Process crashes for forkserver and spawn start method
#146828 commented on Mar 5, 2025 • 0 new comments
[CUDA][Blackwell] Blackwell Tracking Issue
#145949 commented on Mar 5, 2025 • 0 new comments
Enable more flake8-bugbear lints
#106571 commented on Mar 5, 2025 • 0 new comments
Custom operators registered via decorator slower than ops registered via `torch.Library.{define, impl}`
#139500 commented on Mar 5, 2025 • 0 new comments
[inductor] [silence] `nn.ConvTranspose2d-F.dropout` outputs inconsistent results with eager
#148061 commented on Mar 5, 2025 • 0 new comments
DISABLED test_profiler_mark_wrapper_call_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135294 commented on Mar 5, 2025 • 0 new comments
DISABLED test_pattern_matcher_multi_user_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134433 commented on Mar 5, 2025 • 0 new comments
DISABLED test_max_autotune (__main__.TestFlexAttention)
#147200 commented on Mar 5, 2025 • 0 new comments
Video-Llama (version 1) runs much slower using Float16 than Float32 on Kunpeng CPU
#148078 commented on Mar 5, 2025 • 0 new comments
Tensor parallel for convolutions and groupnorm
#133221 commented on Mar 5, 2025 • 0 new comments
RuntimeError: expect_autograd_hooks_ INTERNAL ASSERT FAILED at "../torch/csrc/distributed/c10d/reducer.cpp"
#143580 commented on Mar 5, 2025 • 0 new comments
Pytorch build fail with GCC 14.1.0 due to third_party/fbgemm/src/UtilsAvx512.cc:970:35: error: ‘r’ may be used uninitialized [-Werror=maybe-uninitialized]
#129358 commented on Mar 5, 2025 • 0 new comments
[inductor][cpu]DebertaV2ForMaskedLM, DebertaV2ForQuestionAnswering and eca_halonext26ts max_autotune accuracy failure in 2025-02-24 nightly release
#148074 commented on Mar 5, 2025 • 0 new comments
torch._check doesn't work for .item() then select
#147772 commented on Mar 4, 2025 • 0 new comments
Load cuda deps more aggressively
#137059 commented on Mar 5, 2025 • 0 new comments
Enabling ATen Distribution kernels for AARCH64 using OpenRNG
#134942 commented on Mar 5, 2025 • 0 new comments
Avoid args being parsed when common_utils imported
#134592 commented on Mar 4, 2025 • 0 new comments
Update Doc for Intel XPU Profiling
#134515 commented on Mar 4, 2025 • 0 new comments
[Don't Merge] Try to build custom ops with MKL XPU
#133658 commented on Feb 28, 2025 • 0 new comments
Make IPC features extendable on third-party devices
#133222 commented on Mar 4, 2025 • 0 new comments
autograd codegen: bump VC properly for mutable ops with no returns
#133044 commented on Feb 28, 2025 • 0 new comments
[torch.special] Adding betainc, betaincc, betaincinv, betainccinv, betaln and beta with backward operation
#132135 commented on Mar 6, 2025 • 0 new comments
[pytree] implement key path APIs for CXX pytree
#130141 commented on Feb 28, 2025 • 0 new comments
[CI] Run `lintrunner` on generated `.pyi` stub files in CI
#129887 commented on Feb 28, 2025 • 0 new comments
Add `__all__` to `torch/nn/functional.pyi` and `torch/return_types.pyi`
#129872 commented on Feb 28, 2025 • 0 new comments
[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path`
#129871 commented on Feb 28, 2025 • 0 new comments
Use Generic TypeAlias (PEP 585) and Union Type (PEP 604) in generated `.pyi` stub files
#129420 commented on Feb 28, 2025 • 0 new comments
Enable Leak Sanitizer
#127171 commented on Mar 3, 2025 • 0 new comments
[inductor] online softmax
#127011 commented on Mar 4, 2025 • 0 new comments
[AOTAutograd] tweak min-cut partitioner to avoid saving softmax output
#126348 commented on Feb 28, 2025 • 0 new comments
[vision hash update] update the pinned vision hash
#125806 commented on Mar 6, 2025 • 0 new comments
Automated submodule update: FBGEMM
#115316 commented on Mar 6, 2025 • 0 new comments
[pytree] support PyStructSequence types for Python pytree
#113258 commented on Mar 3, 2025 • 0 new comments
Automated submodule update: kineto
#106149 commented on Mar 5, 2025 • 0 new comments
NotImplementedError: Output channels > 65536 not supported at the MPS device.
#144445 commented on Mar 6, 2025 • 0 new comments
[Feature Request] Release original parameters by layer when turning on `freezing_discard_parameters`
#147062 commented on Mar 6, 2025 • 0 new comments
Pytorch DDP across nodes: self._store = TCPStore( # type: ignore[call-arg] RuntimeError: Stop_waiting response is expected
#114357 commented on Mar 6, 2025 • 0 new comments
Deprecation of NVTX 2 (`nvToolsExt`): Recommended to move to NVTX 3
#147011 commented on Mar 6, 2025 • 0 new comments
DISABLED test_remote_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_True (__main__.TestFxGraphCache)
#145191 commented on Mar 6, 2025 • 0 new comments
[RFC] Request for Feedback and Review on PRs Adding RISC-V and RVV Support
#147513 commented on Mar 6, 2025 • 0 new comments
DISABLED test_tensor_dtype_complex (__main__.CommTest)
#112460 commented on Mar 6, 2025 • 0 new comments
[ONNX] BitwiseOr was generated for bool inputs (invalid)
#147854 commented on Mar 6, 2025 • 0 new comments
Enable CUDA 12.8.0, Disable CUDA 12.4
#145570 commented on Mar 6, 2025 • 0 new comments
AttributeError: Can't pickle local object 'make_opaque_bitwise_fn.<locals>.BitwiseFn'
#147841 commented on Mar 6, 2025 • 0 new comments
python3 -m torch.utils.collect_env not providing expected output.
#147669 commented on Mar 6, 2025 • 0 new comments
DISABLED test_nonstrict_trace_nested_custom_class_error (__main__.DecoratorTests)
#148031 commented on Mar 4, 2025 • 0 new comments
DISABLED test_nonstrict_trace_newly_constructed_custom_class_with_side_effects (__main__.DecoratorTests)
#148032 commented on Mar 4, 2025 • 0 new comments
DISABLED test_nonstrict_trace_nested_custom_class (__main__.DecoratorTests)
#148033 commented on Mar 4, 2025 • 0 new comments
DISABLED test_nonstrict_trace_no_action_at_a_distance (__main__.DecoratorTests)
#148034 commented on Mar 4, 2025 • 0 new comments
DISABLED test_nonstrict_trace_on_method (__main__.DecoratorTests)
#148054 commented on Mar 4, 2025 • 0 new comments
DISABLED test_nonstrict_trace_inside_compiled_function_kwarg (__main__.DecoratorTests)
#148055 commented on Mar 4, 2025 • 0 new comments
DISABLED test_nonstrict_trace_pre_existing_custom_class_with_side_effects (__main__.DecoratorTests)
#148056 commented on Mar 4, 2025 • 0 new comments
PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU
#145702 commented on Mar 4, 2025 • 0 new comments
[inductor] Add Python type annotations to `torch/_inductor`
#146167 commented on Mar 4, 2025 • 0 new comments
DISABLED test_per_sample_api_compute_batch_size_not_pytreeable_cpu (__main__.TestExpandedWeightModuleCPU)
#146972 commented on Mar 4, 2025 • 0 new comments
torch.compile() on quantized model: No attribute "meta"
#148072 commented on Mar 4, 2025 • 0 new comments
Triton Error [CUDA]: invalid device context when autograd.backward a triton kernel
#124565 commented on Mar 4, 2025 • 0 new comments
Loading weights using `torch.distributed.checkpoint` leads to large loss values
#145378 commented on Mar 4, 2025 • 0 new comments
incorrect _unsafe_index meta
#139312 commented on Mar 4, 2025 • 0 new comments
DISABLED test_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_True_grad_True (__main__.TestFxGraphCache)
#145336 commented on Mar 4, 2025 • 0 new comments
TorchInductor CPU Performance Dashboard
#93531 commented on Mar 4, 2025 • 0 new comments
Support FP16 accumulation for faster LLM inference on 4090 like GPUs
#123558 commented on Mar 4, 2025 • 0 new comments
gradient checkpointing with use_reentrant=False cannot reduce peak memory
#147449 commented on Mar 4, 2025 • 0 new comments
[inductor][user triton] support on-device TMA / tensor descriptor API
#148052 commented on Mar 4, 2025 • 0 new comments
Adjust test_mm_triton_kernel_benchmark for unpadded tensors
#147999 commented on Mar 4, 2025 • 0 new comments
torch.compile supported with GIL disabled
#147946 commented on Mar 4, 2025 • 0 new comments
redundant recompilation caused by duplicated Sym()
#144068 commented on Mar 4, 2025 • 0 new comments
Strange recompilations on torch 2.5 + FSDP + UNet
#138813 commented on Mar 4, 2025 • 0 new comments
automatic_dynamic_shapes for mark_unbacked
#136605 commented on Mar 4, 2025 • 0 new comments
dynamo (re)compilation issues: shape (1,1), nn.Parameter, mark_dynamic
#135011 commented on Mar 4, 2025 • 0 new comments
Avoid recompilation for inputs integer number
#132849 commented on Mar 4, 2025 • 0 new comments
StableDiffusion with dynamic=True still recompiles
#104913 commented on Mar 4, 2025 • 0 new comments
Recompilation triggered at each step of the loop involving array indexing
#114293 commented on Mar 4, 2025 • 0 new comments
torch.compile support for SeamlessExpressivity/SeamlessM4T in fairseq2
#114373 commented on Mar 4, 2025 • 0 new comments
DynamicInt helper structure that is equivalent to mark_dynamic on an int
#129623 commented on Mar 4, 2025 • 0 new comments
Verifier (in torch.export.export) does not make use of if-condition inside branches
#147991 commented on Mar 4, 2025 • 0 new comments
Build Triton for aarch64
#130558 commented on Mar 4, 2025 • 0 new comments
`torch.device(0)` makes CUDA init fail in subprocess since `2.5.0`
#144152 commented on Mar 4, 2025 • 0 new comments
Comm reordering can make Inductor use variable before its definition
#147328 commented on Mar 4, 2025 • 0 new comments
Web Page do not match the original documentation
#146683 commented on Mar 4, 2025 • 0 new comments
[ROCm] scaled_dot_product_attention using mem-efficient backend (aotriton) produces wrong outputs with custom attn_mask on torch 2.6.0+rocm6.2.4
#147460 commented on Mar 4, 2025 • 0 new comments
`Illegal instruction (core dumped)` on Raspberry Pi 4 when exporting ONNX with `torch 2.6.0`
#146792 commented on Mar 4, 2025 • 0 new comments
XPU - UserWarning: Failed to initialize XPU devices. when run on the host without Intel GPU Driver
#145433 commented on Mar 4, 2025 • 0 new comments
CI/CD: Figure out what to do with split build
#138750 commented on Mar 4, 2025 • 0 new comments
torch_cuda.dll was built failed to link _cudnn_attention_forward
#147671 commented on Mar 4, 2025 • 0 new comments
DISABLED test_inductor_all_gather_into_tensor_coalesced (__main__.CompileTest)
#146806 commented on Mar 4, 2025 • 0 new comments
Publish pytorch RC docker images before release
#145925 commented on Mar 4, 2025 • 0 new comments
unbind_copy opinformation cause exception while running test_dtensor_ops.py
#147814 commented on Mar 4, 2025 • 0 new comments
[inductor] [silence] inconsistent swap with eager when compiling `torch.rot90-torch.randn_like`
#147847 commented on Mar 4, 2025 • 0 new comments
Flex Attention is incompatible with selective AC
#147879 commented on Mar 4, 2025 • 0 new comments
torch.compile with the inductor backend slows down (exponentially?) for certain graphs
#148073 commented on Mar 4, 2025 • 0 new comments
DISABLED test_slice_dynamic (__main__.AutoFunctionalizeTests)
#148067 commented on Mar 4, 2025 • 0 new comments
torch.compile for division gives different numeric output vs eager mode division: torch.tensor/torch.tensor
#141753 commented on Mar 4, 2025 • 0 new comments
torch.compile mode="max-autotune" precision appears to be lower
#96693 commented on Mar 4, 2025 • 0 new comments
[RFE][Distributed][NCCL] A feature request for stream management API in PG NCCL
#147729 commented on Mar 4, 2025 • 0 new comments
context_parallel fails with plain sdpa kernel SDPBackend.MATH
#147793 commented on Mar 4, 2025 • 0 new comments
Using DTensor with device meshes that use different devices for input and output
#126795 commented on Mar 4, 2025 • 0 new comments
Memory leak when using SequentialLR and ChainedLR schedulers
#126131 commented on Mar 4, 2025 • 0 new comments
MX basic dtypes in pytorch/pytorch
#146414 commented on Mar 4, 2025 • 0 new comments
grad_fn function disobeys broadcast rules
#144228 commented on Mar 4, 2025 • 0 new comments
AOT subclass desugaring adds static input arguments to inductor
#130502 commented on Mar 4, 2025 • 0 new comments
[ONNX] GNN model inaccuracy: scatter_reduce need to be fixed
#147617 commented on Mar 4, 2025 • 0 new comments
roundtrip cast between float32|bfloat16 and e8m0 should work in torchinductor
#147875 commented on Mar 4, 2025 • 0 new comments
Error : torch/utils/_sympy/interp.py:176] [0/2] failed while executing pow_by_natural([VR1, int_oo], VR[-1, -1]])
#148003 commented on Mar 4, 2025 • 0 new comments
returning tensors of dtype torch.float8_e8m0fnu should work with torchinductor
#147873 commented on Mar 4, 2025 • 0 new comments
[inductor] `torch.slice_scatter` throws `AssertionError` when meeting internal `float32`
#147842 commented on Mar 4, 2025 • 0 new comments