Insights: pytorch/pytorch
Overview
1 Pull request merged by 1 person
Add a docstring to build.sh
#144566 merged
Mar 5, 2025
201 Pull requests opened by 114 people
Implement needs_exact_strides for mutable custom operators
#148091 opened
Feb 27, 2025 -
Rename node.meta["arg_kwarg_vals"] to node.meta["eager_input_vals"]
#148092 opened
Feb 27, 2025 -
[ROCm][Windows] Fix OpenMP Flags for clang-cl
#148097 opened
Feb 27, 2025 -
[Experiment] measure the effect of combining cpp-wrapper and cudagraphs
#148100 opened
Feb 27, 2025 -
Inductor respects exact strides on custom ops by default
#148104 opened
Feb 27, 2025 -
Use myst_nb in docs
#148105 opened
Feb 27, 2025 -
stable torch library draft
#148124 opened
Feb 27, 2025 -
Use correct boxed_forward_device_index when running `CompiledFxGraph.post_compile`
#148130 opened
Feb 27, 2025 -
[WIP] [Inductor] Use real input to autotune user defined triton kernels
#148131 opened
Feb 27, 2025 -
WIP enable aten convolution out in lowerings
#148132 opened
Feb 27, 2025 -
Checks kv pair indexing in OrderedPreservingDictTest.test_range_insert
#148136 opened
Feb 27, 2025 -
Disable cudnn to avoid creating guards that denies exporting
#148140 opened
Feb 28, 2025 -
draft
#148160 opened
Feb 28, 2025 -
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(Z7)
#148163 opened
Feb 28, 2025 -
Improvement with comprehensive docstrings and implementation of class method for the code.
#148170 opened
Feb 28, 2025 -
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(ZI)
#148173 opened
Feb 28, 2025 -
Fix torch.matmul related out dtype check
#148174 opened
Feb 28, 2025 -
Fix addbmm & addmv & baddbmm out dtype check
#148176 opened
Feb 28, 2025 -
[Dynamo] Replace `unimplemented` with`unimplemented_v2` in `torch/_dynamo/variables/base.py`
#148177 opened
Feb 28, 2025 -
[pytree] add another simplified pytree module `torch.pytree`
#148180 opened
Feb 28, 2025 -
[BE][PYFMT] migrate PYFMT for `torch/ao/` to `ruff format`
#148185 opened
Feb 28, 2025 -
[BE][PYFMT] migrate PYFMT for `test/inductor/` to `ruff format`
#148186 opened
Feb 28, 2025 -
Support huggingface reading and writing for multi rank case
#148189 opened
Feb 28, 2025 -
[BE][Ez]: Use itertools.chain.from_iterable when possible
#148190 opened
Feb 28, 2025 -
Enable oneDNN dispatch for gemm bf16bf16->bf16
#148197 opened
Feb 28, 2025 -
Move estimate runtime and pick loop order heuristics into choices.py
#148202 opened
Feb 28, 2025 -
[debug] 'No available kernel' error for cudnn on A100
#148204 opened
Feb 28, 2025 -
[cond] support output the same unbacked symbol from two branches
#148206 opened
Feb 28, 2025 -
[inductor] support dilation in max_pool2d lowering
#148209 opened
Feb 28, 2025 -
[inductor] Lowerings for max_pool3d
#148210 opened
Feb 28, 2025 -
[not for merge] [AOTI] selectively build code at O1
#148212 opened
Feb 28, 2025 -
Expose functions used in custom backend in torch_python dll
#148213 opened
Feb 28, 2025 -
ROCm: Disable torch check for Multiplication of two Float8_e5m2 matrices
#148228 opened
Feb 28, 2025 -
[cutlass backend] Expand addmm test to all 4 broadcastable shape bias
#148234 opened
Mar 1, 2025 -
Make require_contiguous require exact strides instead of stride order
#148235 opened
Mar 1, 2025 -
[cutlass backend] try reenable subproc add mm test
#148236 opened
Mar 1, 2025 -
[ROCm][TunableOp] Add support for rowwise scaling on scaled GEMM.
#148238 opened
Mar 1, 2025 -
handle jk for emulation runs
#148240 opened
Mar 1, 2025 -
[fx] Move map_aggregate to C++
#148243 opened
Mar 1, 2025 -
Set requires grad in TensorMaker::make_tensor()
#148255 opened
Mar 1, 2025 -
[BE]: No include left behind - recursive glob setuptools support
#148258 opened
Mar 1, 2025 -
[fx] Move Node._update_args_kwargs to C++
#148260 opened
Mar 1, 2025 -
[fx] Move Node._prepend/Node._remove_from_list to C++
#148261 opened
Mar 1, 2025 -
Typo Errors fixed in multiple files
#148262 opened
Mar 1, 2025 -
Fix bug when Inductor include path contains spaces
#148271 opened
Mar 1, 2025 -
[torch] Fix unsafe concurrent access to autocast_enabled
#148281 opened
Mar 2, 2025 -
[fx] Optimizations for node name generation
#148288 opened
Mar 2, 2025 -
[fx] Optimize TracerBase.create_arg and Graph._gen_python_code
#148292 opened
Mar 3, 2025 -
Treat CUDA warnings as errors
#148294 opened
Mar 3, 2025 -
`torch.as_strided` negative stride SIGSEGV fix when using `torch.compile`
#148301 opened
Mar 3, 2025 -
Better log message to update pr_time_benchmarks/expected_results.csv
#148303 opened
Mar 3, 2025 -
Reland: [inductor] Simplify grid handling
#148305 opened
Mar 3, 2025 -
do not run `test_ck_blas_library` on cpu
#148316 opened
Mar 3, 2025 -
[ATen][CUDA] Optimize 128 bit vectorization
#148320 opened
Mar 3, 2025 -
Checking for cuda version to see if bf16 is natively supported or emulated
#148322 opened
Mar 3, 2025 -
[Windows][Inductor][XPU] Unload triton pyd files to be able to remove them on Windows.
#148323 opened
Mar 3, 2025 -
Flex unskip
#148327 opened
Mar 3, 2025 -
[pytree] simplify public API exposition with `__module__`
#148328 opened
Mar 3, 2025 -
Update CURL url for manywheel images
#148343 opened
Mar 3, 2025 -
[test] cutlass
#148351 opened
Mar 3, 2025 -
[DO NOT MERGE] Test new ROCm CI Navi31 nodes
#148355 opened
Mar 3, 2025 -
test index_put
#148357 opened
Mar 3, 2025 -
[inductor] Improve type annotations in _inductor/ir.py
#148358 opened
Mar 3, 2025 -
[AOTI][dashboard] Skip torchbench models not supported by export
#148359 opened
Mar 3, 2025 -
Enabling xpu in OffsetBasedRNGTracker .
#148360 opened
Mar 3, 2025 -
[@no-merge] Enable process based async cp + caching
#148373 opened
Mar 3, 2025 -
Documents torch.cuda.MemPool API
#148374 opened
Mar 3, 2025 -
[Utilization] Add utilization monitor for linux build
#148375 opened
Mar 3, 2025 -
[reland][ca] side-effect free initial trace: compiled_args
#148376 opened
Mar 3, 2025 -
Throws error when using torch.cuda.MemPool with expandable segments
#148378 opened
Mar 3, 2025 -
[Optimus][Auto-AC] Support activation quantization
#148380 opened
Mar 3, 2025 -
[ca] remove compiled_autograd_tracing
#148381 opened
Mar 3, 2025 -
[Docs][TunableOp] TunableOp documentation update
#148384 opened
Mar 4, 2025 -
[dynamo] Remove dead code path around `functools.partial` objects
#148386 opened
Mar 4, 2025 -
Add new GHA workflow to cache ROCm CI docker images on MI300 CI runners periodically
#148394 opened
Mar 4, 2025 -
[Docs] update bucketize documentation
#148400 opened
Mar 4, 2025 -
[dynamo] show stack above dynamo in graph break user tracebacks
#148401 opened
Mar 4, 2025 -
Use oneDNN v3.7.1 for Intel GPU
#148403 opened
Mar 4, 2025 -
Add api info for torch._C._nn.pyi
#148405 opened
Mar 4, 2025 -
ci: Move s390x builds with the rest
#148406 opened
Mar 4, 2025 -
Enable ASAN on inductor CUDA tests
#148407 opened
Mar 4, 2025 -
[WIP] Add `device` arg to `_lazy_clone`
#148408 opened
Mar 4, 2025 -
Add api info for torch._C._nn.pyi [1/N]
#148410 opened
Mar 4, 2025 -
Disable flake8 advice C416
#148412 opened
Mar 4, 2025 -
Automated perf_linter changes: generators
#148413 opened
Mar 4, 2025 -
Automated perf_linter changes: list constructors
#148414 opened
Mar 4, 2025 -
Automated perf_linter changes: x in (...)
#148415 opened
Mar 4, 2025 -
Add perf_linter to auto-fix some anti-patterns
#148416 opened
Mar 4, 2025 -
Add 'x in {...}' patterns to perf_linter
#148417 opened
Mar 4, 2025 -
[2/N] Use Python 3.9 typing
#148418 opened
Mar 4, 2025 -
ci: Add sccache to manylinux images
#148419 opened
Mar 4, 2025 -
[set_linter] allow x in {...}
#148422 opened
Mar 4, 2025 -
[Intel GPU][pt2e]: Collapse 3D input to 2D for matmul in qlinear_pointwise_binary fusion
#148423 opened
Mar 4, 2025 -
Temp test
#148424 opened
Mar 4, 2025 -
Optimize `torch.distributions` Score function
#148429 opened
Mar 4, 2025 -
Introduce guard_or_true, guard_or_false and avoid guard_size_oblivious in decompositions.py
#148430 opened
Mar 4, 2025 -
set non_blocking to true in torch._foreach_copy_ to improve performance
#148431 opened
Mar 4, 2025 -
[ROCm] Add TF32 option for Flex Attention for gfx90a
#148432 opened
Mar 4, 2025 -
Do not crash when compiling quantized LORA models
#148435 opened
Mar 4, 2025 -
Expand docs for `nn.functional`, and make the wording consistent
#148436 opened
Mar 4, 2025 -
[ROCm] Incorporate ROCm triton specific tuning parameters
#148437 opened
Mar 4, 2025 -
Let `CUDAExtension` to find stub libs
#148441 opened
Mar 4, 2025 -
Update s390x docker image
#148444 opened
Mar 4, 2025 -
Fix test failures on non-x86 Linux
#148445 opened
Mar 4, 2025 -
[inductor][triton] Block ptr analysis fix assert on matched index expression
#148446 opened
Mar 4, 2025 -
Enable more nightly tests on s390x
#148452 opened
Mar 4, 2025 -
Update docstring to match code.
#148455 opened
Mar 4, 2025 -
[PP] RFC for fixing microbatch splitting for dim != 0
#148458 opened
Mar 4, 2025 -
Remove `torch.testing` from `MOD_SKIPLIST`
#148459 opened
Mar 4, 2025 -
meta registration for torch._scaled_mm with mxfp8
#148461 opened
Mar 4, 2025 -
[aarch64] add libcufile for cu126 and cu128
#148465 opened
Mar 4, 2025 -
[MPS] Introduce strides unary op
#148468 opened
Mar 4, 2025 -
[dynamo] Properly account for non-list instances in list comparison
#148470 opened
Mar 4, 2025 -
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 opened
Mar 4, 2025 -
Remove warnings on non-buffer tensor constants
#148483 opened
Mar 4, 2025 -
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 opened
Mar 4, 2025 -
Demote logger of runtime_asserts_frozen to be fired only on debug mode
#148485 opened
Mar 4, 2025 -
Suppress more warnings
#148488 opened
Mar 4, 2025 -
[Ads] Slice convertIdScoreListFeaturesToIValue in half
#148489 opened
Mar 4, 2025 -
[aot cache][ca] remove restriction on caching ca's aot inference graph
#148491 opened
Mar 4, 2025 -
[triton hash update] update the pinned triton hash
#148492 opened
Mar 4, 2025 -
Implement fast access to individual elements of jagged nested tensors
#148497 opened
Mar 4, 2025 -
[codemod] Remove unused-variable in caffe2/caffe2/core/context_gpu.cu +2
#148501 opened
Mar 4, 2025 -
[Inductor-CPU] Fix perf regression for templated int8 WoQ GEMM for small M dimension
#148502 opened
Mar 4, 2025 -
[triton] Warp specialization support in torchinductor
#148503 opened
Mar 4, 2025 -
Change constexpr annotation to specific initialization (test: triton_kernel_constants)
#148505 opened
Mar 4, 2025 -
Support basic TorchBind in aot_compile and aoti_compile_and_package
#148506 opened
Mar 4, 2025 -
WIP: record how many parameters we're parsing
#148508 opened
Mar 4, 2025 -
Enable autocast on MAIA device
#148511 opened
Mar 5, 2025 -
Add sparsity
#148513 opened
Mar 5, 2025 -
[ca][aot] maybe mark activations as dynamic
#148516 opened
Mar 5, 2025 -
[PT2] Port use_triton_dot_compress to PT2 pre_grad passes
#148517 opened
Mar 5, 2025 -
[cutlass backend] Forward fix for less aligned gemm shapes
#148521 opened
Mar 5, 2025 -
[Intel GPU][pt2e] Enable quantized grouped convolution at XPU
#148522 opened
Mar 5, 2025 -
Implement gradient for the `residuals` of `torch.linalg.lstsq`
#148526 opened
Mar 5, 2025 -
Fix clang-tidy bugprone* warnings
#148529 opened
Mar 5, 2025 -
[WIP] Initial implementation of Grouped Gemm API
#148531 opened
Mar 5, 2025 -
Skip buffer in dense update
#148533 opened
Mar 5, 2025 -
[dtensor] add CuDNN SDPA op support to DTensor
#148537 opened
Mar 5, 2025 -
[XPU] Add test/kernel.errors.txt to .gitignore.
#148538 opened
Mar 5, 2025 -
Enable Direct Use of Arm Compute Library (ACL) in ATen
#148542 opened
Mar 5, 2025 -
remove TORCH_NCCL_AVOID_RECORD_STREAMS,use stashed_for_allocator_safety_ to save the input ref
#148553 opened
Mar 5, 2025 -
[BE] format `test/inductor/s429861_repro.py`
#148554 opened
Mar 5, 2025 -
[DEBUG] Custom ops perf
#148555 opened
Mar 5, 2025 -
[BE] Remove `onlyCPU` decorator from test_local_scalar_dense
#148559 opened
Mar 5, 2025 -
[ROCm][Windows] Fix ROCm/HIP version header
#148560 opened
Mar 5, 2025 -
[WIP] First version of StaticCudaLauncher
#148561 opened
Mar 5, 2025 -
[ROCm][Windows] Enable hipblaslt for Windows
#148563 opened
Mar 5, 2025 -
[BE][pytree] cleanup parameterized pytree tests
#148569 opened
Mar 5, 2025 -
[DCP] Save Plan Caching: Fix the missing all_plans update in the cache.
#148577 opened
Mar 5, 2025 -
[CI][CUDA][Distributed]Update test_composability.py
#148578 opened
Mar 5, 2025 -
[wip][inductor]lowering scan to while_loop
#148580 opened
Mar 5, 2025 -
Enable Direct Use of Arm Compute Library (ACL) in ATen
#148581 opened
Mar 5, 2025 -
Enable Direct Use of Arm Compute Library (ACL) in ATen
#148584 opened
Mar 5, 2025 -
Enable fast qlinear static/dynamic path for AArch64 through ACL directly
#148585 opened
Mar 5, 2025 -
[PGNCCL] Launch kernel on current stream & remove `record_stream` entirely
#148590 opened
Mar 5, 2025 -
[AOTI] Switch to local cpp compile for fbcode
#148592 opened
Mar 5, 2025 -
Add XPU device to nested_layer_norm
#148593 opened
Mar 5, 2025 -
[CUDA Graphs][NCCL] Set event queries to happen under thread-local mode in `ProcessGroupNCCL.cpp`
#148594 opened
Mar 5, 2025 -
[c10d] Make getDefaultBackend more fault tolerant
#148596 opened
Mar 5, 2025 -
[pytorch] Update flexattention bwd config generation
#148600 opened
Mar 5, 2025 -
Fix for AOTI + CUDAGraphs when calling from Python
#148601 opened
Mar 5, 2025 -
[CI][CUDA] Move away from cuda12.4, Add cuda12.6 eager CI tests
#148602 opened
Mar 5, 2025 -
[ONNX] Expose verification utilities
#148603 opened
Mar 5, 2025 -
[cuda] Add new faster gammabeta backward kernel
#148605 opened
Mar 5, 2025 -
Remove warnings on non-buffer tensor constants (#148483)
#148611 opened
Mar 5, 2025 -
[CI] [inductor] Add cu126 inductor jobs and move away cu124
#148612 opened
Mar 5, 2025 -
Bump to AOTriton 0.9.2 to fix version strings
#148615 opened
Mar 5, 2025 -
Optimize AOTInductor: Caching, Reduced Decompositions, and Improved JSON Handling
#148616 opened
Mar 5, 2025 -
[dynamo] Don't affect stack traces under TORCHDYNAMO_DISABLE
#148618 opened
Mar 5, 2025 -
fix 142457 , fixes double free corruption by adding TORCH_CHECK to ensure weights have the proper size
#148620 opened
Mar 5, 2025 -
update get_default_device to also respect torch.device ctx manager
#148621 opened
Mar 5, 2025 -
stage 2 of deprecate silent fallback of tuning gemm
#148622 opened
Mar 5, 2025 -
[mm_logs] follow up to add count info based on shape for inductor `aten.mm`s
#148623 opened
Mar 6, 2025 -
Remove Cuda 12.4 from nightly Binaries
#148625 opened
Mar 6, 2025 -
[triton 3.3] test_triton_kernel_constants fix
#148626 opened
Mar 6, 2025 -
[ONNX] Use torch export to get dynamic shapes for JIT convert strategy
#148627 opened
Mar 6, 2025 -
Remove CAFFE2_USE_EIGEN_FOR_BLAS
#148628 opened
Mar 6, 2025 -
[inductor] lowering for fractional_max_pool3d
#148630 opened
Mar 6, 2025 -
[Window][Inductor UT] Fix for tempfile.NamedTemporaryFile(delete=True) not work on Windows.
#148632 opened
Mar 6, 2025 -
update torch.nn.ReplicationPad{1,2,3}d deterministic documentation
#148633 opened
Mar 6, 2025 -
Subprocess compile (attempt 2)
#148635 opened
Mar 6, 2025 -
[cutlass backend] switch host optimizer to O3
#148637 opened
Mar 6, 2025 -
Remove cppcoreguidelines-pro-type-member-init_fix suppression
#148638 opened
Mar 6, 2025 -
[Intel GPU][quant] Refine zero-point memory creation
#148640 opened
Mar 6, 2025 -
Fix torch.utils.checkpoint import error
#148641 opened
Mar 6, 2025 -
[XPU] Add an implicit conversion from XPUStream to sycl::queue*
#148646 opened
Mar 6, 2025 -
[SGD] Add SGD capturable API and tests
#148647 opened
Mar 6, 2025 -
Bump Clang-tidy to 19.1.4
#148648 opened
Mar 6, 2025 -
[Intel GPU] Fix SDPA dummy LSE output to match meta function
#148652 opened
Mar 6, 2025 -
Enable qint8 and quint8 add for AArch64 using ACL directly
#148653 opened
Mar 6, 2025 -
Enable ruff check for `torch/utils/data/typing.ipynb`
#148654 opened
Mar 6, 2025 -
Allow to run flex_attention on HPU
#148656 opened
Mar 6, 2025 -
[associative_scan] Refactoring of input checking and dynamo invocation
#148657 opened
Mar 6, 2025 -
[docs] fix autograd description on convex function case
#148658 opened
Mar 6, 2025 -
[HPU] Add HPU as a supported device for NestedTensor
#148659 opened
Mar 6, 2025 -
Remove deprecated std::aligned_storage_t
#148660 opened
Mar 6, 2025 -
removed tocm triton template cond
#148662 opened
Mar 6, 2025 -
[Profiler][HPU] Fix incorrect availabilities for HPU
#148663 opened
Mar 6, 2025 -
[AOTInductor] Codegen fix
#148664 opened
Mar 6, 2025
149 Issues closed by 52 people
[triton 3.3] inductor/test_aot_inductor.py::AOTInductorTestABICompatibleGpu::test_none_args_aot_codegen_cuda
#148540 closed
Mar 6, 2025 -
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 closed
Mar 6, 2025 -
torch.conj behaves differently on cpu and mps
#148599 closed
Mar 6, 2025 -
The operator 'aten::_linalg_solve_ex.result' is not currently implemented for the MPS device
#148547 closed
Mar 6, 2025 -
'CUDA error: an illegal memory access was encountered' when using forced_align on cuda device > 0
#148438 closed
Mar 6, 2025 -
Inductor layout constraints for custom operators changed from 2.5->2.6, breaking BC
#148356 closed
Mar 6, 2025 -
[ROCm] sorting torch.bool tensor viewed from torch.uint8 type produces incorrect results
#139972 closed
Mar 6, 2025 -
test_reference_numerics_normal fails with certain versions of numpy/scipy
#148143 closed
Mar 5, 2025 -
[ONNX] stft export fails with dynamo_export
#113067 closed
Mar 5, 2025 -
[ONNX] Record the capture strategy in onnx program
#147674 closed
Mar 5, 2025 -
[aot autograd][inline inbuilt nn modules] AOT Autograd changes _version different from eager
#128198 closed
Mar 5, 2025 -
[Inductor-CPU] ATen SDPA kernel runtime is not captured in profiling results
#148225 closed
Mar 5, 2025 -
[ONNX] aten_unfold needs to support symint
#148337 closed
Mar 5, 2025 -
Decorators like `torch.compiler.allow_in_graph` doesn't account for id reuse
#147777 closed
Mar 5, 2025 -
The `sympy` dependency spec for pytorch on PyPi wheel is still unchanged.
#145225 closed
Mar 5, 2025 -
The latest PyTorch XPU wheel 2.7.0.dev20250117+xpu does not work on Windows
#145155 closed
Mar 5, 2025 -
AOTI doesn't account for constant tensors
#148370 closed
Mar 5, 2025 -
Redundant Try Block in backward()
#148115 closed
Mar 5, 2025 -
dataclasses.replace not supported by dynamo
#136481 closed
Mar 5, 2025 -
`torch.compile` fails with customized Triton Operator on Triton 2.2
#128039 closed
Mar 5, 2025 -
Compile with non-default mode + triton kernel fails
#126864 closed
Mar 5, 2025 -
torch.compile generates wrong code on CPU and compiled code replaces original function
#126848 closed
Mar 5, 2025 -
SIGSEGV error when passing a 0-sized tensor to `_local_scalar_dense`
#145066 closed
Mar 5, 2025 -
DISABLED test_dropout_deterministic_dynamic_shapes_cuda (__main__.DynamicShapesGPUTests)
#132957 closed
Mar 5, 2025 -
DISABLED test_tensor_subclass_basic (__main__.TestCompiledAutograd)
#145457 closed
Mar 5, 2025 -
DISABLED test_precompilations (__main__.TestMaxAutotune)
#124265 closed
Mar 5, 2025 -
nn.Transformer out[0:-1] not precisely equal to last_out when predicting in tgt mask
#100052 closed
Mar 5, 2025 -
[Flex Attention] Errors with Dynamic Shapes (Cannot determine truth value of Relational)
#146745 closed
Mar 5, 2025 -
DISABLED test_recompile_on_global_state_change_dynamic_shapes (__main__.DynamicShapesMiscTests)
#144896 closed
Mar 5, 2025 -
[xpu] Compilation of pytorch failed, unable to generate RegisterSparseXPU.cpp
#144718 closed
Mar 5, 2025 -
torch.accelerator.is_available() raise RuntimeError if no available CUDA/XPU devices
#144567 closed
Mar 5, 2025 -
[Profiler] Add profiler activity for HPU devices
#148181 closed
Mar 5, 2025 -
[dynamic-shapes][dynamo] List index out of range for constraint with out=
#130066 closed
Mar 5, 2025 -
PyTorch defaults to using libuv but is built without support for it on Windows
#139990 closed
Mar 5, 2025 -
Switch to using Docker Images from ECR instead of Docker Hub
#147748 closed
Mar 4, 2025 -
Lintrunner results inconsistent
#128588 closed
Mar 4, 2025 -
[ONNX] torch.matmul() breaks dynamic shapes during export
#148192 closed
Mar 4, 2025 -
Release 2.6.0 validations checklist and cherry-picks
#144503 closed
Mar 4, 2025 -
[RFC] Raising minimal glibc support to: glibc2_28 . Deprecation support for Amazon Linux 2 support
#126551 closed
Mar 4, 2025 -
[DTensor] `clip_grad_norm_` follow-ups
#121020 closed
Mar 4, 2025 -
RuntimeError when using Adam(fused=True) with torch.compile
#126585 closed
Mar 4, 2025 -
[DSD] Test could fail in test_fsdp_dsd.py
#134212 closed
Mar 4, 2025 -
Dtensor shard uses more gpu memory than raw tensor
#133549 closed
Mar 4, 2025 -
[dynamo] Issue with construction nn.Parameter
#127697 closed
Mar 4, 2025 -
Autotuning failure: `Triton Error [CUDA]: invalid argument`
#145984 closed
Mar 4, 2025 -
Adding Small Epsilon in linalg_eig_backward to Improve Numerical Stability on GPU
#147544 closed
Mar 4, 2025 -
I don't use FSDP,it can train.
#148409 closed
Mar 4, 2025 -
built from source windows static library with multiple "unresolved external symbol"
#87499 closed
Mar 4, 2025 -
[Triton upstream] [Inductor] [ROCm] OpInfo quantile UT accuracy issues
#147736 closed
Mar 4, 2025 -
[Triton upstream] [Inductor] [ROCm] cpp_wrapper segfaults
#147734 closed
Mar 4, 2025 -
There is a problem with the wording here.
#147696 closed
Mar 4, 2025 -
Triton aarch64 and triton sbsa
#147857 closed
Mar 4, 2025 -
torch.compile reorder_for_compute_comm_overlap sink_waits pass does not work
#127324 closed
Mar 4, 2025 -
Invalid ONNX graph if using float16 dtype for `torch.arange`
#148041 closed
Mar 3, 2025 -
How to get last layer hidden state of transformer model while convert model to onnx format?
#146682 closed
Mar 3, 2025 -
Exporting onnx model to a buffer causes "TypeError: expected str, bytes or os.PathLike object, not BytesIO"
#147909 closed
Mar 3, 2025 -
DISABLED test_custom_hook_custom_stream (__main__.TestHSDPWithCustomHook)
#147767 closed
Mar 3, 2025 -
[inductor] [silence] `torch.cdist` outputs inconsistent results with eager
#148064 closed
Mar 3, 2025 -
IndexError: tuple index out of range when running vLLM script
#147839 closed
Mar 3, 2025 -
`torch.multinomial` outputs inconsistency on ARM and x86
#148247 closed
Mar 3, 2025 -
Huge numerical precision error when `torch.tensor(3811, dtype=torch.float16)`
#148321 closed
Mar 3, 2025 -
HIP error: invalid device function on ROCm RX 7600XT
#147626 closed
Mar 3, 2025 -
`torch.Tensor.pinverse` can cause an `INTERNAL ASSERT FAILED`
#148300 closed
Mar 3, 2025 -
`torch.linalg.cond` can cause an `INTERNAL ASSERT FAILED`
#148299 closed
Mar 3, 2025 -
`torch.linalg.pinv` can cause an `INTERNAL ASSERT FAILED`
#148298 closed
Mar 3, 2025 -
`torch.nansum` can cause a `Segmentation fault (core dumped)`
#148297 closed
Mar 3, 2025 -
INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/NamedTensorUtils.cpp":163, please report a bug to PyTorch
#148278 closed
Mar 3, 2025 -
`torch.sparse.sum` can cause a `Segmentation fault (core dumped)`
#148276 closed
Mar 3, 2025 -
`torch.nn.LazyConvTranspose1d` can cause a `Floating point exception (core dumped)`
#148275 closed
Mar 3, 2025 -
`torch.nn.functional.conv1d` can cause a `Floating point exception (core dumped)`
#147458 closed
Mar 3, 2025 -
`torch.svd` can cause an `INTERNAL ASSERT FAILED`
#147457 closed
Mar 3, 2025 -
[inductor] [cpu] `nn.Tanhshrink-atan2` output inconsistent results with eager
#148241 closed
Mar 3, 2025 -
ncclUnhandledCudaError
#147575 closed
Mar 3, 2025 -
[Break XPU] Newly added test case with CUDA hard code failed on XPU.
#143479 closed
Mar 3, 2025 -
xpu: support triton against clang with nightly wheels
#137518 closed
Mar 3, 2025 -
Error loading ".venv\Lib\site-packages\torch\lib\c10_xpu.dll" or one of its dependencies
#138986 closed
Mar 3, 2025 -
xpu: clarify which Intel GPUs are supported by PyTorch 2.5
#138347 closed
Mar 3, 2025 -
xpu: intel conda channel is not available
#131802 closed
Mar 3, 2025 -
xpu: add check for supported devices to xpu initialization and torch.xpu.is_available()
#131799 closed
Mar 3, 2025 -
xpu: can't build XPU backend without sourcing oneAPI environment variables (/opt/intel/oneapi/setvars.sh)
#127008 closed
Mar 3, 2025 -
xpu: python hangs on exit after check for xpu on multi-dev system
#126259 closed
Mar 3, 2025 -
DISABLED test_cond_autograd_zeros_unused_branch_complex_compile_mode_compile (__main__.TestControlFlow)
#148309 closed
Mar 3, 2025 -
DISABLED test_cond_autograd_zeros_unused_branch_complex_compile_mode_compile (__main__.TestControlFlow)
#148308 closed
Mar 3, 2025 -
[RFC] Add CPP INT8 SDPA Template for Inductor CPU
#144941 closed
Mar 3, 2025 -
DISABLED test_add_tuple_non_optional (__main__.TestScript)
#146136 closed
Mar 3, 2025 -
DISABLED test_non_final_return (__main__.TestScript)
#145975 closed
Mar 3, 2025 -
DISABLED test_scriptable_fn_as_attr (__main__.TestScript)
#145972 closed
Mar 3, 2025 -
DISABLED test_pt2_traceable_aot_eager_cpu_float8_e5m2 (__main__.TestFloat8DtypeCPUOnlyCPU)
#144934 closed
Mar 3, 2025 -
DISABLED test_reassign_module_lhs (__main__.TestScript)
#145973 closed
Mar 3, 2025 -
DISABLED test_memory_stats_multigpu (__main__.TestCudaMultiGPU)
#129860 closed
Mar 3, 2025 -
DISABLED test_ternary_right_associative (__main__.TestScript)
#146137 closed
Mar 3, 2025 -
DISABLED test_tensor_subclasses (__main__.TestScript)
#119949 closed
Mar 3, 2025 -
DISABLED test_pybind_type_comparisons (__main__.TestScript)
#145971 closed
Mar 3, 2025 -
DISABLED test_script_outputs (__main__.TestScript)
#145976 closed
Mar 3, 2025 -
DISABLED test_script_annotation (__main__.TestScript)
#145974 closed
Mar 3, 2025 -
DISABLED test_mm_batching (__main__.TestScript)
#119747 closed
Mar 3, 2025 -
[distributed] Register sharding strategy for aten.amax.default to support float8 rowwise scaling
#147578 closed
Mar 2, 2025 -
Excessive memory usage during compilation start up for (at least some) in place ops
#148165 closed
Mar 2, 2025 -
[Inductor][CPU] SIGSEGV in `torch.slice_copy` with large step value
#147071 closed
Mar 2, 2025 -
Cudnn header files should be copied into build package as well
#47743 closed
Mar 2, 2025 -
The unevenness of torch.randint() during large range(3e9) sampling.
#148175 closed
Mar 2, 2025 -
nn.Matmul return different ret within Parameter and Tensor
#148280 closed
Mar 2, 2025 -
Wrong macro used when building c10/util/bit_cast.h with std::bit_cast
#148263 closed
Mar 1, 2025 -
[dtensor] write aten.split_tensor using op strategy
#130758 closed
Mar 1, 2025 -
PyTorch nightly MPS SDPA op is unusable
#148194 closed
Mar 1, 2025 -
Placeholder tensor is empty!
#123171 closed
Mar 1, 2025 -
CPU-specific Inductor Error with `view` on `torch.nn.Embedding` output
#146390 closed
Mar 1, 2025 -
LoweringException: AttributeError: 'Constant' object has no attribute 'get_name'
#141197 closed
Feb 28, 2025 -
Could not guard on data-dependent expression u0 - 7 < 0 (unhinted: u0 - 7 < 0). (Size-like symbols: u0)
#128644 closed
Feb 28, 2025 -
[inductor][cpu]transformers models static/dynamic quant performance/accuracy crash in 2024-06-17 nightly release
#128933 closed
Feb 28, 2025 -
randperm + torch.compile + SAC + CUDA graphs doesn't work
#130123 closed
Feb 28, 2025 -
Extending more info on fake tensor when compiling
#130234 closed
Feb 28, 2025 -
DISABLED test_triton_kernel_multiple_out (__main__.AutogradFunctionTests)
#147214 closed
Feb 28, 2025 -
Rule-based reconfig-and-recompile
#127999 closed
Feb 28, 2025 -
TORCH_COMPILE_CPROFILE=1 broken (strobelight might always be on internally?)
#131953 closed
Feb 28, 2025 -
torch._dynamo.exc.Unsupported: call_function args: TensorVariable() UserDefinedObjectVariable(_tuplegetter)
#131411 closed
Feb 28, 2025 -
[triton 3.3][cpp_wrapper] TypeError: 'NoneType' object is not subscriptable
#148111 closed
Feb 28, 2025 -
F.interpolate returns NAN on MPS if align_corner is True.
#144245 closed
Feb 28, 2025 -
`AssertionError` in `torch.compile`
#147840 closed
Feb 28, 2025 -
DISABLED test_pt2_traceable_aot_eager_cpu_float8_e4m3fn (__main__.TestFloat8DtypeCPUOnlyCPU)
#144903 closed
Feb 28, 2025 -
[TensorDict - compile] Dynamo doesn't like a simple class decorator - but a function is fine
#130533 closed
Feb 28, 2025 -
[Inductor-CPU] LLaMA doesn't use templated GEMMs for da8w8 quantization for next-token generation
#147954 closed
Feb 28, 2025 -
DISABLED test_profile_all_threads (__main__.TestProfiler)
#145951 closed
Feb 28, 2025 -
Very large memory increase when combining bfloat16 autocast with torch.compile
#133637 closed
Feb 28, 2025 -
Importing torch_tensorrt causes warning for implicitly cleaned up file
#147744 closed
Feb 28, 2025 -
DISABLED test_arange_dynamic_cuda (__main__.TestInductorDynamicCUDA)
#127067 closed
Feb 28, 2025 -
Long queue for macOS runners
#148127 closed
Feb 28, 2025 -
Torch 2.7.0 nightly cuda 12.6 and cuda 12.8 builds are broken on Amazon linux 2023
#148120 closed
Feb 28, 2025 -
torch.clear_autocast_cache is not traceable
#140759 closed
Feb 27, 2025 -
SubgraphLoweringException in flex_attention when using custom score_mod with torch.dot (MLA)
#148107 closed
Feb 27, 2025 -
DISABLED test_insignificant_strides (__main__.SDPAPatternRewriterCudaDynamicTests)
#146959 closed
Feb 27, 2025 -
FUNC_INLINELIST doesn't exist
#144868 closed
Feb 27, 2025 -
td does not detect required test for mkl-dnn OneDNN update
#148085 closed
Feb 27, 2025 -
DISABLED test_mismatched_global_state (__main__.GraphRegionTrackerTests)
#144895 closed
Feb 27, 2025
170 Issues opened by 96 people
[Profiler][HPU] Incorrect availabilities for the HPU device
#148661 opened
Mar 6, 2025 -
Extra onnx::Neg_2 input after torch.onnx.export
#148655 opened
Mar 6, 2025 -
Avoid fork for TORCHINDUCTOR_COMPILE_THREADS > 1
#148651 opened
Mar 6, 2025 -
[Inductor-CPU] With cpp-wrapper, some ATen ops don't get profiled with PyTorch profiler
#148650 opened
Mar 6, 2025 -
[inductor][torchbench][CI] timm models got obvious performance drop with --ci flag
#148645 opened
Mar 6, 2025 -
DISABLED test_set_stance_aot_eager_then_compile (__main__.DecoratorTests)
#148644 opened
Mar 6, 2025 -
DISABLED test_symint_in_slice_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148643 opened
Mar 6, 2025 -
Feature request: throw `torch.cuda.OutOfMemoryError` for TorchScript OOM
#148642 opened
Mar 6, 2025 -
README doesn't explain how to run tests in the "Test PyTorch" section
#148634 opened
Mar 6, 2025 -
DISABLED test_sdpa_rewriter_11_cuda (__main__.SDPAPatternRewriterCudaDynamicTests)
#148631 opened
Mar 6, 2025 -
Dynamo export: dynamic dims are not exported with the specified names
#148629 opened
Mar 6, 2025 -
DISABLED test_host_memory_stats (__main__.TestCuda)
#148607 opened
Mar 5, 2025 -
DISABLED test_nested_tuple_output_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148606 opened
Mar 5, 2025 -
AOTI takes very long time to compile (1:40 hours)
#148572 opened
Mar 5, 2025 -
Suggested fixes sometimes not enough in export
#148568 opened
Mar 5, 2025 -
[Feature Request] Dynamic shapes API requires spec for all arguments.
#148564 opened
Mar 5, 2025 -
DISABLED test_internal_nonlocal_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148558 opened
Mar 5, 2025 -
AOTI is OOM-ing when eager doesn't
#148557 opened
Mar 5, 2025 -
Return type annotation of `Tensor.long()` etc is not narrowed down to dtype-specific names `LongTensor` etc
#148552 opened
Mar 5, 2025 -
Issue with torch.compile
#148551 opened
Mar 5, 2025 -
Name 'equal_valued' cannot be imported in pytorch 2.5.0
#148550 opened
Mar 5, 2025 -
Name 'equal_valued' cannot be imported in pytorch 2.5.0
#148549 opened
Mar 5, 2025 -
Issue with Sparse Tensor Matrix Multiplication and Broadcasting
#148548 opened
Mar 5, 2025 -
F.scaled_dot_product_attention calculation output is nan when in dynamic dim under torch.compile mode
#148545 opened
Mar 5, 2025 -
DISABLED test_inlined_functions_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148544 opened
Mar 5, 2025 -
exported modules using custom autograd functions will ignore custom backward function
#148539 opened
Mar 5, 2025 -
Computing only the n first rows of a distance matrix with pdist
#148536 opened
Mar 5, 2025 -
[inductor][cpu]speech_transformer failure in 2025-03-02 nightly release
#148535 opened
Mar 5, 2025 -
[FSDP2] CPU Offload Does Not Work with `torch.nn.utils.clip_grad_norm`
#148532 opened
Mar 5, 2025 -
macos15 M4 can not install torch-2.6.0-cp310-none-macosx_11_0_arm64.whl
#148528 opened
Mar 5, 2025 -
[FlexAttention] Error using create_block_mask with mask head number greater than 1
#148527 opened
Mar 5, 2025 -
DISABLED test_sdpa_rewriter_11_cuda (__main__.SDPAPatternRewriterCudaTests)
#148525 opened
Mar 5, 2025 -
DISABLED test_graph_break_before___enter__ (__main__.ContextlibContextManagerTests)
#148524 opened
Mar 5, 2025 -
DISABLED test_globals_change_in_other_file (__main__.ContextlibContextManagerTests)
#148523 opened
Mar 5, 2025 -
reshape is decomposed to view setting allow_copy=False making it fail in some case!
#148519 opened
Mar 5, 2025 -
Preview (Nightly) version cuda12.8 cannot find torchaudio file
#148518 opened
Mar 5, 2025 -
DISABLED test_set_stance_eager_then_compile (__main__.DecoratorTests)
#148515 opened
Mar 5, 2025 -
DISABLED test_freevars_as_inputs_to_wrap_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148514 opened
Mar 5, 2025 -
[MAIA][Autocast] torch.autocast doesn't work on MAIA device
#148510 opened
Mar 5, 2025 -
[FSDP2] improve error msg for duplicate wraps
#148504 opened
Mar 4, 2025 -
`distributed.checkpoint.async_save` leading to `TypedStorage is deprecated.`
#148498 opened
Mar 4, 2025 -
UNSTABLE trunk / libtorch-linux-focal-cuda12.4-py3.10-gcc9-debug / build
#148495 opened
Mar 4, 2025 -
export lift_constants_pass creates ugly warning
#148487 opened
Mar 4, 2025 -
Export shouldn't warn when registering constant tensor attribute on graph module.
#148482 opened
Mar 4, 2025 -
export is emitting too many not actionable warnings.
#148479 opened
Mar 4, 2025 -
export dynamic shapes API throws weird error on upper bound.
#148478 opened
Mar 4, 2025 -
XPU not available until I sign into server locally
#148477 opened
Mar 4, 2025 -
Illegal memory access in scaled_dot_product_attention if only attn_mask requires grad
#148476 opened
Mar 4, 2025 -
Dynamo replaces exception by hard error in `run_node`
#148475 opened
Mar 4, 2025 -
DISABLED test_capture_untracked_nonlocal_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148464 opened
Mar 4, 2025 -
DISABLED test_set_stance_eager_then_compile_with_graph_break (__main__.DecoratorTests)
#148463 opened
Mar 4, 2025 -
[dynamo] Memory leak
#148460 opened
Mar 4, 2025 -
backport torch.library.custom_op (and improvements) to older versions of PyTorch
#148457 opened
Mar 4, 2025 -
BC-linter should ignore testing/linter/adapters/
#148451 opened
Mar 4, 2025 -
Union type raise error when running python with argument "-O" for torch script.
#148447 opened
Mar 4, 2025 -
DISABLED test_user_defined_binop (__main__.MiscTests)
#148443 opened
Mar 4, 2025 -
DISABLED test_capture_untracked_global_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148442 opened
Mar 4, 2025 -
torch.distributed hangs between 2 Mac Devices
#148440 opened
Mar 4, 2025 -
ERROR: I got an error about FSDP, when I trained flux model of sparsity with NVIDIA TensorRT Model Optimizer
#148434 opened
Mar 4, 2025 -
DISABLED test_sys_modules (__main__.MiscTests)
#148428 opened
Mar 4, 2025 -
DISABLED test_capture_tracked_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148427 opened
Mar 4, 2025 -
DISABLED test_empty_graph_nested_calls_fullgraph_False_dynamic_shapes (__main__.DynamicShapesReproTests)
#148426 opened
Mar 4, 2025 -
[ROCM] `linalg.eigh` crash with `float64` dtype and shape `[8192,8192]`
#148425 opened
Mar 4, 2025 -
Float8_e4m3fn
#148420 opened
Mar 4, 2025 -
Torch 2.6 doesn't have TCPStore::TCPStore symbol in cu126 binary, but it's available in headers
#148411 opened
Mar 4, 2025 -
The apis in torch._C._nn.pyi is nonexhaustive
#148404 opened
Mar 4, 2025 -
Generate two reduction loops for vectorization
#148402 opened
Mar 4, 2025 -
[inductor][fuzzer] `IndexError` error at `torch.dstack`
#148397 opened
Mar 4, 2025 -
DISABLED test_shape_int_inplace_binops (__main__.MiscTests)
#148392 opened
Mar 4, 2025 -
DISABLED test_untracked_inputs_in_constraints_dynamic_shapes (__main__.DynamicShapesExportTests)
#148390 opened
Mar 4, 2025 -
DISABLED test_sdpa_rewriter_14_cuda (__main__.SDPAPatternRewriterCudaTests)
#148391 opened
Mar 4, 2025 -
[export] Unable to trace ops like min/pow
#148389 opened
Mar 4, 2025 -
[Inductor-CPU] Debug util request: fine-grained mechanism to disable out-of-template epilogues
#148382 opened
Mar 3, 2025 -
Improve nested jagged tensor select performance on batch dim
#148379 opened
Mar 3, 2025 -
DISABLED test_param_shape_binops (__main__.MiscTests)
#148369 opened
Mar 3, 2025 -
DISABLED test_export_with_cond_dynamic_shape_pred_dynamic_shapes (__main__.DynamicShapesExportTests)
#148368 opened
Mar 3, 2025 -
`torch.nn.functional` inconsistent documentation
#148353 opened
Mar 3, 2025 -
torch._check(x > 0) should do something sane when x is a Tensor
#148349 opened
Mar 3, 2025 -
Installation of `pytorch==2.6.0+cu124` doesn't install `triton` and `nvidia` libraries
#148345 opened
Mar 3, 2025 -
[RFC][PGNCCL] Add Float8 support
#148344 opened
Mar 3, 2025 -
[CI] [anaconda] CI Perf Tests
#148342 opened
Mar 3, 2025 -
[CI] [anaconda] Review Devcontainer anaconda usage
#148341 opened
Mar 3, 2025 -
[CI] [anaconda] CI Build and Test scripts MacOS
#148340 opened
Mar 3, 2025 -
[Docs] [anaconda] Review and update
#148339 opened
Mar 3, 2025 -
[CI] [anaconda] CI Build and Test scripts Windows
#148338 opened
Mar 3, 2025 -
[CI] [anaconda] CI Build and Test scripts Linux
#148336 opened
Mar 3, 2025 -
[CI] [anaconda] Docker files have conda environment installed
#148335 opened
Mar 3, 2025 -
[FSDP2] Issues with model not running on all ranks - Grads not matching fairscale implementation
#148334 opened
Mar 3, 2025 -
[export][torchbench] moco fails
#148333 opened
Mar 3, 2025 -
DISABLED test_mark_unbacked_strict (__main__.MiscTests)
#148332 opened
Mar 3, 2025 -
DISABLED test_export_defaults_ok_dynamic_shapes (__main__.DynamicShapesExportTests)
#148331 opened
Mar 3, 2025 -
DISABLED test_sys_modules_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148330 opened
Mar 3, 2025 -
[AOTI][torchbench] microbench_unbacked_tolist_sum fails
#148329 opened
Mar 3, 2025 -
fx graph fails to recognize tensor.T as a 'call_method' node
#148326 opened
Mar 3, 2025 -
`torch.linalg` routines break for inputs of more than 2**32 elements
#148324 opened
Mar 3, 2025 -
compile SageAttention facing error C2872: “std” for latest torch nightly
#148317 opened
Mar 3, 2025 -
[Doc] [Win] libuv installation doc is not correct.
#148315 opened
Mar 3, 2025 -
The recorded step number in profiler is wrong
#148314 opened
Mar 3, 2025 -
DISABLED test_int_shape_inplace_binops (__main__.MiscTests)
#148312 opened
Mar 3, 2025 -
DISABLED test_empty_graph_nested_calls_fullgraph_True_dynamic_shapes (__main__.DynamicShapesReproTests)
#148311 opened
Mar 3, 2025 -
batching rule for `aten::scatter_add_`
#148307 opened
Mar 3, 2025 -
torch.vmap incompatibility with DLPack functions
#148306 opened
Mar 3, 2025 -
broadcast_object_list not release GPU
#148302 opened
Mar 3, 2025 -
DISABLED test_int_shape_binops (__main__.MiscTests)
#148296 opened
Mar 3, 2025 -
DISABLED test_dont_aggressively_write_assert_dynamic_shapes (__main__.DynamicShapesReproTests)
#148295 opened
Mar 3, 2025 -
Triton Kernel Rejects NamedTupleVariable Arguments
#148289 opened
Mar 2, 2025 -
RuntimeError: use_libuv was requested but PyTorch was build without libuv support
#148283 opened
Mar 2, 2025 -
SIGSEGV due to insufficient return value checking for PyFrame_GetLocals
#148273 opened
Mar 1, 2025 -
Should DTensor support `Shard()` placement without dim requirement?
#148269 opened
Mar 1, 2025 -
ONNX Export Produces main_graph Instead of torch_jit and Fails on aten::format in PyTorch 2.x
#148268 opened
Mar 1, 2025 -
Raise a warning when `torch.nn.utils.clip_grad_norm_` receives an exhausted generator
#148259 opened
Mar 1, 2025 -
[FSDP2] HSDP with globally sharded fp32 weights and optimizer states
#148257 opened
Mar 1, 2025 -
Simplify package_data handling in setup.py
#148256 opened
Mar 1, 2025 -
Improve Notation for Score Function in Documentation
#148253 opened
Mar 1, 2025 -
Errors: train a model of sparsity with tensorrt-model-optimization and FSDP.
#148251 opened
Mar 1, 2025 -
BrokenPipeError: [Errno 32] Broken pipe when lacking Numpy package
#148250 opened
Mar 1, 2025 -
[inductor] `nn.Upsample-torch.linalg.lu_factor` outputs inconsistent results with eager
#148244 opened
Mar 1, 2025 -
[FSDP2] Unclear behavior of `ignored_params` in `fully_shard`
#148242 opened
Mar 1, 2025 -
Add Structured Knowledge Accumulation (SKA) Layer to PyTorch
#148232 opened
Mar 1, 2025 -
make saved_tensor_hooks work better in compile for doing activation compression
#148222 opened
Feb 28, 2025 -
MPS vs Metal vs CPU performance comparison
#148219 opened
Feb 28, 2025 -
DISABLED test_dynamic_sources_dynamic_override (__main__.MiscTests)
#148218 opened
Feb 28, 2025 -
DISABLED test_guard_failure_fn2 (__main__.MiscTests)
#148217 opened
Feb 28, 2025 -
DISABLED test_guard_failure_fn_shape_control_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148216 opened
Feb 28, 2025 -
DISABLED test_mark_unbacked_strict_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148215 opened
Feb 28, 2025 -
DISABLED test_dynamic_sources_dynamic_override_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148214 opened
Feb 28, 2025 -
Regression: Missing Symbols in PyTorch DLL (torch_python)
#148208 opened
Feb 28, 2025 -
Add option to shut down idle async_compile workers after timeout
#148207 opened
Feb 28, 2025 -
Compile breaks flex-attention with jagged tensors
#148201 opened
Feb 28, 2025 -
[Inductor-CPU] qlinear_binary output may have undefined strides with dynamic shape support
#148199 opened
Feb 28, 2025 -
[inductor][triton] Decide how to deprecate "old triton versions"
#148196 opened
Feb 28, 2025 -
Implement batching rule for masked_fill_
#148183 opened
Feb 28, 2025 -
Dynamo failure on handling list comparisons
#148179 opened
Feb 28, 2025 -
[Inductor] Layout created with non-sympy.Expr sizes
#148172 opened
Feb 28, 2025 -
Inference llama after Export PTQ
#148171 opened
Feb 28, 2025 -
PyTorch's nightly version no longer includes the CU118, CU124, and CU121 versions
#148169 opened
Feb 28, 2025 -
Build pytorch for rocm failed
#148167 opened
Feb 28, 2025 -
DISABLED test_nonstrict_trace_pre_existing_custom_class (__main__.DecoratorTests)
#148166 opened
Feb 28, 2025 -
Specifying device_id in init_process_group causes tensor parallel + pipeline parallel to fail
#148162 opened
Feb 28, 2025 -
[MPS][Complex] Conjugations are broken
#148156 opened
Feb 28, 2025 -
Unify OpOverload._get_dispatch and HigherOrderOperator.dispatch
#148146 opened
Feb 28, 2025 -
Torch export does not preserve original edges between nodes
#148144 opened
Feb 28, 2025 -
Applying online softmax patterns on joint_graph cause 1.2x peak memory regression for TB hf_T5_base model
#148141 opened
Feb 28, 2025 -
Conv/pool doc on ceilmode wrong
#148123 opened
Feb 27, 2025 -
[cutlass backend] C++ compile error for CUTLASS config only get resolved in autotuning stage
#148122 opened
Feb 27, 2025 -
Add int32 support to torch.gather
#148119 opened
Feb 27, 2025 -
[inductor][triton] Explicit kernel-arg mismatch checks
#148116 opened
Feb 27, 2025 -
We should never throw vanilla C++ exceptions
#148114 opened
Feb 27, 2025 -
[inductor][triton] introduce better "APIs" in triton that can clean up our triton/inductor integration
#148113 opened
Feb 27, 2025 -
MLA with Learnable RoPE Tensors is Broken with Flex Attention
#148112 opened
Feb 27, 2025 -
[CI] Remove conda usage from lint related jobs
#148110 opened
Feb 27, 2025 -
NotImplementedError: FlexAttentionHigherOrderVariable() has no type
#148106 opened
Feb 27, 2025 -
Initial investigation for removing MOD_SKIPLIST
#148103 opened
Feb 27, 2025 -
onnx dynamo export does not support aten bucketize
#148098 opened
Feb 27, 2025 -
DISABLED test_njt_causal_bfloat16 (__main__.TestFlexAttention)
#148095 opened
Feb 27, 2025 -
DISABLED test_split_dynamic (__main__.AutoFunctionalizeTests)
#148094 opened
Feb 27, 2025 -
DISABLED test_dynamo_timed (__main__.TestDynamoTimed)
#148093 opened
Feb 27, 2025 -
FSDP2 without sharding works slower than DDP
#148086 opened
Feb 27, 2025
490 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
Add Windows Arm64 Nightly Builds
#139760 commented on
Mar 6, 2025 • 29 new comments -
[HOP] Mutation and alias rework
#146658 commented on
Mar 6, 2025 • 20 new comments -
Add `__context/cause/suppress_context/traceback__` to Exception
#146499 commented on
Mar 5, 2025 • 17 new comments -
[export][dynamic shapes] add Dim._OBLIVIOUS, _mark_oblivious()
#147881 commented on
Mar 3, 2025 • 15 new comments -
[inductor] Fix issue with set_linter, improve linter framework
#144620 commented on
Mar 4, 2025 • 13 new comments -
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on
Mar 5, 2025 • 12 new comments -
fix indirect broadcast
#145992 commented on
Mar 6, 2025 • 10 new comments -
Fix `torch.nn.functional.hardswish` gradients corner case
#148049 commented on
Mar 6, 2025 • 10 new comments -
Support subclass constructor capturing in export
#147014 commented on
Mar 6, 2025 • 10 new comments -
[pytree] add APIs to determine a class is a namedtuple or PyStructSequence
#113257 commented on
Mar 5, 2025 • 9 new comments -
Enable Accelerator to perform streaming backward
#142097 commented on
Mar 6, 2025 • 9 new comments -
Enable fast qlinear_dynamic path for AArch64 through ACL directly
#145942 commented on
Mar 5, 2025 • 9 new comments -
Custom ops support arbitrary input types by migrating to python dispatcher
#147927 commented on
Mar 6, 2025 • 8 new comments -
[Inductor][CPP] Add float16 support for CppMicroGemmAMX
#147368 commented on
Mar 6, 2025 • 7 new comments -
[Intel GPU] int4 WOQ gemm XPU Support
#137566 commented on
Mar 4, 2025 • 7 new comments -
torch.utils.checkpoint preserves torch function mode stack during recompute
#148023 commented on
Mar 4, 2025 • 4 new comments -
Replace `unimplemented` with `unimplemented_v2` in `codegen.py`
#148069 commented on
Mar 5, 2025 • 4 new comments -
Parallelize bf16->f32 conversion for gemm(bf16:bf16->bf16)
#147864 commented on
Mar 3, 2025 • 4 new comments -
Change arg_kwarg_vals propagation strategy
#148046 commented on
Mar 6, 2025 • 4 new comments -
[export] Add support for invoke_subgraph
#147863 commented on
Mar 4, 2025 • 4 new comments -
[export] Add export_cache
#147992 commented on
Mar 4, 2025 • 3 new comments -
[ci][anaconda] Remove conda from linter docker images
#147789 commented on
Mar 5, 2025 • 3 new comments -
Increase reference count of state tensor in `THPGenerator_reduce` to avoid premature garbage collection in `multiprocessing` spawn method `"forkserver"` and `"spawn"`
#147907 commented on
Mar 5, 2025 • 3 new comments -
cpu: aarch64: enable gemm-bf16f32
#140159 commented on
Mar 6, 2025 • 2 new comments -
Enable onednn in pytorch for ppc64le architecture
#143743 commented on
Mar 6, 2025 • 2 new comments -
Remove unused rand call if not fallback to eager for rand
#147790 commented on
Mar 6, 2025 • 2 new comments -
Correctly propagate exception to parent tx
#146502 commented on
Mar 5, 2025 • 2 new comments -
add grad_output shape check for adaptive_avg_pool2d_backward
#145241 commented on
Mar 5, 2025 • 2 new comments -
[import][inductor] Simplify grid handling
#147583 commented on
Mar 4, 2025 • 2 new comments -
add `torch.float4_e2m1fn_x2` to PyTorch
#146578 commented on
Feb 28, 2025 • 2 new comments -
[CD] Enable triton xpu windows build
#147637 commented on
Mar 6, 2025 • 2 new comments -
Add torch._library.utils.normalize_args_kwargs
#148062 commented on
Feb 27, 2025 • 2 new comments -
Deprecate DataLoader pin_memory_device param
#146821 commented on
Mar 4, 2025 • 2 new comments -
[export] check non-negative modulus, avoid unnecessary congruences, in export solver
#144925 commented on
Mar 4, 2025 • 2 new comments -
cpp_wrapper: reduce memory usage by removing unneeded temporaries
#147403 commented on
Mar 6, 2025 • 2 new comments -
experimental proposal DCP v2
#146999 commented on
Mar 3, 2025 • 2 new comments -
Enable a fast path for (static) qlinear for AArch64 through ACL directly.
#147337 commented on
Mar 5, 2025 • 2 new comments -
[torch/elastic][upstream] Fix the wrong order when start_index is not 0
#147967 commented on
Mar 5, 2025 • 2 new comments -
[ROCm] opportunistic fastatomics for ReduceAdd operations for MI300 GPUs
#146264 commented on
Feb 28, 2025 • 2 new comments -
Add ppc64le wheel build support
#147194 commented on
Mar 4, 2025 • 1 new comment -
[Dtensor] Pass device information in OffsetBasedRNGTracker
#147594 commented on
Mar 5, 2025 • 1 new comment -
Fix the Problems About Defining Static Variable in Inline Function
#147095 commented on
Mar 4, 2025 • 1 new comment -
[ONNX] Add draft_export as a strategy
#147529 commented on
Mar 5, 2025 • 1 new comment -
Sticky cache API for torch.compile
#147528 commented on
Feb 28, 2025 • 1 new comment -
Small scheduler refactors
#147410 commented on
Feb 28, 2025 • 1 new comment -
Make codegen dynamic shapes more device agnostic
#146830 commented on
Mar 5, 2025 • 1 new comment -
Enable pt2e quantization path for arm
#146690 commented on
Mar 5, 2025 • 1 new comment -
[CD] Annotate linux/arm64 cuda wheels with consistent nvidia dependencies
#145021 commented on
Mar 6, 2025 • 1 new comment -
Use the device interface for detecting Triton availability
#139171 commented on
Feb 28, 2025 • 1 new comment -
[WIP][ptd][nccl] use current-stream as nccl-stream under async=False mode
#147820 commented on
Mar 3, 2025 • 1 new comment -
test 0-dim squeeze in basic.TestSqueeze
#147928 commented on
Mar 3, 2025 • 1 new comment -
Introduce `UserDefinedExceptionClassVariable`
#146504 commented on
Mar 5, 2025 • 1 new comment -
Facilitate at::_weight_int4pack_mm_with_scale_and_zeros related registration
#147962 commented on
Mar 4, 2025 • 1 new comment -
fix simple-spec crash
#147723 commented on
Feb 28, 2025 • 1 new comment -
Add needs_exact_strides operator tag for Inductor to force exact strides
#148063 commented on
Mar 6, 2025 • 1 new comment -
[Intel CPU] Fix issue #143483.
#144854 commented on
Mar 3, 2025 • 0 new comments -
[Intel CPU] Fix issue #143482.
#144760 commented on
Mar 3, 2025 • 0 new comments -
optimize the decomposition of aten.native_group_norm
#144733 commented on
Mar 2, 2025 • 0 new comments -
Generalize poison fork logic for each device backend
#144664 commented on
Mar 5, 2025 • 0 new comments -
[will-not-merge] tuning
#145798 commented on
Mar 6, 2025 • 0 new comments -
[AsyncMM] re-enable and adapt to cutlass 3.6.0 (#144011)
#145811 commented on
Mar 6, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[a-c]*/` to `ruff format`
#144554 commented on
Mar 2, 2025 • 0 new comments -
[inductor] Enable docstring_linter on _inductor
#144622 commented on
Mar 4, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `test/[a-h]*/` to `ruff format`
#144555 commented on
Mar 3, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `test/[i-z]*/` to `ruff format`
#144556 commented on
Mar 5, 2025 • 0 new comments -
[inductor] Add tests for new docstring_linter features (fix #142496)
#144621 commented on
Mar 4, 2025 • 0 new comments -
[Test][Linalg][CUDA] Increase niter in test_svd_lowrank_cuda_float64
#145930 commented on
Mar 3, 2025 • 0 new comments -
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on
Feb 28, 2025 • 0 new comments -
Remove unneeded CUDA logic from _create_build_env
#145822 commented on
Mar 4, 2025 • 0 new comments -
[inductor] Add features to docstring_linter (see #142496)
#145834 commented on
Mar 4, 2025 • 0 new comments -
Guard the CPU cpp wrapper tests on having a cpp wrapper
#145847 commented on
Mar 4, 2025 • 0 new comments -
[micro_pipeline_tp] add logging for all-gather-matmul fusion
#145594 commented on
Mar 6, 2025 • 0 new comments -
Open up PT UTs to cover additional devices
#145589 commented on
Mar 6, 2025 • 0 new comments -
[micro_pipeline_tp] support pattern matching row-wise scaled_mm with sharded scale
#145595 commented on
Mar 6, 2025 • 0 new comments -
Add device support for chunk_cat, all_gather_copy_in, and split_with_…
#145600 commented on
Mar 3, 2025 • 0 new comments -
General Changes for multi accelerators
#145521 commented on
Mar 6, 2025 • 0 new comments -
Simplify functional composition in _aot_autograd/dispatch_and_compile_graph.py
#145636 commented on
Mar 6, 2025 • 0 new comments -
Remove unnecessary "special linking" for `BLAS_LIBRARIES`
#145487 commented on
Mar 5, 2025 • 0 new comments -
removed check for ConvTranspose3D on MPS
#145366 commented on
Feb 28, 2025 • 0 new comments -
[c10d] implement ReduceOp.unbox()
#145652 commented on
Mar 6, 2025 • 0 new comments -
Fix incorrect citation of authors in documentation
#145209 commented on
Mar 6, 2025 • 0 new comments -
[Inductor] optimize welford reduction
#145061 commented on
Mar 5, 2025 • 0 new comments -
[Easy] update pip sources for ROCm in nightly pull tool
#145685 commented on
Feb 28, 2025 • 0 new comments -
Fix support for nccl < 2.17
#145719 commented on
Mar 1, 2025 • 0 new comments -
[Async-TP] Port _fused_all_gather_matmul_native to cpp to reduce launching overhead
#145794 commented on
Mar 6, 2025 • 0 new comments -
Made partitioning more(?) deterministic
#145024 commented on
Feb 28, 2025 • 0 new comments -
[AsyncMM] preliminary tuning
#145795 commented on
Mar 6, 2025 • 0 new comments -
[test] fix unit test
#144977 commented on
Mar 3, 2025 • 0 new comments -
[Async-TP] _pipelined_multi_all_gather_and_consume reduce overhead
#145796 commented on
Mar 6, 2025 • 0 new comments -
[Async-TP] improve algo selection
#145797 commented on
Mar 6, 2025 • 0 new comments -
Add the max_autotune tests in the periodic jobs.
#143560 commented on
Mar 6, 2025 • 0 new comments -
Fix space typo in warning message
#143473 commented on
Mar 4, 2025 • 0 new comments -
[while_loop][jit inductor] auto-unspecialize int input and output to unbacked symints
#143457 commented on
Mar 1, 2025 • 0 new comments -
Remove all dead type ignores (round 2)
#143256 commented on
Mar 3, 2025 • 0 new comments -
Enable AArch64 CI scripts to be used for local dev
#143190 commented on
Feb 28, 2025 • 0 new comments -
Support unique id for Tensor Storage Object
#143093 commented on
Mar 4, 2025 • 0 new comments -
Update low prec codegen for div/mod
#142350 commented on
Mar 3, 2025 • 0 new comments -
Fix more undefined errors in TypeCast.h
#142346 commented on
Mar 6, 2025 • 0 new comments -
[DO NOT MERGE][WIP] CI: Dispatch PR events to the out-of-tree test infra
#142114 commented on
Mar 4, 2025 • 0 new comments -
Fix platform detection in MKLDNN CMake file
#142067 commented on
Mar 3, 2025 • 0 new comments -
Add AOT inductor support for _scaled_mm for CPU
#141961 commented on
Mar 6, 2025 • 0 new comments -
Banned ever saving unclaimed nodes
#141940 commented on
Feb 28, 2025 • 0 new comments -
Enable CUDA 12.6 OSS CI
#140793 commented on
Mar 5, 2025 • 0 new comments -
ILP for auto FSDP wrapping
#140298 commented on
Feb 28, 2025 • 0 new comments -
[Environment Variable][6/N] Use thread-safe getenv functions
#140200 commented on
Mar 5, 2025 • 0 new comments -
Add torch._scaled_mm for CPU
#139975 commented on
Mar 6, 2025 • 0 new comments -
[cuDNN] Add an option to force cuDNN usage (incl. SDPA)
#139699 commented on
Mar 6, 2025 • 0 new comments -
Refactor CMake to install header by build option
#139469 commented on
Mar 3, 2025 • 0 new comments -
Fix `USE_STATIC_MKL` lost functionality
#138996 commented on
Feb 27, 2025 • 0 new comments -
[Docker] Create an independent dependencies layer
#138612 commented on
Mar 3, 2025 • 0 new comments -
[Inductor] introduce comm buffer planning
#138519 commented on
Mar 6, 2025 • 0 new comments -
Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512.
#138388 commented on
Feb 28, 2025 • 0 new comments -
[WIP] Add CachingDeviceAllocatorInterface as the base device allocator
#138222 commented on
Mar 4, 2025 • 0 new comments -
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on
Mar 3, 2025 • 0 new comments -
[POC][FX][pytree] cleanup fx pytree implementation
#138202 commented on
Mar 3, 2025 • 0 new comments -
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on
Feb 28, 2025 • 0 new comments -
DISABLED test_repeated_calling_cuda (__main__.AOTInductorTestABICompatibleGpu)
#146185 commented on
Mar 3, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[e-n]*/` to `ruff format`
#144553 commented on
Mar 2, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on
Mar 2, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/_[a-h]*/` to `ruff format`
#144551 commented on
Mar 2, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `{torch,test}/{nn,optim}/**` to `ruff format`
#144548 commented on
Mar 3, 2025 • 0 new comments -
[20/N] Fix extra warnings brought by clang-tidy-17
#144473 commented on
Mar 4, 2025 • 0 new comments -
Support Swiglu for Module and functional
#144465 commented on
Mar 5, 2025 • 0 new comments -
improve WOQ first token performance on CPU
#144463 commented on
Mar 5, 2025 • 0 new comments -
[BE][pytree][Easy] change imports `torch.utils._pytree` -> `torch.utils.pytree.python`
#144405 commented on
Mar 3, 2025 • 0 new comments -
codecache.py: Utilize precompiled headers for CPP python bindings
#144349 commented on
Mar 1, 2025 • 0 new comments -
[pytree][1/N] change pytree usages to implementation agnostic: `torch.distributed`
#144332 commented on
Mar 3, 2025 • 0 new comments -
codecache: Remove cpp_prefix.h duplication per build, then precompile it
#144293 commented on
Mar 1, 2025 • 0 new comments -
Add CUDA aarch64 triton wheel build
#144049 commented on
Mar 4, 2025 • 0 new comments -
Fix to torch.hub documentation grammar mistakes.
#144016 commented on
Mar 3, 2025 • 0 new comments -
Brister/always tiled reduction
#144008 commented on
Mar 2, 2025 • 0 new comments -
Fix typo: change 'recieve' into 'receive'
#143981 commented on
Feb 28, 2025 • 0 new comments -
Fixed bug in FindMKL.cmake
#143980 commented on
Mar 1, 2025 • 0 new comments -
[ci] Add riscv opt-int build
#143979 commented on
Mar 6, 2025 • 0 new comments -
[Submodule] Bump flatbuffers to v24.12.23
#143964 commented on
Mar 5, 2025 • 0 new comments -
using more descriptive alt text for accessibility
#143958 commented on
Feb 28, 2025 • 0 new comments -
Remove aten/src/ATen/core/Array.h
#143950 commented on
Mar 1, 2025 • 0 new comments -
Add `_benchmark_func` convenience method
#143911 commented on
Feb 28, 2025 • 0 new comments -
Remove remove_non_owning_ref_types
#143805 commented on
Mar 3, 2025 • 0 new comments -
[CI] enable operator benchmark on CPU
#143733 commented on
Feb 27, 2025 • 0 new comments -
Apply clang-format for ATen/core/op_registration headers
#143730 commented on
Mar 3, 2025 • 0 new comments -
ci: Add scaffolding for building wheels sequentially
#143672 commented on
Mar 4, 2025 • 0 new comments -
Extend vec backend with BF16 SVE intrinsics
#143666 commented on
Mar 5, 2025 • 0 new comments -
Attempt to speed up MPS getTensorStringKey
#143630 commented on
Mar 4, 2025 • 0 new comments -
Fix test_tensorboard when started w/o tensorboard package
#148079 commented on
Feb 27, 2025 • 0 new comments -
Onednn pri cache
#147693 commented on
Mar 3, 2025 • 0 new comments -
[DCP] fix dcp gather_object/scatter_object_list
#147675 commented on
Mar 4, 2025 • 0 new comments -
Attempt a mixed precision fused adam
#147653 commented on
Feb 27, 2025 • 0 new comments -
[CUDAGraph] Graph Partition
#147648 commented on
Mar 6, 2025 • 0 new comments -
Implement metal kernel for MPS binary ops using TensorIterator
#147644 commented on
Feb 28, 2025 • 0 new comments -
[ROCm] Improve backwards indexing when stride is not one
#147630 commented on
Feb 27, 2025 • 0 new comments -
Fixed abnormal behavior of LazyLinear when using LazyLinear and load_state together
#147599 commented on
Feb 27, 2025 • 0 new comments -
Define USE_C10D_XCCL and USE_XCCL in pytorch
#147593 commented on
Mar 5, 2025 • 0 new comments -
Fix log2, PowByNatural printing
#147592 commented on
Mar 4, 2025 • 0 new comments -
[ONNX][demo] Rotary embedding
#147576 commented on
Mar 5, 2025 • 0 new comments -
[partitioner] always ban compiler-driven recompute of collectives by default
#147561 commented on
Feb 27, 2025 • 0 new comments -
ROCm MX-FP8 Gemm
#147553 commented on
Mar 6, 2025 • 0 new comments -
[ROCm] Input vectorization in elementwise kernels for tensors with heterogeneous types
#147527 commented on
Feb 27, 2025 • 0 new comments -
Update pybind11 submodule to 3.0.0-dev test
#147524 commented on
Mar 1, 2025 • 0 new comments -
Enable UBSAN test
#147511 commented on
Mar 4, 2025 • 0 new comments -
Document poison fork note for accelerator APIs
#147507 commented on
Mar 4, 2025 • 0 new comments -
modified device check logic and added tests
#147501 commented on
Mar 2, 2025 • 0 new comments -
Optimize `dynamo` typing
#147499 commented on
Mar 4, 2025 • 0 new comments -
Upgrade submodule oneDNN to v3.7
#147498 commented on
Feb 28, 2025 • 0 new comments -
[CUDA] Replace deprecated usages of cub iterators and thread operators
#147493 commented on
Feb 28, 2025 • 0 new comments -
[ONNX] Migrate onnx ops decomp functions
#147469 commented on
Mar 4, 2025 • 0 new comments -
[Inductor] Fix `torch.polygamma()` when n == 1
#147453 commented on
Feb 28, 2025 • 0 new comments -
Reland "Introduce new template heuristic for triton autotune configs"
#147452 commented on
Mar 5, 2025 • 0 new comments -
[5/N] Remove unnecessary once flag usage
#147445 commented on
Feb 28, 2025 • 0 new comments -
[executorch hash update] update the pinned executorch hash
#147422 commented on
Mar 6, 2025 • 0 new comments -
Fix atomic operation compatibility for ARMv8-A (Raspberry Pi 4) by adjusting compilation flags
#148070 commented on
Feb 27, 2025 • 0 new comments -
use identity op for alpha=inf in torch.celu and quantized_celu
#148066 commented on
Feb 28, 2025 • 0 new comments -
[ROCm] Skip Navi4 Row-Wise F8 Tests
#148037 commented on
Feb 28, 2025 • 0 new comments -
[CUDA][complex] skip `test_reference_numerics_large_jiterator_unary_cuda_complex64` on CUDA
#148024 commented on
Mar 5, 2025 • 0 new comments -
[while_loop] require stride to be the same as input for body_fn
#148002 commented on
Mar 6, 2025 • 0 new comments -
[ROCm] Use generated CK config.h rather than system
#147993 commented on
Mar 5, 2025 • 0 new comments -
Support `contextlib.suppress`
#147990 commented on
Mar 5, 2025 • 0 new comments -
Bump Protobuf to 5.29
#147963 commented on
Feb 28, 2025 • 0 new comments -
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(Zi)
#147917 commented on
Mar 3, 2025 • 0 new comments -
[Draft] Enable cpu_offload for _distribute_state_dict
#147916 commented on
Mar 3, 2025 • 0 new comments -
Change persistent reduction threshold to 32
#147899 commented on
Feb 27, 2025 • 0 new comments -
[PT2] Allow tensor type in allowed_getattr_types_for_subgm when verifying ep
#147898 commented on
Mar 4, 2025 • 0 new comments -
[targets2buck] Remove tombstone messages proactively
#147897 commented on
Feb 28, 2025 • 0 new comments -
[Not4Land] test `optree` with HEAD version
#147870 commented on
Mar 6, 2025 • 0 new comments -
Set disable_clone=True when running opt_gm
#147845 commented on
Mar 3, 2025 • 0 new comments -
Use /permissive- for MSVC build of torch libraries
#147825 commented on
Mar 3, 2025 • 0 new comments -
Use torch_compile_options for c10 libraries
#147821 commented on
Mar 5, 2025 • 0 new comments -
test
#147800 commented on
Feb 27, 2025 • 0 new comments -
[ROCm] CK Memory-Efficient Attention (attention bias support)
#147778 commented on
Mar 1, 2025 • 0 new comments -
[cuda] Added a correctness test for layernorm backwards
#147763 commented on
Mar 6, 2025 • 0 new comments -
[DCP] Work in progress: Demonstrate rank local checkpointing in DCP
#147758 commented on
Mar 3, 2025 • 0 new comments -
Update CPU tolerance for f16 triplet margin loss
#147742 commented on
Mar 4, 2025 • 0 new comments -
[WIP][XPU][Inductor] Update Intel triton for release 2.7.
#147727 commented on
Mar 5, 2025 • 0 new comments -
Avoid linking multiple OMP runtimes in libtorch_cpu.so if BLAS used is OpenBLAS.
#147725 commented on
Mar 5, 2025 • 0 new comments -
Skip test_dtypes xpu test on bmm and addbmm
#147721 commented on
Mar 4, 2025 • 0 new comments -
[ROCm][Windows] Enable torchvision build with ROCm on Windows
#147382 commented on
Mar 3, 2025 • 0 new comments -
Enable qint8 and quint8 add for AArch64 using ACL directly
#146620 commented on
Mar 5, 2025 • 0 new comments -
[WIP] BaseSubclass
#146612 commented on
Mar 6, 2025 • 0 new comments -
[NOT FOR LANDING] experimental NVSHMEM integration
#146593 commented on
Mar 6, 2025 • 0 new comments -
clang-format CUDASymmetricMemory.cu
#146592 commented on
Mar 6, 2025 • 0 new comments -
[Partitioner] Reduce time consuming of partitions merger
#146582 commented on
Mar 1, 2025 • 0 new comments -
[Partitioner] Remove unnecessary upstream nodes in dependency viewer
#146580 commented on
Mar 1, 2025 • 0 new comments -
add python root bin to windows load path.
#146573 commented on
Mar 2, 2025 • 0 new comments -
[pt2d] Add reorder_comms_preserving_peak_memory pass
#146562 commented on
Mar 6, 2025 • 0 new comments -
Improve comms debug visualization
#146561 commented on
Mar 6, 2025 • 0 new comments -
[not for land] temp changes to enable 'simple_fsdp'
#146558 commented on
Mar 3, 2025 • 0 new comments -
Support contextlib.ExitStack
#146506 commented on
Mar 5, 2025 • 0 new comments -
Allow setting attribute to NestedUserFunctionVariable
#146505 commented on
Mar 4, 2025 • 0 new comments -
Update CPython tests for ctx manager to use unittest
#146501 commented on
Mar 5, 2025 • 0 new comments -
Allow trace through unittest
#146500 commented on
Mar 5, 2025 • 0 new comments -
Update code_template.py re.compile() is directly applied to the regex…
#146489 commented on
Feb 28, 2025 • 0 new comments -
Update quantile doc
#146485 commented on
Mar 6, 2025 • 0 new comments -
[1/N] Use std::string_view in torchgen
#146403 commented on
Mar 3, 2025 • 0 new comments -
[WIP][dynamic shapes] mark backed size symbols as size-like
#146335 commented on
Mar 6, 2025 • 0 new comments -
Use device agnostic APIs for device_count and backend in common_fsdp
#146289 commented on
Mar 6, 2025 • 0 new comments -
[dcp] Minor improvements to filesystem writer
#146273 commented on
Mar 3, 2025 • 0 new comments -
Format tests by PYFMT
#146267 commented on
Mar 3, 2025 • 0 new comments -
[2/N] Fix cppcoreguidelines-init-variables suppression
#146237 commented on
Mar 6, 2025 • 0 new comments -
Subprocess compile
#146134 commented on
Mar 6, 2025 • 0 new comments -
Move get accelerator to use build time flags when possible
#146098 commented on
Mar 4, 2025 • 0 new comments -
[ARM] Fix TestDataLoader.test_segfault unexpected success on Aarch64
#146090 commented on
Feb 28, 2025 • 0 new comments -
Force build to conform C++ standard on windows by adding `/permissive-` flag
#147367 commented on
Mar 4, 2025 • 0 new comments -
Add meta function for out variants of ones,zeros,empty
#147350 commented on
Mar 6, 2025 • 0 new comments -
Refine XPU oneDNN context manager API
#147349 commented on
Mar 6, 2025 • 0 new comments -
[ROCm][Windows] Disable Composable Kernels and Triton for Windows builds
#147334 commented on
Mar 4, 2025 • 0 new comments -
[TESTING] [NO MERGE] Testing new triton commit for release/2.7
#147320 commented on
Mar 6, 2025 • 0 new comments -
Add the memory and dispatch to the logging module.
#147262 commented on
Mar 4, 2025 • 0 new comments -
add PrivateUse1 backend in fsdp collectives
#147260 commented on
Feb 28, 2025 • 0 new comments -
logging: close handler after removing it
#147235 commented on
Mar 4, 2025 • 0 new comments -
cpp_wrapper: Fix even more tests
#147225 commented on
Mar 6, 2025 • 0 new comments -
Fix rms_norm in fp16/bf16
#147203 commented on
Mar 3, 2025 • 0 new comments -
dynamo: Count number of opcodes processed
#147149 commented on
Mar 6, 2025 • 0 new comments -
[fsdp] add an experimental allocator hook for buffers that participate in collective communication
#147146 commented on
Mar 6, 2025 • 0 new comments -
ROCm F8 Datatype Selector
#147142 commented on
Feb 28, 2025 • 0 new comments -
fake_tensor: Handle op errors more gracefully
#147049 commented on
Mar 5, 2025 • 0 new comments -
[ROCm] [TunableOp] Enable logging of BLAS parameters
#147034 commented on
Mar 6, 2025 • 0 new comments -
[BE]: Try to remove unused type ignores - attempt 1
#146989 commented on
Mar 3, 2025 • 0 new comments -
[Inductor] Unify the data type propagation between Triton and CPP Backend
#146970 commented on
Mar 6, 2025 • 0 new comments -
cpp_wrapper: Precompile device-specific header files
#146928 commented on
Mar 3, 2025 • 0 new comments -
Adjust TestInductorOpInfo to depend on backend, not device
#146911 commented on
Mar 5, 2025 • 0 new comments -
Optimize transformer encoder/decoder init suggestion
#146882 commented on
Mar 3, 2025 • 0 new comments -
update types on dynamo configs
#146873 commented on
Feb 28, 2025 • 0 new comments -
Don't look at TESTING_ONLY in fuzzer
#146870 commented on
Feb 28, 2025 • 0 new comments -
Enable explicitly vectorized `_weight_int8pack_mm` op for FP16 dtype on x86_64 CPU
#146777 commented on
Feb 28, 2025 • 0 new comments -
cpp_wrapper: persist autotune example tensors until last use
#146706 commented on
Mar 6, 2025 • 0 new comments -
Enable Windows tests
#146695 commented on
Mar 5, 2025 • 0 new comments -
AOTI packaged model fails with generic error when run in for loop but succeeds on individual sample
#146524 commented on
Mar 3, 2025 • 0 new comments -
Cannot import PyTorch in Alpine Docker Container
#71381 commented on
Mar 3, 2025 • 0 new comments -
expandable_segments does not work for CUDAPluggableAllocator + MemPool
#147851 commented on
Mar 3, 2025 • 0 new comments -
[ONNX] Migrate torchlib from onnxscript
#139301 commented on
Mar 3, 2025 • 0 new comments -
No gradient for `residuals` in the return value of `torch.linalg.lstsq`
#147543 commented on
Mar 3, 2025 • 0 new comments -
Does CUDACachingAllocator.cpp still require deferred event creation?
#147874 commented on
Mar 3, 2025 • 0 new comments -
torch._scaled_mm reproducibility
#147972 commented on
Mar 3, 2025 • 0 new comments -
Fp8 scaled-mm row-wise is substantially slower than tensor-wise
#147971 commented on
Mar 3, 2025 • 0 new comments -
Adam doesn't work with nonzero-dim Tensor betas
#147921 commented on
Mar 3, 2025 • 0 new comments -
`FxGraphDrawer` fails on `einsum` nodes
#147884 commented on
Mar 3, 2025 • 0 new comments -
[CI/CD] Deprecating PyTorch’s official Anaconda channel
#138696 commented on
Mar 3, 2025 • 0 new comments -
MPS support `torch.linalg.norm` on complex numbers
#146691 commented on
Mar 3, 2025 • 0 new comments -
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on
Mar 3, 2025 • 0 new comments -
"unhashable type: non-nested SymInt" error when using DTensor and Compiled Autograd together
#127797 commented on
Mar 3, 2025 • 0 new comments -
Torch Compile mode giving Graph Break for Higher Order Op for ProcessGroup param
#129942 commented on
Mar 3, 2025 • 0 new comments -
PyTorch C++ API binary compiled with xmake crashes
#129305 commented on
Mar 3, 2025 • 0 new comments -
Model parameter and gradient memory formats are inconsistent with compiled autograd
#127922 commented on
Mar 3, 2025 • 0 new comments -
WSL2 RTX A6000, CUDA out of memory.
#117197 commented on
Mar 3, 2025 • 0 new comments -
Error loading "torch\lib\aoti_custom_ops.dll" or one of its dependencies, when importing Torch, when building from Source on Windows 11 with cuDNN.
#144931 commented on
Mar 3, 2025 • 0 new comments -
CMake Error: When installing PyTorch from source, CUDA not being detected.
#134331 commented on
Mar 3, 2025 • 0 new comments -
User-defined way to hook into Inductor autotuning
#142388 commented on
Mar 3, 2025 • 0 new comments -
[JIT] support list of nn.Module in torchscript
#36061 commented on
Mar 3, 2025 • 0 new comments -
[XPU] torch.nn.functional.pad brings wrong results with torch.compile on Intel GPU
#145372 commented on
Mar 3, 2025 • 0 new comments -
DISABLED test_input_mutation2_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135295 commented on
Mar 3, 2025 • 0 new comments -
[DTensor] [distributed]: Operator aten.select.int does not have a sharding strategy registered
#147724 commented on
Mar 3, 2025 • 0 new comments -
DISABLED test_vdd_clamp_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134445 commented on
Mar 3, 2025 • 0 new comments -
DISABLED test_embedding_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135250 commented on
Mar 3, 2025 • 0 new comments -
Regression: Multiple OpenMP runtimes linked to libtorch_cpu.so
#146603 commented on
Mar 3, 2025 • 0 new comments -
Get `aot_autograd`'ed graph without `torch.compile` and freeze constants without Inductor context
#140205 commented on
Mar 3, 2025 • 0 new comments -
[export] Fail fast on pytorch with `aoti_load_package`
#145730 commented on
Feb 28, 2025 • 0 new comments -
torch.device context manager change doesn't show in torch.get_default_device
#131328 commented on
Mar 6, 2025 • 0 new comments -
[RFC] Test Cases Enabling for Accelerators
#146898 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_reentrant_parent_error_on_cpu_cuda (__main__.TestAutogradDeviceTypeCUDA)
#86735 commented on
Mar 4, 2025 • 0 new comments -
partitioner doesn't appear to respect SAC region
#128730 commented on
Mar 3, 2025 • 0 new comments -
Add a hash function for tensor data
#2569 commented on
Mar 3, 2025 • 0 new comments -
SerializeError for ScriptObject in AOTInductor
#147283 commented on
Mar 3, 2025 • 0 new comments -
[ONNX] slice complex tensor needs implementation
#147896 commented on
Mar 3, 2025 • 0 new comments -
Inconsistent results from `is_compile_supported` with equivalent device identifiers
#147826 commented on
Mar 3, 2025 • 0 new comments -
[BUG][PyTorch 2.0 Export][quant]:get_source_partitions() may return different matches with same input graph
#147170 commented on
Mar 3, 2025 • 0 new comments -
[dynamo] Activation checkpointing tests erroring at runtime
#127115 commented on
Mar 3, 2025 • 0 new comments -
Support AC with graph break
#139989 commented on
Mar 3, 2025 • 0 new comments -
Compiled Autograd + Activation Checkpointing/Offloading
#143176 commented on
Mar 3, 2025 • 0 new comments -
CUDA error when compiling loss function
#143774 commented on
Mar 3, 2025 • 0 new comments -
Partitioner's auto-AC misbehaves with mixed dtypes
#144470 commented on
Mar 3, 2025 • 0 new comments -
CheckpointError with `torch.distributed.algorithms._checkpoint.checkpoint_wrapper` and `torch.compile`
#144637 commented on
Mar 3, 2025 • 0 new comments -
Activation Checkpointing composability with split backward computation
#145511 commented on
Mar 3, 2025 • 0 new comments -
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on
Mar 3, 2025 • 0 new comments -
Missing grad_fn information while torch.compile with customized gradient function
#131974 commented on
Mar 3, 2025 • 0 new comments -
`RuntimeError` not raised for `out=` argument in `torch.tensordot` with `requires_grad` tensors
#147846 commented on
Mar 3, 2025 • 0 new comments -
Unable to checkpoint model and optimizer state when using Hybrid Sharding Strategy
#102904 commented on
Mar 3, 2025 • 0 new comments -
[DCP] Native S3/object storage interface
#116198 commented on
Mar 3, 2025 • 0 new comments -
[DCP] Review and update examples and docstring to reflect DCP save/load API updates
#119070 commented on
Mar 3, 2025 • 0 new comments -
Distributed checkpoint state_dict load may report nccl error and hide the real root-cause exception
#122529 commented on
Mar 3, 2025 • 0 new comments -
`torch_save_to_dcp`'s produced dcp checkpoint doesn't fit the load interface for FSDP
#126047 commented on
Mar 3, 2025 • 0 new comments -
[DCP] DCP does not support objects which are lazy initialized.
#126881 commented on
Mar 3, 2025 • 0 new comments -
PyTorch's Distributed Checkpoint Cannot Save a Parameter of Size 1
#132366 commented on
Mar 3, 2025 • 0 new comments -
Incomplete Parameter Gathering on Rank 0 with FSDP Model Saving
#136950 commented on
Mar 3, 2025 • 0 new comments -
[torch2.4][Distributed Checkpoint] new flatten logic is error-prone
#137327 commented on
Mar 3, 2025 • 0 new comments -
FSDP learning hangs when the program tries to save the model
#143536 commented on
Mar 3, 2025 • 0 new comments -
What is the recommended way to use Distributed Checkpointing Save/Load with HSDP?
#145978 commented on
Mar 3, 2025 • 0 new comments -
[torch.export] RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
#126674 commented on
Mar 3, 2025 • 0 new comments -
DISABLED test_linear_backward_memory_usage_cuda_float32 (__main__.TestNestedTensorSubclassCUDA)
#141292 commented on
Feb 28, 2025 • 0 new comments -
Hooks not working in version 2.0.1+cu118
#102374 commented on
Feb 28, 2025 • 0 new comments -
DISABLED test_rng (__main__.TestCompilerBisector)
#139590 commented on
Feb 28, 2025 • 0 new comments -
DISABLED test_while_loop_schema_gen (__main__.TestHopSchema)
#141202 commented on
Feb 28, 2025 • 0 new comments -
CTCLoss gradient is incorrect
#52241 commented on
Feb 28, 2025 • 0 new comments -
DISABLED test_reorder_peak_memory_dfs (__main__.TestOperatorReorderForPeakMemory)
#145183 commented on
Feb 28, 2025 • 0 new comments -
AdamW is CPU-bottlenecked with FSDP2, with slow foreach kernels
#134191 commented on
Feb 28, 2025 • 0 new comments -
DISABLED test_split (__main__.AutoFunctionalizeTests)
#148080 commented on
Feb 28, 2025 • 0 new comments -
[discussion] Support other index dtypes for scatter, scatter_reduce and other indexing functions in addition to int64: uint8, int16, int32, uint16 etc (without casting copies/reallocations)
#61819 commented on
Feb 28, 2025 • 0 new comments -
Potential rooms for fewer recompilations by introducing higher-level guards
#143670 commented on
Feb 27, 2025 • 0 new comments -
DISABLED test_item_to_inputs_kernel_nobreak_cuda (__main__.TestInductorDynamicCUDA)
#119538 commented on
Feb 27, 2025 • 0 new comments -
compilation error on SequenceParallel'ed Dropout
#147757 commented on
Feb 27, 2025 • 0 new comments -
More informative variable names in AOTAutograd
#110700 commented on
Feb 27, 2025 • 0 new comments -
`torch.mul` uses `OpMathType` for computation.
#147134 commented on
Feb 27, 2025 • 0 new comments -
DISABLED test_int64_upsample3d_cuda_bfloat16 (__main__.TestTorchDeviceTypeCUDA)
#146007 commented on
Feb 27, 2025 • 0 new comments -
make_graphed_callables don't work with FSDP at all even on a simple network
#127225 commented on
Feb 27, 2025 • 0 new comments -
Let torch.compiler.allow_in_graph work in more situations
#140437 commented on
Feb 27, 2025 • 0 new comments -
inspect.Signature with functools.partial partially applying tensors doesn't work
#139374 commented on
Feb 27, 2025 • 0 new comments -
require_exact_stride better handling of expanded dims
#147156 commented on
Feb 27, 2025 • 0 new comments -
Failed to Open libnvrtc-builtins.so.11.7
#93134 commented on
Feb 27, 2025 • 0 new comments -
Partitioner stores fp8 copy of all weights between fwd and bwd, causing OOM
#141881 commented on
Feb 27, 2025 • 0 new comments -
Checkpoint doesn't work with torch_function if torch_function change tensor metadata
#147995 commented on
Feb 27, 2025 • 0 new comments -
Run Performance Regression Tests on new Version of Triton
#148045 commented on
Feb 27, 2025 • 0 new comments -
[RFC] Customization of ElasticAgent for fault-tolerance and node-replacement in big ddp job
#148048 commented on
Feb 27, 2025 • 0 new comments -
Support alpha=inf consistently for torch.celu
#148065 commented on
Feb 27, 2025 • 0 new comments -
DISABLED test_recompile (__main__.AutoFunctionalizeTests)
#148013 commented on
Feb 27, 2025 • 0 new comments -
Feature Request: CUDA-Optimized Queue Buffer for PyTorch
#148077 commented on
Feb 27, 2025 • 0 new comments -
Track follow ups to #147354
#147867 commented on
Feb 27, 2025 • 0 new comments -
torch/_inductor/cpp_builder.py : _is_gcc Function Incorrectly Classifies clang++ as g++
#146712 commented on
Feb 27, 2025 • 0 new comments -
DISABLED test_slice (__main__.AutoFunctionalizeTests)
#148035 commented on
Feb 27, 2025 • 0 new comments -
[RFC] Cuda support matrix for Release 2.7
#145544 commented on
Mar 2, 2025 • 0 new comments -
MPS Error on sequoia 15.3: NDArray dimension length > INT_MAX
#146769 commented on
Mar 2, 2025 • 0 new comments -
Inductor-CPU might load (and store) fewer elements than the vector-width
#146824 commented on
Mar 1, 2025 • 0 new comments -
[typing] Add static type hints to `torch.distributions`.
#144196 commented on
Mar 1, 2025 • 0 new comments -
torch.export does not support torchaudio.transforms.Spectrogram
#112844 commented on
Mar 1, 2025 • 0 new comments -
FlexAttention uses much more GPU memory than FlashAttention-2
#144537 commented on
Mar 1, 2025 • 0 new comments -
Unable to print in a branch run by torch.cond
#147115 commented on
Mar 1, 2025 • 0 new comments -
The value of requires_grad is not set when creating the tensor using TensorMaker
#146419 commented on
Mar 1, 2025 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
Mar 1, 2025 • 0 new comments -
Error using the "index_put_" function
#124288 commented on
Mar 1, 2025 • 0 new comments -
PyPy support
#17835 commented on
Mar 1, 2025 • 0 new comments -
DISABLED test_weight_norm_bwd_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#141484 commented on
Mar 1, 2025 • 0 new comments -
DISABLED test_embedding_bag_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135215 commented on
Mar 1, 2025 • 0 new comments -
DISABLED test_embedding_dynamic_shapes_cpu (__main__.DynamicShapesCodegenCpuTests)
#135456 commented on
Mar 1, 2025 • 0 new comments -
DISABLED test_wait_tensor (__main__.CompileTest)
#148014 commented on
Feb 28, 2025 • 0 new comments -
DISABLED test_partitioning_unremat_bw (__main__.MinCutPartitioningTests)
#145343 commented on
Feb 28, 2025 • 0 new comments -
DistributedDataParallel with compile(..., mode="max-autotune") hangs in 2.5+
#140395 commented on
Feb 28, 2025 • 0 new comments -
Improving expand w/ unbacked symints
#128645 commented on
Feb 28, 2025 • 0 new comments -
compile_fx modifies the input graph in place
#138980 commented on
Feb 28, 2025 • 0 new comments -
DTensor support for fused qkv matmul
#140069 commented on
Feb 28, 2025 • 0 new comments -
torch.nn.AvgPool2d fails with stride >= 2^31 on CUDA
#147613 commented on
Feb 28, 2025 • 0 new comments -
[feature][cudagraph] API to clear a bad recording
#127147 commented on
Feb 28, 2025 • 0 new comments -
[compiled autograd][aot autograd] accumulate grad (on param with non empty grad) mutates inputs and prevents cudagraph
#126938 commented on
Feb 28, 2025 • 0 new comments -
autograd.function with `setup_context` has a number of issues with `torch.compile`
#130051 commented on
Feb 28, 2025 • 0 new comments -
inductor error when torch.compile on distrifuser
#128073 commented on
Feb 28, 2025 • 0 new comments -
Support AOT Autograd level Caching
#125958 commented on
Feb 28, 2025 • 0 new comments -
torch._dynamo.exc.UserError: Could not guard on data-dependent expression Eq(256*u0, 256) (unhinted: Eq(256*u0, 256)).
#147672 commented on
Feb 28, 2025 • 0 new comments -
AOTAutogradCache implementation
#128234 commented on
Feb 28, 2025 • 0 new comments -
Triton pin update for PyTorch 2.7 / Triton 3.3: Upgrading PyTorch-Triton to a version that Supports Blackwell
#146518 commented on
Feb 28, 2025 • 0 new comments -
torch.export.export creates guards that denies exporting.
#147623 commented on
Feb 28, 2025 • 0 new comments -
[inductor] [cpu] `torch.scatter` throws `AttributeError: 'int' object has no attribute 'find'` on CPP backend
#148058 commented on
Feb 28, 2025 • 0 new comments -
Sweep through Potentially BC Breaking Commits in Triton
#148044 commented on
Mar 5, 2025 • 0 new comments -
Add docs for __tensor_flatten__ / __tensor_unflatten__
#113089 commented on
Mar 5, 2025 • 0 new comments -
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on
Mar 5, 2025 • 0 new comments -
[compile] DDPOptimizer + activation checkpointing not supported
#104674 commented on
Mar 5, 2025 • 0 new comments -
Preload CUDA fails if CUDA libs in different PYTHONPATH
#147001 commented on
Mar 5, 2025 • 0 new comments -
Looking for valid compiling option for extension based on torch-2.1.0+cpu.cxx11.abi
#143780 commented on
Mar 5, 2025 • 0 new comments -
Torch Onnx Export (with Dynamo) does not recognize `Remainder` function
#147973 commented on
Mar 5, 2025 • 0 new comments -
Scaled Dot-Product Attention Invalid Configuration Error on Large batch size
#142228 commented on
Mar 5, 2025 • 0 new comments -
[ONNX] Run report_exportability when report=True
#139904 commented on
Mar 5, 2025 • 0 new comments -
[torchbench] torch._dynamo.exc.Unsupported: Graph break due to unsupported builtin None.morphologyEx
#145088 commented on
Mar 5, 2025 • 0 new comments -
Immediate Global State Mutation After Using `_force_original_view_tracking` Decorator
#147849 commented on
Mar 5, 2025 • 0 new comments -
Torch 2.6.0 cu126 is missing several dependencies in the METADATA-file
#146679 commented on
Mar 5, 2025 • 0 new comments -
Assertion error in Flex Attention backward pass when indexing a parameter
#146896 commented on
Mar 5, 2025 • 0 new comments -
Symbolic shape fails symbol_to_source when caching is enabled
#127970 commented on
Mar 5, 2025 • 0 new comments -
`torch.compile` and complex numbers
#125718 commented on
Mar 5, 2025 • 0 new comments -
libtorch_cuda_linalg.so: undefined symbol: mkl_lapack_dsbrdbn on a source built PyTorch 2.6.0 with USE_STATIC_MKL=1 on CUDA platform
#146551 commented on
Mar 5, 2025 • 0 new comments -
On Linux, passing torch.Generator to multiprocessing.Process crashes for forkserver and spawn start method
#146828 commented on
Mar 5, 2025 • 0 new comments -
[CUDA][Blackwell] Blackwell Tracking Issue
#145949 commented on
Mar 5, 2025 • 0 new comments -
Enable more flake8-bugbear lints
#106571 commented on
Mar 5, 2025 • 0 new comments -
Custom operators registered via decorator slower than ops registered via `torch.Library.{define, impl}`
#139500 commented on
Mar 5, 2025 • 0 new comments -
[inductor] [silence] `nn.ConvTranspose2d-F.dropout` outputs inconsistent results with eager
#148061 commented on
Mar 5, 2025 • 0 new comments -
DISABLED test_profiler_mark_wrapper_call_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135294 commented on
Mar 5, 2025 • 0 new comments -
DISABLED test_pattern_matcher_multi_user_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134433 commented on
Mar 5, 2025 • 0 new comments -
DISABLED test_max_autotune (__main__.TestFlexAttention)
#147200 commented on
Mar 5, 2025 • 0 new comments -
Video-Llama (version 1) runs much slower using Float16 than Float32 on Kunpeng CPU
#148078 commented on
Mar 5, 2025 • 0 new comments -
Tensor parallel for convolutions and groupnorm
#133221 commented on
Mar 5, 2025 • 0 new comments -
RuntimeError: expect_autograd_hooks_ INTERNAL ASSERT FAILED at "../torch/csrc/distributed/c10d/reducer.cpp"
#143580 commented on
Mar 5, 2025 • 0 new comments -
Pytorch build fail with GCC 14.1.0 due to third_party/fbgemm/src/UtilsAvx512.cc:970:35: error: ‘r’ may be used uninitialized [-Werror=maybe-uninitialized]
#129358 commented on
Mar 5, 2025 • 0 new comments -
[inductor][cpu]DebertaV2ForMaskedLM, DebertaV2ForQuestionAnswering and eca_halonext26ts max_autotune accuracy failure in 2025-02-24 nightly release
#148074 commented on
Mar 5, 2025 • 0 new comments -
torch._check doesn't work for .item() then select
#147772 commented on
Mar 4, 2025 • 0 new comments -
Load cuda deps more aggressively
#137059 commented on
Mar 5, 2025 • 0 new comments -
Enabling ATen Distribution kernels for AARCH64 using OpenRNG
#134942 commented on
Mar 5, 2025 • 0 new comments -
Avoid args being parsed when common_utils imported
#134592 commented on
Mar 4, 2025 • 0 new comments -
Update Doc for Intel XPU Profiling
#134515 commented on
Mar 4, 2025 • 0 new comments -
[Don't Merge] Try to build custom ops with MKL XPU
#133658 commented on
Feb 28, 2025 • 0 new comments -
Make IPC features extendable on third-party devices
#133222 commented on
Mar 4, 2025 • 0 new comments -
autograd codegen: bump VC properly for mutable ops with no returns
#133044 commented on
Feb 28, 2025 • 0 new comments -
[torch.special] Adding betainc, betaincc, betaincinv, betainccinv, betaln and beta with backward operation
#132135 commented on
Mar 6, 2025 • 0 new comments -
[pytree] implement key path APIs for CXX pytree
#130141 commented on
Feb 28, 2025 • 0 new comments -
[CI] Run `lintrunner` on generated `.pyi` stub files in CI
#129887 commented on
Feb 28, 2025 • 0 new comments -
Add `__all__` to `torch/nn/functional.pyi` and `torch/return_types.pyi`
#129872 commented on
Feb 28, 2025 • 0 new comments -
[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path`
#129871 commented on
Feb 28, 2025 • 0 new comments -
Use Generic TypeAlias (PEP 585) and Union Type (PEP 604) in generated `.pyi` stub files
#129420 commented on
Feb 28, 2025 • 0 new comments -
Enable Leak Sanitizer
#127171 commented on
Mar 3, 2025 • 0 new comments -
[inductor] online softmax
#127011 commented on
Mar 4, 2025 • 0 new comments -
[AOTAutograd] tweak min-cut partitioner to avoid saving softmax output
#126348 commented on
Feb 28, 2025 • 0 new comments -
[vision hash update] update the pinned vision hash
#125806 commented on
Mar 6, 2025 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
Mar 6, 2025 • 0 new comments -
[pytree] support PyStructSequence types for Python pytree
#113258 commented on
Mar 3, 2025 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
Mar 5, 2025 • 0 new comments -
NotImplementedError: Output channels > 65536 not supported at the MPS device.
#144445 commented on
Mar 6, 2025 • 0 new comments -
[Feature Request] Release original parameters by layer when turning on `freezing_discard_parameters`
#147062 commented on
Mar 6, 2025 • 0 new comments -
Pytorch DDP across nodes: self._store = TCPStore( # type: ignore[call-arg] RuntimeError: Stop_waiting response is expected
#114357 commented on
Mar 6, 2025 • 0 new comments -
Deprecation of NVTX 2 (`nvToolsExt`): Recommended to move to NVTX 3
#147011 commented on
Mar 6, 2025 • 0 new comments -
DISABLED test_remote_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_True (__main__.TestFxGraphCache)
#145191 commented on
Mar 6, 2025 • 0 new comments -
[RFC] Request for Feedback and Review on PRs Adding RISC-V and RVV Support
#147513 commented on
Mar 6, 2025 • 0 new comments -
DISABLED test_tensor_dtype_complex (__main__.CommTest)
#112460 commented on
Mar 6, 2025 • 0 new comments -
[ONNX] BitwiseOr was generated for bool inputs (invalid)
#147854 commented on
Mar 6, 2025 • 0 new comments -
Enable CUDA 12.8.0, Disable CUDA 12.4
#145570 commented on
Mar 6, 2025 • 0 new comments -
AttributeError: Can't pickle local object 'make_opaque_bitwise_fn.<locals>.BitwiseFn'
#147841 commented on
Mar 6, 2025 • 0 new comments -
python3 -m torch.utils.collect_env not providing expected output.
#147669 commented on
Mar 6, 2025 • 0 new comments -
DISABLED test_nonstrict_trace_nested_custom_class_error (__main__.DecoratorTests)
#148031 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_nonstrict_trace_newly_constructed_custom_class_with_side_effects (__main__.DecoratorTests)
#148032 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_nonstrict_trace_nested_custom_class (__main__.DecoratorTests)
#148033 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_nonstrict_trace_no_action_at_a_distance (__main__.DecoratorTests)
#148034 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_nonstrict_trace_on_method (__main__.DecoratorTests)
#148054 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_nonstrict_trace_inside_compiled_function_kwarg (__main__.DecoratorTests)
#148055 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_nonstrict_trace_pre_existing_custom_class_with_side_effects (__main__.DecoratorTests)
#148056 commented on
Mar 4, 2025 • 0 new comments -
PyTorch VS2022 official build Windows binary illegal instruction on AVX2(max ISA level) CPU
#145702 commented on
Mar 4, 2025 • 0 new comments -
[inductor] Add Python type annotations to `torch/_inductor`
#146167 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_per_sample_api_compute_batch_size_not_pytreeable_cpu (__main__.TestExpandedWeightModuleCPU)
#146972 commented on
Mar 4, 2025 • 0 new comments -
torch.compile() on quantized model: No attribute "meta"
#148072 commented on
Mar 4, 2025 • 0 new comments -
Triton Error [CUDA]: invalid device context when autograd.backward a triton kernel
#124565 commented on
Mar 4, 2025 • 0 new comments -
Loading weights using `torch.distributed.checkpoint` leads to large loss values
#145378 commented on
Mar 4, 2025 • 0 new comments -
incorrect _unsafe_index meta
#139312 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_True_grad_True (__main__.TestFxGraphCache)
#145336 commented on
Mar 4, 2025 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
Mar 4, 2025 • 0 new comments -
Support FP16 accumulation for faster LLM inference on 4090 like GPUs
#123558 commented on
Mar 4, 2025 • 0 new comments -
gradient checkpointing with use_reentrant=False cannot reduce peak memory
#147449 commented on
Mar 4, 2025 • 0 new comments -
[inductor][user triton] support on-device TMA / tensor descriptor API
#148052 commented on
Mar 4, 2025 • 0 new comments -
Adjust test_mm_triton_kernel_benchmark for unpadded tensors
#147999 commented on
Mar 4, 2025 • 0 new comments -
torch.compile supported with GIL disabled
#147946 commented on
Mar 4, 2025 • 0 new comments -
redundant recompilation caused by duplicated Sym()
#144068 commented on
Mar 4, 2025 • 0 new comments -
Strange recompilations on torch 2.5 + FSDP + UNet
#138813 commented on
Mar 4, 2025 • 0 new comments -
automatic_dynamic_shapes for mark_unbacked
#136605 commented on
Mar 4, 2025 • 0 new comments -
dynamo (re)compilation issues: shape (1,1), nn.Parameter, mark_dynamic
#135011 commented on
Mar 4, 2025 • 0 new comments -
Avoid recompilation for inputs integer number
#132849 commented on
Mar 4, 2025 • 0 new comments -
StableDiffusion with dynamic=True still recompiles
#104913 commented on
Mar 4, 2025 • 0 new comments -
Recompilation triggered at each step of the loop involving array indexing
#114293 commented on
Mar 4, 2025 • 0 new comments -
torch.compile support for SeamlessExpressivity/SeamlessM4T in fairseq2
#114373 commented on
Mar 4, 2025 • 0 new comments -
DynamicInt helper structure that is equivalent to mark_dynamic on an int
#129623 commented on
Mar 4, 2025 • 0 new comments -
Verifier (in torch.export.export) does not make use of if-condition inside branches
#147991 commented on
Mar 4, 2025 • 0 new comments -
Build Triton for aarch64
#130558 commented on
Mar 4, 2025 • 0 new comments -
`torch.device(0)` makes CUDA init fail in subprocess since `2.5.0`
#144152 commented on
Mar 4, 2025 • 0 new comments -
Comm reordering can make Inductor use variable before its definition
#147328 commented on
Mar 4, 2025 • 0 new comments -
Web Page do not match the original documentation
#146683 commented on
Mar 4, 2025 • 0 new comments -
[ROCm] scaled_dot_product_attention using mem-efficient backend (aotriton) produces wrong outputs with custom attn_mask on torch 2.6.0+rocm6.2.4
#147460 commented on
Mar 4, 2025 • 0 new comments -
`Illegal instruction (core dumped)` on Raspberry Pi 4 when exporting ONNX with `torch 2.6.0`
#146792 commented on
Mar 4, 2025 • 0 new comments -
XPU - UserWarning: Failed to initialize XPU devices. when run on the host without Intel GPU Driver
#145433 commented on
Mar 4, 2025 • 0 new comments -
CI/CD: Figure out what to do with split build
#138750 commented on
Mar 4, 2025 • 0 new comments -
torch_cuda.dll was built failed to link _cudnn_attention_forward
#147671 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_coalesced (__main__.CompileTest)
#146806 commented on
Mar 4, 2025 • 0 new comments -
Publish pytorch RC docker images before release
#145925 commented on
Mar 4, 2025 • 0 new comments -
unbind_copy opinformation cause exception while running test_dtensor_ops.py
#147814 commented on
Mar 4, 2025 • 0 new comments -
[inductor] [silence] inconsistent swap with eager when compiling `torch.rot90-torch.randn_like`
#147847 commented on
Mar 4, 2025 • 0 new comments -
Flex Attention is incompatible with selective AC
#147879 commented on
Mar 4, 2025 • 0 new comments -
torch.compile with the inductor backend slows down (exponentially?) for certain graphs
#148073 commented on
Mar 4, 2025 • 0 new comments -
DISABLED test_slice_dynamic (__main__.AutoFunctionalizeTests)
#148067 commented on
Mar 4, 2025 • 0 new comments -
torch.compile for division gives different numeric output vs eager mode division: torch.tensor/torch.tensor
#141753 commented on
Mar 4, 2025 • 0 new comments -
torch.compile mode="max-autotune" precision appears to be lower
#96693 commented on
Mar 4, 2025 • 0 new comments -
[RFE][Distributed][NCCL] A feature request for stream management API in PG NCCL
#147729 commented on
Mar 4, 2025 • 0 new comments -
context_parallel fails with plain sdpa kernel SDPBackend.MATH
#147793 commented on
Mar 4, 2025 • 0 new comments -
Using DTensor with device meshes that use different devices for input and output
#126795 commented on
Mar 4, 2025 • 0 new comments -
Memory leak when using SequentialLR and ChainedLR schedulers
#126131 commented on
Mar 4, 2025 • 0 new comments -
MX basic dtypes in pytorch/pytorch
#146414 commented on
Mar 4, 2025 • 0 new comments -
grad_fn function disobeys broadcast rules
#144228 commented on
Mar 4, 2025 • 0 new comments -
AOT subclass desugaring adds static input arguments to inductor
#130502 commented on
Mar 4, 2025 • 0 new comments -
[ONNX] GNN model inaccuracy: scatter_reduce need to be fixed
#147617 commented on
Mar 4, 2025 • 0 new comments -
roundtrip cast between float32|bfloat16 and e8m0 should work in torchinductor
#147875 commented on
Mar 4, 2025 • 0 new comments -
Error : torch/utils/_sympy/interp.py:176] [0/2] failed while executing pow_by_natural(VR[1, int_oo], VR[-1, -1])
#148003 commented on
Mar 4, 2025 • 0 new comments -
returning tensors of dtype torch.float8_e8m0fnu should work with torchinductor
#147873 commented on
Mar 4, 2025 • 0 new comments -
[inductor] `torch.slice_scatter` throws `AssertionError` when meeting internal `float32`
#147842 commented on
Mar 4, 2025 • 0 new comments