[release/2.11] Skip linalg.eig tests when MAGMA is not available #3072
ethanwee1 wants to merge 11 commits into ROCm:release/2.11
Conversation
This PR fixes the unit test `test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction`, which failed (0.1163s) with:
```
Traceback (most recent call last):
File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 471, in test_set_per_process_memory_fraction
tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
RuntimeError: Trying to create tensor with negative dimension -5681285432: [-5681285432]
```
This error occurs only on the gfx1101 arch. It comes from an integer overflow: another unit test, `test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel`, creates a tensor with a huge numel, which inflates `torch.cuda.max_memory_reserved()`; when `test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction` runs afterward, the inflated value overflows its size computation. To avoid this we introduced `torch.cuda.empty_cache()` and `torch.cuda.reset_peak_memory_stats()` to clean up CUDA state.
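The failure mode can be sketched with plain integers (the sizes and the exact formula below are illustrative, not the actual allocator code):

```python
# Illustrative sketch of the overflow, not the exact test code: the test
# derives its probe-tensor size roughly as
#     application = fraction * total_memory - max_memory_reserved()
# A stale peak-reserved value left behind by the large-numel test drives
# the result negative, which torch.empty() rejects as a negative dimension.
total_memory = 16 * 1024**3          # pretend 16 GiB device (illustrative)
fraction = 0.5                       # per-process memory fraction under test
stale_peak_reserved = 14 * 1024**3   # inflated by the earlier huge-numel test

application = int(fraction * total_memory) - stale_peak_reserved
print(application)  # negative, mirroring the "negative dimension" in the traceback
```

Resetting the allocator counters first (which is what `torch.cuda.empty_cache()` and `torch.cuda.reset_peak_memory_stats()` accomplish) keeps the computed size positive.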
JIRA: https://ontrack-internal.amd.com/browse/SWDEV-535295
(cherry picked from commit f86d184)
(cherry picked from commit 1b44228)
…d_memory_with_allocator (pytorch#2811)
Use try/finally block. This follows a similar pattern elsewhere in test_cuda.py.
Fixes ROCm/TheRock#2118.
…ersistent reduction and no_x_dim removal (pytorch#2454)
Cherry-pick of ROCm#2417. Need to resolve conflicts.
Co-authored-by: Jack Taylor <108682042+jataylo@users.noreply.github.com>
(cherry picked from commit eb47158)

[release/2.9][ROCm][inductor] Add ROCm specific persistent reduction config. (pytorch#2861)
In support of [SWDEV-566103](https://ontrack-internal.amd.com/browse/SWDEV-566103)

[release/2.10] Fix Inductor Triton Heuristics (pytorch#2931)
The ROCm release/2.10 branch was created by applying 15 commits to the upstream release/2.10 branch. (See pytorch/pytorch@release/2.10...ROCm:pytorch:release/2.10)

This PR fixes the issue with the missing disable_pointwise_autotuning function. There are three commits in this PR:
- The first commit is a revert of 1c96f23 - Autotuning support for persistent reduction, since it is already available in upstream release/2.10 and is not needed. (It reintroduced the disable_pointwise_autotuning function.)
- The second commit (b9facd0) is needed for provenance, so the third commit can be applied.
- The third commit is e5eee74 - Heuristics improvements for reduction kernels, which was reverted at the last minute before the release/2.10 cutoff and then re-landed shortly after the cutoff date, but with a minor change.

Co-authored-by: Pandya, Vivek Vasudevbhai <vpandya@qti.qualcomm.com>
[AUTOGENERATED] release/2.11_IFU_20260224
…CL race condition (pytorch#3054)
Cherry-pick of ROCm#3043
Co-authored-by: tom.jen <tomjen12@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
…orch#3057)
Removing the need for fences in the normalization kernel by converting the stores into atomics + return. This is crucial for perf on architectures with split caches (e.g. MI300), where fences are inherently costly. This change speeds up the `batch_norm_stats` function for tensors in `channels_last` format.

### Performance result on MI300:
<img width="2311" height="1537" alt="batchnorm_latency_comparison" src="https://github.com/user-attachments/assets/dee39088-9f55-499a-a39b-b170805416bb" />

**Particular example:**
Before: Avg time for shape (20, 896, 59, 91): **1102.39 us**
After: Avg time for shape (20, 896, 59, 91): **122.94 us**

Reproducer:
```python
import torch

shapes = [(20, 896, 59, 91)]
eps = 1e-5
for shape in shapes:
    x = torch.randn(shape, device='cuda', dtype=torch.bfloat16)
    x = x.to(memory_format=torch.channels_last)
    # Warm up before timing.
    for _ in range(20):
        _ = torch.batch_norm_stats(x, eps)
    torch.cuda.synchronize()
    start_evt = torch.cuda.Event(enable_timing=True)
    end_evt = torch.cuda.Event(enable_timing=True)
    start_evt.record()
    for _ in range(100):
        _ = torch.batch_norm_stats(x, eps)
    end_evt.record()
    torch.cuda.synchronize()
    print(f"Avg time for shape {shape}: {start_evt.elapsed_time(end_evt) / 100 * 1e3:.2f} us")
```

Related fix which is already released: pytorch#161180
Pull Request resolved: pytorch#175286
Approved by: https://github.com/amd-hhashemi, https://github.com/jerrymannil, https://github.com/jeffdaily
torch.linalg.eig requires MAGMA on ROCm (hipsolver does not support eig). Add skipCUDAIfNoMagma to test_linalg_eig_stride_consistency in test_torchinductor.py and test_torch_return_types_returns in test_vmap.py to match the skip pattern used by linalg eig tests in test_linalg.py.
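As a rough sketch of the skip pattern being applied (using a simplified stand-in decorator; the real `skipCUDAIfNoMagma` lives in PyTorch's `torch.testing._internal.common_device_type` and consults `torch.cuda.has_magma`):

```python
import unittest

# Stand-in availability flag; the real decorator checks torch.cuda.has_magma
# for the test's device. Hard-coded here so the sketch runs anywhere.
HAS_MAGMA = False

def skipCUDAIfNoMagma(fn):
    # Simplified sketch: skip the wrapped test whenever MAGMA is absent.
    return unittest.skipIf(not HAS_MAGMA, "no MAGMA library detected")(fn)

class TestVmapOperators(unittest.TestCase):
    @skipCUDAIfNoMagma
    def test_torch_return_types_returns(self):
        pass  # would exercise torch.linalg.eig under vmap here
```

With `HAS_MAGMA = False`, running this class reports the test as skipped rather than failed, which is the behavior the PR wants on MAGMA-less builds.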
Jenkins build for 173556af377911e6b652276b641ce6cd84936048 commit finished as FAILURE
Pull request overview
This PR targets CUDA test stability on builds where MAGMA is not available by skipping two MAGMA-dependent test cases that currently run (and fail) despite missing MAGMA.
Changes:
- Adds MAGMA-based CUDA skip coverage for an Inductor stride-consistency test that exercises `torch.linalg.eig`.
- Adds MAGMA-based CUDA skip coverage for a functorch vmap return-type test that exercises `torch.linalg.eig`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| test/inductor/test_torchinductor.py | Adds `@skipCUDAIfNoMagma` to the Inductor linalg.eig stride-consistency test. |
| test/functorch/test_vmap.py | Adds `@skipCUDAIfNoMagma` to the vmap return-types test that includes `torch.linalg.eig`. |
The relevant hunks:

In `test/functorch/test_vmap.py`:

```python
        return res
    ...
        test(self, op, tuple(inputs), in_dims=tuple(in_dims))
    ...
    @skipCUDAIfNoMagma
    def test_torch_return_types_returns(self, device):
        t = torch.randn(3, 2, 2, device=device)
        self.assertTrue(
```

In `test/inductor/test_torchinductor.py`:

```python
            reference_in_float=False,
        )

    @skipCUDAIfNoMagma
```
Jenkins build for aa25ee586d8f0a83f22c770641ab7b6ed1b52bbe commit finished as NOT_BUILT
skipCUDAIfNoMagma uses skipCUDAIf, which accesses self.device_type, but GPUTests inherits from TestCase (not DeviceTypeTestBase) and doesn't have device_type. Use unittest.skipIf, which works without device_type.
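A minimal sketch of the suggested workaround, with an illustrative module-level availability flag (in the real suite this could be `torch.cuda.has_magma` guarded by `torch.cuda.is_available()`): `unittest.skipIf` evaluates its condition without touching the test instance, so it is safe on plain `TestCase` subclasses like `GPUTests`.

```python
import unittest

# Hypothetical module-level check standing in for torch.cuda.has_magma;
# hard-coded here so the sketch runs without a GPU.
HAS_MAGMA = False

class GPUTests(unittest.TestCase):
    # unittest.skipIf never reads self.device_type, unlike skipCUDAIf,
    # so it works on classes that don't inherit from DeviceTypeTestBase.
    @unittest.skipIf(not HAS_MAGMA, "MAGMA not available")
    def test_linalg_eig_stride_consistency(self):
        pass  # body elided; would exercise torch.linalg.eig strides
```

Running this class with `HAS_MAGMA = False` marks the test as skipped instead of raising the AttributeError that `skipCUDAIf` would hit on a class without `device_type`.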
Jenkins build for aa25ee586d8f0a83f22c770641ab7b6ed1b52bbe commit finished as FAILURE
a9d24b6 to 752cc24
Skip test_linalg_eig_stride_consistency_cuda and test_torch_return_types_returns_cuda, as they incorrectly run when MAGMA is not available.