From 897d03d1f23139d30373dd9ed9ff77f2267051ac Mon Sep 17 00:00:00 2001
From: Spencer Bryngelson
Date: Mon, 23 Feb 2026 20:47:28 -0500
Subject: [PATCH] Fix GPU example, compiler matrix, and AMD flang consistency

- Replace misleading GPU_PARALLEL+GPU_LOOP example with real
  GPU_PARALLEL_LOOP pattern (750+ uses in codebase); add warning
  that GPU_LOOP emits empty directives on Cray/AMD compilers
- Mark Intel ifx and GNU gfortran as Experimental for --gpu mp
  (CMake has code paths but not CI-tested or fully supported)
- Clarify AMD flang as additionally supported but not CI-gated,
  consistently across CLAUDE.md, common-pitfalls.md, gpu-and-mpi.md
- Clarify GPU_PARALLEL is for scalar reductions, not spatial loops

Co-Authored-By: Claude Opus 4.6
---
 .claude/rules/common-pitfalls.md |  4 ++--
 .claude/rules/gpu-and-mpi.md     | 41 ++++++++++++++++++++------------
 CLAUDE.md                        |  5 ++--
 3 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/.claude/rules/common-pitfalls.md b/.claude/rules/common-pitfalls.md
index 1e5f58aa6b..9861f24fbe 100644
--- a/.claude/rules/common-pitfalls.md
+++ b/.claude/rules/common-pitfalls.md
@@ -36,10 +36,10 @@
 - Boundary condition symmetry requirements must be maintained
 
 ## Compiler-Specific Issues
-- Code must compile on gfortran, nvfortran, Cray ftn, and Intel ifx
+- CI-gated compilers (must always pass): gfortran, nvfortran, Cray ftn, and Intel ifx
+- AMD flang is additionally supported for `--gpu mp` builds but not in the CI matrix
 - Each compiler has different strictness levels and warning behavior
 - Fypp macros must expand correctly for both GPU and CPU builds
-- GPU builds only work with nvfortran, Cray ftn, and AMD flang
 
 ## Test System
 - Tests are generated **programmatically** in `toolchain/mfc/test/cases.py`, not standalone files
diff --git a/.claude/rules/gpu-and-mpi.md b/.claude/rules/gpu-and-mpi.md
index ad84067b67..47aa93d0e9 100644
--- a/.claude/rules/gpu-and-mpi.md
+++ b/.claude/rules/gpu-and-mpi.md
@@ -38,20 +38,27 @@ Inline macros (use `$:` prefix):
 - `$:GPU_WAIT()` — Synchronization barrier.
 
 Block macros (use `#:call`/`#:endcall`):
-- `GPU_PARALLEL(...)` — GPU parallel region wrapping a code block.
+- `GPU_PARALLEL(...)` — GPU parallel region (used for scalar reductions like `maxval`/`minval`).
 - `GPU_DATA(copy=..., create=..., ...)` — Scoped data region.
 - `GPU_HOST_DATA(use_device_addr=[...])` — Host code with device pointers.
 
-Block macro usage:
+Typical GPU loop pattern (used 750+ times in the codebase):
 ```
-#:call GPU_PARALLEL(copyin='[var1]', copyout='[var2]')
-    $:GPU_LOOP(collapse=N)
-    do k = 0, n; do j = 0, m
-        ! loop body
-    end do; end do
-#:endcall GPU_PARALLEL
+$:GPU_PARALLEL_LOOP(private='[i,j,k,l]', collapse=3)
+do l = idwbuff(3)%beg, idwbuff(3)%end
+    do k = idwbuff(2)%beg, idwbuff(2)%end
+        do j = idwbuff(1)%beg, idwbuff(1)%end
+            ! loop body
+        end do
+    end do
+end do
+$:END_GPU_PARALLEL_LOOP()
 ```
 
+WARNING: Do NOT use `GPU_PARALLEL` wrapping `GPU_LOOP` for spatial loops. `GPU_LOOP`
+emits empty directives on Cray and AMD compilers, causing silent serial execution.
+Use `GPU_PARALLEL_LOOP` / `END_GPU_PARALLEL_LOOP` for all parallel spatial loops.
+
 NEVER write raw `!$acc` or `!$omp` directives. Always use `GPU_*` Fypp macros.
 The precheck source lint will catch raw directives and fail.
 
@@ -67,13 +74,17 @@ The precheck source lint will catch raw directives and fail.
 - These compile only for Cray (`_CRAYFTN`); other compilers skip them
 
 ### Compiler-Backend Matrix
-| Compiler        | `--gpu acc` (OpenACC) | `--gpu mp` (OpenMP) | CPU-only |
-|-----------------|----------------------|---------------------|----------|
-| GNU gfortran    | No                   | No                  | Yes      |
-| NVIDIA nvfortran| Yes (primary)        | Yes                 | Yes      |
-| Cray ftn (CCE)  | Yes                  | Yes (primary)       | Yes      |
-| Intel ifx       | No                   | No                  | Yes      |
-| AMD flang       | No                   | Yes                 | Yes      |
+
+CI-gated compilers (must always pass): gfortran, nvfortran, Cray ftn, Intel ifx.
+AMD flang is additionally supported for GPU builds but not in the CI matrix.
+
+| Compiler        | `--gpu acc` (OpenACC) | `--gpu mp` (OpenMP)    | CPU-only |
+|-----------------|----------------------|------------------------|----------|
+| GNU gfortran    | No                   | Experimental (AMD GCN) | Yes      |
+| NVIDIA nvfortran| Yes (primary)        | Yes                    | Yes      |
+| Cray ftn (CCE)  | Yes                  | Yes (primary)          | Yes      |
+| Intel ifx       | No                   | Experimental (SPIR64)  | Yes      |
+| AMD flang       | No                   | Yes                    | Yes      |
 
 ## Preprocessor Defines (`#ifdef` / `#ifndef`)
 
diff --git a/CLAUDE.md b/CLAUDE.md
index 2b6a4a6d2d..38918a2091 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -3,7 +3,8 @@
 MFC is an exascale multi-physics CFD solver written in modern Fortran 2008+ with Fypp
 preprocessing. It has three executables (pre_process, simulation, post_process), a Python
 toolchain for building/running/testing, and supports GPU acceleration via OpenACC and
-OpenMP target offload. It must compile with gfortran, nvfortran, Cray ftn, and Intel ifx.
+OpenMP target offload. It must compile with gfortran, nvfortran, Cray ftn, and Intel ifx (CI-gated).
+AMD flang is additionally supported for OpenMP target offload GPU builds.
 
 ## Commands
 
@@ -167,4 +168,4 @@ When reviewing PRs, prioritize in this order:
 4. MPI correctness (halo exchange, buffer sizing, GPU_UPDATE calls)
 5. GPU code (GPU_* Fypp macros only, no raw pragmas)
 6. Physics consistency (pressure formula matches model_eqns)
-7. Compiler portability (all four compilers)
+7. Compiler portability (4 CI-gated compilers + AMD flang for GPU)
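
Reviewer note on the updated Compiler-Backend Matrix: the table now has three support levels ("Yes", "Experimental", "No") plus a separate CI-gated list. A minimal sketch of how that information could be encoded and queried is below. It is illustrative only: `SUPPORT`, `CI_GATED`, `support_level`, and `is_fully_supported` are hypothetical names, not part of the MFC toolchain, and the short compiler keys are an assumption chosen for this sketch.

```python
# Hypothetical encoding of the Compiler-Backend Matrix added by this patch.
# Values are copied from the new table; all identifiers are assumptions made
# for illustration and do not exist in the MFC toolchain.
SUPPORT = {
    "gfortran":  {"acc": "No",            "mp": "Experimental (AMD GCN)", "cpu": "Yes"},
    "nvfortran": {"acc": "Yes (primary)", "mp": "Yes",                    "cpu": "Yes"},
    "ftn":       {"acc": "Yes",           "mp": "Yes (primary)",          "cpu": "Yes"},
    "ifx":       {"acc": "No",            "mp": "Experimental (SPIR64)",  "cpu": "Yes"},
    "flang":     {"acc": "No",            "mp": "Yes",                    "cpu": "Yes"},
}

# Compilers the patch describes as CI-gated (must always pass).
CI_GATED = {"gfortran", "nvfortran", "ftn", "ifx"}


def support_level(compiler: str, backend: str) -> str:
    """Return the table cell for a compiler and backend ('acc', 'mp', or 'cpu')."""
    return SUPPORT[compiler][backend]


def is_fully_supported(compiler: str, backend: str) -> bool:
    """True only for cells marked 'Yes'; 'Experimental' and 'No' both count as False."""
    return support_level(compiler, backend).startswith("Yes")


if __name__ == "__main__":
    for name in sorted(SUPPORT):
        gated = "CI-gated" if name in CI_GATED else "not CI-gated"
        print(f"{name:10s} mp={support_level(name, 'mp'):24s} {gated}")
```

Keeping CI gating separate from backend support mirrors the distinction the patch draws: AMD flang is marked "Yes" for `--gpu mp` yet is not in the CI matrix, while gfortran and ifx are CI-gated but only experimental for `--gpu mp`.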