Fix 8 HPC-sensitive bugs: GPU kernels, MPI broadcast, domain decomposition #1242
sbryngelson wants to merge 20 commits into MFlowCode:master from
Conversation
The z-direction upper-boundary backward difference at iz_s%end uses fields(2) (y-component) instead of fields(3) (z-component) in the third term, corrupting the divergence in all 3D simulations using this finite difference routine. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
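The effect of the mixed-up field index can be sketched with a toy 1-D backward difference (illustrative Python, not MFC code; `fy`/`fz` stand in for `fields(2)`/`fields(3)`):

```python
def ddz_backward(fy, fz, k, dz, buggy=False):
    """Second-order backward difference for d(fz)/dz at index k.
    With buggy=True the third term reads fy (the y-field), mirroring the
    fields(2)-vs-fields(3) mix-up described in the fix."""
    third = fy[k - 2] if buggy else fz[k - 2]
    return (3.0 * fz[k] - 4.0 * fz[k - 1] + third) / (2.0 * dz)
```

For a field that is exactly linear in z, the corrected stencil recovers the slope while the buggy one silently does not, which is why the corruption shows up in every 3D divergence built on this routine.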
The bc_x velocity-end broadcast list has bc_x%ve2 duplicated where bc_x%ve3 should be. bc_y and bc_z rows are correct. Non-root ranks get uninitialized bc_x%ve3 in multi-rank 3D runs with x-boundary velocity BCs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
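A duplicated entry in a generated broadcast list is easy to guard against mechanically; a minimal sketch (hypothetical helper, not part of MFC):

```python
def validate_broadcast_list(names):
    """Return any variable names that appear more than once in a
    broadcast list, sorted for stable output. A duplicate usually means
    some other variable was silently dropped from the list."""
    dupes = {n for n in names if names.count(n) > 1}
    return sorted(dupes)
```

Running such a check over the fypp list would have flagged `bc_x%ve2` appearing twice while `bc_x%ve3` was missing.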
fluid_rho is a real(wp) array but is broadcast with MPI_LOGICAL type, silently corrupting reference densities via bit reinterpretation on non-root ranks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
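The byte-level effect of the wrong datatype can be modeled without MPI (an illustrative Python sketch: a 4-byte element type such as `MPI_LOGICAL` versus the 8 bytes of a 64-bit `real(wp)`):

```python
import struct

def bcast_with_size(values, elem_bytes):
    """Simulate broadcasting 64-bit reals with an MPI datatype of
    elem_bytes per element: only len(values) * elem_bytes bytes arrive;
    the rest of the receive buffer keeps its old contents (zeros here,
    standing in for uninitialized memory). Toy model, not real MPI."""
    wire = struct.pack(f"<{len(values)}d", *values)
    received = wire[: len(values) * elem_bytes]
    buf = bytearray(len(values) * 8)
    buf[: len(received)] = received
    return list(struct.unpack(f"<{len(values)}d", bytes(buf)))
```

With a 4-byte element type, only half the payload crosses the wire, so trailing array entries on the receiver are never written at all.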
loc_violations is never set to 0 before the conditional that may or may not assign it. Non-violating ranks sum garbage in the reduction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Initializing a local variable in its declaration gives it the SAVE attribute in Fortran, meaning it would not reset to zero on subsequent calls. Move the initialization to an executable assignment so the variable is properly zeroed each time the subroutine is entered. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
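Python has no SAVE attribute, but its closest analogue, a mutable default argument evaluated once at definition time, shows the same class of bug (an illustrative sketch, not the MFC code):

```python
def count_violations_buggy(grid, acc=[]):
    """acc is created once, so state leaks across calls, just as a
    Fortran local initialized in its declaration keeps its value
    between invocations."""
    acc.extend(x for x in grid if x < 0)
    return len(acc)

def count_violations_fixed(grid):
    acc = []  # executable assignment: reset on every entry
    acc.extend(x for x in grid if x < 0)
    return len(acc)
```

The fixed variant zeroes its accumulator on every call, which is exactly what moving the Fortran initialization into an executable statement achieves.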
In the GRCBC subsonic inflow loop (do i = 2, momxb), L(2) was hardcoded instead of L(i), causing only the second wave amplitude to be updated rather than each wave amplitude in the loop. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
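The indexing bug can be reproduced in miniature (illustrative Python; indices are 1-based as in the Fortran loop):

```python
def update_wave_amplitudes(L, momxb, new_value, buggy=False):
    """Toy version of the GRCBC inflow loop: write new_value into each
    slot 2..momxb. With buggy=True every iteration writes slot 2, so
    only that entry ever changes."""
    L = list(L)
    for i in range(2, momxb + 1):
        idx = 2 if buggy else i
        L[idx - 1] = new_value  # convert 1-based index to 0-based
    return L
```

With the hardcoded index, slots 3..momxb keep their stale values while slot 2 is overwritten on every pass.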
The damage factor was applied inside the stress component loop, causing G_K (and G) to be multiplied by the damage factor on every iteration. With N stress components, the effective shear modulus was reduced by damage^N instead of damage^1. Move the damage application before the loop so it is applied exactly once per cell. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
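The compounding error is easy to quantify with a sketch (illustrative Python; the numbers are made up, not MFC defaults):

```python
def degraded_modulus(G, damage, n_components, buggy=False):
    """Effective shear modulus after damage. The buggy variant applies
    the (1 - damage) factor inside the per-component loop, compounding
    it n_components times; the fix applies it exactly once."""
    factor = max(1.0 - damage, 0.0)
    if buggy:
        for _ in range(n_components):
            G *= factor
        return G
    return G * factor
```

At 50% damage over six stress components, the buggy path reduces the modulus by a factor of 64 instead of 2.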
The IGR conditional block unconditionally reset recon_order to weno_order in its else branch, overwriting the muscl_order that was correctly set by the recon_type check above. Remove the else branch so the original recon_order is preserved when IGR is inactive. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
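A stripped-down sketch of the order-selection logic shows the clobbering (illustrative Python; the constant names and IGR override value are stand-ins, not MFC's actual values):

```python
def select_recon_order(recon_type, muscl_order, weno_order, igr, buggy=False):
    """recon_type picks the base order; IGR may override it. The buggy
    variant's else branch forces weno_order whenever IGR is off,
    clobbering a MUSCL selection made just above."""
    recon_order = muscl_order if recon_type == "MUSCL" else weno_order
    if igr:
        recon_order = weno_order  # stand-in for the IGR-specific choice
    elif buggy:
        recon_order = weno_order  # the erroneous else branch
    return recon_order
```

Removing the else branch leaves the MUSCL choice intact whenever IGR is inactive.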
The NaN diagnostic check used q_comm(i)%sf(j, k, l) but the value was unpacked into q_comm(i)%sf(j + unpack_offset, k, l). This meant the check was reading a stale or unrelated cell instead of the just-received value. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
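The stale-read pattern can be demonstrated in miniature (illustrative Python, not the MFC unpack loop):

```python
import math

def check_unpacked(buf, sf, unpack_offset, buggy=False):
    """Unpack buf into sf starting at unpack_offset, then run the NaN
    diagnostic. The buggy variant inspects sf[j] (a stale cell) instead
    of sf[j + unpack_offset] (the cell just written)."""
    bad = []
    for j, v in enumerate(buf):
        sf[j + unpack_offset] = v
        checked = sf[j] if buggy else sf[j + unpack_offset]
        if math.isnan(checked):
            bad.append(j)
    return bad
```

A NaN arriving in the receive buffer is flagged by the fixed check but sails straight past the buggy one, which only ever inspects pre-existing interior cells.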
📝 Walkthrough
This PR addresses seven critical and high-severity bugs across core computational, MPI communication, and boundary condition modules. Changes include correcting a finite-difference stencil field index, fixing MPI broadcast type and variable duplication, initializing uninitialized variables, removing erroneous conditional branches, and adjusting loop indexing patterns.
🚥 Pre-merge checks: 5 of 5 passed.
Pull request overview
This PR batches several correctness fixes in GPU-executed kernels and MPI communication paths (broadcasts + halo exchange diagnostics), plus a domain-decomposition bug affecting reconstruction order—aimed at eliminating silent corruption and compiler/MPI-stack-dependent behavior in HPC runs.
Changes:
- Fix multiple GPU-path physics/kernel correctness issues (FD stencil component, GRCBC wave amplitudes indexing, damage-model shear modulus degradation).
- Fix MPI communication correctness (missing broadcast field, wrong MPI datatype, Intel-only NaN check indexing) and a domain decomposition reconstruction-order overwrite.
- Fix compiler-sensitive undefined behavior in pre_process by initializing a reduction input.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/common/m_finite_differences.fpp | Corrects the z-boundary backward FD stencil to use fields(3) (z-velocity) consistently. |
| src/simulation/m_cbc.fpp | Fixes the GRCBC subsonic inflow loop to write L(i) instead of overwriting L(2). |
| src/common/m_variables_conversion.fpp | Applies the continuous damage factor to G_K/G once (outside the stress-component loop) to avoid exponential compounding. |
| src/simulation/m_mpi_proxy.fpp | Fixes the user-input broadcast list to include bc_x%ve3 (was duplicating bc_x%ve2). |
| src/pre_process/m_mpi_proxy.fpp | Broadcasts fluid_rho with the correct MPI real datatype (mpi_p) instead of MPI_LOGICAL. |
| src/pre_process/m_data_output.fpp | Initializes loc_violations before the MPI reduction to avoid undefined values on some compilers. |
| src/common/m_mpi_common.fpp | Fixes the Intel-only NaN check to read the just-unpacked element and prevents non-IGR runs from clobbering the MUSCL recon order. |
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master    #1242      +/-  ##
==========================================
- Coverage   44.05%   44.05%   -0.01%
==========================================
  Files          70       70
  Lines       20496    20496
  Branches     1989     1991       +2
==========================================
- Hits         9030     9029       -1
  Misses      10328    10328
- Partials     1138     1139       +1
```

View full report in Codecov by Sentry.
Force-pushed from 3adf7d6 to 239e520 (compare).
…on/MFC into fix/hpc-bugfixes-batch
The Intel-compiler NaN check logged the raw loop indices (j, k, l) instead of the actual array indices used (j + unpack_offset, etc.), making the diagnostic misleading when debugging MPI recv errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code Review (7 files changed)
Summary
Findings: No bugs found. Checked for bugs and CLAUDE.md compliance (no CLAUDE.md files exist in this repository). Improvement opportunities:
User description
Summary
Batches 8 bug fixes that touch GPU parallel regions, MPI broadcasts, or compiler-sensitive code paths. These need HPC CI validation across NVHPC, CCE, and AMD compilers.
Not included: #1225 (3D viscous GPU private clauses) and #1181 (MUSCL-THINC right-state + GPU privates) remain as separate PRs due to higher risk: they add new `private` clause variables to GPU parallel loops, which is the most compiler-sensitive change possible.
GPU kernel fixes
Fix z-boundary backward FD stencil (#1173)
- File: src/common/m_finite_differences.fpp
- Bug: the third term of the backward difference uses fields(2) (y-velocity) instead of fields(3) (z-velocity), corrupting divergence/gradient calculations for all 3D viscous, hyperelastic, and hypoelastic simulations.
- Fix: fields(2)%sf(x, y, z-2) → fields(3)%sf(x, y, z-2)
- GPU context: called from GPU_PARALLEL_LOOP in m_viscous.fpp, m_hypoelastic.fpp, m_hyperelastic.fpp, and m_derived_variables.fpp. Affects every 3D simulation using these modules.

Fix GRCBC subsonic inflow using wrong L index (#1224)
- File: src/simulation/m_cbc.fpp
- Bug: inside do i = 2, momxb, the code writes L(2) = ... instead of L(i) = .... Only the last species' wave amplitude survives; all others are overwritten.
- Fix: L(2) → L(i)
- GPU context: GPU_PARALLEL_LOOP (collapse=2 CBC loop with L as private). Affects multi-species GRCBC subsonic inflow boundary conditions.

Fix G_K exponential degradation in damage model (#1227)
- File: src/common/m_variables_conversion.fpp
- Bug: G_K = G_K * max((1-damage), 0) sits inside the do i = strxb, strxe stress-component loop, applying the damage factor N times ((1-damage)^N) instead of once. Affects s_convert_conservative_to_primitive and s_convert_primitive_to_conservative.
- GPU context: GPU_PARALLEL_LOOP (collapse=3) with G_K as private. Produces exponentially wrong stiffness degradation for multi-component stress tensors.

MPI broadcast fixes

Fix bc_x%ve3 never MPI-broadcast (#1175)
- File: src/simulation/m_mpi_proxy.fpp
- Bug: the fypp broadcast list contains bc_x%ve2 twice and bc_x%ve3 (z-velocity component) never. All non-root ranks receive uninitialized garbage, which is then pushed to the GPU via GPU_UPDATE.
- Fix: change the duplicate 'bc_x%ve2' entry to 'bc_x%ve3' in the fypp broadcast list, so ve3 is defined on non-root ranks.

Fix fluid_rho broadcast using MPI_LOGICAL instead of mpi_p (#1176)
- File: src/pre_process/m_mpi_proxy.fpp
- Bug: MPI_BCAST(fluid_rho(1), num_fluids_max, MPI_LOGICAL, ...) broadcasts a real(wp) array with the wrong MPI datatype. On 64-bit wp, MPI_LOGICAL (typically 4 bytes) transfers only half the bytes, silently corrupting density values on non-root ranks.
- Fix: MPI_LOGICAL → mpi_p

MPI / compiler-sensitive fixes

Fix loc_violations used uninitialized in MPI_Allreduce (#1186)
- File: src/pre_process/m_data_output.fpp
- Bug: loc_violations is passed to s_mpi_allreduce_sum without initialization. Whether this matters depends on the compiler: GCC may zero-initialize stack variables in debug mode; NVHPC and CCE may not.
- Fix: initialize to 0 and restructure as a local variable (not module-level, avoiding implicit SAVE).

Fix domain decomposition overwriting muscl_order (#1229)
- File: src/common/m_mpi_common.fpp
- Bug: an else branch in s_mpi_decompose_computational_domain unconditionally sets recon_order = weno_order for all non-IGR cases, even when recon_type == MUSCL_TYPE, where muscl_order was already correctly assigned.
- Fix: remove the else branch (2 lines deleted).

Fix NaN check reading wrong index in MPI unpack (#1231)
- File: src/common/m_mpi_common.fpp
- Bug: ieee_is_nan(q_comm(i)%sf(j, k, l)) should be q_comm(i)%sf(j + unpack_offset, k, l); the check reads a stale element instead of the just-unpacked one.
- Fix: add unpack_offset to the appropriate index in all 3 directions (x, y, z).
- Note: the check is guarded by #if defined(__INTEL_COMPILER), so it is dead code on NVHPC/CCE/GCC, but the diagnostic would miss real NaNs in ghost cells on Intel builds.

Supersedes

Closes #1173, closes #1175, closes #1176, closes #1186, closes #1224, closes #1227, closes #1229, closes #1231
Test plan
bc_x%ve3 broadcast)

🤖 Generated with Claude Code
CodeAnt-AI Description
Fix multiple HPC bugs that caused incorrect 3D results, wrong boundary amplitudes, and MPI data corruption
What Changed
Impact
✅ Correct divergence/gradient in 3D viscous and elastic simulations
✅ All species preserved for subsonic inflow boundary amplitudes
✅ Fewer MPI data corruption cases and accurate cross-rank parameter broadcast