Performance Optimizations #1540

Open
rgknox wants to merge 32 commits into NGEET:main from rgknox:patch-parallel-lbmathop-prt


@rgknox rgknox commented Mar 5, 2026

Description:

This branch contains a litany of performance enhancements to FATES. I may try to break these changes into chunks, but it might not be worth it. With a full FATES run at a single site with 9 patches and ~250 cohorts, speed-ups are about the following:

    1. Everything in this PR: total run-time is 1/2 to 2/3 of the base (1.5x+ speed-up)
    2. Everything in 1) AND dual-loop* energy balance in the host: less than 1/2 of the base run-time (i.e. 2x+)
    3. Everything in 2) AND using 4 threads per site (host-side parallelism): 3x+

*Dual-loop energy balancing is a host-side change that is compatible with these changes. It reduces the total number of calls to photosynthesis needed to achieve temperature convergence in CanopyFluxesMod.

This PR should be coupled with, or precede, E3SM-Project/E3SM#8143

These changes can be described in several categories:

  • Breaking up photosynthesis and canopy radiation drivers to have patch-specific driver subroutines, which accommodates patch-level parallelism and a refactored "double loop" land-energy balance calculation
  • Utilizing automatic arrays (stack memory) versus dynamically allocated scratch arrays (heap) where appropriate
  • Defining patch-level data-structures to help reduce memory usage and perform more efficient calculations: the list of unique pfts on each patch, number of unique pfts, and maximum number of veg-layers on each patch
  • Using arrays of cohort data (as attached to the patch data structure) during high frequency operations, which are filled at the end of the dynamics timestep during summarization
  • Creating a pointer array that helps us quickly identify the FATES linked-list patch associated with the host's patch index, which is synonymous with patch%patchno. This lets us write, for example: patch => site%pa_vec(ifp)
  • Removing unnecessary routines
  • Moving routines to places in the call sequence where they are still executed when necessary, but less frequently (e.g. organ respiration rates need not be inside the land energy balance iterator since they don't affect conductance). Also, temperature effects on biophysical rates (i.e. vcmax, jmax and LMR) need only be updated once per patch-PFT during photosynthesis, and need not be updated for every unique leaf layer.
  • Converting routines to be classified as "elemental". The primary role of an elemental procedure is that callers can invoke it with either scalar or array arguments. Its secondary role is that it tells the compiler the routine can be safely in-lined into the calling routine. This reduces computational overhead, sometimes greatly when it can leverage vectorization and SIMD, but also to some degree when it can't.
  • Pre-calculating as much math as possible in high-frequency routines. For instance when applying temperature corrections to biophysical rates, the math is demanding (exponentials, power functions, divides, etc). So the functions were split into parts to perform as much math as possible before any math is applied that is dependent on vegetation temperature (which changes at the model timestep).
  • Making more "use specific" variants of the most heavily used functions, which allows us to make them faster. For instance, many calls to quadratic smoothing have constants as the "a" term. In these scenarios, we don't need to perform "if" checks to make sure that a is positive and non-zero. Since quadratics are literally the most called function we have, it's useful to get rid of these logicals.
  • Adding a method of retrieving the values (i.e. the masses) of C, N and P in plant organs that is faster and minimizes "cache misses".
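
To illustrate the pre-calculation item in the list above, here is a minimal Python sketch (not the FATES Fortran code). It assumes a generic Arrhenius-type temperature response; the function names, the reference temperature and the activation energy value are hypothetical and chosen only for the example. The temperature-independent coefficients are computed once, so the hot path keeps a single divide and a single exponential.

```python
import math

R_GAS = 8.314    # J mol-1 K-1, universal gas constant
T_REF = 298.15   # K, assumed 25 degC reference temperature


def arrhenius_direct(t_leaf, ha):
    """Evaluate the full expression on every call (two divides, one exp)."""
    return math.exp(ha / (R_GAS * T_REF) * (1.0 - T_REF / t_leaf))


def arrhenius_coeffs(ha):
    """Temperature-independent part, computed once (e.g. once per PFT)."""
    a = ha / (R_GAS * T_REF)
    b = ha / R_GAS
    return a, b


def arrhenius_fast(t_leaf, a, b):
    """Hot path: algebraically identical to arrhenius_direct, exp(a - b/T)."""
    return math.exp(a - b / t_leaf)


# Hypothetical activation energy [J mol-1], for illustration only
HA_EXAMPLE = 65330.0
A_COEF, B_COEF = arrhenius_coeffs(HA_EXAMPLE)

# Leaf temperatures from 273.15 K to 313.15 K in 1 K steps
LEAF_TEMPS = [273.15 + i for i in range(41)]
DIRECT = [arrhenius_direct(t, HA_EXAMPLE) for t in LEAF_TEMPS]
FAST = [arrhenius_fast(t, A_COEF, B_COEF) for t in LEAF_TEMPS]
```

The two forms are algebraically equal but not guaranteed to be bit-for-bit identical in floating point, which is the kind of roundoff-level difference discussed under "Expectation of Answer Changes".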

Note:

This work made use of conversations with AI bots, such as Gemini and Claude (via Kiro). No suggestions were applied directly by the AIs. Blocks of code and vignettes were copied over, but only in small sections, and each was evaluated in its entirety.

Collaborators:

@cdkoven @rosiealice @bishtgautam @peterdschwartz @mpaiao @glemieux @samsrabin

Expectation of Answer Changes:

This work used "do_b4b" logical flags. When these flags are set to true, changes are expected at the roundoff level for all FATES configurations. When these flags are set to false, there are non-roundoff-level changes that are still appropriate. For instance, we don't need to update the growth and home temperatures for leaf temperature acclimation every 30 minutes; it's excessive. This can be done once per day, since these are at least 30-day averages. But changing this update frequency will subtly change results, hence the do_b4b flag.

*Note: Aside from the changes mentioned above, I found that FATES can potentially generate chaotic behavior. For instance, when converting Q10 equations to use the exponential function instead of the power function, the difference in the math outcome should be incredibly small (on the order of 1e-15). However, these differences generated non-trivial differences in results. I intend to investigate this further and compare the differences from the math alternatives against perturbations of initial conditions.
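
The size of the Q10 difference described in the note can be checked with a small Python sketch. It assumes the common form f = q10^((T - T_ref)/10) and its equivalent exp(ln(q10)/10 * (T - T_ref)); the function names, the q10 value of 2 and the 25 degC reference are hypothetical and are not taken from the FATES source.

```python
import math


def q10_pow(t, q10=2.0, t_ref=25.0):
    """Q10 response written with the power operator."""
    return q10 ** ((t - t_ref) / 10.0)


def q10_exp(t, k, t_ref=25.0):
    """Same response written with exp; k = ln(q10)/10 is precomputed."""
    return math.exp(k * (t - t_ref))


K_Q10 = math.log(2.0) / 10.0

# Temperatures from 0 to 40 degC in 0.5 degC steps
TEMPS = [0.5 * i for i in range(81)]

# Largest relative difference between the two formulations
MAX_REL_DIFF = max(
    abs(q10_pow(t) - q10_exp(t, K_Q10)) / q10_pow(t) for t in TEMPS
)
```

On IEEE double precision the value of MAX_REL_DIFF is expected to sit near machine roundoff, so any larger divergence in model output would come from how the model amplifies that perturbation rather than from the formula itself.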

Checklist

If this is your first time contributing, please read the CONTRIBUTING document.

All checklist items must be checked to enable merging this pull request:

Contributor

WIP, NOT YET

  • The in-code documentation has been updated with descriptive comments
  • The documentation has been assessed to determine if updates are necessary

Integrator

  • FATES PASS/FAIL regression tests were run
  • Evaluation of test results for answer changes was performed and results provided
  • FATES-CLM6 Code Freeze: satellite phenology regression tests are b4b

If satellite phenology regressions are not b4b, please hold merge and notify the FATES development team.

Documentation

Test Results:

CTSM (or) E3SM (specify which) test hash-tag:

CTSM (or) E3SM (specify which) baseline hash-tag:

FATES baseline hash-tag:

Test Output:

@rgknox added the "status: Not Ready" label (the author is signaling that this PR is a work in progress and not ready for integration) on Mar 5, 2026

Projects

Status: Finding Reviewers
