18 changes: 11 additions & 7 deletions configs/common/packages.yaml
@@ -110,8 +110,12 @@ packages:
     - '@2.6.4'
   fms:
     require:
-    - '@2024.02'
-    - precision=32,64 +quad_precision +openmp +pic build_type=Release +deprecated_io
+    - '@2025.03'
+    - precision=32,64
+    - +quad_precision
+    - +openmp
+    - +pic
+    - build_type=Release
     - any_of:
       - +gfs_phys constants=GFS
       - ~gfs_phys constants=GEOS
@@ -154,7 +158,7 @@ packages:
     - '@1.4.0'
   gsibec:
     require:
-    - '@1.2.1'
+    - '@1.4.1'
   gsi-ncdiag:
     require:
     - '@1.1.2'
@@ -294,6 +298,10 @@ packages:
   pflogger:
     require:
     - +mpi
+  pfunit:
+    require:
+    - +mpi
+    - +fhamcrest
   pixman:
     require:
     - +pic
@@ -431,10 +439,6 @@ packages:
   snappy:
     require:
     - +shared
-  sp:
-    require:
-    - '@2.5.0'
-    - 'precision=4,d,8'
   udunits:
     require:
     - '@2.2.28'
2 changes: 1 addition & 1 deletion configs/containers/specs/jedi-ci.yaml
@@ -21,7 +21,7 @@
     fms@2024.02,
     g2@3.5.1,
     g2tmpl@1.17.0,
-    gsibec@1.2.1,
+    gsibec@1.4.1,
     hdf@4.2.15,
     hdf5@1.14.5,
     ip@5.4.0,
2 changes: 1 addition & 1 deletion configs/sites/tier1/acorn/packages_intel-19.1.3.304.yaml
@@ -85,7 +85,7 @@ packages:
   go:
     require: ['%gcc']
   gsibec:
-    require: ['@1.2.1']
+    require: ['@1.4.1']
   harfbuzz:
     require: ['%gcc']
   libffi:
2 changes: 1 addition & 1 deletion configs/sites/tier1/acorn/packages_oneapi-2024.2.1.yaml
@@ -76,7 +76,7 @@ packages:
   glib:
     require: ['%gcc']
   gsibec:
-    require: ['@1.2.1']
+    require: ['@1.4.1']
   libtiff:
     require: ['build_system=cmake']
   nco:
2 changes: 1 addition & 1 deletion configs/sites/tier1/aws-pcluster/packages_intel.yaml
@@ -28,7 +28,7 @@ packages:
     - '@1.2.0 ~mkl +fftw'
   gsibec:
     require::
-    - '@1.2.1 ~mkl'
+    - '@1.4.1 ~mkl'
   py-numpy:
     require::
     - '^openblas'
2 changes: 1 addition & 1 deletion configs/sites/tier1/gaea-c5/packages_intel.yaml
@@ -30,7 +30,7 @@ packages:
     - '@1.2.0 ~mkl +fftw'
   gsibec:
     require::
-    - '@1.2.1 ~mkl'
+    - '@1.4.1 ~mkl'
   py-numpy:
     require::
     - '@1.26'
2 changes: 1 addition & 1 deletion configs/sites/tier1/hera/packages_intel.yaml
@@ -27,7 +27,7 @@ packages:
     - '@1.2.0 ~mkl +fftw'
   gsibec:
     require::
-    - '@1.2.1 ~mkl'
+    - '@1.4.1 ~mkl'
   py-numpy:
     require::
     - '^openblas'
2 changes: 1 addition & 1 deletion configs/sites/tier1/hera/packages_oneapi.yaml
@@ -29,7 +29,7 @@ packages:
     - '@1.2.0 ~mkl +fftw'
   gsibec:
     require::
-    - '@1.2.1 ~mkl'
+    - '@1.4.1 ~mkl'
   py-numpy:
     require::
     - '@1.26'
264 changes: 264 additions & 0 deletions configs/sites/tier1/nas-toss5/README.md
@@ -0,0 +1,264 @@
# How to Build **spack-stack** at NAS on TOSS5

This guide documents how to build **spack-stack** on NASA NAS TOSS5 systems, where login nodes have internet access but are CPU-restricted, while compute nodes allow parallel builds but have *no* internet access. Several packages (Rust/Cargo, ecFlow, CRTM) require special handling due to these constraints.

---

## Table of Contents

- [Overview](#overview)
- [Machines Required](#machines-required)
- [Clone spack-stack](#clone-spack-stack)
- [Obtain an Interactive Compute Node](#obtain-an-interactive-compute-node)
- [Setup spack-stack](#setup-spack-stack)
- [Create Environments](#create-environments)
  - [oneAPI - ifx Environment](#oneapi---ifx-environment)
  - [oneAPI - ifort Environment](#oneapi---ifort-environment)
  - [GCC Environment](#gcc-environment)
- [Activate the Environment](#activate-the-environment)
- [Concretize the Environment](#concretize-the-environment)
- [Create Source Cache (LOGIN NODE ONLY)](#create-source-cache-login-node-only)
- [Pre-Fetch Cargo Dependencies (LOGIN NODE ONLY)](#pre-fetch-cargo-dependencies-login-node-only)
- [Install Packages](#install-packages)
  - [Step 1 — Dependencies of Rust codes and ecFlow (COMPUTE NODE)](#step-1--dependencies-of-rust-codes-and-ecflow-compute-node)
  - [Step 2 — Rust codes and ecFlow (ATHFE LOGIN NODE)](#step-2--rust-codes-and-ecflow-athfe-login-node)
  - [Step 3 — Remaining Packages (COMPUTE NODE)](#step-3--remaining-packages-compute-node)
  - [Packages Requiring Internet (ATHFE LOGIN NODE)](#packages-requiring-internet-athfe-login-node)
- [Update Module Files (ATHFE LOGIN NODE)](#update-module-files-athfe-login-node)
- [Deactivate the Environment](#deactivate-the-environment)
- [Debugging Package Builds](#debugging-package-builds)

---

## Overview

Due to NAS system architecture and network restrictions:

- **Login nodes**:
  - Have internet access
  - Limited to **2 processes**

- **Compute nodes** (Turin):
  - No internet access
  - Allow parallel builds

Some packages (Cargo/Rust, ecFlow, CRTM) require internet access or newer CPU features, so the installation is split into multiple steps across different node types.

---

## Machines Required

You will need:

- **An `athfe` login node**
  Supports `x86_64_v3` binaries, which is required for building Rust packages and ecFlow.

- **A Turin compute node**
  Used for the main installation with multiple cores.

---

## Clone spack-stack

Use the appropriate branch or tag:

```bash
git clone --recurse-submodules https://github.com/JCSDA/spack-stack.git -b spack-stack-2.1.0 spack-stack-2.1.0
```
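
If you forget `--recurse-submodules`, the submodules can be fetched after the fact with the standard git command:

```bash
cd spack-stack-2.1.0
git submodule update --init --recursive
```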

---

## Obtain an Interactive Compute Node

NAS login nodes allow only **2 processes**, so use:

```bash
qsub -I -V -X -l select=1:ncpus=128:mpiprocs=128:model=tur_ath \
     -l walltime=12:00:00 -W group_list=s1873 -m b -N Interactive
```

This provides a Turin compute node for up to 12 hours.

---

## Setup spack-stack

Run on a **login node with internet**:

```bash
cd spack-stack-2.1.0
. setup.sh
```

---

## Create Environments

You only need to create each environment once.

### oneAPI - ifx Environment

```bash
spack stack create env --name ue-oneapi-2025.3.0 --template unified-dev --site nas-toss5 --compiler=oneapi-2025.3.0
cd envs/ue-oneapi-2025.3.0
```

### oneAPI - ifort Environment

```bash
spack stack create env --name ue-oneapi-2024.2.0 --template unified-dev --site nas-toss5 --compiler=oneapi-2024.2.0
cd envs/ue-oneapi-2024.2.0
```

### GCC Environment

```bash
spack stack create env --name ue-gcc-13.2.0 --template unified-dev --site nas-toss5 --compiler=gcc-13.2.0
cd envs/ue-gcc-13.2.0
```

---

## Activate the Environment

```bash
spack env activate .
```

> **Important:** Run this in *every* terminal where you plan to run Spack commands.
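
If you open a new terminal elsewhere, activation also works with an explicit path to the environment directory (the path below is illustrative; adjust it to your clone location):

```bash
# Activate by directory path rather than from inside the env directory.
spack env activate /path/to/spack-stack-2.1.0/envs/ue-oneapi-2025.3.0
```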

---

## Concretize the Environment

Run on a **login node** (internet required for bootstrapping Clingo and other tools):

```bash
spack concretize 2>&1 | tee log.concretize ; bell
```

### Optional `bell` helper

Commands in this guide end with `; bell`, a small helper that rings the terminal bell and prints a completion timestamp. Define it once in your shell (e.g., in `~/.bashrc`), or simply drop `; bell` from the commands:

```bash
bell() { tput bel ; printf "\nFinished at: " ; date; }
```

---

## Create Source Cache (LOGIN NODE ONLY)

This downloads all source tarballs for your environment:

```bash
spack mirror create -a -d /swbuild/gmao_SIteam/spack-stack/source-cache
```

> ⚠️ **Do not run this outside an activated environment.**
> Otherwise Spack will attempt to mirror **every** known package/version.
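
To be safe, a minimal guard can verify that an environment is active before mirroring (a sketch for use in a script, assuming Spack's usual `spack env status` output):

```bash
# Refuse to proceed unless a Spack environment is active.
spack env status | grep -q "In environment" \
  || { echo "No active environment -- activate one first"; exit 1; }
```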

---

## Pre-Fetch Cargo Dependencies (LOGIN NODE ONLY)

Rust packages frequently require network access during build. Pre-fetch their dependencies:

```bash
export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
../../util/fetch_cargo_deps.py
```

> ⚠️ **You must also set `CARGO_HOME` on compute nodes** before building.

---

## Install Packages

The installation proceeds in three stages:

| Step | Node Type | Why |
|------|-----------|-----|
| Step 1 | Compute | Builds dependencies in parallel, avoiding login-node process limits |
| Step 2 | `athfe` login | Needs `x86_64_v3` Python and internet access |
| Step 3 | Compute | Finishes the main installation at high parallelism |

---

### Step 1 — Dependencies of Rust codes and ecFlow (COMPUTE NODE)

```bash
export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
spack install -j 16 --verbose --fail-fast --show-log-on-error --no-check-signature \
--only dependencies py-cryptography py-maturin py-rpds-py ecflow 2>&1 | tee log.install.deps-for-rust-and-ecflow ; bell
```
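
The `--only dependencies` flag builds everything these four packages need while skipping the packages themselves; the network- and architecture-sensitive builds are then done separately in Step 2.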

---

### Step 2 — Rust codes and ecFlow (ATHFE LOGIN NODE)

```bash
export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
spack install -j 2 -p 1 --verbose --fail-fast --show-log-on-error --no-check-signature \
py-cryptography py-maturin py-rpds-py ecflow 2>&1 | tee log.install.rust-and-ecflow ; bell
```

NAS limits login nodes to 2 processes, hence `-j 2`.

---

### Step 3 — Remaining Packages (COMPUTE NODE)

```bash
export CARGO_HOME=/swbuild/gmao_SIteam/spack-stack/cargo-cache
spack install -j 16 --verbose --fail-fast --show-log-on-error --no-check-signature 2>&1 | tee log.install.after-cargo ; bell
```

> **Note:** You may need to re-run this command multiple times. Some builds fail intermittently but succeed on retry.
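
If you would rather not babysit the retries, a simple wrapper can re-run the install a few times (a sketch; adjust the attempt count and flags to taste):

```bash
set -o pipefail  # make the pipeline report spack's exit status, not tee's
for attempt in 1 2 3; do
  spack install -j 16 --verbose --fail-fast --show-log-on-error --no-check-signature \
    2>&1 | tee log.install.attempt-${attempt} && break
done
```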

---

### Packages Requiring Internet (ATHFE LOGIN NODE)

If you encounter another package that insists on network access:

```bash
spack install -j 2 --verbose --fail-fast --show-log-on-error --no-check-signature <package> |& tee log.install.<package> ; bell
```

Again, this must be done on an **athfe** login node because of the CPU architecture.

Once built, return to the compute node and resume the full installation.

---

## Update Module Files (ATHFE LOGIN NODE)

After installation completes, on an **athfe** login node run:

```bash
spack module tcl refresh -y --delete-tree ; bell
spack stack setup-meta-modules
```

Module file generation may invoke code that Spack built for `x86_64_v3`, which is why this step must run on an **athfe** login node.
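
To check the result, point your module system at the generated modulefiles and list them. The path below is illustrative and assumes spack-stack's usual `install/modulefiles/Core` layout:

```bash
module use /path/to/spack-stack-2.1.0/envs/ue-oneapi-2025.3.0/install/modulefiles/Core
module avail
```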

---

## Deactivate the Environment

```bash
spack env deactivate
```

---

## Debugging Package Builds

```bash
spack clean
spack stage <package>
spack build-env <package> -- bash --norc --noprofile
```

This drops you into a pristine shell (`--norc --noprofile`) with the package's full compiler and build environment loaded.
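
From there you can drive the build by hand. A sketch for a CMake-based package (the exact steps depend on the package's build system):

```bash
# Jump into the staged source tree and rebuild manually.
cd "$(spack location --stage-dir <package>)/spack-src"
mkdir -p build && cd build
cmake .. 2>&1 | tee log.cmake
make -j 4 VERBOSE=1 2>&1 | tee log.make
```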

---


9 changes: 9 additions & 0 deletions configs/sites/tier1/nas-toss5/config.yaml
@@ -0,0 +1,9 @@
config:
  build_jobs: 6

  # Overrides for spack build and staging areas to speed up builds
  # and avoid errors with Lustre file locking and xattr issues
  build_stage: /swbuild/gmao_SIteam/spack-stack/cache/build_stage
  test_stage: /swbuild/gmao_SIteam/spack-stack/cache/test_stage
  source_cache: /swbuild/gmao_SIteam/spack-stack/cache/source_cache
  misc_cache: /swbuild/gmao_SIteam/spack-stack/cache/misc_cache
8 changes: 8 additions & 0 deletions configs/sites/tier1/nas-toss5/mirrors.yaml
@@ -0,0 +1,8 @@
mirrors:
  local-source:
    url: file:///swbuild/gmao_SIteam/spack-stack/source-cache
    binary: false
  local-binary:
    url: file:///swbuild/gmao_SIteam/spack-stack/build-cache
    binary: true