{2025.06} Zen5 easystack 2 #1393

Merged
bedroge merged 3 commits into EESSI:main from casparvl:zen5_stack_2 on Feb 18, 2026

@casparvl
Collaborator

No description provided.

casparvl changed the title from "Add second easystack for zen5" to "{2025.06} Zen5 easystack 2" on Feb 18, 2026
@bedroge
Collaborator

bedroge commented Feb 18, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-aws-eu-south for:arch=x86_64/amd/zen5

@eessi-bot-aws-eu-south

eessi-bot-aws-eu-south bot commented Feb 18, 2026

New job on instance eessi-bot-aws-eu-south for repository eessi.io-2025.06-software
Building on: amd-zen5
Building for: x86_64/amd/zen5
Job dir: /project/def-users/SHARED/jobs/2026.02/pr_1393/21

date                      job status   comment
Feb 18 10:34:44 UTC 2026  submitted    job id 21 awaits release by job manager
Feb 18 10:35:43 UTC 2026  released     job awaits launch by Slurm scheduler
Feb 18 10:36:48 UTC 2026  running      job 21 is running
Feb 18 10:55:23 UTC 2026  finished     😢 FAILURE
Details
✅ job output file slurm-21.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen5-17714120550.tar.zst size: 0 MiB (372955 bytes)
entries: 120
modules under 2025.06/software/linux/x86_64/amd/zen5/modules/all
M4/1.4.19.lua
software under 2025.06/software/linux/x86_64/amd/zen5/software
M4/1.4.19
reprod directories under 2025.06/software/linux/x86_64/amd/zen5/reprod
M4/1.4.19/20260218_103749UTC
other under 2025.06/software/linux/x86_64/amd/zen5
no other files in tarball
Feb 18 10:55:23 UTC 2026  test result  😢 FAILURE
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-21.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
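The artefact summary above groups the tarball contents into modules, software and reprod directories. A minimal sketch of how such a summary could be reproduced from a plain tarball listing; `summarise_tarball` is a hypothetical helper, not the bot's own implementation, and it assumes every path starts with the `<version>/software/linux/<arch>` prefix passed as its first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): summarise a tarball listing into
# "modules NAME/VERSION.lua", "software NAME/VERSION" and
# "reprod NAME/VERSION/TIMESTAMP" lines, printing each entry once.
# $1: path prefix, e.g. 2025.06/software/linux/x86_64/amd/zen5
# stdin: one path per line (e.g. output of `tar --zstd -tf <tarball>`).
summarise_tarball() {
    local prefix=$1
    awk -v p="$prefix/" '
        index($0, p) != 1 { next }        # skip paths outside the prefix
        {
            rest = substr($0, length(p) + 1)
            n = split(rest, part, "/")
            if (part[1] == "modules" && part[2] == "all" && n >= 4 && part[4] ~ /\.lua$/)
                key = "modules " part[3] "/" part[4]
            else if (part[1] == "software" && n >= 3 && part[3] != "")
                key = "software " part[2] "/" part[3]
            else if (part[1] == "reprod" && n >= 4 && part[4] != "")
                key = "reprod " part[2] "/" part[3] "/" part[4]
            else
                next
            # Print each entry only the first time it is seen.
            if (!(key in seen)) { seen[key] = 1; print key }
        }
    '
}
```

With a GNU tar that supports `--zstd`, and assuming the archive stores paths without a leading `./`, piping `tar --zstd -tf` on the tarball into `summarise_tarball 2025.06/software/linux/x86_64/amd/zen5` should list the M4/1.4.19 entries reported above.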

@casparvl
Collaborator Author

Hmmmm

g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
make[2]: *** [Makefile:1158: insn-emit.o] Error 1
make[2]: *** Waiting for unfinished jobs....
rm gcc.pod gfortran.pod
make[2]: Leaving directory '/tmp/bot/easybuild/build/GCCcore/13.3.0/system-system/gcc-13.3.0/stage1_obj/gcc'
make[1]: *** [Makefile:4637: all-gcc] Error 2
make[1]: Leaving directory '/tmp/bot/easybuild/build/GCCcore/13.3.0/system-system/gcc-13.3.0/stage1_obj'
make: *** [Makefile:1048: all] Error 2
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
21                 prod x86-64-am+  def-users         16 OUT_OF_ME+    0:125
21.batch          batch             def-users         16 OUT_OF_ME+    0:125
21.extern        extern             def-users         16  COMPLETED      0:0

I'm not sure how much memory gets allocated by the bot, but it's not enough apparently ;-)
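The `OUT_OF_ME+` state in the accounting output above is `OUT_OF_MEMORY`, truncated to the column width. A small sketch for spotting such steps without truncation, using the parsable output of `sacct`; `find_oom_steps` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): print the JobID of every job step that
# Slurm recorded as OUT_OF_MEMORY.
# stdin: output of
#   sacct -j <jobid> --parsable2 --noheader --format=JobID,State,ExitCode
# i.e. '|'-separated fields with no header and no column truncation.
find_oom_steps() {
    awk -F'|' '$2 ~ /^OUT_OF_MEMORY/ { print $1 }'
}
```

For job 21, `sacct -j 21 --parsable2 --noheader --format=JobID,State,ExitCode | find_oom_steps` would be expected to print `21` and `21.batch`, based on the truncated states shown above.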

NodeName=x86-64-amd-zen5-node1 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUEfctv=16 CPUTot=16 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:0
   NodeAddr=10.0.0.243 NodeHostName=x86-64-amd-zen5-node1 Version=24.05.8
   OS=Linux 5.14.0-611.30.1.el9_7.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jan 29 05:25:20 EST 2026
   RealMemory=32768 AllocMem=0 FreeMem=25011 Sockets=16 Boards=1
   MemSpecLimit=512
   State=IDLE+CLOUD ThreadsPerCore=1 TmpDisk=0 Weight=4 Owner=N/A MCS_label=N/A
   Partitions=cpubase_bycore_b1,x86-64-amd-zen5-node
   BootTime=2026-02-18T10:13:05 SlurmdStartTime=2026-02-18T10:16:20
   LastBusyTime=2026-02-18T10:54:40 ResumeAfterTime=None
   CfgTRES=cpu=16,mem=32G,billing=16
   AllocTRES=
   CurrentWatts=0 AveWatts=0

So the node has the same specs as the ones on the other cluster (16 CPUs, 32G of memory).
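Slurm reserves `MemSpecLimit` (in MB) for system use, so the memory it can hand out to jobs on this node is `RealMemory` minus `MemSpecLimit`. A sketch that extracts this from `scontrol show node` output; `allocatable_mem_mb` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): memory in MB that Slurm can allocate to jobs,
# computed as RealMemory - MemSpecLimit (MemSpecLimit is treated as 0 if absent).
# stdin: output of `scontrol show node <nodename>`.
allocatable_mem_mb() {
    awk '
        {
            # Fields look like Key=Value; pick out the two memory settings.
            for (i = 1; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "RealMemory")   real = kv[2]
                if (kv[1] == "MemSpecLimit") spec = kv[2]
            }
        }
        END { print real - spec }
    '
}
```

On the node above this gives 32768 - 512 = 32256 MB, or 2016 MB per core across its 16 CPUs.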

$ sacct -j 21 --format JobID,MaxRSS,ReqCPUS,AllocCPUS,ReqMem,AllocTRES%40
JobID            MaxRSS  ReqCPUS  AllocCPUS     ReqMem                                AllocTRES
------------ ---------- -------- ---------- ---------- ----------------------------------------
21                            16         16         4G          billing=16,cpu=16,mem=4G,node=1
21.batch       4192884K       16         16                                cpu=16,mem=4G,node=1
21.extern          256K       16         16                     billing=16,cpu=16,mem=4G,node=1
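To see how close the batch step came to its limit, its MaxRSS can be compared with the 4G request. The sketch below does that with integer shell arithmetic; `rss_percent_of_request` is a hypothetical helper, and it only handles the `K` (KiB) and `G` (GiB) suffixes that `sacct` printed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): MaxRSS as a percentage of the memory request.
# $1: MaxRSS with a K suffix (KiB), e.g. 4192884K
# $2: ReqMem with a G suffix (GiB), e.g. 4G
# Other unit suffixes are not handled.
rss_percent_of_request() {
    local maxrss_k=${1%K}                  # 4192884K -> 4192884
    local reqmem_g=${2%G}                  # 4G -> 4
    local reqmem_k=$(( reqmem_g * 1024 * 1024 ))
    # Percentage in hundredths, truncated by integer division.
    local hundredths=$(( maxrss_k * 10000 / reqmem_k ))
    printf '%d.%02d%%\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}
```

`rss_percent_of_request 4192884K 4G` prints `99.96%`, i.e. the batch step had used essentially all of its 4G allocation.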

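The job was only allocated 4G of memory (ReqMem), and the batch step's MaxRSS (4192884K) was right at that limit. We should probably submit with --exclusive or something similar. I'll check how we do it on the other cluster and mimic that here.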
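One way to give a build job the whole node is through `sbatch` options. A sketch of a job-script header, assuming the job script accepts `#SBATCH` directives; the actual change made in https://github.com/EESSI/bot-configs/pull/75 may look different. `--mem=0` asks Slurm for all memory on the node; whether `--exclusive` alone already implies that depends on the Slurm version and configuration, so the sketch sets both.

```shell
#!/bin/bash
# Hypothetical job-script header (sketch only, not the bot's real job script).
# Do not share the node with other jobs:
#SBATCH --exclusive
# Request all memory on the node (0 is Slurm's "whole node" value for --mem):
#SBATCH --mem=0
```

Equivalently, both options can be passed on the command line, e.g. `sbatch --exclusive --mem=0 job.sh`, where `job.sh` is a placeholder.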

@casparvl
Collaborator Author

Checked out https://github.com/EESSI/bot-configs/pull/75

Let's try again...

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-aws-eu-south for:arch=x86_64/amd/zen5

@eessi-bot-aws-eu-south

eessi-bot-aws-eu-south bot commented Feb 18, 2026

New job on instance eessi-bot-aws-eu-south for repository eessi.io-2025.06-software
Building on: amd-zen5
Building for: x86_64/amd/zen5
Job dir: /project/def-users/SHARED/jobs/2026.02/pr_1393/22

date                      job status   comment
Feb 18 12:58:16 UTC 2026  submitted    job id 22 awaits release by job manager
Feb 18 12:58:37 UTC 2026  released     job awaits launch by Slurm scheduler
Feb 18 13:03:41 UTC 2026  running      job 22 is running
Feb 18 14:25:21 UTC 2026  finished     😁 SUCCESS
Details
✅ job output file slurm-22.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen5-17714246310.tar.zst size: 2309 MiB (2421478318 bytes)
entries: 5590
modules under 2025.06/software/linux/x86_64/amd/zen5/modules/all
GCC/13.3.0.lua
GCC/14.2.0.lua
GCCcore/13.3.0.lua
GCCcore/14.2.0.lua
M4/1.4.19.lua
software under 2025.06/software/linux/x86_64/amd/zen5/software
GCC/13.3.0
GCC/14.2.0
GCCcore/13.3.0
GCCcore/14.2.0
M4/1.4.19
reprod directories under 2025.06/software/linux/x86_64/amd/zen5/reprod
GCC/13.3.0/20260218_134417UTC
GCC/14.2.0/20260218_142340UTC
GCCcore/13.3.0/20260218_134416UTC
GCCcore/14.2.0/20260218_142338UTC
M4/1.4.19/20260218_130459UTC
other under 2025.06/software/linux/x86_64/amd/zen5
no other files in tarball
Feb 18 14:25:21 UTC 2026  test result  😢 FAILURE
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-22.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Feb 18 14:30:47 UTC 2026  uploaded     transfer of eessi-2025.06-software-linux-x86_64-amd-zen5-17714246310.tar.zst to S3 bucket succeeded

@casparvl
Collaborator Author

The test step is not run because of:

ERROR: Please put a ReFrame configuration file in /project/def-users/bot/shared/reframe_config.py or set RFM_CONFIG_FILES in the environment of this bot instance to point to a valid ReFrame configuration file that matches the bot config. For more information, see https://gitlab.com/eessi/support/-/issues/114#note_2293660921

Shouldn't block this PR - we don't have tests for GCC anyway. But we should fix it :)
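The error message above describes two ways for the test step to find a ReFrame configuration: a file at `/project/def-users/bot/shared/reframe_config.py`, or the `RFM_CONFIG_FILES` environment variable. A minimal sketch of that lookup; `find_reframe_config` is hypothetical and not the test step's actual code, checking `RFM_CONFIG_FILES` first is an assumption, and it treats `RFM_CONFIG_FILES` as a single path although ReFrame itself accepts a colon-separated list.

```shell
#!/usr/bin/env bash
# Hypothetical helper (sketch): print the ReFrame config file that would be used,
# or fail if none is found.
# $1 (optional): default config path; defaults to the path from the error message.
find_reframe_config() {
    local default_cfg=${1:-/project/def-users/bot/shared/reframe_config.py}
    if [ -n "${RFM_CONFIG_FILES:-}" ] && [ -f "$RFM_CONFIG_FILES" ]; then
        # Assumption: RFM_CONFIG_FILES holds a single existing file path.
        printf '%s\n' "$RFM_CONFIG_FILES"
    elif [ -f "$default_cfg" ]; then
        printf '%s\n' "$default_cfg"
    else
        echo "ERROR: no ReFrame configuration file found" >&2
        return 1
    fi
}
```

Setting something like `export RFM_CONFIG_FILES=/path/to/reframe_config.py` in the bot instance's environment (the path is a placeholder) should be enough for this kind of check to pass.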

bedroge added the labels bot:deploy (ask bot to deploy missing software installations to EESSI), 2025.06-software.eessi.io (2025.06 version of software.eessi.io) and zen5 on Feb 18, 2026
@boegel
Contributor

boegel commented Feb 18, 2026

@bedroge Staging PR merged, so this should be merged too?

@bedroge
Collaborator

bedroge commented Feb 18, 2026

> @bedroge Staging PR merged, so this should be merged too?

Yep, I waited a bit because the ingestion could take a while (it's busy processing a lot of CUDA tarballs), but the tarball has been ingested. Will merge this when the CI is ready.

bedroge merged commit d2c30c2 into EESSI:main on Feb 18, 2026
51 checks passed