When testing abacus_2g without CUDA-aware on the Shuguang supercomputer, the BPCG case of the GPU test reports numerical errors. #7077

@LiYuqiii

Describe the Testing Issue

When ABACUS is compiled with CUDA-aware support disabled, the GPU tests 11_PW_GPU/001_PW_BPCG_GPU and 16_SDFT_GPU/005_PW_SDFT_MALL_BPCG_GPU run to completion but report numerical errors. The errors occur whether a single DCU or multiple DCUs are used.

11_PW_GPU/001_PW_BPCG_GPU reported the following error:

[Image: screenshot of the error reported by 11_PW_GPU/001_PW_BPCG_GPU]

16_SDFT_GPU/005_PW_SDFT_MALL_BPCG_GPU reported the following error:

[Image: screenshot of the error reported by 16_SDFT_GPU/005_PW_SDFT_MALL_BPCG_GPU]
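
For readers unfamiliar with this kind of failure, "runs successfully but reports numerical errors" generally means the calculation finished while one or more computed quantities deviated from stored reference values by more than an allowed tolerance. The sketch below illustrates that kind of check in generic terms. It is not the ABACUS test harness: the function name, the quantity names, the tolerance, and the numbers are all hypothetical placeholders, not values taken from the failing runs.

```python
import math


def compare_to_reference(computed, reference, tol=1e-6):
    """Return (key, computed, reference, abs_diff) for every entry that
    is missing, non-finite, or deviates from the reference by more than tol."""
    failures = []
    for key, ref_value in reference.items():
        value = computed.get(key)
        # A missing or non-finite value is reported with no difference.
        if value is None or not math.isfinite(value):
            failures.append((key, value, ref_value, None))
            continue
        diff = abs(value - ref_value)
        if diff > tol:
            failures.append((key, value, ref_value, diff))
    return failures


# Hypothetical placeholder numbers, NOT taken from the failing tests.
reference = {"total_energy": -100.0, "total_force": 0.5}
computed = {"total_energy": -100.0000001, "total_force": 0.51}

for key, value, ref_value, diff in compare_to_reference(computed, reference):
    print(f"{key}: computed={value} reference={ref_value} diff={diff}")
```

With a check of this shape, a run can exit cleanly and still be flagged, which matches the symptom described above: the job completes, but at least one quantity falls outside the tolerance.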

Additional Context

No response

Task list for Issue attackers (only for developers)

  • Understand the testing issue described by the developer.
  • Review the specific test case, expected and actual results, and any error messages.
  • Identify the root cause of the test failure or issue.
  • If a possible solution is suggested, evaluate its feasibility and effectiveness.
  • Implement a fix for the test failure or issue, or create a new test case if needed.
  • Verify that the fix resolves the testing issue and the test case passes.
  • Review and update any relevant documentation, such as test plans or user guides.
  • Ensure the testing issue is resolved and close the ticket.
  • Share any lessons learned or best practices with the team to prevent similar issues in the future.

Labels

  • Bugs: bugs that are only solvable with sufficient knowledge of DFT
  • Diago: issues related to diagonalization methods
  • GPU & DCU & HPC: any issues related to GPU, DCU, and HPC
