Improve UnitTest CI stability: pin the GPU working directory, unify pytest paths, and isolate stateful tests #867
Open
Aidenwu0209 wants to merge 2 commits into PaddlePaddle:master from
Background
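The UnitTest CI currently shows a typical kind of instability:

- `pytest ./tests` fails in bulk
- the same failures then pass when rerun with `--lf`

Judging from the existing logs and script behavior, this kind of problem has 4 main sources of risk:

- `pytest.ini` does not pin `testpaths`/`pythonpath`, so collection and import semantics are not stable between full-suite runs and single-file runs

Changes in this PR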
This PR changes only 3 files:

- `pytest.ini`
- `scripts/unittest_check.sh`
- `scripts/unittest_check_gpu.sh`

The main adjustments are as follows:
1. Pin the working directory for GPU CI
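The change in this section amounts to a fail-fast guard at the top of the GPU script. Below is a minimal sketch of such a guard, assuming bash. The function name `enter_repo_root` is hypothetical, and the actual script may inline these steps rather than wrap them in a function.

```bash
#!/usr/bin/env bash
# Sketch only: enter_repo_root is a hypothetical helper name.
enter_repo_root() {
    local root="$1"
    # Fail instead of silently continuing in whatever directory CI started in.
    cd "$root" || { echo "cannot cd to $root" >&2; return 1; }
    echo "Current working directory: $(pwd)"
    # Sanity-check that this really is the repo root.
    if [ ! -f requirements.txt ]; then
        echo "requirements.txt not found in $(pwd)" >&2
        return 1
    fi
    if [ ! -d tests ]; then
        echo "tests/ not found in $(pwd)" >&2
        return 1
    fi
}

# Usage at the top of the GPU script, with the path layout from this PR:
# enter_repo_root "/workspace/$1/PaConvert/" || exit 1
```

The guard checks for `requirements.txt` and `tests/` right after `cd`. A wrong or missing `$1` therefore fails immediately instead of showing up later as confusing install or collection errors.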
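At the top of `scripts/unittest_check_gpu.sh`, add:

- `cd /workspace/$1/PaConvert/`
- `echo "Current working directory: $(pwd)"`
- existence checks for `requirements.txt` and `tests/`

The goal is for GPU CI to always run from the repo root, as CPU CI already does. Path drift then cannot affect:

- the `requirements.txt` install
- `pytest ./tests`
- the temporary file paths in `tests/apibase.py` that are derived from `os.getcwd()`

2. Pin pytest's collection and import paths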
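For reference, the comments below reproduce the `pytest.ini` fragment this section describes. Alongside it is a hypothetical `check_pytest_ini` helper that only confirms both keys are present. This is a sketch that assumes the keys sit under the standard `[pytest]` section. The `pythonpath` ini option requires pytest 7.0 or later.

```bash
#!/usr/bin/env bash
# Expected pytest.ini fragment (keys taken from this PR):
#
#   [pytest]
#   testpaths = tests
#   pythonpath = tests
#
# Sketch only: check_pytest_ini is a hypothetical helper. It checks the
# text of the file, not whether pytest actually applies the settings.
check_pytest_ini() {
    local ini="${1:-pytest.ini}"
    [ -f "$ini" ] || { echo "$ini not found" >&2; return 1; }
    # Both keys must be set to "tests"; whitespace around '=' is optional.
    grep -Eq '^[[:space:]]*testpaths[[:space:]]*=[[:space:]]*tests[[:space:]]*$' "$ini" &&
    grep -Eq '^[[:space:]]*pythonpath[[:space:]]*=[[:space:]]*tests[[:space:]]*$' "$ini"
}

# Usage from the repo root:
# check_pytest_ini || exit 1
```

A text check cannot prove that the settings take effect. Running pytest itself from the repo root, for example with `pytest --collect-only -q`, is the more direct confirmation.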
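In `pytest.ini`, add:

- `testpaths = tests`
- `pythonpath = tests`

The goal is to make the following invocations behave consistently:

- `pytest tests/`
- `pytest <single test file>.py`
- `python -m pytest`

It also stabilizes imports of test helper modules such as `from apibase import APIBase`.

3. Turn rerun from a "let it pass" mechanism into diagnostic information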
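In both `scripts/unittest_check.sh` and `scripts/unittest_check_gpu.sh`, the handling is unified as follows:

- `--lf` rerun
- `pytest-rerunfailures` and `--reruns=3`

The point is not to make CI go green more easily. It is to make CI report the truth, instead of laundering flaky failures into a "stable pass".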
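This description does not spell out the exact edits to the `--lf` and `--reruns` handling. The following is therefore only a sketch of the idea named in the heading, assuming the first run alone decides the CI verdict. `run_suite_once` and the overridable `PYTEST_CMD` variable are hypothetical. `--lf` (`--last-failed`) is pytest's built-in option for rerunning only the tests that failed last time.

```bash
#!/usr/bin/env bash
# Sketch only: run_suite_once and PYTEST_CMD are hypothetical names.
PYTEST_CMD=${PYTEST_CMD:-"python -m pytest"}

run_suite_once() {
    # First run: its exit code is the verdict.
    # $PYTEST_CMD is unquoted on purpose so "python -m pytest" splits.
    $PYTEST_CMD "$@"
    local first_rc=$?

    if [ "$first_rc" -ne 0 ]; then
        echo "First run failed (exit $first_rc); rerunning last-failed tests for diagnosis only." >&2
        # --lf reruns only the tests recorded as failed in pytest's cache.
        if $PYTEST_CMD --lf "$@"; then
            echo "DIAGNOSTIC: failures did not reproduce on rerun (possibly flaky or order-dependent)." >&2
        else
            echo "DIAGNOSTIC: failures reproduced on rerun." >&2
        fi
    fi

    # The rerun result never overrides the first-run verdict.
    return "$first_rc"
}

# Usage from the repo root:
# run_suite_once tests || exit 1
```

The function returns the first run's exit code. A failure that disappears on rerun still fails the job, but the log now states that it did not reproduce, which is the signal needed to track down flaky tests.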
4. Isolate test files that modify global state
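In both the CPU and GPU unittest scripts, the following 5 stateful files are excluded from the main suite. Each is then run on its own, outside the main suite:

- `tests/test_set_default_device.py`
- `tests/test_set_default_dtype.py`
- `tests/test_set_default_tensor_type.py`
- `tests/test_set_num_threads.py`
- `tests/test_set_printoptions.py`

These tests change process-wide default state, so they should not share a pytest process with ordinary tests.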
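The isolation can be sketched as one main pytest process with an `--ignore` per stateful file, followed by one fresh process per stateful file. The function name `run_stateful_isolated` and the `PYTEST_CMD` variable are hypothetical. Only the file list and the use of `--ignore` come from this PR.

```bash
#!/usr/bin/env bash
# Sketch only: run_stateful_isolated and PYTEST_CMD are hypothetical names.
# The five files are the ones this PR isolates.
STATEFUL_TESTS=(
    tests/test_set_default_device.py
    tests/test_set_default_dtype.py
    tests/test_set_default_tensor_type.py
    tests/test_set_num_threads.py
    tests/test_set_printoptions.py
)
PYTEST_CMD=${PYTEST_CMD:-"python -m pytest"}

run_stateful_isolated() {
    local rc=0 f
    local -a ignore_args=()
    for f in "${STATEFUL_TESTS[@]}"; do
        ignore_args+=("--ignore=$f")
    done

    # Main suite: everything under tests/ except the stateful files.
    # $PYTEST_CMD is unquoted on purpose so "python -m pytest" splits.
    $PYTEST_CMD tests "${ignore_args[@]}" || rc=1

    # One separate pytest process per stateful file, so changed defaults
    # (device, dtype, tensor type, thread count, print options) stay
    # confined to that process.
    for f in "${STATEFUL_TESTS[@]}"; do
        $PYTEST_CMD "$f" || rc=1
    done
    return "$rc"
}

# Usage from the repo root:
# run_stateful_isolated || exit 1
```

Running each stateful file in its own process adds some startup time. In exchange, every such file starts from default process state and cannot leak modified defaults into the main suite. A failure in any run still fails the function.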
The current execution strategy is:

- `--ignore` the 5 files above

Expected benefits
Compatibility and impact
This change mainly makes CI behavior consistent and more stable. It may have the following effects:
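Verification
The following directly related checks have been completed:

- `bash -n scripts/unittest_check.sh`
- `bash -n scripts/unittest_check_gpu.sh`
- `testpaths = tests` and `pythonpath = tests` in `pytest.ini` take effect
- `test_set_default_tensor_type.py` is included in the isolation list
- `--reruns=3`

Relative to `origin/master`, this PR branch contains only the changes to these 3 files, with no unrelated changes mixed in.

Follow-up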
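If "first run fails / rerun passes" can still be reproduced after this is merged, the next direction most worth investigating is:

- the dynamic execution and temporary file path logic in `tests/apibase.py`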